normalize: Normalize the microbiome dataset prior to perform the...

View source: R/normalize.R

Description

A critical step when working with microbiome data is to normalize the retrieved counts properly, thereby overcoming variability in sequencing effort or coverage. There are several ways to normalize, and we have implemented three well-known methods; the choice among them depends on the research question and the researcher's preference. If you are not comfortable with the normalization methods implemented in this package, or if your data are already normalized, you can skip normalization (method = 0).

Usage

normalize(numz = 0.5, method = 1)

Arguments

numz

The zero control across the features (columns) to keep after the normalization step: features whose proportion of zero counts is equal to or greater than numz are removed. If you have 20 samples and declare numz = 0.5 (default), the algorithm removes those categories with 10 or more zero counts. Although the permutation test deals very well with zeros, we recommend setting a more restrictive value to improve the statistics for biomarker discovery (i.e., numz < 0.3).
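The zero filter can be illustrated on a toy table. This is a minimal sketch, not the package code: the helper name `filter_zeros`, the column names, and the toy values are hypothetical, and the layout (two leading non-feature columns followed by one column per feature) is an assumption based on the function body shown under Examples.

```r
# Toy table: two leading metadata columns, then one column per feature
# (assumed layout; all names and values are illustrative only).
toy <- data.frame(
  Sample = paste0("S", 1:4),
  Class  = c("A", "A", "B", "B"),
  taxonA = c(10, 0, 0, 5),   # 2 zeros out of 4 samples
  taxonB = c(3, 7, 0, 9),    # 1 zero
  taxonC = c(0, 0, 0, 2)     # 3 zeros
)

# Hypothetical helper: drop every feature whose number of zero counts
# is equal to or greater than numz * (number of samples).
filter_zeros <- function(tab, numz = 0.5, first_feature = 3) {
  counts <- tab[, first_feature:ncol(tab), drop = FALSE]
  too_sparse <- colSums(counts == 0) >= nrow(tab) * numz
  cbind(tab[, seq_len(first_feature - 1), drop = FALSE],
        counts[, !too_sparse, drop = FALSE])
}

kept <- filter_zeros(toy, numz = 0.5)
names(kept)  # "Sample" "Class" "taxonB"
```

With four samples and numz = 0.5 the threshold is two zeros, so taxonA (exactly two zeros) and taxonC are dropped and only taxonB survives.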

method

Describes the normalization method to be used. We implemented three different strategies to normalize the microbiome data: (1) corresponds to the relative proportion of counts per feature: after the relative abundance of every feature in every sample is retrieved, the normalization generates the number of reads corresponding to each feature per million reads; (2) corresponds to the normalization method described by Anders & Huber (2010), which uses a size factor to correct for differences in sequencing coverage; and (3) corresponds to the normalization method described by Paulson et al. (2013), cumulative sum scaling normalization, which uses an "l" parameter that determines the percentile of features used to calculate the sum scaling factor. If the user decides not to perform normalization, method must be set to 0.
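Two of these ideas can be sketched in a few vectorized lines. This is an illustration under stated assumptions, not the package implementation: the helper names `norm_proportions` and `css_factor` are hypothetical, the toy matrix has samples in rows and features in columns, and the second helper only computes the per-sample sum scaling factor for a given "l" percentile, without applying it.

```r
# Toy count matrix: rows are samples, columns are features (assumed layout).
counts <- matrix(c(10, 30,  60,
                   20, 20, 160),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), c("f1", "f2", "f3")))

# Method 1 (hypothetical helper): relative abundance of each feature
# within its sample, expressed as reads per million.
norm_proportions <- function(x) round(x / rowSums(x) * 1e6)

# Method 3, scaling factor only (hypothetical helper): for each sample,
# sum the counts that do not exceed that sample's l-th quantile.
css_factor <- function(x, l = 0.95) {
  apply(x, 1, function(r) sum(r[r <= quantile(r, l)]))
}

norm_proportions(counts)
#       f1    f2    f3
# S1 1e+05 3e+05 6e+05
# S2 1e+05 1e+05 8e+05

css_factor(counts, l = 0.5)
# S1 S2
# 40 40
```

After method 1 every sample sums to one million reads, which is what makes samples with different sequencing depth comparable.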

Author(s)

Alfonso Benitez-Paez

References

Benitez-Paez A. & Sanz Y. (2015). Permubiome: an R package to perform permutation based test for biomarker discovery in microbiome analyses. In press.

Examples

## The function is currently defined as
function (numz = 0.5, method = 1) 
{
    load("permubiome.RData")
    df_norm <- df
    if (method == 1) {
        y <- array(, nrow(df_norm))
        for (j in 1:nrow(df_norm)) {
            y[j] <- sum(df_norm[j, 3:ncol(df_norm)])
        }
        for (l in 3:ncol(df_norm)) {
            for (m in 1:nrow(df_norm)) {
                df_norm[m, l] <- round((df_norm[m, l]/y[m]) * 
                  1e+06, digits = 0)
            }
        }
        for (i in ncol(df_norm):3) {
            if (sum(df_norm[, i] == "0") >= (nrow(df_norm) * 
                numz)) {
                df_norm[, i] <- NULL
            }
        }
    }
    else if (method == 2) {
        for (i in ncol(df_norm):3) {
            if (sum(df_norm[, i] == 0) >= (nrow(df_norm) * numz)) {
                df_norm[, i] <- NULL
            }
        }
        sfactor_matrix <- matrix(, ncol = ncol(df_norm) - 2, 
            nrow = nrow(df_norm))
        y <- array(, nrow(df_norm))
        for (m in 1:nrow(df_norm)) {
            for (l in 3:ncol(df_norm)) {
                sfactor_matrix[m, l - 2] <- signif((df_norm[m, 
                  l]/mean(df_norm[, l])), digits = 3)
            }
            y[m] <- median(sfactor_matrix[m, 1:ncol(sfactor_matrix)])
        }
        for (a in 3:ncol(df_norm)) {
            for (b in 1:nrow(df_norm)) {
                df_norm[b, a] <- round((df_norm[b, a] * y[b]), 
                  digits = 0)
            }
        }
    }
    else if (method == 3) {
        for (i in ncol(df_norm):3) {
            if (sum(df_norm[, i] == 0) >= (nrow(df_norm) * numz)) {
                df_norm[, i] <- NULL
            }
        }
        quantil <- as.numeric(readline("Type the 'l' parameter (percentile between 0.01  and 0.99) 
        to perform paulson's normalization (0.95 as default): "))
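        # Fall back to the default when the input is missing, non-numeric,
        # or outside the open interval (0, 1).
        if (is.na(quantil) || quantil <= 0 || quantil >= 1) {
            quantil <- 0.95
        }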
        y <- array(, nrow(df_norm))
        sfactor <- array(, nrow(df_norm))
        for (m in 1:nrow(df_norm)) {
            x <- array(, ncol(df_norm) - 2)
            for (l in 3:ncol(df_norm)) {
                if (df_norm[m, l] <= quantile(df_norm[m, 3:ncol(df_norm)], 
                  quantil, na.rm = T)) {
                  x[l - 2] <- df_norm[m, l]
                }
                else {
                  x[l - 2] <- NA
                }
                sfactor[m] <- sum(x, na.rm = T)
            }
        }
        for (a in 3:ncol(df_norm)) {
            for (b in 1:nrow(df_norm)) {
                df_norm[b, a] <- round(((df_norm[b, a]/median(sfactor)) * 
                  1e+06), digits = 0)
            }
        }
    }
    else if (method == 0) {
        head(df_norm)
        print(paste("Your dataset was not normalized according to method option: 0"))
    }
    else {
        print(paste("Select an appropriate method for normalization: 1 ('proportions'), 
        2 ('anders'), 3 ('paulson'), or 0 ('none')"))
    }
    print(paste("Your normalized data now contains:", ncol(df_norm) - 
        2, "normalized categories ready to analyze"))
    save(df_norm, file = "permubiome.RData")
  }
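
The function body above reads an object named `df` from `permubiome.RData` in the working directory and writes `df_norm` back to the same file. The following sketch shows how such a file could be inspected without touching the current workspace; the toy table and its column names are hypothetical, and a temporary directory is used so no real data file is overwritten.

```r
# Work in a temporary directory so no real permubiome.RData is touched.
old_wd <- setwd(tempdir())

# Hypothetical input table stored under the name `df`, the object name
# that the function body expects to find in the file.
df <- data.frame(Sample = c("S1", "S2"), Class = c("A", "B"),
                 f1 = c(10, 20), f2 = c(90, 180))
save(df, file = "permubiome.RData")

# Load into a separate environment to see what the file contains
# without overwriting objects in the current workspace.
store <- new.env()
loaded <- load("permubiome.RData", envir = store)
loaded  # names of the objects found in the file

# Clean up and restore the original working directory.
unlink("permubiome.RData")
setwd(old_wd)
```

Because the function saves only `df_norm`, inspecting the file this way before and after a run is a simple check on which object it currently holds.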

permubiome documentation built on May 29, 2017, 11:27 a.m.