permutation: Permutation-based non-parametric analysis to infer...

Description Usage Arguments Author(s) References Examples

View source: R/permutation.R

Description

This function performs multiple simulations for every feature present in your dataset. All observations are randomly distributed between groups and median differences are calculated for all simulations. Differences calculated from simulations are fitted to the normal distribution and the probability of the observed difference is then calculated. A multiple testing correction is done to accurately discover the biomarker associated with your dataset classes.

Usage

1
permutation(nperm = 1000, write.output = TRUE)

Arguments

nperm

The number of permutations to be executed during the analysis (1000 as default). The higher the number of permutations, the more precise will be the p-value returned and the function becomes more time-consuming. We recommend to use 1000 as the minimum and using values higher than 100000 produces no additional information.

write.output

When "T" (as default), a sorted output file is generated and stored in the working directory. Control for the number of features to be present in the output is allowed with the "all" and "select" parameters prompted.

Author(s)

Alfonso Benitez-Paez

References

Benitez-Paez A. & Sanz Y. (2015). Permubiome: an R package to perform permutation based test for biomarker discovery in microbiome analyses. In press.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
## The function is currently defined as
function (nperm = 1000, write.output = TRUE) 
{
    Class<-NULL
    load("permubiome.RData")
    df_norm <- df_norm
    classes <- levels(df_norm$Class)
    group1 <- subset(df_norm, Class == classes[1])
    group2 <- subset(df_norm, Class == classes[2])
    categories <- colnames(df_norm)
    size1 <- nrow(group1)
    size2 <- nrow(group2)
    size <- size1 + size2
    pvalue_matrix <- matrix(, nrow = ncol(df_norm) - 2, ncol = 5, 
        byrow = T)
    colnames(pvalue_matrix) <- c("Category", paste(classes[1], 
        "(median)"), paste(classes[2], "(median)"), "p.value", 
        "p.adjust (fdr)")
    for (i in 3:(ncol(df_norm))) {
        category <- categories[i]
        diff <- median(group1[, i]) - median(group2[, i])
        x <- c(group1[, i], group2[, i])
        y <- array(, nperm)
        for (j in 1:nperm) {
            set <- sample(size, size2, replace = FALSE)
            diff_iter <- median(x[set]) - median(x[-set])
            y[j] <- diff_iter
            ref_score <- (diff - mean(y))/sd(y)
        }
        if (ref_score > 0) {
            pvalue.i <- 1 - pnorm(ref_score)
        }
        else {
            pvalue.i <- pnorm(ref_score)
        }
        padjust.i <- p.adjust(pvalue.i, method = "fdr", n = nrow(pvalue_matrix))
        if (pvalue.i <= 1) {
            print(paste(category, signif(pvalue.i, 4), signif(padjust.i, 
                4)))
        }
        else {
            print(paste(category, "1.000", signif(padjust.i, 
                4)))
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 1] <- category
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 2] <- round(median(group1[, 
                i]), digits = 0)
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 3] <- round(median(group2[, 
                i]), digits = 0)
        }
        if (pvalue.i <= 1) {
            pvalue_matrix[(i - 2), 4] <- format(pvalue.i, digits = 7, 
                scientific = F)
        }
        else {
            pvalue_matrix[(i - 2), 2] <- 1
        }
        if (pvalue.i != 0) {
            pvalue_matrix[(i - 2), 5] <- format(padjust.i, digits = 7, 
                scientific = F)
        }
        invisible()
    }
    if (write.output == TRUE) {
        all <- readline("Do you want to include all fetures in the output? (yes/no) : ")
        if (substr(all, 1, 1) == "n") {
            select <- as.numeric(readline("Features under what level of significance do you want 
            to retrieve (i.e. 0.2) : "))
            significant <- subset(pvalue_matrix, pvalue_matrix[, 
                5] <= select)
            ordered <- significant[order(significant[, 5]), ]
        }
        else {
            significant <- subset(pvalue_matrix, pvalue_matrix[, 
                5] <= 1)
            ordered <- significant[order(significant[, 5]), ]
        }
        write.table(ordered, file = "permutation.output", quote = F, 
            row.names = F, sep = "\t")
        print(paste("Permutation test done and output table printed!"))
    }
    else {
        significant <- subset(pvalue_matrix, pvalue_matrix[, 
            4] <= 1)
        ordered <- significant[order(significant[, 4]), ]
        ordered
        print(paste("Permutation test done!"))
    }
  }

permubiome documentation built on May 29, 2017, 11:27 a.m.