screenExpr: Screens data from the original expression files

Description

Screens for genes whose sample variance exceeds a user-specified threshold. Duplicate genes are consolidated to a single row each; the function body listed under Examples keeps the duplicate with the highest sample variance.

Usage

screenExpr(Yexpr, sdCutoff)

Arguments

Yexpr

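Expression data. The first column holds the gene IDs and the remaining columns hold the expression values, one column per sample.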

sdCutoff

Standard deviation threshold. A gene is kept when its sample variance exceeds sdCutoff^2.
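
The input layout and the threshold comparison can be sketched in base R without calling screenExpr itself. The toy data frame, its gene names, and the cutoff value below are hypothetical; the filtering steps mirror the function body listed under Examples.

```r
# Hypothetical toy input: column 1 = gene ID, columns 2..n = samples.
Yexpr <- data.frame(
  Gene = c("TP53", "TP53", "MEN1", "UNMAPPED", NA, "DAXX"),
  S1   = c(1.0, 2.0, 5.0, 3.0, 1.0, 4.0),
  S2   = c(1.1, 6.0, 5.1, 9.0, 7.0, 4.0),
  S3   = c(0.9, 4.0, 4.9, 6.0, 4.0, 4.1),
  stringsAsFactors = FALSE
)
sdCutoff <- 0.5  # illustrative value only

# Drop rows with unmapped or missing gene IDs.
mapped <- Yexpr[toupper(Yexpr[, 1]) != "UNMAPPED" & !is.na(Yexpr[, 1]), ]

# Per-gene sample variance over the expression columns.
sampleVar <- apply(mapped[, 2:ncol(mapped)], 1, var)

# The comparison is made on the variance scale, so the SD cutoff is squared.
keep <- sampleVar > sdCutoff^2
mapped[keep, ]
```

Because the cutoff is squared before the comparison, sdCutoff should be given in the same units as the expression values, not in squared units.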

Details

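Rows whose gene ID is missing or equal to "UNMAPPED" (case-insensitive) are dropped first. A gene is then retained when its sample variance exceeds sdCutoff^2. When a gene name occurs more than once among the retained rows, the function body listed under Examples keeps the row with the highest sample variance. A progress message is printed at each step.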
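
The handling of duplicate genes can be illustrated with a small base R sketch. It follows the function body listed under Examples, which selects the highest-variance row for each duplicated gene; the data frame and the names highVar, keepIdx, and deduped are hypothetical and are not part of the package.

```r
# Hypothetical table of genes that already passed the variance screen;
# TP53 appears twice.
highVar <- data.frame(
  Gene = c("TP53", "MEN1", "TP53"),
  S1   = c(2, 1, 0),
  S2   = c(6, 5, 10),
  S3   = c(4, 3, 5),
  stringsAsFactors = FALSE
)
sampleVar <- apply(highVar[, -1], 1, var)

# For each unique gene, find the index of its highest-variance row.
keepIdx <- sapply(unique(highVar$Gene), function(gene) {
  idx <- which(highVar$Gene == gene)
  idx[which.max(sampleVar[idx])]
})

# One row per gene, in order of first appearance.
deduped <- highVar[keepIdx, ]
deduped
```

Selecting a single row keeps real measured values for each gene, whereas averaging would produce a profile that no probe actually measured; which behaviour is preferable depends on the downstream analysis.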

Value

Returns the expression data restricted to unique gene names whose sample variance exceeds the specified threshold.
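
Judging from the function body listed under Examples, the result is assembled with array() and the gene names are written into its first column, so the returned object appears to be a character matrix rather than a numeric one. The sketch below mimics that assembly with made-up values and shows one possible way to recover a numeric matrix; res and exprMat are hypothetical names.

```r
# Mimic how the result is assembled: an empty array whose first column
# receives gene names and whose remaining columns receive the values.
res <- array(dim = c(2, 3))
res[, 1] <- c("TP53", "MEN1")
res[, 2:3] <- matrix(c(2.5, 5.0, 6.5, 5.1), nrow = 2)
colnames(res) <- c("Gene", "S1", "S2")

# Writing strings into the array coerces every cell to character.
is.character(res)

# One way to get a numeric matrix with gene names as row names.
exprMat <- matrix(as.numeric(res[, -1]), nrow = nrow(res),
                  dimnames = list(res[, 1], colnames(res)[-1]))
exprMat
```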

Author(s)

Anguraj Sadanandam <anguraj.sadanandam@icr.ac.uk>

References

Anguraj Sadanandam, et al. (2015). A cross-species analysis in pancreatic neuroendocrine tumors reveals molecular subtypes with distinctive clinical, metastatic, developmental, and metabolic characteristics. Cancer Discovery.

Examples

## The function is currently defined as
function (Yexpr, sdCutoff) 
{
    data.all <- Yexpr
    print(paste("    Read expression data for ", dim(data.all)[1], 
        " genes.", sep = ""))
    data.mapped <- data.all[toupper(data.all[, 1]) != "UNMAPPED" & 
        !is.na(data.all[, 1]), ]
    print(paste("    Mapped data to ", dim(data.mapped)[1], " genes IDs.", 
        sep = ""))
    nLastCol <- dim(data.mapped)[2]
    print("Removing genes with low sample variance...")
    sampleVar <- apply(data.mapped[, 2:nLastCol], 1, var)
    idxHighVar <- sampleVar > sdCutoff^2
    data.highVar <- data.mapped[idxHighVar, ]
    sampleVar.highVar <- sampleVar[idxHighVar]
    print(paste("    Found ", dim(data.highVar)[1], " genes exceeding SD threshold of ", 
        sdCutoff, ".", sep = ""))
    nGenesHighVar <- dim(data.highVar)[1]
    genesUnique <- as.vector(unique(data.highVar[, 1]))
    nGenesUnique <- length(genesUnique)
    nSamples <- dim(data.highVar)[2] - 1
    data <- array(dim = c(nGenesUnique, nLastCol))
    data[, 1] <- genesUnique
    colnames(data) <- colnames(data.highVar)
    print("Removing duplicate genes (selecting for max standard deviation)...")
    for (gene in genesUnique) {
        idxGenes <- seq(along = 1:nGenesHighVar)[data.highVar[, 
            1] == gene]
        data.slice <- data.highVar[idxGenes, 2:nLastCol]
        if (length(idxGenes) > 1) {
            idxMaxVar <- which.max(sampleVar.highVar[idxGenes])
            data.slice <- data.slice[idxMaxVar, ]
        }
        data[data[, 1] == gene, 2:nLastCol] <- as.matrix(data.slice)
    }
    print(paste("    ", nGenesUnique, " unique genes IDs remain.", 
        sep = ""))
    print("Screened data is the output:... ")
    print("")
    rm(data.all, data.mapped, data.highVar)
    sdata <- data
    sdata
  }

syspremed/PanNETassigner documentation built on May 31, 2019, 12:48 a.m.