This document provides information on how to extract subsets of genes from previously available gene lists by setting different filtering conditions such as the fold change, the p-value or the availability of Entrez identifier.

Data Input Format for gene list selection}

In principle a filtering tool might read the file header and, once this is done, create an interactive dialog to query for the values that would be applied for subsetting the lists rows or columns.

In practice, and in our work environment most lists will be extracted from the standard output of our microarray analysis pipeline. (In this point we assume that the user is familiarized with standard microarray analysis ``a la Bioconductor''. If this is not so the reader can browse through the slides and examples in \url{http://eib.stat.ub.edu/Omics+Data+Analysis}). These files are generically described as "ExpressionAndTopTables" because they consist of tables having: The Gene Symbols and the Entrez Identifiers in the first two columns The standard output of the limma software known as "topTable" * (optionally) the Expression values that have been used to compute the Toptable.

Although some type of analyses require only the gene identifiers other need also the expressions. For this reason these output files contain ``all that is needed'' for further analyses.

Load example data

The simplest way to get the data is to load into an R object.

fileName<- system.file("extdata", "topTables.Rda", package = "geneLists")
load(fileName)
class(AvsB)
colnames(AvsB)
head(AvsB[,1:7])

How many genes

The function numGenesChanged allows one to make an idea of how many genes will be recovered based on a p-value filtering criteria. It is a good idea to start with this function to explore the topTables that will be subsetted. This may help the appropriate filters.

require(geneLists)
cbind(numGenesChanged(AvsB, "AvsB"), numGenesChanged(AvsL, "AvsL"), numGenesChanged(BvsL, "BvsL")) 

Filtering genes

The functions available in the package allow to extract simple gene lists.

entrezs_01_up  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.01, updown="up", id2Select = "ENTREZ", FCcutoff=1, cols2Select =0) 
length(entrezs_01_up)

ALternatively one can extract a subtable consisting of several columns from the original table

table_01_up  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.01, updown="up", id2Select = NULL, FCcutoff=1, cols2Select =1:3) 
dim(table_01_up)

A case study

A typical situation for a user of this package may consist of some or all the following actions:

AvsB0  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 1)
AvsL0  <- genesFromTopTable (AvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 1)
BvsL0  <- genesFromTopTable (BvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 1)
AvsB1  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05)
AvsL1  <- genesFromTopTable (AvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05)
BvsL1  <- genesFromTopTable (BvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05)

cat("AvsB: ", length(AvsB0), "-->", length(AvsB1), "\n")
cat("AvsL: ", length(AvsL0), "-->", length(AvsL1), "\n")
cat("BvsL: ", length(BvsL0), "-->", length(BvsL1), "\n")
AvsB1Up  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="up")
AvsL1Up  <- genesFromTopTable (AvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="up")
BvsL1Up  <- genesFromTopTable (BvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="up")

cat("AvsB: ", length(AvsB1), "-->", length(AvsB1Up), "\n")
cat("AvsL: ", length(AvsL1), "-->", length(AvsL1Up), "\n")
cat("BvsL: ", length(BvsL1), "-->", length(BvsL1Up), "\n")

AvsB1Down  <- genesFromTopTable (AvsB, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="down")
AvsL1Down  <- genesFromTopTable (AvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="down")
BvsL1Down  <- genesFromTopTable (BvsL, entrezOnly = TRUE, uniqueIds=TRUE, adjOrrawP = "adj", Pcutoff = 0.05, updown="down")

cat("AvsB: ", length(AvsB1), "-->", length(AvsB1Down), "\n")
cat("AvsL: ", length(AvsL1), "-->", length(AvsL1Down), "\n")
cat("BvsL: ", length(BvsL1), "-->", length(BvsL1Down), "\n")
commonAvsLandBvsL <- intersect(AvsL0, BvsL0)
length(commonAvsLandBvsL)


alexsanchezpla/geneLists documentation built on May 12, 2019, 7:41 a.m.