DataGeneSets: Function that filters the gene sets to work with the desired...

View source: R/DataGeneSets.R

Description

This function returns the gene sets that meet the desired size. It provides two lists of gene sets: one with the gene identifiers of interest, and one with the positions of those gene identifiers with respect to the dataset. It also provides the sizes of all the gene sets considered.

Usage

DataGeneSets(output.ReadGMT, data.gene.symbols, size)

Arguments

output.ReadGMT

Output of the function ReadGMT.

data.gene.symbols

Vector with the gene identifiers associated with the dataset of interest. These gene identifiers have to be the same as the ones in the .gmt file of interest.

size

Integer with the minimum number of genes in a gene set.

Details

This function constructs the gene sets that are going to be considered in the analysis based on the desired size.
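
The filtering idea can be sketched in base R as follows. This is a minimal illustration, not the BAGS implementation: the helper filter_gene_sets_sketch and the toy gene sets are hypothetical, the input is assumed to be a plain named list of character vectors (the actual structure of the ReadGMT output may differ), and the size threshold is assumed to apply to the number of genes of each set that are found in the dataset.

```r
# Hypothetical stand-in for the filtering idea; NOT the BAGS implementation.
# Assumes gene.sets is a named list of character vectors of gene identifiers.
filter_gene_sets_sketch <- function(gene.sets, data.gene.symbols, size) {
  # Keep only the genes of each set that are present in the dataset
  nms <- lapply(gene.sets, function(gs) intersect(gs, data.gene.symbols))
  # Positions of those genes within data.gene.symbols
  ids <- lapply(nms, function(gs) match(gs, data.gene.symbols))
  # Number of matched genes per set
  sizes <- vapply(nms, length, integer(1))
  # Assumption: a set is kept when it has at least `size` matched genes
  keep <- sizes >= size
  list(DataGeneSetsIds = ids[keep],
       DataGeneSetsNms = nms[keep],
       Size            = sizes[keep])
}

# Toy inputs (hypothetical)
toy.sets    <- list(setA = c("TP53", "BRCA1", "ESR1"),
                    setB = c("ESR1", "MYC"),
                    setC = c("GATA3"))
toy.symbols <- c("ESR1", "TP53", "GATA3", "BRCA1")

toy.out <- filter_gene_sets_sketch(toy.sets, toy.symbols, size = 2)
names(toy.out$Size)   # only the sets that met the threshold remain
```

With a threshold of 2, only setA survives in this toy example, because setB and setC each have a single gene present in the dataset.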

Value

This function returns a list with three items:

DataGeneSetsIds

A list of gene sets with the positions of the gene identifiers with respect to the dataset of interest.

DataGeneSetsNms

A list of gene sets, each given by its gene identifiers.

Size

A vector with the sizes of the gene sets.
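
As an illustration of how the three items relate, the positions in DataGeneSetsIds can be used to pull the rows of a gene set out of an expression matrix. The sketch below uses hand-built toy objects (toy.expr and toy.grps are hypothetical, not output of DataGeneSets) and assumes that the positions index the rows of an expression matrix whose row names are the vector passed as data.gene.symbols.

```r
# Toy expression matrix: 4 genes (rows) by 3 samples (columns); hypothetical data
toy.expr <- matrix(1:12, nrow = 4,
                   dimnames = list(c("ESR1", "TP53", "GATA3", "BRCA1"),
                                   paste0("s", 1:3)))

# Hand-built list with the same three item names described above (hypothetical values)
toy.grps <- list(DataGeneSetsIds = list(setA = c(2L, 4L, 1L)),
                 DataGeneSetsNms = list(setA = c("TP53", "BRCA1", "ESR1")),
                 Size            = c(setA = 3L))

# Rows of the expression matrix that belong to gene set "setA"
set.expr <- toy.expr[toy.grps$DataGeneSetsIds$setA, , drop = FALSE]

# The row names of the subset agree with the identifiers stored for the same set
identical(rownames(set.expr), toy.grps$DataGeneSetsNms$setA)
```

Using drop = FALSE keeps the result a matrix even when a gene set has a single gene.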

Author(s)

A. Quiroz-Zarate aquiroz@jimmy.harvard.edu

See Also

See the BAGS Vignette for examples on how to use this function and the help of the function Gibbs5 for a detailed example of its use.

Examples

library(breastCancerVDX)
library(Biobase)

data(vdx,package="breastCancerVDX")
gene.expr=exprs(vdx)   # Gene expression of the package
vdx.annot=fData(vdx)   # Annotation associated to the dataset
vdx.clinc=pData(vdx)   # Clinical information associated to the dataset 

# Identify the positions of the samples associated with ER+ and ER- breast cancer
er.pos=which(vdx.clinc$er==1)
er.neg=which(vdx.clinc$er==0)

# Only keep columns 1 and 3, probeset identifiers and Gene symbols respectively
vdx.annot=vdx.annot[,c(1,3)]

all(rownames(gene.expr)==as.character(vdx.annot[,1]))  # Check that the probesets are ordered as in the dataset
all(colnames(gene.expr)==as.character(vdx.clinc[,1]))  # Check that the sample identifiers are ordered as in the dataset
rownames(gene.expr)=as.character(vdx.annot[,2])        # Change the row identifiers to the gene identifiers of interest

#===== Because there are several measurements for some genes (multiple rows per gene), we filter the genes.
#===== For each gene, keep the row with the highest variability between phenotypes
gene.nms.u=unique(rownames(gene.expr))
gene.nms=rownames(gene.expr)
indices=NULL
for(i in 1:length(gene.nms.u))
{
	aux=which(gene.nms==gene.nms.u[i])
	if(length(aux)>1){
		var.r = apply(cbind(apply(gene.expr[aux,er.pos],1,mean),apply(gene.expr[aux,er.neg],1,mean)),1,var)
		aux=aux[which.max(var.r)]
	}
	indices=c(indices,aux)
}
#===== Only keep, for each gene, the row with the most variability between the phenotypes of interest
gene.expr=gene.expr[indices,]
gene.nams=rownames(gene.expr)     # The gene symbols of interest are stored here


#===== The following R dataset stores the .gmt file associated with the MF ontology from GO,
#===== so "reading the GMT" is the only step that is skipped here. An example of that step is
#===== provided in the help file of the function "ReadGMT".
data(AnnotationMFGO,package="BAGS")

data.gene.grps=DataGeneSets(AnnotationMFGO,gene.nams,10)

BAGS documentation built on Nov. 8, 2020, 11:11 p.m.