generateprofiles: Generate Profiles


View source: R/generateprofiles.R

Description

Processing Affymetrix data to generate ranked lists of differential gene expression and associated p-values.

Usage

generateprofiles(input = c("AE", "GEO", "localAE", "local"),
                 normalisation = c("rma", "mas5"),
                 accession = NULL, customfile = NULL,
                 celfilepath = NULL, sdrfpath = NULL,
                 case = c("disease", "drug"),
                 statistic = c("coef", "t", "diff"),
                 annotation = NULL, factorvalue = NULL,
                 annotationmap = NULL,
                 type = c("average", "medpolish", "maxvar", "max"),
                 outputgenedata = FALSE)

Arguments

input

Character string denoting the source of the data. One of AE (default), GEO, localAE or local.

normalisation

Character string denoting the normalisation procedure as implemented in the affy package. One of rma (default, as the first value in Usage) or mas5.

accession

Optional character string giving the database reference for use with either the AE or GEO options.

customfile

Optional character string giving the path of a file containing the factor values associated with the CEL files in the folder given by celfilepath.

celfilepath

Optional character string giving the path of a folder containing CEL files to analyse.

sdrfpath

Optional character string giving the path of an sdrf file corresponding to the CEL files in celfilepath.

case

Character string, one of disease (default) or drug denoting whether the input profiles are disease or drug profiles.

statistic

Character string, one of coef (default), t or diff.

annotation

Optional character string giving the platform of the Affymetrix files.

factorvalue

Optional character string giving the name of the factor value in the GEO database.

annotationmap

Optional matrix, or character string giving the path of a text file, containing an annotation map to convert from probes (first column) to HUGO gene symbols (second column). If a file path is passed, the text file should have exactly two columns, with no row names or headers.
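As a rough sketch of what such a map could look like, the base R snippet below builds a two-column matrix and writes it to a text file without headers or row names. The probe and symbol pairs are only illustrative, the object names are hypothetical, and the tab separator is an assumption, since the description does not name a delimiter.

```r
# Illustrative probe-to-symbol pairs; a real map would cover the whole platform.
annotation_map <- cbind(
  c("1007_s_at", "1053_at", "117_at", "121_at"),  # column 1: probe IDs
  c("DDR1", "RFC2", "HSPA6", "PAX8")              # column 2: HUGO gene symbols
)

# Write with no header and no row names, as the argument description requires.
# ASSUMPTION: tab-separated; the documentation does not specify a delimiter.
map_file <- tempfile(fileext = ".txt")
write.table(annotation_map, file = map_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it has exactly two columns.
map_check <- as.matrix(read.table(map_file, sep = "\t", header = FALSE,
                                  stringsAsFactors = FALSE))
stopifnot(ncol(map_check) == 2)
```

Either annotation_map (the matrix) or map_file (the path) would then be the kind of value the annotationmap argument describes.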

type

The type of statistic used to combine multiple probes into a single gene. One of average (default), which averages the expression values; medpolish, which uses median polish; maxvar, which represents the set by the single probe with maximum variance; or max, which uses the probe with maximal variance.
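The base R sketch below illustrates three of these summaries on toy data for a single gene. It is not the package's internal code: the matrix and object names are hypothetical, the median polish summary (overall effect plus column effects) is one common convention assumed here, and the max option is left out because its description above does not distinguish it from maxvar.

```r
# Toy log2 expression for one gene: 3 probes (rows) by 4 samples (columns).
probes <- matrix(
  c(7.1, 7.3, 8.9, 9.2,
    6.8, 7.0, 7.9, 8.1,
    5.0, 5.1, 5.0, 5.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe1", "probe2", "probe3"), paste0("sample", 1:4))
)

# average: mean across probes for each sample.
gene_average <- colMeans(probes)

# medpolish: Tukey's median polish; the per-sample summary is taken here as
# the overall effect plus the column (sample) effects (an assumed convention).
fit <- medpolish(probes, trace.iter = FALSE)
gene_medpolish <- fit$overall + fit$col

# maxvar: keep the single probe whose values vary most across samples.
probe_variance <- apply(probes, 1, var)
gene_maxvar <- probes[which.max(probe_variance), ]
```

Each result is a vector with one value per sample, which is the shape a gene-level expression row would need.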

outputgenedata

Logical, default FALSE. If TRUE, outputs the gene data produced by generateprofiles instead of the fitted coefficients from the linear models.

Details

Input types AE and GEO use raw data downloaded from ArrayExpress with the ArrayExpress package [1], or processed GDS files downloaded from GEO with the GEOquery package [2]. CEL files and sdrf files downloaded from ArrayExpress and stored locally can be processed using the localAE option, with the sdrf file path given in sdrfpath and the path of the folder containing the CEL files given in celfilepath. User data stored locally can be processed using the local option, with the CEL file folder given in celfilepath and the factors associated with the CEL files given in customfile. Where metadata are missing from the GEO database, the platform annotation can be specified using the annotation parameter, and the name of the main factor value (e.g. disease status or compound treatment) using the factorvalue parameter.

Raw CEL files are normalised (rma or mas5) [3] and data are converted from probes to genes using BioMart annotations [4]. Linear models are fitted using the database factor values, or user-provided factors for locally stored data [5]. Differential expression is calculated for HUGO genes, with the mapping performed automatically using BioMart for the Affymetrix platforms HGU133A, HGU133Plus2 and HGU133A2. The differential expression statistic is one of coef (default), which corresponds to the log (base 2) fold change; diff, which is the difference between raw (non-logged) expression values; or t, the t-statistic based on log base 2 expression values.
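To make the three statistics concrete, the sketch below computes each one for a single gene in base R. It is only an approximation of the idea: ordinary least squares via lm stands in for the linear modelling approach cited as [5], the toy data and object names are hypothetical, and taking diff as the difference of group means on the raw scale is an assumption about how that statistic is formed.

```r
# Toy log2 expression for one gene: three controls, then three cases.
log2_expr <- c(6.0, 6.2, 6.1, 8.0, 8.3, 8.1)
group <- factor(rep(c("control", "case"), each = 3),
                levels = c("control", "case"))

# ASSUMPTION: ordinary least squares as a stand-in for the models in [5].
fit <- lm(log2_expr ~ group)
fit_table <- summary(fit)$coefficients

# coef: fitted group effect, i.e. the difference in mean log2 expression
# between cases and controls (the log2 fold change).
stat_coef <- fit_table["groupcase", "Estimate"]

# t: the t-statistic of that coefficient, computed on the log2 scale.
stat_t <- fit_table["groupcase", "t value"]

# diff: difference in group means after undoing the log2 transform.
raw_expr <- 2^log2_expr
stat_diff <- mean(raw_expr[group == "case"]) - mean(raw_expr[group == "control"])
```

All three statistics share a sign for this gene, but they sit on different scales, so rankings built from them need not agree across genes.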

Value

List with two elements:

Ranklist

Matrix containing the ranks of gene expression. Rows correspond to genes and columns to the different profiles.

Pvalues

Matrix containing the p-values associated with the differential expression profiles in Ranklist.
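The following sketch shows one way a result of this shape might be inspected. The list is mocked up by hand rather than produced by generateprofiles, the element names follow the descriptions above, and it assumes that both matrices share the same row order and that a lower rank value sorts first; the direction of the ranking is not stated here.

```r
# Hand-built stand-in for a returned list (values are made up).
genes <- c("DDR1", "RFC2", "HSPA6", "PAX8")
mock_result <- list(
  Ranklist = matrix(c(2, 4, 1, 3), ncol = 1,
                    dimnames = list(genes, "profile1")),
  Pvalues  = matrix(c(0.03, 0.40, 0.001, 0.20), ncol = 1,
                    dimnames = list(genes, "profile1"))
)

# ASSUMPTION: both matrices have the same genes in the same row order.
stopifnot(identical(dimnames(mock_result$Ranklist),
                    dimnames(mock_result$Pvalues)))

# Genes in one profile ordered by rank, alongside their p-values.
profile <- "profile1"
ordering <- order(mock_result$Ranklist[, profile])
summary_table <- data.frame(
  gene = genes[ordering],
  rank = mock_result$Ranklist[ordering, profile],
  p_value = mock_result$Pvalues[ordering, profile]
)
```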

Author(s)

C. Pacini

References

[1] Kauffmann et al. (2009). Importing ArrayExpress datasets into R/Bioconductor. Bioinformatics, 25(16):2092-4.

[2] Davis et al. (2007). GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 23(14):1846-1847.

[3] Irizarry et al. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research, 31(4):e15.

[4] Durinck et al. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols, 4:1184-1191.

[5] Smyth (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1), Article 3.

See Also

classifyprofile

Examples

#profileAE<-generateprofiles(input="AE",accession="E-GEOD-22528")

DrugVsDisease documentation built on Nov. 8, 2020, 5:56 p.m.