reduceGenes_pca: Reduce number of genes in expression matrix using principal...

Description

Takes an ExpressionSet object, performs PCA on the transposed expression matrix, then selects the specified number of genes that are most influential on the specified PCs. Two methods are available for selecting the genes: 1) the genes with the highest absolute loading value across all the specified PCs are chosen, or 2) the genes whose loadings show the most significant positive and negative correlations with the specified PCs are chosen using the dimdesc function from the FactoMineR package.
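The first selection method can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses prcomp on a simulated matrix, and it reads "highest absolute loading value across all the specified PCs" as each gene's largest absolute loading over those PCs. The names expr, pcs, n_genes and selected are made up for the example.

```r
set.seed(1)
# Toy expression matrix (simulated): 50 genes in rows, 20 cells in columns
expr <- matrix(rnorm(50 * 20), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

# PCA on the transposed matrix, so cells are observations and genes are variables
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

pcs <- c(1, 2, 3)   # PCs to draw genes from (illustrative)
n_genes <- 10       # number of genes to keep (illustrative)

# Assumption: score each gene by its largest absolute loading over the chosen PCs
max_abs_loading <- apply(abs(pca$rotation[, pcs, drop = FALSE]), 1, max)

# Keep the n_genes genes with the highest scores
selected <- names(sort(max_abs_loading, decreasing = TRUE))[seq_len(n_genes)]
reduced <- expr[selected, , drop = FALSE]
```

Because every gene is ranked in a single pool, nothing forces the selection to be spread evenly over the PCs.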

Usage

reduceGenes_pca(cellData, corr = TRUE, PCs = c(1, 2, 3), genes = 300,
  center = TRUE, scale = FALSE, print = FALSE, saveTable = FALSE)

Arguments

cellData

ExpressionSet object created with readCells (and preferably transformed with prepCells). It is also helpful to first run reduceGenes_var.

corr

Boolean. When set to TRUE, genes are reduced using the dimdesc function from the FactoMineR package. With this method, an equal number of genes is chosen from each of the specified PCs, and within each PC an equal number of genes with positive and negative loadings is chosen. E.g. if 200 genes on PCs 1 and 2 are specified, the 50 genes with the most significant positive correlation and the 50 genes with the most significant negative correlation are chosen on each of PC 1 and PC 2. If set to FALSE, the genes with the maximum absolute loading value across all the specified PCs are chosen. Be warned that with this method, genes from an individual PC can dominate.
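The equal split described for corr = TRUE can be worked through with a small base R sketch. It is a stand-in, not the dimdesc call itself: it ranks genes by their Pearson correlation with the cell scores on each PC, which for a fixed number of cells gives the same ordering as ranking by significance, and it does not apply any p-value cutoff. All data and variable names are illustrative.

```r
set.seed(2)
# Toy expression matrix (simulated): 400 genes in rows, 30 cells in columns
expr <- matrix(rnorm(400 * 30), nrow = 400,
               dimnames = list(paste0("gene", 1:400), paste0("cell", 1:30)))
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

pcs <- c(1, 2)
n_genes <- 200
# 200 genes / (2 PCs * 2 signs) = 50 genes per PC and sign
per_sign <- n_genes / (length(pcs) * 2)

# Correlation of each gene's expression with the cell scores on each chosen PC
gene_pc_cor <- cor(t(expr), pca$x[, pcs, drop = FALSE])

picks <- lapply(seq_along(pcs), function(i) {
  r <- sort(gene_pc_cor[, i])      # ascending: most negative first
  c(names(head(r, per_sign)),      # strongest negative correlations
    names(tail(r, per_sign)))      # strongest positive correlations
})

# A gene may be picked on more than one PC, so the union can be below n_genes
selected <- unique(unlist(picks))
```

In this sketch each PC contributes exactly 100 candidates, so no single PC can dominate the selection the way it can with the absolute-loading method.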

PCs

Vector of integers specifying which PCs to select genes from. Running pcaMatrix on the full list of genes can be helpful in determining which PCs to use.

genes

Integer specifying the desired number of genes to select from the PCA analysis. The number of genes in the expression matrix will be reduced to this number.

center

Boolean specifying whether to center the data prior to PCA. This is generally recommended.

scale

Boolean specifying whether the data should be scaled prior to PCA. This is generally not recommended unless samples have different units (e.g. some samples are counts and some are TPMs).
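To see why scaling changes which genes look influential, here is a generic prcomp sketch on simulated data. It assumes center and scale behave like prcomp's center and scale. arguments applied to the transposed matrix, so each gene is centered and scaled as a variable. How reduceGenes_pca applies them internally is not stated on this page.

```r
set.seed(3)
# Toy expression matrix (simulated): 20 genes in rows, 15 cells in columns
expr <- matrix(rnorm(20 * 15), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:15)))
expr["gene1", ] <- expr["gene1", ] * 50   # give one gene a much larger variance

# Same data, with and without scaling each gene to unit variance
pca_unscaled <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pca_scaled   <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Gene with the largest absolute loading on PC1 in each case
top_unscaled <- names(which.max(abs(pca_unscaled$rotation[, 1])))
top_scaled   <- names(which.max(abs(pca_scaled$rotation[, 1])))
```

Without scaling, the high-variance gene drives the first PC. With scaling, every gene contributes on the same footing, which also amplifies noise in low-variance genes.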

print

Boolean specifying whether the results from the PCA analysis should be displayed in the terminal window.

saveTable

Boolean specifying whether the results from the PCA analysis should be saved in a .txt output file.

Value

ExpressionSet object with genes removed from the expression matrix according to the optional parameters specified above. Note that the original list of genes will still be present within fData. Genes that pass the filter are stored in fData as TRUE; genes that do not pass are stored as FALSE.
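The TRUE/FALSE bookkeeping can be pictured with a plain data frame standing in for fData. This is a sketch only: the column name passFilter is hypothetical, and the real object would be read with Biobase's fData accessor rather than built by hand.

```r
all_genes <- paste0("gene", 1:8)            # original gene list (illustrative)
selected  <- c("gene2", "gene5", "gene7")   # genes kept by the reduction step

# Stand-in for fData: one row per original gene, with a logical filter column
feature_data <- data.frame(row.names = all_genes,
                           passFilter = all_genes %in% selected)  # hypothetical column name

# Recover the retained genes from the logical column
kept <- rownames(feature_data)[feature_data$passFilter]
```

Keeping all original genes in the feature table means the filter decision stays traceable after the expression matrix has been reduced.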


joeburns06/hocuspocus documentation built on May 19, 2019, 2:59 p.m.