expressionFileSetToMatrix: Convert Expression Files to/from a Matrix of Gene Values

expressionFileSetToMatrix    R Documentation

Convert Expression Files to/from a Matrix of Gene Values

Description

Combine a set of transcriptome text files into one numeric matrix, with a row for each gene and a column for each dataset; or perform the inverse, writing each column of a matrix out as its own transcriptome file.

Usage

expressionFileSetToMatrix(fnames, fids, geneColumn = c("GENE_ID", "GeneID"),
                          intensityColumn = c("INTENSITY", "RPKM_M", "RANK"),
                          missingGenes = c("na", "drop", "fill"), sep = "\t",
                          keepIntergenics = FALSE, verbose = FALSE)

expressionMatrixToFileSet(m, geneColumn = c("GENE_ID", "GeneID"),
                          intensityColumn = c("INTENSITY", "RPKM_M"),
                          path = ".", sep = "\t", verbose = FALSE)

Arguments

fnames

character vector of full pathnames to the transcript files

fids

character vector of SampleID terms (same length as fnames) to become the column names of the matrix

geneColumn

the column name of the gene identifier column in all transcript files

intensityColumn

the column name of the expression magnitude column in all transcript files

missingGenes

how to handle genes that are not present in every file; one of "na", "drop", or "fill" (see Details)

sep

the field separator, passed to read.delim when reading the expression files

keepIntergenics

logical; if TRUE, keep the non-gene (intergenic) rows from the transcriptome files, otherwise drop them

m

a numeric matrix of gene expression values, with rownames for the GeneIDs and column names for the SampleIDs.

Details

Genes may appear in any row order in the input files; the output matrix lists them in alphabetical order. If any file does not exist, an error is raised that reports the list of missing files. The missingGenes argument controls how genes that are not present in every file are handled: 'na' sets the missing values to NA; 'drop' removes the entire row for any gene that is missing from at least one file; 'fill' replaces missing values with the smallest magnitude value observed.
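
The three missingGenes behaviors can be illustrated with a small base R sketch. This is not the package's implementation; it only mimics the behavior described above on two hypothetical in-memory tables (s1, s2), and it assumes that "smallest magnitude value observed" means the minimum of all observed values.

```r
# Two hypothetical transcript tables; gene g3 is absent from the second one.
s1 <- data.frame(GENE_ID = c("g2", "g1", "g3"), INTENSITY = c(5.0, 2.0, 8.0),
                 stringsAsFactors = FALSE)
s2 <- data.frame(GENE_ID = c("g1", "g2"), INTENSITY = c(3.0, 0.5),
                 stringsAsFactors = FALSE)
tables <- list(S1 = s1, S2 = s2)

# Union of all gene IDs, in alphabetical order.
genes <- sort(unique(unlist(lapply(tables, function(x) x$GENE_ID))))

# 'na': match() yields NA for genes a table lacks, so those cells become NA.
m_na <- sapply(tables, function(x) x$INTENSITY[match(genes, x$GENE_ID)])
rownames(m_na) <- genes

# 'drop': keep only the genes observed in every table.
m_drop <- m_na[stats::complete.cases(m_na), , drop = FALSE]

# 'fill': replace NA with the smallest observed value (an assumed reading;
# the package may compute the fill value differently).
m_fill <- m_na
m_fill[is.na(m_fill)] <- min(m_na, na.rm = TRUE)
```

In this sketch m_na keeps all three genes with one NA cell, m_drop keeps only g1 and g2, and m_fill has the same shape as m_na with the gap filled.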

Value

A matrix of gene expression values, having a column for each transcript file (with fids as column names), and a row for each gene (with GeneIDs as rownames).

For the inverse function, a set of files is written to path, one for each column of m. File names are built from the column names, in the form SampleID.Prefix.Transcript.txt.
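
The inverse direction can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the matrix m, the output directory, and the prefix value "Xx" are hypothetical placeholders, since how the Prefix part of the file name is chosen is not documented here.

```r
# Small hypothetical matrix: genes as rows, samples as columns.
m <- matrix(c(2, 5, 3, 0.5), nrow = 2,
            dimnames = list(c("g1", "g2"), c("S1", "S2")))

out_dir <- tempdir()
prefix <- "Xx"  # hypothetical stand-in for the 'Prefix' part of the file name

# Write one tab-delimited file per column, named SampleID.Prefix.Transcript.txt.
for (sid in colnames(m)) {
  tbl <- data.frame(GENE_ID = rownames(m), INTENSITY = m[, sid],
                    stringsAsFactors = FALSE)
  f <- file.path(out_dir, paste(sid, prefix, "Transcript.txt", sep = "."))
  write.table(tbl, f, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read one file back to check that the column survives the round trip.
back <- read.delim(file.path(out_dir, "S2.Xx.Transcript.txt"), sep = "\t",
                   stringsAsFactors = FALSE)
```

Each file carries a gene identifier column and an intensity column, which is the layout expressionFileSetToMatrix expects to read back in.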

Author(s)

Bob Morrison


robertdouglasmorrison/DuffyTools documentation built on May 6, 2024, 8:26 p.m.