preprocess_querydata_new

View source: R/spagi2_master.R

Description

This function preprocesses the query data: it converts gene IDs to gene symbols, averages the replicates, and then selects the expressed genes based on an expression cut-off.

Usage

preprocess_querydata_new(
  cell.tissue.data,
  exp.cutoff.th,
  species = "hsapiens",
  data.format = "matrix",
  geneID = "external_gene_name",
  experiment.descriptor = NULL,
  collapse.method = "max"
)

Arguments

cell.tissue.data

An expression matrix in which rows denote genes and columns denote cell types or tissues. All query data are assumed to be in RPKM/FPKM/CPM and log-normalized form, and gene IDs are assumed to be official gene symbols. Duplicate column names are expected and denote replicate samples: all replicates of a specific cell type or tissue should have identical column names. Otherwise, use the experiment.descriptor parameter to identify the replicate samples of each cell type or tissue.
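
The replicate convention described above can be illustrated in base R. This is only a sketch of averaging columns that share a name, not the package's own code; the toy matrix `expr` and the helper names `groups` and `rep_mean` are made up for illustration.

```r
# Toy log-scale expression matrix: 3 genes x 4 samples.
# Replicates of the same cell type share a column name.
expr <- matrix(
  c(1, 3, 5, 7,
    2, 2, 6, 8,
    0, 4, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PAX6", "SOX2", "PROX1"),
                  c("cell1", "cell1", "cell2", "cell2"))
)

# Average the columns that share a name: one output column per cell type.
groups <- unique(colnames(expr))
rep_mean <- sapply(groups, function(g) {
  rowMeans(expr[, colnames(expr) == g, drop = FALSE])
})

print(rep_mean)
```

The `drop = FALSE` keeps a single-replicate group as a one-column matrix, so `rowMeans()` still works when a cell type has only one sample.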

exp.cutoff.th

An expression cut-off threshold value for the query data.

species

The species abbreviation of the query data (cell.tissue.data). Default is "hsapiens".

data.format

Format of cell.tissue.data. Default is "matrix".

geneID

The code for the type of gene IDs used by cell.tissue.data, as used by the biomaRt database. To find the valid codes for gene IDs for a species, please see the find_valid_geneID() function of the package. Default is "external_gene_name".

experiment.descriptor

A vector with one entry per column of cell.tissue.data, in column order, giving the cell type or tissue of each sample. Replicates of a specific cell type or tissue should have identical entries. Defaults to NULL.
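
As a hedged illustration of how such a descriptor lines up with the matrix columns, the sketch below builds one for a toy matrix whose column names are unique. The sample names (`s1` to `s4`), the labels (`lens`, `retina`), and the helper `replicate.groups` are hypothetical; only `experiment.descriptor` is a parameter name from this page.

```r
# Hypothetical samples with unique names, so replicates cannot be
# inferred from the column headers alone.
query.data <- matrix(1:12, nrow = 3,
                     dimnames = list(c("PAX6", "SOX2", "PROX1"),
                                     c("s1", "s2", "s3", "s4")))

# One label per column, in column order; replicates share a label.
experiment.descriptor <- c("lens", "lens", "retina", "retina")
stopifnot(length(experiment.descriptor) == ncol(query.data))

# Group the sample names by label to check the intended replicate structure.
replicate.groups <- split(colnames(query.data), experiment.descriptor)
print(replicate.groups)
```

The vector would then be passed as `experiment.descriptor = experiment.descriptor` in the call to preprocess_querydata_new().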

collapse.method

How to summarise values when one ensembl_gene_id has more than one value (for example, multiple microarray probes or transcripts mapping to one gene). Two options are currently implemented: "max" and "mean". Default is "max".
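
The difference between the two options can be shown with a small base R sketch. It assumes made-up probe values and gene IDs (`ENSG_X`, `ENSG_Y`) and uses `tapply()` for the grouping; it is not taken from the package source.

```r
# Hypothetical probe-level values where two probes map to the same gene.
values <- c(probeA = 4.0, probeB = 6.0, probeC = 3.0)
gene   <- c("ENSG_X", "ENSG_X", "ENSG_Y")  # made-up gene IDs

# "max" keeps the highest value per gene; "mean" averages them.
collapse_max  <- tapply(values, gene, max)
collapse_mean <- tapply(values, gene, mean)

print(collapse_max)
print(collapse_mean)
```

Genes with a single value, such as `ENSG_Y` here, come out the same under either option.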

Details

The preprocessing runs in order: gene IDs are first converted to gene symbols, replicate values are then averaged for each cell type or tissue, and finally the expressed genes are selected using the expression cut-off exp.cutoff.th.

Value

A list containing the specifically expressed genes for each cell type or tissue, with gene IDs given as gene symbols.
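
To make the shape of such a result concrete, here is a minimal base R sketch that applies a cut-off to already averaged values and collects the passing genes per cell type. It assumes that a gene counts as expressed when its value is at or above the threshold; this page does not say whether the package uses `>=` or `>`. The names `avg` and `expressed` are illustrative.

```r
# Hypothetical replicate-averaged expression values: genes x cell types.
avg <- matrix(
  c(2, 6,
    7, 2,
    5, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PAX6", "SOX2", "PROX1"), c("cell1", "cell2"))
)
exp.cutoff.th <- 5

# Assumption: "expressed" means value >= cut-off.
expressed <- lapply(colnames(avg), function(ct) {
  rownames(avg)[avg[, ct] >= exp.cutoff.th]
})
names(expressed) <- colnames(avg)

print(expressed)
```

Each list element is named after a cell type and holds a character vector of gene symbols, which matches the structure described above.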

Examples

# Toy query data: 10 genes x 10 samples with random integer values
query.data <- matrix(sample(1:10, 100, replace = TRUE), 10, 10)
rownames(query.data) <- c("CRYAA", "CRYAB", "CRYBB3", "PAX6", "SOX2", "PROX1", "SIX3", "CRIM1", "CRYBB2", "BMP7")
# Replicates of the same cell type share a column name
colnames(query.data) <- c("cell1", "cell1", "cell1", "cell2", "cell2", "cell2", "cell3", "cell3", "cell3", "cell3")
preprocess_querydata_new(cell.tissue.data = query.data, exp.cutoff.th = 5)

humayun2017/SPAGI2 documentation built on Aug. 5, 2020, 12:06 a.m.