preprocessDataset: Preprocess Dataset


Description

Preprocesses the given dataset. Preprocessing consists of three major steps:
1) If needed, probes corresponding to the same gene are collapsed, and only the most expressed probe is kept for further analysis. This is a common technique in microarray data analysis.
2) If needed, only highly expressed genes are kept for further analysis, which reduces noise.
3) All genes are clustered with k-means, using cosine similarity as the measure of distance between genes.
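A minimal sketch of the clustering step in base R, assuming cosine similarity is approximated by running `stats::kmeans` on L2-normalized gene profiles (for unit vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity). The helper `cosineKmeans`, its `seed` parameter, and the toy matrix are hypothetical and not part of ClusDec; the package may implement the clustering differently.

```r
# Hypothetical helper (not part of ClusDec): approximate cosine k-means by
# scaling each gene profile to unit length and running ordinary k-means.
cosineKmeans <- function(expr, k, seed = 1) {
  norms <- sqrt(rowSums(expr^2))
  # Divide each row by its norm; all-zero rows are left unchanged
  unitRows <- expr / ifelse(norms == 0, 1, norms)
  set.seed(seed)
  fit <- stats::kmeans(unitRows, centers = k, nstart = 10)
  # Mirror the documented return shape: cluster id in the first column
  cbind(cluster = fit$cluster, expr)
}

# Toy data: two groups of genes that differ in direction, not magnitude
expr <- rbind(
  c(10, 1, 0), c(100, 9, 1), c(5, 0.5, 0.2),  # high in s1
  c(0, 1, 10), c(1, 10, 100), c(0.2, 0.4, 5)  # high in s3
)
rownames(expr) <- paste0("gene", 1:6)
colnames(expr) <- c("s1", "s2", "s3")

res <- cosineKmeans(expr, k = 2)
res[, "cluster"]
```

Because rows are normalized first, `gene2` lands in the same cluster as `gene1` even though its values are about ten times larger.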

Usage

preprocessDataset(dataset, annotation = NULL, geneSymbol = "Gene Symbol",
  samples = NULL, topGenes = 10000, topVar = FALSE)

Arguments

dataset

Expression data: a matrix, a data.frame, a path to a file, or a GSE accession.

annotation

Probe annotation: a data.frame, a matrix, or a named vector.

geneSymbol

Column of annotation used to collapse probes into genes. Default value is 'Gene Symbol'.
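The probe-collapsing step can be sketched as follows, assuming `annotation` is a data.frame whose row names are probe IDs and that "most expressed" means the highest mean expression across samples. `collapseProbes` and the toy data are hypothetical illustrations, not the package's code.

```r
# Hypothetical helper (not the ClusDec implementation): for each gene,
# keep only the probe with the highest mean expression.
collapseProbes <- function(expr, annotation, geneSymbol = "Gene Symbol") {
  genes <- as.character(annotation[rownames(expr), geneSymbol])
  # Drop probes without a gene symbol
  keep <- !is.na(genes) & genes != ""
  expr <- expr[keep, , drop = FALSE]
  genes <- genes[keep]
  # Sort probes by decreasing mean expression, then take the first per gene
  ord <- order(rowMeans(expr), decreasing = TRUE)
  first <- ord[!duplicated(genes[ord])]
  collapsed <- expr[first, , drop = FALSE]
  rownames(collapsed) <- genes[first]
  collapsed
}

expr <- matrix(c(1, 2, 8, 9, 5, 5, 3, 4), nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))
annotation <- data.frame(`Gene Symbol` = c("ALB", "ALB", "SFTPC", ""),
                         row.names = c("p1", "p2", "p3", "p4"),
                         check.names = FALSE)

collapsed <- collapseProbes(expr, annotation)
rownames(collapsed)  # "ALB" "SFTPC"
```

Probe `p2` wins over `p1` for ALB because its mean expression is higher, and `p4` is dropped because it has no gene symbol.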

samples

Character vector of sample names. Columns of dataset that are not in samples are excluded from the analysis. Default value is NULL, which keeps every sample in the dataset.
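A sketch of the `samples` behaviour described above, assuming samples are matched against the column names of the expression matrix. `selectSamples` is a hypothetical helper, not part of ClusDec.

```r
# Hypothetical helper: keep only the listed samples (columns)
selectSamples <- function(expr, samples = NULL) {
  if (is.null(samples)) return(expr)  # NULL keeps every sample
  expr[, colnames(expr) %in% samples, drop = FALSE]
}

expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("g1", "g2"), c("liver", "brain", "lung")))
sub <- selectSamples(expr, c("lung", "liver"))
colnames(sub)  # "liver" "lung" (original column order is kept)
```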

topGenes

integer. Number of genes to include in the analysis; the intent is to keep only expressed genes. Default value is 10000.
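A sketch of the `topGenes` filter, assuming genes are ranked by mean expression across samples (the exact ranking criterion ClusDec uses is not stated here). `keepTopGenes` is a hypothetical helper.

```r
# Hypothetical helper: keep the topGenes rows with the highest mean expression
keepTopGenes <- function(expr, topGenes = 10000) {
  n <- min(topGenes, nrow(expr))  # never ask for more genes than exist
  ord <- order(rowMeans(expr), decreasing = TRUE)
  expr[ord[seq_len(n)], , drop = FALSE]
}

expr <- matrix(c(1, 1, 10, 10, 5, 5, 3, 3), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), c("s1", "s2")))
top <- keepTopGenes(expr, topGenes = 2)
rownames(top)  # "g2" "g3"
```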

Value

Clustered dataset as a matrix; the first column gives the cluster of each row.
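A sketch of working with a matrix in the documented return shape: the first column holds cluster ids, the remaining columns hold expression values. The matrix `prep` below is toy data built by hand, not real `preprocessDataset` output.

```r
# Toy matrix in the documented shape: cluster id first, then samples
prep <- cbind(cluster = c(1, 2, 1, 2),
              matrix(c(5, 1, 6, 0, 0, 7, 1, 8), nrow = 4,
                     dimnames = list(NULL, c("liver", "lung"))))
rownames(prep) <- paste0("gene", 1:4)

clusters <- prep[, 1]                  # cluster assignment per gene
exprOnly <- prep[, -1, drop = FALSE]   # expression values without the id column

# Gene names grouped by cluster
byCluster <- split(rownames(prep), clusters)
byCluster[["1"]]  # "gene1" "gene3"

# Mean expression profile of each cluster (rows = clusters, columns = samples)
centroids <- apply(exprOnly, 2, function(col) tapply(col, clusters, mean))
centroids
```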

Examples

data('datasetLiverBrainLung')
prep <- preprocessDataset(datasetLiverBrainLung)
prep <- preprocessDataset(datasetLiverBrainLung, k=5) # 5 clusters
prep <- preprocessDataset(datasetLiverBrainLung, topGenes=6000) # leave only top 6k genes

ctlab/ClusDec documentation built on May 14, 2019, 12:29 p.m.