readSE: Reading gene expression data from file

Description

The function reads in plain expression data from file with minimum annotation requirements for the colData and rowData slots.

Usage

readSE(
  assay.file,
  cdat.file,
  rdat.file,
  data.type = c(NA, "ma", "rseq"),
  NA.method = c("mean", "rm", "keep")
)

Arguments

assay.file

Expression matrix. A tab separated text file containing expression values. Columns = samples/subjects; rows = features/probes/genes; NO headers, row or column names.

cdat.file

Column (phenotype) data. A tab separated text file containing annotation information for the samples in either *two or three* columns. NO headers, row or column names. The number of rows/samples in this file should match the number of columns/samples of the expression matrix. The 1st column is reserved for the sample IDs; the 2nd column is reserved for a *BINARY* group assignment. Use '0' for unaffected samples (controls) and '1' for affected samples (cases). For paired samples or sample blocks, a third column is expected that defines the blocks.

rdat.file

Row (feature) data. A tab separated text file containing annotation information for the features. In case of probe level data: exactly *TWO* columns; 1st col = probe/feature IDs; 2nd col = corresponding gene ID for each feature ID in 1st col. In case of gene level data: just *one* column containing the gene IDs, one per line. It is recommended to use *ENTREZ* gene IDs (to benefit from downstream visualization and exploration functionality of the EnrichmentBrowser). NO headers, row or column names. The number of rows (features/probes/genes) in this file should match the number of rows/features of the expression matrix. Alternatively, this can also be the ID of a recognized platform such as 'hgu95av2' (Affymetrix Human Genome U95 chip) or 'ecoli2' (Affymetrix E. coli Genome 2.0 Array).
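The file layout described for assay.file, cdat.file and rdat.file can be sketched with base R as follows. This is a minimal illustration, not part of the package: the toy values, the sample names, the gene IDs and the helper write_plain are all made up for the example.

```r
# Toy data: 4 genes (rows) x 6 samples (columns); all values are made up.
set.seed(1)
expr <- matrix(round(rnorm(24, mean = 8, sd = 1), 2), nrow = 4, ncol = 6)

# Column data: sample IDs and a binary group (0 = control, 1 = case).
cdat <- data.frame(
  id    = paste0("sample", 1:6),
  group = c(0, 0, 0, 1, 1, 1)
)

# Row data for gene-level input: one (placeholder) gene ID per line.
rdat <- data.frame(gene = c("1000", "1001", "1002", "1003"))

# Hypothetical helper: tab-separated output without headers or row names.
write_plain <- function(x, file) {
  write.table(x, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

assay.file <- tempfile(fileext = ".tab")
cdat.file  <- tempfile(fileext = ".tab")
rdat.file  <- tempfile(fileext = ".tab")
write_plain(expr, assay.file)
write_plain(cdat, cdat.file)
write_plain(rdat, rdat.file)

# Check the dimension requirements stated in the argument descriptions.
assay.back <- read.delim(assay.file, header = FALSE)
n_samples  <- ncol(assay.back)
n_features <- nrow(assay.back)
stopifnot(n_samples  == nrow(read.delim(cdat.file, header = FALSE)))
stopifnot(n_features == nrow(read.delim(rdat.file, header = FALSE)))

# With EnrichmentBrowser installed, the files could then be read via:
# se <- EnrichmentBrowser::readSE(assay.file, cdat.file, rdat.file)
```

For probe level data, rdat would instead hold two columns (probe ID, gene ID); for paired samples, cdat would carry a third column defining the blocks.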

data.type

Expression data type. Use 'ma' for microarray and 'rseq' for RNA-seq data. If NA, data.type is automatically guessed. If the expression values in the expression matrix are decimal numbers, they are assumed to be microarray intensities. Whole numbers are assumed to be RNA-seq read counts. Defaults to NA.
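The guessing rule described for data.type = NA might look roughly like the sketch below. The function guess_data_type is a hypothetical helper written for illustration; it is an assumption about the heuristic and not the package's actual implementation.

```r
# Hypothetical helper, not part of EnrichmentBrowser.
# Whole numbers everywhere -> treat as RNA-seq read counts ("rseq");
# any decimal value        -> treat as microarray intensities ("ma").
guess_data_type <- function(expr) {
  vals <- expr[!is.na(expr)]          # ignore missing values
  if (all(vals == round(vals))) "rseq" else "ma"
}

counts <- matrix(c(0, 12, 250, 3, 48, 7), nrow = 2)
intens <- matrix(c(7.31, 8.02, 6.95, 9.4, 8.88, 7.1), nrow = 2)

guess_data_type(counts)  # "rseq"
guess_data_type(intens)  # "ma"
```

Setting data.type explicitly avoids a wrong guess, for example when normalized RNA-seq values are no longer whole numbers.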

NA.method

Determines how to deal with NA's (missing values). This can be one of:

  • mean: replace NA by the mean over all samples for one feature at a time.

  • rm: features (rows) that contain NA's are removed.

  • keep: do nothing. Missing values are kept (which, however, can then cause several issues in the downstream analysis).

Defaults to 'mean'.
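The effect of the three NA.method choices can be illustrated with a small base R sketch. The helper handle_na is hypothetical and not the package's internal code; it assumes that 'rm' drops every feature (row) containing at least one missing value.

```r
# Hypothetical helper, not the package's internal code.
handle_na <- function(expr, method = c("mean", "rm", "keep")) {
  method <- match.arg(method)
  has_na <- rowSums(is.na(expr)) > 0   # features with missing values

  if (method == "rm") {
    # Assumption: drop every feature that has at least one NA.
    return(expr[!has_na, , drop = FALSE])
  }
  if (method == "mean") {
    # Replace each NA by the mean of that feature over all samples.
    for (i in which(has_na)) {
      row <- expr[i, ]
      row[is.na(row)] <- mean(row, na.rm = TRUE)
      expr[i, ] <- row
    }
  }
  expr                                  # "keep": returned unchanged
}

expr <- rbind(c(1, NA, 3), c(4, 5, 6))
handle_na(expr, "mean")  # the NA in row 1 becomes 2
handle_na(expr, "rm")    # only the second row remains
handle_na(expr, "keep")  # unchanged, NA still present
```

Note that in this sketch a feature with only missing values would be imputed as NaN under 'mean', since there is nothing to average.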

Value

An object of class SummarizedExperiment.

Author(s)

Ludwig Geistlinger

See Also

SummarizedExperiment

Examples


    # reading the expression data from file
    assay.file <- system.file("extdata/exprs.tab", package="EnrichmentBrowser")
    cdat.file <- system.file("extdata/colData.tab", package="EnrichmentBrowser")
    rdat.file <- system.file("extdata/rowData.tab", package="EnrichmentBrowser")
    se <- readSE(assay.file, cdat.file, rdat.file)


lgeistlinger/EnrichmentBrowser documentation built on May 9, 2024, 7:22 p.m.