Create a joint gene expression table of all samples

Description

The function reads in the gene expression data of every sample listed in the sample description sample_desc and returns a joint expression table of all samples.

Usage

create_gene_exp(sample_desc, read_fun = NULL, progress = TRUE,
  progress_width = 48, ...)

Arguments

sample_desc

data.table object created by create_sample_desc.

read_fun

Custom reader function; see the Custom reader function section for details.

progress

Whether to display a progress bar. By default TRUE.

progress_width

The text width of the progress bar. Defaults to 48 characters.

...

Arguments passed to the custom reader function specified in read_fun.

Details

By default, the function assumes the data are in TCGA level 3 file format. Most real-world data, however, do not follow that format. In that case, tell the function how to parse the data by implementing a custom reader function that accepts the file path as its first argument; see the Custom reader function section for the full specification. The function naively concatenates the returned expression values as columns of a new data.table, assuming every sample lists its genes in the same order.
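The naive concatenation described above can be pictured with a minimal sketch. This is not the package's actual implementation; the sample names, gene names, and values below are made up for illustration, and the two small tables stand in for what a reader function would return for each sample.

```r
library(data.table)

# Hypothetical stand-ins for the per-sample tables a reader function returns
sample_a <- data.table(GENE = c("TP53", "EGFR", "MYC"),
                       Expression = c(1.2, -0.4, 0.8))
sample_b <- data.table(GENE = c("TP53", "EGFR", "MYC"),
                       Expression = c(0.9, 0.1, -1.5))
per_sample <- list(SampleA = sample_a, SampleB = sample_b)

# Gene names are taken from the first sample only; every sample then
# contributes just its Expression column, matched purely by row position.
joint <- data.table(GENE = per_sample[[1]]$GENE)
for (sample_name in names(per_sample)) {
    set(joint, j = sample_name, value = per_sample[[sample_name]]$Expression)
}
joint
```

Because values are matched by position rather than by gene name, a sample whose rows are in a different order would be combined without any error and would end up attached to the wrong genes.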

Value

A data.table of gene expression for all samples, in which each row is a gene and each column is a sample, named by sample name. The first column, GENE, contains the corresponding gene names.

Custom reader function

A custom reader function is supplied as read_fun = your_reader_fun. It takes the file path as its first argument and returns a data.table whose first two columns are GENE (character) and Expression (double).

In the joint gene expression table, the first column GENE stores the gene names, which are taken from the first sample evaluated.

Any remaining arguments of create_gene_exp(...) are passed on to this reader function.

Note: string-like columns must NOT be of type factor. Remember to set stringsAsFactors = FALSE when the reader relies on read.table.
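As an illustration of the contract above, here is a sketch of a reader for a tab-delimited file with a header row, built on data.table::fread. The function name my_tsv_reader and its extra arguments gene_col and value_col are hypothetical, as is the file layout; only the required output shape (GENE as character, Expression as double) comes from this page.

```r
library(data.table)

# Hypothetical reader: picks two columns of a tab-delimited file by position.
# `gene_col` and `value_col` are illustrative extra arguments, not part of iGC.
my_tsv_reader <- function(ge_filepath, gene_col = 1, value_col = 2) {
    dt <- fread(ge_filepath, sep = "\t", header = TRUE)
    # Keep only the two requested columns, in GENE/Expression order
    dt <- dt[, c(gene_col, value_col), with = FALSE]
    setnames(dt, c("GENE", "Expression"))
    # Enforce the column types expected for the joint table
    dt[, GENE := as.character(GENE)]
    dt[, Expression := as.numeric(Expression)]
    dt[]
}

# Try the reader on a small temporary file with made-up values
tmp <- tempfile(fileext = ".tsv")
writeLines(c("gene\tlength\tvalue",
             "TP53\t100\t1.5",
             "EGFR\t200\t-0.25"), tmp)
res <- my_tsv_reader(tmp, gene_col = 1, value_col = 3)
res
```

Since extra arguments of create_gene_exp are forwarded to the reader, a call such as create_gene_exp(sample_desc, read_fun = my_tsv_reader, value_col = 3) would be expected to select the third column for every sample.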

Note

The function assumes that the row order of the gene expression data is the same for all samples.
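Because this assumption is not verified during concatenation, it may be worth checking it beforehand. The sketch below uses a hypothetical helper, same_gene_order, that compares the GENE column of each per-sample table against the first one; the tables and gene names are made up for illustration.

```r
library(data.table)

# Hypothetical helper: TRUE when every table lists its genes in exactly
# the same order as the first table.
same_gene_order <- function(tables) {
    reference <- tables[[1]]$GENE
    all(vapply(tables,
               function(dt) identical(dt$GENE, reference),
               logical(1)))
}

aligned <- list(
    data.table(GENE = c("TP53", "EGFR"), Expression = c(1.0, 2.0)),
    data.table(GENE = c("TP53", "EGFR"), Expression = c(0.5, 1.5))
)
shuffled <- list(
    aligned[[1]],
    data.table(GENE = c("EGFR", "TP53"), Expression = c(1.5, 0.5))
)

same_gene_order(aligned)
same_gene_order(shuffled)
```

If the check fails, one option is to sort each table by gene name inside the custom reader, for example with data.table::setorder(dt, GENE), so that all samples share one order.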

See Also

read.table and fread for custom reader function implementation; create_sample_desc for creating sample description.

Examples

## Use first three samples of the builtin dataset

sample_root <- system.file("extdata", package = "iGC")
sample_desc_pth <- file.path(sample_root, "sample_desc.csv")
sample_desc <- create_sample_desc(
    sample_desc_pth, sample_root = sample_root
)[1:3]

## Define custom reader function for TCGA level 3 data
my_gene_exp_reader <- function(ge_filepath) {
    ## Skip the first two lines; "null" marks missing values
    gene_exp <- read.table(
        ge_filepath,
        header = FALSE, skip = 2,
        na.strings = "null",
        colClasses = c("character", "double")
    )
    dt <- data.table::as.data.table(gene_exp)
    ## setnames() renames in place and returns dt invisibly
    data.table::setnames(dt, c("GENE", "Expression"))
}
gene_exp <- create_gene_exp(
    sample_desc,
    read_fun = my_gene_exp_reader,
    progress_width = 60
)
gene_exp[1:5]