collapse.gct.file: Collapse a GCT file


View source: R/collapse.gct.R

Description

It's common for microarrays to have multiple probes per gene, which tend to represent different isoforms. Most geneset testing is done at the gene symbol level and ignores isoforms, so you need to choose one probe for each gene. How? Two common approaches are to take the most abundant probe or the most variable probe, considered across the cohort. I quite like computing a t-statistic for each probe and selecting the best-performing probe for each gene, i.e. the one with the largest t-statistic in either direction. Why? On the Affy 133+2 array, there can be lots of poor probes for each gene. If 5 probes for a gene have these t-statistics: 1.2, 0.9, 0.1, -0.1, -10, then IMO the probe that scored -10 is the best one, since it has by far the strongest t-statistic. This is what method="maxabs" combined with a rnk.file is intended for.
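The "largest t-statistic in either direction" idea can be sketched in a few lines of base R. This is only an illustration of the selection rule, not the package's implementation: the probe IDs, gene symbols and the `pick_maxabs` helper are hypothetical, and the GENE1 values are the five t-statistics from the example above.

```r
# Hypothetical probe-level t-statistics; probe IDs and gene symbols are made up.
tstats <- data.frame(
  probe  = c("p1", "p2", "p3", "p4", "p5", "p6", "p7"),
  symbol = c("GENE1", "GENE1", "GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  t      = c(1.2, 0.9, 0.1, -0.1, -10, 2.5, -0.3),
  stringsAsFactors = FALSE
)

# For each gene symbol, keep the single probe whose t-statistic has the
# largest absolute value (ties go to the first such probe).
pick_maxabs <- function(x) {
  by_gene <- split(x, x$symbol)
  best <- lapply(by_gene, function(d) d[which.max(abs(d$t)), , drop = FALSE])
  do.call(rbind, best)
}

selected <- pick_maxabs(tstats)
print(selected)
```

For GENE1 this keeps the probe with t = -10, even though most of its other probes have small positive scores.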

Usage

collapse.gct.file(gct.file, chip.file, gct.outfile, rnk.file = NULL,
  method = c("var", "mean", "median"), reverse = FALSE, filter = FALSE)

Arguments

gct.file

the path to a gct file

chip.file

the path to a chip file

gct.outfile

the path to the gct output file

rnk.file

[optional] path to a rnk file (e.g. a t-statistic for each probe, from which the best probe per gene is selected). NB: currently UNUSED.

method

“mean” or “median”: select the probe with the highest mean/median level; “var”: select the probe with the highest variance across samples; “maxabs”: select the probe with the largest absolute score in the rnk file (see Description).

reverse

[default=FALSE] reverse the ordering selected by the method argument, so instead of the most variable probe, it would select the least variable.

filter

Filter out (i.e. exclude) probes that don't have a gene symbol, i.e. those whose gene symbol is NA, “”, “—”, or “NA”.
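As a rough illustration of how the method, reverse and filter arguments interact, here is a minimal in-memory sketch. It assumes the expression data is already a numeric matrix (probes in rows) and the chip file has been reduced to a named vector of gene symbols; `collapse_sketch`, `has_symbol` and the toy data are hypothetical and are not taken from the package source, which works on files.

```r
# Hypothetical data: rows are probes, columns are samples.
expr <- matrix(
  c(6, 6, 6, 6,
    1, 9, 1, 9,
    7, 8, 7, 8,
    2, 3, 2, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3", "s4"))
)
# Hypothetical probe-to-symbol map; p4 has no gene symbol.
symbols <- c(p1 = "GENE1", p2 = "GENE1", p3 = "GENE2", p4 = "")

# TRUE for probes with a usable gene symbol. "\u2014" is the dash
# placeholder listed under the filter argument.
has_symbol <- function(symbol) {
  !is.na(symbol) & !(symbol %in% c("", "\u2014", "NA"))
}

collapse_sketch <- function(expr, symbols,
                            method = c("var", "mean", "median"),
                            reverse = FALSE, filter = FALSE) {
  method <- match.arg(method)
  sym <- unname(symbols[rownames(expr)])
  if (filter) {
    keep <- has_symbol(sym)
    expr <- expr[keep, , drop = FALSE]
    sym <- sym[keep]
  }
  # Score every probe across samples according to the chosen method.
  score <- switch(method,
    var    = apply(expr, 1, var),
    mean   = rowMeans(expr),
    median = apply(expr, 1, median)
  )
  # Negating the score turns "highest" into "lowest".
  if (reverse) score <- -score
  # Within each gene symbol, keep the row index of the best-scoring probe.
  groups <- split(seq_len(nrow(expr)), sym)
  best <- vapply(groups, function(i) i[which.max(score[i])], integer(1))
  out <- expr[best, , drop = FALSE]
  rownames(out) <- names(best)  # row names become gene symbols
  out
}

most_variable  <- collapse_sketch(expr, symbols, "var", filter = TRUE)
least_variable <- collapse_sketch(expr, symbols, "var", reverse = TRUE, filter = TRUE)
```

In this toy data, GENE1 is represented by p2 when the most variable probe is chosen and by p1 when reverse = TRUE. How the real function treats probes without a symbol when filter = FALSE is not specified here; the sketch simply groups them under an empty symbol.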

Value

A gct file is created with 1 row per gene symbol; the ‘probe ids’ in column 1 are now actually gene symbols.

Author(s)

Mark Cowley, 2011-02-27


drmjc/metaGSEA documentation built on Aug. 8, 2020, 1:53 p.m.