sd_selection: Selecting Highly Oscillating Transcripts
In BioTIP: BioTIP: An R package for characterization of Biological Tipping-Point


View source: R/BioTIP_update_4_09282020_v3.R

sd_selection pre-selects highly oscillating transcripts from the input dataset df. The dataset must contain multiple sample groups (or 'states'). For each state, the function filters the dataset using a cutoff on the standard deviation. The default cutoff is 0.01, i.e., the transcripts whose standard deviation ranks in the top 1% are kept.
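The idea of ranking transcripts by their within-state standard deviation can be sketched in base R as follows. This is a simplified illustration, not the BioTIP implementation: the helper `top_sd_by_state`, the toy matrix `expr`, and the rounding rule for the number of rows kept are all assumptions, and the way each `method` option compares standard deviations across states is not reproduced here.

```r
# Illustrative only: NOT the BioTIP implementation.
# Rank transcripts by within-state standard deviation and keep the top fraction.
set.seed(1)
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
states <- list(stateA = c("s1", "s2", "s3"),
               stateB = c("s4", "s5", "s6"))

# Hypothetical helper: returns one data frame per state.
top_sd_by_state <- function(mat, states, fraction = 0.1) {
  lapply(states, function(ids) {
    sub <- mat[, ids, drop = FALSE]
    sds <- apply(sub, 1, sd)                       # per-transcript SD within this state
    n_keep <- max(1, floor(fraction * nrow(sub)))  # assumed rounding rule
    keep <- order(sds, decreasing = TRUE)[seq_len(n_keep)]
    as.data.frame(sub[keep, , drop = FALSE])
  })
}

selected <- top_sd_by_state(expr, states, fraction = 0.1)
sapply(selected, dim)  # rows kept and samples per state
```

Each element of `selected` holds only the columns of its own state, which mirrors the shape of the value that sd_selection is documented to return.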

sd_selection(
  df,
  samplesL,
  cutoff = 0.01,
  method = c("other", "reference", "previous", "itself", "longitudinal reference"),
  control_df = NULL,
  control_samplesL = NULL
)
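A character-vector default such as the one given for `method` is the usual base R idiom for arguments resolved with `match.arg()`, which would be consistent with the partial matching this page mentions. The sketch below assumes that idiom; `pick_method` is a hypothetical wrapper written for illustration and is not part of BioTIP.

```r
# Hypothetical wrapper, used only to illustrate match.arg(); not BioTIP code.
pick_method <- function(method = c("other", "reference", "previous",
                                   "itself", "longitudinal reference")) {
  match.arg(method)
}

pick_method()        # no argument supplied: the first choice is used
pick_method("ref")   # unique prefix of "reference"
pick_method("long")  # unique prefix of "longitudinal reference"
```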

`df`	A numeric matrix or data frame. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively.
`samplesL`	A list of vectors, whose length is the number of states. Each vector gives the sample names in a state. Note that the sample names in these vectors have to be among the column names of the R object 'df'.
`cutoff`	A positive numeric value. Default is 0.01. If < 1, the function automatically selects the top x fraction of transcripts using the chosen selection method (which is either the `reference`, `other` stages or `previous` stage); e.g., by default it selects the top 1% of the transcripts.
`method`	One of `reference`, `other`, `previous`, `itself`, or `longitudinal reference`; the default is `other`. Partial matching is enabled. Some specific requirements for each option: for `reference`, the reference has to be the first state. For `previous`, make sure `samplesL` is in the right order, from benign to malignant. For `itself`, make sure the cutoff is smaller than 1. For `longitudinal reference`, make sure `control_df` and `control_samplesL` are not NULL; control_df must have the same number of rows as df, and all transcripts in df must also be in control_df.
`control_df`	A count matrix with unique loci as row names and the sample names of control samples as column names; only used for method `longitudinal reference`.
`control_samplesL`	A list of characters with stages as names of control samples; required for method `longitudinal reference`.
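The requirements listed for the arguments can be checked before calling the function. The following is a minimal sketch in base R; `check_sd_inputs` is a hypothetical helper that is not part of BioTIP, and the check that the control sample names appear among the columns of `control_df` is an assumption based on the argument descriptions above.

```r
# Hypothetical pre-flight checks; not part of BioTIP.
check_sd_inputs <- function(df, samplesL, control_df = NULL,
                            control_samplesL = NULL) {
  stopifnot(is.list(samplesL))
  # Every sample name in samplesL must be a column name of df
  missing_samples <- setdiff(unlist(samplesL), colnames(df))
  if (length(missing_samples) > 0) {
    stop("samples not in colnames(df): ",
         paste(missing_samples, collapse = ", "))
  }
  if (!is.null(control_df)) {
    # Same number of rows, and every transcript of df present in control_df
    stopifnot(nrow(control_df) == nrow(df),
              all(rownames(df) %in% rownames(control_df)))
    if (!is.null(control_samplesL)) {
      # Assumption: control sample names are columns of control_df
      stopifnot(all(unlist(control_samplesL) %in% colnames(control_df)))
    }
  }
  invisible(TRUE)
}

counts <- matrix(1:18, 2, 9,
                 dimnames = list(c("loci1", "loci2"), as.character(1:9)))
samplesL <- split(as.character(1:9),
                  rep(c("state1", "state2", "state3"), each = 3))
check_sd_inputs(counts, samplesL)  # passes silently

bad <- samplesL
bad$state1 <- c("1", "2", "10")    # "10" is not a column of counts
res <- tryCatch(check_sd_inputs(counts, bad),
                error = function(e) conditionMessage(e))
```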

sd_selection() returns a list of data frames, whose length is the number of states. The rows in each data frame are the filtered transcripts with the highest standard deviation, selected from df based on the assigned cutoff value. Each resulting data frame is a subset of the raw input df, with the sample IDs of the corresponding state as columns.

Author(s): Zhezhen Wang <zhezhen@uchicago.edu>

See also: optimize.sd_selection

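library(BioTIP)

# Toy count matrix: 2 loci (rows) by 9 samples (columns)
counts <- matrix(sample(1:100, 18), 2, 9)
colnames(counts) <- 1:9
row.names(counts) <- c('loci1', 'loci2')
# Sample annotation: three states with three samples each
cli <- cbind(1:9, rep(c('state1', 'state2', 'state3'), each = 3))
colnames(cli) <- c('samples', 'group')
# One vector of sample names per state
samplesL <- split(cli[, 1], f = cli[, 'group'])
test_sd_selection <- sd_selection(counts, samplesL, 0.01)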