optimize.sd_selection: Optimization of sd selection

View source: R/BioTIP_update_04202022.R

optimize.sd_selection    R Documentation

Description

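optimize.sd_selection filters a multi-state dataset based on a cutoff value for the standard deviation per state, and optimizes the selection by repeating it B times on random subsets of the samples and keeping the transcripts that are selected consistently. By default, a cutoff value of 0.01 is used. Suggested if each state contains more than 10 samples.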

Usage

optimize.sd_selection(
  df,
  samplesL,
  B = 100,
  percent = 0.8,
  times = 0.8,
  cutoff = 0.01,
  method = c("other", "reference", "previous", "itself", "longitudinal reference"),
  control_df = NULL,
  control_samplesL = NULL
)

Arguments

df

A dataframe of numerics. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively.

samplesL

A list of n vectors, where n equals the number of states. Each vector gives the sample names in a state. Note that the sample names in these vectors have to be among the column names of the R object 'df'.
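As an illustration of the expected input shapes, the following base R sketch builds a small toy df and samplesL and checks that every sample name listed in samplesL is a column of df. The gene IDs, sample names, state names, and values are made up for the example.

```r
# Toy expression table: 5 transcripts (rows) x 6 samples (columns).
# All names and values here are hypothetical.
set.seed(1)
df <- as.data.frame(matrix(rnorm(30), nrow = 5,
                           dimnames = list(paste0("gene", 1:5),
                                           paste0("s", 1:6))))

# One character vector of sample names per state.
samplesL <- list(stateA = c("s1", "s2", "s3"),
                 stateB = c("s4", "s5", "s6"))

# Every listed sample must be a column of df.
all_present <- all(unlist(samplesL) %in% colnames(df))
stopifnot(all_present)
```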

B

An integer indicating the number of times to run this optimization, default 100.

percent

A numeric value indicating the proportion of samples that will be selected in each round of simulation, default 0.8.

times

A numeric value indicating the proportion of the B rounds in which a transcript needs to be selected in order to be considered a stable signature, default 0.8.
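The roles of B, percent, and times can be sketched in base R for a single state. This is a minimal illustration, not the package's implementation: it assumes transcripts are ranked by their own standard deviation, and the helper name stable_by_sd, its top_frac parameter, and the toy data are hypothetical.

```r
# Sketch of subsampling-based stability selection for ONE state.
# Assumption: transcripts are ranked by their own standard deviation;
# the package's comparison methods (other, reference, ...) are not reproduced.
stable_by_sd <- function(x, B = 100, percent = 0.8, times = 0.8, top_frac = 0.1) {
  x <- as.matrix(x)
  n_pick <- max(2, floor(ncol(x) * percent))     # samples drawn per round
  n_top  <- max(1, ceiling(nrow(x) * top_frac))  # transcripts kept per round
  counts <- setNames(integer(nrow(x)), rownames(x))
  for (b in seq_len(B)) {
    cols <- sample(ncol(x), n_pick)              # random subset of samples
    sds  <- apply(x[, cols, drop = FALSE], 1, sd)
    top  <- names(sort(sds, decreasing = TRUE))[seq_len(n_top)]
    counts[top] <- counts[top] + 1L
  }
  names(counts)[counts >= B * times]             # kept in at least times * B rounds
}

set.seed(42)
# Hypothetical data: 50 transcripts x 20 samples; the first 3 rows are made highly variable.
x <- matrix(rnorm(50 * 20), nrow = 50,
            dimnames = list(paste0("gene", 1:50), NULL))
x[1:3, ] <- x[1:3, ] * 10
stable <- stable_by_sd(x)
```

With these toy data the three inflated transcripts are expected to appear in stable, because they rank near the top in every subsample; transcripts that reach the top only by chance rarely pass the times threshold.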

cutoff

A positive numeric value. Default is 0.01. If < 1, the function automatically selects the top x percent of transcripts using the chosen selection method (for example reference, other, or previous stage); e.g. by default it selects the top 1 percent of the transcripts.
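To make the arithmetic of a fractional cutoff concrete, here is a small base R sketch. It assumes the cutoff is read as a fraction of the rows and that transcripts are ranked by their own standard deviation; the helper top_fraction_by_sd, the rounding rule, and the toy data are assumptions for illustration only.

```r
# Assumption: a cutoff below 1 is read as a fraction of transcripts to keep,
# ranked here by per-transcript standard deviation within one state.
top_fraction_by_sd <- function(x, cutoff = 0.01) {
  stopifnot(cutoff > 0, cutoff < 1)
  sds <- apply(as.matrix(x), 1, sd)
  n_keep <- max(1, round(length(sds) * cutoff))  # rounding rule is an assumption
  x[order(sds, decreasing = TRUE)[seq_len(n_keep)], , drop = FALSE]
}

set.seed(7)
# Hypothetical data: 2000 transcripts x 12 samples.
x <- matrix(rnorm(2000 * 12), nrow = 2000,
            dimnames = list(paste0("gene", 1:2000), paste0("s", 1:12)))
kept <- top_fraction_by_sd(x, cutoff = 0.01)
nrow(kept)  # 20 rows, i.e. 1 percent of 2000
```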

method

Selection of methods from reference, other, previous, itself, or longitudinal reference; default uses other. Partial match enabled. Some specific requirements for each option:

  • reference, the reference has to be the first.

  • previous, make sure samplesL is in the right order from benign to malignant.

  • itself, make sure the cutoff is smaller than 1.

  • longitudinal reference, make sure control_df and control_samplesL are not NULL. The number of rows of control_df is the same as that of df, and all transcripts in df are also in control_df.
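Partial matching of the method argument can be illustrated with base R's match.arg(). Whether the function uses match.arg() internally is an assumption; the wrapper pick_method below is hypothetical and only mirrors the choices shown in Usage.

```r
# Assumption: partial matching behaves like base R's match.arg().
pick_method <- function(method = c("other", "reference", "previous", "itself",
                                   "longitudinal reference")) {
  match.arg(method)
}

pick_method()        # "other" (the first choice is the default)
pick_method("ref")   # "reference"
pick_method("long")  # "longitudinal reference"
```

An abbreviation has to identify exactly one choice; a string that matches none of the choices makes match.arg() stop with an error.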

control_df

A count matrix with unique loci as row names and the names of the control samples as column names; only used for method 'longitudinal reference'.

control_samplesL

A list of character vectors giving the names of the control samples, with stages as the list names; required for method 'longitudinal reference'.
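The requirements stated for method 'longitudinal reference' can be checked before calling the function. The sketch below is a hypothetical pre-flight check in base R, not part of the package; the helper check_control and the toy inputs are made up for illustration.

```r
# Hypothetical toy inputs; names and values are made up.
genes <- paste0("gene", 1:4)
df <- data.frame(t1 = 1:4, t2 = 5:8, row.names = genes)
control_df <- data.frame(c1 = 4:1, c2 = 8:5, row.names = rev(genes))
control_samplesL <- list(stage1 = "c1", stage2 = "c2")

# Checks taken from the argument descriptions: both control inputs are given,
# control_df has as many rows as df, every transcript in df is in control_df,
# and every control sample name is a column of control_df.
check_control <- function(df, control_df, control_samplesL) {
  stopifnot(
    !is.null(control_df), !is.null(control_samplesL),
    nrow(control_df) == nrow(df),
    all(rownames(df) %in% rownames(control_df)),
    all(unlist(control_samplesL) %in% colnames(control_df))
  )
  invisible(TRUE)
}

ok <- check_control(df, control_df, control_samplesL)
```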

Value

A list of dataframes of filtered transcripts: the transcripts with the highest standard deviation, selected from df based on the assigned cutoff value. Each resulting dataframe is a subset of the raw input df.

Author(s)

Zhezhen Wang zhezhen@uchicago.edu

See Also

sd_selection


xyang2uchicago/NPS documentation built on Nov. 7, 2023, 1 a.m.