View source: R/BioTIP_update_4_09282020_v3.R
The optimize.sd_selection function filters a multi-state dataset based on a cutoff value for the standard deviation per state and optimizes the selection over repeated rounds of simulation.
By default, a cutoff value of 0.01 is used. This function is suggested if each state contains more than 10 samples.
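To make the per-state filtering idea concrete, here is a minimal base-R sketch. It is not BioTIP's implementation: the helper name `top_sd_by_state` is hypothetical, and it assumes that a cutoff below 1 means "keep the top fraction of transcripts ranked by standard deviation within each state".

```r
# Hypothetical helper (not part of BioTIP): for each state, keep the rows whose
# standard deviation across that state's samples ranks in the top fraction.
top_sd_by_state <- function(df, samplesL, cutoff = 0.01) {
  lapply(samplesL, function(samples) {
    sub <- df[, samples, drop = FALSE]
    sds <- apply(sub, 1, sd)                       # one SD per transcript
    n_keep <- max(1, round(cutoff * nrow(sub)))    # ASSUMPTION: cutoff is a fraction
    keep <- order(sds, decreasing = TRUE)[seq_len(n_keep)]
    sub[keep, , drop = FALSE]
  })
}

# Simulated data: 200 transcripts, 30 samples, three states of 10 samples each.
set.seed(1)
counts <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)
rownames(counts) <- paste0("loci", 1:200)
colnames(counts) <- paste0("s", 1:30)
samplesL <- split(colnames(counts),
                  rep(c("state1", "state2", "state3"), each = 10))

filtered <- top_sd_by_state(counts, samplesL, cutoff = 0.05)
sapply(filtered, nrow)   # number of transcripts kept per state
```

The result is a list with one filtered matrix per state, which mirrors the shape of the value described for optimize.sd_selection.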
optimize.sd_selection(
  df,
  samplesL,
  B = 100,
  percent = 0.8,
  times = 0.8,
  cutoff = 0.01,
  method = c("other", "reference", "previous", "itself", "longitudinal reference"),
  control_df = NULL,
  control_samplesL = NULL
)
df | A data frame of numerics. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively.
samplesL | A list of n vectors, where n equals the number of states. Each vector gives the sample names in a state. Note that the sample names in these vectors have to be among the column names of the R object 'df'.
B | An integer indicating the number of times to run this optimization, default 100.
percent | A numeric value indicating the percentage of samples that will be selected in each round of simulation.
times | A numeric value indicating the percentage of
cutoff | A positive numeric value. Default is 0.01. If < 1, the function automatically selects the top x percentage of transcripts using a selecting method (which is either the
method | The selection method, one of the choices shown in the usage: "other", "reference", "previous", "itself", "longitudinal reference".
control_df | A count matrix with unique loci as row names and sample names of control samples as column names, only used for method
control_samplesL | A list of characters with stages as names of control samples, required for method 'longitudinal reference'.
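The roles of B, percent and cutoff can be illustrated with a small resampling sketch in base R. This is not BioTIP's code: the function name `stable_sd_selection` is hypothetical, and because the description of `times` is incomplete here, the sketch assumes that a transcript is kept when it is selected in at least `times * B` rounds.

```r
# Hypothetical sketch (not BioTIP's code): repeat a top-SD selection on random
# subsets of each state's samples and keep transcripts that are chosen often.
stable_sd_selection <- function(df, samplesL, B = 100, percent = 0.8,
                                times = 0.8, cutoff = 0.01) {
  # Every sample name must be a column of df, as required for samplesL.
  stopifnot(all(unlist(samplesL) %in% colnames(df)))
  n_keep <- max(1, round(cutoff * nrow(df)))
  lapply(samplesL, function(samples) {
    hits <- integer(nrow(df))                  # selection count per transcript
    for (b in seq_len(B)) {
      # Draw a random subset of this state's samples for the current round.
      picked <- sample(samples, size = max(2, floor(percent * length(samples))))
      sds <- apply(df[, picked, drop = FALSE], 1, sd)
      top <- order(sds, decreasing = TRUE)[seq_len(n_keep)]
      hits[top] <- hits[top] + 1L
    }
    # ASSUMPTION: keep transcripts selected in at least times * B rounds.
    df[hits >= times * B, samples, drop = FALSE]
  })
}

set.seed(42)
counts <- matrix(rnorm(100 * 30), nrow = 100,
                 dimnames = list(paste0("loci", 1:100), paste0("s", 1:30)))
counts[1:2, ] <- counts[1:2, ] * 10   # make two transcripts clearly more variable
samplesL <- split(colnames(counts),
                  rep(c("state1", "state2", "state3"), each = 10))

res <- stable_sd_selection(counts, samplesL, B = 20, cutoff = 0.02)
lapply(res, rownames)   # transcripts retained in each state
```

Counting how often a transcript survives across rounds is one simple way to make the selection less sensitive to individual samples. The method-specific behaviour (reference, previous, longitudinal reference) is deliberately left out of this sketch.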
A list of data frames of filtered transcripts. The transcripts with the highest standard deviation are selected from df based on the assigned cutoff value. Each resulting data frame represents a subset of the raw input df.
Zhezhen Wang zhezhen@uchicago.edu
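library(BioTIP)
# Toy count matrix: 2 loci x 30 samples (the 30 sampled values are recycled to fill it)
counts = matrix(sample(1:100, 30), 2, 30)
colnames(counts) = 1:30
row.names(counts) = paste0('loci', 1:2)
# Sample annotation: 10 samples in each of three states
cli = cbind(1:30, rep(c('state1', 'state2', 'state3'), each = 10))
colnames(cli) = c('samples', 'group')
# One vector of sample names per state
samplesL <- split(cli[, 1], f = cli[, 'group'])
test_sd_selection <- optimize.sd_selection(counts, samplesL, B = 3, cutoff = 0.01)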