View source: R/edec_aux_functions.R
estimate_stability | R Documentation |
This function runs EDec Stage 1 for a series of random subsets of methylation profiles of bulk tissue samples, with varying numbers of constituent cell types. It then computes the similarity of estimated methylation profiles and proportions of constituent cell types across subsets of data for models with each number of constituent cell types. Stability of the model across subsets of the data is generally a good indicator of which number of cell types is an appropriate choice for that dataset.
estimate_stability(
  meth_bulk_samples,
  informative_loci,
  possible_num_ct,
  subset_prop = 0.8,
  num_subsets = 5,
  reps_per_subset = 1,
  max_its = 1000,
  rss_diff_stop = 1e-08
)
meth_bulk_samples |
Matrix with methylation profiles of bulk tissue samples. Rows correspond to loci/probes and columns correspond to different samples. |
informative_loci |
A vector containing names (strings) of rows corresponding to loci/probes that are informative for distinguishing cell types. |
possible_num_ct |
A vector containing the possible numbers of cell types to be used in EDec Stage 1. |
subset_prop |
Proportion of samples from the full dataset to be included in each subset of the data. |
num_subsets |
Number of random subsets of the data on which EDec Stage 1 with different numbers of cell types will be tested. |
reps_per_subset |
How many times to run EDec Stage 1 with each number of cell types in each subset of the data. |
max_its |
Maximum number of iterations after which the EDec Stage 1 algorithm will stop. |
rss_diff_stop |
Maximum difference between the residual sum of squares of the model in two consecutive iterations for the EDec Stage 1 algorithm to converge. |
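To illustrate how subset_prop and num_subsets interact, the following base R sketch draws random column subsets from a toy matrix. It is not taken from the EDec source: the toy matrix, the variable names, and the rounding rule for the number of samples kept are assumptions made for illustration.

```r
# Toy stand-in for meth_bulk_samples: 6 loci (rows) x 10 samples (columns).
set.seed(1)
meth <- matrix(runif(60), nrow = 6,
               dimnames = list(paste0("locus", 1:6), paste0("sample", 1:10)))

subset_prop <- 0.8
num_subsets <- 5

# Number of samples kept in each subset (the rounding rule is an assumption).
n_keep <- round(subset_prop * ncol(meth))

# Draw each subset independently, sampling columns without replacement.
subsets <- lapply(seq_len(num_subsets), function(i) {
  keep <- sort(sample(ncol(meth), n_keep))
  meth[, keep, drop = FALSE]
})

# Number of samples in each subset.
sapply(subsets, ncol)
```

With the defaults shown, each of the 5 subsets keeps 8 of the 10 toy samples, and every subset retains all loci.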
A specified number of subsets (num_subsets) of the samples with methylation profiles will be generated by randomly selecting a fraction (subset_prop) of the columns of meth_bulk_samples. For each of those subsets of samples, EDec Stage 1 will be run with each of the possible numbers of cell types (possible_num_ct). Since different runs of EDec Stage 1 with the same parameters can give different results, there is also the option of running EDec Stage 1 multiple times (reps_per_subset) with each number of cell types in each subset of the data, and keeping the best-fitting model. Once all runs of EDec Stage 1 are complete, the estimated methylation profiles and proportions of constituent cell types for each given number of constituent cell types will be compared across data subsets. These comparisons will be made by computing the Pearson correlation between methylation profiles or proportion estimates for the same cell type in each pair of data subsets. To determine which methylation profiles or proportion estimates correspond to the same cell type in two runs of EDec, this function will compute the correlation between every pair of estimated methylation profiles, and find the permutation of the correlation matrix that is most similar to the identity matrix.
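The matching step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: match_cell_types is a hypothetical helper, "most similar to the identity matrix" is interpreted here as maximizing the sum of the diagonal of the reordered correlation matrix, and the brute-force search over permutations is only practical for a small number of cell types.

```r
# Hypothetical helper: align the columns (cell types) of profiles_b
# to the columns of profiles_a.
match_cell_types <- function(profiles_a, profiles_b) {
  k <- ncol(profiles_a)
  # cors[i, j]: Pearson correlation between profile i of A and profile j of B.
  cors <- cor(profiles_a, profiles_b)

  # Enumerate all permutations of a vector (feasible for small k only).
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }

  best <- NULL
  best_score <- -Inf
  for (p in perms(seq_len(k))) {
    # Diagonal sum after reordering B's columns by p (assumed similarity score).
    score <- sum(cors[cbind(seq_len(k), p)])
    if (score > best_score) {
      best_score <- score
      best <- p
    }
  }
  list(order = best, matched_cors = cors[cbind(seq_len(k), best)])
}

# Toy check: B holds the same profiles as A, shuffled and slightly perturbed.
set.seed(42)
a <- matrix(runif(300), ncol = 3)              # 100 loci x 3 cell types
b <- a[, c(3, 1, 2)] + rnorm(300, sd = 0.01)
res <- match_cell_types(a, b)
res$order          # column of B matched to each column of A
res$matched_cors   # correlations of the matched pairs
```

In this toy case the recovered order undoes the shuffle, and the matched correlations are the values that would feed into the stability metrics. For larger numbers of cell types, an assignment solver would be a more scalable choice than enumerating permutations.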
A list with the following components:
most_stable_num_ct
The number of cell types giving the most stable models across the data subsets. The minimum Pearson correlation between either methylation or proportion estimates across all data subsets is used to determine the most stable model.
methylation_estimates
A list containing matrices of average methylation profiles of constituent cell types for each data subset and number of cell types.
proportion_estimates
A list containing matrices of proportions of constituent cell types in each input sample for each data subset and number of cell types.
stability_metric_meth
A matrix containing the 0th to 100th percentiles, in 5% steps, of Pearson correlations between estimated methylation profiles of constituent cell types across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.
stability_metric_props
A matrix containing the 0th to 100th percentiles, in 5% steps, of Pearson correlations between estimated proportions of constituent cell types across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.
stability_metric_comb
A matrix containing the 0th to 100th percentiles, in 5% steps, of Pearson correlations between estimated proportions of constituent cell types and between methylation profiles of constituent cell types across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.
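As a rough illustration of how a stability matrix of this shape could be built and read, the base R sketch below summarizes made-up correlation values with quantile() and then picks the number of cell types whose minimum correlation is largest. The correlation values and variable names are invented for the example, and the selection rule is one reading of the description of most_stable_num_ct, not code from the package.

```r
# Made-up matched correlations for models with 3, 4 and 5 cell types.
cors_by_num_ct <- list(
  "3" = c(0.99, 0.98, 0.97, 0.99, 0.96, 0.98),
  "4" = c(0.99, 0.95, 0.90, 0.97, 0.93, 0.98),
  "5" = c(0.99, 0.80, 0.45, 0.97, 0.60, 0.95)
)

probs <- seq(0, 1, by = 0.05)   # 0%, 5%, ..., 100%

# One row per number of cell types, one column per quantile.
stability <- t(sapply(cors_by_num_ct, quantile, probs = probs))

# The "0%" column is the minimum correlation; choose the row where it is largest.
most_stable <- as.numeric(rownames(stability)[which.max(stability[, "0%"])])
most_stable
```

Here the 3-cell-type model has the highest minimum correlation (0.96), so it would be selected under this rule. Inspecting the lower quantile columns, not only the minimum, can help show whether instability comes from a single poorly matched cell type or from many.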