estimate_stability: Estimate stability of EDec models (code taken from the EDec paper)

View source: R/edec_aux_functions.R

estimate_stability    R Documentation

Estimate stability of EDec models (code taken from the EDec paper)

Description

This function runs EDec Stage 1 for a series of random subsets of methylation profiles of bulk tissue samples, with varying numbers of constituent cell types. It then computes the similarity of estimated methylation profiles and proportions of constituent cell types across subsets of data for models with each number of constituent cell types. Stability of the model across subsets of the data is generally a good indicator of which number of cell types is an appropriate choice for that dataset.

Usage

estimate_stability(
  meth_bulk_samples,
  informative_loci,
  possible_num_ct,
  subset_prop = 0.8,
  num_subsets = 5,
  reps_per_subset = 1,
  max_its = 1000,
  rss_diff_stop = 1e-08
)

Arguments

meth_bulk_samples

Matrix with methylation profiles of bulk tissue samples. Rows correspond to loci/probes and columns correspond to different samples.

informative_loci

A vector containing names (strings) of rows corresponding to loci/probes that are informative for distinguishing cell types.

possible_num_ct

A vector containing the possible numbers of cell types to be used in EDec Stage 1.

subset_prop

Proportion of samples from the full dataset to be included in each subset of the data.

num_subsets

Number of random subsets of the data on which EDec Stage 1 with different numbers of cell types will be tested.

reps_per_subset

How many times to run EDec Stage 1 with each number of cell types in each subset of the data.

max_its

Maximum number of iterations after which the EDec Stage 1 algorithm will stop.

rss_diff_stop

Maximum difference between the residual sum of squares of the model in two consecutive iterations for the EDec Stage 1 algorithm to converge.
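
As an illustration of how subset_prop and num_subsets interact, the following minimal sketch draws random column subsets from a toy matrix. It is not the package's internal code: the toy data, the helper names (n_keep, sample_subsets) and the rounding rule for the subset size are assumptions made for this example.

```r
# Sketch only: draw random sample subsets as the arguments describe.
set.seed(1)  # makes this illustration reproducible

# Toy stand-in for meth_bulk_samples: 6 loci (rows) x 10 samples (columns).
meth_bulk_samples <- matrix(
  runif(60), nrow = 6,
  dimnames = list(paste0("cg", 1:6), paste0("s", 1:10))
)
subset_prop <- 0.8
num_subsets <- 5

# Number of samples kept in each subset (the rounding rule is an assumption).
n_keep <- round(subset_prop * ncol(meth_bulk_samples))

# One vector of randomly chosen sample names per subset.
sample_subsets <- lapply(seq_len(num_subsets), function(i) {
  sample(colnames(meth_bulk_samples), n_keep)
})

# Each subset keeps all loci and only the chosen samples.
first_subset <- meth_bulk_samples[, sample_subsets[[1]], drop = FALSE]
dim(first_subset)
```

With the defaults (subset_prop = 0.8, num_subsets = 5), each number of cell types in possible_num_ct would be fitted on five such subsets, reps_per_subset times each.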

Details

A specified number of subsets (num_subsets) of the samples with methylation profiles will be generated by randomly selecting a fraction (subset_prop) of the columns of meth_bulk_samples. For each of those subsets of samples, EDec Stage 1 will be run using each possible number of cell types (possible_num_ct). Since different runs of EDec Stage 1 with the same parameters can give different results, there is also the option of running EDec Stage 1 multiple times (reps_per_subset) with each number of cell types in each subset of the data, and keeping the best-fitting model. Once all runs of EDec Stage 1 are complete, the estimated methylation profiles and proportions of constituent cell types for each given number of constituent cell types will be compared across data subsets. These comparisons are made by computing the Pearson correlation between methylation profiles or proportion estimates for the same cell type in each pair of data subsets. To determine which methylation profiles or proportion estimates correspond to the same cell type in two runs of EDec, this function computes the correlation between every pair of estimated methylation profiles and finds the permutation of the correlation matrix that is most similar to the identity matrix.
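
The matching step described above can be sketched as follows. This is a hedged illustration, not the package's implementation: the helpers all_perms and match_cell_types are hypothetical, and "most similar to the identity matrix" is interpreted here as maximizing the sum of the diagonal of the permuted correlation matrix (for a permuted matrix this is equivalent to minimizing the Frobenius distance to the identity). The brute-force search over permutations is only practical for small numbers of cell types.

```r
# Hypothetical helper: all permutations of a vector, one per row.
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

# Hypothetical helper: for two matrices of estimated methylation profiles
# (loci x cell types), return the column order of meth_b that best aligns
# it with the columns of meth_a.
match_cell_types <- function(meth_a, meth_b) {
  cors <- cor(meth_a, meth_b)  # rows: cell types in a, columns: cell types in b
  perms <- all_perms(seq_len(ncol(meth_b)))
  # Score each permutation by the trace of the column-permuted matrix.
  scores <- apply(perms, 1, function(p) sum(diag(cors[, p, drop = FALSE])))
  perms[which.max(scores), ]
}

# Toy data: two noisy estimates of the same 3 cell types, with the columns
# of the second estimate shuffled (it holds true cell types 3, 1, 2).
set.seed(42)
truth <- matrix(runif(150), nrow = 50)
est_a <- truth + matrix(rnorm(150, sd = 0.01), nrow = 50)
est_b <- truth[, c(3, 1, 2)] + matrix(rnorm(150, sd = 0.01), nrow = 50)

matched <- match_cell_types(est_a, est_b)

# After reordering, column i of est_b corresponds to column i of est_a,
# so the per-cell-type Pearson correlations sit on the diagonal.
per_cell_type_cor <- diag(cor(est_a, est_b[, matched]))
```

The values in per_cell_type_cor are the kind of correlations that, pooled over all pairs of data subsets, feed the stability summaries returned by the function.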

Value

A list with the following components:

most_stable_num_ct

The number of cell types giving the most stable models across the data subsets. The minimum Pearson correlation between either methylation or proportion estimates across all data subsets is used to determine the most stable model.

methylation_estimates

A list containing matrices of average methylation profiles of constituent cell types for each data subset and number of cell types.

proportion_estimates

A list containing matrices of proportions of constituent cell types in each input sample for each data subset and number of cell types.

stability_metric_meth

A matrix containing the 0% to 100% quantiles, in 5% steps, of Pearson correlations between estimated methylation profiles of constituent cell types across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.

stability_metric_props

A matrix containing the 0% to 100% quantiles, in 5% steps, of Pearson correlations between estimated proportions of constituent cell types across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.

stability_metric_comb

A matrix containing the 0% to 100% quantiles, in 5% steps, of Pearson correlations, both between estimated proportions and between estimated methylation profiles of constituent cell types, across subsets of the data for models with each possible number of cell types. Rows represent different numbers of cell types. Columns represent different quantiles.
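
To make the layout of the stability matrices concrete, the sketch below builds one from made-up correlation values and picks a most stable number of cell types from it. The correlation values are simulated, the object names are hypothetical, and choosing the model with the highest minimum correlation is an assumed reading of the description of most_stable_num_ct above, not code taken from the package.

```r
set.seed(2)

# Made-up Pearson correlations between matched cell types across pairs of
# data subsets, one vector per candidate number of cell types.
cors_by_num_ct <- list(
  "2" = runif(20, 0.95, 1.00),
  "3" = runif(30, 0.90, 1.00),
  "4" = runif(40, 0.40, 1.00)
)

# Quantiles from 0% to 100% in 5% steps.
probs <- seq(0, 1, by = 0.05)

# Rows: number of cell types. Columns: quantiles ("0%", "5%", ..., "100%").
stability_metric <- t(sapply(cors_by_num_ct, quantile, probs = probs))

# The 0% quantile is the minimum correlation; under the assumption stated
# above, the most stable model is the one whose minimum is highest.
most_stable <- as.numeric(names(which.max(stability_metric[, "0%"])))
```

Reading along a row shows how the correlations for one model are distributed; a row whose low quantiles stay close to 1 indicates a model that is reproduced consistently across subsets.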


bozdaglab/CTDPathSim2.0 documentation built on April 14, 2022, 12:39 a.m.