
Fits the TCA model to an input matrix of observations coming from a mixture of `k` sources, under the assumption that each observation is a mixture of unique source-specific values (in each feature in the data). For example, in the context of tissue-level bulk DNA methylation data coming from a mixture of cell types (i.e. the input is methylation sites by individuals), `tca` can model the methylation of each individual as a mixture of cell-type-specific methylation levels that are unique to that individual.


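| Argument | Description |
|---|---|
| `X` | An $m$ by $n$ matrix of measurements of $m$ features for $n$ observations; each column is assumed to be a mixture of $k$ sources. |
| `W` | An $n$ by $k$ matrix of non-negative weights: the proportions of the $k$ sources in each of the $n$ observations (each row sums to 1). If only initial estimates of `W` are available, see `refit_W`. |
| `C1` | An $n$ by $p_1$ design matrix of factors that may systematically affect the hidden source-specific values $Z_{hj}^i$. |
| `C1.map` | An … |
| `C2` | An $n$ by $p_2$ design matrix of factors that may systematically affect the mixture values $X_{ji}$ (e.g. variables that capture biases in the collection of the measurements). |
| `refit_W` | A logical value indicating whether to re-estimate the input `W` as part of the optimization procedure. |
| `refit_W.features` | A vector with the names of the features in `X` to be used for re-estimating `W`, as an alternative to feature selection with ReFACTor. |
| `refit_W.sparsity` | A numeric value indicating the number of features to select using the ReFACTor algorithm when re-estimating `W`. |
| `refit_W.sd_threshold` | A numeric value indicating a standard deviation threshold to be used for excluding low-variance features in `X`. |
| `tau` | A non-negative numeric value of the standard deviation of the measurement noise (i.e. the i.i.d. component of variation in the model). If … |
| `parallel` | A logical value indicating whether to use parallel computing (possible when using a multi-core machine). |
| `num_cores` | A numeric value indicating the number of cores to use (activated only if `parallel` is `TRUE`). |
| `max_iters` | A numeric value indicating the maximal number of iterations to use in the optimization of the TCA model (…). |
| `log_file` | A path to an output log file. Note that if the file … |
| `debug` | A logical value indicating whether to set the logger to a more detailed debug level; please set … |
| `verbose` | A logical value indicating whether to print logs. |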
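As a quick sanity check on input layouts, here is a hypothetical helper (not part of the package, and not a reproduction of its own input validation). It assumes `X` has features in rows and observations in columns, `W` has one row per observation with non-negative proportions summing to 1, and any `C1`/`C2` has one row per observation.

```r
# Hypothetical helper: checks the input layouts assumed by the TCA model.
check_tca_inputs <- function(X, W, C1 = NULL, C2 = NULL, tol = 1e-8) {
  stopifnot(is.matrix(X), is.matrix(W))
  n <- ncol(X)                                       # observations are the columns of X
  if (nrow(W) != n) stop("W must have one row per column (observation) of X")
  if (any(W < 0)) stop("W must be non-negative")
  if (any(abs(rowSums(W) - 1) > tol)) stop("each row of W must sum to 1")
  for (C in list(C1, C2)) {                          # NULL entries are skipped
    if (!is.null(C) && nrow(C) != n) stop("C1/C2 must have one row per observation")
  }
  invisible(TRUE)
}

set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20)                # 20 features x 5 observations
W <- matrix(runif(5 * 3), nrow = 5)                  # 5 observations x 3 sources
W <- W / rowSums(W)                                  # make each row sum to 1
check_tca_inputs(X, W)
```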

The TCA model assumes that the hidden source-specific values are random variables. Formally, denote by $Z_{hj}^i$ the source-specific value of observation $i$ in feature $j$ and source $h$. The TCA model assumes:

$$Z_{hj}^i \sim N(\mu_{hj}, \sigma_{hj}^2)$$

where $\mu_{hj}$ and $\sigma_{hj}$ are the mean and standard deviation specific to feature $j$ and source $h$. The model further assumes that the observed value of observation $i$ in feature $j$ is a mixture of $k$ different sources:

$$X_{ji} = \sum_{h=1}^k W_{ih} Z_{hj}^i + \varepsilon_{ji}$$

where $W_{ih}$ is the non-negative proportion of source $h$ in the mixture of observation $i$, such that $\sum_{h=1}^k W_{ih} = 1$, and $\varepsilon_{ji} \sim N(0, \tau^2)$ is an i.i.d. component of variation that models measurement noise. Note that the mixture proportions in $W$ are, in general, unique to each observation; therefore, each entry in the data matrix $X$ comes from its own distribution (i.e. with a different mean and a different variance).
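Assuming, in addition, that the source-specific values are independent across sources, each entry $X_{ji}$ is normal with mean $\sum_h W_{ih}\mu_{hj}$ and variance $\sum_h W_{ih}^2 \sigma_{hj}^2 + \tau^2$, which is why entries with different proportions have different distributions. The base-R simulation below draws repeatedly from the model for one observation and one feature, using made-up parameter values, and compares the empirical moments with these formulas.

```r
set.seed(42)
k       <- 3
n_draws <- 2e5
w       <- c(0.6, 0.3, 0.1)       # W_i1..W_ik for one observation (sums to 1)
mu      <- c(0.2, 0.5, 0.8)       # mu_hj for one feature j
sigma   <- c(0.05, 0.10, 0.02)    # sigma_hj
tau     <- 0.01                   # sd of the measurement noise

# Draw Z_hj^i independently for each source (n_draws x k), mix, then add noise
Z <- sapply(seq_len(k), function(h) rnorm(n_draws, mu[h], sigma[h]))
x <- as.vector(Z %*% w) + rnorm(n_draws, 0, tau)

theory_mean <- sum(w * mu)                       # 0.35
theory_var  <- sum(w^2 * sigma^2) + tau^2        # 0.001904
c(empirical = mean(x), theory = theory_mean)
c(empirical = var(x), theory = theory_var)
```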

In cases where the true `W` is unknown, `tca` can be provided with initial estimates of `W` and then re-estimate `W` as part of the optimization procedure (see argument `refit_W`). These initial estimates should not be random; rather, they should capture the information in `W` to some extent. When `refit_W` is used, typically only a subset of the features should be used for re-estimating `W`. Therefore, when re-estimating `W`, `tca` performs feature selection using the ReFACTor algorithm; alternatively, it can be provided with a user-specified list of features to be used in the re-estimation (see argument `refit_W.features`).
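To give a feel for what re-estimating `W` from a subset of features involves, here is a simplified stand-in, not the procedure `tca` uses: it holds hypothetical source means `mus` (features by sources) fixed and fits each observation's proportions by least squares restricted to the probability simplex (non-negative, summing to 1), via projected gradient descent. The names `project_simplex`, `fit_proportions`, and `refit_W_sketch` are illustrative only, and the variances $\sigma_{hj}$ and $\tau$ are ignored.

```r
# Euclidean projection of v onto the simplex {w >= 0, sum(w) = 1}
project_simplex <- function(v) {
  u     <- sort(v, decreasing = TRUE)
  css   <- cumsum(u)
  j     <- seq_along(u)
  rho   <- max(j[u - (css - 1) / j > 0])
  theta <- (css[rho] - 1) / rho
  pmax(v - theta, 0)
}

# Fit one observation's proportions: minimize ||x - M w||^2 over the simplex
fit_proportions <- function(x, M, n_iter = 2000) {
  k    <- ncol(M)
  MtM  <- crossprod(M)
  Mtx  <- crossprod(M, x)
  step <- 1 / (2 * max(eigen(MtM, symmetric = TRUE, only.values = TRUE)$values))
  w    <- rep(1 / k, k)                              # start from equal proportions
  for (it in seq_len(n_iter)) {
    grad <- 2 * (MtM %*% w - Mtx)
    w    <- project_simplex(as.vector(w - step * grad))
  }
  w
}

# Re-estimate W (observations x sources) using only the rows of X in `features`
refit_W_sketch <- function(X, mus, features) {
  M <- mus[features, , drop = FALSE]
  t(apply(X[features, , drop = FALSE], 2, fit_proportions, M = M))
}

set.seed(7)
m <- 60; n <- 8; k <- 3
mus    <- matrix(runif(m * k), m, k, dimnames = list(paste0("f", 1:m), NULL))
W_true <- matrix(rgamma(n * k, 1), n, k)
W_true <- W_true / rowSums(W_true)
X      <- mus %*% t(W_true)                          # noiseless mixtures, features x observations
W_hat  <- refit_W_sketch(X, mus, features = rownames(mus)[1:40])
max(abs(W_hat - W_true))
```

Passing a subset in `features` plays the role that ReFACTor or `refit_W.features` plays in the text: only the selected rows of `X` inform the proportions.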

Factors that systematically affect the source-specific values $Z_{hj}^i$ can be further considered (see argument `C1`). In that case, we assume:

$$Z_{hj}^i \sim N(\mu_{hj} + c^{(1)}_i \gamma_j^h, \sigma_{hj}^2)$$

where $c^{(1)}_i$ is a row vector from `C1`, corresponding to the values of the $p_1$ factors for observation $i$, and $\gamma_j^h$ is a vector of $p_1$ corresponding effect sizes.
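A minimal sketch of this model for a single feature $j$: stacking the effect vectors $\gamma_j^h$ as the columns of a hypothetical $p_1 \times k$ matrix `Gamma_j`, the means of the source-specific values for all observations are $\mu_{hj} + c^{(1)}_i \gamma_j^h$, one column per source. The covariates (`age`, `smoker`) and all numbers are made up.

```r
# Hypothetical helper: n x k matrix whose (i, h) entry is mu_hj + c^(1)_i gamma_j^h
source_means_C1 <- function(mu_j, Gamma_j, C1) {
  sweep(C1 %*% Gamma_j, 2, mu_j, FUN = "+")          # add mu_hj to column h
}

C1 <- cbind(age = c(30, 50), smoker = c(0, 1))       # n = 2 observations, p1 = 2 factors
Gamma_j <- matrix(c(0.001, 0.02,                     # gamma_j^1
                    0,    -0.05,                     # gamma_j^2
                    0.002, 0),                       # gamma_j^3
                  nrow = 2)                          # p1 x k, with k = 3
mu_j    <- c(0.2, 0.5, 0.8)
sigma_j <- c(0.05, 0.10, 0.02)

means <- source_means_C1(mu_j, Gamma_j, C1)          # E[Z_hj^i], observations x sources
# One draw of Z_hj^i per observation and source (column-major, so sd repeats per column)
set.seed(3)
Z_j <- matrix(rnorm(length(means), mean = means, sd = rep(sigma_j, each = nrow(means))),
              nrow = nrow(means))
```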

Factors that systematically affect the mixture values $X_{ji}$, such as variables that capture biases in the collection of the measurements, can also be considered (see argument `C2`). In that case, we assume:

$$X_{ji} = \sum_{h=1}^k W_{ih} Z_{hj}^i + c^{(2)}_i \delta_j + \varepsilon_{ji}$$

where $c^{(2)}_i$ is a row vector from `C2`, corresponding to the values of the $p_2$ factors for observation $i$, and $\delta_j$ is a vector of $p_2$ corresponding effect sizes.
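Because the `C2` term enters the mixture additively, it shifts the expected value of $X_{ji}$ by $c^{(2)}_i \delta_j$ without changing its variance. Stacking the $\mu_{hj}$ into a hypothetical features-by-sources matrix `mus` and the $\delta_j$ into a features-by-$p_2$ matrix `deltas`, the expected data matrix (ignoring any `C1` effects) is `mus %*% t(W) + deltas %*% t(C2)`. The small example below uses made-up numbers and a single batch indicator as the only `C2` factor.

```r
# Hypothetical helper: m x n matrix of E[X_ji] = sum_h W_ih mu_hj + c^(2)_i delta_j
expected_X_C2 <- function(mus, W, deltas, C2) {
  mus %*% t(W) + deltas %*% t(C2)
}

mus    <- rbind(c(0.2, 0.8),                         # feature 1: mu_11, mu_21
                c(0.5, 0.5))                         # feature 2: mu_12, mu_22
W      <- rbind(c(0.50, 0.50),                       # observation 1
                c(0.25, 0.75))                       # observation 2
deltas <- rbind(0.1, -0.2)                           # m x p2, with p2 = 1
C2     <- rbind(0, 1)                                # n x p2: observation 2 is in batch 1

EX <- expected_X_C2(mus, W, deltas, C2)              # features x observations
EX
```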

A list with the estimated parameters of the model. This list can then be used as input to other functions, such as `tcareg`.

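| Element | Description |
|---|---|
| `W` | A matrix of weights with one row per observation and one column per source: the input `W`, or its re-estimate when `refit_W` is used. |
| `mus_hat` | A matrix of estimates of the source-specific means $\mu_{hj}$, with one row per feature and one column per source. |
| `sigmas_hat` | A matrix of estimates of the source-specific standard deviations $\sigma_{hj}$, with one row per feature and one column per source. |
| `tau_hat` | An estimate of the standard deviation of the i.i.d. component of variation in `X`. |
| `gammas_hat` | A matrix of the estimated effect sizes $\gamma_j^h$ of the $p_1$ factors in `C1` on each source, with one row per feature. |
| `deltas_hat` | A matrix of the estimated effect sizes $\delta_j$ of the $p_2$ factors in `C2`, with one row per feature. |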
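Given a fitted model, the implied mean and variance of every entry of `X` can be computed in matrix form. The helper below is hypothetical: it assumes a list with the element names `W`, `mus_hat`, `sigmas_hat`, and `tau_hat`, with `mus_hat` and `sigmas_hat` laid out as features by sources, ignores any `C1`/`C2` effects, and assumes independence across sources. The example list is a hand-made mock, not output from `tca`.

```r
# Hypothetical helper: per-entry moments of X implied by the fitted parameters
fitted_moments <- function(fit) {
  W <- fit$W
  list(
    mean = fit$mus_hat %*% t(W),                         # m x n: sum_h W_ih mu_hj
    var  = fit$sigmas_hat^2 %*% t(W^2) + fit$tau_hat^2   # m x n: sum_h W_ih^2 sigma_hj^2 + tau^2
  )
}

fit <- list(W          = rbind(c(0.6, 0.4), c(0.1, 0.9)),   # 2 observations x 2 sources
            mus_hat    = rbind(c(0.2, 0.7)),                # 1 feature x 2 sources
            sigmas_hat = rbind(c(0.1, 0.2)),
            tau_hat    = 0.05)
mom <- fitted_moments(fit)
mom$mean
mom$var
```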

The function `tca` may require a long running time when the input matrix `X` is very large; to alleviate this, it is strongly advised to use the `parallel` argument when a multi-core machine is available.
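A minimal sketch of a parallel run, assuming `tca` is installed from the `TCA` package on CRAN: it leaves one core free for the system and calls `tca` only if the package is available, passing just the arguments documented above. The simulated `X` and `W` are arbitrary and carry row and column names.

```r
# Choose a core count, leaving one core free; detectCores() may return NA
n_cores <- parallel::detectCores()
n_cores <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

if (requireNamespace("TCA", quietly = TRUE)) {
  set.seed(1)
  m <- 100; n <- 40; k <- 3
  W <- matrix(rgamma(n * k, 1), n, k)
  W <- W / rowSums(W)                                # rows sum to 1
  X <- matrix(rnorm(m * n, 0.5, 0.1), m, n)          # features x observations
  dimnames(W) <- list(paste0("s", 1:n), paste0("src", 1:k))
  dimnames(X) <- list(paste0("f", 1:m), paste0("s", 1:n))
  mdl <- TCA::tca(X = X, W = W, parallel = TRUE, num_cores = n_cores, verbose = FALSE)
}
```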

Rahmani E, Schweiger R, Rhead B, Criswell LA, Barcellos LF, Eskin E, Rosset S, Sankararaman S, Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nature Communications 2018.

Rahmani E, Zaitlen N, Baran Y, Eng C, Hu D, Galanter J, Oh S, Burchard EG, Eskin E, Zou J, Halperin E. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nature Methods 2016.

