View source: R/get_heteroplasmy.R
get_heteroplasmy | R Documentation |
get_heteroplasmy is one of the two main functions of the MitoHEAR package (together with get_raw_counts_allele). It computes the allele frequencies and the heteroplasmy matrix from the counts matrix returned by get_raw_counts_allele.
get_heteroplasmy( raw_counts_allele, name_position_allele, name_position, number_reads, number_positions, filtering = 1, my.clusters = NULL )
raw_counts_allele |
A raw counts matrix obtained from get_raw_counts_allele. |
name_position_allele |
A character vector with elements specifying the genomic coordinate of the base and the allele (obtained from get_raw_counts_allele). |
name_position |
A character vector with elements specifying the genomic coordinate of the base (obtained from get_raw_counts_allele). |
number_reads |
Integer specifying the minimum number of counts above which we consider the base covered by the sample. |
number_positions |
Integer specifying the minimum number of bases that must be covered by the sample (with counts > number_reads) for the sample to be kept for downstream analysis. |
filtering |
Numeric value equal to 1 or 2. If 1, only the bases that are covered by all the samples are kept for the downstream analysis. If 2, all the bases that are covered by more than 50% of the samples in each cluster (specified by my.clusters) are kept for the downstream analysis. Default is 1. |
my.clusters |
Character vector specifying a partition of the samples. It is only used when filtering is equal to 2. Default is NULL. |
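The interplay of number_reads, number_positions and filtering = 1 can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy matrix, the sample and position names, and the use of strict inequalities are assumptions based on the argument descriptions above.

```r
# Toy total counts per sample/base (rows = samples, columns = bases).
# All names and values here are hypothetical, for illustration only.
sum_matrix <- matrix(
  c(60, 70, 80,
    55,  5, 90,
     2,  3,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("pos1", "pos2", "pos3"))
)
number_reads <- 50
number_positions <- 1

# A base counts as covered by a sample when its counts exceed number_reads.
covered <- sum_matrix > number_reads

# Sample filter: keep samples that cover more than number_positions bases.
keep_samples <- rowSums(covered) > number_positions
covered_qc <- covered[keep_samples, , drop = FALSE]

# Base filter (as for filtering = 1): keep bases covered by all remaining samples.
keep_bases <- colSums(covered_qc) == nrow(covered_qc)

sum_matrix_qc <- sum_matrix[keep_samples, keep_bases, drop = FALSE]
```

In this toy example, sample s3 is dropped because it covers no base, and pos2 is dropped because s2 does not cover it.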
Starting from the raw allele counts matrix, the function performs two consecutive filtering steps. The first is on the samples, keeping only those that cover more than number_positions bases. The second is on the bases and is defined by the parameter filtering. The heteroplasmy for each sample-base pair is computed as 1-max(f), where f are the frequencies of the four alleles.
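The 1-max(f) definition can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the helper heteroplasmy_sketch, the column naming, and the layout of four consecutive allele columns per base (A, C, G, T) are hypothetical, and the real column format comes from get_raw_counts_allele.

```r
# Toy allele counts for 2 samples at 2 bases.
# Assumed layout: four consecutive columns (A, C, G, T) per base; names are made up.
counts <- matrix(
  c( 90, 10,  0,  0,   50,  0, 50,  0,
    100,  0,  0,  0,    0, 20,  0, 80),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  c("1_A", "1_C", "1_G", "1_T", "2_A", "2_C", "2_G", "2_T"))
)

# Hypothetical helper: heteroplasmy = 1 - max allele frequency, per sample and base.
heteroplasmy_sketch <- function(counts, n_alleles = 4) {
  n_pos <- ncol(counts) / n_alleles
  het <- matrix(NA_real_, nrow = nrow(counts), ncol = n_pos,
                dimnames = list(rownames(counts), NULL))
  for (j in seq_len(n_pos)) {
    cols <- ((j - 1) * n_alleles + 1):(j * n_alleles)
    block <- counts[, cols, drop = FALSE]
    # Allele frequencies per sample; a base with zero total counts would give NaN.
    freq <- block / rowSums(block)
    het[, j] <- 1 - apply(freq, 1, max)
  }
  het
}

het <- heteroplasmy_sketch(counts)
```

A sample in which a single allele carries all the counts gets heteroplasmy 0, while an even split between two alleles gives 0.5.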
It returns a list with 5 elements:
sum_matrix |
A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the initial samples and bases included in the raw allele counts matrix. |
sum_matrix_qc |
A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the samples and bases that pass the two consecutive filtering steps. |
heteroplasmy_matrix |
A matrix with the same dimensions as sum_matrix_qc, where each entry (i,j) is the heteroplasmy for sample i at base j. |
allele_matrix |
A matrix (n_row=number of samples, n_col=4*number of bases) with allele frequencies, for all the samples and bases that pass the two consecutive filtering steps. |
index |
Indices of the samples that cover each base, for all bases and samples that pass the two consecutive filtering steps (if filtering = 2). If all the samples cover all the bases (as is the case for filtering = 1), index is NULL. |
Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de