View source: R/get_heteroplasmy.R
get_heteroplasmy (R Documentation)
This is one of the two main functions of the MitoHEAR package (the other is get_raw_counts_allele). It computes the allele frequencies and the heteroplasmy matrix from the counts matrix returned by get_raw_counts_allele.
get_heteroplasmy( raw_counts_allele, name_position_allele, name_position, number_reads, number_positions, filtering = 1, my.clusters = NULL )
| Argument | Description |
|---|---|
| raw_counts_allele | A raw counts matrix obtained from get_raw_counts_allele. |
| name_position_allele | A character vector whose elements specify the genomic coordinate of each base together with the allele (obtained from get_raw_counts_allele). |
| name_position | A character vector whose elements specify the genomic coordinate of each base (obtained from get_raw_counts_allele). |
| number_reads | Integer: the minimum number of counts above which a base is considered covered by a sample. |
| number_positions | Integer: the minimum number of bases that a sample must cover (with counts > number_reads) to be kept for downstream analysis. |
| filtering | Numeric value, either 1 or 2. If 1, only the bases covered by all samples are kept for downstream analysis. If 2, all bases covered by more than 50% of the samples in each cluster (specified by my.clusters) are kept for downstream analysis. Default is 1. |
| my.clusters | Character vector specifying a partition of the samples. Used only when filtering is 2. Default is NULL. |
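The two base-filtering rules controlled by filtering can be sketched in R as below. keep_bases_sketch is a hypothetical helper, not MitoHEAR's implementation. It assumes that a base counts as covered by a sample when that sample's counts at the base exceed number_reads, and it reads "more than 50% of the samples in each cluster" as "in every cluster".

```r
# Hypothetical helper illustrating the two values of `filtering`;
# not MitoHEAR code.
# covered:  logical matrix (samples x bases, columns named), TRUE where the
#           sample's counts at that base exceed number_reads.
# clusters: one cluster label per sample, playing the role of my.clusters.
keep_bases_sketch <- function(covered, filtering = 1, clusters = NULL) {
  if (filtering == 1) {
    # Keep only the bases covered by every sample.
    keep <- colSums(!covered) == 0
  } else if (filtering == 2) {
    if (is.null(clusters)) stop("clusters are required when filtering = 2")
    keep <- rep(TRUE, ncol(covered))
    for (cl in unique(clusters)) {
      # Fraction of this cluster's samples that cover each base.
      frac <- colMeans(covered[clusters == cl, , drop = FALSE])
      keep <- keep & frac > 0.5
    }
  } else {
    stop("filtering must be 1 or 2")
  }
  colnames(covered)[keep]
}

covered <- matrix(
  c(TRUE,  TRUE,  TRUE, TRUE, TRUE, TRUE,   # b1: covered by all samples
    FALSE, TRUE,  TRUE, TRUE, TRUE, TRUE,   # b2: missed by one sample
    FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),  # b3: missed by two samples
  nrow = 6,
  dimnames = list(paste0("s", 1:6), c("b1", "b2", "b3"))
)
clusters <- c("a", "a", "a", "b", "b", "b")
keep_bases_sketch(covered, filtering = 1)            # "b1"
keep_bases_sketch(covered, filtering = 2, clusters)  # "b1" "b2"
```

Base b2 is dropped by filtering = 1 because one sample misses it, but kept by filtering = 2 because 2 of 3 samples in cluster "a" (and all of cluster "b") cover it; b3 fails in cluster "a" (1 of 3).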
Starting from the raw counts allele matrix, the function performs two consecutive filtering steps. The first filters samples, keeping only those that cover a number of bases above number_positions. The second filters bases, according to the parameter filtering. The heteroplasmy of each sample-base pair is computed as 1 - max(f), where f contains the frequencies of the four alleles.
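The sample filter and the heteroplasmy formula can be illustrated with the minimal sketch below. keep_samples_sketch and heteroplasmy_sketch are hypothetical helpers, not MitoHEAR's own code. The sketch assumes that a base's total count is the sum of its four allele counts, and it follows the "above number_positions" wording literally (a strict inequality); the exact comparison used by the package is not stated here.

```r
# Hypothetical helpers illustrating the Details section; not MitoHEAR code.

# Sample filter: keep samples that cover more than number_positions bases,
# where a base is covered when its total count exceeds number_reads.
# base_counts: samples x bases matrix of total counts (rows must be named).
keep_samples_sketch <- function(base_counts, number_reads, number_positions) {
  covered_bases <- rowSums(base_counts > number_reads)
  rownames(base_counts)[covered_bases > number_positions]
}

# Heteroplasmy of one sample at one base: 1 - max(f), where f holds the
# frequencies of the four alleles. Returns NaN if all four counts are zero.
heteroplasmy_sketch <- function(allele_counts) {
  f <- allele_counts / sum(allele_counts)
  1 - max(f)
}

heteroplasmy_sketch(c(100, 0, 0, 0))  # homoplasmic base: 0
heteroplasmy_sketch(c(70, 0, 0, 30))  # 1 - 0.7, i.e. about 0.3
```

For example, a sample with 70 reads on A and 30 on G at a base has f = (0.7, 0, 0, 0.3), so its heteroplasmy is 1 - 0.7 = 0.3; a base where all reads carry one allele has heteroplasmy 0.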
The function returns a list with five elements:
| Element | Description |
|---|---|
| sum_matrix | A matrix (rows = samples, columns = bases) with the counts for each sample/base pair, for all the initial samples and bases included in the raw counts allele matrix. |
| sum_matrix_qc | A matrix (rows = samples, columns = bases) with the counts for each sample/base pair, for the samples and bases that pass both filtering steps. |
| heteroplasmy_matrix | A matrix with the same dimensions as sum_matrix_qc, where entry (i, j) is the heteroplasmy of sample i at base j. |
| allele_matrix | A matrix (rows = samples, columns = 4 * number of bases) with the allele frequencies, for the samples and bases that pass both filtering steps. |
| index | Indices of the samples that cover each base, for the bases and samples that pass both filtering steps; if all samples cover all bases, index is NULL. |
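The relationship between the raw counts allele matrix, per-base counts (as in sum_matrix) and allele frequencies (as in allele_matrix) can be sketched as follows. The helpers sum_matrix_sketch and allele_freq_sketch are hypothetical; the sketch assumes that each per-base count is the total of the four allele counts at that base and that each allele frequency is its count divided by that total. Because rowsum() sorts group labels, the column order of the totals may differ from the package's.

```r
# Hypothetical helpers relating the raw counts allele matrix to per-base
# totals and allele frequencies; not MitoHEAR code.
# raw:           samples x (4 * bases) matrix of allele counts.
# name_position: base label of each column of raw (length ncol(raw)).
sum_matrix_sketch <- function(raw, name_position) {
  # rowsum() adds the rows of t(raw) that share a base label (sorted labels);
  # transposing back gives a samples x bases matrix.
  t(rowsum(t(raw), group = name_position))
}

allele_freq_sketch <- function(raw, name_position) {
  totals <- sum_matrix_sketch(raw, name_position)
  # Divide every allele column by the total count of the base it belongs to.
  raw / totals[, name_position, drop = FALSE]
}

name_position <- rep(c("1_A", "2_C"), each = 4)
raw <- matrix(
  c(100, 0, 0, 0,  0, 100, 0, 0,   # sample1: no variation
     60, 0, 0, 40, 0, 100, 0, 0),  # sample2: 60 A / 40 G at base 1_A
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  paste(name_position, c("A", "C", "T", "G")))
)
sum_matrix_sketch(raw, name_position)   # every entry is 100
allele_freq_sketch(raw, name_position)  # sample2: 0.6 for "1_A A", 0.4 for "1_A G"
```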
Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de
library(MitoHEAR)

# Two samples and two bases whose reference alleles are A and C.
# Both samples have 100 reads on the reference allele and 0 on all the others.
allele <- c("A", "C", "T", "G")
sample1_A <- c(100, 0, 0, 0)
names_A <- rep("1_A", length(sample1_A))
sample1_C <- c(0, 100, 0, 0)
names_C <- rep("2_C", length(sample1_C))
names_A_allele <- paste(names_A, allele, sep = " ")
names_C_allele <- paste(names_C, allele, sep = " ")
sample1 <- c(sample1_A, sample1_C)
sample2_A <- c(100, 0, 0, 0)
sample2_C <- c(0, 100, 0, 0)
sample2 <- c(sample2_A, sample2_C)
test_allele <- matrix(c(sample1, sample2), byrow = TRUE, ncol = 8, nrow = 2)
colnames(test_allele) <- c(names_A_allele, names_C_allele)
row.names(test_allele) <- c("sample1", "sample2")
name_position_allele_test <- c(names_A_allele, names_C_allele)
name_position_test <- c(names_A, names_C)
test <- get_heteroplasmy(test_allele, name_position_allele_test, name_position_test,
                         number_reads = 50, number_positions = 1, filtering = 1)