get_heteroplasmy: get_heteroplasmy
In MitoHEAR: Quantification of Mitochondrial DNA Heteroplasmy

Description

get_heteroplasmy is one of the two main functions of the MitoHEAR package (together with get_raw_counts_allele). It computes the allele frequencies and the heteroplasmy matrix from the counts matrix returned by get_raw_counts_allele.

Usage

get_heteroplasmy(
  raw_counts_allele,
  name_position_allele,
  name_position,
  number_reads,
  number_positions,
  filtering = 1,
  my.clusters = NULL
)

Arguments

`raw_counts_allele`	A raw counts matrix obtained from get_raw_counts_allele.
`name_position_allele`	A character vector with elements specifying the genomic coordinate of the base and the allele (obtained from get_raw_counts_allele).
`name_position`	A character vector with elements specifying the genomic coordinate of the base (obtained from get_raw_counts_allele).
`number_reads`	Integer specifying the minimum number of counts above which we consider the base covered by the sample.
`number_positions`	Integer specifying the minimum number of bases that must be covered by the sample (with counts > number_reads) in order to keep the sample for downstream analysis.
`filtering`	Numeric value equal to 1 or 2. If 1, only the bases that are covered by all the samples are kept for downstream analysis. If 2, all the bases that are covered by more than 50% of the samples in each cluster (specified by my.clusters) are kept for downstream analysis. Default is 1.
`my.clusters`	Character vector specifying a partition of the samples. It is only used when filtering is equal to 2. Default is NULL.
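
The way number_reads, number_positions and filtering = 1 interact can be illustrated with a short base R sketch. This is not the package's implementation: the toy counts, the variable names and the use of strict inequalities (read from "counts>number_reads" and "above number_positions") are assumptions made for illustration only.

```r
# Hypothetical allele counts: 3 samples x 3 bases x 4 alleles (12 columns).
counts <- matrix(
  c(100, 0, 0, 0,   80, 20, 0, 0,   70, 0, 0, 0,   # sample1
     60, 0, 0, 0,   90,  0, 0, 0,   10, 0, 0, 0,   # sample2
      5, 0, 0, 0,    2,  0, 0, 0,    1, 0, 0, 0),  # sample3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("sample1", "sample2", "sample3"), NULL)
)
# Base that each column belongs to (same role as name_position).
position <- rep(c("1_A", "2_C", "3_G"), each = 4)

number_reads <- 50
number_positions <- 1

# Total counts per sample/base: sum the four allele columns of each base.
sum_matrix <- t(rowsum(t(counts), group = position, reorder = FALSE))

# A base is treated as covered by a sample when its counts exceed number_reads.
covered <- sum_matrix > number_reads

# Step 1 (samples): keep samples covering more than number_positions bases
# (strict inequality is an assumption of this sketch).
keep_samples <- rowSums(covered) > number_positions

# Step 2 (bases), as for filtering = 1: keep bases covered by every kept sample.
covered_qc <- covered[keep_samples, , drop = FALSE]
keep_bases <- colSums(covered_qc) == nrow(covered_qc)

sum_matrix_qc <- sum_matrix[keep_samples, keep_bases, drop = FALSE]
print(sum_matrix_qc)
```

In this toy data sample3 is dropped in the first step because it covers no base, and base 3_G is dropped in the second step because sample2 does not cover it.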

Details

Starting from the raw allele counts matrix, the function performs two consecutive filtering steps. The first acts on the samples, keeping only those that cover a number of bases above number_positions. The second acts on the bases and is controlled by the parameter filtering. The heteroplasmy for each sample-base pair is then computed as 1-max(f), where f is the vector of frequencies of the four alleles.
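
The formula 1-max(f) can be checked by hand for a single sample-base pair. The helper below, heteroplasmy_from_counts, is a hypothetical name used only for this sketch and is not part of MitoHEAR; the counts are made up.

```r
# Hypothetical helper: heteroplasmy of one sample at one base,
# given the read counts of its four alleles.
heteroplasmy_from_counts <- function(counts) {
  f <- counts / sum(counts)  # allele frequencies (NaN if the base has no reads)
  1 - max(f)
}

# All reads on one allele: no heteroplasmy.
h_homoplasmic <- heteroplasmy_from_counts(c(A = 100, C = 0, T = 0, G = 0))

# 80% of reads on the major allele: heteroplasmy of about 0.2.
h_mixed <- heteroplasmy_from_counts(c(A = 80, C = 20, T = 0, G = 0))

# Reads split evenly over four alleles: the largest value the formula can give.
h_max <- heteroplasmy_from_counts(c(A = 25, C = 25, T = 25, G = 25))

print(c(h_homoplasmic, h_mixed, h_max))
```

With four alleles the value ranges from 0 (a single allele carries all reads) to 0.75 (all four alleles are equally frequent).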

Value

It returns a list with 5 elements:

`sum_matrix`	A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the initial samples and bases included in the raw allele counts matrix.
`sum_matrix_qc`	A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the samples and bases that pass the two consecutive filtering steps.
`heteroplasmy_matrix`	A matrix with the same dimensions as sum_matrix_qc where each entry (i,j) is the heteroplasmy for sample i at base j.
`allele_matrix`	A matrix (n_row=number of samples, n_col=4*number of bases) with allele frequencies, for all the samples and bases that pass the two consecutive filtering steps.
`index`	Indices of the samples that cover a base, for all bases and samples that pass the two consecutive filtering steps; if all the samples cover all the bases, then index is NULL.

Author(s)

Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de

Examples

# Two samples and two bases whose reference alleles are A and C.
# At each base, both samples have 100 reads on a single allele and 0 on the other three.
sample1_A <- c(100, 0, 0, 0)
names_A <- rep("1_A", length(sample1_A))
sample1_C <- c(100, 0, 0, 0)
names_C <- rep("2_C", length(sample1_C))
allele <- c("A", "C", "T", "G")
names_A_allele <- paste(names_A, allele, sep = " ")
names_C_allele <- paste(names_C, allele, sep = " ")
sample1 <- c(sample1_A, sample1_C)
sample2_A <- c(100, 0, 0, 0)
sample2_C <- c(100, 0, 0, 0)
sample2 <- c(sample2_A, sample2_C)
test_allele <- matrix(c(sample1, sample2), byrow = TRUE, ncol = 8, nrow = 2)
colnames(test_allele) <- c(names_A_allele, names_C_allele)
row.names(test_allele) <- c("sample1", "sample2")
name_position_allele_test <- c(names_A_allele, names_C_allele)
name_position_test <- c(names_A, names_C)
test <- get_heteroplasmy(
  test_allele,
  name_position_allele_test,
  name_position_test,
  number_reads = 50,
  number_positions = 1,
  filtering = 1
)

MitoHEAR documentation built on March 18, 2022, 6:47 p.m.