mtSummary: mtSummary function

mtSummary    R Documentation

mtSummary function

Description

Identifies mtDNA variations, outputs summary statistics, and annotates heteroplasmic and/or homoplasmic variations.

Usage

mtSummary(
  aaf,
  allele,
  freq,
  coverage,
  coverage.qc = 250,
  thre.lower = 0.03,
  thre.upper = 0.97,
  loci = c(1:.mtLength),
  type = "both",
  coverSummary = T,
  varHist = T,
  annot.select = c("Pos", "ref", "Gene", "TypeMutation", "MissensMutation",
    "CodonPosition", "ProteinDomain", "dbSNP_150_id", "PolyPhen2", "PolyPhen2_score",
    "SIFT", "SIFT_score", "CADD", "CADD_score", "CADD_phred_score"),
  path = "./",
  study = "Study",
  anno = T
)

Arguments

aaf

a numeric matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. The column names are the subject IDs, and the entries are the AAFs of all 16569 mtDNA loci for each subject. It is generated by the mtAAF function.

allele

a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the alleles of each subject at each locus. The matrix must contain subject ID as the column names. "/" is used to delimit different allele calls at a locus.

freq

a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the allele fractions of the corresponding allele matrix. The matrix must contain subject ID as the column names. "/" is used to delimit the allele fractions.

coverage

a numeric matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the read coverage of the 16569 mtDNA loci for each subject. The matrix must contain subject ID as the column names.

coverage.qc

a number (default is 250) giving the coverage threshold. If the coverage at a locus is below coverage.qc, the allele call at that locus for that subject is not used.

thre.lower

a number (default is 0.03) giving the lower bound of the threshold that defines heteroplasmic and homoplasmic variations.

thre.upper

a number (default is 0.97) giving the upper bound of the threshold that defines heteroplasmic and homoplasmic variations.

loci

one of: (1) a vector (default is c(1:16569)) of mitochondrial DNA loci specifying which loci are used to identify and annotate variations; (2) a character string naming a region (e.g. "coding", "tRNA", "RNR1", "RNR2", ...).

type

a character string indicating which variations to annotate: "both" (default) returns annotation for all variations, "heter" returns annotation for heteroplasmic variations, and "homo" returns annotation for homoplasmic variations.

coverSummary

logical (default is TRUE). If TRUE, outputs a summary of the mean coverage at each mtDNA locus (across all participants) and for each individual (across all mtDNA loci).

varHist

logical (default is TRUE). If TRUE, outputs histograms that visualize the heteroplasmic and homoplasmic burden across participants and mtDNA loci.

annot.select

A character vector of variation position, alternative allele, corresponding gene and types of annotation scores to output based on user's choice. The available choices are "Pos", "ref", "Gene", "TypeMutation", "MissensMutation", "CodonPosition", "ProteinDomain", "mFOLD_dG", "mFOLD_Initial", "mFOLD_rCRS.DG", "mFOLD_rCRS.Initial", "mFOLD_AnticodonAminoAcidChange", "mFOLD_Location", "PolyPhen2", "PolyPhen2_score", "SIFT", "SIFT_score", "PROVEAN", "PROVEAN_score", "MutationAssessor", "MutationAssessor_score", "CADD", "CADD_score", "CADD_phred_score", "PANTHER", "PANTHER_score", "PhD_SNP", "PhD_SNP_score", "SNAP", "SNAP_score", "MutationTaster", "MutationTaster_score", "dbSNP_150_id", "APOGEE_boost_consensus", "APOGEE_boost_mean_prob", "APOGEE_boost_mean"

path

the path of the output annotation file. If not provided, the annotation file is written to the current working directory.

study

A string giving the study name. Default is "Study".

anno

A logical value (default is TRUE, as shown under Usage) indicating whether to output the annotation results in a .csv file.
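
The interplay of coverage.qc, thre.lower and thre.upper can be illustrated on a toy matrix. This is a minimal sketch, not the package's implementation: it assumes that calls below coverage.qc are treated as missing, that a variation counts as heteroplasmic when thre.lower <= AAF <= thre.upper, and that it counts as homoplasmic when AAF > thre.upper. The toy matrices and the names aaf_qc, heter and homo are hypothetical.

```r
# Toy data: 4 loci x 2 subjects (the real inputs are 16569 x N).
aaf <- matrix(c(0.00, 0.10, 1.00, 0.50,
                0.02, 0.98, 0.40, 0.99),
              nrow = 4,
              dimnames = list(NULL, c("S1", "S2")))
coverage <- matrix(c(300, 400, 100, 500,
                     260, 250, 900, 249),
                   nrow = 4,
                   dimnames = list(NULL, c("S1", "S2")))

coverage.qc <- 250
thre.lower  <- 0.03
thre.upper  <- 0.97

# Assumption: calls with coverage below coverage.qc are treated as missing.
aaf_qc <- aaf
aaf_qc[coverage < coverage.qc] <- NA

# Assumption: inclusive bounds for heteroplasmy, strict bound for homoplasmy.
heter <- !is.na(aaf_qc) & aaf_qc >= thre.lower & aaf_qc <= thre.upper
homo  <- !is.na(aaf_qc) & aaf_qc > thre.upper

heter_per_subject <- colSums(heter)  # heteroplasmic loci per subject
homo_per_subject  <- colSums(homo)   # homoplasmic loci per subject
```

In this toy input, the third locus of S1 and the fourth locus of S2 fall below the coverage threshold, so they contribute to neither count even though their AAFs exceed thre.upper.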

Value

1. Summary frequency: descriptive statistics of sequencing coverage across individuals and across mtDNA loci; the total number of mtDNA loci with variations and the number of heteroplasmic/homoplasmic mtDNA loci in the study sample; the min/Q1/mean/median/Q3/largest number of homoplasmies/heteroplasmies carried by an individual.

2. Plots: a scatter plot of the median coverage across mtDNA loci; histograms to visualize the heteroplasmic and homoplasmic burden across participants and mtDNA loci, based on the user's choice.

3. Summary annotation for all of the heteroplasmic and/or homoplasmic variations observed in the study data, based on the user's choice.
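
The burden statistics described above can be pictured with base R on a toy indicator matrix. This is only a sketch under stated assumptions: the matrix heter and the names burden_per_subject, burden_per_locus and n_variant_loci are hypothetical, and the package may compute its summaries differently.

```r
# Toy indicator matrix: TRUE where a subject carries a heteroplasmy at a locus.
heter <- matrix(c(TRUE,  FALSE, TRUE, FALSE, FALSE,
                  FALSE, FALSE, TRUE, FALSE, FALSE,
                  TRUE,  TRUE,  TRUE, TRUE,  FALSE),
                nrow = 5,
                dimnames = list(paste0("locus", 1:5), c("S1", "S2", "S3")))

# Number of heteroplasmies carried by each subject.
burden_per_subject <- colSums(heter)

# Number of subjects carrying a heteroplasmy at each locus.
burden_per_locus <- rowSums(heter)

# Min / Q1 / median / mean / Q3 / max of the per-subject burden.
burden_summary <- summary(burden_per_subject)

# Total number of loci with at least one heteroplasmy in the sample.
n_variant_loci <- sum(burden_per_locus > 0)
```

The same two vectors could be passed to hist() to draw burden histograms across participants and across loci.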

Examples


## Not run: 
## Read input data
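allele_file   <- "allele.csv"
freq_file     <- "freq.csv"
coverage_file <- "coverage.csv"  # placeholder file name, like the two above

allele   <- as.matrix(read.csv(file = allele_file, sep = ","))
freq     <- as.matrix(read.csv(file = freq_file, sep = ","))
coverage <- as.matrix(read.csv(file = coverage_file, sep = ","))

## Compute the AAF matrix
aaf <- mtAAF(allele, freq)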

## Summary for the aaf object
mtSummary(aaf, allele, freq, coverage)
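
## A call with non-default arguments: restrict to coding-region loci,
## annotate only heteroplasmic variations, and keep a subset of the
## annotation columns. The values are illustrative choices taken from
## the options listed under Arguments.
mtSummary(aaf, allele, freq, coverage,
          loci = "coding",
          type = "heter",
          annot.select = c("Pos", "ref", "Gene", "CADD_score"),
          study = "Study")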

## End(Not run)

mtDNA-BU/mtdnaANNO documentation built on Aug. 11, 2022, 10:57 a.m.