mtSummary | R Documentation |
Identifies mtDNA variations, outputs summary statistics, and annotates heteroplasmic and/or homoplasmic variations.
mtSummary(
  aaf,
  allele,
  freq,
  coverage,
  coverage.qc = 250,
  thre.lower = 0.03,
  thre.upper = 0.97,
  loci = c(1:.mtLength),
  type = "both",
  coverSummary = T,
  varHist = T,
  annot.select = c("Pos", "ref", "Gene", "TypeMutation", "MissensMutation",
    "CodonPosition", "ProteinDomain", "dbSNP_150_id", "PolyPhen2",
    "PolyPhen2_score", "SIFT", "SIFT_score", "CADD", "CADD_score",
    "CADD_phred_score"),
  path = "./",
  study = "Study",
  anno = T
)
aaf |
a numeric matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. The column names are the subject IDs, and each column holds the AAFs of all 16569 mtDNA loci for that subject. It is generated by the mtAAF function. |
allele |
a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the alleles of each subject at each locus. The matrix must have the subject IDs as its column names. "/" is used to delimit different allele calls at a locus. |
freq |
a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the allele fractions for the corresponding entries of the allele matrix. The matrix must have the subject IDs as its column names. "/" is used to delimit the allele fractions. |
coverage |
a numeric matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the read coverage of the 16569 mtDNA loci for each subject. The matrix must have the subject IDs as its column names. |
coverage.qc |
a number (default is 250) giving the coverage threshold. If coverage < coverage.qc at a locus for a subject, the allele call at that locus for that subject is not used. |
thre.lower |
a number (default is 0.03) giving the lower bound of the threshold that defines heteroplasmic and homoplasmic variations |
thre.upper |
a number (default is 0.97) giving the upper bound of the threshold that defines heteroplasmic and homoplasmic variations |
loci |
one of: (1) a vector of mitochondrial DNA loci (default is c(1:16569)) specifying which loci are used to identify and annotate variations; (2) a character string naming a region (e.g. "coding", "tRNA", "RNR1", "RNR2", ...) |
type |
a character string indicating whether to output annotation for all variations, heteroplasmic variations only, or homoplasmic variations only. "both" returns annotation for all variations (default), "heter" returns annotation for heteroplasmic variations, and "homo" returns annotation for homoplasmic variations. |
coverSummary |
logical (default is TRUE). Whether to output a summary of mean coverage at each mtDNA locus (across all participants) and for each individual (across all mtDNA loci). |
varHist |
logical (default is TRUE). Whether to output histograms that visualize the heteroplasmic and homoplasmic burden across participants and across mtDNA loci. |
annot.select |
A character vector of variation position, alternative allele, corresponding gene and types of annotation scores to output based on user's choice. The available choices are "Pos", "ref", "Gene", "TypeMutation", "MissensMutation", "CodonPosition", "ProteinDomain", "mFOLD_dG", "mFOLD_Initial", "mFOLD_rCRS.DG", "mFOLD_rCRS.Initial", "mFOLD_AnticodonAminoAcidChange", "mFOLD_Location", "PolyPhen2", "PolyPhen2_score", "SIFT", "SIFT_score", "PROVEAN", "PROVEAN_score", "MutationAssessor", "MutationAssessor_score", "CADD", "CADD_score", "CADD_phred_score", "PANTHER", "PANTHER_score", "PhD_SNP", "PhD_SNP_score", "SNAP", "SNAP_score", "MutationTaster", "MutationTaster_score", "dbSNP_150_id", "APOGEE_boost_consensus", "APOGEE_boost_mean_prob", "APOGEE_boost_mean" |
path |
the path of the output annotation file. If not provided, the annotation file is written to the current working directory (default is "./") |
study |
A character string giving the study name. Default is "Study". |
anno |
A logical value (default is TRUE, per the usage above) indicating whether to output the annotation results in a .csv file |
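The roles of coverage.qc, thre.lower and thre.upper can be illustrated on toy data. This is a minimal sketch in base R, not the internals of mtSummary. It assumes that calls with coverage below coverage.qc are treated as missing, that a heteroplasmic call has thre.lower <= AAF <= thre.upper, and that a homoplasmic call has AAF > thre.upper; the documentation above does not state whether the bounds are inclusive. The object names aaf.qc, heter and homo are hypothetical.

```r
# Toy data: 4 loci x 2 subjects (real inputs are 16569 x N)
aaf <- matrix(c(0.00, 0.10, 0.99, 0.50,    # subject S1
                0.02, 0.98, 0.40, 0.00),   # subject S2
              nrow = 4, dimnames = list(NULL, c("S1", "S2")))
coverage <- matrix(c(300, 500, 100,  400,  # subject S1
                     260, 250, 900, 1000), # subject S2
                   nrow = 4, dimnames = list(NULL, c("S1", "S2")))

coverage.qc <- 250
thre.lower  <- 0.03
thre.upper  <- 0.97

# Assumption: calls with coverage < coverage.qc are set to missing
aaf.qc <- aaf
aaf.qc[coverage < coverage.qc] <- NA

# Assumption: inclusive bounds for heteroplasmy, strict upper bound for homoplasmy
heter <- !is.na(aaf.qc) & aaf.qc >= thre.lower & aaf.qc <= thre.upper
homo  <- !is.na(aaf.qc) & aaf.qc > thre.upper

colSums(heter)  # heteroplasmic calls per subject
colSums(homo)   # homoplasmic calls per subject
```

In this toy input, locus 3 of S1 has an AAF of 0.99 but a coverage of 100, so it is masked and counted as neither heteroplasmic nor homoplasmic.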
1. Summary frequency: descriptive statistics of sequencing coverage across individuals and across mtDNA loci; the total number of mtDNA loci with variations and the number of heteroplasmic/homoplasmic mtDNA loci in the study sample; the min/Q1/mean/median/Q3/largest number of homoplasmies/heteroplasmies carried by an individual.
2. Plots: a scatter plot of the median coverage across mtDNA loci; histograms to visualize the heteroplasmic and homoplasmic burden across participants and mtDNA loci, based on the user's choice.
3. Summary annotation for all of the heteroplasmic and/or homoplasmic variations observed in the study data, based on the user's choice.
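The kind of summary statistics described in item 1 can be approximated with base R on toy data. This is a rough sketch, not the code mtSummary runs. The logical matrix heter (TRUE where a subject carries a heteroplasmic call at a locus) and all object names are hypothetical.

```r
# Toy coverage matrix: 4 loci x 2 subjects (real inputs are 16569 x N)
coverage <- matrix(c(300, 500, 100,  400,
                     260, 250, 900, 1000),
                   nrow = 4,
                   dimnames = list(paste0("pos", 1:4), c("S1", "S2")))

# Mean coverage at each locus (across subjects) and for each subject (across loci)
cover.by.locus   <- rowMeans(coverage)
cover.by.subject <- colMeans(coverage)

# Hypothetical indicator matrix of heteroplasmic calls
heter <- matrix(c(FALSE, TRUE, FALSE, TRUE,
                  FALSE, FALSE, TRUE, FALSE),
                nrow = 4, dimnames = dimnames(coverage))

burden.by.subject <- colSums(heter)  # heteroplasmies carried by each subject
burden.by.locus   <- rowSums(heter)  # subjects with a heteroplasmy at each locus
n.var.loci        <- sum(burden.by.locus > 0)

summary(burden.by.subject)  # min / Q1 / median / mean / Q3 / max
```

The same pattern applies to homoplasmic calls by swapping in a homoplasmy indicator matrix.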
## Not run:

## Read input data
allele_file <- "allele.csv"
freq_file <- "freq.csv"

allele <- as.matrix(read.csv(file = allele_file, sep = ","))
freq <- as.matrix(read.csv(file = freq_file, sep = ","))

aaf <- mtAAF(allele, freq)

## Summary for the aaf object
## `coverage` is the numeric read-coverage matrix (16569 x N) described in
## the arguments; it must be loaded before this call
mtSummary(aaf, allele, freq, coverage)

## End(Not run)
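The allele and freq matrices store "/"-delimited strings, so a single entry can be unpacked with strsplit. This is a sketch of the format only, not the implementation of mtAAF. The entries, the reference allele ref, and the assumption that the fractions are listed in the same order as the alleles are all illustrative.

```r
# One hypothetical entry from each matrix, for the same locus and subject
allele.entry <- "A/G"
freq.entry   <- "0.9/0.1"

# Split on the literal "/" delimiter
alleles   <- strsplit(allele.entry, "/", fixed = TRUE)[[1]]
fractions <- as.numeric(strsplit(freq.entry, "/", fixed = TRUE)[[1]])

# Assumption: fractions are given in the same order as the alleles
names(fractions) <- alleles

# Hypothetical reference allele; the alternative allele fraction is taken
# here as the summed fraction of all non-reference alleles
ref <- "A"
aaf.entry <- sum(fractions[names(fractions) != ref])
aaf.entry
```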