knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
suppressPackageStartupMessages({ library(comapr) library(SummarizedExperiment) })
In the previous vignette, we demonstrated how to calculate genetic distances using genotyped markers from a group of samples.
Genetic distance calculate from genotype shifts of markers
In this document, we will focus on building individualized genetic maps from
output files of sscocaller which is available at
here.
The comapr package includes a list of toy example output files from the sscocaller
command line tool. The follow code will get the file path that we will use later.
demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
We can see that we have two samples (individual donors) and each of them has haplotype states inferred for chr1 to chr5.
list.files(demo_path)
{sample}_chr1_vi.mtx, a sparse mtx with entries representing inferred viterbi state (haplotype state)
{sample}_chr1_snpAnnot.txt, the SNP positions and allele
comapr provides quality-check functions for examining SNP coverage per chr and
per cell, chromosome segregation pattern checks, and summary statics for
filtering low confidence crossovers.
pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"), path=paste0(demo_path,"/"), barcodeFile=NULL)
Input-parsing functions are implemented in comapr to construct
RangedSummarizedExpriment object that parses files generated
from sscocaller and filter out low-confidence COs at the same time. For the
demo dataset, these filters do not have any effects:
minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1
RangedSummarizedExpriment objectWe first construct RangedSummarizedExpriment object from parsing output files
from sscocaller and filter out low-confidence crossovers.
s1_rse_state <- readHapState("s1",chroms=c("chr1","chr2"), path=demo_path,barcodeFile=NULL, minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1) s2_rse_state <- readHapState("s2",chroms=c("chr1","chr2"), path=demo_path, barcodeFile=NULL, minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1)
s1_rse_state
The Viterbi states for SNP markers are stored in the "assay" slot:
assay(s1_rse_state)
The rowRanges contains the SNP's positions:
rowRanges(s1_rse_state)
We have read in the Viterbi states for cells from two samples: s1 and s2. We now combine them into one data object for downstream analysis.
colData(s1_rse_state) colData(s1_rse_state)$sampleGroup <- "s1" colData(s2_rse_state)$sampleGroup <- "s2"
We now call combineHapState function to combine sample s1 and sample s2
into one RangedSummarizedExperiment object.
twoSample <- combineHapState(s1_rse_state, s2_rse_state)
Now the assay data slot contains the Viterbi states across SNP positions for
the combined samples.
twoSample <- combineHapState(s1_rse_state,s2_rse_state)
Now the twoSample object contains the cells from both samples with assay slot
contains the Viterbi states and rowRanges contains position annotaitons for
the list SNP markers.
assay(twoSample)
The countCOs function can then find out the crossover intervals according to
the Viterbi states for each cell and the number of crossovers per cell is then
calculated.
countCOs function will find the crossover intervals for each samples and
attribute the observed crossovers from each sample to the corresponding intervals.
cocounts <- countCOs(twoSample)
The rowRanges from the resulting data object now denotes the crossover interval
and the assay slot now contains the number of crossovers in each cell.
Now rowRanges contains the intervals
rowRanges(cocounts)
The colData slot still contains the annotations of each cell.
colData(cocounts)
The genetic distances can be calculated by using mapping functions such as the
Kosambi or Haldane \cite{} and assay slot contains the number of crossovers
found in each sample across these intervals.
assay(cocounts)
calGeneticDist function will convert the raw crossover frequencies to genetic
distances via selected mapping function (ie. kosambi or haldane).
dist_gr <- calGeneticDist(co_count = cocounts, mapping_fun = "k") dist_gr
The genetic distances for each interval are stored in rowData.
rowData(dist_gr)
The above genetic distances have been calculated using all samples. We can also specify the group factor so that we can calculate genetic distances for different sample groups:
## sampleGroup is a column in the colData slot dist_gr <- calGeneticDist(co_count = cocounts, group_by = "sampleGroup", mapping_fun = "k")
Now the group/Sample specific distances are stored in rowData
rowData(dist_gr)$kosambi
We construct a GRange object from the dist_gr first.
p_gr <- granges(dist_gr) mcols(p_gr) <- rowData(dist_gr)$kosambi
We can plot whole-genome genetic distances
plotWholeGenome(p_gr)
We can also do per chromosome plot
plotGeneticDist(p_gr,chr = "chr1",cumulative = TRUE)
bootstrapDist function generates the sampling distribution of the difference
in total genetic distances for the two groups under comparisons.
set.seed(100) bootsDiff <- bootstrapDist(co_gr = cocounts,B=10, group_by = "sampleGroup")
hist(bootsDiff)
From the bootsDiff data object, we can find a 95% confidence interval to test
whether the difference is statistically significant.
quantile(bootsDiff,c(0.025,0.975),na.rm =TRUE)
An alternative re-sampling method, permuteDist, can be applied to generate a
null distribution for the group difference by reshuffling the group labels across
the samples.
set.seed(100) perms <- permuteDist(cocounts,B=1000,group_by = "sampleGroup")
A p-value can be calculated using the statmod::permp function [@Phipson2010-xi].
statmod::permp(x = sum(perms$permutes>=perms$observed_diff, na.rm = TRUE), nperm = sum(!is.na(perms$permutes)), n1 = perms$nSample[1], n2 = perms$nSample[2])
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.