knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
suppressPackageStartupMessages({ library(comapr) library(SummarizedExperiment) })
In the previous vignette, we demonstrated how to calculate genetic distances using genotyped markers from a group of samples.
Genetic distance calculate from genotype shifts of markers
In this document, we will focus on building individualized genetic maps from
output files of sscocaller
which is available at
here.
The comapr
package includes a list of toy example output files from the sscocaller
command line tool. The follow code will get the file path that we will use later.
demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
We can see that we have two samples (individual donors) and each of them has haplotype states inferred for chr1 to chr5.
list.files(demo_path)
{sample}_chr1_vi.mtx, a sparse mtx with entries representing inferred viterbi state (haplotype state)
{sample}_chr1_snpAnnot.txt, the SNP positions and allele
comapr
provides quality-check functions for examining SNP coverage per chr and
per cell, chromosome segregation pattern checks, and summary statics for
filtering low confidence crossovers.
pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"), path=paste0(demo_path,"/"), barcodeFile=NULL)
Input-parsing functions are implemented in comapr
to construct
RangedSummarizedExpriment
object that parses files generated
from sscocaller
and filter out low-confidence COs at the same time. For the
demo dataset, these filters do not have any effects:
minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1
RangedSummarizedExpriment
objectWe first construct RangedSummarizedExpriment
object from parsing output files
from sscocaller
and filter out low-confidence crossovers.
s1_rse_state <- readHapState("s1",chroms=c("chr1","chr2"), path=demo_path,barcodeFile=NULL, minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1) s2_rse_state <- readHapState("s2",chroms=c("chr1","chr2"), path=demo_path, barcodeFile=NULL, minSNP = 0, minlogllRatio = 50, bpDist = 100, maxRawCO=10, minCellSNP = 1)
s1_rse_state
The Viterbi states for SNP markers are stored in the "assay" slot:
assay(s1_rse_state)
The rowRanges
contains the SNP's positions:
rowRanges(s1_rse_state)
We have read in the Viterbi states for cells from two samples: s1 and s2. We now combine them into one data object for downstream analysis.
colData(s1_rse_state) colData(s1_rse_state)$sampleGroup <- "s1" colData(s2_rse_state)$sampleGroup <- "s2"
We now call combineHapState
function to combine sample s1 and sample s2
into one RangedSummarizedExperiment
object.
twoSample <- combineHapState(s1_rse_state, s2_rse_state)
Now the assay
data slot contains the Viterbi states across SNP positions for
the combined samples.
twoSample <- combineHapState(s1_rse_state,s2_rse_state)
Now the twoSample
object contains the cells from both samples with assay
slot
contains the Viterbi states and rowRanges
contains position annotaitons for
the list SNP markers.
assay(twoSample)
The countCOs
function can then find out the crossover intervals according to
the Viterbi states for each cell and the number of crossovers per cell is then
calculated.
countCOs
function will find the crossover intervals for each samples and
attribute the observed crossovers from each sample to the corresponding intervals.
cocounts <- countCOs(twoSample)
The rowRanges
from the resulting data object now denotes the crossover interval
and the assay
slot now contains the number of crossovers in each cell.
Now rowRanges
contains the intervals
rowRanges(cocounts)
The colData slot still contains the annotations of each cell.
colData(cocounts)
The genetic distances can be calculated by using mapping functions such as the
Kosambi or Haldane \cite{} and assay
slot contains the number of crossovers
found in each sample across these intervals.
assay(cocounts)
calGeneticDist
function will convert the raw crossover frequencies to genetic
distances via selected mapping function (ie. kosambi or haldane).
dist_gr <- calGeneticDist(co_count = cocounts, mapping_fun = "k") dist_gr
The genetic distances for each interval are stored in rowData
.
rowData(dist_gr)
The above genetic distances have been calculated using all samples. We can also specify the group factor so that we can calculate genetic distances for different sample groups:
## sampleGroup is a column in the colData slot dist_gr <- calGeneticDist(co_count = cocounts, group_by = "sampleGroup", mapping_fun = "k")
Now the group/Sample specific distances are stored in rowData
rowData(dist_gr)$kosambi
We construct a GRange
object from the dist_gr
first.
p_gr <- granges(dist_gr) mcols(p_gr) <- rowData(dist_gr)$kosambi
We can plot whole-genome genetic distances
plotWholeGenome(p_gr)
We can also do per chromosome plot
plotGeneticDist(p_gr,chr = "chr1",cumulative = TRUE)
bootstrapDist
function generates the sampling distribution of the difference
in total genetic distances for the two groups under comparisons.
set.seed(100) bootsDiff <- bootstrapDist(co_gr = cocounts,B=10, group_by = "sampleGroup")
hist(bootsDiff)
From the bootsDiff
data object, we can find a 95% confidence interval to test
whether the difference is statistically significant.
quantile(bootsDiff,c(0.025,0.975),na.rm =TRUE)
An alternative re-sampling method, permuteDist
, can be applied to generate a
null distribution for the group difference by reshuffling the group labels across
the samples.
set.seed(100) perms <- permuteDist(cocounts,B=1000,group_by = "sampleGroup")
A p-value can be calculated using the statmod::permp
function [@Phipson2010-xi].
statmod::permp(x = sum(perms$permutes>=perms$observed_diff, na.rm = TRUE), nperm = sum(!is.na(perms$permutes)), n1 = perms$nSample[1], n2 = perms$nSample[2])
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.