IBDCheck: Sample relationship check with SeqSQC object input file.

View source: R/IBDCheck.R

IBDCheck R Documentation

Description

Calculates identity-by-descent (IBD) coefficients for all sample pairs and predicts related sample pairs in the study cohort.

Usage

IBDCheck(
  seqfile,
  remove.samples = NULL,
  LDprune = TRUE,
  kin.filter = TRUE,
  missing.rate = 0.1,
  ss.cutoff = 300,
  maf = 0.01,
  hwe = 1e-06,
  ...
)

Arguments

seqfile

A SeqSQC object, which includes the merged GDS file for the study cohort and the benchmark samples.

remove.samples

A vector of sample names to exclude from the IBD calculation, such as problematic samples identified in previous QC steps or user-defined samples.

LDprune

Whether to use the LD-pruned SNP set. The default is TRUE.

kin.filter

Whether to use "kinship coefficient >= 0.08" as an additional criterion for related samples. The default is TRUE.

missing.rate

Use only SNPs with missing rate <= missing.rate; if NaN, no threshold is applied. The default, missing.rate = 0.1, filters out variants with a missing rate greater than 10%.

ss.cutoff

The minimum sample size (300 by default) required to apply the MAF and HWE filters. This sample size is the sum of the study samples and the benchmark samples from the same population as the study cohort.

maf

Use only SNPs with MAF >= maf if the sample size defined under ss.cutoff is greater than ss.cutoff; otherwise NaN is used, meaning no MAF threshold. The default is 0.01.

hwe

Use only SNPs with Hardy-Weinberg equilibrium p >= hwe if the sample size defined under ss.cutoff is greater than ss.cutoff; otherwise no HWE threshold is applied. The default is 1e-6.

...

Arguments to be passed to other methods.
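
The interaction between missing.rate, ss.cutoff, maf and hwe can be sketched in base R as below. This is a minimal illustration of the rules described above, not SeqSQC's implementation: the helper select_variants() and its inputs (per-variant missing rates, allele frequencies, HWE p-values, and the sample size described under ss.cutoff) are hypothetical.

```r
# Hypothetical helper (not part of SeqSQC) illustrating the documented
# variant filters for the sample relationship check.
#   miss      per-variant missing rates
#   freq      per-variant allele frequencies
#   hwe.p     per-variant Hardy-Weinberg equilibrium p-values
#   n.samples study samples + same-population benchmark samples
select_variants <- function(miss, freq, hwe.p, n.samples,
                            missing.rate = 0.1, ss.cutoff = 300,
                            maf = 0.01, hwe = 1e-06) {
  keep <- rep(TRUE, length(miss))
  # Missing-rate filter; NaN means no threshold.
  if (!is.nan(missing.rate)) keep <- keep & miss <= missing.rate
  # MAF and HWE filters apply only when the sample size exceeds ss.cutoff.
  if (n.samples > ss.cutoff) {
    minor <- pmin(freq, 1 - freq)  # minor allele frequency
    keep <- keep & minor >= maf & hwe.p >= hwe
  }
  keep
}

miss  <- c(0.05, 0.20, 0.00, 0.01)
freq  <- c(0.30, 0.40, 0.005, 0.50)
hwe.p <- c(0.5, 0.5, 0.5, 1e-08)
select_variants(miss, freq, hwe.p, n.samples = 100)  # TRUE FALSE TRUE TRUE
select_variants(miss, freq, hwe.p, n.samples = 500)  # TRUE FALSE FALSE FALSE
```

With 100 samples only the missing-rate filter applies; with 500 samples the rare third variant and the HWE-failing fourth variant are also dropped.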

Details

By default, using LD-pruned variants, IBDCheck calculates the IBD coefficients for all sample pairs and then predicts related sample pairs in the study cohort with a linear-kernel support vector machine (SVM), using the known relationships among the benchmark samples as the training set.
Sample pairs whose self-reported and predicted relationships are discordant are flagged as problematic. By default (kin.filter = TRUE), predicted related pairs must also have a kinship coefficient >= 0.08. In each related pair, the sample with the higher missing rate is selected for removal from further analysis by the IBDRemove function.
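
The flagging and removal rules in the preceding paragraph can be sketched as follows. This is a hypothetical illustration, not SeqSQC's code: the column names (id1, id2, self, pred, kinship), the "unrelated" label, and the rule that a predicted-related pair failing the kinship threshold is treated as unrelated are all assumptions. Ties in missing rate are resolved by removing the first sample of the pair.

```r
# Hypothetical sketch (not SeqSQC's code) of the flagging and removal rules.
#   pairs: data frame with columns id1, id2, self (self-reported relationship),
#          pred (predicted relationship) and kinship; "unrelated" is an assumed label.
#   miss:  named vector of per-sample missing rates.
flag_pairs <- function(pairs, miss, kin.filter = TRUE) {
  pred <- as.character(pairs$pred)
  self <- as.character(pairs$self)
  # A pair is called related if predicted related and, with kin.filter,
  # its kinship coefficient is at least 0.08 (assumed: otherwise unrelated).
  related <- pred != "unrelated"
  if (kin.filter) related <- related & pairs$kinship >= 0.08
  final <- ifelse(related, pred, "unrelated")
  # Discordant self-reported vs final relationship => problematic.
  pairs$problematic <- self != final
  # In each related pair, remove the sample with the higher missing rate
  # (ties remove id1).
  id1 <- as.character(pairs$id1[related])
  id2 <- as.character(pairs$id2[related])
  to_remove <- ifelse(miss[id1] >= miss[id2], id1, id2)
  list(pairs = pairs, remove = as.character(unique(unname(to_remove))))
}

pairs <- data.frame(
  id1 = c("S1", "S3", "S5"), id2 = c("S2", "S4", "S6"),
  self = c("unrelated", "full sibling", "unrelated"),
  pred = c("parent-offspring", "full sibling", "first cousin"),
  kinship = c(0.25, 0.26, 0.05),
  stringsAsFactors = FALSE
)
miss <- c(S1 = 0.02, S2 = 0.01, S3 = 0.005, S4 = 0.03, S5 = 0, S6 = 0)
res <- flag_pairs(pairs, miss)
res$pairs$problematic  # TRUE FALSE FALSE
res$remove             # "S1" "S4"
```

In this toy input, the third pair is predicted related but falls below the 0.08 kinship threshold, so it is treated as unrelated and contributes no sample for removal.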

Value

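An updated SeqSQC object. The IBD results, retrieved with QCresult(seqfile)$IBD as in the example below, are a data frame giving, for each pair of samples, the sample names, the IBD coefficients k0 and k1, the kinship coefficient, and the self-reported and predicted relationships.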
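
For reference, the kinship coefficient reported alongside k0 and k1 is related to the IBD-sharing probabilities by the standard identity phi = k1/4 + k2/2, where k2 = 1 - k0 - k1; we assume the kinship column here follows this identity. The sketch below, built around a hypothetical helper kinship_from_ibd(), evaluates it for the expected coefficients of common relationship classes. It shows that the default 0.08 threshold used by kin.filter keeps second-degree and closer relationships and excludes first cousins (expected kinship 0.0625).

```r
# Kinship coefficient from IBD-sharing probabilities:
#   phi = k1 / 4 + k2 / 2, with k2 = 1 - k0 - k1
kinship_from_ibd <- function(k0, k1) {
  k2 <- 1 - k0 - k1
  k1 / 4 + k2 / 2
}

# Expected (k0, k1) for some common relationship classes.
expected <- data.frame(
  relation = c("MZ twin", "parent-offspring", "full sibling",
               "second degree", "first cousin", "unrelated"),
  k0 = c(0, 0, 0.25, 0.5, 0.75, 1),
  k1 = c(0, 1, 0.5, 0.5, 0.25, 0),
  stringsAsFactors = FALSE
)
expected$kinship <- kinship_from_ibd(expected$k0, expected$k1)
# Which classes pass the default kin.filter threshold?
expected$passes <- expected$kinship >= 0.08
expected
```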

Author(s)

Qian Liu qliu7@buffalo.edu

Examples

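# Load the example SeqSQC object (seqfile) shipped with the package.
load(system.file("extdata", "example.seqfile.Rdata", package="SeqSQC"))
# Re-create the object so that it points to the installed example GDS file.
gfile <- system.file("extdata", "example.gds", package="SeqSQC")
seqfile <- SeqSQC(gdsfile = gfile, QCresult = QCresult(seqfile))
# Run the sample relationship check on LD-pruned SNPs with missing rate <= 10%.
seqfile <- IBDCheck(seqfile, remove.samples=NULL, LDprune=TRUE, missing.rate=0.1)
# Extract the IBD results and inspect the last few sample pairs.
res.ibd <- QCresult(seqfile)$IBD
tail(res.ibd)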

Liubuntu/SeqSQC documentation built on April 12, 2024, 6:39 p.m.