snpgdsIBS: Identity-By-State (IBS) proportion


Source: R/IBS.R

Description

Calculate the proportion of identity by state (IBS) for each pair of samples.

Usage

snpgdsIBS(gdsobj, sample.id=NULL, snp.id=NULL, autosome.only=TRUE,
    remove.monosnp=TRUE, maf=NaN, missing.rate=NaN, num.thread=1L,
    useMatrix=FALSE, verbose=TRUE)

Arguments

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

sample.id

a vector of sample IDs specifying the selected samples; if NULL, all samples are used

snp.id

a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used

autosome.only

if TRUE, use autosomal SNPs only; if a numeric or character value, keep only the SNPs on the specified chromosome

remove.monosnp

if TRUE, remove monomorphic SNPs

maf

use only SNPs with MAF >= maf; if NaN, no MAF threshold is applied

missing.rate

use only SNPs with a missing rate <= missing.rate; if NaN, no missing-rate threshold is applied

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

useMatrix

if TRUE, store the output square matrix as a Matrix::dspMatrix (packed symmetric storage) to save memory

verbose

if TRUE, print information about the analysis
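
The memory saving from useMatrix=TRUE comes from the IBS matrix being symmetric, so only one triangle needs to be stored. The base-R sketch below illustrates that idea on a hypothetical 5 x 5 matrix. It does not call the Matrix package, and the packing order shown (upper triangle, column by column) is for illustration only, not a description of how snpgdsIBS fills its dspMatrix.

```r
# Hypothetical 5 x 5 symmetric matrix standing in for an IBS matrix
n <- 5
ibs_full <- matrix(0.7, nrow = n, ncol = n)
diag(ibs_full) <- 1

# Packed storage: keep only the upper triangle (with the diagonal),
# read column by column; the lower triangle is implied by symmetry
packed <- ibs_full[upper.tri(ibs_full, diag = TRUE)]

length(ibs_full)  # n * n = 25 values
length(packed)    # n * (n + 1) / 2 = 15 values

# Rebuild the full matrix from the packed vector
unpacked <- matrix(0, nrow = n, ncol = n)
unpacked[upper.tri(unpacked, diag = TRUE)] <- packed
unpacked[lower.tri(unpacked)] <- t(unpacked)[lower.tri(unpacked)]
identical(unpacked, ibs_full)  # TRUE
```

For n samples the packed form needs n(n + 1)/2 values instead of n^2, which roughly halves memory for large n.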

Details

The minor allele frequency and missing rate for each SNP passed in snp.id are calculated over all the samples in sample.id.
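
A minimal base-R sketch of this kind of per-SNP filtering, on a hypothetical genotype matrix (rows are samples, columns are SNPs, genotypes coded 0/1/2, NA for missing). The helper snp_filter() is hypothetical and not part of SNPRelate; it ignores autosome.only, and treating a SNP as monomorphic when its MAF is 0 is an assumption made for this sketch.

```r
# Hypothetical toy genotypes: rows = samples, columns = SNPs,
# coded 0/1/2 (allele dosage), NA = missing
geno <- matrix(c(0,  1, 2, 0, NA,
                 1,  1, 2, 0,  0,
                 2,  1, 2, 0,  1,
                 0, NA, 2, 0,  2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("s", 1:4), paste0("snp", 1:5)))

# Hypothetical helper: return the IDs of SNPs that pass the filters.
# MAF and missing rate are computed over all samples (rows) in geno.
snp_filter <- function(geno, remove.monosnp = TRUE, maf = NaN,
                       missing.rate = NaN) {
  p <- colMeans(geno, na.rm = TRUE) / 2   # allele frequency per SNP
  snp_maf <- pmin(p, 1 - p)               # minor allele frequency
  snp_miss <- colMeans(is.na(geno))       # fraction of missing calls
  keep <- !is.na(snp_maf)                 # drop SNPs with no calls at all
  # assumption: "monomorphic" is taken to mean MAF == 0
  if (remove.monosnp) keep <- keep & snp_maf > 0
  if (!is.nan(maf)) keep <- keep & snp_maf >= maf
  if (!is.nan(missing.rate)) keep <- keep & snp_miss <= missing.rate
  colnames(geno)[keep]
}

snp_filter(geno)                      # "snp1" "snp2" "snp5"
snp_filter(geno, maf = 0.4)           # "snp2" "snp5"
snp_filter(geno, missing.rate = 0.1)  # "snp1"
```

Because the statistics are computed over the rows of geno, passing a different subset of samples can change which SNPs survive the same thresholds.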

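The values of the IBS matrix range from zero to one. For the first and second individuals, the IBS proportion is the average across the genome of the per-SNP score 1 - |g_{1,i} - g_{2,i}| / 2, where g_{1,i} and g_{2,i} are the genotypes (0, 1 or 2) of the two individuals at SNP i:

$$\mathrm{IBS}_{12} = \frac{1}{M} \sum_{i=1}^{M} \left( 1 - \frac{|g_{1,i} - g_{2,i}|}{2} \right)$$

where M is the number of SNPs in the average.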
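
A direct, unoptimized base-R translation of the formula, on a hypothetical 3-sample genotype matrix. The function ibs_matrix() is hypothetical, and skipping SNPs that are missing in either sample of a pair is an assumption made for this sketch, not a documented detail of snpgdsIBS.

```r
# Hypothetical helper: pairwise IBS proportions from a genotype matrix
# (rows = samples, columns = SNPs, coded 0/1/2, NA = missing)
ibs_matrix <- function(geno) {
  n <- nrow(geno)
  ids <- rownames(geno)
  out <- matrix(NA_real_, n, n, dimnames = list(ids, ids))
  for (a in seq_len(n)) {
    for (b in seq_len(n)) {
      # per-SNP score: 1 if genotypes match, 0.5 if they differ by one
      # allele, 0 for opposite homozygotes
      score <- 1 - abs(geno[a, ] - geno[b, ]) / 2
      # assumption: SNPs missing in either sample are skipped
      out[a, b] <- mean(score, na.rm = TRUE)
    }
  }
  out
}

geno <- rbind(s1 = c(0, 1, 2, 2),
              s2 = c(0, 2, 2, 0),
              s3 = c(2, 1, NA, 1))
m <- ibs_matrix(geno)
# s1-s2: 0.625, s1-s3: 0.5, s2-s3: 0.333; diagonal entries are 1
round(m, 3)
```

Since the diagonal is 1 and every entry lies in [0, 1], 1 - m is a natural dissimilarity, which is how the Examples below feed the IBS matrix to cmdscale().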

Value

Returns a list of class "snpgdsIBSClass" with the following components:

sample.id

the sample IDs used in the analysis

snp.id

the SNP IDs used in the analysis

ibs

a matrix of IBS proportions, with dimensions (number of samples) x (number of samples)

Author(s)

Xiuwen Zheng

See Also

snpgdsIBSNum

Examples

library(SNPRelate)

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

# perform identity-by-state calculations
ibs <- snpgdsIBS(genofile)

# perform multidimensional scaling analysis on
# the genome-wide IBS pairwise distances:
loc <- cmdscale(1 - ibs$ibs, k = 2)
x <- loc[, 1]; y <- loc[, 2]
race <- as.factor(read.gdsn(index.gdsn(genofile, "sample.annot/pop.group")))
plot(x, y, col=race, xlab = "", ylab = "", main = "cmdscale(IBS Distance)")
legend("topleft", legend=levels(race), text.col=1:nlevels(race))

# close the file
snpgdsClose(genofile)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Identity-By-State (IBS) analysis on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
IBS:    the sum of all selected genotypes (0,1,2) = 2446510
Sun Jan 21 05:32:11 2018    (internal increment: 13056)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Sun Jan 21 05:32:11 2018    Done.

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.