SNP.rm.duplicates: Remove duplicated SNPs

SNP.rm.duplicates    R Documentation

Remove duplicated SNPs

Description

Remove duplicated SNPs, taking into account possible genotype mismatches.

Usage

 SNP.rm.duplicates(x, by = "chr:pos", na.keep = TRUE, incomp.rm = TRUE) 

Arguments

x

A bed.matrix

by

The criterion used to determine duplicates, for example "chr:pos" (the default) or "chr:pos:alleles".

na.keep

If TRUE, duplicated genotypes which are missing for at least one SNP are set to NA.

incomp.rm

If TRUE, duplicated SNPs with allele incompatibility are removed.

Details

Duplicated SNPs are located with SNP.duplicated, using the criterion given by the by parameter (we recommend "chr:pos", the default).
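The idea of locating duplicates by "chr:pos" can be sketched in base R. This is only an illustration, not the code used by SNP.duplicated: the map table `bim_demo` and its contents are hypothetical, and only its column names follow the usual .bim layout.

```r
# Hypothetical map table (illustration only, not the dupli data set)
bim_demo <- data.frame(
  chr = c(1, 1, 1, 2, 2),
  id  = c("rs1", "rs2a", "rs2b", "rs3", "rs4"),
  pos = c(100, 200, 200, 100, 300),
  stringsAsFactors = FALSE
)

# Build one "chr:pos" key per SNP
key <- paste(bim_demo$chr, bim_demo$pos, sep = ":")

# Indices of SNPs whose key was already seen earlier in the table
dup_idx <- which(duplicated(key))
dup_idx
```

Here only the third SNP shares its chromosome and position with an earlier one; the fourth has the same position as the first but lies on another chromosome.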

The function then considers possible allele swaps or reference strand flips. If the alleles are incompatible, the duplicated SNPs are removed or kept according to the incomp.rm parameter.
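To make the notions of allele swap and strand flip concrete, here is a minimal base R sketch. The helpers `complement` and `match_alleles` are hypothetical names introduced for illustration; they are not part of gaston and may not reflect how SNP.rm.duplicates decides internally.

```r
# Complement of a nucleotide on the opposite strand (A<->T, C<->G)
complement <- function(a) chartr("ACGT", "TGCA", a)

# Compare the alleles (a1, a2) of one SNP with the alleles (b1, b2) of its duplicate
match_alleles <- function(a1, a2, b1, b2) {
  if (a1 == b1 && a2 == b2) return("identical")
  if (a1 == b2 && a2 == b1) return("swap")
  c1 <- complement(b1)
  c2 <- complement(b2)
  if (a1 == c1 && a2 == c2) return("flip")
  if (a1 == c2 && a2 == c1) return("flip and swap")
  "incompatible"
}

res_swap   <- match_alleles("A", "C", "C", "A")
res_flip   <- match_alleles("A", "C", "T", "G")
res_both   <- match_alleles("A", "C", "G", "T")
res_incomp <- match_alleles("A", "C", "A", "G")
```

Note that for A/T and C/G SNPs a swap and a flip cannot be told apart from the alleles alone; this sketch reports the first rule that matches.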

When alleles can be matched, only one of the two SNPs is kept. If the duplicates have incompatible genotypes for some individuals, these genotypes are set to NA. The na.keep parameter determines what happens when a genotype is missing in one of the duplicates.
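The merging rule described above can be sketched as follows, assuming genotypes are coded 0, 1, 2 with NA for missing values and that the alleles of the two SNPs have already been aligned. The function `merge_genotypes` is a hypothetical helper written for this illustration, not a gaston function.

```r
# Merge two genotype vectors of duplicated SNPs (one value per individual)
merge_genotypes <- function(g1, g2, na.keep = TRUE) {
  # Both observed and equal: keep the value; otherwise start from NA
  agree <- !is.na(g1) & !is.na(g2) & g1 == g2
  out <- ifelse(agree, g1, NA)
  if (!na.keep) {
    # Missing in exactly one SNP: take the observed genotype
    out <- ifelse(is.na(g1) & !is.na(g2), g2, out)
    out <- ifelse(!is.na(g1) & is.na(g2), g1, out)
  }
  out
}

g1 <- c(0, 1, 2, NA, 1)
g2 <- c(0, 1, 1, 2, NA)

merged_keep <- merge_genotypes(g1, g2)                   # na.keep = TRUE
merged_fill <- merge_genotypes(g1, g2, na.keep = FALSE)
```

In both cases the third individual, whose two genotypes disagree, ends up with NA; only the handling of the last two individuals depends on na.keep.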

Moreover, the function takes special care of SNPs with a possible allele "0". This case occurs for monomorphic SNPs when data are read from a .ped file; for example, a whole column of A A genotypes results in a SNP with alleles "A" and "0". If a duplicate of that SNP contains a few, say, A C genotypes, it will have alleles "A" and "C". In that case, SNP.duplicated with by = "chr:pos:alleles" will not consider these SNPs as duplicates.
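A small base R sketch shows why a criterion that includes the alleles misses this kind of duplicate. The table `bim_mono` is hypothetical, and the way the keys are pasted together is an assumption made for illustration, not necessarily how SNP.duplicated builds them.

```r
# Two records for the same position: one monomorphic ("A"/"0"), one with "A"/"C"
bim_mono <- data.frame(
  chr = c(1, 1),
  pos = c(100, 100),
  A1  = c("A", "A"),
  A2  = c("0", "C"),
  stringsAsFactors = FALSE
)

key_pos     <- paste(bim_mono$chr, bim_mono$pos, sep = ":")
key_alleles <- paste(key_pos, bim_mono$A1, bim_mono$A2, sep = ":")

dup_by_pos     <- duplicated(key_pos)      # second record flagged
dup_by_alleles <- duplicated(key_alleles)  # nothing flagged
```

With position-based keys the second record is detected as a duplicate, whereas the allele-based keys "1:100:A:0" and "1:100:A:C" differ.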

Value

A bed.matrix without duplicated SNPs.

See Also

SNP.match, SNP.duplicated, dupli

Examples

# Use example data of 10 individuals with 7 duplicated SNPs
data(dupli)
x <- as.bed.matrix(dupli.gen, fam = dupli.ped, bim = dupli.bim)

# There are several duplicated positions:
dupli.bim

x1 <- SNP.rm.duplicates(x)
# By default (na.keep = TRUE), as soon as the genotype is missing
# in one of the SNPs it is set to missing 
# (here looking at duplicated SNPs 2a and 2b)
as.matrix(x[,2:3])
as.matrix(x1[,2])

# With na.keep = FALSE 
x2 <- SNP.rm.duplicates(x, na.keep = FALSE)
as.matrix(x2[,2])

# Let's examine SNP 3.a and 3.b (swapped alleles)
as.matrix(x[,4:5])
as.matrix(x1[,3])
as.matrix(x2[,3])

# and so on... (see also ?dupli)

gaston documentation built on Dec. 28, 2022, 1:30 a.m.