removedMaleHetX: Remove male subjects with haploid heterozygous SNPs

View source: R/genotypeQC.R

removedMaleHetX R Documentation

Remove male subjects with haploid heterozygous SNPs

Description

Determine the frequency of male subjects that have heterozygous SNPs on chromosome X, and apply a reasonable cutoff to remove the affected males, if chromosome X data exist.

Usage

removedMaleHetX(
  plink,
  inputPrefix,
  hhSubjCutOff = 15,
  outputPrefix,
  outputSubjHetFile,
  outputRetainSubjectFile,
  outputHetSNPfile
)

Arguments

plink

the PLINK executable, located either in the current working directory or somewhere in the command path.

inputPrefix

the prefix of the input PLINK binary files.

hhSubjCutOff

the cutoff for removing male subjects with haploid heterozygous SNPs on chromosome X. The default is 15.

outputPrefix

the prefix of the output PLINK binary files.

outputSubjHetFile

the output plain text file that stores the male subjects that have heterozygous SNPs, together with their frequency (if any), i.e. the number of .hh SNPs in each male. Lines are sorted by descending frequency.

outputRetainSubjectFile

the output plain text file that stores the male subjects that still have heterozygous SNPs after subject removal, together with their frequency (if any). Lines are sorted by descending frequency.

outputHetSNPfile

the output plain text file that stores all heterozygous SNPs with their frequency (the number of males carrying each SNP), if any. Lines are sorted by descending frequency.
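
Because the plink argument must resolve to a working executable, it can help to check the command path before calling the function. The following is a minimal sketch in base R; the helper name findExecutable and the program name "plink" are illustrative assumptions, not part of Gimpute.

```r
# Hypothetical helper (not part of Gimpute): return the full path of an
# executable found on the command path, or NA if it cannot be found.
findExecutable <- function(name) {
  path <- unname(Sys.which(name))
  if (!is.na(path) && nzchar(path)) path else NA_character_
}

# Assumption: the PLINK binary is installed under the name "plink".
plinkPath <- findExecutable("plink")
if (is.na(plinkPath)) {
  message("PLINK was not found on the command path; supply its full path instead.")
} else {
  message("Using PLINK at: ", plinkPath)
}
```

If the lookup fails, the full path to the binary can be passed as the plink argument instead.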

Details

A haploid heterozygous SNP is a male genotype that is called heterozygous, which is likely an error given the haploid nature of the male X chromosome. In principle, all males that have heterozygous SNPs on chromosome X should be removed. However, this may remove too many males in some data sets. Therefore, a small percentage of such males is allowed to remain in the data set, as controlled by hhSubjCutOff.
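
The counting and cutoff logic described above can be sketched in base R on toy data. This is not the package's implementation: the data frame mimics the three-column layout of a PLINK .hh file (family ID, individual ID, SNP ID), the toy cutoff of 2 is arbitrary, and the rule that a male is removed when his count is strictly greater than the cutoff is an assumption.

```r
# Toy stand-in for a PLINK .hh file: family ID, individual ID, SNP ID.
hh <- data.frame(
  FID = c("F1", "F1", "F1", "F2", "F3", "F3"),
  IID = c("M1", "M1", "M1", "M2", "M3", "M3"),
  SNP = c("rs1", "rs2", "rs3", "rs1", "rs1", "rs4"),
  stringsAsFactors = FALSE
)

# Count heterozygous haploid calls per male and sort in descending order.
subjCounts <- as.data.frame(table(IID = hh$IID), stringsAsFactors = FALSE)
subjCounts <- subjCounts[order(-subjCounts$Freq), ]

# Assumption: a male is removed when his count exceeds the cutoff.
hhSubjCutOffToy <- 2
removedSubj <- subjCounts$IID[subjCounts$Freq > hhSubjCutOffToy]
retainedSubj <- subjCounts[subjCounts$Freq <= hhSubjCutOffToy, ]

print(subjCounts)
print(removedSubj)
```

In this toy example, lowering the cutoff removes more males, while raising it retains more males with a few heterozygous calls.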

Value

1.) The output PLINK binary files. 2.) A plain text file with two columns: male subjects that have heterozygous SNPs and their corresponding number of heterozygous SNPs. 3.) After subject removal, a plain text file with the same two columns for the retained male subjects. 4.) A plain text file with two columns: all heterozygous SNPs and their frequency.
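
To illustrate the two-column layout of the SNP frequency file, the sketch below builds such a table from toy data, writes it to a temporary file, and reads it back. The tab separator, the absence of a header line, and the column names SNP and nMales are assumptions made for illustration; the actual files written by removedMaleHetX may differ.

```r
# Toy heterozygous haploid calls: one row per (male, SNP) pair.
hhToy <- data.frame(
  IID = c("M1", "M1", "M1", "M2", "M3", "M3"),
  SNP = c("rs1", "rs2", "rs3", "rs1", "rs1", "rs4"),
  stringsAsFactors = FALSE
)

# Number of males per SNP, sorted by descending frequency (ties by SNP name).
snpCounts <- as.data.frame(table(SNP = hhToy$SNP), stringsAsFactors = FALSE)
snpCounts <- snpCounts[order(-snpCounts$Freq, snpCounts$SNP), ]

# Assumption: tab-separated plain text without a header line.
outFile <- tempfile(fileext = ".txt")
write.table(snpCounts, outFile, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Read the file back with illustrative column names.
snpFreq <- read.table(outFile, header = FALSE, sep = "\t",
                      col.names = c("SNP", "nMales"),
                      stringsAsFactors = FALSE)
print(snpFreq)
```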

Author(s)

Junfang Chen

Examples

## In the current working directory
bedFile <- system.file("extdata", "genoUpdatedData.bed", package="Gimpute")
bimFile <- system.file("extdata", "genoUpdatedData.bim", package="Gimpute") 
famFile <- system.file("extdata", "genoUpdatedData.fam", package="Gimpute")
file.copy(c(bedFile, bimFile, famFile), ".")  ## copy the example files here
inputPrefix <- "genoUpdatedData" 
hhSubjCutOff <- 15 ##  can be tuned
outputPrefix <- "2_02_removedInstHetX" 
outputSubjHetFile <- "2_02_instHetXfreqAll.txt" 
outputRetainSubjectFile <- "2_02_instHetXfreqRetained.txt"  
outputHetSNPfile <- "2_02_snpHHfreqAll.txt"
## Not run: Requires an executable program PLINK, e.g.
## plink <- "/home/tools/plink"
## removedMaleHetX(plink, inputPrefix, hhSubjCutOff,
##                 outputPrefix, outputSubjHetFile, 
##                 outputRetainSubjectFile, outputHetSNPfile)

transbioZI/Gimpute documentation built on April 10, 2022, 4:20 a.m.