updateGenoInfo: Update genotype information

View source: R/genotypeInfoUpdate.R

updateGenoInfo {Gimpute}    R Documentation

Update genotype information

Description

Update the genotype information in the original PLINK binary files. Subject metadata are remapped using the metadata file, and SNP information is filtered, rearranged and converted according to the chip annotation file.

Usage

updateGenoInfo(
  plink,
  inputPrefix,
  metaDataFile,
  removedSampIDFile,
  ancestrySymbol,
  excludedProbeIdsFile,
  chipAnnoFile,
  chipType,
  outputPrefix,
  keepInterFile = TRUE
)

Arguments

plink

the PLINK executable, located either in the current working directory or in a directory on the command search path.
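A minimal sketch, assuming you want to resolve the plink argument before calling updateGenoInfo: file.exists() covers a copy in the working directory, and Sys.which() searches the command path. The helper name findPlink is hypothetical and not part of Gimpute.

```r
# Hypothetical helper (not part of Gimpute): look for the PLINK executable
# in the working directory first, then on the command search path.
findPlink <- function(name = "plink") {
  local <- file.path(getwd(), name)
  if (file.exists(local)) {
    return(normalizePath(local))
  }
  onPath <- Sys.which(name)   # returns "" when the program is not found
  if (nzchar(onPath)) {
    return(unname(onPath))
  }
  NA_character_               # not found anywhere
}

plink <- findPlink()
```

If findPlink() returns NA, PLINK is not available and the example below cannot be run.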

inputPrefix

the prefix of the input PLINK binary files.

metaDataFile

a plain-text file that stores the metadata of the samples. It must contain at least the following columns (column names in parentheses): family ID in the PLINK files (FID), individual ID in the PLINK files (IID), ID in the description files (descID), self-identified ancestry (ance; e.g. AFR: African, AMR: Admixed American, EAS: East Asian, EUR: European, SAS: South Asian), sex (sex; 1 = male, 2 = female), age (age) and group (group; 0 = control/unaffected, 1 = case/affected). All unknown and missing values are coded as NA. Rows with a missing FID or IID must not be included.
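The sketch below builds a toy metadata table with the required columns and checks it before use. The values are made up, checkMetaData is a hypothetical helper (not a Gimpute function), and writing the file tab-delimited is an assumption; the documentation only says it is a plain-text file.

```r
# Toy metadata with the required columns (made-up values).
metaData <- data.frame(
  FID    = c("F1", "F2", "F3"),
  IID    = c("I1", "I2", "I3"),
  descID = c("D1", "D2", "D3"),
  ance   = c("EUR", "EAS", NA),
  sex    = c(1L, 2L, NA),
  age    = c(34, NA, 51),
  group  = c(0L, 1L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical check: required columns present, no missing FID/IID,
# and sex/group restricted to the documented codes (or NA).
checkMetaData <- function(df) {
  required <- c("FID", "IID", "descID", "ance", "sex", "age", "group")
  missingCols <- setdiff(required, colnames(df))
  if (length(missingCols) > 0) {
    stop("missing columns: ", paste(missingCols, collapse = ", "))
  }
  if (anyNA(df$FID) || anyNA(df$IID)) stop("FID and IID must not be NA")
  if (!all(df$sex %in% c(1, 2, NA))) stop("sex must be 1, 2 or NA")
  if (!all(df$group %in% c(0, 1, NA))) stop("group must be 0, 1 or NA")
  invisible(TRUE)
}

checkMetaData(metaData)

# Assumed layout: tab-delimited with a header; missing values written as NA.
metaFile <- tempfile(fileext = ".txt")
write.table(metaData, metaFile, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")
```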

removedSampIDFile

a plain-text file that lists the sample IDs to be removed, one ID per line. If it is NULL, no samples are removed at this step and the input PLINK files from the previous step are copied unchanged as its output.

ancestrySymbol

the ancestry code of the samples to keep (e.g. "EUR", as used in the ance column of metaDataFile). If it is NULL, all samples are kept.
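A minimal sketch of ancestry-based selection, assuming subjects are matched on the ance column of the metadata. selectByAncestry is hypothetical and only mimics the documented behaviour: a NULL symbol keeps everyone, and subjects with another or a missing ancestry are dropped. The resulting FID/IID list is in the format accepted by PLINK's --keep option; this does not claim that Gimpute uses that option internally.

```r
# Toy metadata (made-up values) with the FID, IID and ance columns.
meta <- data.frame(
  FID  = c("F1", "F2", "F3"),
  IID  = c("I1", "I2", "I3"),
  ance = c("EUR", "EAS", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of Gimpute): FID/IID pairs to keep.
selectByAncestry <- function(metaData, ancestrySymbol = NULL) {
  ids <- metaData[, c("FID", "IID")]
  if (is.null(ancestrySymbol)) {
    return(ids)                       # NULL: keep every subject
  }
  ids[metaData$ance %in% ancestrySymbol, ]  # NA ancestry never matches
}

keepIDs <- selectByAncestry(meta, "EUR")

# One "FID IID" pair per line, space-separated, no header.
keepFile <- tempfile(fileext = ".txt")
write.table(keepIDs, keepFile, quote = FALSE, row.names = FALSE,
            col.names = FALSE)
```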

excludedProbeIdsFile

a plain-text file that lists the SNP IDs to be removed, one per line. If it is NULL, no SNPs are removed.
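Both removedSampIDFile and excludedProbeIdsFile are one-ID-per-line text files. The sketch below writes such a file and reproduces its effect on a vector of SNP IDs in plain R. The IDs are placeholders, and readIdFile is a hypothetical helper that treats a NULL file as "remove nothing", following the argument descriptions above.

```r
# Write placeholder probe IDs, one per line.
probeFile <- tempfile(fileext = ".txt")
writeLines(c("SNP_A-1", "SNP_A-2"), probeFile)

# Hypothetical reader (not part of Gimpute): NULL means "remove nothing".
readIdFile <- function(path) {
  if (is.null(path)) {
    return(character(0))
  }
  ids <- trimws(readLines(path, warn = FALSE))
  unique(ids[nzchar(ids)])   # drop blank lines and repeated IDs
}

# Mimic the removal on a vector of SNP IDs taken from a .bim file.
bimIDs <- c("SNP_A-1", "SNP_A-3", "SNP_A-2", "SNP_A-4")
kept <- setdiff(bimIDs, readIdFile(probeFile))
kept   # "SNP_A-3" "SNP_A-4"
```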

chipAnnoFile

a plain-text file that stores the chip annotation information.

chipType

a string defining the type of the chip annotation file: either 'SNPIDstudy' or 'rsIDstudy'. See prepareAnnoFile4affy for details.

outputPrefix

the prefix of the output PLINK binary files.

keepInterFile

a logical value indicating whether the intermediate files should be kept. The default is TRUE.

Details

The original PLINK files are processed implicitly in the following steps:
1. remove duplicated subjects;
2. update group ID and sex information;
3. remove unlabelled subjects;
4. remove subjects with the wrong ancestry;
5. remove incorrectly annotated SNPs;
6. remove SNPs that are not in the annotation file;
7. remove duplicated SNPs;
8. update SNP genomic position and strand information;
9. split chromosome X into the pseudoautosomal region (PAR) and the non-PAR;
10. remove SNPs on chromosome Y and mitochondrial DNA.

The metadata file and the chip annotation file serve as the reference for these updates. If a chip annotation file for your study is not at hand, it can be downloaded from http://www.well.ox.ac.uk/~wrayner/strand/.
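To make steps 6 to 10 concrete, here is a sketch on a toy .bim-like table in plain R. It is not the package's implementation. The PAR boundaries are an assumption: they are the GRCh37 values PLINK 1.9 uses for --split-x b37 (last base of PAR1, first base of PAR2). Duplicates are resolved by keeping the first copy, which the package may handle differently. Strand updates are omitted. Chromosome codes follow PLINK: 23 = X, 24 = Y, 25 = XY (PAR), 26 = MT.

```r
# Toy .bim-like table and chip annotation (made-up values).
bim <- data.frame(
  chr = c(1, 1, 23, 23, 24, 26, 2),
  snp = c("rs1", "rs1", "rs2", "rs3", "rs4", "rs5", "rs6"),
  pos = c(1000, 1000, 1500000, 50000000, 2000, 300, 7000),
  stringsAsFactors = FALSE
)
anno <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  chr = c(1, 23, 23, 24, 26),
  pos = c(1010, 1500000, 50000000, 2000, 300),
  stringsAsFactors = FALSE
)

# Assumed GRCh37 PAR bounds (as in PLINK 1.9 --split-x b37).
par1End <- 2699520
par2Start <- 154931044

# Step 6: drop SNPs that are absent from the annotation.
bim <- bim[bim$snp %in% anno$snp, ]
# Step 7: drop duplicated SNP IDs, keeping the first copy (assumption).
bim <- bim[!duplicated(bim$snp), ]
# Step 8: take chromosome and position from the annotation.
idx <- match(bim$snp, anno$snp)
bim$chr <- anno$chr[idx]
bim$pos <- anno$pos[idx]
# Step 9: chrX SNPs inside a PAR are recoded as 25 (XY).
inPar <- bim$chr == 23 & (bim$pos <= par1End | bim$pos >= par2Start)
bim$chr[inPar] <- 25
# Step 10: remove chromosome Y (24) and mitochondrial (26) SNPs.
bim <- bim[!bim$chr %in% c(24, 26), ]
```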

Value

The output PLINK binary files (outputPrefix.bed, outputPrefix.bim and outputPrefix.fam) containing the updated genotype information.
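After a successful run, the text companions of the output fileset can be inspected in R. The sketch assumes the standard PLINK column layouts (.fam: FID, IID, paternal ID, maternal ID, sex, phenotype; .bim: chromosome, SNP ID, genetic distance, position, two alleles). summarisePlinkFiles is a hypothetical helper. Explicit colClasses prevent allele codes such as "T" from being read as logical values.

```r
# Hypothetical helper (not part of Gimpute): summarise <prefix>.fam/.bim.
summarisePlinkFiles <- function(prefix) {
  fam <- read.table(paste0(prefix, ".fam"), header = FALSE,
                    col.names = c("FID", "IID", "PID", "MID", "sex", "pheno"),
                    colClasses = c(rep("character", 4), "integer", "numeric"))
  bim <- read.table(paste0(prefix, ".bim"), header = FALSE,
                    col.names = c("chr", "snp", "cM", "pos", "A1", "A2"),
                    colClasses = c("character", "character", "numeric",
                                   "numeric", "character", "character"))
  list(nSamples    = nrow(fam),
       nSnps       = nrow(bim),
       chromosomes = sort(unique(bim$chr)),
       sexCounts   = table(fam$sex))
}

## e.g. summarisePlinkFiles("1_11_removedYMtSnp")
```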

Author(s)

Junfang Chen

References

Purcell, Shaun, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81.3 (2007): 559-575.

Examples

## In the current working directory
bedFile <- system.file("extdata", "controlData.bed", package="Gimpute")
bimFile <- system.file("extdata", "controlData.bim", package="Gimpute")
famFile <- system.file("extdata", "controlData.fam", package="Gimpute")
## Copy the example PLINK files into the current working directory
file.copy(c(bedFile, bimFile, famFile), ".")
inputPrefix <- "controlData"
metaDataFile <- system.file("extdata", "1_01_metaData.txt",
                            package="Gimpute")
excludedProbeIdsFile <- system.file("extdata", "excludedProbeIDs.txt",
                                    package="Gimpute")
removedSampIDFile <- system.file("extdata", "excludedSampIDsV1.txt",
                                 package="Gimpute")
chipAnnoFile <- system.file("extdata", "chipAnno.txt", package="Gimpute")
ancestrySymbol <- "EUR"
outputPrefix <- "1_11_removedYMtSnp"
chipType <- "rsIDstudy"
## Not run: Requires an executable program PLINK, e.g.
## plink <- "/home/tools/plink"  
## updateGenoInfo(plink, inputPrefix, metaDataFile, removedSampIDFile,
##                ancestrySymbol, excludedProbeIdsFile, chipAnnoFile,
##                chipType, outputPrefix, keepInterFile=TRUE)

transbioZI/Gimpute documentation built on April 10, 2022, 4:20 a.m.