plinkUtils: Utilities to create and check PLINK files

plinkUtilsR Documentation

Utilities to create and check PLINK files

Description

plinkWrite creates ped and map format files (used by PLINK) from a GenotypeData object. plinkCheck checks whether a set of ped and map files has identical data to a GenotypeData object.

Usage

plinkWrite(genoData, pedFile="testPlink", family.col="family",
  individual.col="scanID", father.col="father", mother.col="mother",
  phenotype.col=NULL,
  rs.col="rsID", mapdist.col=NULL, scan.exclude=NULL,
  scan.chromosome.filter=NULL, blockSize=100, verbose=TRUE)

plinkCheck(genoData, pedFile, logFile="plinkCheck.txt", family.col="family",
  individual.col="scanID", father.col="father", mother.col="mother",
  phenotype.col=NULL,
  rs.col="rsID", map.alt=NULL, check.parents=TRUE, check.sex=TRUE,
  scan.exclude=NULL, scan.chromosome.filter=NULL, verbose=TRUE)

Arguments

genoData

A GenotypeData object with scan and SNP annotation.

pedFile

prefix for PLINK files (pedFile.ped, pedFile.map)

logFile

Name of the output file to log the results of plinkCheck

family.col

name of the column in the scan annotation that contains family ID of the sample

individual.col

name of the column in the scan annotation that contains individual ID of the sample

father.col

name of the column in the scan annotation that contains father ID of the sample

mother.col

name of the column in the scan annotation that contains mother ID of the sample

phenotype.col

name of the column in the scan annotation that contains phenotype variable (e.g. case control statue) of the sample

rs.col

name of the column in the SNP annotation that contains rs ID (or some other ID) for the SNP

mapdist.col

name of the column in the SNP annotation that contains genetic distance in Morgans for the SNP

map.alt

data frame with alternate SNP mapping for genoData to PLINK. If not NULL, this annotation will be used to compare SNP information to the PLINK file, rather than the default conversion from the SNP annotation embedded in genoData. Columns should include "snpID", "rsID", "chromosome", "position".

check.parents

logical for whether to check the father and mother columns

check.sex

logical for whether to check the sex column

scan.exclude

vector of scanIDs to exclude from PLINK file

scan.chromosome.filter

a logical matrix that can be used to zero out (set to missing) some chromosomes, some scans, or some specific scan-chromosome pairs. Entries should be TRUE if that scan-chromosome pair should have data in the PLINK file, FALSE if not. The number of rows must be equal to the number of scans in genoData. The column labels must be in the set ("1":"22", "X", "XY", "Y", "M", "U").

blockSize

Number of samples to read from genoData at a time

verbose

logical for whether to show progress information.

Details

If "alleleA" and "alleleB" columns are not found in the SNP annotation of genoData, genotypes are written as "A A", "A B", "B B" (or "0 0" for missing data).

If phenotype.col=NULL, plinkWrite will use "-9" for writing phenotype data and plinkCheck will omit checking this column.

If mapdist.col=NULL, plinkWrite will use "0" for writing this column in the map file and plinkCheck will omit checking this column.

plinkCheck first reads the map file and checks for SNP mismatches (chromosome, rsID, and/or position). Any mismatches are written to logFile. plinkCheck then reads the ped file line by line, recording all mismatches in logFile. SNPs and sample order is not required to be the same as in genoData. In the case of genotype mismatches, for each sample the log file output gives the position of the first mismatched SNP in the PLINK file, as well as the genotypes of the first six mismatched SNPs (which may not be consecutive).

These utilities convert between chromosome coding in GenotypeData, which by default is 24=XY, 25=Y, and PLINK chromosome coding, which is 24=Y, 25=XY.

Larger blockSize will improve speed but will require more RAM.

Value

plinkCheck returns TRUE if the PLINK files contain identical data to genoData, and FALSE if a mismatch is encountered.

Author(s)

Stephanie Gogarten, Tushar Bhangale

References

Please see http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped for more information on the ped and map files.

See Also

snpgdsBED2GDS

Examples

library(GWASdata)
ncfile <- system.file("extdata", "illumina_geno.nc", package="GWASdata")
data(illuminaSnpADF, illuminaScanADF)
genoData <- GenotypeData(NcdfGenotypeReader(ncfile),
  scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

pedfile <- tempfile()
plinkWrite(genoData, pedfile)
logfile <- tempfile()
plinkCheck(genoData, pedfile, logfile)

# exclude samples
plinkWrite(genoData, pedfile, scan.exclude=c(281, 283),
  blockSize=10)
plinkCheck(genoData, pedfile, logfile)
readLines(logfile)
#samples not found in Ped:
#281
#283

close(genoData)
unlink(c(logfile, paste(pedfile, "*", sep=".")))

smgogarten/GWASTools documentation built on May 18, 2024, 1:19 a.m.