snp.list: List to describe the genotype data



The list to describe the genotype data for GxE.scan.


The format is: List of 14


file: File to use. No default.


format: Genotype format (see details). Typical values are "bed", "tped", "impute", "ldat", "lbat", etc. The default is determined from the file extension.


subject.list: List to describe the subject ids stored in a file. This list is only needed when the genotype file does not contain the subject ids (for example, with PLINK files). The order of the subject ids in subject.list$file must match the order in the genotype data. See subject.list. The default is NULL.


start.vec: Starting row of file to begin processing SNPs. The default is 1.


stop.vec: Last row of file to finish processing SNPs. Use any value < 1 to analyze all SNPs from row start.vec to the end of the file. The default is -1.


delimiter: The delimiter used in file. The default is determined from the file format.


in.miss: Vector of values to denote the missing values in file. The default is determined from the file format.

heter.codes: Vector of codes used for the heterozygous genotype. If NULL, the heterozygous genotype is assumed to be of the form "AB", "Aa", "CT", etc., i.e. a 2-character string with different characters (case sensitive). The default is NULL.
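
The rule for recognizing the heterozygous genotype described above can be sketched in a few lines of R. This is only an illustration of the documented behavior, not CGEN's internal code; the helper is_heterozygous and its codes argument are hypothetical names.

```r
# Hypothetical helper illustrating the documented rule; not part of CGEN.
is_heterozygous <- function(geno, codes = NULL) {
  geno <- as.character(geno)
  if (!is.null(codes)) {
    # Explicit codes given: heterozygous if the genotype matches any code
    return(!is.na(geno) & geno %in% as.character(codes))
  }
  # Default rule: a 2-character string whose two characters differ
  # (comparison is case sensitive, so "Aa" counts as heterozygous)
  !is.na(geno) & nchar(geno) == 2 & substr(geno, 1, 1) != substr(geno, 2, 2)
}

het.default <- is_heterozygous(c("AB", "Aa", "CT", "AA", "a"))
# TRUE TRUE TRUE FALSE FALSE

het.coded <- is_heterozygous(c(0, 1, 2), codes = 1)
# FALSE TRUE FALSE
```

With the standard (0,1,2) coding, the default rule would never match, which is why an explicit code of 1 is needed in that case.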

Options only used with GxE.scan:


include.snps: File, list, or character vector to define which SNPs should be included in the analysis. If a file, then the file should contain a single column of SNP ids to include. More generally, if the SNPs to be included are in a file with multiple columns, then include.snps can be a list of type subject.list. If it is a character vector, then it should be a vector of SNP ids. This option can also be used with the options start.vec and stop.vec (see details). The default is NULL.


PLINK: Command for running the PLINK software to transform certain file formats (see details). Set PLINK to "" if PLINK is not available or if you do not want PLINK to be used. The default is "plink".


GLU: Command for running the GLU software to transform certain file formats (see details). Set GLU to "" if GLU is not available or if you do not want GLU to be used. The default is "glu".
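
As a rough illustration, the three forms that include.snps accepts might be written as below. All file names and SNP ids are made up for the example, and the list form simply reuses the fields (file, id.var, header, delimiter) shown in the details.

```r
# Hypothetical file names and SNP ids, for illustration only.

# 1. Character vector of SNP ids
inc.vector <- c("rs123", "rs456", "rs789")

# 2. File containing a single column of SNP ids
inc.file <- "c:/data/project1/snps_to_include.txt"

# 3. List of type subject.list, for a file with multiple columns
inc.list <- list(file = "c:/data/project1/lungCancer.bim", id.var = 2,
                 header = 0, delimiter = "\t")

# Any one of the three can be passed in the snp.list
snp.list <- list(file = "c:/data/project1/lungCancer.bed", format = "bed",
                 include.snps = inc.list)
```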


In this list, file must be specified, and format should be specified. If format is not specified, the program attempts to guess the format of the genotype data from the file extension of file. If format is one that GxE.scan is not set up to read directly (such as "bed", "lbat", or "ped"), then either PLINK or GLU is called to transform the data into either a "tped" or "ldat" format. When the option include.snps is specified as a file, the options start.vec and stop.vec are applied to the SNPs in that file. For example, suppose we have the genotype file snps.bed, which is in the PLINK "bed" format. We can set include.snps to the corresponding ".bim" file:
include.snps <- list(file="snps.bim", id.var=2, header=0, delimiter="\t")
Then the SNPs included in the analysis are those in rows start.vec to stop.vec of the file "snps.bim".
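
The row selection described above can be mimicked on an in-memory vector to see which SNPs would be kept. This is a sketch only and does not call CGEN: select_rows is a hypothetical helper, the SNP ids are made up, and clamping stop.vec to the number of rows is an assumption rather than documented behavior.

```r
# Hypothetical helper; illustrates the start.vec / stop.vec semantics only.
select_rows <- function(ids, start.vec = 1, stop.vec = -1) {
  n <- length(ids)
  # Any stop.vec < 1 means "through the last row"
  # (clamping larger values to n is an assumption)
  last <- if (stop.vec < 1) n else min(stop.vec, n)
  if (start.vec > last) return(ids[0])
  ids[start.vec:last]
}

# Stand-in for column 2 (the SNP ids) of a tab-delimited "snps.bim" file
bim.ids <- c("rs1", "rs2", "rs3", "rs4", "rs5")

rows.2.to.4 <- select_rows(bim.ids, start.vec = 2, stop.vec = 4)
# "rs2" "rs3" "rs4"

rows.3.to.end <- select_rows(bim.ids, start.vec = 3)
# "rs3" "rs4" "rs5"
```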

Other options such as delimiter and in.miss do not need to be specified, because they can be determined from the genotype data format. If the SNPs are coded in the standard (0,1,2) coding, then set heter.codes to 1 (the heterozygous genotype).


# Example snp.list for a PLINK binary pedigree file when using GxE.scan
## Not run: 
pathToPLINK  <- "c:/PLINK/plink-1.07-dos/plink.exe"
snp.file     <- "c:/data/project1/lungCancer.bed"
subject.list <- "c:/data/project1/lungCancer.fam"
snp.list <- list(file=snp.file, format="bed", PLINK=pathToPLINK,
                 subject.list=subject.list)

## End(Not run)

# Suppose the genotype data is an output genotype file from the IMPUTE2 software.
# The list below is for processing that file.
## Not run: 
snp.list <- list(file="C:/temp/data/chr11_1.imputed.txt.gz", delimiter=" ", format="impute")

## End(Not run)

CGEN documentation built on May 2, 2018, 2:42 a.m.