generate.genotype: Import genotype data in the correct format for network...

View source: R/all_functions.R

Description

Network construction based on either genomic correlations or epistatic interactions requires a genotype matrix with one numeric value per SNP per individual. This function takes Plink output (1,2-coding) and creates that genotype matrix, which can then be used to calculate genomic correlations or epistatic interaction effects.

Usage

generate.genotype(ped, tped, snp.id=NULL, pvalue=0.05, id.select=NULL,
                  gwas.p=NULL, major.freq=0.95, fast.read=T)

Arguments

ped

Input ped file as .ped file or data.frame. The ped file (.ped) is an input file from Plink. It is a white-space (space or tab) delimited file whose first six columns are mandatory: Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1=male; 2=female; other=unknown) and Phenotype. The IDs are alphanumeric; the combination of family and individual ID should uniquely identify a person. A PED file must have one and only one phenotype, in the sixth column. The phenotype can be either a quantitative trait or an affection status column: PLINK will automatically detect which type (i.e. based on whether a value other than 0, 1, 2 or the missing genotype code is observed). SNPs are 1,2-coded (1 for major allele, 2 for minor allele). For more information: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

tped

Input tped file as .tped file or data frame. The tped file (.tped) is a transposed ped file, from Plink. This file contains the SNP and genotype information where one row is a SNP. The first 4 columns of a TPED file are the same as a 4-column MAP file. Then all genotypes are listed for all individuals for each particular SNP on each line. Again, SNPs are 1,2-coded.

snp.id

SNP IDs to use in the analysis, if not all SNPs are to be used.

pvalue

P-value cutoff for the SNPs that should remain in the matrix, based on the p-values resulting from the GWAS. Default value is 0.05.

id.select

If requested, a subset of individuals can be selected (e.g. extremes). If nothing is supplied, all individuals are in the output.

gwas.p

A vector of the p-values corresponding to the input SNPs in the ped/tped file or gwas.id vector. If assigned, SNPs are selected based on the pvalue parameter, which has a default value of 0.05.

major.freq

Maximum major allele frequency allowed in each variant. Default value is 0.95.

fast.read

If true, fread from the data.table package is used to read the files. This is much faster than read.table, but requires consistent delimiters in the ped and tped files, and a maximum of approximately 950,000 columns in the ped file. This limit can be increased by changing the stack size (do this only if you know what you are doing).
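
The pvalue, gwas.p and major.freq arguments together describe a SNP filter. The following is a minimal sketch of such a filter in base R, not the package's actual implementation. The toy data, the object names (alleles, snp.names, major.allele.freq, keep) and the choice of a strict "<" for the p-value cutoff are all assumptions made for illustration.

```r
# Toy alleles in 1,2-coding: one row per individual, two allele columns per SNP.
# Hypothetical data, not taken from the WISH package.
alleles <- data.frame(
  snp1_a = c(1, 1, 1, 1), snp1_b = c(1, 1, 1, 1),  # monomorphic SNP
  snp2_a = c(1, 1, 2, 1), snp2_b = c(2, 1, 2, 1),
  snp3_a = c(1, 2, 1, 1), snp3_b = c(1, 2, 2, 1)
)
snp.names  <- c("snp1", "snp2", "snp3")
gwas.p     <- c(0.01, 0.20, 0.03)  # hypothetical GWAS p-values, one per SNP
pvalue     <- 0.05                 # p-value cutoff
major.freq <- 0.95                 # maximum major allele frequency

# Frequency of allele 1 per SNP, counted over both allele columns.
freq1 <- sapply(seq_along(snp.names), function(j) {
  a <- unlist(alleles[, c(2 * j - 1, 2 * j)])
  mean(a == 1, na.rm = TRUE)
})

# The major allele frequency is the larger of the two allele frequencies.
major.allele.freq <- pmax(freq1, 1 - freq1)

# Assumption: a SNP passes if its p-value is below the cutoff and its
# major allele frequency does not exceed major.freq.
keep <- gwas.p < pvalue & major.allele.freq <= major.freq
passing.snps <- snp.names[keep]
print(passing.snps)
```

In this toy data, snp1 is dropped because it is monomorphic (major allele frequency 1), snp2 is dropped because its p-value exceeds the cutoff, and only snp3 remains.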

Value

A genotype data frame and the corresponding vector of passing SNPs. The genotype data frame has a row for each individual and a column for each SNP. SNPs are 1,1.5,2 coded: 1 for homozygous for the major allele, 1.5 for heterozygous, and 2 for homozygous for the minor allele. Missing values are NA coded.
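
One way to arrive at the 1, 1.5, 2 coding is to average the two 1,2-coded alleles of each SNP. The sketch below illustrates that idea in base R; it is an assumption about how such a coding can be derived, not a copy of the package code. The matrices a1 and a2 are hypothetical, and treating 0 (PLINK's default missing genotype code) as missing is likewise an assumption.

```r
# First and second allele for 4 individuals (rows) and 2 SNPs (columns),
# in 1,2-coding. 0 is assumed to mark a missing allele.
a1 <- matrix(c(1, 1, 2, 0,
               1, 2, 2, 1), nrow = 4)
a2 <- matrix(c(1, 2, 2, 0,
               1, 1, 2, 2), nrow = 4)

# Recode missing alleles to NA so they propagate into the genotype.
a1[a1 == 0] <- NA
a2[a2 == 0] <- NA

# Averaging the two alleles gives 1 (1/1), 1.5 (1/2 or 2/1) or 2 (2/2).
genotype <- (a1 + a2) / 2
dimnames(genotype) <- list(paste0("ind", 1:4), c("snp1", "snp2"))
genotype <- as.data.frame(genotype)
print(genotype)
```

The result has one row per individual and one column per SNP, matching the layout described above.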

References

Lisette J.A. Kogelman and Haja N. Kadarmideen (2014). Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data. BMC Systems Biology 8(Suppl 2):S5. http://www.biomedcentral.com/1752-0509/8/S2/S5.

Examples
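
The following is an illustrative sketch, not an example shipped with the package. It builds toy ped and tped data frames in the layouts described under Arguments and then, only if a package named WISH is installed, attempts the call. All data values and column names are hypothetical, and it is an assumption that the function is reachable as WISH::generate.genotype and behaves sensibly on such a small data set, which is why the call is wrapped in try().

```r
# Toy PED-style data frame: six mandatory columns, then two allele
# columns per SNP (1,2-coding). Column names are hypothetical.
ped <- data.frame(
  FID = c("F1", "F2", "F3"),
  IID = c("I1", "I2", "I3"),
  PAT = 0, MAT = 0,
  SEX = c(1, 2, 1),
  PHENO = c(1.2, 0.7, 2.1),
  snp1_a = c(1, 1, 2), snp1_b = c(1, 2, 2),
  snp2_a = c(1, 1, 1), snp2_b = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Matching TPED-style data frame: one row per SNP, four MAP columns
# (chromosome, SNP ID, genetic distance, position), then two allele
# columns per individual.
allele.cols <- as.matrix(ped[, 7:ncol(ped)])
tped.geno <- t(sapply(1:2, function(j) {
  as.vector(t(allele.cols[, c(2 * j - 1, 2 * j)]))
}))
tped <- cbind(
  data.frame(CHR = c(1, 1), SNP = c("snp1", "snp2"), CM = 0,
             BP = c(1000, 2000), stringsAsFactors = FALSE),
  tped.geno
)

# Attempt the call only when the package is available; try() keeps the
# sketch from stopping if the toy data are rejected.
if (requireNamespace("WISH", quietly = TRUE)) {
  result <- try(WISH::generate.genotype(ped, tped, major.freq = 0.95),
                silent = TRUE)
}
```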


AQS-Group/WISH documentation built on July 17, 2020, 12:12 a.m.