geepack.lgst.batch: function to test genetic associations between a dichotomous...


Description

Fits a logistic regression model via generalized estimating equations (GEE) to test the association between a dichotomous phenotype and every genotyped SNP in a genotype file, using family data and a user-specified genetic model. Each pedigree is treated as a cluster: an independence working correlation matrix is assumed, and a robust (sandwich) variance estimator accounts for correlation among relatives. The same trait-SNP association test is applied to each SNP in turn; each test is carried out by the geepack.lgst function, which calls the geese function from the geepack package.
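The GEE approach described above can be illustrated without GWAF. The sketch below is a hypothetical illustration, not GWAF's or geepack's code: it simulates made-up families, relies on the fact that under an independence working correlation the logistic GEE point estimates coincide with ordinary glm estimates, and then forms a cluster-robust (sandwich) variance by summing score contributions within each pedigree. The variable names (n_fam, fam_size, fam_effect) and all simulated values are assumptions for illustration only.

```r
## Hypothetical illustration, not GWAF/geepack code: logistic GEE with an
## independence working correlation and a cluster-robust (sandwich) variance.
set.seed(1)

## Simulated toy families (all values hypothetical)
n_fam <- 200
fam_size <- 4
famid <- rep(seq_len(n_fam), each = fam_size)
fam_effect <- rep(rnorm(n_fam, sd = 0.8), each = fam_size)  # shared within a pedigree
snp <- rbinom(n_fam * fam_size, size = 2, prob = 0.3)        # coded-allele counts 0/1/2
y <- rbinom(n_fam * fam_size, size = 1,
            prob = plogis(-1 + 0.3 * snp + fam_effect))      # 1 = affected, 0 = unaffected

## Independence working correlation: GEE estimates equal the glm estimates
fit <- glm(y ~ snp, family = binomial)
X <- model.matrix(fit)
mu <- fitted(fit)

## "Bread": A = X' W X, with W = diag(mu * (1 - mu))
A <- crossprod(X, X * (mu * (1 - mu)))
## "Meat": B = sum over pedigrees of U_i U_i', where U_i = X_i' (y_i - mu_i)
U <- rowsum(X * (y - mu), group = famid)  # one row of summed scores per pedigree
B <- crossprod(U)
A_inv <- solve(A)
V_robust <- A_inv %*% B %*% A_inv

j <- which(colnames(X) == "snp")
beta <- unname(coef(fit)[j])
se <- sqrt(V_robust[j, j])
chisq <- (beta / se)^2                             # 1-df Wald statistic
pval <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(beta = beta, se = se, chisq = chisq, pval = pval))
```

Because the simulated families share a random effect, the robust standard error will typically differ from (and often exceed) the model-based one reported by summary(fit), which ignores clustering.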

Usage

geepack.lgst.batch(genfile, phenfile, pedfile, outfile, phen, covars = NULL, 
model = "a", col.names = T, sep.ped = ",", sep.phe = ",", sep.gen = ",")

Arguments

genfile

a character string naming the genotype file to read (see the format requirements in Details)

phenfile

a character string naming the phenotype file to read (see the format requirements in Details)

pedfile

a character string naming the pedigree file to read (see the format requirements in Details)

outfile

a character string naming the result file for writing

phen

a character string giving the name of the phenotype column in phenfile

covars

a character vector of covariate column names in phenfile (default NULL, no covariates)

model

a single character, one of 'a', 'd', 'g', or 'r', specifying the genetic model: 'a' = additive (default), 'd' = dominant, 'g' = general, 'r' = recessive

col.names

a logical value indicating whether the output file should contain column names

sep.ped

the field separator character for the pedigree file

sep.phe

the field separator character for the phenotype file

sep.gen

the field separator character for the genotype file
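One plausible way to turn 0/1/2 genotypes into the SNP term for each value of model is sketched below. recode_genotype is a hypothetical helper, not a GWAF function, and the dominant and recessive codings (both defined relative to the coded allele) are assumptions rather than documented GWAF behavior.

```r
## Hypothetical helper (not part of GWAF): recode 0/1/2 coded-allele counts
## into the SNP term(s) used under each genetic model.
recode_genotype <- function(g, model = c("a", "d", "r", "g")) {
  model <- match.arg(model)
  switch(model,
    a = g,                               # additive: number of coded alleles
    d = as.integer(g >= 1),              # dominant: at least one coded allele
    r = as.integer(g == 2),              # recessive: two copies of the coded allele
    g = cbind(g1 = as.integer(g == 1),   # general: separate indicators for
              g2 = as.integer(g == 2))   # 1 copy and 2 copies vs. 0 copies
  )
}

g <- c(0, 1, 2, NA)       # missing genotypes stay NA under every coding
recode_genotype(g, "d")   # 0 1 1 NA
recode_genotype(g, "r")   # 0 0 1 NA
recode_genotype(g, "g")   # 4 x 2 matrix with columns g1 and g2
```

Under this general-model coding the two indicator coefficients play the roles of beta10 and beta20 in the output, and their difference corresponds to beta21.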

Details

The geepack.lgst.batch function first reads and merges the phenotype-covariate, genotype, and pedigree files, and then tests the association of phen with every SNP in genfile.

genfile contains a unique individual id and genotype data, with the columns named "id" and by SNP name. Each genotype should be coded as 0, 1, or 2, the number of copies of the coded allele. SNP names in the genotype file must not contain dashes ('-') or other special characters; dots and underscores are allowed. phenfile contains a unique individual id plus phenotype and covariate data, with the columns named "id" and by phenotype and covariate name. pedfile contains pedigree information, with the columns named "famid", "id", "fa", "mo", and "sex". In all files, missing values should be left empty, except for missing parental ids in pedfile.

Only phenotypes with two categories are analyzed. The phenotype should be coded as 0 and 1, with 1 denoting affected and 0 unaffected. SNPs with low genotype counts (especially few minor-allele homozygotes) may be omitted, analyzed under a dominant model, or analyzed with ordinary logistic regression; the model and remark columns of outfile indicate the model actually used and whether logistic regression replaced GEE.

The GEE model is fitted with each pedigree as a cluster, using the geepack.lgst function from the GWAF package, which in turn uses the geese function from the geepack package.
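The sketch below writes a tiny set of hypothetical input files in the layouts described above and checks the stated requirements: SNP names free of dashes and other special characters, genotypes coded 0/1/2, phenotype coded 0/1, and missing values written as empty fields. All file names, ids, and column names other than id, famid, fa, mo, and sex are made up. Coding founders' parents as 0 is an assumption, since the required code for a missing parental id is not stated here.

```r
## Hypothetical toy inputs in the documented layouts (values are made up).
out_dir <- tempdir()

ped <- data.frame(
  famid = c(1, 1, 1, 2, 2, 2),
  id    = c(11, 12, 13, 21, 22, 23),
  fa    = c(0, 0, 11, 0, 0, 21),  # ASSUMPTION: founders' missing parents coded as 0
  mo    = c(0, 0, 12, 0, 0, 22),
  sex   = c(1, 2, 1, 1, 2, 2)
)
phen <- data.frame(
  id  = c(11, 12, 13, 21, 22, 23),
  aff = c(0, 1, 1, 0, NA, 1),     # hypothetical trait: 1 = affected, 0 = unaffected
  age = c(60, 58, 30, 65, 61, 35) # hypothetical covariate
)
gen <- data.frame(
  id      = c(11, 12, 13, 21, 22, 23),
  rs123   = c(0, 1, 1, 2, NA, 1), # number of coded alleles
  snp_2.x = c(1, 0, 0, 1, 1, 2)   # dots and underscores are allowed
)

## Check the documented format requirements
snp_names <- setdiff(names(gen), "id")
stopifnot(all(grepl("^[A-Za-z0-9._]+$", snp_names)))  # no dashes or other special characters
stopifnot(all(as.matrix(gen[, snp_names]) %in% c(0, 1, 2, NA)))
stopifnot(all(phen$aff %in% c(0, 1, NA)))

## Missing values in phenotype and genotype files are written as empty fields
write.csv(ped,  file.path(out_dir, "toyped.csv"),  row.names = FALSE, quote = FALSE)
write.csv(phen, file.path(out_dir, "toyphen.csv"), row.names = FALSE, quote = FALSE, na = "")
write.csv(gen,  file.path(out_dir, "toygen.csv"),  row.names = FALSE, quote = FALSE, na = "")

## The toy data are far too small for a real analysis; a call would look like:
## geepack.lgst.batch(genfile = file.path(out_dir, "toygen.csv"),
##                    phenfile = file.path(out_dir, "toyphen.csv"),
##                    pedfile = file.path(out_dir, "toyped.csv"),
##                    outfile = file.path(out_dir, "toyout.csv"),
##                    phen = "aff", covars = "age", model = "a")
```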

Value

No value is returned; results are written to outfile instead. When the genetic model is 'a', 'd', or 'r', the result contains the following columns. When the genetic model is 'g', beta and se are replaced by beta10, beta20, beta21, se10, se20, and se21.

phen

phenotype name

snp

SNP name

n0

the number of individuals with 0 copies of the coded allele

n1

the number of individuals with 1 copy of the coded allele

n2

the number of individuals with 2 copies of the coded allele

nd0

the number of affected individuals with 0 copies of the coded allele

nd1

the number of affected individuals with 1 copy of the coded allele

nd2

the number of affected individuals with 2 copies of the coded allele

miss.0

genotype missing rate among unaffected individuals

miss.1

genotype missing rate among affected individuals

miss.diff.p

p-value of the test for differential genotype missingness between unaffected and affected individuals

beta

regression coefficient (log odds ratio) of the SNP term

se

standard error of beta

chisq

chi-square statistic for testing the null hypothesis that beta equals zero

df

degrees of freedom of the chi-square statistic

model

model actually used in the analysis

remark

a warning or additional information about the analysis: 'not converged' indicates that the GEE analysis did not converge; 'logistic reg' indicates that the GEE model was replaced by ordinary logistic regression; 'exp count<5' indicates that at least one expected count in the phenotype-by-genotype table is less than 5; 'not converged and exp count<5' and 'logistic reg & exp count<5' combine these conditions; 'collinearity' indicates collinearity between the SNP and some covariates

pval

p-value of the chi-square statistic

beta10

regression coefficient for genotypes with 1 copy of the coded allele vs. 0 copies

beta20

regression coefficient for genotypes with 2 copies of the coded allele vs. 0 copies

beta21

regression coefficient for genotypes with 2 copies of the coded allele vs. 1 copy

se10

standard error of beta10

se20

standard error of beta20

se21

standard error of beta21
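Two of these columns can be checked by hand from a result row. The sketch below assumes that chisq is the 1-df Wald statistic (beta/se)^2 for the 'a', 'd', and 'r' models, and reconstructs the 'exp count<5' condition as a standard 2 x 3 expected-count calculation from n0..n2 and nd0..nd2. Both are assumptions about GWAF's internals, and the row values are hypothetical.

```r
## Hypothetical result row using the documented column names (values made up)
res <- data.frame(phen = "aff", snp = "rs123",
                  n0 = 400, n1 = 250, n2 = 10,
                  nd0 = 120, nd1 = 90, nd2 = 2,
                  beta = 0.25, se = 0.11, df = 1)
res$chisq <- (res$beta / res$se)^2   # ASSUMPTION: 1-df Wald statistic

## Upper-tail p-value of the chi-square statistic
res$pval <- pchisq(res$chisq, df = res$df, lower.tail = FALSE)

## ASSUMPTION: n0..n2 count all analyzed individuals, nd0..nd2 the affected ones
obs <- rbind(affected   = c(res$nd0, res$nd1, res$nd2),
             unaffected = c(res$n0 - res$nd0, res$n1 - res$nd1, res$n2 - res$nd2))
## Expected count per cell = row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
flag_exp_count <- any(expected < 5)  # reconstruction of the 'exp count<5' condition
print(round(expected, 2))
print(flag_exp_count)
```

Here the affected, two-copy cell has an expected count of 212 * 10 / 660, about 3.2, so under this reconstruction the row would carry the 'exp count<5' remark.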

Author(s)

Qiong Yang <qyang@bu.edu> and Ming-Huei Chen <mhchen@bu.edu>

Examples

## Not run: 
geepack.lgst.batch(phenfile="simphen.csv",genfile="simgen.csv",pedfile="simped.csv",
phen="SIMQT",model="a",outfile="simout.csv",sep.ped=",",sep.phe=",",sep.gen=",")

## End(Not run)

GWAF documentation built on May 2, 2019, 2:47 p.m.