glmm.score: Performing GLMM based score tests
In hanchenphd/GMMAT: Generalized Linear Mixed Model Association Tests

View source: R/glmm.score.R

glmm.score

R Documentation

Performing GLMM based score tests

Description

Use a glmmkin class object from the null GLMM to perform score tests for association with genotypes in a plink .bed file (binary genotypes), a GDS file .gds, or a plain text file (or compressed .gz or .bz2 file).

Usage

glmm.score(obj, infile, outfile, BGEN.samplefile = NULL, center = T, select = NULL, 
	MAF.range = c(1e-7, 0.5), miss.cutoff = 1, 
	missing.method = "impute2mean", nperbatch = 100, tol = 1e-5,
	infile.nrow = NULL, infile.nrow.skip = 0, infile.sep = "\t",
	infile.na = "NA", infile.ncol.skip = 1, infile.ncol.print = 1,
	infile.header.print = "SNP", is.dosage = FALSE, ncores = 1, verbose = FALSE)

Arguments

`obj`	a class glmmkin or class glmmkin.multi object, returned by fitting the null GLMM using `glmmkin`.
`infile`	the input file name or an object of class SeqVarGDSClass. Note that for plink binary genotype files only the prefix without .bed, .bim or .fam should be used. Only SNP major mode recognized in the binary file. Alternatively, it can be the full name of a BGEN file (including the suffix .bgen), a GDS file (including the suffix .gds), or a plain text file with some delimiters (comma, space, tab or something else), with one row for each SNP and one column for each individual. In that case, SNPs should be coded as numeric values (0/1/2 or dosages allowed, A/C/G/T coding is not recognized). There can be additional rows and columns to skip at the beginning. The order of individuals can be different from `obj` in the null GLMM (see the argument `select`). Some compressed files (.gz and .bz2) also allowed. If `infile` is an object of class SeqVarGDSClass, the .gds file will be closed upon successful completion of the function.
`outfile`	the output file name.
`BGEN.samplefile`	path to the BGEN sample file. Required when the BGEN file does not contain sample identifiers or the `select` parameter is NULL (default = NULL).
`center`	a logical switch for centering genotypes before tests. If TRUE, genotypes will be centered to have mean 0 before tests, otherwise raw values will be directly used in tests (default = TRUE).
`select`	an optional vector indicating the order of individuals in `infile`. If supplied, the length must match the number of individuals in `infile` (default = NULL). Individuals to be excluded should be coded 0. For example, `select = c(2, 3, 1, 0)` means the 1st individual in `infile` corresponds to the 2nd individual in `obj`, the 2nd individual in `infile` corresponds to the 3rd individual in `obj`, the 3rd individual in `infile` corresponds to the 1st individual in `obj`, the 4th individual in `infile` is not included in `obj`. If there are any duplicated `id_include` in `obj` (longitudinal data analysis), indices in `select` should match the order of individuals with unique `id_include` in `obj`. For plink binary genotype files and GDS files, this argument is not required and the sample ID's are automatically matched.
`MAF.range`	a numeric vector of length 2 defining the minimum and maximum minor allele frequencies of variants that should be included in the analysis (default = c(1e-7, 0.5)).
`miss.cutoff`	the maximum missing rate allowed for a variant to be included (default = 1, including all variants).
`missing.method`	method of handling missing genotypes. Either "impute2mean" or "omit" (default = "impute2mean").
`nperbatch`	an integer for how many SNPs should be tested in a batch (default = 100). The computational time can increase dramatically if this value is either small or large. The optimal value for best performance depends on the user's system.
`tol`	the threshold for determining monomorphism. If a SNP has value range less than the tolerance, it will be considered monomorphic and its association test p-value will be NA (default = 1e-5). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.nrow`	number of rows to read in `infile`, including number of rows to skip at the beginning. If NULL, the program will determine how many rows there are in `infile` automatically and read all rows (default = NULL). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.nrow.skip`	number of rows to skip at the beginning of `infile`. Must be nonnegative integers. Useful when header or comment lines are present (default = 0). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.sep`	delimiter in `infile` (default = "\t"). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.na`	symbol in `infile` to denote missing genotypes (default = "NA"). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.ncol.skip`	number of columns to skip before genotype data in `infile`. These columns can be SNP name, alleles and/or quality measures and should be placed at the beginning in each line. After skipping these columns, the program will read in genotype data and perform score tests. Must be nonnegative integers. It is recommended that SNP name should be included as the first column in `infile` and genotype data should start from the second column or later (default = 1). Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.ncol.print`	a vector indicating which column(s) in `infile` should be printed to the output directly. These columns can be SNP name, alleles and/or quality measures placed at the beginning in each line. Must be nonnegative integers, no greater than `infile.ncol.skip` and sorted numerically in ascending order. By default, it is assumed that the first column is SNP name and genotype data start from the second column, and SNP name should be carried over to the output (default = 1). Should be set to NULL if `infile.ncol.skip` is 0. Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`infile.header.print`	a character vector indicating column name(s) of column(s) selected to print by `infile.ncol.print` (default = "SNP"). Should be set to NULL if `infile.ncol.skip` is 0. Only used when `infile` is a plain text file (or compressed .gz or .bz2 file).
`is.dosage`	a logical switch for whether imputed dosage should be used from a GDS `infile` (default = FALSE).
`ncores`	a positive integer indicating the number of cores to be used in parallel computing (default = 1).
`verbose`	a logical switch for whether a progress bar should be shown for a GDS `infile` (default = FALSE).

Value

NULL if infile is a BGEN file (.bgen) or a GDS file (.gds), otherwise computational time in seconds, excluding I/O time.

Author(s)

Han Chen, Duy T. Pham

References

Chen, H., Wang, C., Conomos, M.P., Stilp, A.M., Li, Z., Sofer, T., Szpiro, A.A., Chen, W., Brehm, J.M., Celedón, J.C., Redline, S., Papanicolaou, G.J., Thornton, T.A., Laurie, C.C., Rice, K. and Lin, X. (2016) Control forpopulation structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics 98, 653-666.

Examples


data(example)
attach(example)
model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id",
       family = binomial(link = "logit"))
plinkfiles <- strsplit(system.file("extdata", "geno.bed", package = "GMMAT"), 
       ".bed", fixed = TRUE)[[1]]
outfile.bed <- tempfile()
glmm.score(model0, infile = plinkfiles, outfile = outfile.bed)
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools",
        quietly = TRUE)) {
	infile <- system.file("extdata", "geno.gds", package = "GMMAT")
	outfile.gds <- tempfile()
	glmm.score(model0, infile = infile, outfile = outfile.gds)
	unlink(outfile.gds)
}
infile <- system.file("extdata", "geno.txt", package = "GMMAT")
outfile.text <- tempfile()
glmm.score(model0, infile = infile, outfile = outfile.text, infile.nrow.skip = 5, 
	infile.ncol.skip = 3, infile.ncol.print = 1:3, 
	infile.header.print = c("SNP", "Allele1", "Allele2"))
infile <- system.file("extdata", "geno.bgen", package = "GMMAT")
samplefile <- system.file("extdata", "geno.sample", package = "GMMAT")
outfile.bgen <- tempfile()
glmm.score(model0, infile = infile, BGEN.samplefile = samplefile,
        outfile = outfile.bgen)
unlink(c(outfile.bed, outfile.text, outfile.bgen))

hanchenphd/GMMAT documentation built on Nov. 18, 2023, 11:58 p.m.