scan.h2lmm: Run a haplotype-based genome scan from probabilities stored...


Description

This function primarily takes a formula, data frame, and genome cache to run a genome scan.

Usage

scan.h2lmm(genomecache, data, formula, K = NULL, model = c("additive",
  "full"), p.value.method = c("LRT", "ANOVA"), use.par = "h2",
  use.multi.impute = TRUE, num.imp = 11, chr = "all", brute = TRUE,
  use.fix.par = TRUE, seed = 1, pheno.id = "SUBJECT.NAME",
  geno.id = "SUBJECT.NAME", weights = NULL, do.augment = FALSE,
  use.full.null = FALSE, added.data.points = 1, just.these.loci = NULL,
  print.locus.fit = FALSE, debug.single.fit = FALSE, ...)

Arguments

genomecache

The path to the genome cache directory. The genome cache is a directory with a specific structure that stores the haplotype probabilities/dosages at each locus. It has an additive model subdirectory and a full model subdirectory. Each contains one subdirectory per chromosome, which in turn stores .RData files with the probabilities/dosages of each locus.
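The layout described above can be pictured with a small mock directory. This is only a sketch: the subdirectory and file names used here (`additive`, `full`, `chr1`, `chr2`, `locus_a.RData`) and the contents of each file are illustrative assumptions, not the exact names or objects the package expects.

```r
# Build a mock cache in a temporary directory (all names are hypothetical).
cache <- file.path(tempdir(), "mock_genomecache")
for (model in c("additive", "full")) {
  for (chr in c("chr1", "chr2")) {
    chr_dir <- file.path(cache, model, chr)
    dir.create(chr_dir, recursive = TRUE, showWarnings = FALSE)
    # One .RData file per locus; here a placeholder matrix for 2 individuals.
    probs <- matrix(0.5, nrow = 2, ncol = 2,
                    dimnames = list(c("ind1", "ind2"), c("A", "B")))
    save(probs, file = file.path(chr_dir, "locus_a.RData"))
  }
}

# List the resulting model/chromosome/locus hierarchy.
cache_files <- list.files(cache, recursive = TRUE)
print(cache_files)
```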

data

A data frame with outcome and potential covariates. Should also have IDs that link to IDs in the genome cache, often with the individual-level ID named "SUBJECT.NAME", though others can be specified with pheno.id.

formula

An lm-style formula with functions of the outcome and covariates contained in the data frame.

K

DEFAULT: NULL. A positive semi-definite relationship matrix, usually a realized genetic relationship matrix (GRM) based on SNP genotypes or the founder haplotype probabilities. Colnames and rownames should match the SUBJECT.NAME column in the data frame. If no K matrix is specified, either lmer is used (if sparse random effects are included in the formula) or a fixed effect model (equivalent to lm) is fit.
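As a minimal sketch of what a suitable K looks like, the following builds a relationship matrix from simulated SNP genotypes and checks the properties named above. The genotypes, the scaling used to form the matrix, and the object names (`geno`, `pheno`) are assumptions made for illustration; other GRM estimators are equally valid.

```r
set.seed(1)
ids <- paste0("ind", 1:6)

# Simulated SNP genotypes coded 0/1/2 (6 individuals x 50 markers).
geno <- matrix(rbinom(6 * 50, size = 2, prob = 0.3), nrow = 6,
               dimnames = list(ids, NULL))

# Drop monomorphic markers so that column scaling is defined.
geno <- geno[, apply(geno, 2, sd) > 0, drop = FALSE]

# Center and scale each marker, then average the cross-products.
Z <- scale(geno)
K <- tcrossprod(Z) / ncol(Z)

# Phenotype data whose IDs should match the dimnames of K.
pheno <- data.frame(SUBJECT.NAME = ids, y = rnorm(6),
                    stringsAsFactors = FALSE)

k_symmetric <- isSymmetric(K)
k_min_eigen <- min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
ids_match <- identical(rownames(K), colnames(K)) &&
  setequal(rownames(K), pheno$SUBJECT.NAME)
```

A smallest eigenvalue that is zero up to numerical error is expected here, because centering the markers removes one dimension.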

model

DEFAULT: "additive". Specifies how to model the founder haplotype probabilities. The additive option specifies use of haplotype dosages, and is most commonly used. The full option regresses the phenotype on the actual diplotype probabilities.
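The relationship between the two representations can be sketched with a toy locus. The example assumes three founders (A, B, C) and made-up diplotype probabilities; a haplotype dosage is the expected number of copies of each founder haplotype, so each row of dosages sums to 2.

```r
founders <- c("A", "B", "C")
diplotypes <- c("AA", "BB", "CC", "AB", "AC", "BC")

# Diplotype probabilities for two individuals at one locus (rows sum to 1).
full_probs <- rbind(
  ind1 = c(0.90, 0.00, 0.00, 0.10, 0.00, 0.00),
  ind2 = c(0.00, 0.00, 0.00, 0.00, 0.50, 0.50)
)
colnames(full_probs) <- diplotypes

# Number of copies (0, 1, or 2) of each founder haplotype in each diplotype.
counts <- sapply(founders, function(f) {
  vapply(strsplit(diplotypes, ""), function(x) sum(x == f), numeric(1))
})
rownames(counts) <- diplotypes

# Expected haplotype counts: the "additive" representation.
dosages <- full_probs %*% counts
print(dosages)
```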

p.value.method

DEFAULT: "LRT". "LRT" specifies a likelihood ratio test, which can test fixed effects in both fixed and mixed effect models. "ANOVA" specifies an F-test, which is only valid in fixed effect models. ANOVA is more conservative in analyses with small sample sizes, where the asymptotic theory underlying the LRT does not hold.
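To make the distinction concrete, here is a sketch of both tests applied to a single simulated locus in a fixed effect model, using base R only. The data, the variable names (`sex`, `dose`), and the one-degree-of-freedom locus effect are assumptions for illustration and do not reproduce the package's internals.

```r
set.seed(1)
n <- 20
dat <- data.frame(sex = rep(c("F", "M"), each = n / 2),
                  dose = rnorm(n))
dat$y <- 0.5 * dat$dose + rnorm(n)

fit0 <- lm(y ~ sex, data = dat)          # null model: covariates only
fit1 <- lm(y ~ sex + dose, data = dat)   # alternative: adds the locus term

# F-test comparing the nested models.
p_anova <- anova(fit0, fit1)[2, "Pr(>F)"]

# Likelihood ratio test: twice the log-likelihood difference,
# referred to a chi-square with 1 degree of freedom.
lrt_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
p_lrt <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

print(c(ANOVA = p_anova, LRT = p_lrt))
```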

use.par

DEFAULT: "h2". The parameterization of the likelihood to be used.

use.multi.impute

DEFAULT: TRUE. Specifies whether to use multiple imputations (TRUE) or ROP, regression on probabilities (FALSE).

num.imp

DEFAULT: 11. If multiple imputations are used, this specifies the number of imputations to perform.

chr

DEFAULT: "all". Specifies which chromosomes to scan.

brute

DEFAULT: TRUE. During the optimization to find maximum likelihood parameters, this specifies that the boundary h2 = 0 is checked explicitly. This is slightly less efficient, but without it the optimization procedure does not directly check the boundary.

use.fix.par

DEFAULT: TRUE. Specifies an approximate fitting of the mixed effect model (Kang et al. 2009). This is much more efficient, as the optimization of h2 only needs to be performed once for the null model rather than at every locus. It is technically less powerful, though in practice it has proven to be almost equal to the exact procedure.
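The idea of fixing h2 from the null model can be sketched as follows. This is a simplified illustration rather than the package's code: h2 is set by hand instead of being estimated, the covariance is assumed to take the form h2 * K + (1 - h2) * I, and each locus is tested with an F-test on whitened data.

```r
set.seed(1)
n <- 30

# Simulated relationship matrix and phenotype (illustrative only).
Z <- matrix(rnorm(n * 100), n)
K <- tcrossprod(Z) / 100
y <- rnorm(n)
loci <- matrix(rnorm(n * 5), n, 5)   # 5 loci, one dosage column each

# Pretend this value was estimated once from the null model.
h2 <- 0.4
V <- h2 * K + (1 - h2) * diag(n)

# Whitening matrix M such that M %*% V %*% t(M) is the identity.
M <- solve(t(chol(V)))

# Transform the outcome and null design once, then reuse them at every locus.
y_w <- drop(M %*% y)
X0_w <- M %*% cbind(intercept = rep(1, n))
fit0 <- lm(y_w ~ 0 + X0_w)

pvals <- apply(loci, 2, function(x) {
  x_w <- drop(M %*% x)
  fit1 <- lm(y_w ~ 0 + X0_w + x_w)
  anova(fit0, fit1)[2, "Pr(>F)"]
})
print(pvals)
```

The saving comes from computing `M` a single time; an exact procedure would re-estimate h2, and therefore the whitening matrix, at each locus.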

seed

DEFAULT: 1. Multiple imputations involve a sampling process of the diplotypes, thus a seed is necessary to produce the same results over multiple runs and different machines.
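The role of the seed can be sketched with a toy sampler. The diplotype labels, probabilities, and the helper `draw_imputations` are hypothetical; the point is only that fixing the seed before sampling makes the imputed diplotypes repeatable.

```r
diplotypes <- c("AA", "AB", "BB")
probs <- c(0.7, 0.2, 0.1)   # made-up diplotype probabilities at one locus

# Hypothetical helper: draw one diplotype per imputation for one individual.
draw_imputations <- function(seed, num.imp = 11) {
  set.seed(seed)
  sample(diplotypes, size = num.imp, replace = TRUE, prob = probs)
}

run1 <- draw_imputations(seed = 1)
run2 <- draw_imputations(seed = 1)

# Same seed, same draws.
same_draws <- identical(run1, run2)
print(same_draws)
```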

pheno.id

DEFAULT: "SUBJECT.NAME". This is the individual-level ID that is associated with data points in the phenotype data. Generally this should be unique for each data point.

geno.id

DEFAULT: "SUBJECT.NAME". The default represents the situation that each genome is unique. Specifying some other column allows for replicate genomes, such as in the CC or CC-RIX.

weights

DEFAULT: NULL. If unspecified, individuals are equally weighted. This option allows for a weighted analysis when using the mean of multiple individuals with the same genome.
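A sketch of how such weights might be prepared when replicate individuals share a genome: outcomes are averaged within a strain, and the number of replicates is kept as the weight. The column name `STRAIN` and the data are made up, and whether the function expects weights on exactly this scale is an assumption to check against the package.

```r
set.seed(1)
raw <- data.frame(
  SUBJECT.NAME = paste0("mouse", 1:7),
  STRAIN = c("s1", "s1", "s1", "s2", "s2", "s3", "s3"),  # shared genome ID
  y = rnorm(7),
  stringsAsFactors = FALSE
)

# One row per genome: the strain mean and the number of replicates behind it.
strain_means <- aggregate(y ~ STRAIN, data = raw, FUN = mean)
strain_means$n <- as.vector(table(raw$STRAIN)[strain_means$STRAIN])

# Weighting each mean by its replicate count recovers the overall mean.
fit <- lm(y ~ 1, data = strain_means, weights = n)
print(c(weighted = unname(coef(fit)), overall = mean(raw$y)))
```

Weighting by replicate count reflects that a mean of n observations has variance proportional to 1/n.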

do.augment

DEFAULT: FALSE. Augments the data with null observations for genotype groups. This is an approximately Bayesian approach to applying a prior to the data, and can help control highly influential data points.

use.full.null

DEFAULT: FALSE. Draws augmented data points from the null model. This allows for the inclusion of null data points that do not influence the estimation of other model parameters as much.

added.data.points

DEFAULT: 1. If augment weights are being used, this specifies how many data points should be added in total.

just.these.loci

DEFAULT: NULL. Specifies a reduced set of loci to fit. If only one locus is specified, the alternative model fit will also be output as fit1.

print.locus.fit

DEFAULT: FALSE. If TRUE, prints a running count of how many loci have been fit so far.

debug.single.fit

DEFAULT: FALSE. If TRUE, a browser() call is activated after the first locus is fit. This option allows developers to more easily debug while still using the actual R package.

Examples
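The following is a hedged sketch of how a call might be assembled. It uses only argument names from the Usage section; the cache path, the toy data frame, and the formula are placeholders, and the call is only attempted if the function is loaded and the path exists.

```r
# Placeholder inputs (hypothetical path and toy data).
scan_args <- list(
  genomecache = "path/to/genomecache",
  data = data.frame(SUBJECT.NAME = c("ind1", "ind2"),
                    y = c(1.2, 0.8),
                    sex = c("F", "M"),
                    stringsAsFactors = FALSE),
  formula = y ~ 1 + sex,
  model = "additive",
  p.value.method = "LRT",
  use.multi.impute = FALSE,
  chr = "all"
)

# Run the scan only when the package function and the cache are available.
if (exists("scan.h2lmm") && dir.exists(scan_args$genomecache)) {
  scan_result <- do.call(scan.h2lmm, scan_args)
}
```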


gkeele/kqtl documentation built on May 17, 2019, 6:06 a.m.