# Fit a Mixed Model Under the Null Hypothesis

### Description

`fitNullMM`

fits a mixed model with random effects specified by their covariance structures; this allows for the inclusion of a polygenic random effect using a kinship matrix or genetic relationship matrix (GRM). The output of `fitNullMM`

can be used to estimate genetic heritability and can be passed to `assocTestMM`

for the purpose of genetic association testing.

### Usage

1 2 3 |

### Arguments

`scanData` |
An object of class |

`outcome` |
A character string specifying the name of the outcome variable in |

`covars` |
A vector of character strings specifying the names of the fixed effect covariates in |

`covMatList` |
A list of matrices specifying the covariance structures of the random effects terms. The column and row names of each of these matrices must match the scanIDs from |

`scan.include` |
A vector of scanIDs for samples to include in the analysis. If NULL, all samples in |

`family` |
A description of the error distribution to be used in the model. The default "gaussian" fits a linear mixed model; see |

`group.var` |
This variable can only be used when |

`start` |
A vector of starting values for the variance component estimation procedure. The function will pick reasonable starting values when left NULL (default). See 'Details' for more information. |

`AIREML.tol` |
The convergence threshold for the Average Information REML (AIREML) procedure used to estimate the variance components of the random effects. See 'Details' for more information. |

`maxIter` |
The maximum number of iterations allowed in the AIREML procedure. |

`dropZeros` |
Logical indicator of whether variance component terms that converge to 0 should be removed from the model; the default is TRUE. See 'Details' for more information. |

`verbose` |
Logical indicator of whether updates from the function should be printed to the console; the default is TRUE. |

### Details

`covMatList`

is used to specify the covariance structures of the random effects terms in the model. For example, to include a polygenic random effect, one matrix in `covMatList`

could be a kinship matrix or a genetic relationship matrix (GRM). As another example, to include household membership as a random effect, one matrix in `covMatList`

should be a 0/1 matrix with a 1 in the [i,j] and [j,i] entries if individuals i and j are in the same household and 0 otherwise; the diagonals of such a matrix should all be 1.

When `family`

is not gaussian, the penalized quasi-likelihood (PQL) approximation to the generalized linear mixed model (GLMM) is fit following the procedure of GMMAT (Chen et al.).

For some outcomes, there may be evidence that different groups of observations have different residual variances, and the standard LMM assumption of homoscedasticity is violated. When `group.var`

is specified, separate (heterogeneous) residual variance components are fit for each unique value of `group.var`

.

Let m be the number of matrices in `covMatList`

and let g be the number of categories in the variable specified by `group.var`

. The length of the `start`

vector must be (m + 1) when `family`

is gaussian and `group.var`

is NULL; (m + g) when `family`

is gaussian and `group.var`

is specified; or m when `family`

is not gaussian.

A Newton-Raphson iterative procedure with Average Information REML (AIREML) is used to estimate the variance components of the random effects. When the Euclidean distance between the new and previous variance component estimates is less than `AIREML.tol`

, the algorithm declares convergence of the estimates. Sometimes a variance component may approach the boundary of the parameter space at 0; step-halving is used to prevent any component from becomming negative. However, when a variance component gets near the 0 boundary, the algorithm can sometimes get "stuck", preventing the other variance components from converging; if `dropZeros`

is TRUE, then variance components that converge to a value less than `AIREML.tol`

will be dropped from the model and the estimation procedure will continue with the remaining variance components.

### Value

A list including:

`varComp` |
The variance component estimates. There is one variance component for each random effect specified in |

`varCompCov` |
The estimated covariance matrix of the variance component estimates given by |

`fixef` |
A data.frame with effect size estimates (betas), standard errors, chi-squared test statistics, and p-values for each of the fixed effect covariates specified in |

`betaCov` |
The estimated covariance matrix of the effect size estimates (betas) of the fixed effect covariates. This can be used for hypothesis tests regarding the fixed effects. |

`fitted.values` |
The fitted values from the mixed model; i.e. W*beta where W is the design matrix and beta are the effect size estimates for the fixed effects. |

`resid.marginal` |
The marginal residuals from the mixed model; i.e. Y - W*beta where Y is the vector of outcome values. |

`eta` |
The linear predictor from the mixed model; i.e. W*beta + Z*u where Z*u specifies the effects of the random effects. |

`resid.conditional` |
The conditional residuals from the mixed model; i.e. Y - W*beta - Z*u. |

`logLikR` |
The restricted log-likelihood value. |

`logLik` |
The log-likelihood value. |

`AIC` |
The Akaike Information Criterion value. |

`RSS` |
The residual sum of squares from the model fit. When |

`workingY` |
The "working" outcome vector. When |

`model.matrix` |
The design matrix for the fixed effect covariates used in the model. |

`cholSigmaInv` |
The Cholesky decomposition of the inverse of the estimated outcome covariance structure. This is used by |

`scanID` |
A vector of scanIDs for the samples used in the analysis. |

`family` |
A character string specifying the family used in the analysis. |

`converged` |
A logical indicator of whether the AIREML procedure for estimating the random effects variance components converged. |

`zeroFLAG` |
A vector of logicals the same length as |

`hetResid` |
A logical indicator of whether heterogeneous residual variance components were used in the model (specified by |

### Author(s)

Matthew P. Conomos

### References

Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedon JC, Redline S, Papanicolaou GJ, Thornton TA, Laurie CC, Rice K and Lin X. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies Using Logistic Mixed Models. (Submitted).

Breslow NE and Clayton DG. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88: 9-25.

Gilmour, A.R., Thompson, R., & Cullis, B.R. (1995). Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 1440-1450.

Gogarten, S.M., Bhangale, T., Conomos, M.P., Laurie, C.A., McHugh, C.P., Painter, I., ... & Laurie, C.C. (2012). GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide Association Studies. Bioinformatics, 28(24), 3329-3331.

### See Also

`varCompCI`

for estimating confidence intervals for the variance components and the proportion of variability (heritability) they explain, `assocTestMM`

for running mixed model genetic association tests using the output from `fitNullMM`

.
`GWASTools`

for a description of the package containing the `ScanAnnotationDataFrame`

class.

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | ```
library(GWASTools)
# file path to GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
# read in GDS data
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
# load saved matrix of KING-robust estimates
data("HapMap_ASW_MXL_KINGmat")
# run PC-AiR
mypcair <- pcair(genoData = HapMap_genoData, kinMat = HapMap_ASW_MXL_KINGmat,
divMat = HapMap_ASW_MXL_KINGmat)
# run PC-Relate
mypcrel <- pcrelate(genoData = HapMap_genoData, pcMat = mypcair$vectors[,1],
training.set = mypcair$unrels)
close(HapMap_genoData)
# generate a phenotype
set.seed(4)
pheno <- 0.2*mypcair$vectors[,1] + rnorm(mypcair$nsamp, mean = 0, sd = 1)
# make ScanAnnotationDataFrame
scanAnnot <- ScanAnnotationDataFrame(data.frame(scanID = mypcrel$sample.id,
pc1 = mypcair$vectors[,1], pheno = pheno))
# make covMatList
covMatList <- list("Kin" = pcrelateMakeGRM(mypcrel))
# fit the null mixed model
nullmod <- fitNullMM(scanData = scanAnnot, outcome = "pheno", covars = "pc1", covMatList = covMatList)
``` |