fgls: Feasible Generalized Least Squares regression with family...
In SAONLIB/RFGLS: Rapid Feasible Generalized Least Squares

Description Usage Arguments Details Value Author(s) References See Also Examples

Jointly estimates the fixed-effects coefficients and residual variance-covariance matrix in a generalized least squares model by minimizing the (multivariate-normal) negative loglikelihood function, via optim() in the R base distribution. The residual variance-covariance matrix is block-diagonal sparse, constructed with bdsmatrix() from the bdsmatrix package.

fgls(fixed, data=parent.frame(), tlist=tlist, sizelist=sizelist,
  med=c("UN","VC"), vmat=NULL, start=NULL, theta=NULL, drop=NULL,
  get.hessian=FALSE, optim.method="BFGS", control=list(), weights=NULL,
  sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)

`fixed`	An object of class 'formula' (or one that can be coerced to that class): a symbolic description of the regression model to be fitted. The RHS of the formula contains the fixed effects of the model.
`data`	An optional data frame, list or environment (or object coercible by `as.data.frame()` to a data frame) containing the variables in the model, as specified in argument fixed. If not found in data the variables are taken from `environment(formula)`, typically the environment from which `fgls()` is called. If it contains a column named `"ID"`, then that column's values will be the row and column names of output element `sigma` (see below, under "Value").
`tlist`	The character vector of the family labels ("famlab") in the data. The length of the vector equals the number of family units. It should be ordered in the same order as the families appear in the data. Object tlist is created by the `gls.batch()` and `gls.batch.get()` functions.
`sizelist`	The integer vector of the family sizes in the data. The length of the vector equals the number of family units. It should be ordered in the same order as the families appear in the data. Object sizelist is created by the `gls.batch()` and `gls.batch.get()` functions.
`med`	A character string, either `"UN"` or `"VC"`, which are the two RFGLS methods described by Li et al. (2011). If `"UN"` (default), which stands for "unstructured," the residual covariance matrix will be constructed from, at most, 12 parameters (8 correlations and 4 variances). If `"VC"`, which stands for "variance components," the residual covariance matrix will be constructed from, at most, 3 variance components (additive-genetic, shared-environmental, and unshared-environmental).
`vmat`	The previously estimated (or known) residual covariance matrix (for conducting Rapid FGLS). If it is `NULL` (default), the matrix will either be jointly estimated with the fixed-effects coefficients (FGLS), or will be constructed values supplied to theta. If not `NULL`, must be either (1) an object of class 'bdsmatrix' (from the `bdsmatrix` package), or (2) a character string specifying the filename and path for a single-column text file, with header, containing the "blocks" of a `bdsmatrix`. Note that at least one of vmat and theta must be `NULL`.
`start`	A numeric vector of initial values for the residual-covariance parameters. If `NULL` (default), generic start values are used. Otherwise, it must be a numerical vector of either length 12 if `med="UN"`, or of length 3 if `med="VC"`. Each vector element provides the initial value for the parameter corresponding to its index (serial position). Values of `NA` are accepted, and will be replaced with the generic start value for that parameter. See below under "Details" for which parameters correspond to which indices. Users should bear in mind that especially poor start values can cause optimization to fail. Ignored if no residual-covariance parameters are estimated.
`theta`	A numeric vector of previously estimated (or known) residual-covariance parameters. Defaults to `NULL`, in which case it is ignored. Otherwise, it must be a numerical vector of of either length 12 if `med="UN"`, or of length 3 if `med="VC"`. Each vector element provides the value for the parameter corresponding to its index (serial position). Values of `NA` are accepted for extraneous parameters. See below under "Details" for which parameters correspond to which indices. The residual covariance matrix is constructed from the elements of theta, exactly as-is. Note that at least one of vmat and theta must be `NULL`.
`drop`	An integer vector of indices (serial positions) specifying which residual-covariance parameters to drop. Dropped parameters are not estimated. In addition to those specified by drop, `fgls()` automatically identifies which parameters are completely unidentified from the data (i.e., zero observations in the data are informative about them), and drops them as well. If drop is `NULL` (default), no user-specified parameters are dropped. Otherwise, it must be a vector of integers, either between 1 and 12 (inclusive) if `med="UN"`, or between 1 and 2 (inclusive) if `med="VC"`. Note that if a user-specified-dropped parameter ends up being needed to construct the residual covariance matrix, its value is taken to be that of its OLS equivalent: zero for correlations (`med="UN"`) and for the familial variance components (`med="VC"`), and the OLS residual variance for variances (`med="UN"`). It may be prudent to drop parameters when very few observations in the data are informative about them, which can at least save computation time. See below under "Details" for which parameters correspond to which indices. Ignored if no residual-covariance parameters are estimated.
`get.hessian`	Logical; default is `FALSE`. If `TRUE`, `fgls()` will include the Hessian matrix from `optim()` in its output list. Otherwise, the entry 'hessian' in the list will be `NULL`. Ignored if no residual-covariance parameters are estimated.
`optim.method`	Character string, passed as method to `optim()`. Ignored if no residual-covariance parameters are estimated. The default, `"BFGS"`, is usually fast and is recommended for general use. If method `"L-BFGS-B"` is used, `fgls()` will supply `optim()` with reasonable box constraints on the parameters, intended for use with `optim()`'s default control parameters (see argument control below). Method `"BFGS"` (the default) may fail when any of the residual-covariance parameters are poorly identified from the data. In these cases, it may be wise simply to drop the offending parameters. Other optimization methods, including `"L-BFGS-B"`, can succeed where `"BFGS"` fails. Method `"SANN"` should not generally be relied upon to find the global optimum, but it can sometimes produce reasonable, approximate solutions in instances where no other method works. As a last-resort diagnostic, one can combine `optim.method="SANN"` with `hessian=TRUE`, since the resulting Hessian matrix may provide clues as to which parameters are causing problems.
`control`	A list of control parameters passed to `optim()`, intended for advanced users. The default is also `optim()`'s default, which should be adequate for general use.
`weights`	A numeric vector of weights, with length equal to the number of observations in the data. Defaults to `NULL`.
`sizeLab`	This is an optional argument, and may be eliminated in future versions of this package. Defaults to `NULL`; otherwise, must be a character string. If the number of characters in the string is not equal to the size of the largest family in the data, `fgls()` will produce a warning.
`Mz, Bo, Ad, Mix, indobs`	These arguments are deprecated, and their values are ignored. They are retained in this package version for legacy reasons, but will be eliminated in future versions.

Function fgls() was originally intended to be called automatically, from within gls.batch(). However, calling it directly is likely to be useful to advanced users. The difficulty when directly invoking fgls() is supplying the function with arguments tlist and sizelist. But, these can be obtained easily via gls.batch.get().

When residual-covariance parameters are to be estimated, fgls() will attempt optimization, at most, two times. If the initial attempt fails, fgls() prints a message saying so to the console, and tries a second time. On the second attempt, before each evaluation of the objective function, the blocks composing the block-diagonal residual covariance matrix are forced to be positive definite. This uses nearPD() from the Matrix package, which turns each block matrix into its nearest positive-definite approximation (where "nearest" is meant in a least-squares sense). Forcing positive-definiteness in this way is only used for the second attempt, and not for the initial attempt (which has its own way of ensuring a positive-definite solution), since it slows down optimization and is unnecessary when the parameters are well-identified. Furthermore, it can have consequences the user might not expect. For instance, in fgls()'s output (see below, under "Value"), the elements of the residual covariance matrix sigma might not correspond to the parameter estimates in estimates, or covariances that are supposed to be the same across families might not be so in the actual matrix sigma. Nevertheless, the second attempt may succeed when the initial attempt fails.

When med="UN", the residual covariance matrix is constructed from, at most, 12 parameters–8 correlations and 4 variances. Below is an enumerated list of those 12 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the element name of each estimated parameter as it appears in fgls() output:

"cor(m,f)", correlation between mothers and fathers.
"cor(c/b,m)", correlation between biological offspring and mothers.
"cor(c/b,f)", correlation between biological offspring and fathers.
"cor(c,c)", MZ-twin correlation.
"cor(b,b)", full-sibling (DZ-twin) correlation.
"cor(a,m)", correlation between adoptees and mothers.
"cor(a,f)", correlation between adoptees and fathers.
"cor(a,a)", adoptive-sibling correlation.
"var(O)", offspring variance.
"var(m)", mother variance.
"var(f)", father variance.
"var(ind)", variance for "independent observations."

When med="VC", the residual covariance matrix is constructed from, at most, 3 variance components. Below is an enumerated list of those 3 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the label of each estimated parameter as it appears in fgls() output:

"A", additive-genetic variance.
"C", shared-environmental variance (compound-symmetric within families).
"E", unshared-environmental variance (which cannot be dropped).

Additive-genetic variance contributes to covariance between family members commensurately to the expected proportion of segregating alleles they share: 1.0 for MZ twins, 0.5 for first-degree relatives, 0 for spouses and adoptive relatives. Shared-environmental variance, as defined here, represents covariance between biologically unrelated family members (including spouses).

In package version 1.0, arguments subset and na.action were accepted, and passed to lm(). Neither are accepted any longer. Subsetting should be done before directly calling fgls(); the function handles NA's in the data by what is (in effect) na.action=na.omit.

An object of class 'fgls'. It includes the following components:

`ctable`	Table of coefficients reminiscent of output from `summary.lm()`. Each fixed-effect term (including the intercept) has one row of the table, which are ordered as the terms appear in argument fixed. Each row contains a point estimate, an estimated standard error, a t-statistic, and a two-tailed p-value.
`Rsqd`	The generalized-least-squares coefficient of determination, a la Buse (1973).
`estimates`	The vector of MLEs of the parameters used to construct the residual covariance matrix, ordered as in the lists above, under "Details." Dropped parameters are given value `NA`. If no residual-covariance parameters are estimated, will instead be a single `NA`.
`drop`	A vector of parameter indices, representing which residual-covariance parameters were dropped (not estimated). See above, under "Details," for which parameters correspond to which indices. `NULL` if no parameters were dropped or estimated.
`iter`	`NULL` if no residual-covariance parameters were estimated. Otherwise, a single-row data frame, containing miscellaneous output pertaining to the optimization, specifically, the following named columns: `iterations` (integer): the number of function iterations, as returned from `optim()`. `convergence` (integer): convergence code, as returned from `optim()`; value 0 means that convergence was successful. `message` (character): additional information from the optimizer; a single whitespace means that `optim()` returned a message of `NULL`. `first_try` (logical): Did `fgls()`'s first attempt at optimization succeed? If `FALSE`, then during the second attempt, `fgls()` had to force the block matrices of the residual covariance matrix to be positive-definite, as described above, under "Details."
`loglik`	The negative loglikelihood, at the solution. If the residual-covariance parameters were estimated, then it equals -1 times the maximized joint loglikelihood of those parameters and the regression coefficients. If the residual-covariance parameter values were provided with argument vmat or theta, then it equals -1 times the maximized joint loglikelihood of the regression coefficients, conditional on the values supplied for the residual-covariance parameters.
`sigma`	The residual covariance matrix. It is of class 'bdsmatrix'. Its row and column names are taken from the column named "ID", if any, in argument `data`, otherwise its row and column names are sequential numbers. One of its slots, `sigma@blocks`, can be written to a single-column text file and subsequently read in by `gls.batch()`. Due to its potential size, it is not advised to return `sigma` to R's standard output or print it to the console.
`hessian`	If `get.hessian=TRUE` and residual-covariance parameters were estimated, the Hessian matrix from `optim()` for those parameters; `NULL` otherwise.
`n`	Sample size (i.e., number of individual participants), after excluding those with missing data (`NA`'s).
`df.residual`	Residual degrees of freedom in the feasible generalized-least-squares regression, as returned by `lm()`. Note that it only reflects the number of regression coefficients, and not the number of residual-covariance parameters that were estimated.
`residuals`	Residuals from the feasible generalized-least-squares regression. It is a vector of length `n`, i.e. it is not padded with `NA`'s for participants with missing data.
`fitted.values`	Predicted phenotype scores from the feasible generalized-least-squares regression. It is a vector of length `n`, i.e. it is not padded with `NA`'s for participants with missing data.
`variance`	The estimated covariance matrix for (the sampling distribution of) the fixed-effects regression coefficients.
`call`	Echo of `fgls()` function call.

Function fgls() also prints to console the estimates of non-dropped residual-covariance parameters (if any).

Xiang Li lixxx554@umn.edu, Robert M. Kirkpatrick kirk0191@umn.edu, and Saonli Basu saonli@umn.edu .

Li X, Basu S, Miller MB, Iacono WG, McGue M: A Rapid Generalized Least Squares Model for a Genome-Wide Quantitative Trait Association Analysis in Families. Human Heredity 2011;71:67-82 (DOI: 10.1159/000324839)

Buse, A: Goodness of Fit in Generalized Least Squares Estimation The American Statistician 1973;27:106-108

gls.batch

data(pheno)
data(geno)
data(map)
data(pedigree)
data(rescovmtx)
foo <- gls.batch.get(
  phenfile=pheno,genfile=data.frame(t(geno)),pedifile=pedigree,
  covmtxfile.in=NULL,theta=NULL,snp.names=map[,2],input.mode=c(1,2,3),
  pediheader=FALSE,pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ",
  phen="Zscore",covars="IsFemale",med=c("UN","VC"),
  outfile,col.names=TRUE,return.value=FALSE,
  covmtxfile.out=NULL,
  covmtxparams.out=NULL,
  sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)

bar <- fgls(
  Zscore ~ rs3934834 + IsFemale, data=foo$test.dat, tlist=foo$tlist,
  sizelist=foo$sizelist,med=c("UN","VC"), 
  vmat=rescovmtx, #<--Resid. cov. matrix from fgls onto IsFemale only.
  start=NULL, theta=NULL, drop=NULL, get.hessian=FALSE, 
  optim.method="BFGS", control=list(), weights=NULL,
  sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)

bar$ctable
##  To simultaneously estimate residual covariance matrix
##  and regression coefficients for rs3934834 & IsFemale,
##  use the same syntax, except with vmat = NULL .