Jointly estimates the fixed-effects coefficients and residual variance-covariance matrix in a generalized least squares model by minimizing the (multivariate-normal) negative loglikelihood function, via optim() in the R base distribution. The residual variance-covariance matrix is block-diagonal sparse, constructed with bdsmatrix() from the bdsmatrix package.
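The joint estimation described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy data, the two-member family blocks, and the log/tanh reparameterization used to keep the block positive definite are all assumptions made for illustration, and plain matrices stand in for bdsmatrix objects.

```r
# Toy data (illustrative only): 200 families of 2 members each, with a
# within-family covariance block equal to v * [[1, r], [r, 1]].
set.seed(1)
n_fam <- 200
x <- rnorm(2 * n_fam)
true_block <- 2 * matrix(c(1, 0.5, 0.5, 1), 2, 2)
# Rows of Z %*% chol(S) have covariance S; t() then as.vector() puts the
# errors in family-major order (family 1 member 1, family 1 member 2, ...).
e <- as.vector(t(matrix(rnorm(2 * n_fam), n_fam, 2) %*% chol(true_block)))
y <- 1 + 0.3 * x + e
X <- cbind(1, x)

# Multivariate-normal negative loglikelihood, jointly in the regression
# coefficients and the two covariance parameters.
negloglik <- function(par) {
  beta <- par[1:2]
  v <- exp(par[3])    # variance kept positive (assumed parameterization)
  r <- tanh(par[4])   # correlation kept inside (-1, 1)
  block <- v * matrix(c(1, r, r, 1), 2, 2)
  block_inv <- solve(block)
  res <- matrix(y - X %*% beta, ncol = 2, byrow = TRUE)  # one row per family
  quad <- sum((res %*% block_inv) * res)   # sum of per-family quadratic forms
  logdet <- n_fam * log(det(block))        # log-determinant of the full matrix
  0.5 * (length(y) * log(2 * pi) + logdet + quad)
}

fit <- optim(c(0, 0, 0, 0), negloglik, method = "BFGS")
beta_hat <- fit$par[1:2]
v_hat <- exp(fit$par[3])
r_hat <- tanh(fit$par[4])
```

Because the covariance matrix is block-diagonal, the quadratic form and log-determinant can be accumulated family by family, so the full matrix never has to be inverted.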
fixed |
An object of class 'formula' (or one that can be coerced to that class): a symbolic description of the regression model to be fitted. The RHS of the formula contains the fixed effects of the model. |
data |
An optional data frame, list or environment (or object coercible by |
tlist |
The character vector of the family labels ("famlab") in the data. The length of the vector equals the number of family units. It should be ordered in the same order as the families appear in the data. Object tlist is created by the |
sizelist |
The integer vector of the family sizes in the data. The length of the vector equals the number of family units. It should be ordered in the same order as the families appear in the data. Object sizelist is created by the |
med |
A character string, either |
vmat |
The previously estimated (or known) residual covariance matrix (for conducting Rapid FGLS). If it is |
start |
A numeric vector of initial values for the residual-covariance parameters. If |
theta |
A numeric vector of previously estimated (or known) residual-covariance parameters. Defaults to |
drop |
An integer vector of indices (serial positions) specifying which residual-covariance parameters to drop. Dropped parameters are not estimated. In addition to those specified by drop, |
get.hessian |
Logical; default is |
optim.method |
Character string, passed as method to Method |
control |
A list of control parameters passed to |
weights |
A numeric vector of weights, with length equal to the number of observations in the data. Defaults to |
sizeLab |
This is an optional argument, and may be eliminated in future versions of this package. Defaults to |
Mz, Bo, Ad, Mix, indobs |
These arguments are deprecated, and their values are ignored. They are retained in this package version for legacy reasons, but will be eliminated in future versions. |
Function fgls() was originally intended to be called automatically, from within gls.batch(). However, calling it directly is likely to be useful to advanced users. The difficulty when directly invoking fgls() is supplying the function with arguments tlist and sizelist, but these can be obtained easily via gls.batch.get().
When residual-covariance parameters are to be estimated, fgls() will attempt optimization, at most, two times. If the initial attempt fails, fgls() prints a message saying so to the console, and tries a second time. On the second attempt, before each evaluation of the objective function, the blocks composing the block-diagonal residual covariance matrix are forced to be positive definite. This uses nearPD() from the Matrix package, which turns each block matrix into its nearest positive-definite approximation (where "nearest" is meant in a least-squares sense). Forcing positive-definiteness in this way is only used for the second attempt, and not for the initial attempt (which has its own way of ensuring a positive-definite solution), since it slows down optimization and is unnecessary when the parameters are well-identified. Furthermore, it can have consequences the user might not expect. For instance, in fgls()'s output (see below, under "Value"), the elements of the residual covariance matrix sigma might not correspond to the parameter estimates in estimates, or covariances that are supposed to be the same across families might not be so in the actual matrix sigma. Nevertheless, the second attempt may succeed when the initial attempt fails.
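The effect of forcing a block to be positive definite can be seen in a small sketch using nearPD() from the Matrix package. The 3 x 3 block below is a made-up example of what poorly identified parameters might imply; it is not taken from fgls() itself.

```r
library(Matrix)  # recommended package distributed with R

# Hypothetical block: unit variances with correlations that cannot coexist.
block <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), 3, 3, byrow = TRUE)
min_eig_before <- min(eigen(block, symmetric = TRUE)$values)  # negative

# Nearest positive-definite approximation of the block.
fixed <- as.matrix(nearPD(block)$mat)
min_eig_after <- min(eigen(fixed, symmetric = TRUE)$values)   # positive

# The repaired entries differ from the values the parameters implied.
max_change <- max(abs(fixed - block))
```

The nonzero max_change is the reason sigma may disagree with estimates after a second attempt: the matrix actually used in the likelihood is the repaired one, not the one the parameters imply.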
When med="UN", the residual covariance matrix is constructed from, at most, 12 parameters: 8 correlations and 4 variances. Below is an enumerated list of those 12 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the element name of each estimated parameter as it appears in fgls() output:
1. "cor(m,f)", correlation between mothers and fathers.
2. "cor(c/b,m)", correlation between biological offspring and mothers.
3. "cor(c/b,f)", correlation between biological offspring and fathers.
4. "cor(c,c)", MZ-twin correlation.
5. "cor(b,b)", full-sibling (DZ-twin) correlation.
6. "cor(a,m)", correlation between adoptees and mothers.
7. "cor(a,f)", correlation between adoptees and fathers.
8. "cor(a,a)", adoptive-sibling correlation.
9. "var(O)", offspring variance.
10. "var(m)", mother variance.
11. "var(f)", father variance.
12. "var(ind)", variance for "independent observations."
When med="VC", the residual covariance matrix is constructed from, at most, 3 variance components. Below is an enumerated list of those 3 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the label of each estimated parameter as it appears in fgls() output:
1. "A", additive-genetic variance.
2. "C", shared-environmental variance (compound-symmetric within families).
3. "E", unshared-environmental variance (which cannot be dropped).
Additive-genetic variance contributes to covariance between family members commensurately to the expected proportion of segregating alleles they share: 1.0 for MZ twins, 0.5 for first-degree relatives, 0 for spouses and adoptive relatives. Shared-environmental variance, as defined here, represents covariance between biologically unrelated family members (including spouses).
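To make the two parameterizations concrete, here is a hedged sketch of how one family's covariance block could be assembled under each. The parameter values, the family compositions, and the assembly code are illustrative assumptions; the package's internal construction may differ, in particular in how it treats "independent observations."

```r
# --- med = "UN": named parameters in index order (values are made up) ---
theta_un <- c("cor(m,f)" = 0.20, "cor(c/b,m)" = 0.30, "cor(c/b,f)" = 0.25,
              "cor(c,c)" = 0.60, "cor(b,b)" = 0.35, "cor(a,m)" = 0.10,
              "cor(a,f)" = 0.10, "cor(a,a)" = 0.15,
              "var(O)" = 1.2, "var(m)" = 1.0, "var(f)" = 0.9, "var(ind)" = 1.1)
idx_bb <- match("cor(b,b)", names(theta_un))  # serial position of a parameter

# Hypothetical family: mother, father, and two full siblings.
members <- c("m", "f", "b1", "b2")
R <- diag(4)
dimnames(R) <- list(members, members)
R["m", "f"] <- R["f", "m"] <- theta_un[["cor(m,f)"]]
for (b in c("b1", "b2")) {
  R[b, "m"] <- R["m", b] <- theta_un[["cor(c/b,m)"]]
  R[b, "f"] <- R["f", b] <- theta_un[["cor(c/b,f)"]]
}
R["b1", "b2"] <- R["b2", "b1"] <- theta_un[["cor(b,b)"]]
sds <- unname(sqrt(theta_un[c("var(m)", "var(f)", "var(O)", "var(O)")]))
block_un <- R * outer(sds, sds)  # scale correlations to covariances

# --- med = "VC": block = A * K + C * J + E * I ---
vc <- c(A = 0.5, C = 0.2, E = 0.3)
members_vc <- c("m", "f", "c1", "c2")  # hypothetical: parents plus MZ twins
# K holds the expected proportion of shared segregating alleles:
# 1.0 for MZ twins, 0.5 for first-degree relatives, 0 for spouses.
K <- matrix(c(1.0, 0.0, 0.5, 0.5,
              0.0, 1.0, 0.5, 0.5,
              0.5, 0.5, 1.0, 1.0,
              0.5, 0.5, 1.0, 1.0), 4, 4, byrow = TRUE,
            dimnames = list(members_vc, members_vc))
# Assumption: C is added to every pair within the family (compound symmetry),
# and E only to the diagonal.
block_vc <- vc[["A"]] * K + vc[["C"]] * matrix(1, 4, 4) + vc[["E"]] * diag(4)
```

In the VC block the spousal covariance equals C alone, which matches the statement that shared-environmental variance represents covariance between biologically unrelated family members.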
In package version 1.0, arguments subset and na.action were accepted, and passed to lm(). Neither is accepted any longer. Subsetting should be done before directly calling fgls(); the function handles NAs in the data by what is (in effect) na.action=na.omit.
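As a rough illustration of this behavior, the sketch below subsets a data frame first and then shows which rows survive under na.omit. It uses base R's model.frame() purely to mimic the effect; it does not call fgls(), and the data frame (including the hypothetical age column) is invented.

```r
# Invented data; column "age" is hypothetical and used only for subsetting.
dat <- data.frame(FAMID    = c(1, 1, 2, 2, 3, 3),
                  Zscore   = c(0.1, NA, -0.4, 0.8, 1.2, 0.3),
                  IsFemale = c(1, 0, 0, 1, NA, 1),
                  age      = c(30, 32, 41, 44, 29, 52))

# Subset before the model-fitting call, not inside it.
dat_sub <- dat[dat$age < 50, ]

# Rows with an NA in any model variable are dropped, as under na.omit.
used <- model.frame(Zscore ~ IsFemale, data = dat_sub, na.action = na.omit)
n_used <- nrow(used)
```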
An object of class 'fgls'. It includes the following components:
ctable |
Table of coefficients reminiscent of output from |
Rsqd |
The generalized-least-squares coefficient of determination, a la Buse (1973). |
estimates |
The vector of MLEs of the parameters used to construct the residual covariance matrix, ordered as in the lists above, under "Details." Dropped parameters are given value |
drop |
A vector of parameter indices, representing which residual-covariance parameters were dropped (not estimated). See above, under "Details," for which parameters correspond to which indices. |
iter |
|
loglik |
The negative loglikelihood, at the solution. If the residual-covariance parameters were estimated, then it equals -1 times the maximized joint loglikelihood of those parameters and the regression coefficients. If the residual-covariance parameter values were provided with argument vmat or theta, then it equals -1 times the maximized joint loglikelihood of the regression coefficients, conditional on the values supplied for the residual-covariance parameters. |
sigma |
The residual covariance matrix. It is of class 'bdsmatrix'. Its row and column names are taken from the column named "ID", if any, in argument |
hessian |
If |
n |
Sample size (i.e., number of individual participants), after excluding those with missing data ( |
df.residual |
Residual degrees of freedom in the feasible generalized-least-squares regression, as returned by |
residuals |
Residuals from the feasible generalized-least-squares regression. It is a vector of length |
fitted.values |
Predicted phenotype scores from the feasible generalized-least-squares regression. It is a vector of length |
variance |
The estimated covariance matrix for (the sampling distribution of) the fixed-effects regression coefficients. |
call |
Echo of |
Function fgls() also prints to console the estimates of non-dropped residual-covariance parameters (if any).
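Several of the returned components have simple closed forms once the residual covariance matrix is fixed. The sketch below computes analogues of them in base R for a toy data set. It is an assumption-laden illustration, not the package's code: the data are simulated, a dense matrix replaces the bdsmatrix, and the R-squared follows my reading of Buse (1973), in which the total sum of squares is taken about a weighted mean.

```r
set.seed(2)
n <- 6
# Toy residual covariance: 3 families of 2, within-family correlation 0.4.
V <- kronecker(diag(3), matrix(c(1, 0.4, 0.4, 1), 2, 2))
X <- cbind(1, rnorm(n))
y <- as.vector(X %*% c(0.5, 1) + t(chol(V)) %*% rnorm(n))

Vinv <- solve(V)
XtVinv <- t(X) %*% Vinv
variance <- solve(XtVinv %*% X)        # covariance of coefficient estimates
beta <- variance %*% XtVinv %*% y      # GLS coefficients
fitted_values <- as.vector(X %*% beta)
res <- y - fitted_values
df_residual <- n - ncol(X)

# Negative loglikelihood at the solution, conditional on V.
quad <- as.numeric(t(res) %*% Vinv %*% res)
logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
negloglik <- 0.5 * (n * log(2 * pi) + logdet + quad)

# Buse-style R-squared: weighted residual sum of squares over the weighted
# total sum of squares about the GLS-weighted mean of y.
ybar_w <- sum(Vinv %*% y) / sum(Vinv)
tot <- as.numeric(t(y - ybar_w) %*% Vinv %*% (y - ybar_w))
Rsqd <- 1 - quad / tot
```

The same coefficients result from ordinary least squares on data pre-multiplied by the inverse of the transposed Cholesky factor of V, which is a convenient check on the algebra.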
Xiang Li lixxx554@umn.edu, Robert M. Kirkpatrick kirk0191@umn.edu, and Saonli Basu saonli@umn.edu.
Li X, Basu S, Miller MB, Iacono WG, McGue M: A Rapid Generalized Least Squares Model for a Genome-Wide Quantitative Trait Association Analysis in Families. Human Heredity 2011;71:67-82 (DOI: 10.1159/000324839)
Buse A: Goodness of Fit in Generalized Least Squares Estimation. The American Statistician 1973;27:106-108
data(pheno)
data(geno)
data(map)
data(pedigree)
data(rescovmtx)
foo <- gls.batch.get(
phenfile=pheno,genfile=data.frame(t(geno)),pedifile=pedigree,
covmtxfile.in=NULL,theta=NULL,snp.names=map[,2],input.mode=c(1,2,3),
pediheader=FALSE,pedicolname=c("FAMID","ID","PID","MID","SEX"),
sep.phe=" ",sep.gen=" ",sep.ped=" ",
phen="Zscore",covars="IsFemale",med=c("UN","VC"),
outfile,col.names=TRUE,return.value=FALSE,
covmtxfile.out=NULL,
covmtxparams.out=NULL,
sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
bar <- fgls(
Zscore ~ rs3934834 + IsFemale, data=foo$test.dat, tlist=foo$tlist,
sizelist=foo$sizelist,med=c("UN","VC"),
vmat=rescovmtx, #<--Resid. cov. matrix from fgls onto IsFemale only.
start=NULL, theta=NULL, drop=NULL, get.hessian=FALSE,
optim.method="BFGS", control=list(), weights=NULL,
sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
bar$ctable
## To simultaneously estimate residual covariance matrix
## and regression coefficients for rs3934834 & IsFemale,
## use the same syntax, except with vmat = NULL .