Fits a generalized least-squares regression model to test association between a quantitative phenotype and all SNPs in a genotype file, one at a time, via Rapid Feasible Generalized Least Squares. For each SNP, genotype is treated as a fixed effect, and the residual variance-covariance matrix is also estimated. In each trait-SNP association test, the fgls() function is used for parameter estimation.
The arguments to gls.batch() may be regarded as belonging to four groups:
those concerning how to load the input (phenfile, genfile, pedifile, covmtxfile.in, theta, snp.names, input.mode, pediheader, pedicolname, sep.phe, sep.gen, sep.ped);
those concerning what to do with the input, that is, the actual analysis (phen, covars, med);
those concerning how to provide the output (outfile, col.names, return.value, covmtxfile.out, covmtxparams.out);
and those that merely trigger optional checks on the input (sizeLab, Mz, Bo, Ad, Mix, indobs).
gls.batch(phenfile,genfile,pedifile,covmtxfile.in=NULL,theta=NULL,
  snp.names=NULL,input.mode=c(1,2,3),pediheader=FALSE,
  pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ",
  phen,covars=NULL,med=c("UN","VC"),
  outfile,col.names=TRUE,return.value=FALSE,
  covmtxfile.out=NULL,
  covmtxparams.out=NULL,
  sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
phenfile
This can be either (1) a character string specifying a phenotype file on disk which includes the phenotypes and other covariates, or (2) a data frame object containing the same data. In either case, the data must be appropriately structured. See below under "Details."
genfile
This can be Note that genotype scores need not be integers; they can also be numeric. So,
pedifile
This can be either (1) a character string specifying the pedigree file corresponding to genfile, to be read from disk, or (2) a data frame object containing this pedigree information. At minimum, pedifile must have a column of subject IDs, named "ID". The pedigree file or data frame can also include other columns of pedigree information, like father's ID, mother's ID, etc. Argument pediheader (see below) is an indicator of whether the pedigree file on disk has a header or not, with default as FALSE.
covmtxfile.in
Optional; can be either (1) a character string specifying a file on disk from which the residual variance-covariance matrix is to be read, or (2) the matrix itself. If
theta
An optional vector of previously estimated (or known) residual-covariance parameters. Defaults to NULL.
snp.names
An optional character vector with length equal to the number of markers in genfile, providing names for those markers. Defaults to NULL.
input.mode
Either 1 (default), 2, or 3, which tells
pediheader
A logical indicator specifying whether the pedigree file to be read from disk has a header row, to ensure it is read in correctly. Even if
pedicolname
A vector of character strings giving the column names that
sep.phe
Separator character of the phenotype file to be read from disk. Defaults to a single space.
sep.gen
Separator character of the genotype file to be read from disk. Defaults to a single space.
sep.ped
Separator character of the pedigree file. Defaults to a single space.
phen
A character string specifying the phenotype (column name) in the phenotype file to be analyzed.
covars
A character string or character vector that holds the (column) names of the covariates, in the phenotype file, to be used in the regression model. Defaults to NULL.
med
A character string, either "UN" or "VC".
outfile
Either a character string specifying the path and filename for the output file to be written, or NULL. Users are warned that if a file with the same path and filename already exists,
col.names
A logical indicator specifying whether to write column names in the output file to be written to disk. Defaults to TRUE.
return.value
A logical indicator specifying whether function
covmtxfile.out
An optional character string specifying the filename and path to which the residual covariance matrix, if it is to be constructed (i.e., if Users are warned that if a file with the same path and filename already exists,
covmtxparams.out
An optional character string specifying the filename and path to which the vector of residual-covariance parameters, if they are to be estimated (i.e., if covmtxfile.in and theta are both Users are warned that if a file with the same path and filename already exists,
sizeLab
This is an optional argument, and may be eliminated in future versions of this package. Defaults to NULL.
Mz, Bo, Ad, Mix
These are optional logical indicators that specify whether families containing MZ twins (Mz; family-type 1), DZ twins or full siblings (Bo; family-types 2 and 4), two adoptees (Ad; family-type 3), or 1 biological offspring and 1 adoptee (Mix; family-type 5) are present in the data. The values of each are checked against the actual family types present, after loading and merging the data and trimming out incomplete cases, and a warning is generated for each mismatch. If any of these four arguments is
indobs
An optional logical indicator of whether there are "independent observations" who do not fit into a four-person nuclear family present in the data. After loading and merging the data and trimming out incomplete cases, the value of indobs is checked against whether such individuals are actually present, and a warning is generated in case of a mismatch. If
Reference is frequently made throughout this documentation to the "phenotype file," the "genotype file," and so forth, because gls.batch() was intended to be used with potentially large data files to be read from disk. This should be evident from the presence of the word "file" in the names of many of this function's arguments, and the fact that all of those arguments may be character strings providing a filename and path. However, it can also accept the data if the file has already been loaded into R's workspace as a data frame object, in which case "the [whatever] file" should be taken to refer to such a data frame. For details specific to each argument, see above.
The function gls.batch() first reads in the files and merges them into a data frame with columns of family-structure information, phenotype, covariates, and genotypes. Then, it creates a tlist vector and a sizelist vector, which comprise the family labels and family sizes in the data. Finally, it carries out single-SNP association analyses for all the SNPs in the genotype file.
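The merging step can be pictured with a small base-R sketch. This is only an illustration under stated assumptions, not the package's internal code: the toy data frames, the names family_labels and family_sizes, and the layout of the genotype table (one row per participant, one column per SNP, rows in pedigree order) are all hypothetical.

```r
# Hypothetical toy inputs; none of these objects ship with the package.
pheno_toy <- data.frame(
  FAMID  = c(1, 1, 2, 2, 2),
  ID     = c(101, 102, 201, 202, 203),
  Zscore = c(0.3, -1.2, 0.8, 0.1, -0.5)
)
pedi_toy <- data.frame(ID = c(101, 102, 201, 202, 203))
# Assumed layout: one row per participant, in the same order as pedi_toy.
geno_toy <- data.frame(snp1 = c(0, 1, 2, 1, 0), snp2 = c(1, 1, 0, 2, 1))

# Attach IDs to the genotypes, then merge with the phenotypes by ID.
merged <- merge(pheno_toy, cbind(pedi_toy, geno_toy), by = "ID")

# Family labels and family sizes, in the spirit of tlist and sizelist.
family_labels <- names(table(merged$FAMID))
family_sizes  <- as.vector(table(merged$FAMID))
```

Here family_sizes records two members in family 1 and three in family 2; how gls.batch() actually builds its tlist and sizelist vectors may differ in detail.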
At the bare minimum, the phenotype file must contain columns named "ID", "FAMID", and whatever character string is supplied to phen. These columns respectively contain individual IDs, family IDs, and phenotype scores; individual IDs must be unique.
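Before calling gls.batch(), these minimum requirements can be checked by hand. The following is a minimal sketch in base R; the data frame pheno_toy and the helper variable names are hypothetical and used only for illustration.

```r
# Hypothetical toy phenotype data; "Zscore" plays the role of phen.
pheno_toy <- data.frame(
  FAMID  = c(1, 1, 2, 2),
  ID     = c(101, 102, 201, 202),
  Zscore = c(0.3, -1.2, 0.8, 0.1)
)
phen <- "Zscore"

# Columns that must be present at the bare minimum.
required <- c("ID", "FAMID", phen)
missing_cols <- setdiff(required, names(pheno_toy))

# Individual IDs must be unique.
ids_unique <- anyDuplicated(pheno_toy$ID) == 0

if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
if (!ids_unique) {
  stop("Individual IDs are not unique.")
}
```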
At the bare minimum, the pedigree file need only contain a column consisting of unique individual IDs, corresponding to the label "ID" in pedicolname. The number of participants in the pedigree file must equal the number of participants in the genotype file, with participants ordered the same way in both files. However, the default value for argument pedicolname (see above) assumes five columns, in the familiar "pedigree table" format.
The phenotype file or pedigree file may also contain the two key family-structure variables, "FTYPE" (family-type) and "INDIV" (individual code). If both contain these variables, then by default, they are read from the phenotype file (but see argument input.mode above). There are six recognized family types, which are distinguished primarily by how the offspring in the family are related to one another:
FTYPE=1, containing MZ twins;
FTYPE=2, containing DZ twins;
FTYPE=3, containing adoptees;
FTYPE=4, containing non-twin full siblings;
FTYPE=5, "mixed" families containing one biological offspring and one adoptee;
FTYPE=6, containing "independent observations" who do not fit into a four-person nuclear family.
It is assumed that all offspring except adoptees are biological children of the parents in the family. The four individual codes are:
INDIV=1 is for "Offspring #1;"
INDIV=2 is for "Offspring #2;"
INDIV=3 is for mothers;
INDIV=4 is for fathers.
The distinction between "Offspring #1" and "#2" is mostly arbitrary, except that in "mixed" families (FTYPE=5), the biological offspring MUST have INDIV=1, and the adopted offspring, INDIV=2. If the phenotype file contains variables "FTYPE" and "INDIV", it should be ordered by family ID ("FAMID"), and by individual code "INDIV" within family ID. Note that gls.batch() treats participants with FTYPE=6 as the sole members of their own family units, and not as part of the family corresponding to their family ID.
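The required ordering can be produced with base R's order(). The sketch below uses a hypothetical toy data frame (pheno_toy) whose rows start out shuffled; only the column names "FAMID", "INDIV", "FTYPE", and "ID" come from this documentation.

```r
# Hypothetical toy phenotype data with family-structure variables,
# deliberately out of order.
pheno_toy <- data.frame(
  FAMID = c(2, 1, 2, 1, 1),
  INDIV = c(3, 2, 1, 1, 4),
  FTYPE = c(4, 1, 4, 1, 1),
  ID    = c(203, 102, 201, 101, 104)
)

# Order by family ID, then by individual code within family ID.
pheno_sorted <- pheno_toy[order(pheno_toy$FAMID, pheno_toy$INDIV), ]
rownames(pheno_sorted) <- NULL
```

After sorting, each family's rows are contiguous and run from offspring (INDIV 1 and 2) to mother (3) to father (4).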
If neither the phenotype nor pedigree file contains "FTYPE" and "INDIV", gls.batch() will construct them via FSV.frompedi().
When one is conducting parallel analyses on a computing array, judicious use of arguments covmtxfile.in, theta, covmtxparams.out, and covmtxfile.out can save time. For example, suppose one is analyzing different SNP sets in parallel but using a common phenotype file for all. In this case, one could calculate the residual covariance matrix ahead of time and write it to a file. Then, use the same filename and path for argument covmtxfile.in, for all jobs running in parallel. The matrix can be calculated by using gls.batch.get() and then fgls(). One could similarly obtain the residual-covariance parameters ahead of time, and supply them as a vector to theta in all jobs running in parallel.
If return.value=FALSE, then gls.batch() simply returns NULL. If return.value=TRUE and genfile=NULL, then gls.batch() returns the fgls() output from a regression of the phenotype onto an intercept and covariates (if any). If return.value=TRUE and genfile is non-NULL, then gls.batch() returns a data frame containing the results of the single-SNP analyses, 1 row per SNP. Specifically, this data frame includes the following named columns:
snp (character): the names of the SNPs; equal to snp.names if any were supplied.
coef (numeric): the regression coefficients of the SNPs.
se (numeric): estimated standard errors of SNPs' regression coefficients.
t.stat (numeric): t-statistics, i.e. regression coefficients divided by their estimated standard errors.
df (integer): degrees-of-freedom (see df.residual, from fgls() output).
pval (numeric): two-tailed p-values, from corresponding t-statistics and degrees-of-freedom.
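The relationship among these columns can be sketched in base R. The numbers below are made up for illustration, and the p-value line is the standard two-tailed t-test computation, which is assumed (not confirmed by this documentation) to match what gls.batch() does internally.

```r
# Made-up coefficients, standard errors, and degrees of freedom for two SNPs.
coef <- c(0.12, -0.40)
se   <- c(0.10, 0.15)
df   <- c(200L, 200L)

# t-statistic: regression coefficient divided by its standard error.
t.stat <- coef / se

# Two-tailed p-value from the t distribution with df degrees of freedom.
pval <- 2 * pt(-abs(t.stat), df = df)

results_toy <- data.frame(snp = c("snp1", "snp2"), coef, se, t.stat, df, pval)
```

The second SNP has the larger absolute t-statistic and therefore the smaller p-value.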
Function gls.batch() also has optional side effects of writing as many as three files to disk, depending on arguments outfile, covmtxfile.out, and covmtxparams.out. Note that if a file is written for outfile, that file will contain the single-SNP analysis results described above.
Xiang Li lixxx554@umn.edu, Robert M. Kirkpatrick kirk0191@umn.edu, and Saonli Basu saonli@umn.edu.
Li X, Basu S, Miller MB, Iacono WG, McGue M: A Rapid Generalized Least Squares Model for a Genome-Wide Quantitative Trait Association Analysis in Families. Human Heredity 2011;71:67-82 (DOI: 10.1159/000324839)
library(RFGLS)   # package providing gls.batch() and the example datasets
# Load the example datasets.
data(pheno)
data(geno)
data(map)
data(pedigree)
data(rescovmtx)
# Run a small GWAS and keep the results in memory instead of writing a file.
minigwas <- gls.batch(
  phenfile=pheno,genfile=data.frame(t(geno)),pedifile=pedigree,
  covmtxfile.in=rescovmtx, #<--Precomputed, to save time.
  theta=NULL,snp.names=map[,2],input.mode=c(1,2,3),pediheader=FALSE,
  pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ",
  phen="Zscore",covars="IsFemale",med=c("UN","VC"),
  outfile=NULL,col.names=TRUE,return.value=TRUE,
  covmtxfile.out=NULL,covmtxparams.out=NULL,
  sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
# Print the data frame of single-SNP results, one row per SNP.
minigwas