trioLR: Trio Logic Regression
In trio: Testing of SNPs and SNP Interactions in Case-Parent Trio Studies


Performs a trio logic regression analysis as proposed by Li et al. (2010), where trio logic regression is an adaptation of logic regression (Ruczinski et al., 2003) for case-parent trio data.

## Default S3 method:
trioLR(x, y, search = c("sa", "greedy", "mcmc"), nleaves = 5, 
   penalty = 0, weights = NULL, control=lrControl(), rand = NA, ...)

## S3 method for class 'trioPrepare'
trioLR(x, ...)

## S3 method for class 'formula'
trioLR(formula, data, recdom = TRUE, ...)

`x`	either an object of class `trioPrepare`, i.e. the output of `trio.prepare`, or a binary matrix consisting of zeros and ones. If the latter, then each column of `x` must correspond to a binary variable (e.g., coding for a dominant or a recessive effect of a SNP), and each row to a case or a pseudo-control, where each trio is represented by a block of four consecutive rows of `x` containing the data for the case and the three matched pseudo-controls (in this order) so that the first four rows of `x` comprise the data for the first trio, rows 5-8 the data for the second trio, and so on. Missing values are not allowed. A convenient way to generate this matrix is to use the function `trio.prepare`. Afterwards, `trioLR` can be directly applied to the output of `trio.prepare`.
`y`	a numeric vector specifying the case-pseudo-control status for the observations in `x` (if `x` is the binary matrix). Since in trio logic regression, cases are coded by a `3` and pseudo-controls by a `0`, `y` is given by `rep(c(3, 0, 0, 0), n.trios)`, where `n.trios` is the number of trios for which genotype data is stored in `x`. Thus, the length of `y` must be equal to the number of rows in `x`. No missing values are allowed in `y`. If not specified, `y` will be automatically generated.
`search`	character string naming the search algorithm that should be used in the search for the best trio logic regression model. By default, i.e. `search = "sa"`, simulated annealing, the standard search algorithm for logic regression, is used. In this case, depending on the length of `nleaves`, either one trio logic regression model is fitted or several trio logic regression models of different sizes are fitted. For details, see `nleaves`. Alternatively, a greedy search can be used by setting `search = "greedy"`, or a Monte Carlo (MC) logic regression analysis (Kooperberg and Ruczinski, 2005) for case-parent trio data can be performed by setting `search = "mcmc"`.
`nleaves`	integer or vector of two integers specifying the maximum number of leaves, i.e. variables, in the logic tree of the trio logic regression model (note that in trio logic regression the model consists of only one logic tree). Must be a single integer if `search = "greedy"` or `search = "mcmc"`. If `search = "sa"`, it can also be a vector of two integers, where the second integer must be larger than the first one. In this case, several trio logic regression models are fitted in which the maximum numbers of leaves range from `nleaves[1]` to `nleaves[2]`.
`penalty`	a non-negative value for the `penalty` parameter used in logic regression. The penalty takes the form `penalty` times the number of leaves in the model. By default, larger models are not penalized. `penalty` is only relevant when one logic regression model is fitted.
`weights`	a numeric vector containing one weight for each trio considered in `x`. Thus, `weights` must contain `nrow(x) / 4` positive values. By default, all trios are equally weighted.
`control`	a list of control parameters for the search algorithms and the logic tree considered when fitting a (trio) logic regression model. For these parameters, see `lrControl`, which is the function that should be used to specify `control`.
`rand`	integer. If specified, the random number generator will be set into a reproducible state.
`formula`	an object of class `formula` describing the model that should be fitted.
`data`	a data frame containing the variables in the model. Each row of `data` must correspond to an observation, and each column, except for the column comprising the response, to a binary variable (coded by 0 and 1) or a factor (for details, see `recdom`). No missing values are allowed in `data`. For a description of the specification of the response, see `y`.
`recdom`	a logical value or vector of length `ncol(data)` specifying whether a SNP should be transformed into two binary dummy variables coding for a recessive and a dominant effect. If `recdom` is `TRUE` (and a logical value), then all factors/variables with three levels will be coded by two dummy variables as described in `make.snp.dummy`. Each level of each of the other factors (also factors specifying a SNP that shows only two genotypes) is coded by one indicator variable. If `recdom` is `FALSE` (and a logical value), each level of each factor is coded by an indicator variable. If `recdom` is a logical vector, all factors corresponding to an entry in `recdom` that is `TRUE` are assumed to be SNPs and transformed into two binary variables as described above. All variables corresponding to entries of `recdom` that are `TRUE` (no matter whether `recdom` is a vector or a value) must be coded either by the integers 1 (coding for the homozygous reference genotype), 2 (heterozygous), and 3 (homozygous variant), or alternatively by the number of minor alleles, i.e. 0, 1, and 2, where no mixing of the two coding schemes is allowed. Thus, it is not allowed that some SNPs are coded by 1, 2, and 3, and others are coded by 0, 1, and 2.
`...`	for the `trioPrepare` and the `formula` method, optional parameters to be passed to the low level function `trioLR.default`, i.e. all arguments of `trioLR.default` except for `x` and `y`. Otherwise, ignored.
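
The layout expected for a binary matrix `x`, together with the matching `y` and `weights`, can be sketched in base R as follows. This is only an illustration under stated assumptions: the genotypes are random toy values rather than real case and pseudo-control data, the names `geno`, `dominant`, `recessive`, and `trio.id` are hypothetical, and the dominant/recessive coding shown here is a plain textbook version that need not match the exact column layout produced by `make.snp.dummy` or `trio.prepare`.

```r
# Toy illustration only: random genotypes stand in for real trio data.
set.seed(1)
n.trios <- 3
n.snps <- 2

# Genotypes coded as the number of minor alleles (0, 1, 2).
# Four consecutive rows per trio: the case first, then its
# three pseudo-controls.
geno <- matrix(sample(0:2, 4 * n.trios * n.snps, replace = TRUE),
               nrow = 4 * n.trios, ncol = n.snps,
               dimnames = list(NULL, paste0("SNP", seq_len(n.snps))))

# Two binary variables per SNP (assumed coding for this sketch):
# dominant  = at least one minor allele,
# recessive = two minor alleles.
dominant <- (geno >= 1) * 1L
recessive <- (geno == 2) * 1L
colnames(dominant) <- paste0(colnames(geno), ".D")
colnames(recessive) <- paste0(colnames(geno), ".R")

# Binary matrix in the form described for the argument x.
x <- cbind(dominant, recessive)

# Response: 3 for each case, 0 for each pseudo-control.
y <- rep(c(3, 0, 0, 0), n.trios)

# One weight per trio, i.e. nrow(x) / 4 values (equal weights here).
weights <- rep(1, nrow(x) / 4)

# Hypothetical helper index showing which rows belong to which trio.
trio.id <- rep(seq_len(n.trios), each = 4)
```

With real data, `trio.prepare` is the more convenient route, since it generates the pseudo-controls from the parental genotypes and `y` is then created automatically.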

Trio logic regression is an adaptation of logic regression to case-parent trio data. Virtually all features for a standard logic regression analysis with the function logreg available in the R package LogicReg are also available for a trio logic regression analysis, either directly via trioLR or via the function trio.permTest for performing permutation tests.

For a detailed, comprehensive description of how to perform a logic regression analysis, and thus a trio logic regression analysis, see the Details section of the help page for the function logreg in the R package LogicReg. For a detailed explanation of how to specify the parameters for simulated annealing, see the man page of the function logreg.anneal.control in the R package LogicReg.

Finally, an example for a trio logic regression analysis is given in the vignette trio available in the R package trio.

An object of class trioLR composed of the same objects as an object of class logreg. For details, see the Value section of the function logreg from the R package LogicReg.

Holger Schwender, holger.schwender@udo.edu

Kooperberg, C. and Ruczinski, I. (2005). Identifying Interacting SNPs Using Monte Carlo Logic Regression. Genetic Epidemiology, 28, 157-170.

Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.

Ruczinski, I., Kooperberg, C., and LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.

logreg, trio.prepare, trio.check, trio.permTest

# Load the simulated data.
data(trio.data)

# Prepare the data in trio.ped1 for a trio logic
# regression analysis by first calling
trio.tmp <- trio.check(dat = trio.ped1)

# and then applying
set.seed(123456)
trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3))

# where we here assume the block structure to be
# c(1, 4, 2, 3), which means that the first LD "block"
# only consists of the first SNP, the second LD block
# consists of the following four SNPs in trio.bin,
# the third block of the following two SNPs,
# and the last block of the last three SNPs.
# set.seed() is specified to make the results reproducible.

# For the application of trio logic regression, some
# parameters of trio logic regression are changed
# to make the following example faster.
my.control <- lrControl(start=1, end=-3, iter=1000, output=-4)

# Please note that typically you should consider many more
# than 1000 iterations (usually, at least a few hundred
# thousand).

# Trio logic regression can then be applied to the trio data in
# trio.ped1 by
lr.out <- trioLR(trio.bin, control=my.control, rand=9876543)

# where we specify rand just to make the results reproducible.
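
The `search` and `nleaves` arguments described above can be varied in the same way. The following is a minimal sketch, not a recommended analysis: it repeats the preparation steps from the example above so that it is self-contained, the object names `lr.sizes` and `lr.greedy` as well as the particular values for `nleaves` are arbitrary choices made for illustration, and the guard with `requireNamespace` is only there so that the code does nothing when the trio package is not installed.

```r
# Sketch only: the trio-specific part runs only if the trio package is installed.
if (requireNamespace("trio", quietly = TRUE)) {
  library(trio)
  data(trio.data)

  # Same preparation steps as in the example above.
  trio.tmp <- trio.check(dat = trio.ped1)
  set.seed(123456)
  trio.bin <- trio.prepare(trio.dat = trio.tmp, blocks = c(1, 4, 2, 3))

  # Few iterations to keep the example fast; far too few for a real analysis.
  my.control <- lrControl(start = 1, end = -3, iter = 1000, output = -4)

  # Simulated annealing with a vector of two integers for nleaves:
  # models with a maximum of 2, 3, and 4 leaves are fitted.
  lr.sizes <- trioLR(trio.bin, nleaves = c(2, 4), control = my.control,
                     rand = 9876543)

  # Greedy search; nleaves must be a single integer here.
  lr.greedy <- trioLR(trio.bin, search = "greedy", nleaves = 4,
                      rand = 9876543)
}
```

Both calls return objects of class `trioLR`, as described in the Value section.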