PersonAlytic: Personalytic, a simplified user interface for linear and...

View source: R/PersonAlytic.r

PersonAlyticR Documentation

Personalytic, a simplified user interface for linear and generalized linear mixed effects models with high throughput options.

Description

A simplified user interface for longitudinal linear mixed effects models (also known as growth models or hierarchical linear models) using nlme and generalized linear mixed effects models using gamlss (not currently implemented). The basic mixed effects model is dv=time+phase+phase*time with random intercepts and random slopes for time. The phase variable is optional. Additional independent variables (or covariates) can be included.

High throughput capabilities are invoked when the model is to be fit to multiple individuals, multiple dependent variables, iterated over multiple target independent variables, or any combination of these three.

The residual correlation structure ARMA(p,q) is automatically selected among p=1,...,P and q=1,...,Q where P and Q are selected by the user. Model selection is done by comparing fit indices using either the BIC or AIC (ARMA models are not generally nested, precluding the use of likelihood ratio tests).

The functional form of the relationship between time and the dependent variable is automatically selected up to time^time_power where time_power is selected by the user. Model selection is done using likelihood ratio tests (LRT) with maximum likelihood estimators. For example, if time_power=3 (implying a maximum of a cubic growth model), linear, quadratic and cubic models will be fit. If the best fitting model is a quadratic growth model, then there will be random and fixed effects for time and time^2.

When there are multiple dependent variables and multiple target independent variables, Type I error corrections or false discovery rate corrections are made across target independent variables within each dependent variable (and within each person if individual models are being fit to the data). Correction options are given in p.adjust.

Usage

PersonAlytic(
  output = NULL,
  data,
  ids,
  dvs,
  time,
  phase = NULL,
  ivs = NULL,
  target_ivs = NULL,
  interactions = NULL,
  time_power = 1,
  correlation = NULL,
  family = gamlss.dist::NO(),
  subgroup = NULL,
  standardize = list(dv = FALSE, iv = FALSE, byids = FALSE),
  method = "REML",
  package = "nlme",
  individual_mods = FALSE,
  PalyticObj = NULL,
  autoSelect = list(AR = list(P = 3, Q = 3), TO = list(polyMax = 3), DIST = list()),
  whichIC = c("BIC", "AIC"),
  charSub = NULL,
  sigma.formula = ~1,
  p.method = "BY",
  alpha = 0.05,
  nbest = NULL,
  alignPhase = "none",
  fpc = 0,
  debugforeach = FALSE,
  cores = parallel::detectCores() - 1,
  userFormula = NULL,
  ...
)

Arguments

output

Character. The default is NULL.

A character string that will be used to name a file for saving output. If left NULL, the default is 'PersonAlytic_Output'. Do not give a file extension, these will be added automatically. For example, if output='MyResults', the output file will be called 'MyResults.csv' if high throughput options are used, or 'MyResults.txt' if only one model is fit. A full path with '/' instead of '\' can also be used. For example, output='C:/MyResults' will produce the files 'C:/MyResults.csv' or 'C:/MyResults.txt'. If a full path is not given, the results will be saved in the working directory (see getwd).

If p.method and nbest are specified, the output will be saved in a subfolder.

data

A data.frame. data must be provided by the user.

data that contains the variables ids, dv, time, and optionally, phase, ivs, target_ivs.

ids

Character. ids must be provided by the user.

The name of the ID variable. The ID variable must be a variable in data and must be numeric.

dvs

Character list. dvs must be provided by the user.

A list of one or more character dependent variable names in data. The linear mixed effects model dvs[d] ~ phase + time + phase*time + target_ivs[c] + ivs with random effects ~ time | ids[i] will be fit to the data. The iterators [d], [c], and [i] indicate that the model will be fit for each combination of dependent variable in dvs, independent variable in target_ivs, and each unique ID in ids (which can be overridden using individual_mods=FALSE) with each model controlling for the independent variables in ivs.

time

Character. time must be provided by the user.

The name of the time variable. The time variable must be a variable in data and must be numeric.

phase

Character. The default value is NULL.

Name of the phase or treatment variable. For model fitting, phase is treated the same as any variable names in ivs or target_ivs but is used for visualizing treatment effects.

ivs

Character list. The default value is NULL.

A list of names of covariates, e.g., list('iv2', 'iv2'). Note that the variables in ivs cannot also be in target_ivs.

target_ivs

Character list. The default value is NULL.

Independent variables that are iterated over (see dv for details). Effects for these variables are labeled as 'target predictor' in the output.

interactions

Character list. The default value is NULL.

List of pairs of variable names for interactions to include in the model, e.g., list(c('iv1','phase'), c('iv1','iv2')).

time_power

Numeric. The default is time_power=3.

Power of the time variable (e.g., time^time_power). A quadratic or cubic growth model would be specified using time_power=2 or time_power=3, respectively. If autoSelect$TO$polyMax is specified, polyMax is the largest value tested for. The default value is 3, testing up to a cubic growth model. If a linear growth model is desired, set autoSelect$TO=NULL and time_power=1.

correlation

Character. The default value is NULL.

See corStruct in nlme. Must be passed as a character, e.g. "corARMA(p=1)". The default value is NULL, assuming no residual autocorrelation. If auoDetect$AR is specified, correlation will be ignored.

family

See gamlss.family. The default is normal, family=NO(). This option is not yet implemented.

A list of the same length as length(dv) can be supplied. For example if dv=list('normal_y', 'binary_y', 'beta_y'), then family could be family=list(NO(), BI(), BEINF()). If length(dv)>1 and length(family)==1, the one distribution in family will be applied to all outcomes. The family parameter is ignored if package="nlme".

subgroup

Logical vector. The default is subgroup=NULL wich results in a logical vector of the same length as the number of rows in data with all values equal to TRUE.

A vector where length(subgroup)==nrow(data) indicating which subset of the data should be used for analysis. For example, if a model should only be fit to females, subgroup=gender=='female' might be used.

standardize

Named logical vector. The default is list(dv=FALSE, ivs=FALSE, byids=FALSE).

Which variables should be standardized? (i.e., rescaled to have 0 mean and unit variance; see scale)? See dv and ivs. The option byids controls whether standardization is done by individuals or by group. Does not apply to factor variables. The default is TRUE. Standardization makes parameter estimate magnitudes comparable across individuals, outcomes in dvs, and covariates in target_ivs. For dependent variables in dvs, standardization is only applied for normal outcomes, see family.

method

character. The default is "REML".

Which likelihood methods should be used to fit the models? Options are "REML", which should be used for final parameter estimates and effect sizes, and "ML", which should be used for model comparisons (e.g., this is done for automatic residual correlation structure detection and automatic time order detection).

package

Character. The default is "nlme".

Which package should be used to fit the models? Options are nlme, gamlss, and arma. It is passed as character strings, e.g., "gamlss" or "nlme". If there is only one participant in ids, "arma" will be used.

individual_mods

Logical. The default is individual_mods=FALSE.

Should individual models be fit for each ID in ids?

PalyticObj

See Palytic. Not currently implemented.

If PalyticObj is submitted then only dvs, target_ivs, and individual_mods will be used. This allows users access to additional options including generalized linear mixed effects models via the family option, user specified correlation structures (in non-NULL this will override the automated correlation structure search), and user specified models via formula.

autoSelect

List. The default is list(AR=list(P=3, Q=3) , TO=list(polyMax=3) , DIST=list()) .

If no automated model selection for the residual covariance structure (AR), the polynomial order for the relationship between time and the dependent variable (TO), or the dependent variable distribution is desired, an empty list should be passed (e.g., autoSelect=list()).

If AR is in the list, the residual correlation structure will be automatically selected from among ARMA(p,q) models? See correlation. Since these models are not generally nested, model selection is done using information information criterion (see whichIC). Model selection for the residual covariance structure is searches among p=1,...,P and p=1,...,Q, where P and Q are taken from PQ, i.e., PQ=c(P,Q). The values of p and p are passed to corARMA ( e.g., corARMA(p=p,q=q)). If individual_mods=FALSE, this done comparing lme modes for N>1 data. If individual_mods=TRUE, this is done using the auto.arima function on the residuals for each individual. For more detail, see the $GroupAR() method in Palytic.

If TO is in the list, models with polynomial powers of time from 1 to polyMax will be tested. For example, if polyMax=3 (implying a cubic growth model), the models compared include time, time + I(time^2), and time + I(time^2)+I(time^3). Since these models are nested, the best fitting model is selected using likelihood ratio tests with mixed effects models fit using maximum likelihood estimators in lme. This is done separately for each individual in ids if individual_mods=TRUE. For more detail, see the $getTO() method in Palytic.

If DIST is in the list and package='gamlss', each dependent variable in dvs will utilize the fitDist function of the gamlss package, and the best fitting distribution will be used for each dependent variable. For more detail, see the $dist() method in Palytic. To narrow the distributions that will be tested, the user must specify whether to rescale the dependent variable to the (0,1) range with to01. Currently settings for DIST apply to all dependent variables in dvs. This will be generalized to dependent variable specifics settings in a future release. If your dependent variables require different DIST settings, use separate calls to PersonAlytic.

whichIC

Character. The default is whichIC="BIC".

Either the Akaike Information Criterion (whichIC="AIC") or the Bayesian Information Criterion (whichIC="BIC").

If the time variable is equally spaced, this is done using the function forecast. If the time variable is not equally spaced, this is done using comparisons of mixed effects models using lme fit using maximum likelihood estimators.

Residual autocorrelation structure is detected separately for each individual in ids if individual_mods=TRUE.

charSub

Character list. The default in charSub=NULL.

A list of paired character strings for character substitution in the output. If the names of the target predictors in target_ivs had to be edited to make valid variable names, this parameter allows users put the illegal characters back in. For example, if the original variable name was "17.00_832.2375m/z", a letter would need to prefix the variable name and the "/" would need to be replaced with another character, e.g., "X17.00_832.2375m.z". To get the row names of the output back to original variable name, use charSub=list(c("X", ""), c("m.z", "m/z")). Note that inputs to charSub must be in double quotes and are case sensitive. All duplicates will be substituted. For example, if the variable name was "X1X23.x" and charSub=list(c("X", "")), the resulting row label for this variable would be "123.x".

sigma.formula

Not currently implemented.

A formula for the variance under gamlss. Currently static: it will not change dynamically over iterations nor will it be updated by time_power or autoSelect. If model fitting using this option fails, another attempt will be made after resetting it to its default, i.e., ~1.

p.method

See p.adjust.methods. When individual_mods=TRUE, length(dvs)>1, or length(target_ivs)>1, p-value adjustments are made and reported in a separate folder saved to the working directory. The parameter nbest must also be specified.

alpha

Numeric value in the (0,1) interval. The Type I error rate for adjusting p-values.

nbest

Numeric integer value. The number of best target_ivs to report on before and after adjusting p-values. This is only used if length(target_ivs)>1 and the target_ivs are all continuous.

alignPhase

Character. The default is 'none' which leaves time as is.

Other options are 'piecewise' which leads to a piecewise growth model and sets time to be 0 at the beginning of each phase, see pwtime. The other option is 'align' which aligns time at 0 between the first and second phase.

Should the time variable be realigned at the phase? If TRUE (the default), the time for first observation in the second phase becomes 0 and times prior to the second phase are negative. For example, if the time variable is c(0,1,2,3,4,5) and the phase variable is c(0,0,0,1,1,1), phase alignment yields a time variable of c{-3,-2,-1,0,1,2}. This is useful when the timing of the transition between the first and second phases varies by individual, especially for graphing. This approach does not generalize to three or more phases, and alignment only happens at the first phase transition. If there are three or more phases, the later phase will not be aligned.

fpc

Numeric. The default is 0. If the value of fpc is greater than the number of individuals in the data set, fpc is taken as the size of the finite population from which the data were sampled, and a finite population correction is made for the fixed effects standard errors, t-tests, and p-values.

debugforeach

Logical. The default is debugforeach=FALSE. Not implemented.

cores

Integer. The defaults is parallel::detectCores()-1, or one fewer cores than what is detected on the machine.

userFormula

List of formulae used to override default model construction. Items in the list must include fixed and random to specify the parameters of the same names using lme, or formula to specify the parameter of the same name in gamlss. See byPhasebyGroup for an example.

...

Not currently used.

Examples


## Not run: 

# full sample model
t0 <- PersonAlytic(output  = 'Test0'     ,
                   data    = OvaryICT    ,
                   ids     = "Mare"      ,
                   dvs     = "follicles" ,
                   phase   = "Phase"     ,
                   time    = "Time"      ,
                   package = "nlme"      )

# individual models
t1 <- PersonAlytic(output          = 'Test1'     ,
                   data            = OvaryICT    ,
                   ids             = "Mare"      ,
                   dvs             = "follicles" ,
                   phase           = "Phase"     ,
                   time            = "Time"      ,
                   package         = "arma"      ,
                   individual_mods = TRUE        )

summary(t0)
summary(t1)


# full sample model with a finite population correction (FPC)
t0fpc <- PersonAlytic(output  = 'Test0FPC'   ,
                      data    = OvaryICT     ,
                      ids     = "Mare"       ,
                      dvs     = "follicles"  ,
                      phase   = "Phase"      ,
                      time    = "Time"       ,
                      package = "nlme"       ,
                      fpc     = 100          )

# gamlss with two distributions - features not implemented
#OvaryICT$follicles01 <- to01(OvaryICT$follicles)
#t2 <- PersonAlytic(data=OvaryICT,
#                 ids="Mare",
#                 dvs=list("follicles", "follicles01"),
#                 phase="Phase",
#                 time="Time",
#                 family=c(NO(), BEINF()),
#                 package='gamlss')


# individual models with target variables
t3 <- PersonAlytic(output          = 'TargetIVStest'       ,
                   data            = OvaryICT              ,
                   ids             = "Mare"                ,
                   dvs             = "follicles"           ,
                   phase           = "Phase"               ,
                   time            = "Time"                ,
                   package         = "arma"                ,
                   individual_mods = TRUE                  ,
                   target_ivs      = names(OvaryICT)[6:11] )

# multiple DVs with no target_ivs and group models
t4 <- PersonAlytic(output          = 'MultiDVnoIDnoIV'          ,
                   data            = OvaryICT                   ,
                   ids             = "Mare"                     ,
                   dvs             = names(OvaryICT)[c(3,9:11)] ,
                   phase           = "Phase"                    ,
                   time            = "Time"                     ,
                   package         = "lme"                      ,
                   individual_mods = FALSE                      ,
                   target_ivs      = NULL                       ,
                   autoSelect      = list()                     )


# repeat t4 with finite population correction
t5 <- PersonAlytic(output          = 'MultiDVnoIDnoIVFPC'       ,
                   data            = OvaryICT                   ,
                   ids             = "Mare"                     ,
                   dvs             = names(OvaryICT)[c(3,9:11)] ,
                   phase           = "Phase"                    ,
                   time            = "Time"                     ,
                   package         = "lme"                      ,
                   individual_mods = FALSE                      ,
                   target_ivs      = NULL                       ,
                   autoSelect      = list()                     ,
                   fpc             = 200                        )

# repeat t5 with piecewise model, first making time an integer (required for piecewise)
OvaryICT$TimeP <- round(30*OvaryICT$Time)
t6 <- PersonAlytic(output          = 'PiecewiseExample'         ,
                   data            = OvaryICT                   ,
                   ids             = "Mare"                     ,
                   dvs             = names(OvaryICT)[c(3,9:11)] ,
                   phase           = "Phase"                    ,
                   time            = "TimeP"                    ,
                   package         = "lme"                      ,
                   individual_mods = FALSE                      ,
                   target_ivs      = NULL                       ,
                   autoSelect      = list()                     ,
                   fpc             = 200                        ,
                   alignPhase      = "piecewise"                )

# clean up
file.remove( 'PiecewiseExample_PersonAlytic.csv' )
file.remove( 'REMLlme.txt' )
file.remove( 'MultiDVnoIDnoIVFPC_PersonAlytic.csv' )
file.remove( 'MultiDVnoIDnoIV_PersonAlytic.csv' )
file.remove( 'TargetIVStest_PersonAlytic.csv' )
file.remove( 'NotREMLarma.txt' )
file.remove( 'Test0FPC.txt' )
file.remove( 'Test1_PersonAlytic.csv' )
file.remove( 'Test0.txt' )


## End(Not run)

ICTatRTI/PersonAlytics documentation built on Dec. 13, 2024, 11:06 p.m.