knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(PersonAlytics)
#OvaryICT <<- PersonAlytics::OvaryICT

depends <- function()
{
  d <- scan('../DESCRIPTION', what = 'character', sep='\n')
  wd <- which( grepl("*Depends*", d) )
  wr <- which( grepl("RoxygenNote*", d) )
  pkgs <- c(d[(wd+1):(wr-1)])
  pkgs <- gsub( " ", "", pkgs)
  pkgs <- gsub( "\t", "", pkgs)
  pkgs <- gsub( ",", "", pkgs)
  pkgs
}
pkgs <- depends()
pkgss <-  matrix( unlist( strsplit(pkgs, "\\(") ), ncol=2, byrow=TRUE)[,1]

Introduction

The pupropose of this vignette is to illustrate how to use PersonAlytics to analyze

  1. Single case time series data, also known as single subject or N-of-1 studies.

  2. Small sample intensive longitudinal design data.

  3. Ideographic clinical trial data (ICT), which a single-case or small sample longitudinal data set multiple treatment conditions (at a minimum, a baseline and follow-up condition) where all partipants exerience all treatment conditions.

  4. Automated high throughput analyses of these models in situations where the combination of predictors, outcomes, and/or individual level models yields an unweildly number of analyses to implement manually.

These three types of data are analyzed using longitudinal mixed effects models, also known as hierarchical linear models, multilevel models, latent growth curve models, or mixed method trajectory analysis. There are already several R packages such as nlme and gamlss that can be used to implement these models. The purpose of the Personalytics package is to automate the following:

  1. Model selection for detecting the shape of the trajectory over time. Using a good fitting trajectory shape yields parameters that correspond to the shape and improve interpretability. Current options include polynomial growth and piecewise growth.

  2. Model selection for detecting an appropriate time series model for the residuals. Getting a good fitting residual covariance model improves standard error estimation.

  3. Automate studies which may have thousands or millions of combinations of

  4. outcomes

  5. predictors

  6. individual level models

The PersonAlytic framework -- Ty --

Installing PersonAlytics

First you must install R, available at https://cran.r-project.org/. It is also suggested that you use a modern code editor such as Rstudio (https://www.rstudio.com/).

PersonAlytics requries several packages which can be installed (or updated if you already have them) by pasting the following function[^1] into your R console:

[^1]: See https://gist.github.com/stevenworthington/3178163 for original function.

install.deps <- function(pkg)
{
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  old.pkg <- pkg[ (pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) install.packages(new.pkg, dependencies = TRUE,
                                        repos = "https://cran.rstudio.com/")
  if (length(old.pkg)) update.packages(old.pkg, dependencies = TRUE)
}

The following packages should then be installed using the install.deps function as follows:

cat(paste("install.deps(c(", paste("'", pkgss[-1], "'", collapse=', ', sep=""), "))", 
          sep=""), "\n")

Now you're ready to install PersonAlytics install.packages(".../path/on/your/computer/PersonAlytics_0.0.1.tar.gz", repos = NULL, type = "source"). To obtain this file, contact the authors at https://personalytics.rti.org/.

Starting PersonAlytics

Now we can load PersonAlytics:

library(PersonAlytics)

We will illustrate the basic options of the PersonAlytic() function using the Ovary data from the nlme package which have been modified to represent an ICT. The PersonAlytic() function is the primary user interface for the PersonAlytics package. You can access the documentation for PersonAlytic() by typing

?PersonAlytic

The first six rows of the OvaryICT data are shown in Table x where it can be seen that the original Ovary data set has been augmented with Phase variables (which are an essentialy component of an ICT) and six randomly generated predictors named Target1 to Target6 (these will be used later).

kable(head(PersonAlytics::OvaryICT))

The first five parameters that can be provided to PersonAlytic() are

  1. data: the name of your data set. This data needs to have been read into R prior to using PersonAlytic(). A web search can be used to learn how to read in multiple types of data into R, including (but not limited to) csv, xlsx, sas, stata, spss, and most database formats. Since we assume that the data has alread y been read into R, it has an R variable name and no quotes are needed.

  2. ids: the name of the identification variable. This must be a quoted variable name matching your ID variable in the data set that you provided to the parameter data.

  3. dvs: the name of the dependent variable. This must be a quoted variable name matching your dependent variable in the data set that you provided to the parameter data. For multiple dependent variables, see Section xx.

  4. phase: the name of the phase variable. This is required for an ICT but is not required by PersonAlytic(). This must be a quoted variable name matching your dependent variable in the data set that you provided to the parameter data.

  5. time: the name of the time variable. This must be a quoted variable name matching your dependent varibale in the data set that you provided to the parameter data.

t1 <- PersonAlytic(data=OvaryICT,
                ids="Mare",
                dvs="follicles",
                phase="Phase",
                time="Time")

In the console output shows two automations implemented in PersonAlytics. The first is 'Automatic detection of the time/outcome relationship' and the second is 'Automatic detection of the residual correlation structure'. These will be described in detail in Section x and Section x, respectively.

The object 't1' is has class lme

class(t1)

If you are familiar with the lme package, this is an implementation of the linear (normal) mixed effects model.

PersonAlytic() Features

Naming your output

The argument output is character string that will be used to name a file for saving output. If left NULL, the default is 'PersonAlytic_Output'. Do not give a file extension, these will be added automatically. For example, if output='MyResults', the output file will be called 'MyResults.csv' if high throughput options are used, or 'MyResults.txt' if only one model is fit. A full path with '/' instead of '\' can also be used. For example, output='C:/MyResults' will produce the files 'C:/MyResults.csv' or 'C:/MyResults.txt'. If a full path is not given, the results will be saved in the working directory (see ?getwd).

Automatic detection of the time/outcome relationship

  1. detectTO turns automatic detection of the time/outcome relationship on or off. Options are TRUE and FALSE. Automatic detection is implemented using likelihood ratio test of the model with $Y=Time + Time^2 + \dots + Time^{O}$ vs. $Y=Time + Time^2 + \dots + Time^{O - 1}$ where $O$ is the polynomial order of the time variable.

  2. maxOrder is the largest value of $O$ to test. Only used if detectTO=TRUE and alignPhase='none' or alignPhase='align' .

  3. alignPhase gives options for aligning the time variable with the phase variable.

    • alignPhase='none' is the default and the time variable is left as-is.

    • alignPhase='align' aligns the time variable at the transition from the first and second phase within each participant. This alignment makes it so the effect at time=0 is the start of the second phase. For example, if the second phase starts at time 6 for participant A, and at time 8 for participant B, and there are 15 timepoints for each starting at 1:

      • Participant A's resulting time variable will be -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
      • Participant B's resulting time variable will be -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7.
    • alignPhase='piecewise' create a piecewise straight-line growth model within each phase. This option can be use to simplify complex trajectories when the within-phase trajectory is approximately linear within each phase. Although this may make results easier to interprent than a curvelinear model (e.g., when O=3), the piecewise model may have many more random effects and therefore be less parsimonious than a model with polynomial time. If alignPhase='piecewise', detectTO and maxOrder are ignored.

  4. time_power is used when detecTO=FALSE and you want to specify a value for $O$.

Automatic detection of the residual correlation structure

  1. detectAR turns automatic detection of the residual correlation structure on or off. Options are TRUE and FALSE.

    • If n=1, automatic detection is implemented using the auto.arima function of the forecast package. See ?auto.arima.

    • If n>1, automatic detection is implemented by comparing the random intercept model with time as specified in the prior section, adding all combinations of ARMA(p,q) models for the residuals where $p=0,\dots,P$ and $q=0,\dots,Q$. The values of $P$ and $Q$ are specified in the argument PQ. Model comparisons are done using either Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) as specified by the argument whichIC. Likelihood ratio tests are not an option here as not all ARMA(p,q) models are nested.

  2. PQ is a pair of numbers specifying the maximum order of $P$ and $Q$ in ARMA models to be compared for detecting an adequate residual correlation structure. The default is PQ=c(3,3). See also max.p and max.q in ?auto.arima (not to be confused with max.P and max.Q which are parameters for a seasonal model and are not currently implemented).

  3. whichIC specifies the information criterion for selectiong the best fitting ARMA(p,q) model. Options are whichIC='BIC' and whichIC='AIC'. BIC is the default.

  4. correlation is used to specify a correlation structure when detectAR=FALSE. All correlation structures in the nlme package are options. Load nlme using library(nlme) and type ?corStruct into the R console for options. The value given must be quoted, e.g., correlation='corAR1()'.

High throughput analyses

High throughput analyses are automatically implemented when any of the following are present. If all are present, all possible combinations are fit.

  1. dvs has more than one dependent variable.

  2. target_ivs has more than one target independent variable that should be added to the model one at a time (as opposed to ivs which are all included in every model across all combinations of dvs, target_ivs, and indivdiuals).

  3. individual_mods=TRUE, which fits the model separately for each individual specified in ids.

When high throughput analyses are invoked, the following arguments control the model fitting and output:

False discovery rate (FDR) and Type-I error adjustments

p.method: When target_ivs has more than one target independent variable, p-value adjustments can be made across target_ivs within each depedent variable in dvs and/or each individual in ids if individual_mods=TRUE. The options are those availble in the p.adjust function whose documentation can be viewed by typing ?p.adjust into the R console. The default is p.method='BY'. alpha is the Type I error rate used for adjusting p-values.

Selecting predictors with the largest effect sizes

Generalized linear mixed effects models

Reading the Output

Output is named by the output argument of the PersonAlytic function.

Single analysis output

High throughput output

The high throughput output is a *.csv file that can be opened in any spreadsheet program. The rows are all possible combinations of dvs, target_ivs, and ids (if individual_mods=TRUE). The column variables are in five sets:

  1. Variables identify the combination of user inputs that lead to a given row's analysis.

  2. Variables describing whether the analysis converged and variables to help diagnose a failed run (e.g., zero variance in an outcome or independent variable.).

  3. Descriptive statistics.

  4. Model results with a parameter estimate, standard error, t-value, degrees of freedom, and p-value.

  5. If fpc>0. model relusts repeated with a finite population correction for the standard error (and consequently, the p-value).

Additional Considerations

Interaction terms

The argument interactions can be use to specify interactions between independent variables. By default, the time by phase interaction is always included. Additional interactions are specified as a list of paired variables. For example, if you want all interactions between x, y, and z, you would use interactions = list(c('x', 'y'), c('x', 'z'), c('y', 'z')).

Standardizing data

Comparisons across multiple depedent variables, or across multiple independent variables are made possible by standardizing these variables. This can be done by the user prior to analysis, or via the standardize argument. The standardize argument is a list with three slots named 'dv', 'iv', and 'byids'. If dv=TRUE the dependent variables are standardized. If iv=TRUE all of independent variables in ivs and target_ivs are standardized. If byids=TRUE standardization is done within each individual rather than across invidivuals.

Phase alignment

alignPhase gives options for aligning the time variable with the phase variable.

Character substitution in variable names

charSub is a character list. The default in charSub=NULL.

A list of paired character strings for character substitution in the output. If the names of the target predictors in target_ivs had to be edited to make valid variable names (see ?make.names), this parameter allows users put the illegal characters back in. For example, if the original variable name was "17.00_832.2375m/z", a letter would need to prefix the variable name and the "/" would need to be replaced with another character, e.g., "X17.00_832.2375m.z". To get the row names of the output back to original variable name, use charSub=list(c("X", ""), c("m.z", "m/z")). Note that inputs to charSub must be in double quotes and are case sensitive. All duplicates will be substituted. For example, if the variable name was "X1X23.x" and charSub=list(c("X", "")), the resulting row label for this variable would be "123.x".

Modeling the variance in linear models

Finite population correction

A finite population correction (FPC) can be used when you sample more than 5% of the population without replacement. In this situation the central limit theorem doesn't hold and the standard errors of your parameter estimates will be to large. Before illustrating how to implement a finite population correction, we first discuss the conditions under which a finite population correction may make a difference in a power analysis. This will only occur in situation where a large number of the B replications have p-values just larger than alpha and the FPC results in these p-values being less than alpha. In our experience, we rarely run into conditions where this is the case. The p-value distributions are usually smoothly positively skewed or near uniform. Neither of this distributions puts enough p-values near alpha for the FPC to have a large effect on power. This isn't to say users shouldn't apply the FPC for real data analyses using the PersonAlytics package, only that they should not expect the FPC to yield large improvement in power.

The argument fpc can be set to your finite population size and an FPC will be included in the analyses and output.



ICTatRTI/PersonAlytics documentation built on Dec. 13, 2024, 11:06 p.m.