knitr::opts_chunk$set(echo = TRUE) library(knitr) library(PersonAlytics) #OvaryICT <<- PersonAlytics::OvaryICT depends <- function() { d <- scan('../DESCRIPTION', what = 'character', sep='\n') wd <- which( grepl("*Depends*", d) ) wr <- which( grepl("RoxygenNote*", d) ) pkgs <- c(d[(wd+1):(wr-1)]) pkgs <- gsub( " ", "", pkgs) pkgs <- gsub( "\t", "", pkgs) pkgs <- gsub( ",", "", pkgs) pkgs } pkgs <- depends() pkgss <- matrix( unlist( strsplit(pkgs, "\\(") ), ncol=2, byrow=TRUE)[,1]
The pupropose of this vignette is to illustrate how to use PersonAlytics
to analyze
Single case time series data, also known as single subject or N-of-1 studies.
Small sample intensive longitudinal design data.
Ideographic clinical trial data (ICT), which a single-case or small sample longitudinal data set multiple treatment conditions (at a minimum, a baseline and follow-up condition) where all partipants exerience all treatment conditions.
Automated high throughput analyses of these models in situations where the combination of predictors, outcomes, and/or individual level models yields an unweildly number of analyses to implement manually.
These three types of data are analyzed using longitudinal mixed effects models, also known as hierarchical linear models, multilevel models, latent growth curve models, or mixed method trajectory analysis. There are already several R packages such as nlme
and gamlss
that can be used to implement these models. The purpose of the Personalytics package is to automate the following:
Model selection for detecting the shape of the trajectory over time. Using a good fitting trajectory shape yields parameters that correspond to the shape and improve interpretability. Current options include polynomial growth and piecewise growth.
Model selection for detecting an appropriate time series model for the residuals. Getting a good fitting residual covariance model improves standard error estimation.
Automate studies which may have thousands or millions of combinations of
outcomes
predictors
individual level models
The PersonAlytic
framework -- Ty --
PersonAlytics
First you must install R
, available at https://cran.r-project.org/
. It is also suggested that you use a modern code editor such as Rstudio (https://www.rstudio.com/
).
PersonAlytics
requries several packages which can be installed (or updated if you already have them) by pasting the following function[^1] into your R
console:
[^1]: See https://gist.github.com/stevenworthington/3178163 for original function.
install.deps <- function(pkg) { new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])] old.pkg <- pkg[ (pkg %in% installed.packages()[, "Package"])] if (length(new.pkg)) install.packages(new.pkg, dependencies = TRUE, repos = "https://cran.rstudio.com/") if (length(old.pkg)) update.packages(old.pkg, dependencies = TRUE) }
The following packages should then be installed using the install.deps
function as follows:
cat(paste("install.deps(c(", paste("'", pkgss[-1], "'", collapse=', ', sep=""), "))", sep=""), "\n")
Now you're ready to install PersonAlytics
install.packages(".../path/on/your/computer/PersonAlytics_0.0.1.tar.gz", repos = NULL, type = "source")
. To obtain this file, contact the authors at https://personalytics.rti.org/.
PersonAlytics
Now we can load PersonAlytics
:
library(PersonAlytics)
We will illustrate the basic options of the PersonAlytic()
function using the Ovary data from the nlme
package which have been modified to represent an ICT. The PersonAlytic()
function is the primary user interface for the PersonAlytics
package. You can access the documentation for PersonAlytic()
by typing
?PersonAlytic
The first six rows of the OvaryICT data are shown in Table x where it can be seen that the original Ovary data set has been augmented with Phase variables (which are an essentialy component of an ICT) and six randomly generated predictors named Target1 to Target6 (these will be used later).
kable(head(PersonAlytics::OvaryICT))
The first five parameters that can be provided to PersonAlytic()
are
data
: the name of your data set. This data needs to have been read into R prior to using PersonAlytic()
. A web search can be used to learn how to read in multiple types of data into R, including (but not limited to) csv, xlsx, sas, stata, spss, and most database formats. Since we assume that the data has alread y been read into R, it has an R variable name and no quotes are needed.
ids
: the name of the identification variable. This must be a quoted variable name matching your ID variable in the data set that you provided to the parameter data
.
dvs
: the name of the dependent variable. This must be a quoted variable name matching your dependent variable in the data set that you provided to the parameter data
. For multiple dependent variables, see Section xx.
phase
: the name of the phase variable. This is required for an ICT but is not required by PersonAlytic()
. This must be a quoted variable name matching your dependent variable in the data set that you provided to the parameter data
.
time
: the name of the time variable. This must be a quoted variable name matching your dependent varibale in the data set that you provided to the parameter data
.
t1 <- PersonAlytic(data=OvaryICT, ids="Mare", dvs="follicles", phase="Phase", time="Time")
In the console output shows two automations implemented in PersonAlytics. The first is 'Automatic detection of the time/outcome relationship' and the second is 'Automatic detection of the residual correlation structure'. These will be described in detail in Section x and Section x, respectively.
The object 't1' is has class lme
class(t1)
If you are familiar with the lme
package, this is an implementation of the linear (normal) mixed effects model.
PersonAlytic()
FeaturesThe argument output
is character string that will be used to name a file for saving output. If left NULL
, the default is 'PersonAlytic_Output'. Do not give a file extension, these will be added automatically. For example, if output='MyResults', the output file will be called 'MyResults.csv' if high throughput options are used, or 'MyResults.txt' if only one model is fit. A full path with '/' instead of '\' can also be used. For example, output='C:/MyResults' will produce the files 'C:/MyResults.csv' or 'C:/MyResults.txt'. If a full path is not given, the results will be saved in the working directory (see ?getwd
).
detectTO
turns automatic detection of the time/outcome relationship on or off. Options are TRUE
and FALSE
. Automatic detection is implemented using likelihood ratio test of the model with $Y=Time + Time^2 + \dots + Time^{O}$ vs. $Y=Time + Time^2 + \dots + Time^{O - 1}$ where $O$ is the polynomial order of the time variable.
maxOrder
is the largest value of $O$ to test. Only used if detectTO=TRUE
and alignPhase='none'
or alignPhase='align'
.
alignPhase
gives options for aligning the time variable with the phase variable.
alignPhase='none'
is the default and the time variable is left as-is.
alignPhase='align'
aligns the time variable at the transition from the first and second phase within each participant. This alignment makes it so the effect at time=0 is the start of the second phase. For example, if the second phase starts at time 6 for participant A, and at time 8 for participant B, and there are 15 timepoints for each starting at 1:
alignPhase='piecewise'
create a piecewise straight-line growth model within each phase. This option can be use to simplify complex trajectories when the within-phase trajectory is approximately linear within each phase. Although this may make results easier to interprent than a curvelinear model (e.g., when O=3), the piecewise model may have many more random effects and therefore be less parsimonious than a model with polynomial time. If alignPhase='piecewise'
, detectTO
and maxOrder
are ignored.
time_power
is used when detecTO=FALSE
and you want to specify a value for $O$.
detectAR
turns automatic detection of the residual correlation structure on or off. Options are TRUE
and FALSE
.
If n=1, automatic detection is implemented using the auto.arima
function of the forecast
package. See ?auto.arima
.
If n>1, automatic detection is implemented by comparing the random intercept model with time as specified in the prior section, adding all combinations of ARMA(p,q)
models for the residuals where $p=0,\dots,P$ and $q=0,\dots,Q$. The values of $P$ and $Q$ are specified in the argument PQ
. Model comparisons are done using either Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) as specified by the argument whichIC
. Likelihood ratio tests are not an option here as not all ARMA(p,q)
models are nested.
PQ
is a pair of numbers specifying the maximum order of $P$ and $Q$ in ARMA models to be compared for detecting an adequate residual correlation structure. The default is PQ=c(3,3)
. See also max.p
and max.q
in ?auto.arima
(not to be confused with max.P
and max.Q
which are parameters for a seasonal model and are not currently implemented).
whichIC
specifies the information criterion for selectiong the best fitting ARMA(p,q) model. Options are whichIC='BIC'
and whichIC='AIC'
. BIC is the default.
correlation
is used to specify a correlation structure when detectAR=FALSE
. All correlation structures in the nlme
package are options. Load nlme
using library(nlme
) and type ?corStruct
into the R console for options. The value given must be quoted, e.g., correlation='corAR1()'
.
High throughput analyses are automatically implemented when any of the following are present. If all are present, all possible combinations are fit.
dvs
has more than one dependent variable.
target_ivs
has more than one target independent variable that should be added to the model one at a time (as opposed to ivs
which are all included in every model across all combinations of dvs
, target_ivs
, and indivdiuals).
individual_mods=TRUE
, which fits the model separately for each individual specified in ids
.
When high throughput analyses are invoked, the following arguments control the model fitting and output:
cores
: Parallelization is the process of splitting all the models across the processing cores on your computer to speed model estimation. By default, ICTpower
uses one fewer cores than are available on your machine. You can override this using the cores
argument. It is not recommended that you specify more cores than are available on your machine as it may cause R or your computer to crash. Use an internet search (including your operating system in the search terms) to find instructions on determining the number of cores your computer has.p.method
: When target_ivs
has more than one target independent variable, p-value adjustments can be made across target_ivs
within each depedent variable in dvs
and/or each individual in ids
if individual_mods=TRUE
. The options are those availble in the p.adjust
function whose documentation can be viewed by typing ?p.adjust
into the R console. The default is p.method='BY'
. alpha
is the Type I error rate used for adjusting p-values.
nbest
: When target_ivs
has more than one target independent variable, an additional report can extract the nbest
target independent variables as defined by those target independent variable with the largest coefficients among those with $p<=\alpha$. If the target independent variables are on different scales, you should standardize them first using the standardize
argument (descriped in detail later in its own section).Output is named by the output
argument of the PersonAlytic
function.
The high throughput output is a *.csv file that can be opened in any spreadsheet program. The rows are all possible combinations of dvs
, target_ivs
, and ids
(if individual_mods=TRUE
). The column variables are in five sets:
Variables identify the combination of user inputs that lead to a given row's analysis.
Variables describing whether the analysis converged and variables to help diagnose a failed run (e.g., zero variance in an outcome or independent variable.).
Descriptive statistics.
Model results with a parameter estimate, standard error, t-value, degrees of freedom, and p-value.
fpc>0
. model relusts repeated with a finite population correction for the standard error (and consequently, the p-value).The argument interactions
can be use to specify interactions between independent variables. By default, the time
by phase
interaction is always included. Additional interactions are specified as a list of paired variables. For example, if you want all interactions between x
, y
, and z
, you would use interactions = list(c('x', 'y'), c('x', 'z'), c('y', 'z'))
.
Comparisons across multiple depedent variables, or across multiple independent variables are made possible by standardizing these variables. This can be done by the user prior to analysis, or via the standardize
argument. The standardize
argument is a list with three slots named 'dv', 'iv', and 'byids'. If dv=TRUE
the dependent variables are standardized. If iv=TRUE
all of independent variables in ivs
and target_ivs
are standardized. If byids=TRUE
standardization is done within each individual rather than across invidivuals.
alignPhase
gives options for aligning the time variable with the phase variable.
alignPhase='none'
is the default and the time variable is left as-is.
alignPhase='align'
aligns the time variable at the transition from the first and second phase within each participant. This alignment makes it so the effect at time=0 is the start of the second phase. For example, if the second phase starts at time 6 for participant A, and at time 8 for participant B, and there are 15 timepoints for each starting at 1:
alignPhase='piecewise'
create a piecewise straight-line growth model within each phase. This option can be use to simplify complex trajectories when the within-phase trajectory is approximately linear within each phase. Although this may make results easier to interprent than a curvelinear model (e.g., when O=3), the piecewise model may have many more random effects and therefore be less parsimonious than a model with polynomial time. If alignPhase='piecewise'
, detectTO
and maxOrder
are ignored.
charSub
is a character list. The default in charSub=NULL.
A list of paired character strings for character substitution in the output. If the names of the target predictors in target_ivs
had to be edited to make valid variable names (see ?make.names
), this parameter allows users put the illegal characters back in. For example, if the original variable name was "17.00_832.2375m/z", a letter would need to prefix the variable name and the "/" would need to be replaced with another character, e.g., "X17.00_832.2375m.z". To get the row names of the output back to original variable name, use charSub=list(c("X", ""), c("m.z", "m/z")). Note that inputs to charSub must be in double quotes and are case sensitive. All duplicates will be substituted. For example, if the variable name was "X1X23.x" and charSub=list(c("X", "")), the resulting row label for this variable would be "123.x".
A finite population correction (FPC) can be used when you sample more than 5% of the population without replacement. In this situation the central limit theorem doesn't hold and the standard errors of your parameter estimates will be to large. Before illustrating how to implement a finite population correction, we first discuss the conditions under which a finite population correction may make a difference in a power analysis. This will only occur in situation where a large number of the B
replications have p-values just larger than alpha and the FPC results in these p-values being less than alpha. In our experience, we rarely run into conditions where this is the case. The p-value distributions are usually smoothly positively skewed or near uniform. Neither of this distributions puts enough p-values near alpha for the FPC to have a large effect on power. This isn't to say users shouldn't apply the FPC for real data analyses using the PersonAlytics
package, only that they should not expect the FPC to yield large improvement in power.
The argument fpc
can be set to your finite population size and an FPC will be included in the analyses and output.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.