analyzeData: Analyze simulated data replicates


Analyze simulated data replicates

Description

Analyzes a set of simulated trial data, possibly including interim analyses

Usage

analyzeData(
  replicates = "*",
  analysisCode,
  macroCode,
  interimCode = NULL,
  software = "R",
  grid = FALSE,
  waitAndCombine = TRUE,
  cleanUp = FALSE,
  removeMissing = TRUE,
  removeParOmit = TRUE,
  removeRespOmit = TRUE,
  seed = .deriveFromMasterSeed(),
  parOmitFlag = getEctdColName("ParOmit"),
  respOmitFlag = getEctdColName("RespOmit"),
  missingFlag = getEctdColName("Missing"),
  interimCol = getEctdColName("Interim"),
  doseCol = getEctdColName("Dose"),
  sleepTime = 15,
  deleteCurrData = TRUE,
  initialDoses = NULL,
  stayDropped = TRUE,
  fullAnalysis = TRUE,
  workingPath = getwd(),
  method = getEctdDataMethod()
)

Arguments

replicates

(Optional) Vector of replicates on which to perform analysis: all replicates are analyzed by default

analysisCode

(Required) File containing analysis code (for R or SAS) or an R function for analysis (R only)

macroCode

(Required) An R function to be used for macro evaluation of the result datasets. See the help file for the macroEvaluation function for more information

interimCode

(Optional) An R function to be applied to interim datasets in order to create interim decisions. See the help file for the interimAnalysis function for more information. By default, no function is provided, resulting in no interim analyses being performed

software

(Optional) The software to be used for analysis: either "R" or "SAS". "R" by default

grid

(Optional) If available, should the analysis be split across available CPUs? Uses the "parallel" package to split jobs across available cores, taking the minimum of the number of cores minus 1 and getOption("max.clusters") (usually 2). FALSE by default

waitAndCombine

(Optional) Should the process wait for all analyses to finish, then combine into micro and macro summary files? TRUE by default

cleanUp

(Optional) Should micro/macro directories be removed on completion? FALSE by default

removeMissing

(Optional) Should rows marked as 'Missing' during the data generation step be removed from the data before analysis is performed? TRUE by default

removeParOmit

(Optional) Should any rows marked as 'Omitted' during the parameter data generation step (i.e. parameters out of range) be removed from the data before analysis is performed? TRUE by default

removeRespOmit

(Optional) Should any rows marked as 'Omitted' during the response generation step (i.e. responses out of range) be removed from the data before analysis is performed? TRUE by default

seed

(Optional) Random number seed to use for the analysis. Derived from the master seed by default

parOmitFlag

(Optional) Parameter omit flag name. "PAROMIT" by default

respOmitFlag

(Optional) Response omit flag name. "RESPOMIT" by default

missingFlag

(Optional) Missing flag name. "MISSING" by default

interimCol

(Optional) Interim variable name. "INTERIM" by default

doseCol

(Optional) Dose variable name. "DOSE" by default

sleepTime

(Optional) Number of seconds to sleep between iterative checks for grid job completion. 15 seconds by default

deleteCurrData

(Optional) Should any existing micro evaluation and macro evaluation data be removed before new analysis is performed? TRUE by default

initialDoses

(Optional) For interim analyses, which doses should be present in interim 1? All are included by default

stayDropped

(Optional) For interim analyses, if a dose is dropped, should it stay dropped at subsequent interims (as opposed to allowing a later interim step to reopen the dose)? TRUE by default

fullAnalysis

(Optional) Should a "full" analysis be performed on all doses? TRUE by default

workingPath

(Optional) Root directory in which replicate data is stored, and in which we should perform the analysis. Current working directory is used by default

method

(Optional) Data storage method (i.e. where the replicate data is stored). Given by getEctdDataMethod by default

Details

The first task of the function is to check the options specified:

* If the "grid" network is unavailable, or if the length of the "replicates" input is 1, the "grid" flag will be set to FALSE
* If the "grid" flag is TRUE, the call to analyzeData will be split across multiple processors using the "parallel" library
* If the length of the "replicates" vector is 1, the "waitAndCombine" flag will be set to FALSE
* If the "waitAndCombine" flag is set to FALSE, the "cleanUp" flag will also be set to FALSE
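The downgrade rules above can be sketched as follows. This is an illustrative sketch of the assumed logic only, not the package source; resolveFlags is a hypothetical helper.

```r
# Hypothetical helper (not part of MSToolkit) sketching how the "grid",
# "waitAndCombine" and "cleanUp" flags are resolved before the run starts
resolveFlags <- function(grid, waitAndCombine, cleanUp,
                         nReplicates, gridAvailable) {
  # A single replicate (or no grid) forces a serial run
  if (!gridAvailable || nReplicates == 1) grid <- FALSE
  # A single replicate needs no wait-and-combine step
  if (nReplicates == 1) waitAndCombine <- FALSE
  # Without a combine step there is nothing safe to clean up
  if (!waitAndCombine) cleanUp <- FALSE
  list(grid = grid, waitAndCombine = waitAndCombine, cleanUp = cleanUp)
}

resolveFlags(grid = TRUE, waitAndCombine = TRUE, cleanUp = TRUE,
             nReplicates = 1, gridAvailable = TRUE)
# All three flags are downgraded to FALSE for a single replicate
```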

The analyzeData function will iterate over each replicate specified in the "replicates" vector. For each replicate, the function will first call the analyzeRep function with the required inputs. The output from the call to analyzeRep will be a data frame containing micro evaluation data. This data frame will be checked to ensure it is of the correct format. If the return from analyzeRep is a valid "Micro Evaluation" dataset, it will be saved to the "MicroEvaluation" folder, and also passed to the macroEvaluation function for further analysis. If the return from macroEvaluation is a valid "Macro Evaluation" dataset, it will be saved to the "MacroEvaluation" folder.

If the "waitAndCombine" flag is set to TRUE, the function will wait until all grid jobs are finished (if grid has been used), then compile the "Micro" and "Macro" evaluation results into single summary files (using the compileSummary function).

Value

This function produces no direct return value. It will, however, produce many analysis, summary and log files as a side effect.

Note

There are some restrictions on the code inputs to the analyzeData function. These restrictions are discussed here:

Analysis Code: The "analysisCode" input must be either an R function or a reference to an external file. If it is a reference to an external file, it must contain either SAS code (if software is "SAS") or R code (if software is "R"). If the code is an R function, or an external R script, it must accept a data frame as its only argument and return an acceptable "Micro Evaluation" data frame as set out in checkMicroFormat. If the code is an external SAS script, it must read a SAS dataset called "work.infile" and create a SAS dataset called "work.outfile" that conforms to the "Micro Evaluation" format as set out in checkMicroFormat. More information on "Micro Evaluation" structures can be found in the help file for function checkMicroFormat.
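As a minimal illustration of the R contract above: a function that accepts a data frame and returns a one-row-per-dose summary. The column names follow the Examples section below; the dose-mean summary itself is illustrative only, not a real dose-response analysis, and simpleAnalysisCode is a hypothetical name.

```r
# Illustrative analysis function: accepts a data frame with DOSE and RESP
# columns, returns a "Micro Evaluation"-style data frame (one row per dose)
simpleAnalysisCode <- function(data) {
  uniDoses <- sort(unique(data$DOSE))
  # Per-dose means and standard errors of the mean
  means <- tapply(data$RESP, data$DOSE, mean)[as.character(uniDoses)]
  ses   <- tapply(data$RESP, data$DOSE,
                  function(x) sd(x) / sqrt(length(x)))[as.character(uniDoses)]
  data.frame(DOSE  = uniDoses,
             MEAN  = as.numeric(means),
             SE    = as.numeric(ses),
             LOWER = as.numeric(means - 2 * ses),
             UPPER = as.numeric(means + 2 * ses))
}

simpleAnalysisCode(data.frame(DOSE = rep(c(0, 100), each = 5),
                              RESP = c(1, 2, 1, 2, 1, 5, 6, 5, 6, 5)))
```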

Interim Code: The "interimCode" input must be an R function that accepts a single "Micro Evaluation" data input and returns an R "list" structure that is either empty or contains one or more of the following elements:

* An element called "STOP": a logical vector of length 1, telling the analyzeData function whether the analysis should be halted at this interim
* An element called "DROP": a vector of numeric values identifying doses in the data to drop before the next interim is analyzed

More information on these structures can be found in the help file for function interimAnalysis.
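For instance, return values conforming to this contract might look like the following (the dose values are illustrative only):

```r
# An interim decision list conforming to the contract above: drop doses
# 10 and 50 before the next interim, but do not stop the trial
interimReturn <- list(DROP = c(10, 50), STOP = FALSE)

# An empty list is also valid, and means "no action at this interim"
noAction <- list()
```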

Macro Code: The "macroCode" input must be an R function that accepts an enhanced "Micro Evaluation" data input and returns a valid "Macro Evaluation" data structure (as specified in the help file for the checkMacroFormat function).

Author(s)

Mike K Smith mstoolkit@googlemail.com

See Also

analyzeRep, macroEvaluation, compileSummary and generateData

Examples

## Not run: 

# Standard analysis code
emaxCode <- function(data){
  library(DoseResponse)
  with( data,
    {
    uniDoses <- sort( unique(DOSE))
    eFit <- emaxalt( RESP, DOSE )
    outDf <- data.frame( DOSE = uniDoses,
      MEAN = eFit$dm[as.character(uniDoses)],
      SE = eFit$dsd[as.character(uniDoses)] )
    outDf$LOWER <- outDf$MEAN - 2 * outDf$SE
    outDf$UPPER <- outDf$MEAN + 2 * outDf$SE
    outDf$N     <- table(DOSE)[ as.character(uniDoses) ]
    outDf
  })
}

# Macro evaluation code
macrocode <- function(data) {
  # making up a t-test
  mu0   <- data$MEAN[ data$DOSE == 0 & data$INTERIM == 0]
  mu100 <- data$MEAN[ data$DOSE == 100 & data$INTERIM == 0]
  n0    <- data$N[ data$DOSE == 0 & data$INTERIM == 0]
  n100  <- data$N[ data$DOSE == 100 & data$INTERIM == 0]
  sd0   <- data$SE[ data$DOSE == 0 & data$INTERIM == 0]
  sd100 <- data$SE[ data$DOSE == 100 & data$INTERIM == 0]

  sddiff <- if( n0 == n100 ){
    sqrt( (sd0^2 + sd100^2)  / (n0 + n100) )
  } else {
    sqrt( (1/n0 + 1/n100) * ( (n0-1)*sd0^2 + (n100-1)*sd100^2  ) / (n0+n100-2)  )
  }
  tstat  <- ( mu100 - mu0 ) / sddiff
  success <- abs(tstat) > qt( .975, n0+n100-2)

  data.frame( SUCCESS = success, TSTAT = tstat )
}

# Interim analysis code
interimCode <- function( data ){
  dropdose  <- with( data, DOSE [ sign(UPPER) != sign(LOWER) & DOSE != 0] )
  outList <- list()
  if( length(dropdose) > 0 ) outList$DROP <- dropdose
  outList$STOP <- length(dropdose) == nrow(data)-1
  outList
}

# Run analysis
analyzeData( 1:5, analysisCode = emaxCode, macroCode = macrocode,
  interimCode = interimCode )


## End(Not run)


MikeKSmith/MSToolkit documentation built on Feb. 15, 2024, 5:32 p.m.