knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
options(width = 100)
Step I of the MASTA algorithm calculates the mean density function and covariance functions from the large unlabeled data. To accelerate this calculation, we implemented a fast and memory-efficient approach via multicore parallel computing.
library(MASTA)
parallel::detectCores()
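Because the kernel-smoothing step is parallelized, it helps to decide up front how many cores to devote to it. The snippet below is a minimal sketch, assuming you want to leave one core free for the operating system; the variable name n_core_use and the rule itself are illustrative choices, not package requirements.

# Sketch: pick a worker count, reserving one core for the OS (illustrative rule)
n_core_use <- max(1, parallel::detectCores() - 1)
n_core_use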
TrainCode <- data_org$TrainCode
TrainCode[, monthstd := month / fu_time]  # standardized time (range: 0-1)
TrainCode
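Since the FPCA evaluation grid below is defined on [0, 1], it is worth confirming that the standardized time really falls in that range. A small check, assuming (as the comment above states) that month never exceeds fu_time:

# Sanity check: standardized follow-up time should lie in [0, 1]
range(TrainCode$monthstd)
stopifnot(all(TrainCode$monthstd >= 0 & TrainCode$monthstd <= 1))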
In the demo, the TrainCode data is used in the FPCA step; it contains longitudinal encounter records for 20,600 patients (20,000 unlabeled and 600 labeled).
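To tie the data back to the patient count quoted above, one can count distinct patients directly. This is a sketch only: the patient identifier column is assumed to be named id here, which may not match the column name in the packaged data.

# Sketch: count distinct patients. The column name "id" is an assumption;
# substitute the actual patient identifier column in TrainCode.
length(unique(TrainCode$id))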
# count and percentage of non-zero values in columns 4-6 (the three codes)
NZ <- TrainCode[, 4:6] != 0
nz.pct <- round(colMeans(NZ) * 100)
data.frame(nz = colSums(NZ), nz.pct = paste0(nz.pct, "%"))
timing <- function(code, ngrid, n_core) {
  x <- seq(0, 1, length.out = ngrid)   # evaluation grid on standardized time
  pred <- TrainCode[[code]]            # counts for the selected code
  TrainNP <- pred[pred > 0]            # non-zero counts only
  t <- rep(TrainCode$monthstd, pred)   # one time point per encounter
  h1 <- bw.nrd(t)                      # normal reference rule bandwidth
  h2 <- bw.nrd(t)^(5/6)                # second bandwidth, h1^(5/6)
  system.time(FPC.Kern.S(x, t, TrainNP, h1, h2, n_core = n_core))["elapsed"]
}
# timing(code = "pred1", ngrid = 101, n_core = 4)
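As a quick usage example, the helper can be called directly to compare a single-core run against a four-core run for one code; the 101-point grid matches the commented call above, and the exact timings are machine-dependent.

# Illustration only: timings vary by hardware; the speed-up should approach the
# number of cores when the kernel computation dominates the overhead.
t1 <- unname(timing(code = "pred1", ngrid = 101, n_core = 1))
t4 <- unname(timing(code = "pred1", ngrid = 101, n_core = 4))
c(one_core = t1, four_cores = t4, speedup = t1 / t4)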
DF <- expand.grid(
  codes = paste0("pred", 1:3),
  ngrid = c(100, 200, 400) + 1,
  n_core = c(1, 2, 4),
  time = NA
)
for (i in 1:nrow(DF)) {
  code <- as.character(DF$codes[i])
  DF$time[i] <- timing(code, DF$ngrid[i], DF$n_core[i])
}
The figure below shows the elapsed time (in seconds) for the three event codes under different settings. Even for ngrid = 401, the elapsed time is less than one minute for each setting. This linear scalability makes the MASTA algorithm feasible even in studies with very large unlabeled data.
library(ggplot2)
ggplot(DF, aes(x = factor(n_core), y = time)) +
  facet_grid(paste("ngrid =", ngrid) ~ codes) +
  geom_bar(stat = "identity") +
  theme_bw() +
  xlab("Number of CPU Cores") +
  ylab("Elapsed Time (seconds)")
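As a complement to the figure above, the benchmark results can be summarized as speed-up factors relative to the single-core runs, which makes the scalability claim easier to read off directly. This is a sketch using data.table syntax, consistent with how TrainCode is manipulated earlier; exact numbers depend on the machine.

# Sketch: speed-up of each run relative to the single-core run for the same
# code and grid size (speedup = 1 for the single-core rows).
library(data.table)
bench <- as.data.table(DF)
bench[, speedup := time[n_core == 1] / time, by = .(codes, ngrid)]
bench[order(codes, ngrid, n_core)]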