PartiallyObservedHMM
This package creates a hidden Markov model (HMM) that accounts for partially observed data. Partially observed data are represented as a 3-dimensional array in which the set of possible values for individual "i" at time "t" is stored as the vector at row "i", column "t" of the array. The model requires data from three distinct tests. The first data source must be a trichotomous 3-d array (which may or may not include partially observed data). The second and third data sources must be 2-dimensional matrices whose dimensions are identical to each other and to the first two dimensions of the first data source.
In our example, the first data source is an HPV test. Persistence has been added, so the data are trichotomous (negative, newly positive, persistent). The second source is a Pap smear, and the third is a colposcopy.
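As a rough illustration of this layout, the sketch below builds a small 3-d array for the first data source. It assumes that the third dimension is an indicator vector over the three HPV categories, with a 1 marking every category that is still possible; this encoding is an assumption made for illustration and is not confirmed by the package. The names hpv and categories are hypothetical.

```r
# Hypothetical encoding: 1 = category is possible, 0 = category is ruled out.
n_ind <- 2   # individuals
n_time <- 3  # observations per individual
categories <- c("negative", "newly_positive", "persistent")

hpv <- array(0, dim = c(n_ind, n_time, length(categories)),
             dimnames = list(NULL, NULL, categories))

# Fully observed: individual 1 is negative at time 1.
hpv[1, 1, ] <- c(1, 0, 0)
# Partially observed: individual 1 is positive at time 2, but it is unknown
# whether the infection is new or persistent.
hpv[1, 2, ] <- c(0, 1, 1)
# Not tested at time 3, so every category remains possible.
hpv[1, 3, ] <- c(1, 1, 1)

# The vector of possibilities for individual 1 at time 2.
hpv[1, 2, ]
```

Cells for individual 2 are left at 0 here only to keep the sketch short; real data would fill every cell.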
install.packages("devtools")
devtools::install_github("jordanaron22/PartiallyObservedHMM")
library(PartiallyObservedHMM)
#####Sample Parameters
n <- 1000
t <- 5
pi_0 <- 0.1
reduce <- TRUE
epsilon <- .005
#####Estimation
three_mats <- GenerateSimulatedData(n,t,pi_0)
data_pattern_list <- CombineandPattern(three_mats[[1]],three_mats[[2]],three_mats[[3]],reduce,t)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]
init <- GetInit()
tran <- GetTran()
class <- GetClass()
pi_0 <- .05
max2 <- 3
max3 <- 2
parameters <- EM(data_pattern,freq_vec,epsilon,t,init,tran,class,pi_0,max2,max3)
initial_parameters <- parameters[[1]]
estimated_parameters <- parameters[[2]]
likelihood <- parameters[[3]]
Where n is the number of individuals, t is the number of observations per individual, pi_0 is the proportion of stayers, and reduce is a logical (TRUE/FALSE) indicating whether the third test value should be imputed to 0 when the first two test values are 0.
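The imputation controlled by reduce can be pictured with the sketch below. It is not the package's implementation: impute_third is a hypothetical helper, all three tests are simplified to 2-d matrices with NA for missing values, and it assumes that only missing third-test values are filled in.

```r
# Hypothetical helper, not part of PartiallyObservedHMM.
impute_third <- function(test1, test2, test3) {
  # Cells where both earlier tests are observed and equal to 0.
  both_zero <- !is.na(test1) & !is.na(test2) & test1 == 0 & test2 == 0
  # Fill in the third test only where it is missing.
  test3[both_zero & is.na(test3)] <- 0
  test3
}

# Rows are individuals, columns are time points (made-up values).
test1 <- matrix(c(0, 1, 0, 0), nrow = 2)
test2 <- matrix(c(0, 0, 2, 0), nrow = 2)
test3 <- matrix(c(NA, NA, NA, 1), nrow = 2)

imputed <- impute_third(test1, test2, test3)
imputed
```

Only the cell for individual 1 at time 1 changes, because that is the only cell where the first two tests are both 0 and the third test is missing.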
init, tran, class, and pi_0 are the parameters that we are optimizing. Let k be the number of states. init is a vector of length k giving the probability of starting in each state. tran is a k x k matrix giving the probability of transitioning from one state to another (from row to column). class is a k x k matrix giving the probability of classifying the latent (true) state as the observed state (latent is row, observed is column). pi_0 is the probability of being a stayer. max2 is the number of outcomes of the second test, and max3 is the number of outcomes of the third test. epsilon is the likelihood threshold for EM convergence.
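To make these shapes concrete, here is a minimal sketch with k = 3. The numbers are made up, and the names init_example, tran_example, class_example, and is_stochastic are hypothetical; GetInit(), GetTran(), and GetClass() supply the package's own starting values.

```r
k <- 3  # number of states (assumed here for illustration)

# Hypothetical starting values.
init_example <- c(0.8, 0.1, 0.1)
tran_example <- matrix(c(0.90, 0.10, 0.00,
                         0.30, 0.00, 0.70,
                         0.20, 0.00, 0.80),
                       nrow = k, byrow = TRUE)
class_example <- matrix(c(0.95, 0.03, 0.02,
                          0.05, 0.90, 0.05,
                          0.05, 0.05, 0.90),
                        nrow = k, byrow = TRUE)

# The vector sums to 1, and so does every row of each matrix.
is_stochastic <- function(m) all(m >= 0) && all(abs(rowSums(m) - 1) < 1e-8)
stopifnot(abs(sum(init_example) - 1) < 1e-8,
          is_stochastic(tran_example),
          is_stochastic(class_example))

# Probability of moving from state 2 to state 3 (row = from, column = to).
tran_example[2, 3]
```

Checking that rows sum to 1 before calling EM is a cheap way to catch a matrix that was entered column-wise by mistake.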
initial_parameters and estimated_parameters are lists with the same structure, holding the initial and estimated parameters, respectively. The first entry is the vector of initial probabilities, the second is the matrix of transition probabilities, the third is the matrix of classification probabilities, and the fourth is the stayer proportion. likelihood is the final likelihood.
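The sketch below shows one way to unpack such a result. parameters_example is a mock object with made-up numbers that mirrors the structure described above; it is not output from EM, and the est_* names are hypothetical.

```r
# Mock object standing in for the value returned by EM(); all numbers are made up.
parameters_example <- list(
  list(c(0.8, 0.1, 0.1), diag(3), diag(3), 0.05),  # initial parameters
  list(c(0.7, 0.2, 0.1), diag(3), diag(3), 0.08),  # estimated parameters
  -1234.5                                          # placeholder for the final likelihood
)

estimated <- parameters_example[[2]]
est_init  <- estimated[[1]]  # vector of initial probabilities
est_tran  <- estimated[[2]]  # matrix of transition probabilities
est_class <- estimated[[3]]  # matrix of classification probabilities
est_pi_0  <- estimated[[4]]  # stayer proportion
```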
data_pattern_list <- CombineandPattern(mat_one,mat_two,mat_three,reduce,t, max2, max3)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]
parameters <- EM(data_pattern, freq_vec, epsilon, t, initial, transition, classification, pi_0, max2, max3)
Where the variables are defined as above and: mat_one, mat_two, and mat_three are the data sources for the three tests, max2 is the number of possible outcomes in mat_two, max3 is the number of possible outcomes in mat_three, initial is an initial estimate of the vector of initial state probabilities, transition is an initial estimate of the state transition matrix, classification is an initial estimate of the state classification matrix, and pi_0 is an initial estimate of the proportion of stayers.
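Before passing your own data to CombineandPattern, it may help to confirm that the three sources have compatible dimensions. The check below is a sketch only; check_dims and the *_example objects are hypothetical and not part of the package.

```r
# Hypothetical pre-flight check, not part of PartiallyObservedHMM.
check_dims <- function(mat_one, mat_two, mat_three) {
  stopifnot(length(dim(mat_one)) == 3,
            length(dim(mat_two)) == 2,
            length(dim(mat_three)) == 2)
  same <- all(dim(mat_one)[1:2] == dim(mat_two)) &&
    all(dim(mat_two) == dim(mat_three))
  if (!same) stop("the three data sources must share their first two dimensions")
  invisible(TRUE)
}

# 4 individuals, 5 time points, 3 categories for the first test.
mat_one_example   <- array(0, dim = c(4, 5, 3))
mat_two_example   <- matrix(0, nrow = 4, ncol = 5)
mat_three_example <- matrix(0, nrow = 4, ncol = 5)
check_dims(mat_one_example, mat_two_example, mat_three_example)
```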
#####Sample Parameters
n <- 1000
t <- 5
pi_0 <- 0.1
lambda <- c(.75,.25)
reduce <- TRUE
epsilon <- .005
#####Estimation
three_mats <- GenerateSimulatedDataRaff(n, t, pi_0, lambda)
data_pattern_list <- CombineandPattern(three_mats[[1]][[1]], three_mats[[1]][[2]], three_mats[[1]][[3]], reduce, t)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]
init <- GetInitRaff()
tran <- GetTran()
class <- GetClass()
pi_0 <- .05
lambda <- c(.7,.3)
max2 <- 3
max3 <- 2
parameters <- EMRaff(data_pattern, freq_vec, epsilon, t, init, tran, class, pi_0, lambda, max2, max3)
initial_parameters <- parameters[[1]]
estimated_parameters <- parameters[[2]]
likelihood <- parameters[[3]]
Variables are defined similarly to 2.1.1, except for init and lambda. init is now a k by k matrix (where k is the number of states) giving the joint probability of the first two states (the row determines the first state and the column determines the second). lambda is a vector of length two that gives the weights for the first- and second-order transitions from the Rafferty method (the first value in the vector is the first-order weight).
initial_parameters and estimated_parameters are defined as above, except that each list now has five entries, with the final entry being lambda. likelihood is defined as above.
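One way to read the lambda weights is as a mixture of two one-step transitions, sketched below. This is an assumption about the form of the second-order model, not a statement of how EMRaff computes it: the sketch uses the same transition matrix for both lags, and second_order_probs and the *_example objects are hypothetical.

```r
# Hypothetical values for illustration only.
tran_example <- matrix(c(0.90, 0.10, 0.00,
                         0.30, 0.00, 0.70,
                         0.20, 0.00, 0.80),
                       nrow = 3, byrow = TRUE)
lambda_example <- c(0.7, 0.3)  # first-order weight, then second-order weight

# Assumed mixing rule: the distribution of the next state is a weighted
# average of the rows for the previous state and the state before that.
second_order_probs <- function(two_back, one_back, tran, lambda) {
  lambda[1] * tran[one_back, ] + lambda[2] * tran[two_back, ]
}

# In state 1 two steps ago and in state 2 one step ago.
p_next <- second_order_probs(two_back = 1, one_back = 2,
                             tran_example, lambda_example)
p_next

# init as a k x k matrix over the first two states; all entries sum to 1.
init_example <- outer(c(0.8, 0.1, 0.1), c(0.8, 0.1, 0.1))
sum(init_example)
```

Because the weights in lambda sum to 1 and each row of the transition matrix sums to 1, the mixed vector is itself a probability distribution.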
data_pattern_list <- CombineandPattern(mat_one, mat_two, mat_three, reduce, t, max2, max3)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]
parameters <- EMRaff(data_pattern,freq_vec,epsilon,t, initial, transition, classification, pi_0, lambda, max2, max3)
This is extremely similar to 2.1.2, except for the use of EMRaff and the inclusion of the lambda variable.