README.md
In jordanaron22/PartiallyObservedHMM: HMM for Partially Observed Three Diagnostic Test Data

R package `PartiallyObservedHMM`

This package fits a hidden Markov model (HMM) that accounts for partially observed data. Partially observed data are represented as a 3-dimensional array: the set of possible values for individual "i" at time "t" is stored as the vector at row "i", column "t" of the array. The model requires data from three distinct tests. The first data source must be a trichotomous 3-d array (which may or may not include partially observed data). The second and third data sources must be 2-dimensional matrices whose dimensions match each other and the first two dimensions of the first data source.
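As a rough illustration of this layout, the sketch below builds a toy data set in base R. The indicator encoding (1 marks a value that is still possible), the object names `obs`, `pap`, and `colpo`, and the specific cells are assumptions made for illustration; the package may encode possibilities differently.

```r
# Hypothetical toy data: 2 individuals, 3 time points, 3 possible test-one values.
n_ind <- 2
n_time <- 3
n_val <- 3  # negative, newly-positive, persistent

# Assumed encoding: obs[i, t, v] is 1 if value v is possible for individual i at time t.
obs <- array(0, dim = c(n_ind, n_time, n_val))
obs[1, 1, 1] <- 1          # fully observed: negative
obs[1, 2, 2] <- 1          # fully observed: newly-positive
obs[1, 3, 3] <- 1          # fully observed: persistent
obs[2, 1, 1] <- 1          # fully observed: negative
obs[2, 2, c(2, 3)] <- 1    # partially observed: two values remain possible
obs[2, 3, ] <- 1           # nothing observed: every value remains possible

# The second and third sources are plain matrices with matching dimensions.
pap <- matrix(0, nrow = n_ind, ncol = n_time)
colpo <- matrix(0, nrow = n_ind, ncol = n_time)

# Check the dimension requirement and count the possibilities per cell.
same_shape <- identical(dim(obs)[1:2], dim(pap)) && identical(dim(pap), dim(colpo))
possible_counts <- apply(obs, c(1, 2), sum)
```

A cell with a count of 1 is fully observed, while a count above 1 marks a partially observed or missing value under this assumed encoding.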

In our example, the first data source is an HPV test. Persistence has been added, so the data are trichotomous (negative, newly-positive, persistent). The second source is a Pap smear, and the third source is a colposcopy.

Updates

11/27/19: Created package
12/02/19: Added 'read.me'
02/10/20: Added Raftery method functions and documentation
03/05/20: QoL updates to readme
05/14/20: Bug fix for simulated data generation

1. Installation

```r
# devtools is needed to install the package from GitHub
install.packages("devtools")
devtools::install_github("jordanaron22/PartiallyObservedHMM")
library(PartiallyObservedHMM)
```

2. Examples

2.1 First Order

2.1.1 Simulated Data

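```r
##### Sample parameters
n <- 1000         # number of individuals
t <- 5            # number of observations per individual
pi_0 <- 0.1       # proportion of stayers used to simulate the data
reduce <- TRUE    # impute the third test to 0 when the first two tests are 0
epsilon <- 0.005  # likelihood threshold for EM convergence

##### Estimation
three_mats <- GenerateSimulatedData(n, t, pi_0)

data_pattern_list <- CombineandPattern(three_mats[[1]], three_mats[[2]], three_mats[[3]], reduce, t)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]

init <- GetInit()
tran <- GetTran()
class <- GetClass()
pi_0 <- 0.05      # initial estimate of the stayer proportion (overwrites the simulation value)
max2 <- 3         # number of outcomes of the second test
max3 <- 2         # number of outcomes of the third test

parameters <- EM(data_pattern, freq_vec, epsilon, t, init, tran, class, pi_0, max2, max3)

initial_parameters <- parameters[[1]]
estimated_parameters <- parameters[[2]]
likelihood <- parameters[[3]]
```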

Here n is the number of individuals, t is the number of observations per individual, pi_0 is the proportion of stayers, and reduce is a logical (TRUE/FALSE) flag indicating whether the third test value should be imputed to 0 when the first two test values are 0.
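To make the effect of reduce concrete, here is a minimal sketch of that kind of imputation. The helper `impute_third` is hypothetical and is not a function from PartiallyObservedHMM. It assumes each test is a plain matrix, that 0 codes a negative result, that NA codes a test that was not performed, and that only missing third-test values are filled in.

```r
# Hypothetical helper, not part of PartiallyObservedHMM.
# Assumes 0 means negative and NA means the test was not performed.
impute_third <- function(test1, test2, test3) {
  # TRUE where both of the first two tests are observed and negative
  both_negative <- !is.na(test1) & !is.na(test2) & test1 == 0 & test2 == 0
  # Only fill in third-test values that are missing
  fill <- both_negative & is.na(test3)
  test3[fill] <- 0
  test3
}

# Toy data: 2 individuals (rows), 2 time points (columns)
test1 <- matrix(c(0, 1, 0, 0), nrow = 2)
test2 <- matrix(c(0, 0, NA, 0), nrow = 2)
test3 <- matrix(c(NA, NA, NA, 1), nrow = 2)
imputed <- impute_third(test1, test2, test3)
```

In this toy data only the first individual's first visit is filled in, because it is the only cell where both earlier tests are negative and the third test is missing.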

init, tran, class, and pi_0 are the parameters being optimized. Let k be the number of states. init is a vector of length k giving the probability of starting in each state. tran is a k x k matrix giving the probability of transitioning from one state to another (from row to column). class is a k x k matrix giving the probability of classifying the latent (true) state as the observed state (latent is row, observed is column). pi_0 is the probability of being a stayer. max2 is the number of outcomes of the second test, and max3 is the number of outcomes of the third test. epsilon is the likelihood threshold for EM convergence.
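The shapes described above can be checked before calling EM. The sketch below uses k = 3 and made-up starting values; the numbers and the helpers `is_prob_vector` and `is_row_stochastic` are illustrative assumptions, not defaults or functions from the package.

```r
k <- 3  # hypothetical number of states

# Illustrative starting values only; these are not the package defaults.
init <- c(0.80, 0.15, 0.05)
tran <- matrix(c(0.90, 0.10, 0.00,
                 0.30, 0.00, 0.70,
                 0.20, 0.00, 0.80), nrow = k, byrow = TRUE)
class <- matrix(c(0.95, 0.04, 0.01,
                  0.10, 0.85, 0.05,
                  0.05, 0.10, 0.85), nrow = k, byrow = TRUE)
pi_0 <- 0.05

# A probability vector is non-negative and sums to 1 (up to rounding error).
is_prob_vector <- function(p) all(p >= 0) && isTRUE(all.equal(sum(p), 1))

# Rows index the "from" (or latent) state, so each row must be a probability vector.
is_row_stochastic <- function(m) all(apply(m, 1, is_prob_vector))

valid <- is_prob_vector(init) &&
  is_row_stochastic(tran) &&
  is_row_stochastic(class) &&
  pi_0 >= 0 && pi_0 <= 1
```

Because both matrices are read from row to column, it is the rows, not the columns, that need to sum to 1.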

initial_parameters and estimated_parameters are lists with the same structure, holding the initial and estimated parameters, respectively. The first entry is the vector of initial probabilities, the second is the matrix of transition probabilities, the third is the matrix of classification probabilities, and the fourth is the stayer proportion. likelihood is the final likelihood.
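The names data_pattern and freq_vec used in the example suggest that identical observation sequences are collapsed into unique patterns with a count for each. The README does not describe how CombineandPattern does this, so the following base R sketch only illustrates that general idea on a toy matrix; the object names are hypothetical.

```r
# Hypothetical toy data: 5 individuals (rows), 3 time points (columns).
obs <- matrix(c(0, 0, 1,
                0, 1, 1,
                0, 0, 1,
                0, 0, 0,
                0, 1, 1), nrow = 5, byrow = TRUE)

# Build one string key per row, then keep each distinct row once.
keys <- apply(obs, 1, paste, collapse = "-")
first_seen <- !duplicated(keys)
patterns <- obs[first_seen, , drop = FALSE]

# Count how many individuals share each pattern, in the same order as `patterns`.
freqs <- as.vector(table(factor(keys, levels = keys[first_seen])))
```

Working with unique patterns and their frequencies means a likelihood contribution is computed once per pattern and weighted by its count, rather than once per individual.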

2.1.2 Other Data

```r
data_pattern_list <- CombineandPattern(mat_one, mat_two, mat_three, reduce, t, max2, max3)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]

parameters <- EM(data_pattern, freq_vec, epsilon, t, initial, transition, classification, pi_0, max2, max3)
```

Here the variables are defined as above, and: mat_one, mat_two, and mat_three are the data sources for the three tests, max2 is the number of possible outcomes in mat_two, max3 is the number of possible outcomes in mat_three, initial is an initial estimate of the vector of initial state probabilities, transition is an initial estimate of the state transition matrix, classification is an initial estimate of the state classification matrix, and pi_0 is an initial estimate of the proportion of stayers.
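Before passing your own data to CombineandPattern, it can help to confirm that the three sources have compatible shapes. The helper `check_sources` below is a hypothetical sketch, not part of the package. It assumes that outcomes of the second and third tests are coded 0, 1, ..., max - 1 and that missing values are NA, which the README does not state.

```r
# Hypothetical helper, not part of PartiallyObservedHMM.
# Assumes outcomes are coded 0, 1, ..., max - 1, with NA for missing values.
check_sources <- function(mat_one, mat_two, mat_three, max2, max3) {
  # The first source is a 3-d array; the other two are matrices.
  stopifnot(length(dim(mat_one)) == 3, is.matrix(mat_two), is.matrix(mat_three))
  # Individuals (rows) and time points (columns) must agree across sources.
  stopifnot(identical(dim(mat_one)[1:2], dim(mat_two)))
  stopifnot(identical(dim(mat_two), dim(mat_three)))
  # Every recorded outcome must be one of the allowed codes.
  stopifnot(all(mat_two %in% c(NA, seq_len(max2) - 1)))
  stopifnot(all(mat_three %in% c(NA, seq_len(max3) - 1)))
  invisible(TRUE)
}

# Toy inputs: 4 individuals, 5 time points.
mat_one <- array(1, dim = c(4, 5, 3))
mat_two <- matrix(c(0, 1, 2, NA), nrow = 4, ncol = 5)
mat_three <- matrix(c(0, 1), nrow = 4, ncol = 5)
ok <- check_sources(mat_one, mat_two, mat_three, max2 = 3, max3 = 2)
```

The function stops with an error at the first failed check, so a mismatch in dimensions is caught before any model fitting starts.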

2.2 Second Order (Raftery Method)

2.2.1 Simulated Data

```r
##### Sample parameters
n <- 1000
t <- 5
pi_0 <- 0.1              # proportion of stayers used to simulate the data
lambda <- c(0.75, 0.25)  # first- and second-order weights used to simulate the data
reduce <- TRUE
epsilon <- 0.005

##### Estimation
three_mats <- GenerateSimulatedDataRaff(n, t, pi_0, lambda)
data_pattern_list <- CombineandPattern(three_mats[[1]][[1]], three_mats[[1]][[2]], three_mats[[1]][[3]], reduce, t)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]

init <- GetInitRaff()
tran <- GetTran()
class <- GetClass()
pi_0 <- 0.05             # initial estimate of the stayer proportion
lambda <- c(0.7, 0.3)    # initial estimate of the weights
max2 <- 3
max3 <- 2

parameters <- EMRaff(data_pattern, freq_vec, epsilon, t, init, tran, class, pi_0, lambda, max2, max3)

initial_parameters <- parameters[[1]]
estimated_parameters <- parameters[[2]]
likelihood <- parameters[[3]]
```

Variables are defined as in 2.1.1, except for init and lambda. init is now a k x k matrix (where k is the number of states) giving the joint probability of the first two states (the row determines the first state and the column determines the second). lambda is a vector of length two that gives the weights for the first- and second-order transitions in the Raftery method (the first value in the vector is the first-order weight).
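To show how the two weights combine, the sketch below assumes the package follows the usual mixture transition distribution form of the Raftery method, in which the next-state distribution is a lambda-weighted average of the rows of tran selected by the two previous states. The function `next_state_probs` and all numeric values are hypothetical.

```r
# Illustrative values only; not estimates from the package.
tran <- matrix(c(0.90, 0.10, 0.00,
                 0.30, 0.00, 0.70,
                 0.20, 0.00, 0.80), nrow = 3, byrow = TRUE)
lambda <- c(0.75, 0.25)  # first-order weight, second-order weight

# Distribution of the next state given the previous state (prev1)
# and the state before it (prev2), with states indexed from 1.
next_state_probs <- function(prev1, prev2, tran, lambda) {
  lambda[1] * tran[prev1, ] + lambda[2] * tran[prev2, ]
}

probs <- next_state_probs(prev1 = 2, prev2 = 1, tran, lambda)

# init is a k x k joint distribution over the first two states,
# so the whole matrix (not each row) sums to 1.
init <- matrix(c(0.80, 0.05, 0.00,
                 0.03, 0.00, 0.06,
                 0.01, 0.00, 0.05), nrow = 3, byrow = TRUE)
```

Since the weights sum to 1 and each row of tran sums to 1, the resulting vector is itself a probability distribution over the next state.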

initial_parameters and estimated_parameters are defined as above, except that each list now has five entries, with the final entry being lambda. likelihood is also defined as above.

2.2.2 Other Data

```r
data_pattern_list <- CombineandPattern(mat_one, mat_two, mat_three, reduce, t, max2, max3)
data_pattern <- data_pattern_list[[1]]
freq_vec <- data_pattern_list[[2]]

parameters <- EMRaff(data_pattern, freq_vec, epsilon, t, initial, transition, classification, pi_0, lambda, max2, max3)
```

This is very similar to 2.1.2, except that it uses EMRaff and includes the lambda variable.

jordanaron22/PartiallyObservedHMM documentation built on May 21, 2020, 6:49 p.m.
