mis.cost: Imputation of missing states
In MortenKrebs/diagtraject: Mining of Diagnosis Trajectories


View source: R/impute.R

Imputes states given the state at the last observation before censoring and time-specific transition rates, and calculates probability-weighted substitution costs.
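The idea of imputing from the last observed state can be illustrated with a small base R sketch. It assumes a first-order Markov chain and is not the package's implementation; the helper `impute_right` and the toy array `trans` (indexed as from-state, to-state, time step) are hypothetical names chosen for illustration.

```r
# Toy example: two states and two censored time points (all values made up).
states <- c("A", "B")

# Hypothetical time-specific transition rates: trans[from, to, t].
trans <- array(0, dim = c(2, 2, 2), dimnames = list(states, states, NULL))
trans[, , 1] <- rbind(c(0.9, 0.1), c(0.2, 0.8))
trans[, , 2] <- rbind(c(0.5, 0.5), c(0.0, 1.0))

# Propagate state probabilities forward from the last observed state.
impute_right <- function(last_state, trans) {
  st <- dimnames(trans)[[1]]
  n_steps <- dim(trans)[3]
  # Start with probability 1 on the last observed state.
  p <- setNames(as.numeric(st == last_state), st)
  out <- matrix(NA_real_, n_steps, length(st), dimnames = list(NULL, st))
  for (t in seq_len(n_steps)) {
    p <- drop(p %*% trans[, , t])  # one Markov step with the rates for step t
    out[t, ] <- p
  }
  out
}

probs <- impute_right("A", trans)
```

Each row of `probs` holds the state probabilities for one censored time point, so every row sums to 1.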

mis.cost(
  seq_mis,
  cens.type = c("right", "left", "both"),
  last = NULL,
  trans.rates = NULL,
  smooth = F,
  sum_to_1 = T,
  MM = T,
  sm = "CONSTANT",
  method = "prob",
  prob.out = F,
  diag = F,
  resol.comp = NULL,
  resol.ratio = 1,
  mc.cores = NULL
)

`seq_mis`	object of class `'stslist'` as created by `seqdef`.
`cens.type`	character indicating the type of censoring. Must be either `"right"`, `"left"` or `"both"`.
`last`	optional character string containing state levels at the last observation before censoring.
`trans.rates`	object of class `'array'` with transition rates in format created by `seqtrate`. If `NULL` (default) transition rates will be calculated using `seqtrate` function.
`smooth`	`'logical'` indicating if transition rates should be smoothed.
`sm`	`'character'` indicating the substitution cost setting. Must be `"CONSTANT"` or `"TRATE"` (substitution costs are then calculated by `TraMineR::seqsubm`), or an object of class `'matrix'` containing substitution costs.
`method`	Currently only `"prob"` (default) is available. See `Details`.
`prob.out`	logical indicating if imputed probabilities should be included in output. Defaults to `FALSE`.
`diag`	logical indicating if diagonal should be printed in dist object. Defaults to `FALSE`.
`resol.comp`	optional vector of integers. If increments differ between the dissimilarity calculations for complete and imputed sequences, the differences can be specified here for compensation.
`resol.ratio`	optional numeric, specified if increments differ between the dissimilarity calculations for imputed and complete sequences. Defaults to `1`.
`mc.cores`	optional integer specifying the number of cores for parallel computation.

Calculates dissimilarities for right- and left-censored state sequence objects using probability-weighted substitution costs:

d_{inf}(i,j) = \sum_{t=1}^{t_{max}} \sum \left( Pr(i)_t \, Pr(j)_t^T \circ SC \right)
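The formula above can be sketched in base R for a single pair of sequences. This is an illustration under stated assumptions, not the package's code: the inner sum is read as a sum over all entries of the element-wise product, and the names `weighted_dist`, `pr_i`, `pr_j` and the toy values are hypothetical.

```r
# Toy alphabet with constant substitution costs (2 off the diagonal).
states <- c("A", "B", "C")
SC <- matrix(2, 3, 3, dimnames = list(states, states))
diag(SC) <- 0

# Rows are time points, columns are state probabilities (each row sums to 1).
pr_i <- rbind(c(1.0, 0.0, 0.0),   # t = 1: state A observed with certainty
              c(0.5, 0.5, 0.0))   # t = 2: imputed probabilities
pr_j <- rbind(c(0.0, 1.0, 0.0),   # t = 1: state B observed with certainty
              c(0.2, 0.3, 0.5))   # t = 2: imputed probabilities

# Sum over t of sum( (Pr(i)_t Pr(j)_t^T) * SC ), with * taken element-wise.
weighted_dist <- function(p, q, sc) {
  sum(vapply(seq_len(nrow(p)), function(t) {
    sum(outer(p[t, ], q[t, ]) * sc)
  }, numeric(1)))
}

d_ij <- weighted_dist(pr_i, pr_j, SC)
```

At a time point where both states are known with certainty, the term reduces to the ordinary substitution cost between the two states, so fully observed positions contribute as in a Hamming distance.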

Object of class 'dist' containing dissimilarities.
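Because a `'dist'` object stores only one triangle of the dissimilarity matrix, it usually has to be converted with `as.matrix` before it can be added to a full matrix of other dissimilarities. A minimal base R sketch with made-up values (the objects `m`, `obs` and `total` are hypothetical):

```r
# Symmetric toy dissimilarities for three sequences.
ids <- c("s1", "s2", "s3")
m <- matrix(c(0.0, 1.5, 3.0,
              1.5, 0.0, 2.0,
              3.0, 2.0, 0.0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

d <- as.dist(m)        # 'dist' object: one triangle, diagonal not stored
full <- as.matrix(d)   # back to a full symmetric matrix with a zero diagonal

# Hypothetical distances computed on the observed part of the sequences.
obs <- matrix(1, 3, 3)
diag(obs) <- 0

total <- obs + full    # element-wise sum of the two dissimilarity matrices
```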

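## Creating a sequence object
library(TraMineR)     # provides seqdef, seqsubm, seqdist and the mvad data
library(diagtraject)  # provides mis.cost
data(mvad)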
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", 
                   "training")
mvad.labels <- c("employment", "further education", "higher education", 
                 "joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes, 
                   labels = mvad.labels, xtstep = 6)

## Introducing right-censoring
# Add "missing" as an extra level to every factor column
addMissing <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "missing")))
  return(x)
}
mvad.perm <- mvad
mvad.perm <- as.data.frame(lapply(mvad.perm, addMissing))
# Censor a random half of the sequences
row.perm.r <- sample(1:nrow(mvad))[1:floor(nrow(mvad) * .5)]
row.perm <- 1:nrow(mvad) %in% row.perm.r
# Draw one censoring point per selected row from the last 20% of the sequence columns
n.cols <- ncol(mvad[, 17:86])
col.perm.r <- sample(floor(n.cols * .8):n.cols,
                     size = length(row.perm.r), replace = TRUE)
for (i in seq_along(row.perm.r)) {
  mvad.perm[row.perm.r[i], (col.perm.r[i] + 16):ncol(mvad)] <- "missing"
}
perm.seq <- seqdef(mvad.perm, 17:86, alphabet = mvad.alphabet,
                   states = mvad.scodes, missing = "missing",
                   labels = mvad.labels, xtstep = 6)

## Computing Hamming distance in observed states
perm.seq2 <- seqdef(mvad.perm, 17:86, xtstep = 6)
sub.cost2 <- seqsubm(seqdata = perm.seq2, method = "CONSTANT")
sub.cost2["missing->",] <- sub.cost2[,"missing->"] <- 0
dist.obs <- seqdist(perm.seq2, method = "HAM", sm = sub.cost2)

## Computing probability-weighted Hamming distance in censored states:
dist.mis <- mis.cost(perm.seq, cens.type = "right", sum_to_1 = FALSE,
                     method = "prob", sm = "CONSTANT", smooth = FALSE)
dist <- dist.obs + as.matrix(dist.mis$dist)

## Obtaining imputed probabilities
prob <- mis.cost(perm.seq, cens.type = "right", sum_to_1 = FALSE,
                 method = "prob", smooth = FALSE, prob.out = TRUE)