| seqaddNA | R Documentation |
Generation of missing data in sequence based on a Markovian approach.
seqaddNA(
data,
var = NULL,
states.high = NULL,
propdata = 1,
pstart.high = 0.1,
pstart.low = 0.005,
pcont = 0.66,
maxgap = 3,
maxprop = 0.75,
only.traj = FALSE
)
data |
A data frame containing sequences of a categorical (multinomial)
variable, where missing data are coded as |
var |
A vector specifying the columns of the dataset
that contain the trajectories. Default is |
states.high |
A list of states with a higher probability of initiating a subsequent missing data gap. |
propdata |
Proportion of trajectories for which missing data is simulated, as a decimal between 0 and 1. |
pstart.high |
Probability of starting a missing data gap for the
states specified in the |
pstart.low |
Probability of starting a missing data gap for all other states. |
pcont |
Probability of a missing data gap to continue. |
maxgap |
Maximum length of a missing data gap. |
maxprop |
Maximum proportion of missing data allowed in a sequence, as a decimal between 0 and 1. |
only.traj |
Logical, if |
The first time point of a trajectory has a pstart.low probability to
be missing. For the next time points, the probability to be missing depends
on the previous time point. There are four cases:
1. If the previous time point is missing and the maximum length of a
missing gap, which is specified by the argument maxgap, is reached,
the time point is set as observed.
2. If the previous time point is missing, but the maximum length of a gap is
not reached, there is a pcont probability that this time point is missing.
3. If the previous time point is observed and the previous time point belongs
to the list of states specified by pstart.high, the probability to
be missing is pstart.high.
4. If the previous time point is observed but the previous time point does not
belong to the list of states specified by pstart.high, the
probability to be missing is pstart.low.
If the proportion of missing data in a given trajectory exceeds the
proportion specified by maxprop, the missing data simulation is
repeated for the sequence.
A data frame with simulated missing data.
Kevin Emery
# Generate MCAR missing data on the mvad dataset
# from the TraMineR package
## Not run:
data(mvad, package = "TraMineR")
mvad.miss <- seqaddNA(mvad, var = 17:86)
# Generate missing data on mvad where joblessness is more likely to trigger
# a missing data gap
mvad.miss2 <- seqaddNA(mvad, var = 17:86, states.high = "joblessness")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.