extract.prob: Obtain conditional probabilities from training data

View source: R/extct_prob.r

Description

This function is used internally by insilico.train.

Usage

extract.prob(
  train,
  gs,
  gstable,
  thre = 0.95,
  type = c("quantile", "fixed", "empirical")[1],
  isNumeric = FALSE,
  impute = TRUE
)

Arguments

train

Training data. It should be in the same format as the testing data and contain one additional column (see gs below) specifying the known cause of death. The first column is assumed to be the death ID.

gs

The name of the column in train that contains the cause of death.

gstable

The list of causes of death used in training data.

thre

A numerical value between 0 and 1. It specifies the maximum missing rate allowed for a symptom to be considered in the model. The default value is 0.95, meaning that a symptom missing in more than 95% of the training data will be removed.
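The threshold rule described above can be sketched in base R. This is only an illustration on made-up data, not the package's internal code: the toy data frame, the 1/0/-1 coding, and the names miss_rate and kept are assumptions introduced here.

```r
# Toy training data (hypothetical): first column is the death ID,
# last column is the cause. Assumed coding: 1 = present, 0 = absent, -1 = missing.
train <- data.frame(
  ID    = paste0("d", 1:4),
  fever = c(1, 0, -1, 1),
  cough = c(-1, -1, -1, -1),
  rash  = c(0, -1, -1, -1),
  cause = c("A", "A", "B", "B")
)
thre <- 0.7  # lower than the default 0.95 so the toy data shows an effect

symptoms <- setdiff(names(train), c("ID", "cause"))

# Fraction of deaths in which each symptom is missing
miss_rate <- sapply(train[symptoms], function(x) mean(x == -1))

# Keep symptoms whose missing rate does not exceed the threshold
kept <- symptoms[miss_rate <= thre]
print(miss_rate)
print(kept)
```

With this toy input only fever survives, since cough and rash are missing in more than 70% of the deaths.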

type

Three methods for learning conditional probabilities are provided: “quantile”, “fixed”, and “empirical”. Since InSilicoVA works with ranked conditional probabilities P(S|C), “quantile” means the rankings of P(S|C) are obtained by matching quantiles of the distribution of the default InterVA P(S|C), and “fixed” means the P(S|C) values are matched to the closest values in the default InterVA P(S|C) table. Empirically, both types of rankings produce similar results. The third option, “empirical”, means no rankings are calculated; only the raw P(S|C) values are returned.
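To make the difference between raw and matched values concrete, here is a minimal base R sketch on made-up data. It is not the package's implementation. It assumes the 1/0/-1 coding, assumes missing values are excluded from the denominator, and uses ref_levels as a hypothetical stand-in for the levels in the default InterVA P(S|C) table.

```r
# Toy training data (hypothetical); -1 marks a missing symptom value
train <- data.frame(
  ID    = paste0("d", 1:6),
  fever = c(1, 1, 0, 0, 0, 1),
  cough = c(0, 1, 1, 1, -1, 1),
  cause = c("A", "A", "A", "B", "B", "B")
)
symptoms <- c("fever", "cough")
causes   <- c("A", "B")

# Raw ("empirical"-style) P(S|C): among deaths from cause C with symptom S
# observed, the share in which S is present (assumption: missing is excluded)
cond_prob <- sapply(causes, function(cs) {
  sub <- train[train$cause == cs, symptoms, drop = FALSE]
  sapply(sub, function(x) mean(x[x != -1] == 1))
})

# "fixed"-style matching: snap each raw value to the closest reference level.
# ref_levels is made up for illustration, not the real InterVA table.
ref_levels <- c(0.05, 0.2, 0.5, 0.8, 1)
fixed <- apply(cond_prob, c(1, 2), function(p) {
  ref_levels[which.min(abs(ref_levels - p))]
})

print(cond_prob)
print(fixed)
```

Both results are symptom-by-cause matrices. The “quantile” option would instead match ranks through quantiles of the reference distribution, which is not shown here.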

isNumeric

Indicator of whether the input is already in numeric form. If the input is coded numerically, with 1 for “present”, 0 for “absent”, and -1 for “missing”, this indicator can be set to TRUE to avoid conversion to the standard InterVA format.
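For readers preparing their own data, the numeric coding described above can be produced with a few lines of base R. The sketch below assumes the string-coded input uses "Y" for present, an empty string for absent, and "." for missing; that convention and the helper to_numeric are assumptions made for illustration and are not part of the package.

```r
# Hypothetical string-coded data: "Y" = present, "" = absent, "." = missing
raw <- data.frame(
  ID    = c("d1", "d2", "d3"),
  fever = c("Y", "", "."),
  cough = c("", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Illustrative helper: map the string codes to 1 / 0 / -1
to_numeric <- function(x) {
  out <- rep(NA_real_, length(x))  # unrecognised codes stay NA
  out[x == "Y"] <- 1
  out[x == ""]  <- 0
  out[x == "."] <- -1
  out
}

num <- raw
num[-1] <- lapply(raw[-1], to_numeric)  # convert every column except the ID
print(num)
```

Data prepared this way matches the 1/0/-1 coding that the argument description refers to.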

impute

Indicator of whether to impute (1) P(S|C) with P(S) when symptom S is not observed in more than the threshold fraction of deaths due to C, and (2) values of exactly 0 or 1.
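The first imputation rule can be illustrated with a small base R sketch. It rests on one reading of the description, namely that P(S|C) is replaced by the marginal P(S) whenever S is missing too often among deaths from C. All values and the names miss_by_cause, marginal, and too_sparse are made up for illustration. The handling of exact 0 or 1 values is not shown because the description does not say how they are replaced.

```r
# Hypothetical raw P(S|C) matrix (rows = symptoms, columns = causes)
cond_prob <- matrix(c(0.6, NA, 0.2, 0.9), nrow = 2,
                    dimnames = list(c("fever", "cough"), c("A", "B")))

# Hypothetical missing rate of each symptom within each cause
miss_by_cause <- matrix(c(0.1, 0.99, 0.3, 0.2), nrow = 2,
                        dimnames = dimnames(cond_prob))

# Hypothetical marginal P(S), pooled over all causes
marginal <- c(fever = 0.4, cough = 0.7)
thre <- 0.95

# Cells where the symptom is missing too often within that cause
too_sparse <- miss_by_cause > thre

# Replace those cells with the marginal probability of the same symptom
imputed <- cond_prob
imputed[too_sparse] <- marginal[row(cond_prob)[too_sparse]]
print(imputed)
```

Here only the cough/A cell exceeds the threshold, so it takes the marginal value for cough while the other cells are unchanged.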

Value

cond.prob

raw P(S|C) matrix

cond.prob.alpha

ranked P(S|C) matrix

table.alpha

list of ranks used

table.num

list of median numerical values for each rank

symps.train

training data after removing symptoms whose missing rate exceeds the threshold.


InSilicoVA documentation built on Aug. 2, 2021, 5:08 p.m.