Description
A complementary (naive Bayes) implementation that uses methods common in mmb, such as computing factors or segmenting data. It supports Laplacian smoothing and early-stopping segmentation, PDF- and CDF-based estimates, and selecting any subset of the features as dependencies.
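To illustrate what "computing factors" and Laplacian smoothing mean in a naive Bayes setting, here is a minimal, generic sketch for discrete features only. It is not mmb's implementation: the helper `naiveBayesScore`, its `alpha` parameter, and the toy data are hypothetical, and mmb's segmentation and PDF/CDF handling of continuous features are not reproduced.

```r
# Hypothetical helper, NOT part of mmb: unnormalized naive Bayes score of
# label value 'y' given discrete feature values, with add-alpha (Laplace)
# smoothing applied to the prior and to every conditional factor.
naiveBayesScore <- function(df, targetCol, y, features, alpha = 1) {
  labelRows <- df[df[[targetCol]] == y, , drop = FALSE]
  nLabels <- length(unique(df[[targetCol]]))
  # Smoothed prior P(label = y)
  score <- (nrow(labelRows) + alpha) / (nrow(df) + alpha * nLabels)
  for (name in names(features)) {
    nValues <- length(unique(df[[name]]))
    matches <- sum(labelRows[[name]] == features[[name]])
    # Smoothed factor P(feature = value | label = y); never zero if alpha > 0
    score <- score * (matches + alpha) / (nrow(labelRows) + alpha * nValues)
  }
  score
}

toy <- data.frame(
  outlook = c("sunny", "sunny", "rainy", "rainy", "sunny"),
  windy   = c("no", "yes", "yes", "no", "no"),
  play    = c("yes", "no", "no", "yes", "yes")
)
query <- list(outlook = "sunny", windy = "no")

naiveBayesScore(toy, "play", "yes", query)  # 4/7 * 3/5 * 4/5 = 48/175
naiveBayesScore(toy, "play", "no", query)   # 3/7 * 2/4 * 1/4 = 3/56
```

With `alpha = 0` (no smoothing), the factor P(windy = "no" | play = "no") is 0 and zeroes out the whole score for play = "no", which is the problem smoothing avoids.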
Usage
```r
bayesProbabilityNaive(
  df,
  features,
  targetCol,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  retainMinValues = 1,
  doEcdf = FALSE,
  useParallel = NULL
)
```
Arguments
df
data.frame that contains the data of all features.
features
data.frame with Bayes features (as created by mmb::createFeatureForBayes). One of the features must be the label column.
targetCol
string with the name of the feature that represents the label.
selectedFeatureNames
vector, default c(). Names of the features to use as dependencies; any subset of the features can be selected.
shiftAmount
numeric, default 0.1. An offset used to increase each probability (factor) in the fully built equation. In scenarios with many dependencies, it becomes more likely that a single conditional probability is zero, which makes the entire probability zero. Since a zero result is often useless, 'shiftAmount' can be added to each factor, yielding a non-zero value that can at least be used to order samples by likelihood. Note that with a positive 'shiftAmount', the result of this function can no longer be called a probability; it is a comparable likelihood (a 'probability score').
retainMinValues
integer, default 1. The minimum number of data points required when segmenting the data feature by feature.
doEcdf
boolean, default FALSE. Whether to use the empirical CDF to return a probability when inferring a continuous feature. If FALSE, the empirical PDF is used to return the relative likelihood.
This parameter has no effect if all of the variables are discrete or when
doing a regression. Otherwise, for each continuous variable, the probability
of finding a value less than or equal to the given value (given the
conditions) is returned. Note that probabilities obtained from the ECDF must be
interpreted differently and with care, especially since this affects every
continuous factor in Bayes' equation. This is especially true for the case where …
useParallel
boolean, default NULL. Whether to use a previously registered parallel
backend. If no explicit value is given, calls …
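The retainMinValues argument and the early-stopping segmentation mentioned in the description can be pictured with a small sketch. It is hypothetical, not mmb's code: the helper `segmentData` and the toy data are assumptions, and only equality conditions on discrete features are handled. The data are narrowed one feature at a time, and segmentation stops as soon as a further step would leave fewer than retainMinValues rows.

```r
# Hypothetical helper, NOT part of mmb: narrow 'df' by each named condition
# in turn; stop early once a step would leave fewer than 'retainMinValues' rows.
segmentData <- function(df, conditions, retainMinValues = 1) {
  for (name in names(conditions)) {
    candidate <- df[df[[name]] == conditions[[name]], , drop = FALSE]
    if (nrow(candidate) < retainMinValues) {
      break  # keep the last segment that was still large enough
    }
    df <- candidate
  }
  df
}

toy <- data.frame(
  color = c("red", "red", "red", "blue", "blue"),
  shape = c("round", "round", "square", "round", "square"),
  size  = c("S", "L", "S", "S", "L")
)
conditions <- list(color = "red", shape = "round", size = "L")

nrow(segmentData(toy, conditions, retainMinValues = 1))  # 1 row: red, round, L
nrow(segmentData(toy, conditions, retainMinValues = 2))  # 2 rows: stops before size
```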
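For a single continuous feature, the difference between the two modes of doEcdf can be sketched with base R. As assumptions for illustration, conditioning is done by restricting iris to Species == "versicolor", and stats::density (a Gaussian kernel density estimate) stands in for the empirical PDF; mmb's own estimator may differ.

```r
# Condition on the label: Petal.Length of versicolor samples only.
irisData <- datasets::iris
x <- irisData$Petal.Length[irisData$Species == "versicolor"]
value <- 4.5

# Analogue of doEcdf = TRUE: P(Petal.Length <= value | Species = versicolor),
# always within [0, 1].
pEcdf <- stats::ecdf(x)(value)

# Analogue of doEcdf = FALSE: density estimate at 'value', a relative
# likelihood rather than a probability (it can exceed 1 for narrow data).
kde <- stats::density(x)
likelihoodPdf <- stats::approxfun(kde$x, kde$y)(value)

c(ecdf = pEcdf, pdf = likelihoodPdf)
```

The ECDF value answers "how likely is a value at most this large", whereas the density only says how likely this value is relative to others, which is why the two modes must be interpreted differently.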
Value
numeric: a probability (when inferring discrete labels), a relative
likelihood (when doing a regression, i.e., inferring the likelihood of a
continuous value), or the most likely value given the conditional features.
If a positive shiftAmount is used, the result is a 'probability score'.
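The effect of shiftAmount on a product of factors can be checked with a few lines of plain R. The factor values below are made up for illustration, and it is assumed that the shift is simply added to each factor before multiplying, as the shiftAmount description states; this is a sketch, not mmb's internal code.

```r
# Made-up conditional probabilities (factors) for three candidate samples.
factorsA <- c(0.50, 0.20, 0.40)
factorsB <- c(0.90, 0.00, 0.60)  # one zero factor
factorsC <- c(0.10, 0.00, 0.05)  # one zero factor, weaker evidence otherwise

# Plain product: any zero factor makes the whole product zero.
scorePlain <- function(f) prod(f)

# Shifted product: add shiftAmount to every factor before multiplying.
# The result is a 'probability score', no longer a probability.
scoreShifted <- function(f, shiftAmount = 0.1) prod(f + shiftAmount)

sapply(list(A = factorsA, B = factorsB, C = factorsC), scorePlain)
# A = 0.04, B = 0, C = 0: B and C cannot be ordered
sapply(list(A = factorsA, B = factorsB, C = factorsC), scoreShifted)
# A = 0.09, B = 0.07, C = 0.003: all three can be ordered
```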
Author(s)
Sebastian Hönel <sebastian.honel@lnu.se>
Examples
```r
feat1 <- mmb::createFeatureForBayes(
  name = "Petal.Length", value = mean(iris$Petal.Length))
feat2 <- mmb::createFeatureForBayes(
  name = "Petal.Width", value = mean(iris$Petal.Width))
featT <- mmb::createFeatureForBayes(
  name = "Species", iris[1,]$Species, isLabel = TRUE)

# Check the probability of Species=setosa, given the other 2 features:
mmb::bayesProbabilityNaive(
  df = iris, features = rbind(feat1, feat2, featT), targetCol = "Species")

# Now check the probability of Species=versicolor:
featT$valueChar <- "versicolor"
mmb::bayesProbabilityNaive(
  df = iris, features = rbind(feat1, feat2, featT), targetCol = "Species")
```