mle_coerce: Basic discretisation of numerical features

View source: R/coerce.R

Description

One can use this function for a quick, ad hoc discretisation of the numerical features in a data frame, so that it can be passed to vistla using maximum likelihood estimation (mle, the default estimator). It can also be used to reproduce the legacy behaviour of vistla, which automatically performed such a conversion with 10 equal-width bins. Non-numeric columns are left as they are; since discretised columns are no longer numeric, the function is idempotent and does nothing when given fully discrete data.
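The column-wise behaviour described above can be sketched in base R. The helper `coerce_sketch` below is hypothetical and only illustrates the idea, assuming that each numeric column is cut with `cut()` (at sample quantiles for "size", into equal-width intervals for "width") and thereby becomes a factor. It is not vistla's actual implementation, which lives in R/coerce.R.

```r
# Hypothetical helper illustrating the idea; NOT vistla's implementation of mle_coerce.
coerce_sketch <- function(x, bins = 3, equal = c("size", "width")) {
  equal <- match.arg(equal)
  for (col in names(x)) {
    v <- x[[col]]
    if (!is.numeric(v)) next  # non-numeric columns are left as they are
    breaks <- if (equal == "width") {
      bins  # a single number: cut() splits the range into `bins` equal-width intervals
    } else {
      # quantile cut points; unique() drops duplicates caused by ties
      unique(quantile(v, probs = seq(0, 1, length.out = bins + 1), na.rm = TRUE))
    }
    x[[col]] <- cut(v, breaks = breaks, include.lowest = TRUE)  # result is a factor
  }
  x
}

df <- data.frame(a = c(1.5, 2.0, 7.3, 9.9, 4.4, 3.1), g = letters[1:6])
once  <- coerce_sketch(df, bins = 3)
twice <- coerce_sketch(once, bins = 3)
table(once$a)           # two objects per bin
identical(once, twice)  # nothing numeric is left to cut on the second pass
```

Under these assumptions, the legacy behaviour corresponds to `coerce_sketch(df, bins = 10, equal = "width")`, or `mle_coerce(df, 10, "width")` with the real function.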

Usage

mle_coerce(x, bins = 3, equal = c("size", "width"))

Arguments

x

Data frame to be converted.

bins

Number of bins to cut each numerical column into.

equal

If "width", the function cuts each numerical column into bins of equal width, which may therefore contain substantially different numbers of objects. On the other hand, when "size" (the default), cuts are made at quantiles, yielding bins with approximately the same number of objects but different widths. Both options are asymptotically equivalent when the distribution of a given column is uniform.
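The difference between the two options can be illustrated with base R alone, assuming that "width" behaves like `cut()` with a number of breaks and "size" like `cut()` at sample quantiles (an assumption about the implementation, not something this page confirms). On a right-skewed sample, equal-width bins are strongly unbalanced while quantile bins hold about a third of the observations each; on a uniform sample, the quantile cut points approach the equal-width ones.

```r
set.seed(1)
v <- rexp(1000)  # right-skewed sample

# Equal width: split the range of v into 3 intervals of the same length
by_width <- cut(v, breaks = 3, include.lowest = TRUE)
# Equal size: cut at the sample tertiles
by_size <- cut(v, breaks = quantile(v, probs = seq(0, 1, length.out = 4)),
               include.lowest = TRUE)

table(by_width)  # most observations fall into the first bin
table(by_size)   # roughly a third of the observations per bin

# For a uniform sample, the tertiles lie close to the equal-width cut points 1/3 and 2/3
u <- runif(1e5)
quantile(u, probs = c(1/3, 2/3))
```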

Value

A copy of x, in which numerical columns have been discretised.

Note

While convenient, this function does not necessarily provide an optimal quantisation of the data (in terms of subsequent vistla performance); in particular, the bins parameter should be adjusted to the input data, either via optimisation or based on known properties of the input or of the mechanisms that generated it.

Examples

## Not run: 
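library(vistla)
data(cchain)
# Discretise numeric columns into 3 equal-size (quantile) bins, then run vistla
vistla(Y ~ ., data = mle_coerce(cchain, 3, "size"))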

## End(Not run)

vistla documentation built on Sept. 28, 2024, 1:08 a.m.