predict.ad takes a data frame of chemical descriptors and returns the
indices of the molecules that are X-outliers. The determination of outliers
uses a method from a 2015 paper by Roy, Kar, and Ambure that can be found
here: https://doi.org/10.1016/j.chemolab.2015.04.013.
ad | An ad object |
df | A data frame of chemical descriptors |
msg | Whether to return a message when there are more predictors in the data frame than in the applicability domain object. Typically, this will not be a problem as long as the relevant predictors are still present. The
default is |
The first step is to standardize the values. This can be accomplished by
creating an "ad" class object using [ad()]. This creates a list of the
means and standard deviations of the training data. It is important to call
ad only on training data because, in model-building, the testing data should
be used for evaluation and not be considered in the model-building phase.
predict.ad will use the information in the "ad" object to standardize the
descriptors. The descriptors are centered and scaled (mean of 0 and standard
deviation of 1) and then converted to absolute values; this is accomplished
using center_scale_zero. Let the standardized value corresponding to
descriptor i of molecule k be referred to as s_ik.
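The standardization step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names sketch_ad and sketch_abs_standardize, the column names, and the toy data are all assumptions made for the example.

```r
# Hypothetical helpers for illustration only; they are not part of the package.

# Store the per-descriptor means and standard deviations of the training data.
sketch_ad <- function(train) {
  list(
    means = vapply(train, mean, numeric(1)),
    sds   = vapply(train, sd, numeric(1))
  )
}

# Center and scale new data with the training statistics, then take
# absolute values, giving one s_ik per molecule (row) and descriptor (column).
sketch_abs_standardize <- function(stats, df) {
  cols <- names(stats$means)
  x <- as.matrix(df[, cols, drop = FALSE])
  x <- sweep(x, 2, stats$means, "-")
  x <- sweep(x, 2, stats$sds, "/")
  abs(x)
}

# Toy data (made up): two descriptors, three training and two test molecules.
train <- data.frame(logp = c(1, 2, 3), mw = c(100, 200, 300))
test  <- data.frame(logp = c(2, 8), mw = c(200, 250))

s <- sketch_abs_standardize(sketch_ad(train), test)
print(s)
```

Only the training statistics are used to scale the test molecules, which mirrors the advice above to build the "ad" object from training data alone.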
Next, the maximum deviation of each molecule needs to be found. This requires
examining the entries rowwise. If the maximum s_ik value is less than 3,
the molecule is not an X-outlier. If the minimum s_ik is greater than 3,
the molecule is an X-outlier. If the minimum is less than 3 and the maximum
is greater than 3, we calculate s_newk. s_newk is given as the mean of the
s_ik values for molecule k added to 1.28 times the standard deviation of the
s_ik values for molecule k. If this is less than 3, then k is not an
X-outlier; otherwise, it is an X-outlier.
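The decision rule above can be sketched in base R as follows. This is an illustrative sketch, not the package's own code: the function name sketch_is_xoutlier and the example matrix are hypothetical, and treating values exactly equal to 3 as not exceeding the threshold is an assumption, since the text does not cover that boundary.

```r
# Hypothetical helper for illustration only; it is not part of the package.
# `s` is a matrix of absolute standardized values, one row per molecule.
sketch_is_xoutlier <- function(s) {
  apply(s, 1, function(row) {
    if (max(row) <= 3) return(FALSE)      # every descriptor within 3 SD
    if (min(row) > 3) return(TRUE)        # every descriptor beyond 3 SD
    s_new <- mean(row) + 1.28 * sd(row)   # mixed case: compute s_newk
    s_new > 3                             # assumption: exactly 3 is not an outlier
  })
}

# Made-up standardized values for four molecules and four descriptors.
s <- rbind(
  c(0.5, 1.0, 2.0, 1.5),  # maximum below 3
  c(3.5, 4.0, 5.0, 3.2),  # minimum above 3
  c(0.1, 0.2, 3.1, 0.1),  # mixed, s_newk below 3
  c(2.5, 2.8, 3.4, 2.9)   # mixed, s_newk above 3
)

flags <- sketch_is_xoutlier(s)
print(flags)         # logical flag per molecule
print(which(flags))  # row indices of flagged molecules: 2 and 4
```

Applying which() to the logical flags gives row indices, so either form of result can be derived from the same rule.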
The function only returns a vector of booleans. To remove the X-outliers in a
data set and return the cleaned data frame, use the function
[remove_xoutlier()].
The data frame may have columns that are not chemical descriptors. This will
not hinder the ability to make predictions, though this behavior may not be
expected. By setting msg = TRUE, extra columns can be detected.
An integer vector of the row indices of X-outliers