predict.ad: Find applicability domain

View source: R/predict.ad.R

Description

predict.ad takes a data frame of chemical descriptors and returns the indices of the molecules that are X-outliers. The determination of outliers uses a method from a 2015 paper by Roy, Kar, and Ambare that can be found here: https://doi.org/10.1016/j.chemolab.2015.04.013.

Usage

predict(ad_obj, df, ignore_col = NA, ...)

Arguments

ad

An ad object

df

A data frame of chemical descriptors

msg

Whether to return a message when there are more predictors in the data frame than in the applicability domain object. Typically, this will not be a problem as long as the relevant predictors are still present. The default is msg = F.

Details

The first step is to standardize the values, which requires an "ad" class object created with [ad()]. This object is a list of the means and standard deviations of the training data. Call ad only on training data: the testing data should be reserved for evaluation and should not influence the model-building phase.
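To make the contents of such an object concrete, here is a minimal base R sketch that computes per-descriptor means and standard deviations from a small made-up training set. The names `train` and `ad_sketch` are hypothetical, and this is not the package's [ad()] implementation; it only illustrates the kind of information described above.

```r
# Hypothetical training descriptors (made-up values, for illustration only)
train <- data.frame(
  logP = c(1.2, 2.5, 3.1, 0.8),
  mw   = c(180, 250, 310, 150)
)

# Store the training mean and standard deviation of each descriptor.
# This mimics the description of an "ad" object; the real object's
# structure is an assumption here.
ad_sketch <- list(
  mean = vapply(train, mean, numeric(1)),
  sd   = vapply(train, sd, numeric(1))
)

print(ad_sketch)
```

Only the training rows enter the calculation, so any later standardization of test data reuses these stored statistics rather than recomputing them.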

predict.ad uses the information in the "ad" object to standardize each descriptor, so that the descriptors are centered and scaled (mean of 0 and standard deviation of 1). The standardized values are then converted to absolute values. This is accomplished using center_scale_zero. Let the standardized value corresponding to descriptor i of molecule k be referred to as s_ik.
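The standardization step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's center_scale_zero: the data frames `train` and `new` and the variables `mu`, `sigma`, and `s` are hypothetical, and base R's `scale()` stands in for the package helper.

```r
# Hypothetical training and new descriptors (made-up values)
train <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))
new   <- data.frame(x1 = c(2, 8),    x2 = c(20, 25))

# Training statistics, one value per descriptor
mu    <- vapply(train, mean, numeric(1))
sigma <- vapply(train, sd, numeric(1))

# Center and scale the new data with the *training* statistics,
# then take absolute values; s[k, i] plays the role of s_ik
s <- abs(scale(as.matrix(new), center = mu, scale = sigma))

print(s)
```

The columns of `new` must be in the same order as the stored statistics, since `scale()` matches them by position.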

Next, the maximum deviation of each molecule needs to be found. This requires examining the entries row-wise. If the maximum s_ik value is less than 3, the molecule is not an X-outlier. If the minimum s_ik value is greater than 3, the molecule is an X-outlier. If the minimum is less than 3 and the maximum is greater than 3, we calculate s_newk.

s_newk is given as the mean of the s_ik values for molecule k added to 1.28 times the standard deviation of the s_ik values for molecule k. If this is less than 3, then k is not an X-outlier; otherwise, k is an X-outlier.
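The row-wise rule described above can be sketched in base R as follows. The matrix `s` and the name `is_xoutlier` are hypothetical, the values are made up, and treating a value of exactly 3 as falling through to the s_newk check (and an s_newk of exactly 3 as an outlier) is an assumption, since the text only specifies the strict inequalities.

```r
# Absolute standardized values: rows are molecules, columns are descriptors
s <- rbind(
  c(0.5, 1.0, 2.0, 1.5),  # maximum below 3
  c(3.5, 4.0, 5.0, 3.2),  # minimum above 3
  c(0.0, 0.0, 0.0, 3.1),  # mixed: one large value among small ones
  c(2.5, 2.8, 2.9, 3.6)   # mixed: most values close to 3
)

is_xoutlier <- apply(s, 1, function(row) {
  if (max(row) < 3) return(FALSE)  # every descriptor within range
  if (min(row) > 3) return(TRUE)   # every descriptor out of range
  # Mixed case: mean plus 1.28 standard deviations of the row
  s_new <- mean(row) + 1.28 * sd(row)
  s_new >= 3                       # tie handling at 3 is an assumption
})

print(is_xoutlier)
```

In this example the third molecule has one descriptor above 3 but is not flagged, because its s_newk is about 2.76, while the fourth is flagged because its s_newk is about 3.55.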

The function only returns a vector of booleans. To remove the X-outliers in a data set and return the cleaned data frame, use the function [remove_xoutlier()].
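For readers who want to see how outlier flags relate to row indices and to a cleaned data frame, here is a small base R sketch. The data frame `df` and the vector `flags` are made up, and the subsetting shown is plain base R, not the package's [remove_xoutlier()].

```r
# Hypothetical descriptors and outlier flags (made-up values)
df    <- data.frame(x1 = c(0.1, 9.9, 0.3), x2 = c(1.0, 8.7, 1.2))
flags <- c(FALSE, TRUE, FALSE)  # TRUE marks an assumed X-outlier

# Row indices of the flagged molecules
outlier_rows <- which(flags)

# Data frame with the flagged rows dropped
cleaned <- df[!flags, , drop = FALSE]

print(outlier_rows)
print(cleaned)
```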

The data frame may have columns that are not chemical descriptors. This will not hinder the ability to make predictions, though this behavior may not be expected. By setting msg = T, extra columns can be detected.

Value

An integer vector of the row indices of X-outliers


awqx/qsarr documentation built on Oct. 2, 2021, 7:05 a.m.