center_scale: Center and scale data and remove predictors with near-zero...

View source: R/center_scale.R

Description

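center_scale centers and scales the predictor columns of a data frame, optionally leaving selected columns (such as the response variable) untouched.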

Usage

center_scale(df, ignore_col = NA, return_ad = F, ad_obj = NA, quiet = T, ...)

Arguments

df

The data frame to be processed.

ignore_col

Columns that will not be preprocessed, given as a character vector. Typically this includes the response variable. The default is ignore_col = NA. After preprocessing, these columns are reattached as the first columns of the data frame.

return_ad

Whether or not to return the applicability domain object. Set this to return_ad = T if this is training data and the resulting applicability domain object could be useful for testing data.

quiet

Whether to suppress the message listing columns of the original data frame that were dropped. The default is quiet = T; set quiet = F to be notified.

ad_obj

An optional "ad" object to pass. Use this if you are centering and scaling test data. The default is ad_obj = NA, in which case a new object is created and discarded within the function (unless return_ad = T).

Details

The data is centered so that each predictor has a mean of 0 and scaled so that each predictor has a standard deviation of 1.

After centering, the mean of each descriptor is 0. NAs in the data frame are then replaced with 0, on the assumption that a missing value is best estimated by the expectation (mean) of its descriptor.
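The steps above can be illustrated with a minimal base R sketch. This is not the center_scale source: the toy data and the variable names (mu, sigma, scaled) are assumptions made for illustration only.

```r
# Toy descriptor table with one missing value (made-up data).
df <- data.frame(x1 = c(1, 2, 3, NA), x2 = c(10, 20, 30, 40))

# Column means and standard deviations, ignoring NAs.
mu <- vapply(df, mean, numeric(1), na.rm = TRUE)
sigma <- vapply(df, sd, numeric(1), na.rm = TRUE)

# Center and scale each column with the per-column statistics.
scaled <- as.data.frame(scale(df, center = mu, scale = sigma))

# Replace NAs with 0, which is the mean of every centered column.
scaled[is.na(scaled)] <- 0
```

Because 0 is the mean of a centered column, filling NAs after centering is equivalent to imputing the column mean before centering.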

This uses an "ad" S3 object constructed using ad().

Because the data frame to be passed to the function will likely include columns that do not need to be transformed (like columns for identification) or response variables, there is an option to ignore these columns using ignore_col. The input to ignore_col should be a character vector.
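A rough sketch of how ignored columns might be set aside and reattached, using only base R. The column names and the splitting logic here are assumptions for illustration and may differ from what center_scale does internally.

```r
# Made-up data: an identifier, a response, and two predictors.
df <- data.frame(id = c("a", "b", "c"), y = c(0.5, 1.5, 2.5),
                 x1 = c(1, 2, 3), x2 = c(2, 4, 9))
ignore_col <- c("id", "y")

# Split the ignored columns from the predictors.
ignored <- df[, names(df) %in% ignore_col, drop = FALSE]
predictors <- df[, !(names(df) %in% ignore_col), drop = FALSE]

# Transform only the predictors, then place the ignored columns first.
result <- cbind(ignored, as.data.frame(scale(predictors)))
```

Here result keeps id and y unchanged as its first two columns, followed by the centered and scaled x1 and x2.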

Value

If return_ad = F (default), a data frame with the predictor columns centered and scaled. Columns named in ignore_col are returned without preprocessing.

If return_ad = T, a list containing the centered and scaled data frame and the "ad" class object, labeled "df" and "ad", respectively.

Note that when an ad_obj is provided, only the columns listed in ignore_col and those named in the ad_obj are returned. Set quiet = F to be notified if columns are lost.
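The train/test workflow described above, where statistics from training data are reused on test data, might look like the following base R sketch. The stats list is a hypothetical stand-in for the "ad" object, whose actual structure is not documented on this page; the data is made up.

```r
# Made-up training and test data; x3 appears only in the test set.
train <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 60))
test <- data.frame(x1 = c(2, 4), x2 = c(30, NA), x3 = c(7, 8))

# Hypothetical stand-in for the "ad" object: training means and sds.
stats <- list(center = vapply(train, mean, numeric(1), na.rm = TRUE),
              scale = vapply(train, sd, numeric(1), na.rm = TRUE))

# Keep only test columns known from training and report the rest.
keep <- intersect(names(stats$center), names(test))
dropped <- setdiff(names(test), keep)
if (length(dropped) > 0) {
  message("Dropped columns: ", paste(dropped, collapse = ", "))
}

# Apply the training statistics to the test data, then fill NAs with 0.
test_scaled <- as.data.frame(scale(test[, keep, drop = FALSE],
                                   center = stats$center[keep],
                                   scale = stats$scale[keep]))
test_scaled[is.na(test_scaled)] <- 0
```

Reusing the training statistics keeps the test data on the same scale as the data the model was fitted on, so the test columns will generally not have mean 0 or standard deviation 1 themselves.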


awqx/qsarr documentation built on Oct. 2, 2021, 7:05 a.m.