View source: R/ts_feature_correction.R
plr_kmeans_test | R Documentation |
The method fits a linear model for each day, identifies outliers, and performs 2-means clustering on the slopes. If the mean of the lower cluster is significantly less than the mean of the higher cluster (after scaling the higher mean by mean_ratio) and the lower cluster constitutes less than 25% of the data, the lower cluster is identified as soiled and returned. Otherwise, the outlier points are identified as soiled and returned.
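The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the column names `day`, `irradiance`, and `power`, the choice of regressing power on irradiance, and the synthetic data are all hypothetical, and the outlier rule is assumed to be the standard box-plot rule.

```r
# Illustrative sketch only; not the package's actual code.
# Column names (day, irradiance, power) and the data are hypothetical.
set.seed(1)

# Synthetic data: 20 days with 12 points each; days 1-3 have a reduced slope.
days <- rep(1:20, each = 12)
irradiance <- runif(length(days), 200, 1000)
true_slope <- ifelse(days <= 3, 0.5, 1.0)
power <- true_slope * irradiance + rnorm(length(days), sd = 5)
pv <- data.frame(day = days, irradiance = irradiance, power = power)

# 1. Fit one linear model per day and keep its slope.
slopes <- sapply(split(pv, pv$day), function(d) {
  unname(coef(lm(power ~ irradiance, data = d))[2])
})
slope_df <- data.frame(day = as.integer(names(slopes)), slope = unname(slopes))

# 2. Flag box-plot outliers among the daily slopes (assumed outlier rule).
is_outlier <- slope_df$slope %in% boxplot.stats(slope_df$slope)$out

# 3. Cluster the slopes into two groups.
km <- kmeans(slope_df$slope, centers = 2, nstart = 10)
low <- which.min(km$centers)
high <- which.max(km$centers)
in_low <- km$cluster == low

# 4. Decision rule: the lower cluster counts as soiled only if its mean is
#    below mean_ratio times the higher mean AND it holds under 25% of the days.
mean_ratio <- 0.7
low_is_soiled <- km$centers[low] < mean_ratio * km$centers[high] &&
  mean(in_low) < 0.25

# Return the lower cluster if the rule holds, otherwise fall back to outliers.
soiled <- if (low_is_soiled) slope_df[in_low, ] else slope_df[is_outlier, ]
```

In this sketch a larger `mean_ratio` raises the threshold that the lower cluster mean is compared against, which makes the lower cluster more likely to be labelled as soiled.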
plr_kmeans_test(df, var_list, mean_ratio = 0.7, plot = FALSE, file_path, file_name, set_cutoff = FALSE)
df |
A df containing pv data. Should be 'cleaned' by |
var_list |
A list of the dataframe's standard variable names, obtained from
the output of |
mean_ratio |
This scales the mean of the higher identified cluster before it is compared with the mean of the lower cluster. Higher values make it more likely that the lower cluster is identified as soiled; lower values make it less likely. Values should range from 0 to 1. |
plot |
optional; Boolean; whether to return the box plot generated by the method to identify outliers. |
file_path |
optional; location to store the boxplot if plot is set to TRUE. This is not required in order to plot; it is needed only if you wish to save the plot. |
file_name |
optional; name of file to save boxplot if plot is set to TRUE. |
set_cutoff |
Defaults to FALSE; pass a numeric value to cut off all slopes less than the cutoff value. This entirely bypasses the outlier and clustering calculations, so that you can remove slope values you believe to be soiled. |
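The effect of a numeric set_cutoff can be pictured with a small sketch. The `slope_df` data frame and its `day` and `slope` columns are hypothetical stand-ins for the per-day slopes that the method computes internally, and the branch below only mirrors the described behaviour rather than the package's code.

```r
# Illustrative sketch only; slope_df and its columns are hypothetical.
slope_df <- data.frame(
  day   = 1:6,
  slope = c(0.52, 0.98, 1.01, 0.47, 0.99, 1.02)
)

set_cutoff <- 0.8  # numeric value supplied by the user

soiled <- if (is.numeric(set_cutoff)) {
  # Every slope below the cutoff is treated as soiled;
  # no outlier detection or clustering is performed.
  slope_df[slope_df$slope < set_cutoff, ]
} else {
  NULL  # with the default FALSE, the outlier and clustering steps would run instead
}
```

Because `is.numeric(FALSE)` is `FALSE` in R, a check of this kind distinguishes the default from a user-supplied cutoff.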
The method returns a dataframe containing the values that should be removed. If you want to discard them, try using dplyr::filter().
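A minimal sketch of discarding the returned values is shown below. It assumes, hypothetically, that both the original data and the returned data frame carry a `day` column that can be used as a key; the actual columns returned by plr_kmeans_test may differ, so adjust the key accordingly.

```r
# Illustrative sketch only; the data and the `day` key column are hypothetical.
pv <- data.frame(
  day   = rep(1:4, each = 2),
  power = c(510, 495, 250, 240, 505, 498, 512, 490)
)

# Stand-in for the data frame returned by the method.
soiled <- data.frame(day = 2L, slope = 0.5)

# Keep only the rows whose day does not appear in the soiled set.
cleaned <- pv[!pv$day %in% soiled$day, ]
```

With dplyr loaded, the same step can be written as `dplyr::filter(pv, !day %in% soiled$day)`.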