plr_kmeans_test: Statistical k-means Test
In PVplr: Performance Loss Rate Analysis Pipeline

plr_kmeans_test

R Documentation

Statistical k-means Test

Description

The method fits a linear model for each day, identifies outliers, and performs 2-means clustering on the slopes. If the mean of the lower cluster is significantly less than the mean of the higher cluster, and the lower cluster constitutes less than 25% of the data, the lower cluster is identified as soiled and returned. Otherwise, the outlier points are identified as soiled and returned.
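
The decision logic described above can be sketched in base R. This is an illustration under stated assumptions, not the PVplr implementation: it starts from a synthetic vector of daily slopes (the per-day model fitting is skipped), it assumes outliers are taken from the slopes with the standard box plot rule, and it assumes "significantly less" means the lower cluster mean falls below `mean_ratio` times the higher cluster mean. All variable names are hypothetical.

```r
# Illustrative sketch only; not the PVplr implementation.
set.seed(42)

# Hypothetical daily slopes: 90 "clean" days and 10 "soiled" days.
slopes <- c(rnorm(90, mean = 1.0, sd = 0.02),
            rnorm(10, mean = 0.5, sd = 0.02))

mean_ratio <- 0.7  # same default value as in the Usage section

# Step 1 (assumed rule): flag outliers with the box plot criterion (1.5 * IQR).
outlier_values <- boxplot.stats(slopes)$out
is_outlier <- slopes %in% outlier_values

# Step 2: 2-means clustering on the slopes.
km <- kmeans(slopes, centers = 2, nstart = 10)
low <- which.min(km$centers[, 1])   # index of the cluster with the lower mean
high <- which.max(km$centers[, 1])  # index of the cluster with the higher mean
in_low <- km$cluster == low

# Step 3 (assumed comparison): the lower cluster counts as soiled when its mean
# is below the scaled higher mean and it holds less than 25% of the data.
low_is_soiled <- km$centers[low, 1] < mean_ratio * km$centers[high, 1] &&
  mean(in_low) < 0.25

# Return the lower cluster if the test passes, otherwise fall back to outliers.
soiled <- if (low_is_soiled) in_low else is_outlier
sum(soiled)  # number of days flagged as soiled
```

In this synthetic case the two groups are well separated, so the lower cluster passes both conditions and is the set that gets flagged.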

Usage

plr_kmeans_test(
  df,
  var_list,
  mean_ratio = 0.7,
  plot = FALSE,
  file_path,
  file_name,
  set_cutoff = FALSE
)

Arguments

`df`	A data frame containing PV data. It should first be cleaned with `plr_cleaning`.
`var_list`	A list of the dataframe's standard variable names, obtained from the output of `plr_variable_check`.
`mean_ratio`	Scales the mean of the higher cluster before it is compared with the mean of the lower cluster. Higher values make it more likely that the lower cluster is identified as soiled, and lower values make it less likely. Values should range from 0 to 1. Defaults to 0.7.
`plot`	optional; Boolean; whether to return the box plot generated by the method to identify outliers.
`file_path`	optional; location to store the boxplot if plot is set to TRUE. This is needed only if you wish to save the plot, not to generate it.
`file_name`	optional; name of the file in which to save the boxplot if plot is set to TRUE.
`set_cutoff`	Defaults to FALSE; pass a numeric value to cut off all slopes less than that value. This entirely bypasses the outlier and clustering calculations and removes the slope values you believe to be soiled.
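
As a rough illustration of how `mean_ratio` and `set_cutoff` behave, the following base R sketch uses made-up numbers. The helper `is_low_cluster_soiled` is hypothetical, and the comparison it performs (lower mean below `mean_ratio` times the higher mean) is an assumption about how the scaling is applied, not code taken from the package.

```r
# Illustrative sketch only; the comparison rule is an assumption.
is_low_cluster_soiled <- function(low_mean, high_mean, mean_ratio = 0.7) {
  low_mean < mean_ratio * high_mean
}

# Hypothetical cluster means: lower cluster at 0.8, higher cluster at 1.0.
strict  <- is_low_cluster_soiled(0.8, 1.0, mean_ratio = 0.7)  # 0.8 < 0.7
lenient <- is_low_cluster_soiled(0.8, 1.0, mean_ratio = 0.9)  # 0.8 < 0.9

# set_cutoff: a numeric value skips the outlier and clustering steps and
# simply flags every slope below the cutoff.
slopes <- c(1.02, 0.98, 0.55, 1.00, 0.60)
set_cutoff <- 0.75
flagged <- slopes[slopes < set_cutoff]
```

With the same pair of cluster means, raising `mean_ratio` from 0.7 to 0.9 flips the result from not soiled to soiled, which matches the direction described for the argument.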