analyze.gain: Analyze Potential Gain from Passive Device Installation on...
In gainML: Machine Learning-Based Analysis of Potential Power Gain from Passive Device Installation on Wind Turbine Generators


Implements the gain analysis as a whole; this includes data arrangement, period 1 analysis, period 2 analysis, and gain quantification.

analyze.gain(df1, df2, df3, p1.beg, p1.end, p2.beg, p2.end, ratedPW, AEP,
  pw.freq, freq.id = 3, time.format = "%Y-%m-%d %H:%M:%S",
  k.fold = 5, col.time = 1, col.turb = 2, bootstrap = NULL,
  free.sec = NULL, neg.power = FALSE)

`df1`	A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density.
`df2`	A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output.
`df3`	A dataframe for neutral control turbine data. This dataframe must include four columns and have the same structure as `df2`.
`p1.beg`	A string specifying the beginning date of period 1. By default, the value needs to be specified in %Y-%m-%d format, for example, `'2014-10-24'`. A user can use a different format as long as it is consistent with the format defined in `time.format` below.
`p1.end`	A string specifying the end date of period 1. For example, if the value is `'2015-10-24'`, data observed until `'2015-10-23 23:50:00'` would be considered for period 1.
`p2.beg`	A string specifying the beginning date of period 2.
`p2.end`	A string specifying the end date of period 2. Defined similarly as `p1.end`.
`ratedPW`	A kW value that describes the (common) rated power of the selected turbines (REF and CTR-b).
`AEP`	A kWh value describing the annual energy production from a single turbine.
`pw.freq`	A matrix or a dataframe that includes power output bins and corresponding frequency in terms of the accumulated hours during an annual period.
`freq.id`	An integer indicating the column number of `pw.freq` that describes the frequency of power bins in terms of the accumulated hours during an annual period. By default, this parameter is set to 3.
`time.format`	A string describing the format of time stamps used in the data to be analyzed. The default value is `'%Y-%m-%d %H:%M:%S'`.
`k.fold`	An integer defining the number of data folds for the period 1 analysis and prediction. In the period 1 analysis, k-fold cross validation (CV) will be applied to choose the optimal set of covariates that results in the least prediction error. The value of `k.fold` corresponds to the k of the k-fold CV. The default value is 5.
`col.time`	An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1.
`col.turb`	An integer specifying the column number of turbines' id in wind turbine datasets. The default value is 2.
`bootstrap`	An integer indicating the current replication (run) number of bootstrap. If set to `NULL`, bootstrap is not applied. The default is `NULL`. Users are advised not to set this value directly; use `bootstrap.gain` to run bootstrap instead.
`free.sec`	A list of vectors defining free sectors. Each vector in the list has two scalars: one for the starting direction and another for the ending direction, ordered clockwise. For example, `c(310, 50)` is a valid component of the list; it describes the sector that runs clockwise from 310 degrees through north to 50 degrees. By default, this is set to `NULL`.
`neg.power`	Either `TRUE` or `FALSE`, indicating whether or not to use data points with a negative power output, respectively, in the analysis. The default value is `FALSE`, i.e., negative power output data will be eliminated.
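
Two conventions in the argument list can be made concrete with a small sketch: the period end date is exclusive (see `p1.end`), and a free sector given as `c(start, end)` runs clockwise and may wrap through north. The helpers `in.period` and `in.free.sector` below are hypothetical and not part of gainML; treating the sector boundaries as inclusive is an assumption, and the package's own filtering may differ.

```r
# Hypothetical helpers for illustration only; not part of gainML.

# Half-open window [beg, end): records stamped on the end date are excluded.
in.period <- function(time, beg, end, time.format = "%Y-%m-%d %H:%M:%S") {
  t   <- as.POSIXct(time, format = time.format, tz = "UTC")
  beg <- as.POSIXct(beg, format = "%Y-%m-%d", tz = "UTC")
  end <- as.POSIXct(end, format = "%Y-%m-%d", tz = "UTC")
  t >= beg & t < end
}

# A sector c(start, end) is read clockwise; when start > end it wraps
# through 0 degrees. Inclusive boundaries are an assumption.
in.free.sector <- function(dir, free.sec) {
  dir <- dir %% 360
  hit <- rep(FALSE, length(dir))
  for (s in free.sec) {
    if (s[1] <= s[2]) {
      hit <- hit | (dir >= s[1] & dir <= s[2])
    } else {
      hit <- hit | (dir >= s[1] | dir <= s[2])
    }
  }
  hit
}

times <- c("2015-10-23 23:50:00", "2015-10-24 00:00:00")
period.flags <- in.period(times, "2014-10-24", "2015-10-24")

free.sec <- list(c(310, 50), c(150, 260))
sector.flags <- in.free.sector(c(0, 100, 200, 320), free.sec)
```

Here the first timestamp falls inside the period and the second does not, while directions 0, 200, and 320 fall inside a free sector and 100 does not.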

Builds a machine learning model for a REF turbine (device installed) and a baseline CTR turbine (CTR-b; without device installation and preferably closest to the REF turbine) by using data measurements from a neutral CTR turbine (CTR-n; without device installation). Gain is quantified by evaluating predictions from the machine learning models and their differences during two different time periods, namely, period 1 (without device installation on the REF turbine) and period 2 (device installed on the REF turbine).

The function returns a list of four objects (each itself a list) that together hold the analysis results from all steps.

data: A list of arranged datasets including period 1 and period 2 data as well as k-folded training and test datasets generated from the period 1 data. See also arrange.data.
p1.res: A list containing period 1 analysis results. This includes the optimal set of predictor variables, period 1 prediction for the REF turbine and CTR-b turbine, the corresponding error measures such as RMSE and BIAS, and BIAS curves for both REF and CTR-b turbine models; see analyze.p1 for the details.
p2.res: A list containing period 2 analysis results. This includes period 2 prediction for the REF turbine and CTR-b turbine. See also analyze.p2.
gain.res: A list containing gain quantification results. This includes effect curve, offset curve, and gain curve as well as the measures of effect (gain without offset), offset, and (the final) gain; see quantify.gain for the details.
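
The `data` element holds k-folded training and test sets built from the period 1 data. As a rough illustration of what such a split looks like, the sketch below assigns rows of a toy data frame to folds at random. The object names (`per1`, `fold.id`, `train`, `test`) and the random assignment are assumptions made for this example; `arrange.data` may build its folds differently.

```r
# Hypothetical k-fold split; gainML's own fold assignment may differ.
set.seed(1)
n <- 12        # number of toy period 1 records
k.fold <- 3    # number of folds

per1 <- data.frame(obs = seq_len(n))

# Balanced random fold labels: each label 1..k.fold appears n / k.fold times.
fold.id <- sample(rep(seq_len(k.fold), length.out = n))

# For fold k, the test set is the rows labelled k and the training set is the rest.
train <- lapply(seq_len(k.fold), function(k) per1[fold.id != k, , drop = FALSE])
test  <- lapply(seq_len(k.fold), function(k) per1[fold.id == k, , drop = FALSE])
```

Each record appears in exactly one test set, so every period 1 observation is predicted once by a model that was not trained on it.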

This function executes four other functions in sequence: arrange.data, analyze.p1, analyze.p2, and quantify.gain.
Alternatively, a user can call the four functions individually in the same order.

H. Hwangbo, Y. Ding, and D. Cabezon, 'Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device Installation,' arXiv:1906.05776 [stat.AP], Jun. 2019. https://arxiv.org/abs/1906.05776.

arrange.data, analyze.p1, analyze.p2, quantify.gain

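# Load the package; the example datasets `wtg` and `pw.freq` used below
# are assumed to be provided by gainML.
library(gainML)

# REF turbine: timestamp, turbine id, wind direction, power, air density
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D,
 power = y, air.dens = rho))
# CTR-b turbine: timestamp, turbine id, wind speed, power
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V,
 power = y))
# CTR-n turbine: same structure as CTR-b, with a different turbine id
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3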

# For Full Sector Analysis
res <- analyze.gain(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24',
 p1.end = '2014-10-25', p2.beg = '2014-10-25', p2.end = '2014-10-26',
 ratedPW = 1000, AEP = 300000, pw.freq = pw.freq, k.fold = 2)
# In practice, one may use annual data for each of period 1 and period 2 analysis.
# One may typically use k.fold = 5 or 10.

# For Free Sector Analysis
free.sec <- list(c(310, 50), c(150, 260))

res <- analyze.gain(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24',
 p1.end = '2014-10-25', p2.beg = '2014-10-25', p2.end = '2014-10-26',
 ratedPW = 1000, AEP = 300000, pw.freq = pw.freq, k.fold = 2,
 free.sec = free.sec)

gain.res <- res$gain.res
gain.res$gain    # This will provide the final gain value.
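
The examples pass a ready-made `pw.freq` object. For readers who need to assemble their own table, the sketch below builds one from synthetic data. The three-column layout (lower bin bound, upper bin bound, annual hours, so that the default `freq.id = 3` points at the hours column), the 100 kW bin width, the 10-minute sampling interval, and the name `my.pw.freq` are all assumptions for illustration; check the package's own `pw.freq` data for the layout it actually expects.

```r
# Hypothetical construction of a power-bin frequency table.
set.seed(42)
ratedPW <- 1000                      # kW, matching the examples

# Synthetic 10-minute power records covering one year (6 * 24 * 365 values).
power <- runif(6 * 24 * 365, min = 0, max = ratedPW)

# Power bins of 100 kW from 0 to rated power.
breaks <- seq(0, ratedPW, by = 100)
bin <- cut(power, breaks = breaks, include.lowest = TRUE)

# Six 10-minute records make one hour, so counts / 6 gives accumulated hours.
hours <- as.vector(table(bin)) / 6

my.pw.freq <- data.frame(
  bin.lower = breaks[-length(breaks)],
  bin.upper = breaks[-1],
  hours     = hours                  # column 3, as assumed for freq.id = 3
)
```

With a full year of records, the hours column sums to 8760, which is a quick sanity check on any table built this way.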