bootstrap.gain: Construct a Confidence Interval of the Gain Estimate

Description

Estimates the gain and its confidence interval at a given confidence level using the bootstrap.

Usage

bootstrap.gain(df1, df2, df3, opt.cov, n.rep, p1.beg, p1.end, p2.beg,
  p2.end, ratedPW, AEP, pw.freq, freq.id = 3,
  time.format = "%Y-%m-%d %H:%M:%S", k.fold = 5, col.time = 1,
  col.turb = 2, free.sec = NULL, neg.power = FALSE,
  pred.return = FALSE)

Arguments

df1

A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density.

df2

A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output.

df3

A dataframe for neutral control turbine data. This dataframe must include four columns and have the same structure as df2.

opt.cov

A character vector indicating the optimal set of variables (obtained from the period 1 analysis).

n.rep

An integer specifying the number of bootstrap replications. This number determines the confidence level; for example, if n.rep is set to 10, this function will provide an 80% confidence interval.

p1.beg

A string specifying the beginning date of period 1. By default, the value needs to be specified in %Y-%m-%d format, for example, '2014-10-24'. A user can use a different format as long as it is consistent with the format defined in time.format below.

p1.end

A string specifying the end date of period 1. The end date itself is excluded: for example, if the value is '2015-10-24', data observed up to '2015-10-23 23:50:00' are considered part of period 1.
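The end-date convention can be illustrated with base R date parsing. This is only a sketch: the half-open comparison (begin inclusive, end exclusive), the UTC time zone, and the 10-minute sample timestamps are assumptions made for illustration, not a description of the package's internal code.

```r
# Hypothetical 10-minute timestamps around the end of period 1.
time.format <- "%Y-%m-%d %H:%M:%S"
stamps <- c("2015-10-23 23:40:00", "2015-10-23 23:50:00", "2015-10-24 00:00:00")
stamps.parsed <- as.POSIXct(stamps, format = time.format, tz = "UTC")

# Period boundaries given as dates only (the default %Y-%m-%d format).
p1.beg <- as.POSIXct("2014-10-24", format = "%Y-%m-%d", tz = "UTC")
p1.end <- as.POSIXct("2015-10-24", format = "%Y-%m-%d", tz = "UTC")

# Assumed half-open interval: begin inclusive, end exclusive.
in.p1 <- stamps.parsed >= p1.beg & stamps.parsed < p1.end
in.p1  # TRUE TRUE FALSE
```

Under this assumption, the record stamped at midnight on '2015-10-24' falls outside period 1, which matches the example given for p1.end.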

p2.beg

A string specifying the beginning date of period 2.

p2.end

A string specifying the end date of period 2. Defined similarly as p1.end.

ratedPW

A kW value that describes the (common) rated power of the selected turbines (REF and CTR-b).

AEP

A kWh value describing the annual energy production from a single turbine.

pw.freq

A matrix or a dataframe that includes power output bins and corresponding frequency in terms of the accumulated hours during an annual period.

freq.id

An integer indicating the column number of pw.freq that describes the frequency of power bins in terms of the accumulated hours during an annual period. By default, this parameter is set to 3.
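As a rough illustration of how freq.id selects the frequency column, the sketch below builds a small stand-in table. The column names, bin edges, and hour values are all made up for this example and are not taken from the package's own pw.freq dataset.

```r
# Hypothetical power-frequency table; names and values are invented.
pw.freq.demo <- data.frame(
  bin.lower = c(0, 100, 200),      # lower edge of each power bin (kW)
  bin.upper = c(100, 200, 300),    # upper edge of each power bin (kW)
  hours     = c(4000, 3000, 1760)  # accumulated hours per year in each bin
)

freq.id <- 3                               # column holding the frequencies
hours.per.bin <- pw.freq.demo[, freq.id]   # numeric vector of hours
sum(hours.per.bin)                         # 8760 in this made-up table
```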

time.format

A string describing the format of time stamps used in the data to be analyzed. The default value is '%Y-%m-%d %H:%M:%S'.

k.fold

An integer defining the number of data folds for the period 1 analysis and prediction. In the period 1 analysis, k-fold cross validation (CV) will be applied to choose the optimal set of covariates that results in the least prediction error. The value of k.fold corresponds to the k of the k-fold CV. The default value is 5.
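The idea of splitting data into k folds can be sketched in base R as follows. The random assignment shown here, and the names n.obs and folds, are assumptions for illustration; the package may assign folds differently.

```r
set.seed(1)      # for a reproducible split
k.fold <- 5
n.obs  <- 23     # hypothetical number of period 1 observations

# Shuffle the row indices and deal them into k.fold groups of near-equal size.
folds <- split(sample(seq_len(n.obs)), rep(seq_len(k.fold), length.out = n.obs))

# For fold i, rows in folds[[i]] are held out for prediction and the
# remaining rows are used for fitting.
test.idx  <- folds[[1]]
train.idx <- setdiff(seq_len(n.obs), test.idx)
```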

col.time

An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1.

col.turb

An integer specifying the column number of turbine IDs in wind turbine datasets. The default value is 2.

free.sec

A list of vectors defining free sectors. Each vector in the list has two scalars: one for the starting direction and another for the ending direction, ordered clockwise. For example, a vector of c(310, 50) is a valid component of the list. By default, this is set to NULL.
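The following sketch shows one way such a list could be written and how a clockwise sector that wraps through north (such as 310 to 50 degrees) can be tested. The helper in.sector, the second sector, and the inclusive boundaries are hypothetical and are not part of the package.

```r
# A free.sec list with two sectors; the second sector is made up.
free.sec <- list(c(310, 50), c(150, 260))

# Hypothetical helper: TRUE where direction d (degrees) lies within the
# clockwise sector from sec[1] to sec[2], boundaries included.
in.sector <- function(d, sec) {
  d <- d %% 360
  if (sec[1] <= sec[2]) {
    d >= sec[1] & d <= sec[2]
  } else {
    d >= sec[1] | d <= sec[2]  # sector wraps through 0/360 degrees
  }
}

dirs <- c(0, 45, 100, 200, 330)
in.free <- Reduce(`|`, lapply(free.sec, function(s) in.sector(dirs, s)))
in.free  # TRUE TRUE FALSE TRUE TRUE
```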

neg.power

Either TRUE or FALSE, indicating whether to use data points with a negative power output in the analysis. The default value is FALSE, i.e., negative power output data will be eliminated.

pred.return

A logical value indicating whether to return the full prediction results; see Details below. The default value is FALSE.

Details

For each replication, this function makes k period 1 predictions (one per fold, where k is k.fold) for each of the REF and CTR-b turbine models, plus one period 2 prediction for each model. This results in 2 × (k + 1) predictions per replication. With n.rep replications, there are n.rep × 2 × (k + 1) predictions in total.

Setting pred.return to FALSE, which is the default, avoids storing this many datasets in memory.
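The count of prediction datasets can be worked out directly. The values of n.rep and k.fold below are example inputs only.

```r
n.rep  <- 10   # example number of bootstrap replications
k.fold <- 5    # example number of folds (the default)

# Two turbine models (REF and CTR-b), each with k.fold period 1
# predictions plus one period 2 prediction.
per.rep <- 2 * (k.fold + 1)   # 12 predictions per replication
total   <- n.rep * per.rep    # 120 predictions in total
```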

Value

The function returns a list of n.rep replication objects (lists), each of which includes the following.

gain.res

A list containing gain quantification results; see quantify.gain for the details.

p1.pred

A list containing period 1 prediction results.

  • pred.REF: A list of k datasets, one per fold, each containing that fold's period 1 prediction for the REF turbine.

  • pred.CTR: A list of k datasets, one per fold, each containing that fold's period 1 prediction for the CTR-b turbine.

p2.pred

A list containing period 2 prediction results; see analyze.p2 for the details.

References

H. Hwangbo, Y. Ding, and D. Cabezon, 'Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device Installation,' arXiv:1906.05776 [stat.AP], Jun. 2019. https://arxiv.org/abs/1906.05776.

Examples

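library(gainML)  # provides bootstrap.gain and the wtg and pw.freq example datasets

# Reference turbine: time, turbine id, wind direction, power, air density.
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D,
                               power = y, air.dens = rho))
# Baseline control turbine: time, turbine id, wind speed, power.
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V,
                                power = y))
# Neutral control turbine: same structure as df.ctrb, different id.
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3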

opt.cov <- c('D', 'density', 'Vn', 'hour')
n.rep <- 2 # just for illustration; a user may use at least 10 for this.

res <- bootstrap.gain(df.ref, df.ctrb, df.ctrn, opt.cov = opt.cov, n.rep = n.rep,
 p1.beg = '2014-10-24', p1.end = '2014-10-25', p2.beg = '2014-10-25',
 p2.end = '2014-10-26', ratedPW = 1000, AEP = 300000, pw.freq = pw.freq,
 k.fold = 2)

length(res) # 2
sapply(res, function(ls) ls$gain.res$gainCurve) # This provides 2 gain curves.
sapply(res, function(ls) ls$gain.res$gain) # This provides 2 gain values.
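Once the per-replication gain values are extracted, they can be summarised into an interval. The sketch below uses made-up numbers in place of real output and takes empirical 10% and 90% quantiles as one common percentile-style convention for an 80% interval. Whether this matches how the package relates n.rep to the confidence level is an assumption.

```r
# Made-up gain values standing in for
# sapply(res, function(ls) ls$gain.res$gain) with n.rep = 10.
gains <- c(1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6)

# Assumed percentile-style 80% interval from the replicated gains.
ci.80 <- quantile(gains, probs = c(0.10, 0.90), names = FALSE)
point.est <- mean(gains)  # a simple central summary of the replications
```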

gainML documentation built on June 28, 2019, 5:05 p.m.