R/extract_event_ftrs.R

#' Extracts events from a data stream and computes event features.
#'
#'This function extracts events from a 2D or 3D data stream using a moving window and computes a set of 22 features for 2D streams and 13 features for 3D streams. 2D data streams with class labels can be generated with the function \code{gen_stream}. In the supervised setting, the class labels of the extracted events are obtained by matching each event position with the \code{details} of the events, which is part of the output of \code{gen_stream}.
#'
#'@param stream A data stream. This can be the output of either the \code{gen_stream} function or the \code{stream_from_files} function.
#'@param supervised If \code{TRUE}, event class labels need to be given in \code{details}.
#'@param details Event details. This is also an output of the \code{gen_stream} function. Event details are used to get the class labels of the extracted events, by matching the position.
#'@param win_size The length of the moving window. Default is \code{200}.
#'@param step_size The amount by which the window is moved at each step. Default is \code{20}.
#'@param thres The cut-off quantile. Default is \code{0.95}. Values greater than this quantile are clustered; the remaining values are not.
#'@param folder If set to a local folder, this is where the jpegs of window data and extracted events are saved for a 2D data stream.
#'@param vis If \code{TRUE}, the window data and the extracted events are plotted for a 2D data stream.
#'@param tt Determines the event ages, which are \code{tt, 2tt, 3tt} and \code{4tt}. For example, if \code{tt=10}, the event ages are \code{10, 20, 30} and \code{40}.
#'@param epsilon The \code{eps} parameter of the \code{dbscan} function in the package \code{dbscan}.
#'@param miniPts The \code{minPts} parameter of the \code{dbscan} function in the package \code{dbscan}.
#'@param rolling If \code{TRUE}, rolling windows are used.
#'
#'
#'@return
#'An \code{Nx22x4} array is returned for 2D data streams and an \code{Nx13x4} array for 3D data streams. Here \code{N} is the total number of events extracted from all windows. The second dimension holds the features and, in the \code{supervised} setting, the class label. The third dimension holds the \code{4} event ages: \code{tt, 2tt, 3tt, 4tt}.
#'For example, the element at \code{[10,6,3]} is the 6th feature of the 10th extracted event when the age of the event is \code{3tt}. The features for 2D streams are listed below. For 3D streams, the features \code{cluster_id, pixels, length, width, height, total_value, l2w_ratio, centroid_x, centroid_y, centroid_z, mean, std_dev} and \code{sd_from_global_mean} are computed.
#'   \item{\code{cluster_id}}{An identification number for each event.}
#'   \item{\code{pixels}}{The number of pixels of each event.}
#'   \item{\code{length}}{The length of the event.}
#'   \item{\code{width}}{The width of the event.}
#'   \item{\code{total_value}}{The total value of the pixels.}
#'   \item{\code{l2w_ratio}}{Length to width ratio of event.}
#'   \item{\code{centroid_x}}{x coordinate of event centroid.}
#'   \item{\code{centroid_y}}{y coordinate of event centroid.}
#'   \item{\code{mean}}{Mean value of event pixels.}
#'   \item{\code{std_dev}}{Standard deviation of event pixels.}
#'   \item{\code{avg_slope}}{The slope of an \code{lm} object fitted to the event pixels.}
#'   \item{\code{quad_1}}{The linear coefficient of a second order polynomial fitted to event pixels using \code{lm}.}
#'   \item{\code{quad_2}}{The quadratic coefficient of a second order polynomial fitted to event pixels using \code{lm}.}
#'   \item{\code{2sd_from_mean}}{The proportion of event pixels/cells that have values greater than 2 global standard deviations from the global mean of the window.}
#'   \item{\code{3sd_from_mean}}{The proportion of event pixels/cells that have values greater than 3 global standard deviations from the global mean of the window.}
#'   \item{\code{4sd_from_mean}}{The proportion of event pixels/cells that have values greater than 4 global standard deviations from the global mean of the window.}
#'   \item{\code{5iqr_from_median}}{A small portion of each window and its column medians and column IQRs are used to construct two smoothing splines: a median spline and an IQR spline. The value of the median smoothing spline at each event centroid is used as the local median for that event. Similarly, the value of the IQR smoothing spline at each event centroid is used as the local IQR for that event. This feature gives the proportion of event pixels/cells that have values greater than 5 local IQRs from the local median.}
#'   \item{\code{6iqr_from_median}}{The proportion of event pixels/cells that have values greater than 6 local IQRs from the local median computed using splines.}
#'   \item{\code{7iqr_from_median}}{The proportion of event pixels/cells that have values greater than 7 local IQRs from the local median computed using splines.}
#'   \item{\code{8iqr_from_median}}{The proportion of event pixels/cells that have values greater than 8 local IQRs from the local median computed using splines.}
#'   \item{\code{iqr_from_median}}{Let \code{x} denote the 75th percentile of the event pixel values. This feature gives the number of local IQRs by which \code{x} lies away from the local median. Both the local IQR and the local median are computed using splines.}
#'   \item{\code{sd_from_mean}}{Let \code{x} denote the 80th percentile of the event pixel values. This feature gives the number of global standard deviations by which \code{x} lies away from the global mean. Both global values are computed from the window data.}
#'
#'
#'
#'@examples
#'# 2D data stream example
#' out <- gen_stream(1, sd=15)
#' zz <- as.matrix(out$data)
#' features <- extract_event_ftrs(zz, supervised=TRUE, details = out$details)
#' features
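#'
#' # Sketch of the thresholding step described under thres, in base R only.
#' # Assumption: the cut-off is the quantile of all values in one window; the
#' # package internals may differ. win is a hypothetical window, not package data.
#' set.seed(1)
#' win <- matrix(rnorm(200 * 50), nrow = 200)
#' cutoff <- quantile(win, probs = 0.95)
#' # (row, column) positions of the values above the cut-off, which are the
#' # ones that would be passed on to clustering
#' signal_pos <- which(win > cutoff, arr.ind = TRUE)
#' # proportion of window values that exceed the cut-off
#' nrow(signal_pos) / length(win)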
#'
#' # 3D data stream example
#' set.seed(1)
#' arr <- array(rnorm(40*25*30),dim=c(40,25,30))
#' arr[25:33,12:20, 20:23] <- 10
#' # getting events
#' ftrs <- extract_event_ftrs(arr, supervised=FALSE, win_size=10, step_size = 2, tt=2, thres=0.985)
#' ftrs
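#'
#' # Sketch of how the returned array is indexed as [event, feature, age].
#' # mock is a hypothetical stand-in with the documented 3D shape Nx13x4;
#' # its values are placeholders, not output of extract_event_ftrs.
#' tt <- 2
#' ages <- tt * 1:4
#' mock <- array(seq_len(5 * 13 * 4), dim = c(5, 13, 4))
#' # 6th feature of the 2nd event at the third age, 3*tt
#' mock[2, 6, which(ages == 3 * tt)]
#' # all features of all events at the last age, 4*tt
#' dim(mock[, , 4])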
#'
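#' # Sketch of how win_size and step_size could lay out the moving windows.
#' # Assumptions: windows slide along the first dimension of the stream and
#' # only full windows are used; the package may differ. n_rows is a
#' # hypothetical stream length.
#' n_rows <- 1000
#' win_size <- 200
#' step_size <- 20
#' starts <- seq(1, n_rows - win_size + 1, by = step_size)
#' ends <- starts + win_size - 1
#' # first few windows as (start, end) row indices
#' head(cbind(starts, ends))
#'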
#'@export
#'@importFrom grDevices dev.off jpeg topo.colors
#'@importFrom graphics axis hist image par plot title
#'@importFrom stats IQR aggregate binomial coef glm lm median optim predict quantile rnorm runif sd smooth.spline
#'@importFrom utils read.csv write.csv
#'@importFrom stats cor model.matrix prcomp

extract_event_ftrs <- function(stream, supervised=FALSE, details=NULL, win_size=200, step_size=20, thres=0.95, folder=NULL, vis=FALSE, tt=10, epsilon =5, miniPts = 10, rolling=TRUE){

  # Dispatch on the number of dimensions of the stream:
  # 2 for a matrix or dataframe, 3 for an array
  array_dim <- length(dim(stream))
  if(array_dim==2){
    all_train_features <- extract_event_ftrs_2d(stream, supervised, details, win_size, step_size, thres, folder, vis, tt, epsilon, miniPts, rolling=rolling  )
  }else if(array_dim==3){
    # folder, vis and rolling are not passed on for 3D streams
    all_train_features <- extract_event_ftrs_3d(stream, supervised, details, win_size, step_size, thres, tt, epsilon, miniPts)
  }else{
    stop("Stream needs to be a 2D or 3D object such as a dataframe, matrix or an array!")
  }
  # One of the branches above has assigned all_train_features; otherwise stop() was called
  return(all_train_features)

}
sevvandi/eventstream documentation built on May 16, 2022, 11:23 a.m.