Description Usage Arguments Value See Also Examples
Compute quantiles of 'column' within each time window or within each group of records with identical time-stamps, and store results in new columns named '<column>_<quantile value>quantile'
1 | summarize_quantile(ts_rdd, column, p, window = NULL, key_columns = list())
|
ts_rdd |
Timeseries RDD being summarized |
column |
Column to be summarized |
p |
List of quantile probabilities |
window |
Either an R expression specifying time windows to be summarized (e.g., 'in_past("1h")' to summarize data from looking behind 1 hour at each time point, 'in_future("5s")' to summarize data from looking forward 5 seconds at each time point), or 'NULL' to compute aggregate statistics on records grouped by timestamps |
key_columns |
Optional list of columns that will form an equivalence relation associating each record with the time series it belongs to (i.e., any 2 records having equal values in those columns will be associated with the same time series, and any 2 records having differing values in those columns are considered to be from 2 separate time series and will therefore be summarized separately) By default, 'key_colums' is empty and all records are considered to be part of a single time series. |
A TimeSeriesRDD containing the summarized result
Other summarizers:
ols_regression()
,
summarize_avg()
,
summarize_corr2()
,
summarize_corr()
,
summarize_count()
,
summarize_covar()
,
summarize_dot_product()
,
summarize_ema_half_life()
,
summarize_ewma()
,
summarize_geometric_mean()
,
summarize_kurtosis()
,
summarize_max()
,
summarize_min()
,
summarize_nth_central_moment()
,
summarize_nth_moment()
,
summarize_product()
,
summarize_skewness()
,
summarize_stddev()
,
summarize_sum()
,
summarize_var()
,
summarize_weighted_avg()
,
summarize_weighted_corr()
,
summarize_weighted_covar()
,
summarize_z_score()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | library(sparklyr)
library(sparklyr.flint)
sc <- try_spark_connect(master = "local")
if (!is.null(sc)) {
sdf <- copy_to(sc, tibble::tibble(t = seq(10), v = seq(10)))
ts <- fromSDF(sdf, is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")
ts_quantile <- summarize_quantile(
ts, column = "v", p = c(0.5, 0.75, 0.99), window = in_past("3s")
)
} else {
message("Unable to establish a Spark connection!")
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.