sdf_quantile: Compute (Approximate) Quantiles with a Spark DataFrame

View source: R/sdf_interface.R

Description

Given one or more numeric columns of a Spark DataFrame, compute approximate quantiles, optionally weighted by another column.

Usage

sdf_quantile(
  x,
  column,
  probabilities = c(0, 0.25, 0.5, 0.75, 1),
  relative.error = 1e-05,
  weight.column = NULL
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

column

The column(s) for which quantiles should be computed. Multiple columns are only supported in Spark 2.0+.

probabilities

A numeric vector of probabilities, each between 0 and 1, at which quantiles should be computed.

relative.error

The maximal possible difference between the actual percentile of a result and its expected percentile (e.g., if 'relative.error' is 0.01 and 'probabilities' is 0.95, then any value between the 94th and 96th percentile will be considered an acceptable approximation).
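To make the tolerance concrete, the sketch below computes the window of exact ranks an approximate result may occupy. It assumes that the unweighted case follows the guarantee Spark documents for DataFrameStatFunctions.approxQuantile: for N rows, probability p and error err, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The helper rank_window() is hypothetical and only does this arithmetic; it does not call Spark or sparklyr.

```r
# Hypothetical helper (not part of sparklyr): the range of exact ranks that an
# approximate quantile may have, per the bound documented for Spark's
# approxQuantile: floor((p - err) * n) <= rank(x) <= ceiling((p + err) * n).
rank_window <- function(p, err, n) {
  c(lower = floor((p - err) * n),
    upper = ceiling((p + err) * n))
}

# relative.error = 0.01 and probabilities = 0.95 on one million rows:
# any row whose exact rank lies in this window is an acceptable answer.
rank_window(0.95, 0.01, 1e6)
```

For one million rows this gives ranks 940000 to 960000, i.e. the 94th to 96th percentile described above. Tightening relative.error narrows the window proportionally.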

weight.column

If not NULL, then a generalized version of the Greenwald-Khanna algorithm will be run to compute weighted percentiles, with each sample from 'column' having a relative weight specified by the corresponding value in 'weight.column'. The weights can be considered as relative frequencies of sample data points.
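As an illustration of treating weights as relative frequencies, the following base R sketch computes an exact weighted quantile on a small in-memory vector: for each probability p it returns the smallest value whose cumulative weight reaches p times the total weight. This is not the Greenwald-Khanna variant that sparklyr runs on Spark, and weighted_quantile() is a hypothetical name; the sketch only shows the semantics the approximation is assumed to target, namely that an integer weight w behaves like repeating a row w times.

```r
# Hypothetical helper (not part of sparklyr): exact weighted quantiles in base R.
# Each weight is treated as a relative frequency of its value in x.
weighted_quantile <- function(x, w, probs) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  cw <- cumsum(w[ord])          # running total of weight, in sorted order
  total <- cw[length(cw)]
  # For each p, take the first value whose cumulative weight reaches p * total.
  vapply(probs, function(p) x[which(cw >= p * total)[1]], numeric(1))
}

x <- c(3, 1, 2)
w <- c(1, 1, 2)   # value 2 counts twice as much as 1 or 3
probs <- c(0, 0.25, 0.5, 0.75, 1)
weighted_quantile(x, w, probs)

# Same result as expanding the weights into repeated rows and taking
# type 1 (inverse empirical CDF) quantiles of the expanded sample:
unname(quantile(rep(x, w), probs, type = 1))
```

Both calls return 1, 1, 2, 2, 3 here, which is the sense in which a weight acts as a frequency count. The Spark implementation trades this exactness for a bounded error controlled by relative.error.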


rstudio/sparklyr documentation built on Sept. 18, 2024, 6:10 a.m.