approxQuantile: Calculates the approximate quantiles of numerical columns of...
In SparkR: R Front End for 'Apache Spark'

Description Usage Arguments Value Note See Also Examples

Calculates the approximate quantiles of numerical columns of a SparkDataFrame. The result of this algorithm has the following deterministic bound: if the SparkDataFrame has N elements and we request the quantile at probability p up to error err, then the algorithm returns a sample x from the SparkDataFrame such that the *exact* rank of x is close to (p * N). More precisely, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first presented in "Space-efficient Online Computation of Quantile Summaries" (https://doi.org/10.1145/375663.375670) by Greenwald and Khanna. Note that NA values in numerical columns are ignored before calculation. For columns containing only NA values, an empty list is returned.
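The rank bound above can be checked by hand with a few lines of base R. This is only a sketch of the arithmetic, not SparkR code: `rank_window` and `within_window` are hypothetical helper names, and rank(x) is assumed here to mean the number of values less than or equal to x.

```r
# Rank window implied by the documented bound:
#   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)
rank_window <- function(n, p, err) {
  stopifnot(n >= 1, p >= 0, p <= 1, err >= 0)
  c(lower = floor((p - err) * n), upper = ceiling((p + err) * n))
}

# Median of 1000 rows with err = 0.125 (values picked so the
# floating-point products are exact).
w <- rank_window(1000, 0.5, 0.125)

# Check whether a candidate answer x falls inside the window.
# Assumption: rank(x) is the count of values <= x.
within_window <- function(values, x, p, err) {
  w <- rank_window(length(values), p, err)
  r <- sum(values <= x)
  r >= w[["lower"]] && r <= w[["upper"]]
}

within_window(1:1000, 450, 0.5, 0.125)  # rank 450 lies inside the window
within_window(1:1000, 700, 0.5, 0.125)  # rank 700 lies outside the window
```

With err = 0 the window collapses to the exact rank, which matches the documented behavior that a zero relativeError computes exact quantiles.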

## S4 method for signature 'SparkDataFrame,character,numeric,numeric'
approxQuantile(x, cols, probabilities, relativeError)

`x`	A SparkDataFrame.
`cols`	A single column name, or a list of names for multiple columns.
`probabilities`	A list of quantile probabilities. Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
`relativeError`	The relative target precision to achieve (>= 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.

The approximate quantiles at the given probabilities. If the input is a single column name, the output is a list of approximate quantiles in that column. If the input is multiple column names, the output is a list in which each element is a list of numeric values representing the approximate quantiles in the corresponding column.
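The nested list returned for multiple columns can be awkward to read. The sketch below shows one way to label it, using a hypothetical helper `label_quantiles` that is not part of SparkR. The `result` object is a hand-written stand-in with made-up numbers, shaped as described above, so the code runs without a Spark session; the empty third element mimics a column that contains only NA values.

```r
# Hypothetical helper (not part of SparkR): turn the nested list returned for
# multiple columns into a matrix with one row per column and one column per
# probability. Columns that produced an empty list become a row of NA.
label_quantiles <- function(result, cols, probabilities) {
  stopifnot(length(result) == length(cols))
  rows <- lapply(result, function(q) {
    q <- unlist(q)
    if (length(q) == 0) rep(NA_real_, length(probabilities)) else as.numeric(q)
  })
  m <- do.call(rbind, rows)
  dimnames(m) <- list(cols, paste0("p", probabilities))
  m
}

cols <- c("a", "b", "c")
probabilities <- c(0.25, 0.5, 0.75)

## Not run:
# result <- approxQuantile(df, cols, probabilities, 0.01)
## End(Not run)

# Stand-in for the value above; the numbers are invented for illustration.
result <- list(list(1, 2, 3), list(10, 20, 30), list())

m <- label_quantiles(result, cols, probabilities)
print(m)
```

Indexing then works by name, for example `m["b", "p0.5"]` for the approximate median of column b.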

approxQuantile since 2.0.0

Other stat functions: corr(), cov(), crosstab(), freqItems(), sampleBy()

## Not run: 
df <- read.json("/path/to/file.json")
quantiles <- approxQuantile(df, "key", c(0.5, 0.8), 0.0)

## End(Not run)

SparkR documentation built on June 3, 2021, 5:05 p.m.
