summarize: summarize

Description

Computes aggregates over an entire SparkDataFrame, without groups, or over GroupedData. For GroupedData, the resulting SparkDataFrame also contains the grouping columns.

Aggregates are specified as a list of columns.

Usage

agg(x, ...)

summarize(x, ...)

## S4 method for signature 'GroupedData'
agg(x, ...)

## S4 method for signature 'GroupedData'
summarize(x, ...)

## S4 method for signature 'SparkDataFrame'
agg(x, ...)

## S4 method for signature 'SparkDataFrame'
summarize(x, ...)

Arguments

x

a SparkDataFrame or GroupedData.

...

further arguments to be passed to or from other methods.

Details

df2 <- agg(df, <column> = <aggFunction>)
df2 <- agg(df, newColName = aggFunction(column))
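
The two call forms can be tried on a small table. The following is a minimal sketch, assuming SparkR is installed and a local Spark session can be started; the `people` data and the names `byName`, `byExpr`, and `result` are hypothetical and do not come from this page.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

# Hypothetical sample data used only for illustration
people <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(30, 25, 41))
df <- createDataFrame(people)

# Form 1: <column> = <aggFunction>, with the aggregate function given as a string.
# The output column name is generated by Spark.
byName <- agg(df, age = "sum")

# Form 2: newColName = aggFunction(column), with an explicit output column name.
byExpr <- agg(df, ageSum = sum(df$age))

# Bring the one-row result back to a local R data.frame before stopping the session
result <- collect(byExpr)

sparkR.session.stop()
```

The second form is usually easier to work with downstream, because the name of the output column is chosen by the caller rather than generated.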

Value

A SparkDataFrame.
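
For the GroupedData signature, the returned SparkDataFrame carries the grouping columns alongside the aggregates. A minimal sketch, assuming a local Spark session can be started; the `sales` data and the names `grouped`, `totals`, and `result` are hypothetical.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

# Hypothetical sample data used only for illustration
sales <- data.frame(region = c("east", "east", "west"), amount = c(10, 20, 5))
df <- createDataFrame(sales)

# groupBy() returns GroupedData; summarize() turns it back into a SparkDataFrame
grouped <- groupBy(df, "region")
totals <- summarize(grouped, total = sum(df$amount))

# The collected result has the grouping column 'region' plus the aggregate 'total'
result <- collect(totals)

sparkR.session.stop()
```

Row order in `result` is not guaranteed, so look rows up by `region` rather than by position.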

Note

agg since 1.4.0

summarize since 1.4.0

See Also

Other SparkDataFrame functions: SparkDataFrame-class, alias(), arrange(), as.data.frame(), attach,SparkDataFrame-method, broadcast(), cache(), checkpoint(), coalesce(), collect(), colnames(), coltypes(), createOrReplaceTempView(), crossJoin(), cube(), dapplyCollect(), dapply(), describe(), dim(), distinct(), dropDuplicates(), dropna(), drop(), dtypes(), exceptAll(), except(), explain(), filter(), first(), gapplyCollect(), gapply(), getNumPartitions(), group_by(), head(), hint(), histogram(), insertInto(), intersectAll(), intersect(), isLocal(), isStreaming(), join(), limit(), localCheckpoint(), merge(), mutate(), ncol(), nrow(), persist(), printSchema(), randomSplit(), rbind(), rename(), repartitionByRange(), repartition(), rollup(), sample(), saveAsTable(), schema(), selectExpr(), select(), showDF(), show(), storageLevel(), str(), subset(), summary(), take(), toJSON(), unionAll(), unionByName(), union(), unpersist(), withColumn(), withWatermark(), with(), write.df(), write.jdbc(), write.json(), write.orc(), write.parquet(), write.stream(), write.text()

Examples

## Not run: 
 df2 <- agg(df, age = "sum")  # new column name will be created as 'SUM(age#0)'
 df3 <- agg(df, ageSum = sum(df$age)) # Creates a new column named ageSum
 df4 <- summarize(df, ageMax = max(df$age)) # Creates a new column named ageMax

## End(Not run)

SparkR documentation built on June 3, 2021, 5:05 p.m.