corr: corr

Description

Computes the Pearson Correlation Coefficient for two Columns.

Usage

corr(x, ...)

## S4 method for signature 'Column'
corr(x, col2)

correlation(x, colName1, colName2, method = "pearson")

Arguments

x

a Column or a SparkDataFrame.

...

additional argument(s). If x is a Column, a second Column should be provided. If x is a SparkDataFrame, two column names should be provided.

col2

a (second) Column.

colName1

the name of the first column.

colName2

the name of the second column.

method

Optional. A character string specifying the method for calculating the correlation. Only "pearson" is currently supported.

Value

The Pearson Correlation Coefficient as a Double.

Note

corr since 1.6.0

See Also

Other aggregate functions: approxCountDistinct(), avg(), firstItem(), lastItem()

Other stat functions: approxQuantile(), covariance(), crosstab(), freqItems(), sampleBy()

Examples

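## Not run: 
# Column method: use corr() as an aggregate expression inside select()
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, corr(df$mpg, df$hp)))
## End(Not run)

## Not run: 
# SparkDataFrame method: pass the two column names (df is created above)
corr(df, "mpg", "hp")
corr(df, "mpg", "hp", method = "pearson")
## End(Not run)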
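
For a quick local sanity check that does not need a Spark session, the same coefficient can be sketched in base R on the built-in mtcars data. This is only an illustrative cross-check, not part of the package: the helper name pearson_manual is hypothetical, and the comparison uses base R's cor() rather than any Spark call.

```r
# Hypothetical helper (not part of tidyspark or SparkR): Pearson correlation
# from its definition, i.e. the sum of cross-deviations divided by the square
# root of the product of the summed squared deviations.
pearson_manual <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_manual(mtcars$mpg, mtcars$hp)
r_builtin <- cor(mtcars$mpg, mtcars$hp, method = "pearson")

# The two values should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(r_manual, r_builtin)))
print(r_manual)
```

On the Spark side, corr(df, "mpg", "hp") would be expected to return the same value up to floating-point differences, since both compute the Pearson coefficient over the same two columns.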

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.