Description

Non-aggregate functions defined for Column.
Usage

when(condition, value)
bitwiseNOT(x)
create_array(x, ...)
create_map(x, ...)
expr_col(x)
greatest(x, ...)
input_file_name(x = "missing")
isnan(x)
least(x, ...)
monotonically_increasing_id(x = "missing")
nanvl(y, x)
negate(x)
rand(seed)
randn(seed)
spark_partition_id(x = "missing")
struct(x, ...)
## S4 method for signature 'Column'
bitwiseNOT(x)
## S4 method for signature 'Column'
isnan(x)
## S4 method for signature 'Column'
is.nan(x)
## S4 method for signature 'missing'
monotonically_increasing_id()
## S4 method for signature 'Column'
negate(x)
## S4 method for signature 'missing'
spark_partition_id()
## S4 method for signature 'characterOrColumn'
struct(x, ...)
## S4 method for signature 'Column'
nanvl(y, x)
## S4 method for signature 'Column'
greatest(x, ...)
## S4 method for signature 'Column'
least(x, ...)
## S4 method for signature 'character'
expr_col(x)
## S4 method for signature 'Column'
when(condition, value)
## S4 method for signature 'missing'
rand(seed)
## S4 method for signature 'numeric'
rand(seed)
## S4 method for signature 'missing'
randn(seed)
## S4 method for signature 'numeric'
randn(seed)
## S4 method for signature 'Column'
create_array(x, ...)
## S4 method for signature 'Column'
create_map(x, ...)
## S4 method for signature 'missing'
input_file_name()
Arguments

condition: the condition to test on. Must be a Column expression.

value: result expression.

x: Column to compute on. In expr_col, it is a character object containing the expression to be parsed.

...: additional Columns.

y: Column to compute on.

seed: a random seed. Can be missing.
Details

bitwiseNOT: Computes bitwise NOT.

isnan: Returns true if the column is NaN.

is.nan: Alias for isnan.
monotonically_increasing_id: Returns a column that generates monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the SparkDataFrame has less than 1 billion partitions, and each partition has less than 8 billion records. As an example, consider a SparkDataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594. This is equivalent to the MONOTONICALLY_INCREASING_ID function in SQL. The method should be used with no argument. Note: the function is non-deterministic because its result depends on partition IDs.
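The bit layout described above can be reproduced with plain R arithmetic. This is only an illustrative sketch of the documented ID scheme, not SparkR code:

# Two partitions with 3 records each: partition ID in the upper bits,
# per-partition record number in the lower 33 bits.
partition_id <- c(0, 0, 0, 1, 1, 1)
record_number <- c(0, 1, 2, 0, 1, 2)
partition_id * 2^33 + record_number
# 0 1 2 8589934592 8589934593 8589934594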
negate: Unary minus, i.e. negate the expression.
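A minimal sketch, assuming an active SparkR session and the mtcars-based df built in the Examples section below:

# negate() returns the unary minus of a numeric Column.
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, negate(df$mpg)))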
spark_partition_id: Returns the partition ID as a SparkDataFrame column. Note that this is non-deterministic because it depends on data partitioning and task scheduling. This is equivalent to the SPARK_PARTITION_ID function in SQL.
struct: Creates a new struct column that composes multiple input columns.
nanvl: Returns the first column (y) if it is not NaN, or the second column (x) if the first column is NaN. Both inputs should be floating point columns (DoubleType or FloatType).
greatest: Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null if all parameters are null.

least: Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null if all parameters are null.
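A minimal sketch, assuming an active SparkR session and the mtcars-based df from the Examples section below:

# Row-wise maximum and minimum across two numeric Columns, skipping nulls.
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, greatest(df$gear, df$carb), least(df$gear, df$carb)))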
expr: Parses the expression string into the column that it represents, similar to SparkDataFrame.selectExpr.
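A minimal sketch, assuming an active SparkR session, the mtcars-based df from the Examples section below, and that expr_col (the name listed under Usage) is the parser described here:

# Parse SQL expression strings into Columns, similar to selectExpr.
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, expr_col("mpg * 2"), expr_col("upper(model)")))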
when: Evaluates a list of conditions and returns one of multiple possible result expressions. For unmatched conditions, null is returned.
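A minimal sketch, assuming an active SparkR session and the mtcars-based df from the Examples section below; otherwise() supplies the value for rows the condition does not match, which would otherwise be null:

# Label rows by mpg: "high" when mpg > 20, "low" for everything else.
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, otherwise(when(df$mpg > 20, "high"), "low")))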
rand: Generates a random column with independent and identically distributed (i.i.d.) samples from U[0.0, 1.0]. Note: the function is non-deterministic in the general case.

randn: Generates a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution. Note: the function is non-deterministic in the general case.
create_array: Creates a new array column. The input columns must all have the same data type.

create_map: Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.
input_file_name: Creates a string column with the input file name for a given row. The method should be used with no argument.
Note

bitwiseNOT since 1.5.0
isnan since 2.0.0
is.nan since 2.0.0
negate since 1.5.0
spark_partition_id since 2.0.0
struct since 1.6.0
nanvl since 1.5.0
greatest since 1.5.0
least since 1.5.0
expr since 1.5.0
when since 1.5.0
rand since 1.5.0
rand(numeric) since 1.5.0
randn since 1.5.0
randn(numeric) since 1.5.0
create_array since 2.3.0
create_map since 2.3.0
input_file_name since 2.3.0
See Also

coalesce,SparkDataFrame-method

Other non-aggregate functions: column(), not()
Examples

## Not run:
# Dataframe used throughout this doc
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
## End(Not run)
## Not run:
head(select(df, bitwiseNOT(cast(df$vs, "int"))))
## End(Not run)
## Not run: head(select(df, monotonically_increasing_id()))
## Not run: head(select(df, spark_partition_id()))
## Not run:
tmp <- mutate(df, v1 = struct(df$mpg, df$cyl), v2 = struct("hp", "wt", "vs"),
v3 = create_array(df$mpg, df$cyl, df$hp),
v4 = create_map(lit("x"), lit(1.0), lit("y"), lit(-1.0)))
head(tmp)
## End(Not run)
## Not run:
tmp <- mutate(df, mpg_na = otherwise(when(df$mpg > 20, df$mpg), lit(NaN)),
mpg2 = ifelse(df$mpg > 20 & df$am > 0, 0, 1),
mpg3 = ifelse(df$mpg > 20, df$mpg, 20.0))
head(tmp)
tmp <- mutate(tmp, ind_na1 = is.nan(tmp$mpg_na), ind_na2 = isnan(tmp$mpg_na))
head(select(tmp, coalesce(tmp$mpg_na, tmp$mpg)))
head(select(tmp, nanvl(tmp$mpg_na, tmp$hp)))
## End(Not run)
## Not run:
tmp <- mutate(df, r1 = rand(), r2 = rand(10), r3 = randn(), r4 = randn(10))
head(tmp)
## End(Not run)
## Not run:
tmp <- read.text("README.md")
head(select(tmp, input_file_name()))
## End(Not run)