Description Usage Arguments Details Note Examples

Window functions defined for `Column`

.

1 2 3 4 5 6 7 8 9 10 11 12 13 | ```
.rank(x, ...)
.min_rank(x, ...)
.dense_rank(x, ...)
.percent_rank(x, ...)
.cume_dist(x, ...)
.row_number(...)
.ntile(x, n, ...)
``` |

`x` |
the |

`...` |
additional argument(s). |

`n` |
an integer of the number of tiles used in |

`rank`

: Returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking
sequence when there are ties. That is, if you were ranking a competition using dense_rank
and had three people tie for second place, you would say that all three were in second
place and that the next person came in third. Rank would give me sequential numbers, making
the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the `RANK`

function in SQL.
This can be used with either a `Column`

or a `WindowSpec`

.

`min_rank`

: An alias for `rank`

. In Spark `rank`

and
functions as `min_rank`

. This can be used with either a `Column`

or a `WindowSpec`

.

`dense_rank`

: Returns the rank of rows within a window partition,
without any gaps. The difference between rank and dense_rank is that
dense_rank leaves no gaps in ranking sequence when there are ties. That is,
if you were ranking a competition using dense_rank and had three people tie
for second place, you would say that all three were in second place and that
the next person came in third. Rank would give me sequential numbers, making
the person that came in third place (after the ties) would register as
coming in fifth. This is equivalent to the `DENSE_RANK`

function in
SQL. This can be used with either a `Column`

or a `WindowSpec`

.

`percent_rank`

: Returns the relative rank (i.e. percentile) of rows
within a window partition.
This is computed by: (rank of row in its partition - 1) / (number of rows in
the partition - 1). This is equivalent to the `PERCENT_RANK`

function
in SQL. This can be used with either a `Column`

or a `WindowSpec`

.

`cume_dist`

: Returns the cumulative distribution of values within a
window partition, i.e. the fraction of rows that are below the current row:
(number of values before and including x) / (total number of rows in the
partition). This is equivalent to the `CUME_DIST`

function in SQL.
This can be used with either a `Column`

or a `WindowSpec`

.

`row_number`

: Returns a sequential number starting at 1 within a window
partition. This is equivalent to the `ROW_NUMBER`

function in SQL.
This can be used with either a `Column`

, `WindowSpec`

, or without
an argument, which will order by `monotonically_increasing_id()`

.

`ntile`

: Returns the ntile group id (from 1 to n inclusive) in an
ordered window partition. For example, if n is 4, the first quarter of the
rows will get value 1, the second quarter will get 2, the third quarter will
get 3, and the last quarter will get 4. This is equivalent to the
`NTILE`

function in SQL.

rank since 1.6.0

rank since 1.6.0

dense_rank since 1.6.0

percent_rank since 1.6.0

cume_dist since 1.6.0

row_number since 1.6.0

ntile since 1.6.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | ```
## Not run:
# with column names
spark_tbl(iris) %>%
mutate(lead = lead(Petal_Width, 1),
lag = lag(Petal_Width, 2),
n = row_number(),
rank = rank(Petal_Width),
dense_rank = dense_rank(Petal_Width),
percent_rank = percent_rank(Petal_Width),
cume_dist = cume_dist(Petal_Width),
tiles_4 = ntile(Petal_Width, 4)) %>%
collect
# with grouping things are more efficient
spark_tbl(iris) %>%
group_by(Species) %>%
mutate(lead = lead(Petal_Width, 1),
lag = lag(Petal_Width, 2),
n = row_number(),
rank = rank(Petal_Width),
dense_rank = dense_rank(Petal_Width),
percent_rank = percent_rank(Petal_Width),
cume_dist = cume_dist(Petal_Width),
tiles_4 = ntile(Petal_Width, 4)) %>%
collect
# you can also use a WindowSpec() to acheive the same thing
wndw <- partitionBy(windowOrderBy("Petal_Width"), "Species")
spark_tbl(iris) %>%
mutate(n = row_number(),
rank = rank(wndw),
dense_rank = dense_rank(wndw),
percent_rank = percent_rank(wndw),
cume_dist = cume_dist(wndw),
tiles_4 = ntile(wndw, 4)) %>%
collect
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.