Description Usage Arguments Details Note Examples
Window functions defined for Column
.
1 2 3 4 5 6 7 8 9 10 11 12 13 | .rank(x, ...)
.min_rank(x, ...)
.dense_rank(x, ...)
.percent_rank(x, ...)
.cume_dist(x, ...)
.row_number(...)
.ntile(x, n, ...)
|
x |
the |
... |
additional argument(s). |
n |
an integer of the number of tiles used in |
rank
: Returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking
sequence when there are ties. That is, if you were ranking a competition using dense_rank
and had three people tie for second place, you would say that all three were in second
place and that the next person came in third. Rank would give me sequential numbers, making
the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the RANK
function in SQL.
This can be used with either a Column
or a WindowSpec
.
min_rank
: An alias for rank
. In Spark rank
and
functions as min_rank
. This can be used with either a Column
or a WindowSpec
.
dense_rank
: Returns the rank of rows within a window partition,
without any gaps. The difference between rank and dense_rank is that
dense_rank leaves no gaps in ranking sequence when there are ties. That is,
if you were ranking a competition using dense_rank and had three people tie
for second place, you would say that all three were in second place and that
the next person came in third. Rank would give me sequential numbers, making
the person that came in third place (after the ties) would register as
coming in fifth. This is equivalent to the DENSE_RANK
function in
SQL. This can be used with either a Column
or a WindowSpec
.
percent_rank
: Returns the relative rank (i.e. percentile) of rows
within a window partition.
This is computed by: (rank of row in its partition - 1) / (number of rows in
the partition - 1). This is equivalent to the PERCENT_RANK
function
in SQL. This can be used with either a Column
or a WindowSpec
.
cume_dist
: Returns the cumulative distribution of values within a
window partition, i.e. the fraction of rows that are below the current row:
(number of values before and including x) / (total number of rows in the
partition). This is equivalent to the CUME_DIST
function in SQL.
This can be used with either a Column
or a WindowSpec
.
row_number
: Returns a sequential number starting at 1 within a window
partition. This is equivalent to the ROW_NUMBER
function in SQL.
This can be used with either a Column
, WindowSpec
, or without
an argument, which will order by monotonically_increasing_id()
.
ntile
: Returns the ntile group id (from 1 to n inclusive) in an
ordered window partition. For example, if n is 4, the first quarter of the
rows will get value 1, the second quarter will get 2, the third quarter will
get 3, and the last quarter will get 4. This is equivalent to the
NTILE
function in SQL.
rank since 1.6.0
rank since 1.6.0
dense_rank since 1.6.0
percent_rank since 1.6.0
cume_dist since 1.6.0
row_number since 1.6.0
ntile since 1.6.0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | ## Not run:
# with column names
spark_tbl(iris) %>%
mutate(lead = lead(Petal_Width, 1),
lag = lag(Petal_Width, 2),
n = row_number(),
rank = rank(Petal_Width),
dense_rank = dense_rank(Petal_Width),
percent_rank = percent_rank(Petal_Width),
cume_dist = cume_dist(Petal_Width),
tiles_4 = ntile(Petal_Width, 4)) %>%
collect
# with grouping things are more efficient
spark_tbl(iris) %>%
group_by(Species) %>%
mutate(lead = lead(Petal_Width, 1),
lag = lag(Petal_Width, 2),
n = row_number(),
rank = rank(Petal_Width),
dense_rank = dense_rank(Petal_Width),
percent_rank = percent_rank(Petal_Width),
cume_dist = cume_dist(Petal_Width),
tiles_4 = ntile(Petal_Width, 4)) %>%
collect
# you can also use a WindowSpec() to acheive the same thing
wndw <- partitionBy(windowOrderBy("Petal_Width"), "Species")
spark_tbl(iris) %>%
mutate(n = row_number(),
rank = rank(wndw),
dense_rank = dense_rank(wndw),
percent_rank = percent_rank(wndw),
cume_dist = cume_dist(wndw),
tiles_4 = ntile(wndw, 4)) %>%
collect
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.