factor_ | R Documentation |
A cheaper version of factor() along with cheaper utilities
A fast version of factor() using the collapse package.
There are some additional utilities, most of which begin with the prefix 'levels_', such as:
as_factor(), which is an efficient way to coerce both vectors and factors;
levels_factor(), which returns the levels of a factor, as a factor;
levels_used(), which returns the used levels of a factor;
levels_unused(), which returns the unused levels of a factor;
levels_add(), which adds the specified levels onto the existing levels;
levels_rm(), which removes the specified levels;
levels_add_na(), which adds an explicit NA level;
levels_drop_na(), which drops the NA level;
levels_drop(), which drops unused factor levels;
levels_rename(), which renames levels;
levels_lump(), which keeps the top n levels and lumps all others into the same category;
levels_count(), which returns the counts of each level;
and finally levels_reorder(), which reorders the levels of x based on the ordered median values of order_by for each level.
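To make the idea behind levels_reorder() concrete, here is a minimal base R sketch of ordering levels by the per-level median of a second vector. It uses stats::reorder() rather than cheapr, so it only illustrates the concept and is not the package's implementation. The data are made up for illustration.

```r
# Illustrative data (hypothetical): a factor and a numeric vector of equal length
x <- factor(c("a", "a", "b", "b", "c", "c"))
y <- c(10, 12, 1, 3, 5, 7)

# Median of y within each level of x: a = 11, b = 2, c = 6
level_medians <- tapply(y, x, median)

# Reorder the levels of x by those medians (ascending)
x_reordered <- stats::reorder(x, y, FUN = median)
levels(x_reordered)  # "b" "c" "a"
```

The values of x are unchanged. Only the order of the levels differs, which is what matters for plotting and sorting.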
factor_(
  x = integer(),
  levels = NULL,
  order = TRUE,
  na_exclude = TRUE,
  ordered = is.ordered(x)
)
as_factor(x)
levels_factor(x)
levels_used(x)
levels_unused(x)
levels_rm(x, levels)
levels_add(x, levels, where = c("last", "first"))
levels_add_na(x, name = NA, where = c("last", "first"))
levels_drop_na(x)
levels_drop(x)
levels_reorder(x, order_by, decreasing = FALSE)
levels_rename(x, ..., .fun = NULL)
levels_lump(
  x,
  n,
  prop,
  other_category = "Other",
  ties = c("min", "average", "first", "last", "random", "max")
)
levels_count(x)
x | A vector. |
levels | Optional factor levels. |
order | Should factor levels be sorted? Default is TRUE. |
na_exclude | Should NA values be excluded from the factor levels? Default is TRUE. |
ordered | Should the result be an ordered factor? |
where | Where should the new levels (or the NA level) be placed? Either "last" or "first". |
name | Name of the NA level. |
order_by | A vector to order the levels of x by. |
decreasing | Should the reordered levels be in decreasing order? Default is FALSE. |
... | Key-value pairs where the key is the new name and the value is the existing name to be replaced by it. |
.fun | Renaming function applied to each level. |
n | Top n number of levels to calculate. |
prop | Top proportion of levels to calculate. This is a proportion of the total unique levels in x. |
other_category | Name of 'other' category. |
ties | Ties method to use. See ?rank. |
This operates similarly to collapse::qF().
The main difference internally is that collapse::funique() is used and therefore S3 methods can be written for it.
Furthermore, for date-times, factor_() differs in that it differentiates all instants in time, whereas factor() differentiates calendar times.
Using a daylight saving example where the clocks go back:
factor(as.POSIXct(1729984360, tz = "Europe/London") + 3600 * (1:5))
produces 4 levels, whereas
factor_(as.POSIXct(1729984360, tz = "Europe/London") + 3600 * (1:5))
produces 5 levels.
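The reason factor() loses a level can be seen with base R alone. The sketch below assumes the "Europe/London" time zone is available on the system, and passes origin explicitly only so that the call also runs on older R versions.

```r
# Five consecutive hours spanning the end of British Summer Time (27 Oct 2024).
x <- as.POSIXct(1729984360, tz = "Europe/London", origin = "1970-01-01") + 3600 * (1:5)

# Printed without a time zone, 01:12:40 appears twice (once in BST, once in GMT)
format(x, "%Y-%m-%d %H:%M:%S")

# Adding the zone abbreviation shows that they are distinct instants
format(x, "%Y-%m-%d %H:%M:%S %Z")

# factor() builds its levels from the character representation,
# so the two identical clock times collapse into one level
nlevels(factor(x))  # 4

# The underlying instants are all distinct
length(unique(x))   # 5
```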
levels_lump() is a cheaper version of forcats::fct_lump_n() but returns levels in order of highest frequency to lowest. This can be very useful for plotting.
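As a rough illustration of what lumping means, the following base R sketch keeps the n most frequent values and collapses the rest into one level. The helper lump_top_n() is hypothetical and is not part of cheapr or forcats. It ignores ties and the prop argument, and placing the lumped level last is a choice made here, not a documented behaviour of levels_lump().

```r
# Hypothetical helper, not part of cheapr: keep the n most frequent values
# and lump everything else into a single "Other" level.
lump_top_n <- function(x, n, other_category = "Other") {
  counts <- sort(table(x), decreasing = TRUE)  # most frequent first
  top <- names(counts)[seq_len(min(n, length(counts)))]
  out <- ifelse(x %in% top, as.character(x), other_category)
  # Levels run from most to least frequent, with the lumped level last
  factor(out, levels = unique(c(top, other_category)))
}

x <- c("a", "a", "a", "b", "b", "c", "d")
lumped <- lump_top_n(x, 2)
levels(lumped)  # "a" "b" "Other"
table(lumped)   # a: 3, b: 2, Other: 2
```

Because the levels are already ordered by frequency, a bar chart of the result shows the largest categories first without any further reordering.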
A factor, or a character vector in the case of levels_used() and levels_unused().
levels_count() returns a data frame of counts and proportions for each level.
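For readers who want a feel for these return values without installing the package, here is a base R sketch of a per-level count table and of used versus unused levels. The column names name, count and prop are assumptions chosen for illustration and may differ from what levels_count() actually returns.

```r
x <- factor(c("a", "b", "a", "c", "a"), levels = c("a", "b", "c", "d"))

# Counts per level, including the unused level "d"
counts <- tabulate(x, nbins = nlevels(x))

# Column names here are illustrative and may differ from cheapr's output
level_counts <- data.frame(
  name = levels(x),
  count = counts,
  prop = counts / sum(counts)
)
level_counts

# Used and unused levels as character vectors
used <- levels(x)[counts > 0]     # "a" "b" "c"
unused <- levels(x)[counts == 0]  # "d"
```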
library(cheapr)
x <- factor_(sample(letters[sample.int(26, 10)], 100, TRUE), levels = letters)
x
# Used/unused levels
levels_used(x)
levels_unused(x)
# Drop unused levels
levels_drop(x)
# Top 3 letters by frequency
lumped_letters <- levels_lump(x, 3)
levels_count(lumped_letters)
# To remove the "other" category, use `levels_rm()`
levels_count(levels_rm(lumped_letters, "Other"))
# We can use levels_lump to create a generic top n function for non-factors too
get_top_n <- function(x, n){
  f <- levels_lump(factor_(x, order = FALSE), n = n)
  levels_count(f)
}
get_top_n(x, 3)
# A neat way to order the levels of a factor by frequency
# is the following:
levels(levels_lump(x, prop = 1)) # Highest to lowest
levels(levels_lump(x, prop = -1)) # Lowest to highest