n_unik (R Documentation)
This utility displays the number of unique elements in one or more data.frames, as well as their number of NA values.
n_unik(x)
## S3 method for class 'vec_n_unik'
print(x, ...)
## S3 method for class 'list_n_unik'
print(x, ...)
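x | A formula, with data set names on the LHS and variables on the RHS, like base_main + base_extra ~ id + period. As shown in the examples, a data.frame or a vector can also be passed directly. |
... | Not currently used. |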
It returns a vector containing the number of unique values per element. If several data sets are provided, it returns a list with one element per data set, each element being a vector of numbers of unique values.
In the formula, you can use the following special values: ".", ".N", ".U", and ".NA".
"."
Accesses the default values. If there is only one data set and it is not a data.table, the default is to display the number of observations and the number of unique rows. If the data is a data.table, the number of unique items in the key(s) is displayed instead of the number of unique rows (provided the table has keys). If there are two or more data sets, the default is to display the number of unique items for: a) the variables common to all data sets, if there are fewer than 4 of them; and b) if no variable is shown in a), the variables common to at least two data sets, provided there are fewer than 5 of them. If the data sets are data.tables, the keys are also displayed on top of the common variables. In any case, the number of observations is always displayed.
".N"
Displays the number of observations.
".U"
Displays the number of unique rows.
".NA"
Displays the number of rows with at least one NA.
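As a rough cross-check of what ".N", ".U" and ".NA" count, the base-R sketch below computes the same three quantities on a small hypothetical data frame (df and its columns are illustrative only). It does not call n_unik and is not the package's implementation. It only mirrors the definitions given above.

```r
# Hypothetical toy data set, used only to illustrate the counts
df <- data.frame(
  id     = c(1, 1, 2, 2, 2),
  period = c(1, 1, 1, 2, 3),
  y      = c(5, 5, NA, 3, NA)
)

n_obs     <- nrow(df)                  # analogue of ".N": number of observations
n_unique  <- nrow(unique(df))          # analogue of ".U": number of distinct rows
n_na_rows <- sum(!complete.cases(df))  # analogue of ".NA": rows with at least one NA

print(c(N = n_obs, U = n_unique, NA_rows = n_na_rows))
```

Here the first two rows are identical, so the distinct-row count is one less than the number of observations.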
NA function
The special function NA is equivalent to is.na but can handle several variables. For instance, NA(x, y) is equivalent to is.na(x) | is.na(y). You can pass as many variables as you want as arguments. If no argument is provided, as in NA(), it is identical to passing all the variables of the data set as arguments.
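The multi-variable behavior of NA() can be mimicked in base R by OR-ing is.na() over its arguments. The sketch below uses a hypothetical helper, any_na, which is not part of fixest. It checks the two equivalences stated above: NA(x, y) against is.na(x) | is.na(y), and the no-argument form against "any column is NA".

```r
# Hypothetical helper mimicking the semantics of the NA() special function:
# TRUE wherever at least one of the supplied vectors is NA
any_na <- function(...) {
  Reduce(`|`, lapply(list(...), is.na))
}

x <- c(1, NA, 3, 4)
y <- c(NA, 2, 3, NA)

# NA(x, y) is described as equivalent to is.na(x) | is.na(y)
stopifnot(identical(any_na(x, y), is.na(x) | is.na(y)))

# NA() with no argument uses all variables: passing every column of a
# data frame gives the same result as "row has at least one NA"
df <- data.frame(x = x, y = y)
stopifnot(identical(do.call(any_na, df), !complete.cases(df)))

print(any_na(x, y))
```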
Use the "hat" operator, "^", to combine several variables. For example, id^period will display the number of unique values of id x period combinations.
Use the "super hat" operator, "%^%", to also include the terms on both sides. For example, instead of writing id + period + id^period, you can simply write id%^%period.
Alternatively, you can use : for ^ and * for %^%.
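To make the "^" and "%^%" semantics concrete, the base-R sketch below counts unique id values, unique period values, and unique id x period pairs on a hypothetical toy data frame. It mirrors the description above rather than fixest's internals: id^period corresponds to the pair count, and id%^%period to all three counts together.

```r
# Hypothetical toy panel: the (2, 1) pair appears twice
df <- data.frame(
  id     = c(1, 1, 2, 2, 3),
  period = c(1, 2, 1, 1, 1)
)

n_id        <- length(unique(df$id))                  # id
n_period    <- length(unique(df$period))              # period
n_id_period <- nrow(unique(df[, c("id", "period")]))  # id^period (or id:period)

# id %^% period (or id*period) reports all three counts at once
print(c(id = n_id, period = n_period, "id^period" = n_id_period))
```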
To show the number of unique values for sub-samples, simply use []. For example, id[x > 10] will display the number of unique id values for which x > 10.
Simple square brackets lead to the inclusion of both the variable and its subset. To include only the subselection, use double square brackets, as in id[[x > 10]]. Hence id[x > 10] is equivalent to id + id[[x > 10]].
You can add multiple subselections at once; just separate them with a comma. For example, id[x > 10, NA(y)] is equivalent to id[x > 10] + id[NA(y)].
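The base-R sketch below spells out which counts the bracket forms correspond to, using a hypothetical toy data frame and a hypothetical n_unique helper. It reproduces the counts described in the text, not n_unik's printed layout: [] gives the full count plus the subsample count, [[ ]] gives only the subsample count, and a comma gives one subsample count per condition.

```r
# Hypothetical toy data set
df <- data.frame(
  id = c(1, 1, 2, 3, 3, 4),
  x  = c(5, 12, 15, 8, 20, 3),
  y  = c(1, NA, 2, NA, 3, 4)
)

# Hypothetical helper: number of distinct values of a vector
n_unique <- function(v) length(unique(v))

# id[x > 10]: both the full count and the count within the subsample
both <- c(id = n_unique(df$id), "id[x > 10]" = n_unique(df$id[df$x > 10]))

# id[[x > 10]]: only the subsample count
only_sub <- n_unique(df$id[df$x > 10])

# id[x > 10, NA(y)]: one subsample count per comma-separated condition
per_cond <- c(
  "id[x > 10]" = n_unique(df$id[df$x > 10]),
  "id[NA(y)]"  = n_unique(df$id[is.na(df$y)])
)

print(both)
print(per_cond)
```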
Use the double negative operator, i.e. !!, to include both a condition and its opposite at once. For example, id[!!x > 10] is equivalent to id[x > 10, !x > 10].
Double negative operators can be chained, as in id[!!cond1 & !!cond2]. In that case, the Cartesian product of all double-negated conditions is returned, i.e. here the four combinations cond1 & cond2, cond1 & !cond2, !cond1 & cond2 and !cond1 & !cond2.
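The Cartesian product produced by chained !! operators can be enumerated in base R with expand.grid. The sketch below uses a hypothetical toy data frame, with cond1 and cond2 standing in for x > 10 and NA(y). It counts unique id values for a single condition and its opposite, and then for every combination of two conditions and their opposites. It illustrates the described semantics only.

```r
# Hypothetical toy data set
df <- data.frame(
  id = c(1, 1, 2, 3, 3, 4),
  x  = c(5, 12, 15, 8, 20, 3),
  y  = c(1, NA, 2, NA, 3, 4)
)
n_unique <- function(v) length(unique(v))  # hypothetical helper

cond1 <- df$x > 10      # plays the role of x > 10
cond2 <- is.na(df$y)    # plays the role of NA(y)

# id[!!cond1]: the condition and its opposite
neg1 <- c(pos = n_unique(df$id[cond1]), neg = n_unique(df$id[!cond1]))

# id[!!cond1 & !!cond2]: all 2 x 2 combinations of (cond1, !cond1) x (cond2, !cond2)
combos <- expand.grid(c1 = c(TRUE, FALSE), c2 = c(TRUE, FALSE))
combos$n_id <- mapply(function(a, b) {
  keep <- (cond1 == a) & (cond2 == b)  # rows matching this combination
  n_unique(df$id[keep])
}, combos$c1, combos$c2)

print(neg1)
print(combos)
```

With two chained conditions there are 2 x 2 = 4 cells. Each extra !! condition doubles the number of cells.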
Laurent Berge
library(fixest)
data(base_did)
data = base_did
data$x1.L1 = round(lag(x1~id+period, 1, data))
# By default, just the formatted number of observations
n_unik(data)
# Or the number of unique elements of a vector
n_unik(data$id)
# number of unique id values and id x period pairs
n_unik(data ~.N + id + id^period)
# use the %^% operator to include the terms on the two sides at once
# => same as id*period
n_unik(data ~.N + id %^% period)
# using sub selection with []
n_unik(data ~.N + period[!NA(x1.L1)])
# to show only the sub selection: [[]]
n_unik(data ~.N + period[[!NA(x1.L1)]])
# you can have multiple values in [],
# just separate them with a comma
n_unik(data ~.N + period[!NA(x1.L1), x1 > 7])
# to have both a condition and its opposite,
# use the !! operator
n_unik(data ~.N[!!NA(x1.L1)])
# the !! operator works within condition chains
n_unik(data ~.N[!!NA(x1.L1) & !!x1 > 7])
# Conditions can be distributed
n_unik(data ~ (id + period)[x1 > 7])
#
# Several data sets
#
# Typical use case: merging
# Let's create two data sets and merge them
data(base_did)
base_main = base_did
base_extra = sample_df(base_main[, c("id", "period")], 100)
base_extra$id[1:10] = 111:120
base_extra$period[11:20] = 11:20
base_extra$z = rnorm(100)
# You can use db1:db2 to compare the common keys in two data sets
n_unik(base_main:base_extra)
tmp = merge(base_main, base_extra, all.x = TRUE, by = c("id", "period"))
# You can show unique values for any variable, as before
n_unik(tmp + base_main + base_extra ~ id[!!NA(z)] + id^period)
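The merge example relies on the idea of comparing keys across data sets and then splitting ids by whether the merged variable is missing. The base-R sketch below reproduces that kind of check on hypothetical toy tables (main and extra stand in for base_main and base_extra, and key and n_unique are hypothetical helpers). It computes the counts directly and makes no attempt to reproduce what n_unik prints.

```r
# Hypothetical toy tables standing in for base_main and base_extra
main  <- data.frame(id = c(1, 1, 2, 2, 3), period = c(1, 2, 1, 2, 1))
extra <- data.frame(id = c(1, 2, 9), period = c(1, 2, 1), z = c(0.5, -1, 2))

key      <- function(d) paste(d$id, d$period, sep = "_")  # id x period key
n_unique <- function(v) length(unique(v))

n_main   <- n_unique(key(main))                     # unique id x period in main
n_extra  <- n_unique(key(extra))                    # unique id x period in extra
n_common <- length(intersect(key(main), key(extra)))  # pairs present in both

# Left join, as in the example above
merged <- merge(main, extra, all.x = TRUE, by = c("id", "period"))

# Analogue of id[!!NA(z)]: unique ids among unmatched vs matched rows
na_split <- c(
  missing_z = n_unique(merged$id[is.na(merged$z)]),
  has_z     = n_unique(merged$id[!is.na(merged$z)])
)

print(c(main = n_main, extra = n_extra, common = n_common))
print(na_split)
```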