sift | R Documentation |
It can be hard to find the right column in a dataframe with hundreds or thousands of columns. This function gives you interactive, flexible searching through a dataframe, suggesting columns that are relevant to your query and showing some basic summary stats about what they contain.
sift(.df, ..., .dist = 0, .rebuild = FALSE)
.df |
(Dataframe) A dataframe to search through. |
... |
(Dots) Search query. Case-insensitive. See Details for more information. |
.dist |
(Numeric) The maximum distance allowed for a match when searching
fuzzily. See |
.rebuild |
(Logical) If |
You have three ways to search with sift(): exact search, fuzzy search, or orderless search (also called look-around search).
Exact search looks for exact matches to your query. For example, searching for "weight of" will only match weight of.
Fuzzy search gives you results that are close, but not exact, matches to your query. This is useful because real-world labelling is not always consistent or even correct, so using a fuzzy search for "baseline" will helpfully match baseline or base line or even OCR errors or typos like basellne.
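As a rough illustration of what "close, but not exact" means, the sketch below measures edit distance with base R's utils::adist(). It is not sift()'s internal matching. The labels and the max_dist threshold are made up for the example, and whole strings are compared for simplicity.

```r
# Illustration only: base R edit distance, not sift's own matching code.
query  <- "baseline"
labels <- c("baseline", "base line", "basellne", "follow-up")

# utils::adist() gives the generalized Levenshtein distance between strings.
# as.vector() flattens the 1 x 4 result matrix into a plain vector.
distances <- as.vector(utils::adist(query, labels, ignore.case = TRUE))

# Keep labels within one edit of the query (an assumed threshold,
# playing a role similar to a small .dist value).
max_dist   <- 1
fuzzy_hits <- labels[distances <= max_dist]
print(fuzzy_hits)
```

Here the inserted space in base line and the substituted letter in basellne each count as a single edit, so both fall within the threshold, while the unrelated label does not.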
Orderless search matches keywords regardless of the order you give them. This means that you can ask for cow, number and get a match for number of cows.
This is useful when you have an idea of what keywords should be in a variable label,
but not how those keywords are actually used or phrased. Note that this is not
a fuzzy search, so the keywords have to match exactly.
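The idea of matching every keyword in any order can be sketched in base R as follows. The helper match_all_keywords() and the labels are hypothetical and only illustrate the concept. This is not how sift() is necessarily implemented.

```r
# Illustration only: a hypothetical helper, not part of sift.
match_all_keywords <- function(labels, keywords) {
  # One logical vector per keyword: does each label contain that keyword?
  hits <- lapply(keywords, function(k) grepl(k, labels, ignore.case = TRUE))
  # A label matches only if every keyword was found, in any order.
  Reduce(`&`, hits)
}

labels <- c("number of cows", "cow weight", "number of sheep")
print(match_all_keywords(labels, c("cow", "number")))
```

Only the first label contains both keywords, and swapping the order of the keywords gives the same result.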
The search that's performed depends on ... and .dist:
Orderless search is always used when you pass more than one query term into ....
Exact search is done when .dist = 0.
Fuzzy search must be opted into by setting the .dist argument to a value > 0. The .dist argument is ignored in orderless searching.
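The selection rules above can be restated as a small decision function. search_mode() is a hypothetical helper written for this illustration. It takes the number of query terms and a distance, and it is not sift()'s own code.

```r
# Hypothetical helper that restates the documented rules; not sift's own code.
search_mode <- function(n_terms, dist = 0) {
  if (n_terms == 0) return("none")       # empty query: nothing to search for
  if (n_terms > 1)  return("orderless")  # several terms: the distance is ignored
  if (dist > 0)     return("fuzzy")      # one term with a positive distance
  "exact"                                # one term and a distance of 0
}

print(search_mode(1))      # one term, default distance
print(search_mode(1, 1))   # one term, positive distance
print(search_mode(2, 1))   # two terms, so the distance has no effect
```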
Invisibly returns a dataframe. The contents of that dataframe depend on the query:
If ... is empty, the full data dictionary for .df is returned.
If the query was matched, only the matching rows of the data dictionary are returned.
If the query was not matched, no rows of the dictionary are returned (but all columns are).
save_dictionary(), options_sift()
sift(mtcars_lab) # Builds a dictionary without searching.
sift(mtcars_lab, .) # Show everything up to the print limit (by default, 25 matches).
sift(mtcars_lab, mileage) # Exact search for "mileage".
sift(mtcars_lab, "above avg", .dist = 1) # Fuzzy search (here, space -> underscore).
sift(mtcars_lab, "na", "column") # Orderless searches are always exact.
sift(mtcars_lab, "date|time") # Regular expression
sift(mtcars_lab, "cyl|gear", number) # Orderless search with regular expression