Tools for wrangling, managing, and understanding data.
Core tools:
- to_camel() and to_snake(): Convert strings between snake_case and camelCase
- %+%: String-concatenation infix operator, à la + in Python
- loch_missingness_monster(): Provides an easy-to-interpret breakdown of missingness in datasets
- dup_detect(): Identifies duplicated values in vectors/columns (beyond what base::duplicated() offers) and assists in removing them
- display_dist(): Displays an approximate distribution shape in the console (or other I/O system) using Unicode Block Elements glyphs, e.g.: ▂▇▓▇▄▂▁▁▁▁▁

Miscellany:
- stat4DS_data(): Retrieves data sets used in Foundations of Statistics for Data Scientists for use as demo/test data
- softmax(): Calculates the softmax function for a set of inputs, mapping real values to a probability distribution
- bray_curtis(): Calculates the Bray-Curtis dissimilarity index (or Sorensen-Dice similarity index) between two sites, with site compositions given as vectors
- winograd(): Fetches a Winograd schema from Ernest Davis's online collection for use in bot detection (details below)
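To make a few of these techniques concrete, here are minimal base-R sketches of case conversion, infix concatenation, all-copies duplicate flagging, a block-glyph distribution sketch, softmax, and Bray-Curtis dissimilarity. They are hypothetical stand-ins written for illustration: the helper names (snake_to_camel(), flag_all_dups(), block_sketch(), and so on) are assumptions, not the package's API, and the package's own implementations may behave differently.

```r
# Hypothetical base-R sketches of a few techniques listed above.
# These are NOT the package's implementations; names are illustrative.

# snake_case -> camelCase: uppercase each letter that follows "_"
# (perl = TRUE enables the \U case-conversion escape in the replacement).
snake_to_camel <- function(x) gsub("_([a-z])", "\\U\\1", x, perl = TRUE)

# camelCase -> snake_case: put "_" before each capital that follows a
# lowercase letter or digit, then lowercase everything.
camel_to_snake <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))

# Infix string concatenation, similar in spirit to "a" + "b" in Python.
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)

# Flag every copy of a repeated value. base::duplicated() marks only the
# second and later occurrences, so combine it with a scan from the end.
flag_all_dups <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Rough distribution shape: bin the data, then map each bin count to one
# of eight Unicode block glyphs of increasing height (U+2581 to U+2588).
block_sketch <- function(x, bins = 10) {
  glyphs <- intToUtf8(0x2581:0x2588, multiple = TRUE)
  counts <- tabulate(cut(x, breaks = bins, labels = FALSE), nbins = bins)
  level  <- 1 + floor((length(glyphs) - 1) * counts / max(counts))
  paste(glyphs[level], collapse = "")
}

# Softmax: exp(x_i) / sum(exp(x)). Subtracting max(x) first does not
# change the result but keeps exp() from overflowing on large inputs.
softmax_sketch <- function(x) {
  z <- exp(x - max(x))
  z / sum(z)
}

# Bray-Curtis dissimilarity between two sites given as non-negative
# abundance vectors (same species, same order): sum|a - b| / sum(a + b).
# 0 means identical composition; 1 means no species in common.
# For 0/1 presence-absence vectors, 1 - Bray-Curtis equals the
# Sorensen-Dice similarity 2 * |A intersect B| / (|A| + |B|).
bray_curtis_sketch <- function(a, b) {
  stopifnot(length(a) == length(b), all(a >= 0), all(b >= 0))
  sum(abs(a - b)) / sum(a + b)
}

# Usage
snake_to_camel("loch_missingness_monster")  # "lochMissingnessMonster"
camel_to_snake("lochMissingnessMonster")    # "loch_missingness_monster"
"snake" %+% "_case"                          # "snake_case"
flag_all_dups(c(1, 2, 2, 3, 1))              # TRUE TRUE TRUE FALSE TRUE
softmax_sketch(c(1, 2, 3))                   # three probabilities summing to 1
bray_curtis_sketch(c(6, 7, 4), c(10, 0, 6))  # 13/33, about 0.394
```

Subtracting the maximum in the softmax sketch is a common numerical-stability choice; without it, inputs around 1000 would make exp() return Inf.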
winograd() function
Each time it runs, the function uses web scraping (via rvest) to pull the text of one Winograd schema from a website created by Ernest Davis (available under a CC 4.0 license).
A Winograd schema is a sentence that includes an ambiguous pronoun that could refer to either of two antecedent nouns. Which noun the pronoun correctly refers to depends on which of two alternative words or phrases appears elsewhere in the sentence. For example:
I spread the cloth on the table in order to [protect/display] it.
If the sentence is written as "...to protect it," then "it" refers to the table. If the sentence is written as "...to display it," then "it" refers to the cloth.
Winograd schemas require commonsense human reasoning, and they're difficult for computers to resolve. Picking one sentence construction (e.g., "...to protect it" or "...to display it") and asking a question that tests the respondent's understanding of what the pronoun refers to (e.g., "What is being [protected/displayed]?") can be an effective way to distinguish people from bots in online surveys. (This is especially true if multiple Winograd schemas are presented: assuming each question has two possible answers and a bot guesses at random, its chance of getting past three schemas is just (1/2)^3 = 12.5%.)
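A short R sketch of the guessing arithmetic, with a simulation as a sanity check. The helper name p_pass_by_guessing() is hypothetical, and the setup (each schema question has equally likely answer options, and guesses are independent) is an assumption for illustration, not something the package computes.

```r
# Chance that a bot guessing at random passes k Winograd questions in a
# row, ASSUMING each has `options` equally likely answers and guesses
# are independent of one another.
p_pass_by_guessing <- function(k, options = 2) (1 / options)^k

p_pass_by_guessing(3)  # 0.125, i.e. 12.5%
p_pass_by_guessing(5)  # 0.03125

# Monte Carlo check: 100,000 simulated bots, each guessing 3 questions.
set.seed(1)
correct <- matrix(runif(3 * 1e5) < 0.5, ncol = 3)
mean(rowSums(correct) == 3)  # close to 0.125
```

Each additional schema halves a random guesser's pass rate under these assumptions, which is why stacking a few of them is cheap but effective.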
Back when I ran survey studies, I used Winograd schemas to preserve data quality when collecting responses via Prolific/Reddit/MTurk/etc. In my experience, they can do too good a job of flagging responses as potential bots: it's easy to answer a Winograd schema incorrectly, especially if you're moving quickly. But I often preferred to be overly conservative in the face of bot risk and low-attention responses.