
manager

Tools for wrangling, managing, and understanding data.

Core tools:

to_camel() and to_snake(): Convert strings between snake_case and camelCase | ƒ(x)
%+%: String-concatenation infix operator, a la + in Python | ⚒
loch_missingness_monster(): Provides an easy-to-interpret breakdown of missingness in datasets | ƒ(x)
dup_detect(): Identifies duplicated values in vectors/columns (beyond what base::duplicated() offers) and assists in removing them | ƒ(x)
display_dist(): Displays an approximate distribution shape in the console (or other i/o system) using Unicode Block Elements glyphs, e.g.: ▂▇▓▇▄▂▁▁▁▁▁ | ƒ(x)
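
The idea behind display_dist() can be illustrated with a short sketch. This is not the package's R implementation; it is a minimal Python illustration, and the function name `sketch_dist`, the `bins` parameter, and the choice of glyphs are assumptions made for the example. It bins the values into a histogram and maps each bin's count to a Unicode Block Elements glyph of proportional height.

```python
# Hypothetical illustration of a console "distribution shape";
# not the actual display_dist() code from the manager package.
BLOCKS = "▁▂▃▄▅▆▇█"  # Unicode Block Elements, shortest to tallest


def sketch_dist(values, bins=11):
    """Return a one-line string approximating the histogram of `values`."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    # Scale each count to an index into BLOCKS (the tallest bin gets "█").
    return "".join(BLOCKS[round(c / peak * (len(BLOCKS) - 1))] for c in counts)


if __name__ == "__main__":
    print(sketch_dist([0, 0, 0, 0, 1, 1, 2], bins=3))
```

Empty bins still render as the shortest glyph, so the string always has exactly `bins` characters; an empty input raises a `ValueError` from `min()`.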

Miscellany:

stat4DS_data(): Retrieves data sets used in Foundations of Statistics for Data Scientists for use as demo/test data | ƒ(x)
softmax(): Calculates the softmax function for a set of inputs to map real values to a probability distribution | ƒ(x)
bray_curtis(): Calculates the Bray-Curtis dissimilarity index (or Sørensen-Dice similarity index) between two sites (with site compositions given as vectors) | ƒ(x)
winograd(): Fetches a Winograd schema from an online collection for use in bot detection (details below) | ƒ(x)
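
For readers unfamiliar with the quantities behind softmax() and bray_curtis(), the following sketch shows the standard textbook formulas. It is written in Python for illustration only and is not the package's R code; the argument names are assumptions, and the Bray-Curtis version assumes non-negative abundance counts of equal length for the two sites.

```python
import math


def softmax(xs):
    """Map real values to a probability distribution: exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)  # subtracting the maximum avoids overflow without changing the result
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i).

    Assumes non-negative counts; raises ZeroDivisionError if both sites are empty.
    """
    if len(site_a) != len(site_b):
        raise ValueError("site vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    return numerator / denominator


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    dissimilarity = bray_curtis([6, 7, 4], [10, 0, 6])
    print(dissimilarity, 1 - dissimilarity)  # dissimilarity and its complement
```

The complement, 1 minus the dissimilarity, is the corresponding similarity value for the same pair of sites.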

Tags:

ƒ(x) - function
⚒ - operator
ℴ - object

Additional details on winograd():

Each time the function is run, it uses rvest to scrape the text of one Winograd schema from a website created by Ernest Davis (available under a CC 4.0 license).

A Winograd schema is a sentence that includes an ambiguous pronoun that could refer to either of two antecedent nouns. Which noun the pronoun is rightly associated with depends on which of two words/phrases is present elsewhere in the sentence. For example:

I spread the cloth on the table in order to [protect/display] it.

If the sentence is written as "...to protect it," then it refers to the table. If the sentence is written as "...to display it," then it refers to the cloth.

Winograd schemas require commonsense human reasoning, and they're difficult for computers to resolve. Picking a sentence construction (e.g., "...to protect it" or "...to display it") and asking a question that tests one's understanding of the pronoun's referent (e.g., "What is being [protected/displayed]?") can be an effective way to distinguish people from bots in online surveys. (This is especially true if multiple Winograd schemas are presented; with two candidate answers per schema, the chance of a bot guessing its way past three of them is just 0.5^3 = 12.5%.)
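
The guessing arithmetic, and one way a schema could be represented for a survey check, can be sketched as follows. This is a hypothetical Python illustration, not part of the package: the dictionary layout and the helper names `render`, `is_correct`, and `chance_of_guessing_all` are assumptions, and the probability assumes a bot picks uniformly at random between the two antecedents on each schema.

```python
# Hypothetical representation of the cloth/table schema from the example above.
schema = {
    "template": "I spread the cloth on the table in order to {word} it.",
    # Which antecedent "it" refers to under each word choice.
    "answers": {"protect": "table", "display": "cloth"},
}


def render(schema, word):
    """Fill the chosen word into the sentence template."""
    return schema["template"].format(word=word)


def is_correct(schema, word, response):
    """Check a respondent's answer against the expected antecedent."""
    return response.strip().lower() == schema["answers"][word]


def chance_of_guessing_all(n_schemas, n_options=2):
    """Probability of passing every schema by uniform random guessing."""
    return (1 / n_options) ** n_schemas


if __name__ == "__main__":
    print(render(schema, "protect"))
    print(is_correct(schema, "protect", "Table"))
    print(chance_of_guessing_all(3))  # three two-option schemas
```

Adding a fourth schema would halve the guessing probability again, at the cost of more chances for an attentive human to slip.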

Back when I ran survey studies, I implemented Winograd schemas to preserve data quality when collecting responses via Prolific/Reddit/MTurk/etc. My experience is that they can do a bit too good a job of flagging responses as potential bots: It's not hard to give the wrong response to a Winograd schema, especially if you're moving quickly. But I often preferred to be overly conservative in the face of bot risk and low-attention responses.

jacob-gg/manager documentation built on July 2, 2024, 2:09 a.m.
