This function matches unstructured text data to various dictionaries of organizations by extracting and iterating through consecutive word sequences (or n-grams). To do this, the function extracts n-grams using the tidytext package, first matching all sequences in the unstructured data that have n words and then 'funneling' through all sequences of n-1, n-2, etc. words before matching single tokens. The result is a data frame of ids, organizations, and sectors containing only the rows matched within the specified sectors.
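The funneling idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: it does not use tidytext or the tidyorgs dictionaries, and `toy_dictionary`, `make_ngrams()`, and `funnel_match()` are hypothetical names introduced only for this illustration. The sketch assumes that matching is exact on lower-cased text and that the first longest match wins, neither of which is confirmed by this page.

```r
# Hypothetical toy dictionary: lower-cased organization name -> sector.
toy_dictionary <- c(
  "university of virginia" = "academic",
  "virginia tech"          = "academic",
  "mit"                    = "academic"
)

# Build every sequence of n consecutive words from a vector of tokens.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Try the longest sequences first, then funnel down to single tokens.
# Returns the first dictionary entry found, or NA if nothing matches.
funnel_match <- function(text, dictionary, max_n = 3) {
  cleaned <- gsub("[^a-z0-9 ]", " ", tolower(text))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  for (n in seq(max_n, 1)) {
    hits <- intersect(make_ngrams(tokens, n), names(dictionary))
    if (length(hits) > 0) return(hits[1])
  }
  NA_character_
}

# Made-up input data with an id column and a messy text column.
users <- data.frame(
  login   = c("a", "b", "c"),
  company = c("Grad student @ University of Virginia",
              "MIT alum",
              "freelance developer"),
  stringsAsFactors = FALSE
)

users$organization <- vapply(users$company, funnel_match, character(1),
                             dictionary = toy_dictionary, USE.NAMES = FALSE)
users$sector <- unname(toy_dictionary[users$organization])

# Keep only the rows that matched, mirroring the described output shape.
matched <- users[!is.na(users$organization),
                 c("login", "organization", "sector")]
```

Trying the longest sequences first lets a multi-word name such as "university of virginia" match as a whole before any of its individual words are considered.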
text_to_orgs(
  data,
  id,
  input,
  output,
  sector = c("academic", "business", "government", "nonprofit")
)
data | A data frame or data frame extension (e.g. a tibble).
id | A numeric or character vector unique to each entry.
input | Character vector of messy or unstructured text that will be unnested as n-grams and matched to a dictionary of organizations in the specified sector.
output | Name of the output column to be created, given as a string or symbol.
sector | Sector of organizations to match against. Currently, the only option is "academic"; "business", "government", "household", and "nonprofit" are in development.
library(tidyverse)
library(tidyorgs)
data(github_users)
# Arguments in order: id = login, input = company,
# output = organization, sector = academic
classified_by_text <- github_users %>%
  text_to_orgs(login, company, organization, academic)