text_to_orgs: Match messy text data to social organizations using a...

View source: R/text_to_orgs.R

Description

This function matches unstructured text data to dictionaries of organizations by extracting and iterating through consecutive word sequences (n-grams). It extracts n-grams using the tidytext package, first matching all sequences in the unstructured data that have n words and then 'funneling' through all sequences of n-1, n-2, etc. words before matching single tokens. The result is a data frame of ids, organizations, and sectors containing only the rows matched within the specified sectors.
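The funneling idea described above can be illustrated with a minimal sketch. It is not the package's implementation: it uses base R string handling instead of tidytext, and the names `toy_dictionary`, `extract_ngrams`, and `funnel_match` are hypothetical. It also assumes that a row stops being considered once it has matched at a longer n-gram length, which the description suggests but does not state outright.

```r
# Hypothetical toy dictionary mapping lower-cased n-grams to organization
# names; this is illustrative data, not the dictionaries shipped with tidyorgs.
toy_dictionary <- c(
  "university of virginia" = "University of Virginia",
  "virginia tech"          = "Virginia Tech",
  "mit"                    = "Massachusetts Institute of Technology"
)

# Return every consecutive n-word sequence found in a single string.
extract_ngrams <- function(text, n) {
  cleaned <- tolower(gsub("[^[:alnum:] ]", " ", text))  # punctuation -> space
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[nzchar(words)]                         # drop empty tokens
  if (length(words) < n) return(character(0))
  vapply(
    seq_len(length(words) - n + 1),
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Funnel from the longest n-grams down to single tokens.
# Assumption: rows already matched at a longer length are skipped.
funnel_match <- function(ids, texts, dictionary, max_n = 3) {
  matched <- data.frame(id = character(0), organization = character(0),
                        stringsAsFactors = FALSE)
  for (n in max_n:1) {
    for (i in seq_along(texts)) {
      if (ids[i] %in% matched$id) next
      hits <- intersect(extract_ngrams(texts[i], n), names(dictionary))
      if (length(hits) > 0) {
        matched <- rbind(
          matched,
          data.frame(id = ids[i],
                     organization = unname(dictionary[hits[1]]),
                     stringsAsFactors = FALSE)
        )
      }
    }
  }
  matched
}

ids   <- c("a", "b", "c")
texts <- c("PhD student @ University of Virginia",
           "Virginia Tech, MIT alum",
           "Freelance developer")
result <- funnel_match(ids, texts, toy_dictionary)
print(result)
```

In this sketch, row "b" is assigned to Virginia Tech at the two-word pass and is never revisited at the single-token pass, so the shorter "mit" entry does not override it. Row "c" matches nothing and is absent from the result, mirroring the description's note that only matched rows are returned.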

Usage

text_to_orgs(
  data,
  id,
  input,
  output,
  sector = c("academic", "business", "government", "nonprofit")
)

Arguments

data

A data frame or data frame extension (e.g. a tibble).

id

A numeric or character vector unique to each entry.

input

Character vector of messy or unstructured text that will be unnested into n-grams and matched to the dictionary of organizations in the specified sector.

output

Name of the output column to be created, given as a string or symbol.

sector

Sector of organizations to match against. Currently, the only option is "academic"; "business", "government", "household", and "nonprofit" are in development.

Examples

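library(tidyverse)
library(tidyorgs)

# Load the github_users example data
data(github_users)

# Match the text in `company` to academic organizations, using `login` as
# the id and writing the matches to a new `organization` column
classified_by_text <- github_users %>%
  text_to_orgs(login, company, organization, academic)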

brandonleekramer/tidyorgs documentation built on Dec. 19, 2021, 11:42 a.m.