knitr::opts_chunk$set(echo = TRUE)

tidyorgs: A tidy package that standardizes text data for organizational and sector analysis

Authors: Brandon Kramer with contributions from members of the University of Virginia's Biocomplexity Institute, the National Center for Science and Engineering Statistics, and the 2020 and 2021 UVA Data Science for the Public Good Open Source Software Teams
License: MIT

Installation

You can install this package using the devtools package:

install.packages("devtools")
devtools::install_github("brandonleekramer/tidyorgs") 

The tidyorgs package provides several functions that help standardize messy text data for organizational analysis. More specifically, the package's two core sets of functions detect_{sector}() and email_to_orgs() standardize organizations from across the academic, business, government and nonprofit sectors based on unstructured text and email domains. The package is intended to support linkage across multiple datasets, bibliometric analysis, and sector classification for social, economic, and policy analysis.

Matching organizations with the detect_orgs() function

The detect_{sector}() functions detects patterns in messy text data and then standardizes them into organizations based on a curated dictionary. For example, messy bio information scraped from GitHub can be easily codified so that statistical analysis can be done on academic users.

detect_academic()

library(tidyverse)
library(tidyorgs)
data(github_users)

classified_academic <- github_users %>%
  detect_academic(login, company, organization, email) %>% 
  filter(academic == 1) %>% 
  select(login, organization, company) 

classified_academic

detect_business()

classified_businesses <- github_users %>%
  detect_business(login, company, organization, email) %>% 
  filter(business == 1) %>%
  select(login, organization, company)
classified_businesses

detect_government()

classified_government <- github_users %>%
  detect_government(login, company, organization, email) %>% 
  filter(government == 1) %>% 
  select(login, organization, company)
classified_government

detect_nonprofit()

classified_nonprofit <- github_users %>%
  detect_nonprofit(login, company, organization, email) %>% 
  filter(nonprofit == 1) %>% 
  select(login, organization, company, email)
classified_nonprofit

Matching users to organizations by emails using email_to_orgs()

For those that only have email information, the email_to_orgs() function matches users to organizations based on our curated domain list.

user_emails_to_orgs <- github_users %>%
  email_to_orgs(login, email, country_name, "academic") 

github_users %>% 
  left_join(user_emails_to_orgs, by = "login") %>% 
  drop_na(country_name) %>% 
  select(email, country_name)


brandonleekramer/tidyorgs documentation built on Dec. 19, 2021, 11:42 a.m.