detect_academic: Match messy text and email data to academic, business,...


View source: R/detect_academic.R

Description

This function standardizes messy text data and/or email information by matching it to social organizations. The detect_orgs() function iterates through email domains and unstructured text, matching patterns from our curated dictionaries to standardize organization names. This tool is designed to optimize pattern detection for the linkage of multiple datasets, for bibliometric analysis, and for sector classification in social, economic, and policy analysis.
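
The dictionary-matching idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the `org_dictionary` table, its two patterns, and the `match_orgs()` helper are all hypothetical names invented for this example, and the real curated dictionaries are far larger.

```r
# Hypothetical dictionary: one regex pattern per standardized organization name.
# These entries are illustrative and are not taken from the tidyorgs dictionaries.
org_dictionary <- data.frame(
  pattern = c("\\b(mit|massachusetts institute of technology)\\b",
              "\\b(uva|university of virginia)\\b"),
  organization = c("Massachusetts Institute of Technology",
                   "University of Virginia"),
  stringsAsFactors = FALSE
)

# Return the first dictionary match for each messy string, or NA if none matches.
match_orgs <- function(text, dictionary) {
  cleaned <- tolower(trimws(text))
  result <- rep(NA_character_, length(cleaned))
  for (i in seq_len(nrow(dictionary))) {
    # Only fill entries that have not already been matched by an earlier pattern.
    hit <- is.na(result) & grepl(dictionary$pattern[i], cleaned, perl = TRUE)
    result[hit] <- dictionary$organization[i]
  }
  result
}

messy <- c("  MIT Media Lab", "University of Virginia, Charlottesville", "Acme Corp")
matched <- match_orgs(messy, org_dictionary)
```

Here the first two strings resolve to their standardized names and the third stays `NA` because no pattern covers it. Iterating over dictionary rows rather than over input strings keeps each `grepl()` call vectorized across the whole input.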

Usage

detect_academic(
  data,
  id,
  input,
  output,
  email = FALSE,
  country = FALSE,
  parent_org = FALSE,
  org_type = FALSE
)

Arguments

data

A data frame or data frame extension (e.g. a tibble).

id

A numeric or character vector unique to each entry.

input

Character vector of messy or unstructured text that will be matched to organizations from one (or all) of five economic sectors (see sector parameter).

output

Output column to be created as string or symbol.

email

Optional character vector of email or email domain information. Defaults to FALSE.

country

Optional parameter that returns country of organization when available. Defaults to FALSE.

parent_org

Optional parameter that returns the parent organization when available. For the academic sector, this value is the school system of the organization. Defaults to FALSE.

org_type

Optional parameter that returns organization type when available. Current return values include "Public", "Private for-profit", and "Private not-for-profit". Defaults to FALSE.
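
To make the optional arguments more concrete, the following base R sketch shows how an email domain lookup might append extra columns only when they are requested. Everything here is an assumption for illustration: `domain_dictionary`, its rows, and `lookup_by_email()` are hypothetical and do not reflect how the package stores or joins its data.

```r
# Hypothetical domain dictionary; rows are illustrative only.
domain_dictionary <- data.frame(
  domain = c("virginia.edu", "mit.edu"),
  organization = c("University of Virginia",
                   "Massachusetts Institute of Technology"),
  country = c("United States", "United States"),
  org_type = c("Public", "Private not-for-profit"),
  stringsAsFactors = FALSE
)

# Look up each email's domain and optionally append metadata columns.
lookup_by_email <- function(email, dictionary, country = FALSE, org_type = FALSE) {
  # Drop everything up to and including "@", then lowercase the domain.
  domain <- tolower(sub("^.*@", "", email))
  idx <- match(domain, dictionary$domain)
  out <- data.frame(email = email,
                    organization = dictionary$organization[idx],
                    stringsAsFactors = FALSE)
  if (country) out$country <- dictionary$country[idx]
  if (org_type) out$org_type <- dictionary$org_type[idx]
  out
}

emails <- c("jane@virginia.edu", "sam@MIT.edu", "pat@example.com")
looked_up <- lookup_by_email(emails, domain_dictionary, org_type = TRUE)
```

Because `country` is left at its default of `FALSE`, the result has only the `email`, `organization`, and `org_type` columns, and the unmatched address gets `NA` in both lookup columns. This exact-match sketch would miss subdomains such as `cs.virginia.edu`; handling those would need suffix or pattern matching.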

Examples

library(tidyverse)
library(tidyorgs)
data(github_users)

classified_users <- github_users %>%
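  # Optional outputs are requested by name; passed positionally, the fifth and
  # sixth values would bind to `country` and `parent_org` instead.
  detect_academic(login, company, organization, email,
                  parent_org = TRUE, org_type = TRUE)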

brandonleekramer/tidyorgs documentation built on Dec. 19, 2021, 11:42 a.m.