magerman_remove_common_words_at_the_beginning: Removes common words at the beginning of a name
In stasvlasov/nstandr: Name Standardization in R

Description

A simple illustration of what this procedure does:

SOCIETE NOVATEC -> NOVATEC
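
The transformation above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the word list `common_words` and the helper `remove_common_words_at_beginning()` are hypothetical names, and the real patterns live in `magerman_patterns_common_words_at_the_beginning`, which is not reproduced here. The sketch also assumes that a common word counts only when it is followed by a space.

```r
# Hypothetical list of common leading words (an assumption for illustration;
# the package ships its own pattern table).
common_words <- c("SOCIETE", "COMPAGNIE")

# Hypothetical helper: strip each word from the start of every name,
# but only when it appears as a whole leading word followed by a space.
remove_common_words_at_beginning <- function(x, words) {
  for (w in words) {
    prefix <- paste0(w, " ")
    hit <- startsWith(x, prefix)                     # literal "begins with" match
    x[hit] <- substring(x[hit], nchar(prefix) + 1)   # drop the matched prefix
  }
  x
}

result <- remove_common_words_at_beginning(
  c("SOCIETE NOVATEC", "NOVATEC SOCIETE", "SOCIETEX"),
  common_words
)
print(result)
# expected values: "NOVATEC", "NOVATEC SOCIETE", "SOCIETEX"
```

Only the first name changes: the second contains the word at the end rather than the beginning, and the third merely starts with the same letters.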

Usage

magerman_remove_common_words_at_the_beginning(
  x,
  patterns = magerman_patterns_common_words_at_the_beginning,
  patterns_col = 1,
  patterns_mode = "all",
  patterns_mode_col = NULL,
  patterns_type = "begins",
  patterns_type_col = NULL,
  patterns_replacements_col = 2,
  replacements = if (is.atomic(patterns)) "" else NULL,
  ...
)

Arguments

`x`	Vector or table to standardize.
`patterns`	Accepts either a vector or a table. If `patterns` is a table, it can also include a replacements column.
`patterns_col`	Which column to use if `patterns` is not a vector. Default is 1.
`patterns_mode`	Mode of matching. One of c("all", "first", "last"). The default is "all" (it is 2x faster than "first" and "last" because of the handy stri_replace_all_* functions). A vector of the same length as `patterns` can also be passed.
`patterns_mode_col`	Column in the `patterns` table with the mode of matching.
`patterns_type`	Type of pattern for matching. This function sets it to "begins" (see Usage); the documented general default is "fixed" (calling `stri_replace_all_fixed`). Other options are:
`patterns_type_col`	Column with the type of pattern, for cases where patterns have different types.
`patterns_replacements_col`	Which column to use for replacements if `patterns` is not a vector and includes replacements. Default is 2.
`replacements`	Replacements to use if `patterns` does not have a column with replacements.
`...`	Arguments passed on to `standardize_options`:
  `col`	Column of interest (the one to standardize) in the `x` object (if it is data.frame-like).
  `rows`	Logical vector to filter records of interest. Default is NULL, which means records are not filtered.
  `omitted_rows_value`	If `rows` is set, merge `omitted_rows_value` with the results (filtered by `rows`). Either a single string or a character vector of length `nrow(x)`. If NULL (the default), the original values of `col` are merged with the results.
  `output_placement`	Where to insert the results (standardized vector) in the `x` object. The default option is 'replace_col', which overwrites `col` in `x` with the results. Other options:
    'omit' :: do not write results back to the table (usually used when `append_output_copy` is set for temporary values)
    'prepend_to_col' :: prepend to `col`
    'append_to_col' :: append to `col`
    'prepend_to_x' :: prepend to the `x` data.frame-like object
    'append_to_x' :: append to the `x` data.frame-like object
  `x_atomic_name`	If `x` is a vector, use this name for the original column if it is in the results. Default is "x". If `x` is a table, the name of `col` is used.
  `output_col_name`	Name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:
  `append_output_copy`	Whether to append a copy of the result vector to the `x` object.
  `output_copy_col_name`	Name for the appended copy.
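
The interplay of `rows` and `omitted_rows_value` described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: `apply_to_rows()` and `strip_societe()` are hypothetical helpers, and the sketch handles only a plain character vector, not a table.

```r
# Hypothetical helper: apply `f` only to the elements selected by `rows`
# and fill the omitted positions as the argument descriptions suggest.
apply_to_rows <- function(x, f, rows = NULL, omitted_rows_value = NULL) {
  if (is.null(rows)) return(f(x))   # no filter: process everything
  out <- if (is.null(omitted_rows_value)) {
    x                                # keep original values for omitted rows
  } else {
    rep_len(omitted_rows_value, length(x))  # single value or full-length vector
  }
  out[rows] <- f(x[rows])
  out
}

# Hypothetical stand-in for a standardization step.
strip_societe <- function(s) sub("^SOCIETE ", "", s)

names_in <- c("SOCIETE NOVATEC", "SOCIETE ALPHA", "BETA")
keep <- c(TRUE, FALSE, TRUE)

kept_original <- apply_to_rows(names_in, strip_societe, rows = keep)
# expected values: "NOVATEC", "SOCIETE ALPHA", "BETA"

filled_na <- apply_to_rows(names_in, strip_societe, rows = keep,
                           omitted_rows_value = NA_character_)
# expected values: "NOVATEC", NA, "BETA"
```

With `omitted_rows_value` left as NULL the skipped record keeps its original value; supplying a value replaces it instead.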

Value

If nothing is set to be bound (cbind) to the results, it returns the standardized vector. If something needs to be bound, it returns a data.table.

Other magerman: cockburn_detect_corp(), cockburn_detect_govt(), cockburn_detect_hosp(), cockburn_detect_indiv(), cockburn_detect_inst_conds_1(), cockburn_detect_inst(), cockburn_detect_univ(), cockburn_detect_uspto(), cockburn_remove_standard_names(), cockburn_remove_uspto(), cockburn_replace_compustat_names(), cockburn_replace_compustat(), cockburn_replace_derwent(), cockburn_replace_govt(), cockburn_replace_univ(), magerman_condense(), magerman_detect_characters(), magerman_detect_comma_period_irregularities(), magerman_detect_legal_form_beginning(), magerman_detect_legal_form_end(), magerman_detect_legal_form_middle(), magerman_detect_umlaut(), magerman_remove_common_words_anywhere(), magerman_remove_common_words_at_the_end(), magerman_remove_double_quotation_marks_beginning_end(), magerman_remove_double_quotation_marks_irregularities(), magerman_remove_double_spaces(), magerman_remove_html_codes(), magerman_remove_non_alphanumeric_at_the_beginning(), magerman_remove_non_alphanumeric_at_the_end(), magerman_remove_special_characters(), magerman_replace_accented_characters(), magerman_replace_comma_period_irregularities_all(), magerman_replace_comma_period_irregularities(), magerman_replace_legal_form_beginning(), magerman_replace_legal_form_end(), magerman_replace_legal_form_middle(), magerman_replace_proprietary_characters(), magerman_replace_sgml_characters(), magerman_replace_spelling_variation(), standardize_eee_ppat()