stringPreprocessing: Preprocess German occupational text

View source: R/stringPreprocessing.R

stringPreprocessing    R Documentation

Preprocess German occupational text

Description

This function replaces some common characters and character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents and removes punctuation, empty spaces, and the word "Diplom".
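The kind of cleaning described here can be sketched in base R. This is a hypothetical stand-in, not the package's implementation: the helper name sketch_preprocess, the replacement "DIPLOMINGENIEUR", and the order of the steps are all assumptions made for illustration. Umlaut handling is left out because the exact mapping is not stated on this page.

```r
# Hypothetical sketch, NOT the code of stringPreprocessing().
sketch_preprocess <- function(verbatim) {
  out <- toupper(verbatim)
  # Assumed replacement: expand the abbreviation before punctuation is dropped.
  out <- gsub("DIPL.-ING.", "DIPLOMINGENIEUR", out, fixed = TRUE)
  # Replace every punctuation character with a space.
  out <- gsub("[[:punct:]]", " ", out)
  # Remove the word "Diplom" (already uppercase at this point).
  out <- gsub("DIPLOM", "", out, fixed = TRUE)
  # Collapse runs of whitespace and trim both ends.
  out <- gsub("[[:space:]]+", " ", out)
  trimws(out)
}

sketch_preprocess("Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)")
# [1] "INGENIEUR AGRARWIRTSCHAFT LANDWIRTSCHAFT"
```

The output of the real function may differ; run stringPreprocessing() on your own data to see its actual behaviour.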

Usage

stringPreprocessing(verbatim, lang = "de")

Arguments

verbatim

a character vector.

lang

(default "de") Any other value throws an error.

Details

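charToRaw helps to identify the bytes that encode a UTF-8 character, which is useful when checking how umlauts and other special characters are stored in the input.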
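As a small illustration of that tip, the base R snippet below inspects the bytes behind a lowercase a-umlaut. The variable names are chosen for this example only, and the Latin-1 conversion is included as an assumption about where the single-byte escapes in the examples (such as "\xe4") come from.

```r
# Build the character from its Unicode code point so the example
# does not depend on the encoding of this file.
umlaut <- "\u00e4"

Encoding(umlaut)   # "UTF-8"
charToRaw(umlaut)  # c3 a4: two bytes in UTF-8

# The same character converted to Latin-1 is a single byte.
latin1_umlaut <- iconv(umlaut, from = "UTF-8", to = "latin1")
charToRaw(latin1_umlaut)  # e4

# Going back from raw bytes to a string.
restored <- rawToChar(as.raw(c(0xc3, 0xa4)))
Encoding(restored) <- "UTF-8"
identical(restored, umlaut)  # TRUE
```

Comparing the raw bytes in this way shows whether a string that looks correct on screen is actually stored as UTF-8 or as Latin-1.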

Value

The same character vector after preprocessing.

Examples

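# Umlauts are written as \u escapes so the strings are valid in UTF-8 and Latin-1 sessions alike.
(x <- c("Verkauf von B\u00fcchern, Schreibwaren", "Fach\u00e4rzin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"))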
stringPreprocessing(x)

malsch/occupationCoding documentation built on March 14, 2024, 8:09 a.m.