replace_field: Replace or process a specified field in a data.table

replace_field    R Documentation

Description

This function processes a specified field (oldfield) in a data.table and stores the result in newfield. When the field holds journal names, they are abbreviated using a journal abbreviation table (user-specified, system-provided, or both). Optionally, a custom function is then applied to the processed field.

Usage

replace_field(
  dt,
  oldfield,
  newfield,
  user_table = NULL,
  use_sys_table = TRUE,
  fun = NULL,
  ...
)

Arguments

dt

A data.table. The input data table that contains the field to be processed.

oldfield

A character string. The name of the field to be processed, typically in uppercase (e.g., "JOURNAL"). Must be a valid column name in dt.

newfield

A character string. The name of the new field where the processed result will be stored. If this field does not exist in dt, it will be created.

user_table

A data.table. Optional. A user-provided journal abbreviation table with at least two columns: journal_lower and journal_abbr. Defaults to NULL. If provided and use_sys_table = TRUE, it is merged with the system abbreviation table. Note that the Examples section passes the path to a CSV file for this argument rather than a data.table.

use_sys_table

A logical. Whether to use the system-provided journal abbreviation table. Defaults to TRUE. If TRUE, the system table is used together with user_table when one is supplied.

fun

A function. Optional. A custom function to apply to the processed field after abbreviation (if applicable). Defaults to NULL. The function should accept a column from dt as its first argument and return the processed values.

...

Additional arguments passed to the custom function fun, if provided.

Details

If oldfield is "JOURNAL", the function attempts to abbreviate the journal names using the provided abbreviation table(s). Before matching against the abbreviation table, journal names are converted to lowercase and excess whitespace is removed.
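The normalise-then-match step can be illustrated with a minimal base R sketch. This is not the package's internal code: the two-row table, the sample journal names, and the choice to leave unmatched names unchanged are all assumptions made for illustration. Only the column names journal_lower and journal_abbr come from the user_table description.

```r
# Hypothetical abbreviation table using the documented column names.
abbr <- data.frame(
  journal_lower = c("journal of applied statistics", "nature communications"),
  journal_abbr  = c("J. Appl. Stat.", "Nat. Commun."),
  stringsAsFactors = FALSE
)

# Sample input with mixed case and stray whitespace.
journals <- c("  Journal of   Applied Statistics ",
              "NATURE COMMUNICATIONS",
              "Unknown Journal")

# Normalise: lowercase, trim both ends, collapse internal whitespace runs.
key <- gsub("\\s+", " ", trimws(tolower(journals)))

# Look up each normalised key. Keeping the original text for unmatched
# names is an assumption of this sketch, not documented behaviour.
idx <- match(key, abbr$journal_lower)
result <- ifelse(is.na(idx), journals, abbr$journal_abbr[idx])
print(result)
```

The lookup key is built from a normalised copy, so the original strings stay available for entries that have no match.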

The user_table and use_sys_table parameters allow flexibility in choosing which abbreviation tables to use. If both are used, they will be merged, and duplicates will be removed.
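As a rough illustration of merging two abbreviation tables and dropping duplicates, here is a base R sketch. Both tables are made up, and letting the user entry win when the same journal_lower key appears in both is an assumption; the documentation only states that duplicates are removed.

```r
# Hypothetical system and user tables with one overlapping journal.
sys_tab <- data.frame(
  journal_lower = c("nature communications", "science advances"),
  journal_abbr  = c("Nat. Commun.", "Sci. Adv."),
  stringsAsFactors = FALSE
)
user_tab <- data.frame(
  journal_lower = c("nature communications", "journal of applied statistics"),
  journal_abbr  = c("Nat Commun", "J. Appl. Stat."),
  stringsAsFactors = FALSE
)

# Stack user rows first so they are kept when a key is duplicated
# (assumed precedence, chosen only for this sketch).
combined <- rbind(user_tab, sys_tab)
combined <- combined[!duplicated(combined$journal_lower), ]
rownames(combined) <- NULL
print(combined)
```

Setting use_sys_table = FALSE would correspond to skipping the rbind step and using only the user table.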

If a custom function is provided via fun, it will be applied to the processed field after any abbreviations.
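The way fun and the extra arguments in ... interact can be sketched as follows. apply_fun and join_authors are hypothetical helpers written for this illustration and are not part of the package; the sketch only mirrors the documented contract that fun receives the column as its first argument and that ... is forwarded to it.

```r
# Hypothetical helper: return x unchanged when no function is given,
# otherwise call fun with the column first and forward extra arguments.
apply_fun <- function(x, fun = NULL, ...) {
  if (is.null(fun)) {
    return(x)
  }
  fun(x, ...)
}

authors <- c("Smith, J. and Doe, A.", "Lee, K. AND Park, S.")

# Hypothetical custom function with an extra parameter `sep`.
join_authors <- function(x, sep = " & ") {
  gsub(" and ", sep, x, ignore.case = TRUE)
}

out_default <- apply_fun(authors, fun = join_authors)              # uses sep = " & "
out_custom  <- apply_fun(authors, fun = join_authors, sep = "; ")  # sep forwarded via ...
out_none    <- apply_fun(authors)                                  # no fun: unchanged
print(out_default)
print(out_custom)
```

With replace_field itself, the extra argument would presumably be supplied the same way, for example by adding sep = "; " after fun = join_authors in the call.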

Value

The function returns the modified data.table with the processed field stored in newfield.

Examples

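library(journalabbr)

# Example files shipped with the package
csvpath <- system.file("extdata", "myabbr.csv", package = "journalabbr", mustWork = TRUE)
file <- system.file("extdata", "testfile_2.bib", package = "journalabbr", mustWork = TRUE)
dt <- read_bib2dt(file)

# Build the abbreviation table from the user CSV and the system table
abbrtable_user <- get_abbrtable(user_table = csvpath, use_sys_table = TRUE)
print(head(abbrtable_user))

# Abbreviate journal names and write the result back to JOURNAL
dm1 <- replace_field(dt,
                     oldfield = "JOURNAL",
                     newfield = "JOURNAL",
                     user_table = csvpath,
                     use_sys_table = TRUE,
                     fun = NULL)

# Custom function: replace " and " between author names with " & "
myauthor <- function(x) {
  gsub(" and ", " & ", x, perl = TRUE, ignore.case = TRUE)
}

# No abbreviation tables here; only the custom function is applied to AUTHOR
dm2 <- replace_field(dm1,
                     oldfield = "AUTHOR",
                     newfield = "AUTHOR",
                     user_table = NULL,
                     use_sys_table = FALSE,
                     fun = myauthor)

# Compare the original, journal-abbreviated, and author-processed tables
print(head(dt)[, c("JOURNAL", "AUTHOR"), with = FALSE])
print(head(dm1)[, c("JOURNAL", "AUTHOR"), with = FALSE])
print(head(dm2)[, c("JOURNAL", "AUTHOR"), with = FALSE])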


zoushucai/journalabbr documentation built on Dec. 6, 2024, 4:41 p.m.