transform: Text transformation for OpenRefine project

transform    R Documentation

Description

The text transform functions apply arbitrary text transformations to a column in an existing OpenRefine project by sending an API query to /command/core/apply-operations with the core/text-transform operation. Besides the generic refine_transform(), the package includes a series of transform functions that apply commonly used text operations. For more information on these functions, see 'Details'.
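To make the core/text-transform operation more concrete, the sketch below assembles one such operation as a plain R list. The field names follow the JSON that OpenRefine shows when operation history is extracted from the Undo / Redo panel; build_text_transform_op() is a hypothetical helper written for illustration, and it is an assumption (not confirmed by this page) that rrefine sends exactly this payload.

```r
# Hypothetical helper, not part of rrefine: build one core/text-transform
# operation as an R list that mirrors OpenRefine's operation-history JSON.
build_text_transform_op <- function(column_name,
                                    expression,
                                    mode = c("row-based", "record-based"),
                                    on_error = c("set-to-blank",
                                                 "keep-original",
                                                 "store-error")) {
  # match.arg() picks the first choice by default and rejects unknown values
  mode <- match.arg(mode)
  on_error <- match.arg(on_error)

  list(
    op = "core/text-transform",
    engineConfig = list(facets = list(), mode = mode),
    columnName = column_name,
    expression = expression,
    onError = on_error,
    `repeat` = FALSE,   # do not re-apply the expression to its own output
    repeatCount = 10    # only relevant when `repeat` is TRUE
  )
}

op <- build_text_transform_op("dotw", "value.toLowercase()")
str(op)
```

Serializing a list of such operations to JSON and posting it to /command/core/apply-operations is the part that refine_transform() handles on the user's behalf.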

Usage

refine_transform(
  column_name,
  expression,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_lower(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_upper(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_title(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_null(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_empty(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_text(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_number(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_to_date(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_trim_whitespace(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_collapse_whitespace(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

refine_unescape_html(
  column_name,
  mode = "row-based",
  on_error = "set-to-blank",
  project.name = NULL,
  project.id = NULL,
  verbose = FALSE,
  validate = TRUE,
  ...
)

Arguments

column_name

Name of the column on which text transformation should be performed

expression

Expression defining the text transformation to be performed

mode

Mode of operation; must be one of "row-based" or "record-based"; default is "row-based"

on_error

Behavior if an error occurs during the transformation; must be one of "set-to-blank", "keep-original", or "store-error"; default is "set-to-blank"

project.name

Name of project

project.id

Unique identifier for project

verbose

Logical specifying whether or not query result should be printed; default is FALSE

validate

Logical specifying whether the operation should validate parameters against the existing data in the project; default is TRUE

...

Additional parameters passed on to refine_path(); allows users to specify host and port arguments if the OpenRefine instance is running at a location other than http://127.0.0.1:3333
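As an illustration of how these arguments combine, the sketch below collects a full set of arguments for refine_transform(), including host and port passed through `...`. The port value "3334" and the expression are placeholders chosen for this example, and the live call is guarded by a flag because it needs a running OpenRefine instance with a project named "lfm".

```r
# Arguments for a single transformation; host/port travel through `...`.
# "3334" is a made-up port for illustration, not a package default.
call_args <- list(
  column_name  = "dotw",
  expression   = "value.toUppercase()",
  mode         = "row-based",
  on_error     = "keep-original",  # leave the cell untouched if the expression fails
  project.name = "lfm",
  verbose      = TRUE,             # return the "response" object
  host         = "127.0.0.1",
  port         = "3334"
)

# Flip to TRUE only when OpenRefine is running at the host/port above.
run_live <- FALSE
if (run_live) {
  response <- do.call(rrefine::refine_transform, call_args)
}
```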

Details

The refine_transform() function allows the user to pass arbitrary text transformations to a given column in an OpenRefine project. The package includes a set of functions that wrap refine_transform() to execute common transformations:

  • refine_to_lower(): Coerce text to lowercase

  • refine_to_upper(): Coerce text to uppercase

  • refine_to_title(): Coerce text to title case

  • refine_to_null(): Set values to NULL

  • refine_to_empty(): Set text values to empty string ("")

  • refine_to_text(): Coerce value to string

  • refine_to_number(): Coerce value to numeric

  • refine_to_date(): Coerce value to date

  • refine_trim_whitespace(): Remove leading and trailing whitespace

  • refine_collapse_whitespace(): Collapse consecutive whitespace characters into a single space

  • refine_unescape_html(): Unescape HTML in string
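The wrappers listed above can be thought of as refine_transform() calls with a fixed expression. The table below pairs each wrapper with a GREL expression that performs the described operation. These are standard GREL expressions, but the exact strings used inside rrefine are an assumption here, since this page does not show them.

```r
# Assumed GREL equivalents for each wrapper (illustrative, not taken from
# the rrefine source).
grel_equivalents <- c(
  refine_to_lower            = "value.toLowercase()",
  refine_to_upper            = "value.toUppercase()",
  refine_to_title            = "value.toTitlecase()",
  refine_to_null             = "null",
  refine_to_empty            = "\"\"",
  refine_to_text             = "value.toString()",
  refine_to_number           = "value.toNumber()",
  refine_to_date             = "value.toDate()",
  refine_trim_whitespace     = "value.trim()",
  refine_collapse_whitespace = "value.replace(/\\s+/, ' ')",
  refine_unescape_html       = "value.unescape('html')"
)

# Flip to TRUE only with OpenRefine running and a project named "lfm"
# that contains a "dotw" column.
run_live <- FALSE
if (run_live) {
  # Under the assumption above, this has the same effect as
  # refine_to_lower("dotw", project.name = "lfm")
  rrefine::refine_transform(
    column_name  = "dotw",
    expression   = grel_equivalents[["refine_to_lower"]],
    project.name = "lfm"
  )
}
```

Any transformation not covered by a wrapper can be expressed the same way by passing a custom GREL string as expression.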

Value

Called for its side effect of passing operations to the OpenRefine instance. If verbose = TRUE, the function returns an object of class "response".

Examples

## Not run: 
fp <- system.file("extdata", "lateformeeting.csv", package = "rrefine")
refine_upload(fp, project.name = "lfm")

refine_add_column(new_column = "dotw",
                 base_column = "what day whas it",
                 value = "grel:value",
                 project.name = "lfm")

refine_export("lfm")$dotw
refine_to_lower("dotw", project.name = "lfm")
refine_export("lfm")$dotw
refine_to_upper("dotw", project.name = "lfm")
refine_export("lfm")$dotw
refine_to_title("dotw", project.name = "lfm")
refine_export("lfm")$dotw
refine_to_null("dotw", project.name = "lfm")
refine_export("lfm")$dotw
refine_remove_column("dotw", project.name = "lfm")

refine_add_column(new_column = "date",
                 base_column = "theDate",
                 value = "grel:value",
                 project.name = "lfm")

refine_export("lfm")$date
refine_to_date("date", project.name = "lfm")
refine_export("lfm")$date
refine_remove_column("date", project.name = "lfm")


## End(Not run)


vpnagraj/rrefine documentation built on Nov. 21, 2022, 12:20 a.m.