README.md
In georgemirandajr/identifyr: Clean Unique Identifiers

Identifyr 0.1.3

Purpose

This package provides efficient tools for cleaning unique identifiers used by the Los Angeles County Probation Department. Its functions standardize identifiers in preparation for analysis by removing extraneous characters and padding where necessary. Users can also use an internal dataset to validate an identifier, or to obtain one identifier from another. A wrapper function that applies several identifyr functions at once is also included; it is similar to dplyr::mutate but intended to be more convenient for this purpose.
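As a rough illustration of what "removing extraneous characters and padding" can mean, here is a minimal sketch in base R. The function standardize_id(), its prefix and width arguments, and the padding width of 6 are all assumptions made for this example; they are not part of identifyr and do not reflect the package's actual rules.

```r
# Hypothetical helper, NOT part of identifyr: shows one way to strip
# extraneous characters and left-pad the numeric part of an identifier.
standardize_id <- function(id, prefix = "X", width = 6) {
  # Drop anything that is not a letter or digit, then upper-case
  id <- toupper(gsub("[^[:alnum:]]", "", id))
  # Keep only the digits
  digits <- gsub("[^0-9]", "", id)
  # Left-pad the digits with zeros up to `width` (assumed width)
  zeros <- strrep("0", pmax(width - nchar(digits), 0))
  paste0(prefix, zeros, digits)
}

standardize_id(" x-6789 ")  # "X006789"
```

The real clean_x() and clean_case() may use different widths and prefixes, so treat the output above only as an example of the general idea.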

How to Use It

Users can apply the individual functions to an identifier of interest, or use the wrapper clean_id() to specify columns by number or name along with the functions to apply to them. The columns and the functions must be listed in the same order; otherwise a function may be applied to the wrong column and produce an error or unexpected results. The wrapper returns a data frame with the same number of rows as the input, with the values in the indicated columns replaced. Identifyr uses the magrittr pipe operator (%>%) because it is designed to be used alongside other data cleaning packages that use this operator, such as dplyr and tidyr.

You can apply one function at a time.

library(identifyr)

clean_x("X6789")     # clean an X-Number
clean_case("PB123")  # clean a case number

You can pass additional arguments to clean_x to verify the identifier against a reference table.

# If the case number is known, you can use this to obtain/verify the X-Number. 
# The 'using' argument is how you would like to verify the ID and 'value' is the actual known ID.
clean_x("X00020", using = "CASE", value = "PB021665")

You can obtain an identifier if you have another identifier that can be cross-referenced in the built-in table. Currently, you can obtain either X or CII numbers.

obtain_id(obtain = "X",  # what you want to obtain
  using = "CASE",        # the type of ID that you already know
  value = "PB021665")    # the actual ID that you know
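The cross-referencing idea can be sketched in base R as a simple table lookup. The reference table ref, its values, and the helper lookup_id() below are invented for illustration; the package's built-in table and the internals of obtain_id() are not shown in this README.

```r
# Made-up reference table for illustration only; the real built-in
# table used by identifyr is not reproduced here.
ref <- data.frame(
  X    = c("X0000001", "X0000002"),
  CASE = c("PB000001", "PB000002"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup, NOT the package's obtain_id(): return the value in
# column `obtain` for the first row where column `using` equals `value`.
lookup_id <- function(ref, obtain, using, value) {
  hit <- ref[[obtain]][ref[[using]] == value]
  if (length(hit) == 0) NA_character_ else hit[1]
}

lookup_id(ref, obtain = "X", using = "CASE", value = "PB000002")  # "X0000002"
```

Returning NA when nothing matches is a design choice of this sketch; how obtain_id() reports a missing match is not stated in the source.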

Apply several cleaning functions at once (similar to dplyr::mutate).

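library(identifyr)
library(magrittr)  # provides %>% if it is not already attached via dplyr

# df is an existing data frame; columns 1 and 3 hold the identifiers to clean
df %>%
  clean_id(
    cols = c(1, 3),  # reference the desired column index or name to manipulate
    FUN = c("clean_x", "clean_case")  # apply these functions in this order
  )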
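To show why the order of columns and functions matters, here is a minimal sketch of how a wrapper of this kind could pair them up. The function apply_in_order() is a hypothetical stand-in written for this example, and base R's trimws() and toupper() stand in for the cleaning functions; none of this is identifyr's actual implementation.

```r
# Hypothetical stand-in for a wrapper like clean_id(); NOT the package's code.
# The i-th function is applied to the i-th column, so order matters.
apply_in_order <- function(df, cols, FUN) {
  stopifnot(length(cols) == length(FUN))  # one function per column
  for (i in seq_along(cols)) {
    f <- match.fun(FUN[[i]])              # accepts a function or its name
    df[[cols[[i]]]] <- f(df[[cols[[i]]]])
  }
  df
}

# Toy data frame invented for this example
df <- data.frame(a = c(" x1 ", "x2"), n = 1:2, b = c("pb-1", "PB 2"),
                 stringsAsFactors = FALSE)

out <- apply_in_order(df, cols = c(1, 3), FUN = c("trimws", "toupper"))
out$a  # "x1" "x2"
out$b  # "PB-1" "PB 2"
```

Swapping the two function names would upper-case column 1 and trim column 3 instead, which is the kind of mismatch the ordering requirement guards against.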

Where to Find It

The latest version of identifyr can be installed from GitHub using the devtools package for R. The initial release is available on the Comprehensive R Archive Network (CRAN), and all future major releases will be available on CRAN.

From GitHub

install.packages("devtools")

devtools::install_github("georgemirandajr/identifyr")

From Comprehensive R Archive Network (CRAN)

install.packages("identifyr")

georgemirandajr/identifyr documentation built on May 17, 2019, 1:15 a.m.
