uniqueValuesPerField: uniqueValuesPerField

Description

This function takes a dataframe and, for each column, finds 1) the unique values that exist and 2) the number of occurrences of each value. This is useful for spotting data-entry errors, where an odd value was entered. The function can also look for fields in which some value occurs a particular number of times, via the target_n parameter.
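
The idea can be illustrated in base R. This is a minimal sketch of the per-column tabulation described above, not the package's actual implementation; the data frame and the `value_counts` name are hypothetical.

```r
# Hypothetical example data (not from the package).
df <- data.frame(
  sex  = c("M", "F", "F", "m", "F"),
  gear = c("trawl", "trawl", "longline", "trawl", "trawl"),
  stringsAsFactors = FALSE
)

# For each column, tabulate the unique values and how often each occurs.
# useNA = "ifany" also counts missing values when they are present.
value_counts <- lapply(df, function(col) table(col, useNA = "ifany"))

# In the sex column, "F" occurs 3 times while "M" and "m" occur once each,
# which suggests that "m" is a data-entry slip.
value_counts$sex
```

Rare values such as the lowercase "m" stand out immediately in this kind of tabulation.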

Usage

uniqueValuesPerField(df, cols_ignore = NULL, target_n = NULL,
  target_n_wiggle = 0)

Arguments

df

a dataframe to be analyzed.

cols_ignore

the default is NULL. This is a vector of column names that you want to ignore. Any columns listed here will not be included in the results.

target_n

the default is NULL. This is an integer used to find cases where some value occurs in a field this many times. For example, while comparing data to another source, you might notice that you are off by 26 records. By adding target_n=26, you can return only those columns in which some value occurs 26 times.
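
The exact-match behaviour can be sketched in base R as follows. The data frame and the `counts` and `hits` names are hypothetical, and the package may implement the filter differently.

```r
# Hypothetical example data (not from the package).
df <- data.frame(
  region = c(rep("4X", 3), rep("4W", 2)),
  status = c(rep("ok", 4), "bad"),
  stringsAsFactors = FALSE
)
target_n <- 2

# Tabulate every column, then keep only the columns in which
# at least one value occurs exactly target_n times.
counts <- lapply(df, table)
hits <- Filter(function(tab) any(tab == target_n), counts)

names(hits)  # "region": the value "4W" occurs twice
```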

target_n_wiggle

the default is 0. This parameter allows some wiggle room when matching the value specified by target_n. target_n_wiggle=0 is strict and returns only exact matches; positive values specify a percentage of the number of rows in the supplied df. For example, if the dataframe has 1000 rows, target_n=500, and target_n_wiggle=10, the acceptable value range will be 400:600 (10% of 1000 rows is 100, so 500 ± 100). If the dataframe has 10 rows, with target_n=5 and the same target_n_wiggle=10, the acceptable value range will be 4:6.
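
The range arithmetic can be checked with a small helper. `target_range` is a hypothetical function written for this illustration and is not part of the package; it assumes the margin is target_n_wiggle percent of the row count, applied on both sides of target_n. The 10-row case is inferred from the 4:6 range quoted above.

```r
# Hypothetical helper (not part of bio.qcdata): accepted count range.
target_range <- function(n_rows, target_n, target_n_wiggle = 0) {
  # The wiggle is a percentage of the number of rows in the dataframe.
  margin <- n_rows * target_n_wiggle / 100
  c(lower = target_n - margin, upper = target_n + margin)
}

target_range(1000, 500, 10)  # lower 400, upper 600
target_range(10, 5, 10)      # lower 4, upper 6
target_range(1000, 26)       # lower 26, upper 26 (exact matches only)
```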

Value

a list of column names from the original df, with the unique values found in each, as well as the number of records for each unique value. Fields that were found to be either unique (i.e. every value different) or uniform (i.e. every value the same) will be dropped.
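
The dropping of unique and uniform fields can be approximated in base R. This is a sketch under the assumption that "unique" means as many distinct values as rows and "uniform" means a single distinct value; the data frame and variable names are hypothetical.

```r
# Hypothetical example data (not from the package).
df <- data.frame(
  id      = 1:4,                    # every value different ("unique")
  country = rep("CA", 4),           # every value the same ("uniform")
  sex     = c("M", "F", "F", "M"),  # informative field
  stringsAsFactors = FALSE
)

# Count distinct values per column, then keep only the columns that are
# neither all-different nor all-the-same.
n_distinct <- vapply(df, function(col) length(unique(col)), integer(1))
keep <- n_distinct > 1 & n_distinct < nrow(df)

# Named list: one table of value counts per remaining column.
result <- lapply(df[keep], table)
names(result)  # "sex"
```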

Author(s)

Mike McMahon, Mike.McMahon@dfo-mpo.gc.ca


Maritimes/bio.qcdata documentation built on May 14, 2019, 2:40 p.m.