possible_ids: Find possible unique identifiers of a data frame
In randrescastaneda/joyn: Tool for Diagnosis of Tables Joins and Complementary Join Features

possible_ids

R Documentation

Find possible unique identifiers of a data frame

Description

Identify possible combinations of variables that uniquely identify `dt`.

Usage

possible_ids(
  dt,
  vars = NULL,
  exclude = NULL,
  include = NULL,
  exclude_classes = NULL,
  include_classes = NULL,
  verbose = getOption("possible_ids.verbose", default = FALSE),
  min_combination_size = 1,
  max_combination_size = 5,
  max_processing_time = 60,
  max_numb_possible_ids = 100,
  get_all = FALSE
)

Arguments

`dt`	data frame
`vars`	character: A vector of variable names to consider for identifying unique combinations.
`exclude`	character: Names of variables to exclude from analysis
`include`	character: Names of variables to include, even if they belong to a group excluded by `exclude`
`exclude_classes`	character: classes to exclude from analysis (e.g., "numeric", "integer", "date")
`include_classes`	character: classes to include in the analysis (e.g., "numeric", "integer", "date")
`verbose`	logical: If FALSE, no messages are displayed. Default is the value of the option `possible_ids.verbose`, or FALSE if that option is not set.
`min_combination_size`	numeric: Minimum number of variables in a combination. Default is 1, so single variables are tested as well.
`max_combination_size`	numeric: Maximum number of variables in a combination. Default is 5. Identifiers that need more than `max_combination_size` variables will not be found.
`max_processing_time`	numeric: Max time to process in seconds. After that, it returns what it found.
`max_numb_possible_ids`	numeric: Max number of possible IDs to find. See the section "Number of possible IDs".
`get_all`	logical: get all possible combinations based on the parameters above.
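
As an illustration of how these arguments combine, here is a minimal sketch that narrows the search on a small table. It only uses arguments listed in the Usage section; the object names `x4`, `ids_small`, `ids_vars` and `old_opts` are illustrative, the results are not shown because their exact form may depend on the package version, and the calls run only if `joyn` and `data.table` are installed.

```r
# Sketch only: the calls run if both packages are installed.
if (requireNamespace("joyn", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {

  # Small toy table (same values as in the Examples section)
  x4 <- data.table::data.table(
    id1 = c(1, 1, 2, 3, 3),
    id2 = c(1, 1, 2, 3, 4),
    t   = c(1L, 2L, 1L, 2L, NA_integer_),
    x   = c(16, 12, NA, NA, 15)
  )

  # Turn messages on through the option that the `verbose` default reads
  old_opts <- options(possible_ids.verbose = TRUE)

  # Test combinations of at most two variables, for at most 10 seconds
  ids_small <- joyn::possible_ids(
    x4,
    max_combination_size = 2,
    max_processing_time  = 10
  )

  # Restrict the candidates to a hand-picked set of variables
  ids_vars <- joyn::possible_ids(x4, vars = c("id1", "id2", "t"))

  # Restore the previous value of the option
  options(old_opts)
}
```

Lowering `max_combination_size` and `max_processing_time` trades completeness for speed, which is usually the sensible first step on wide tables.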

Value

list with possible identifiers

Number of possible IDs

The number of possible IDs in a dataframe can be very large. This is why possible_ids() uses heuristics to return something useful without wasting the user's time. In addition, we provide multiple parameters so that the user can fine-tune their search for possible IDs easily and quickly.

Say, for instance, that you have a dataframe with 10 variables. Because the order of the variables in an identifier does not matter, there are 45 distinct pairs of variables to test as possible unique identifiers. If you want to test all the possible IDs, you will have to test 1,023 combinations, one for every non-empty subset of the 10 variables. If the dataframe has many rows, it may take a while.
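
The orders of magnitude can be checked with base R. This small sketch assumes that candidates are counted as unordered combinations of variables (counting ordered arrangements instead gives larger figures, for example 90 ordered pairs). The names `n_vars`, `n_pairs`, `n_all` and `n_capped` are illustrative.

```r
n_vars <- 10  # number of variables in the hypothetical dataframe

# Unordered pairs of variables
n_pairs <- choose(n_vars, 2)

# Every non-empty subset of the variables (sizes 1 to 10)
n_all <- sum(choose(n_vars, seq_len(n_vars)))

# Subsets of at most 5 variables, the default `max_combination_size`
n_capped <- sum(choose(n_vars, 1:5))

print(c(pairs = n_pairs, all = n_all, capped = n_capped))
# pairs = 45, all = 1023, capped = 637
```

Each candidate also requires a pass over the rows to look for duplicates, which is why capping the combination size and the processing time matters on large tables.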

Examples

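library(joyn)
library(data.table)

# Toy table with repeated and missing values
x4 <- data.table(id1 = c(1, 1, 2, 3, 3),
                 id2 = c(1, 1, 2, 3, 4),
                 t   = c(1L, 2L, 1L, 2L, NA_integer_),
                 x   = c(16, 12, NA, NA, 15))

# List the combinations of variables that uniquely identify x4
possible_ids(x4)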
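
To see what the search is looking for, a candidate can also be checked by hand. The following is a minimal base R sketch, not the method used by `possible_ids()`: it brute-forces every pair of columns of the same toy table and keeps the pairs with no duplicated rows. The helper `is_unique_id()` is hypothetical, and it treats `NA` as an ordinary value, which may differ from how the package handles missing values.

```r
# Same toy data as above, as a plain data frame
x4 <- data.frame(
  id1 = c(1, 1, 2, 3, 3),
  id2 = c(1, 1, 2, 3, 4),
  t   = c(1L, 2L, 1L, 2L, NA_integer_),
  x   = c(16, 12, NA, NA, 15)
)

# Hypothetical helper: TRUE if no two rows share the same values in `cols`
is_unique_id <- function(df, cols) {
  anyDuplicated(df[cols]) == 0L
}

# All unordered pairs of column names, one pair per list element
pairs <- combn(names(x4), 2, simplify = FALSE)

# Keep only the pairs that identify every row uniquely
id_pairs <- Filter(function(cols) is_unique_id(x4, cols), pairs)

# Single columns that are unique on their own (none in this table)
single_ids <- Filter(function(col) is_unique_id(x4, col), names(x4))
```

Under these assumptions, every pair except `id1` with `id2` identifies the rows of this table, while no single column does.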

randrescastaneda/joyn documentation built on Dec. 20, 2024, 6:51 a.m.
