identify_nonvalid_ids_with_matched_names: Identify non-valid IDs in a dataframe based on IDs in another...

View source: R/check_timci_quality.R

identify_nonvalid_ids_with_matched_names    R Documentation

Identify non-valid IDs in a dataframe based on IDs in another dataframe (TIMCI-specific function)

Description

This function takes two data frames, the name of the ID column in each, and the name of a date column in the first, and identifies the non-valid IDs in the first data frame by comparing them with the IDs in the second. It returns a list of two data frames: one containing the IDs and the dates at which each ID was allocated, the other containing the cleaned data.

Usage

identify_nonvalid_ids_with_matched_names(
  df1,
  col_id1,
  df2,
  col_id2,
  col_date1,
  ldate_diff,
  udate_diff,
  matched_names = FALSE,
  cleaning = "none"
)

Arguments

df1

A dataframe containing the data to check for non-valid IDs.

col_id1

The column name containing IDs in df1.

df2

A reference dataframe containing the valid IDs to compare with.

col_id2

The column name containing IDs in df2.

col_date1

The name of the column containing the date in the df1 dataframe.

ldate_diff

Lower date difference (default is same day). Negative numbers indicate a difference in the past; positive numbers indicate a difference in the future.

udate_diff

Upper date difference (default is same day). Negative numbers indicate a difference in the past; positive numbers indicate a difference in the future.

matched_names

Boolean indicating whether to perform matching based on names (default is FALSE).

cleaning

The cleaning option. Use "drop_all" to remove non-valid IDs from df1; the default is "none".
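
The sign convention for ldate_diff and udate_diff can be illustrated with a small base R sketch. This is not the timci implementation: the helper name in_date_window, the choice of days as the unit, and the idea that a date is compared against a single reference date are all assumptions made for illustration.

```r
# Hypothetical helper (not part of timci): checks whether `date` falls
# between ref_date + ldate_diff and ref_date + udate_diff.
# Assumption: differences are measured in whole days.
in_date_window <- function(date, ref_date, ldate_diff = 0, udate_diff = 0) {
  # Subtracting two Date objects gives a difference in days
  diff_days <- as.numeric(as.Date(date) - as.Date(ref_date))
  diff_days >= ldate_diff & diff_days <= udate_diff
}

ref <- as.Date("2022-03-10")

# Same day, with both bounds left at 0
same_day <- in_date_window(as.Date("2022-03-10"), ref)

# Two days in the past, with a window reaching three days back
past_ok <- in_date_window(as.Date("2022-03-08"), ref, ldate_diff = -3, udate_diff = 0)

# Two days in the future, outside that same window
future_no <- in_date_window(as.Date("2022-03-12"), ref, ldate_diff = -3, udate_diff = 0)
```

Under these assumptions, a negative ldate_diff widens the window into the past and a positive udate_diff widens it into the future.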

Value

A list containing two data frames. The first data frame contains the IDs and dates at which the ID has been allocated in different columns. The second data frame contains the cleaned data.
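
Examples

The following base R sketch mimics the general idea of the check and the shape of the returned list. It is an illustration under stated assumptions, not the package's code: the helper sketch_nonvalid_ids, the column names, and the sample data are hypothetical, the date and name-matching arguments are left out, and any cleaning value other than "drop_all" is assumed to leave df1 unchanged.

```r
# Illustrative sketch in base R, NOT the timci implementation.
# The helper name and all data below are hypothetical.
sketch_nonvalid_ids <- function(df1, col_id1, df2, col_id2, cleaning = "none") {
  # An ID in df1 is treated as non-valid when it is absent from df2
  is_valid <- df1[[col_id1]] %in% df2[[col_id2]]
  nonvalid <- df1[!is_valid, , drop = FALSE]
  # "drop_all" removes the non-valid rows; any other value keeps df1 as is
  cleaned <- if (cleaning == "drop_all") df1[is_valid, , drop = FALSE] else df1
  list(nonvalid, cleaned)
}

visits <- data.frame(
  child_id = c("A-001", "A-002", "X-999"),
  date_visit = as.Date(c("2022-03-01", "2022-03-02", "2022-03-05")),
  stringsAsFactors = FALSE
)
enrolment <- data.frame(
  id = c("A-001", "A-002", "A-003"),
  stringsAsFactors = FALSE
)

out <- sketch_nonvalid_ids(visits, "child_id", enrolment, "id",
                           cleaning = "drop_all")
out[[1]]  # rows of `visits` whose ID is not found in `enrolment`
out[[2]]  # `visits` with those rows removed
```

With the real function, the same positional pattern applies (df1, col_id1, df2, col_id2), followed by col_date1, ldate_diff and udate_diff as shown under Usage.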


Thaliehln/timci documentation built on April 8, 2024, 3:38 p.m.