util_int_duplicate_content_segment: Check for duplicated content

View source: R/util_int_duplicate_content_segment.R

util_int_duplicate_content_segment    R Documentation

Check for duplicated content

Description

This function tests for duplicate entries in the data set. Duplicated entries can be checked within every study segment or only within selected segments.
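The idea of a per-segment duplicate check can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the `segments` mapping, and the helper `find_duplicates()` are all hypothetical, and in dataquieR the assignment of variables to segments comes from the metadata rather than from a hand-written list.

```r
# Toy study data (hypothetical). The id column differs in every row,
# so only the content columns are compared.
study_data <- data.frame(
  id  = 1:4,
  age = c(34, 34, 51, 34),
  sex = c("f", "f", "m", "f"),
  sbp = c(120, 118, 135, 120),
  stringsAsFactors = FALSE
)

# Hypothetical mapping of segment names to the variables they contain.
segments <- list(
  INTRO = c("age", "sex"),
  EXAM  = c("age", "sex", "sbp")
)

# Return the row indices whose content repeats an earlier row,
# restricted to the variables of one segment.
find_duplicates <- function(data, vars) {
  seg <- data[, vars, drop = FALSE]
  which(duplicated(seg))
}

# One vector of duplicated row indices per segment.
dups <- lapply(segments, find_duplicates, data = study_data)
print(dups)
```

Restricting the comparison to the variables of a segment is what makes the result differ between segments: a row can repeat earlier content in one segment while being unique in another.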

Usage

util_int_duplicate_content_segment(
  level = c("segment"),
  study_segment,
  study_data,
  meta_data
)

Arguments

level

character a character vector indicating whether the assessment should be conducted at the study level (level = "dataframe") or at the segment level (level = "segment").

study_segment

vector the vector that contains the names of the identifiers to be used in the assessment. For the study level, these are the names of the different data frames. For the segment level, these are the names of the segments.

study_data

data.frame the data frame that contains the measurements, mandatory.

meta_data

data.frame the data frame that contains metadata attributes of the study data, mandatory.

Value

a list with

  • SegmentData: data frame with the results of the quality check for duplicated entries.

  • SegmentTable: data frame with selected duplicated entries check results, used for the data quality report.

  • Duplicates: vector with row indices of duplicated entries, if any, otherwise NULL.
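The convention that Duplicates holds row indices when duplicates exist and NULL otherwise can be mimicked with a small base R sketch. Everything here is an assumption made for illustration: the helper `summarise_duplicates()` and the column names `Segment` and `n_duplicates` are hypothetical and do not reflect the columns the package actually returns.

```r
# Hypothetical helper that mimics the shape of the returned list:
# a small summary data frame plus the duplicated row indices (or NULL).
summarise_duplicates <- function(data, segment_name) {
  dup_rows <- which(duplicated(data))
  list(
    SegmentData = data.frame(
      Segment      = segment_name,      # assumed column name
      n_duplicates = length(dup_rows),  # assumed column name
      stringsAsFactors = FALSE
    ),
    # NULL signals that no duplicated rows were found.
    Duplicates = if (length(dup_rows) > 0) dup_rows else NULL
  )
}

with_dups <- summarise_duplicates(
  data.frame(a = c(1, 1, 2), b = c("x", "x", "y")),
  "INTRO"
)
without_dups <- summarise_duplicates(
  data.frame(a = c(1, 2, 3), b = c("x", "y", "z")),
  "INTERVIEW"
)

# Callers can branch on is.null() instead of checking the vector length.
if (is.null(without_dups$Duplicates)) {
  message("No duplicated content in segment INTERVIEW")
}
```

Returning NULL rather than an empty vector lets calling code test for the absence of duplicates with `is.null()`.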

See Also

Other integrity_indicator_functions: util_int_duplicate_content_dataframe(), util_int_duplicate_ids_dataframe(), util_int_duplicate_ids_segment(), util_int_unexp_records_set_dataframe(), util_int_unexp_records_set_segment()

Examples

## Not run: 
study_data <- readRDS(system.file("extdata", "ship.RDS", package = "dataquieR"))
meta_data <- readRDS(system.file("extdata", "ship_meta.RDS", package = "dataquieR"))

# Segment level
int_duplicate_content(
  level = "segment",
  study_segment = c("INTRO", "INTERVIEW"),
  study_data = study_data,
  meta_data = meta_data
)

# Studies or data frame level
study_tables <- list(
  "sd1" = readRDS(system.file("extdata", "ship.RDS", package = "dataquieR")),
  "sd2" = readRDS(system.file("extdata", "ship.RDS", package = "dataquieR"))
)

int_duplicate_content(
  level = "dataframe",
  study_segment = c("sd1", "sd2"),
  study_data = study_tables,
  meta_data = meta_data
)

## End(Not run)


dataquieR documentation built on May 29, 2024, 7:18 a.m.