camr_SWA_linking_code: Link Records for School-Wide Assessment Data

View source: R/R12-SWA_linking_code.R

Description

Function to link records (e.g., across different time points) using a set of linking items.

Usage

camr_SWA_linking_code(
  dtf_long,
  lst_link_across = NULL,
  obj_link_using = NULL,
  lst_link_combo = NULL,
  lst_ignore_nonmissing = NULL,
  chr_progress = "bar",
  lgc_matches_only = TRUE
)

Arguments

dtf_long

A data frame in long format. It must contain a column of integer time points ('SSS.INT.Time_point') and the columns for the relevant linking items.

lst_link_across

A list of lists, with each sublist specifying 'Base' and 'Add' logical vectors for the pair of data subsets in dtf_long to link over (e.g., 'Base' would subset the first time point and 'Add' would subset the second time point). If NULL, the function infers all possible pairings over time points from the 'SSS.INT.Time_point' variable. If the 'Base' and 'Add' logical vectors select the same subset, the function checks for duplicate records instead.
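As a minimal sketch of the structure described above, the following base R code builds 'Base' and 'Add' logical vectors from a toy data frame. The data frame dtf_toy and the sublist names (e.g., TP0_to_TP1) are hypothetical, and the combn-based construction of all pairings only illustrates what "all possible pairings" could look like; whether the NULL default builds pairings in exactly this way is an assumption.

```r
# Toy long-format data (hypothetical); only the time point column name
# comes from the documentation.
dtf_toy <- data.frame(
  SSS.INT.Time_point = c(0L, 0L, 1L, 1L, 2L)
)

# Explicit pairing: link time point 1 records ('Add') to
# time point 0 records ('Base')
lst_link_across <- list(
  TP0_to_TP1 = list(
    Base = dtf_toy$SSS.INT.Time_point == 0L,
    Add = dtf_toy$SSS.INT.Time_point == 1L
  )
)

# Every pairing of distinct time points (illustrative analogue of
# what the function is described as inferring when NULL is passed)
int_times <- sort( unique( dtf_toy$SSS.INT.Time_point ) )
mat_pairs <- combn( int_times, 2 )
lst_all_pairs <- lapply( seq_len( ncol(mat_pairs) ), function(i) {
  list(
    Base = dtf_toy$SSS.INT.Time_point == mat_pairs[1, i],
    Add = dtf_toy$SSS.INT.Time_point == mat_pairs[2, i]
  )
} )
names(lst_all_pairs) <- paste0(
  'TP', mat_pairs[1, ], '_to_TP', mat_pairs[2, ]
)
```

Each 'Base' and 'Add' vector has one element per row of the data frame, so the list can be passed as lst_link_across alongside that same data frame.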

obj_link_using

Either a character vector with the column names for the linking items, or a list of character vectors, one vector for each set defined in lst_link_across. Passing a list with separate vectors allows different linking items to be used for different sets when necessary. If NULL, the function assumes the standard set of linking items: SSS.INT.School.Code, IDX.INT.Origin.LASID, SBJ.FCT.Sex, SBJ.FCT.Link.BirthMonth, SBJ.FCT.Link.OlderSiblings, SBJ.FCT.Link.EyeColor, SBJ.FCT.Link.MiddleInitial, SBJ.CHR.Link.Streetname, and SBJ.INT.Link.KindergartenYearEst.
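The two accepted forms can be sketched as follows. The column names are drawn from the standard set listed above; the particular subsets chosen, and the use of an unnamed list ordered to match lst_link_across, are assumptions made for illustration.

```r
# Form 1: a single character vector, shared by every set
# in lst_link_across
chr_items <- c(
  'SSS.INT.School.Code',
  'SBJ.FCT.Sex',
  'SBJ.FCT.Link.BirthMonth',
  'SBJ.FCT.Link.MiddleInitial'
)

# Form 2: a list with one character vector per set, here
# assuming two sets with different linking items
lst_items <- list(
  chr_items,
  c( 'SSS.INT.School.Code', 'IDX.INT.Origin.LASID' )
)
```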

lst_link_combo

A list of lists, where each sublist consists of an integer vector indexing the combination of linking items to consider in order of priority. One sublist of integer vectors must be defined for each set defined by lst_link_across. For a given sublist, indices apply to the character vector defined for the relevant set in obj_link_using, meaning that if character vectors differ across sets, indices should be defined accordingly.
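To illustrate how the indices relate to the linking items, the sketch below defines priority-ordered combinations for a single set and translates them back to column names. The specific combinations and the helper object lst_combo_names are hypothetical; only the list-of-lists-of-integer-vectors shape comes from the description above.

```r
# Linking items for the (single) set, as would be given
# via obj_link_using
chr_items <- c(
  'SSS.INT.School.Code',
  'SBJ.FCT.Sex',
  'SBJ.FCT.Link.BirthMonth',
  'SBJ.FCT.Link.MiddleInitial'
)

# One sublist per set; integer vectors are listed in order of priority
lst_link_combo <- list(
  list(
    1:4,               # First try all four items
    c( 1L, 2L, 3L ),   # Then drop the middle initial
    c( 1L, 3L, 4L )    # Then drop sex instead
  )
)

# Translate indices to column names as a check on the definitions
lst_combo_names <- lapply(
  lst_link_combo[[1]], function(int_index) chr_items[int_index]
)
```

Because the indices point into the character vector for the relevant set, a check like lst_combo_names is a cheap way to confirm that each combination refers to the intended columns when the vectors differ across sets.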

lst_ignore_nonmissing

A list of lists, similar to lst_link_combo, indicating items to ignore even if they are not missing when computing a dissimilarity score over a given combination (thereby allowing records to be linked even if some items do not match). If NULL (the default), the function will not ignore non-missing mismatches.
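The effect of ignoring a non-missing item can be sketched with a toy dissimilarity score. The helper fun_toy_dissimilarity below is hypothetical and is not part of the package; it assumes the score is a count of mismatches over items that are non-missing in both records, which may differ from how the function actually computes dissimilarity.

```r
# Hypothetical helper (not part of the package): counts mismatches
# between two records over a combination of items, skipping items
# missing in either record and items explicitly marked to ignore
fun_toy_dissimilarity <- function(
    rec_base, rec_add, chr_combo, chr_ignore = character(0) ) {
  chr_use <- setdiff( chr_combo, chr_ignore )
  val_base <- unlist( rec_base[chr_use] )
  val_add <- unlist( rec_add[chr_use] )
  lgc_both <- !is.na(val_base) & !is.na(val_add)
  sum( val_base[lgc_both] != val_add[lgc_both] )
}

# Two toy records that disagree on eye color only
rec_base <- list(
  SBJ.FCT.Sex = 'Female',
  SBJ.FCT.Link.BirthMonth = 'March',
  SBJ.FCT.Link.EyeColor = 'Brown'
)
rec_add <- list(
  SBJ.FCT.Sex = 'Female',
  SBJ.FCT.Link.BirthMonth = NA_character_,
  SBJ.FCT.Link.EyeColor = 'Hazel'
)
chr_combo <- names(rec_base)

# Without ignoring anything, the eye color mismatch counts
int_strict <- fun_toy_dissimilarity( rec_base, rec_add, chr_combo )

# Ignoring eye color removes the mismatch, so the records could link
int_lenient <- fun_toy_dissimilarity(
  rec_base, rec_add, chr_combo,
  chr_ignore = 'SBJ.FCT.Link.EyeColor'
)
```

In this sketch the missing birth month never contributes to the score, while the eye color mismatch contributes only when the item is not ignored.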

chr_progress

A character string specifying how the function's progress is reported. If 'section', the function prints each completed section of the linking process to the console; if 'bar' (the default), it shows a simple progress bar in the console; if '', no progress is displayed.

lgc_matches_only

A logical value; if TRUE (the default), the function computes and returns dissimilarity scores only for confirmed matches, which results in faster computation.

Value

A data frame.

Author(s)

Michael Pascale and Kevin Potter

Examples

# Linking across time points
dtf_demo <- camr_SWA_linking_code_simulate('demo')
dtf_demo_linked <- camr_SWA_linking_code(dtf_demo)

# Identifying duplicate records
dtf_dup <- camr_SWA_linking_code_simulate( 'duplicate' )
dtf_dup_linked <- camr_SWA_linking_code(
  dtf_dup,
  lst_link_across = list(
    DR2023F = list(
      Base = rep( TRUE, nrow(dtf_dup) ),
      Add = rep( TRUE, nrow(dtf_dup) )
    )
  )
)


rettopnivek/camrprojects documentation built on June 9, 2025, 4 p.m.