MergeDataSetsByVariable: Merge Data Sets by Variable

View source: R/mergedatasetsbyvariable.R

MergeDataSetsByVariable    R Documentation

Merge Data Sets by Variable

Description

Merges multiple data sets by combining variables, matching cases either using ID variables or by simply joining data sets side-by-side.
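The two matching modes can be pictured with base R. The sketch below is only an analogy using merge and cbind on two small made-up data frames (wave1, wave2 and their columns are hypothetical). It is not the function's implementation, which also handles metadata such as variable labels.

```r
# Two made-up data sets sharing an ID column (hypothetical names and values).
wave1 <- data.frame(id = c(1, 2, 3), q1 = c("a", "b", "c"))
wave2 <- data.frame(id = c(2, 3, 4), q2 = c(10, 20, 30))

# Matching on ID, keeping every case found in either data set
# (roughly analogous to only.keep.cases.matched.to.all.data.sets = FALSE).
all.cases <- merge(wave1, wave2, by = "id", all = TRUE)

# Matching on ID, keeping only cases present in both data sets
# (roughly analogous to only.keep.cases.matched.to.all.data.sets = TRUE).
matched.cases <- merge(wave1, wave2, by = "id", all = FALSE)

# No ID variables: join side-by-side, which requires equal numbers of cases.
stopifnot(nrow(wave1) == nrow(wave2))
side.by.side <- cbind(wave1, wave2["q2"])
```

In the analogy, cases that appear in only one data set receive missing values for the other data set's variables when all cases are kept.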

Usage

MergeDataSetsByVariable(
  data.set.names,
  merged.data.set.name = NULL,
  id.variables = NULL,
  include.or.omit.variables = rep("Include all variables except those manually omitted",
    length(data.set.names)),
  variables.to.include.or.omit = NULL,
  only.keep.cases.matched.to.all.data.sets = FALSE,
  include.merged.data.set.in.output = FALSE
)

Arguments

data.set.names

A character vector of names of data sets from the Displayr cloud drive to merge (if run from Displayr) or file paths of local data sets.

merged.data.set.name

A character scalar of the name of the merged data set in the Displayr cloud drive (if run from Displayr) or the local file path of the merged data set.

id.variables

A character vector of ID variable names, one for each data set. ID variables should generally contain unique IDs; where an ID is duplicated, it may be duplicated in at most one data set. The ID variable in the merged data set takes its name and label from the ID variable of the first input data set. Set to NULL if ID variables are not used, in which case the input data sets are simply combined side-by-side and must have the same number of cases.

include.or.omit.variables

A character vector where each element corresponds to an input data set and indicates how that data set's variables are selected for the merged data set: either by specifying the variables to include ("Only include manually specified variables") or by specifying the variables to omit ("Include all variables except those manually omitted").

variables.to.include.or.omit

A list of character vectors corresponding to each data set. Each element in a character vector contains comma-separated names of variables to include or omit (depending on the option for the data set in include.or.omit.variables). Ranges of variables can be specified by separating variable names by '-'. Wildcard matching of names is supported using the asterisk character '*'.

only.keep.cases.matched.to.all.data.sets

A logical scalar which controls whether to only keep cases if they are present in all data sets, and discard the rest.

include.merged.data.set.in.output

A logical scalar which controls whether to include the merged data set in the output object, which can be used for diagnostic purposes in R.
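To make the variable specification syntax concrete, here is a minimal base R sketch of how a comma-separated string with a wildcard and a range could be expanded against a data set's variable names. The variable names, the specification string and the helper expandTerm are hypothetical, and the package's own parsing may differ (for example, in how it validates names or handles edge cases).

```r
# Hypothetical variable names in one input data set, in data set order.
variable.names <- c("ID", "Q1_a", "Q1_b", "Q2", "Q3", "Weight")

# A specification of the kind described above: a wildcard and a range.
specification <- "Q1_*, Q2-Q3"

# Split on commas and trim surrounding whitespace.
terms <- trimws(strsplit(specification, ",", fixed = TRUE)[[1]])

# Hypothetical helper: expand one term into matching variable names.
expandTerm <- function(term, nms) {
    if (grepl("*", term, fixed = TRUE)) {
        # Wildcard: convert the glob pattern to a regular expression.
        return(nms[grepl(utils::glob2rx(term), nms)])
    }
    if (grepl("-", term, fixed = TRUE)) {
        # Range: every variable from the first name to the last, inclusive.
        ends <- trimws(strsplit(term, "-", fixed = TRUE)[[1]])
        return(nms[match(ends[1], nms):match(ends[2], nms)])
    }
    term
}

selected <- unique(unlist(lapply(terms, expandTerm, nms = variable.names)))

# "Only include manually specified variables" keeps the selected variables;
# "Include all variables except those manually omitted" keeps the rest.
kept.if.include <- selected
kept.if.omit <- setdiff(variable.names, selected)
```

With these made-up names, the same specification keeps Q1_a, Q1_b, Q2 and Q3 under the include option, and ID and Weight under the omit option.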

Value

A list of class MergeDataSetByVariable with the following elements:

  • merged.data.set If include.merged.data.set.in.output is TRUE, this is a data frame of the merged data set.

  • input.data.sets.metadata A list containing metadata on the input data sets such as variable names, labels etc. See the function metadataFromDataSets for more information.

  • merged.data.set.metadata A list containing metadata on the merged data set such as variable names, labels etc. See the function metadataFromDataSet for more information.

  • source.data.set.indices An integer vector corresponding to the variables in the merged data set. Each element contains the index of the input data set from which the variable originated. The data set index for the ID variable will be 1 even though ID variables are present in all data sets when ID variables are specified.

  • omitted.variable.names.list A list whose elements correspond to the input data sets. Each element contains the names of variables from a data set that were omitted from the merged data set.

  • merged.id.variable.name A character scalar of the name of the ID variable in the merged data set. It is NULL if there is no ID variable.

  • id.variable.names A character vector corresponding to the input data sets. Each element is an ID variable name from an input data set.

  • example.id.values A character vector corresponding to the input data sets. Each element is an example ID value from an ID variable from an input data set.

  • is.saved.to.cloud A logical scalar indicating whether the merged data set was saved to the Displayr cloud drive.

Examples

path <- c(system.file("examples", "cola15.sav", package = "flipData"),
          system.file("examples", "cola16.sav", package = "flipData"))
print(MergeDataSetsByVariable(path, id.variables = c("Attr1","PartyID")))
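A variation on the example above, sketched under the assumption that flipData is installed and that the bundled example files merge as in the original call. It adds include.merged.data.set.in.output = TRUE so that the merged data frame and a few of the documented output elements can be inspected. Only arguments and elements described on this page are used.

```r
# Runs only if flipData is available (assumption: installed locally).
if (requireNamespace("flipData", quietly = TRUE)) {
    path <- c(system.file("examples", "cola15.sav", package = "flipData"),
              system.file("examples", "cola16.sav", package = "flipData"))

    result <- flipData::MergeDataSetsByVariable(
        path,
        id.variables = c("Attr1", "PartyID"),
        include.merged.data.set.in.output = TRUE
    )

    # Dimensions of the merged data frame (cases by variables).
    print(dim(result$merged.data.set))

    # Number of merged variables that originated from each input data set.
    print(table(result$source.data.set.indices))

    # Name of the ID variable in the merged data set.
    print(result$merged.id.variable.name)
}
```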

NumbersInternational/flipData documentation built on March 2, 2024, 10:52 a.m.