View source: R/mergedatasetsbycase.R
MergeDataSetsByCase | R Documentation |
Merges multiple data sets by case where the data sets contain similar variables but different cases, e.g., data sets from different time periods.
MergeDataSetsByCase(
data.set.names,
merged.data.set.name = NULL,
auto.select.what.to.match.by = TRUE,
match.by.variable.names = TRUE,
match.by.variable.labels = TRUE,
match.by.value.labels = TRUE,
ignore.case = TRUE,
ignore.non.alphanumeric = TRUE,
min.match.percentage = 90,
variables.to.combine = NULL,
variables.to.not.combine = NULL,
variables.to.keep = NULL,
variables.to.omit = NULL,
include.merged.data.set.in.output = FALSE,
when.multiple.labels.for.one.value = "Create new values for the labels",
use.names.and.labels.from = "First data set",
data.sets.whose.variables.are.kept = seq_along(data.set.names),
min.value.label.match.percentage = 90
)
data.set.names |
A character vector of names of data sets from the Displayr cloud drive to merge (if run from Displayr) or file paths of local data sets. |
merged.data.set.name |
A character scalar of the name of the merged data set in the Displayr cloud drive (if run from Displayr) or the local file path of the merged data set. |
auto.select.what.to.match.by |
If TRUE, the metadata to match by is
chosen automatically, whereas if FALSE, the metadata to match by is
specified by setting the flags match.by.variable.names, match.by.variable.labels and match.by.value.labels. |
match.by.variable.names |
Logical scalar indicating whether to match using variable names. |
match.by.variable.labels |
Logical scalar indicating whether to match using variable labels. |
match.by.value.labels |
Logical scalar indicating whether to match using value labels of categorical variables. |
ignore.case |
Logical scalar indicating whether to ignore case when matching text (variable names and labels and value labels). |
ignore.non.alphanumeric |
Logical scalar indicating whether to ignore non-alphanumeric characters when matching text (variable names and labels and value labels). When numeric characters appear both before and after the non-alphanumeric characters, e.g., "24 - 29", the characters are still ignored but the separation between the numbers is noted. |
min.match.percentage |
A numeric scalar of a percentage (number from 0 to 100) which determines how close matches need to be in order for matches to be accepted. Applies to variable names and labels and value labels. |
variables.to.combine |
A character vector of comma-separated variable names indicating which variables are to appear together. Ranges of variables can be specified by separating variable names by '-'. Variables can be specified from specific data sets by appending '(x)' to the variable name where x is the data set index. |
variables.to.not.combine |
A character vector of comma-separated variable names specifying variables that should never be combined together. To specify variables from a specific data set, suffix variable names with the data set index in parentheses, e.g., 'Q2(3)'. |
variables.to.keep |
Character vector of variable names to keep in the merged data set. To specify variables from a specific data set, suffix the name with the data set index in parentheses, e.g., 'Q2(3)'. Ranges of variables can be specified by separating variable names by '-'. Wildcard matching of names is supported using the asterisk character '*'. This parameter is only useful when data.sets.whose.variables.are.kept is used (i.e., when variables are left out). |
variables.to.omit |
Character vector of variable names to omit from the merged data set. To specify variables from a specific data set, suffix the name with the data set index in parentheses, e.g., 'Q2(3)'. Ranges of variables can be specified by separating variable names by '-'. Wildcard matching of names is supported using the asterisk character '*'. |
include.merged.data.set.in.output |
A logical scalar which controls whether to include the merged data set in the output object, which can be used for diagnostic purposes in R. |
when.multiple.labels.for.one.value |
Character scalar that is either "Use label from preferred data set" or "Create new values for the labels". When the former is the case, the label from the earliest/latest data set will be chosen if use.names.and.labels.from is "First data set"/"Last data set". If the latter is the case, new values are generated for the extra labels. |
use.names.and.labels.from |
Character scalar that is either "First data set" or "Last data set". This sets the preference for either the first or last data set when choosing which names and labels to use in the merged data set. |
data.sets.whose.variables.are.kept |
An integer vector of data set indices. A merged variable is only included if it contains input variables from these data sets. |
min.value.label.match.percentage |
Numeric scalar of the minimum percentage match for value labels to be considered the same when combining value attributes from different variables. |
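The text-matching options above (ignore.case, ignore.non.alphanumeric and min.match.percentage) can be pictured with a small base R sketch. It is an illustration only: normalize_text and match_percentage are hypothetical helpers, not flipData functions, and the edit-distance similarity is an assumption, since this page does not state which measure is used.

```r
# Hypothetical helper (not part of flipData): normalize text before comparing.
normalize_text <- function(x, ignore.case = TRUE, ignore.non.alphanumeric = TRUE) {
    if (ignore.case)
        x <- tolower(x)
    if (ignore.non.alphanumeric) {
        # Mark separators that sit between two digits so that "24 - 29" stays
        # distinct from "2429" (the marker "|" is an arbitrary choice)
        x <- gsub("(?<=[0-9])[^[:alnum:]]+(?=[0-9])", "|", x, perl = TRUE)
        # Drop every remaining non-alphanumeric character except the marker
        x <- gsub("[^[:alnum:]|]", "", x, perl = TRUE)
    }
    x
}

# Hypothetical similarity (assumed, edit-distance based): 100 means identical.
match_percentage <- function(a, b) {
    d <- drop(utils::adist(a, b))
    100 * (1 - d / max(nchar(a), nchar(b)))
}

a <- normalize_text("Preferred cola?")
b <- normalize_text("PREFERED COLA")
# Accept the pair if the similarity reaches the threshold (default 90)
is.match <- match_percentage(a, b) >= 90
```

Under these assumptions, the two labels differ by a single character after normalization, so they would be accepted at the default threshold of 90 but rejected at a stricter setting such as 95.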
A list with the following elements:
merged.data.set
If include.merged.data.set.in.output is TRUE, this is a data frame of the merged data set.
input.data.sets.metadata
A list containing metadata on the
input data sets such as variable names, labels etc. See the function
metadataFromDataSets
for more information.
merged.data.set.metadata
A list containing metadata on the
merged data set such as variable names, labels etc. See the function
metadataFromDataSet
for more information.
matched.names
A character matrix whose rows correspond to the
variables in the merged data set. The elements in each row correspond to
the input data sets and contain the names of the variables from the input
data sets that have been combined together to create a merged variable.
This matrix also has the attributes "is.fuzzy.match" and "matched.by".
is.fuzzy.match is a logical matrix of the same size as matched.names
indicating if an input variable was matched using fuzzy matching.
matched.by is a character matrix of the same size as matched.names
containing the strings "Variable name", "Variable label", "Value label"
and "Manual" indicating what data was used to match an input variable or
if the variable was matched manually.
merged.names
A character vector containing the names of the
variables in the merged data set.
omitted.variable.names.list
A list whose elements correspond to the
input data sets. Each element is a character vector that contains the
names of variables from an input data set that have been omitted from the
merged data set.
input.value.attributes.list
A list whose elements correspond to the
variables in the merged data set. Each element is another list whose
elements correspond to the input data sets, with each of these elements
containing a named numeric vector representing the values and value labels
of a categorical input variable. This is NULL if the input variable is
not categorical.
is.saved.to.cloud
Logical scalar that indicates whether the
merged data set was saved to the Displayr cloud drive.
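Value attributes are described above as named numeric vectors. The sketch below uses made-up labels to illustrate what the two settings of when.multiple.labels.for.one.value could mean when one value carries different labels in two data sets. It assumes the labels are stored as names and the values as elements; the helper names and the rule for numbering new values are assumptions, not flipData's implementation.

```r
# Made-up value attributes for the same variable in two data sets
first <- c("Coke" = 1, "Pepsi" = 2)
last <- c("Coke" = 1, "Diet Coke" = 2)   # value 2 has a different label here

# "Use label from preferred data set": keep the preferred labels and only
# add values from the other data set that the preferred one lacks
use_preferred <- function(preferred, other) {
    extra <- other[!(other %in% preferred)]
    c(preferred, extra)
}

# "Create new values for the labels": a label whose value is already taken
# receives a new value (assumed here to be the current maximum plus one)
create_new_values <- function(preferred, other) {
    merged <- preferred
    for (label in names(other)) {
        if (label %in% names(merged))
            next
        value <- other[[label]]
        if (value %in% merged)
            value <- max(merged) + 1
        merged[label] <- value
    }
    merged
}

preferred.result <- use_preferred(first, last)
new.values.result <- create_new_values(first, last)
```

With use.names.and.labels.from set to "Last data set", the same sketch would be called as use_preferred(last, first), so "Diet Coke" would be the label kept for value 2.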
data.set.names <- c(system.file("examples", "cola1.sav", package = "flipData"),
                    system.file("examples", "cola2.sav", package = "flipData"),
                    system.file("examples", "cola5.sav", package = "flipData"),
                    system.file("examples", "cola8.sav", package = "flipData"))
print(MergeDataSetsByCase(data.set.names = data.set.names,
                          data.sets.whose.variables.are.kept = 1,
                          variables.to.combine = "Q4_A_3,Q4_A_3_new"))
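A possible variation on the example, sketched under the assumption that the flipData package and its example files are installed: it requests the merged data frame and inspects a few of the output elements documented above. The variable name result is chosen here for illustration, and the code is skipped when flipData is not available.

```r
# Runs only if flipData is installed (assumption); otherwise does nothing
if (requireNamespace("flipData", quietly = TRUE)) {
    data.set.names <- c(system.file("examples", "cola1.sav", package = "flipData"),
                        system.file("examples", "cola2.sav", package = "flipData"))
    result <- flipData::MergeDataSetsByCase(data.set.names = data.set.names,
                                            include.merged.data.set.in.output = TRUE)
    # Names of the variables in the merged data set
    print(head(result$merged.names))
    # Metadata used to match each input variable (attribute of matched.names)
    print(head(attr(result$matched.names, "matched.by")))
    # Dimensions of the merged data frame, present because
    # include.merged.data.set.in.output = TRUE
    print(dim(result$merged.data.set))
}
```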