Nothing
#' @title Update the correspondence table between statistical classifications A and B when A has been updated to version A*
#' @description Update the correspondence table between statistical classifications A and B when A has been updated to version A*.
#' @param A A string of the type \code{character} containing the name of a csv file that contains the original classification A.
#' @param B A string of the type \code{character} containing the name of a csv file that contains classification B.
#' @param AStar A string of the type \code{character} containing the name of a csv file that contains the updated version A*.
#' @param AB A string of the type \code{character} containing the name of a csv file that contains the previous correspondence table A:B.
#' @param AAStar A string of the type character containing the name of a csv file that contains the \emph{concordance table} A:A*,
#' which contains the mapping between the codes of the two versions of the classification.
#' @param CSVout The preferred name for the \emph{output csv files} that will contain the updated correspondence table and
#' information about the classifications involved. The valid values are \code{NULL} or strings of type \code{character}. If
#' the selected value is \code{NULL}, the default, no output file is produced. If the value is a string, then the output is
#' exported into two csv files whose names contain the provided name (see "Value" below).
#' @param Reference The reference classification among A and B. If a classification is the reference to the other, and
#' hence \emph{hierarchically superior} to it, each code of the other classification is expected to be mapped to at most one
#' code of the reference classification. The valid values are \code{"none"}, \code{"A"}, and \code{"B"}. If the selected
#' value is \code{"A"} or \code{"B"}, a "Review" flag column is included in the output (see "Explanation of the flags" below).
#' @param MismatchToleranceB The maximum acceptable proportion of rows in the updated correspondence table which contain no
#' code of the target classification B, among those which contain a code of A, of A*, or of both. The default value
#' is \code{0.2}. The valid values are real numbers in the interval [0, 1].
#' @param MismatchToleranceAStar The maximum acceptable proportion of rows in the updated correspondence table which contain
#' no code of the updated classification A*, among those which contain a code of A, of B, or of both. The default value
#' is \code{0.2}. The valid values are real numbers in the interval [0, 1].
#' @export
#' @details
#' File and file name requirements:
#' \itemize{
#' \item The files that correspond to arguments \code{A}, \code{B}, \code{AStar}, \code{AB}, \code{AAStar} must be
#' in \emph{csv format with comma as delimiter}. If full paths are not provided, then these files must be available
#' in the working directory. No two filenames provided must be identical.
#' \item If any of the two files where the output will be stored is read protected (for instance because it is open
#' elsewhere) an error message will be reported and execution will be halted.
#' }
#' Classification table requirements:
#' \itemize{
#' \item The files that correspond to arguments \code{A}, \code{B} and \code{AStar} must contain at least one column
#' and at least two rows. The first column contains the codes of the respective classification. The first row contains
#' column headers. The name of the first column is the name of the respective classification (e.g., "CN 2021").
#' \item The classification codes contained in a classification file (expected in its first column as mentioned above)
#' must be unique. No two identical codes are allowed in the column.
#' \item If any of the files that correspond to arguments \code{A}, \code{B} and \code{AStar} has additional columns
#' the first one of them is considered as containing the labels of the respective classification codes.
#' }
#' Correspondence and concordance table requirements:
#' \itemize{
#' \item The files that correspond to arguments \code{AB} and \code{AAStar} must contain at least two columns and at least
#' two rows. The first column of the file that corresponds to \code{AB} contains the codes of classification A. The second
#' column contains the codes of classification B. Similar requirements apply to the file that corresponds to \code{AAStar}.
#' The first row of each of these files contains column headers. The names of the first two columns are the names of the
#' respective classifications.
#' \item The pairs of classification codes contained in the concordance and the correspondence table files (expected in
#' their first two columns as mentioned above) must be unique. No two identical pairs of codes are allowed in the first
#' two columns.
#' }
#' Interdependency requirements:
#' \itemize{
#' \item At least one code of classification A must appear in both the file of concordance table A:A* and the file of
#' correspondence table A:B.
#' \item At least one code of classification A* must appear in both the file of
#' classification A* and the file of
#' concordance table A:A*.
#' \item At least one code of classification B must appear in both the file of classification B and the file of
#' correspondence table A:B.
#' }
#' Mismatch tolerance:
#' \itemize{
#' \item The ratio that is compared with \code{MismatchToleranceB} has as numerator the number of rows of the updated
#' correspondence table which contain a code for A, for A*, or for both, but no code for B and as denominator the number of
#' rows which contain a code for A, for A*, or for both (regardless of whether there is a code for B or not). If the ratio
#' exceeds \code{MismatchToleranceB} the execution of the function is halted.
#' \item The ratio that is compared with \code{MismatchToleranceAStar} has as numerator the number of rows of the updated
#' correspondence table which contain a code for A, for B, or for both, but no code for A* and as denominator the number of
#' rows which contain a code for A, for B*, or for both (regardless of whether there is a code
#' for A* or not). If the ratio exceeds \code{MismatchToleranceAStar} the execution of the function is halted.
#' }
#' If any of the conditions required from the arguments is violated an error message is produced and execution is stopped.
#'
#'
#' @return
#' \code{updateCorrespondenceTable()} returns a list with two elements, both of which are data frames.
#' \itemize{
#' \item The first element is the updated correspondence table A*:B augmented with flags "CodeChange", "Review" (if
#' applicable), "Redundancy", "NoMatchToAStar", "NoMatchToB", "NoMatchFromAStar", "NoMatchFromB", "LabelChange", and
#' with all the additional columns of the \code{A}, \code{B}, \code{AStar}, \code{AB} and \code{AAStar} files.
#' \item The second element contains the names of the original classification A, the target classification B, and the
#' updated version A*, as read from the top left-hand side cell of the respective input files.
#' \item If the value of argument \code{CSVout} is a string of type \code{character}, the elements of the list are
#' exported into files of csv format. The name of the file for the first element is the value of argument \code{CSVout}
#' and the name of the file for the second element is classificationNames_\code{CSVout}. For example, if
#' \code{CSVout} = "updateCorrespondenceTable.csv", the elements of the list are exported into
#' "updateCorrespondenceTable.csv" and "classificationNames_updateCorrespondenceTable.csv", respectively.
#' }
#'
#' @section Explanation of the flags:
#'
#' \itemize{
#' \item For each row of the updated correspondence table, the value of "CodeChange" is equal to \code{1} if the code of
#' A contained in this row maps -in this or any other row of the table- to a different code of A*, and \code{0} otherwise.
#' The value of "CodeChange" is empty if either the code of A, or the code of A*, or both are missing.
#' \item The "Review" flag is produced only if argument \code{Reference} has been set equal to "\code{A}" or "\code{B}".
#' For each row of the updated correspondence table, if \code{Reference} = "\code{A}" the value of "Review" is equal to
#' \code{1} if the code of B maps to more than one code of A*, and \code{0} otherwise. If \code{Reference} = "\code{B}" the
#' value of "Review" is equal to \code{1} if the code of A* maps to more than one code of B, and \code{0} otherwise. The value
#' of the flag is empty if either the code of A*, or the code of B, or both are missing.
#' \item For each row of the updated correspondence table, the value of "Redundancy" is equal to \code{1} if the row contains
#' a combination of codes of A* and B that also appears in at least one other row of the updated correspondence table. The
#' value of the flag is empty if both the code of A* and the code of B are missing.
#' \item For each row of the updated correspondence table, the value of "NoMatchToAStar" is equal to \code{1} if there is a
#' code for A, for B, or for both, but no code for A*. The value of the flag is \code{0} if there are codes for both A and
#' A* (regardless of whether there is a code for B or not). Finally, the value of "NoMatchToAStar" is empty if neither A nor B
#' have a code in this row.
#' \item For each row of the updated correspondence table, the value of "NoMatchToB" is equal to \code{1} if there is a code
#' for A, for A*, or for both, but no code for B. The value of the flag is \code{0} if there are codes for both A and B
#' (regardless of whether there is a code for A* or not). Finally, the value of "NoMatchToB" is empty if neither A nor
#' A* have a code in this row.
#' \item For each row of the updated correspondence table, the value of "NoMatchFromAStar" is equal to \code{1} if the row
#' contains a code of A* that appears in the table of classification A* but not in the concordance table A:A*. The value of
#' the flag is \code{0} if the row contains a code of A* that appears in both the table of classification
#' A* and the concordance table A:A*. Finally, the value of the flag is empty if the row contains no code of A* or if it
#' contains a code of A* that appears in the concordance table A:A* but not in the table of classification A*.
#' \item For each row of the updated correspondence table, the value of "NoMatchFromB" is equal to \code{1} if the row
#' contains a code of B that appears in the table of classification B but not in the correspondence table A:B. The value of
#' the flag is \code{0} if the row contains a code of B that appears in both the table of classification B and the
#' correspondence table A:B. Finally, the value of the flag is empty if the row contains no code of B or if it contains a code
#' of B that appears in the correspondence table A:B but not in the table of classification B.
#' \item For each row of the updated correspondence table, the value of "LabelChange" is equal to \code{1} if the labels of
#' the codes of A and A* are different, and \code{0} if they are the same. Finally, the value of "LabelChange" is empty if
#' either of the labels, or both labels, are missing. Lower and upper case are considered the same, and punctuation characters
#' are ignored when comparing code labels.
#' }
#' @section Sample datasets included in the package:
#'
#' Running \code{browseVignettes("correspondenceTables")} in the console opens an html page in the user's default browser.
#' Selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the
#' package.
#' If they wish to access the csv files with the sample data, users have two options:
#' \itemize{
#' \item Option 1: Unpack into any folder of their choice the tar.gz file into which the package has arrived. All sample datasets
#' may be found in the "inst/extdata" subfolder of this folder.
#' \item Option 2: Go to the "extdata" subfolder of the folder in which the package has been installed in their PC's \code{R}
#' library. All sample datasets may be found there.
#' }
#' @examples
#' {
#' ## Application of function updateCorrespondenceTable() with NAICS 2017 being the
#' ## original classification A, NACE being the target classification B, NAICS 2022
#' ## being the updated version A*, NAICS 2017:NACE being the previous correspondence
#' ## table A:B, and NAICS 2017:NAICS 2022 being the A:A* concordance table. The desired
#' ## name for the csv file that will contain the updated correspondence table is
#' ## "updateCorrespondenceTable.csv", there is no reference classification, and the
#' ## maximum acceptable proportions of unmatched codes between the original
#' ## classification A and the target classification B, and between the original
#' ## classification A and the updated classification A* are 0.5 and 0.3, respectively.
#'
#' tmp_dir<-tempdir()
#' A <- system.file("extdata", "NAICS2017.csv", package = "correspondenceTables")
#' AStar <- system.file("extdata", "NAICS2022.csv", package = "correspondenceTables")
#' B <- system.file("extdata", "NACE.csv", package = "correspondenceTables")
#' AB <- system.file("extdata", "NAICS2017_NACE.csv", package = "correspondenceTables")
#' AAStar <- system.file("extdata", "NAICS2017_NAICS2022.csv", package = "correspondenceTables")
#'
#' UPC <- updateCorrespondenceTable(A,
#' B,
#' AStar,
#' AB,
#' AAStar,
#' file.path(tmp_dir,"updateCorrespondenceTable.csv"),
#' "none",
#' 0.5,
#' 0.3)
#'
#' summary(UPC)
#' head(UPC$updateCorrespondenceTable)
#' UPC$classificationNames
#' csv_files<-list.files(tmp_dir, pattern = ".csv")
#' if (length(csv_files)>0) unlink(csv_files)
#' }
updateCorrespondenceTable <- function(A, B, AStar, AB, AAStar, CSVout = NULL, Reference = "none",
MismatchToleranceB = 0.2, MismatchToleranceAStar = 0.2) {
# Check if files exist in working directory
test.names <- as.character(c(A, B, AStar, AB, AAStar))
if (!all(file.exists(test.names))) {
for (i in which(file.exists(test.names) == FALSE)) {
stop(simpleError(paste("There is no file with name", test.names[i], "in your working directory.")))
}
}
if (length(unique(test.names)) != 5) {
stop(simpleError("At least two of your filenames are the same."))
}
# Check CSVout
if (!is.null(CSVout)) {
while (file.exists(CSVout)) {
message(paste("Your working directory contains already a file with the name that you selected for the output file: ",
CSVout))
answer <- utils::menu(c("Yes", "No"), title = "Do you want to overwrite it?")
if (answer == 2) {
CSVout <- readline(prompt = "Please enter a new name for the output file: ")
}
if (answer == 1) {
break
}
}
}
# Check Reference
if (!(Reference %in% c("A", "B", "none"))) {
stop(simpleError("You entered a non-allowed value for Reference. The allowed values are \"A\", \"B\" and \"none\"."))
}
# Check MismatchToleranceB
if (is.character(MismatchToleranceB) || MismatchToleranceB < 0 || MismatchToleranceB >
1) {
stop(simpleError("You entered a non-allowed value for MismatchToleranceB. The allowed values are numbers in the interval [0, 1]."))
}
# Check MismatchToleranceAStar
if (is.character(MismatchToleranceAStar) || MismatchToleranceAStar < 0 || MismatchToleranceAStar >
1) {
stop(simpleError("You entered a non-allowed value for MismatchToleranceAStar. The allowed values are numbers in the interval [0, 1]."))
}
removeBOM <- function(headers) {
gsub("\\xef\\xbb\\xbf", "", headers, useBytes = T)
}
# The following code lines read the classifications A, AStar, and B.
classA <- utils::read.csv(A, sep = ",", header = TRUE, check.names = FALSE, colClasses = c("character"),
encoding = "UTF-8")
colnames(classA) <- removeBOM(colnames(classA))
classAStar <- utils::read.csv(AStar, sep = ",", header = TRUE, check.names = FALSE,
colClasses = c("character"), encoding = "UTF-8")
colnames(classAStar) <- removeBOM(colnames(classAStar))
classB <- utils::read.csv(B, sep = ",", header = TRUE, check.names = FALSE, colClasses = c("character"),
encoding = "UTF-8")
colnames(classB) <- removeBOM(colnames(classB))
# The following code lines read the correspondence tables AAStar and AB.
corrAAStar <- utils::read.csv(AAStar, sep = ",", header = TRUE, check.names = FALSE,
colClasses = c("character"), encoding = "UTF-8")
colnames(corrAAStar) <- removeBOM(colnames(corrAAStar))
corrAB <- utils::read.csv(AB, sep = ",", header = TRUE, check.names = FALSE, colClasses = c("character"),
encoding = "UTF-8")
colnames(corrAB) <- removeBOM(colnames(corrAB))
# The correspondence tables without codes in A
if (length(which(corrAAStar[, 1] == "" & corrAAStar[, 2] != "")) >= 1) {
NoCorrAAStar <- corrAAStar[-which(corrAAStar[, 1] == "" & corrAAStar[, 2] !=
""), ]
} else {
NoCorrAAStar <- corrAAStar
}
if (length(which(corrAB[, 1] == "" & corrAB[, 2] != "")) >= 1) {
NoCorrAB <- corrAB[-which(corrAB[, 1] == "" & corrAB[, 2] != ""), ]
} else {
NoCorrAB <- corrAB
}
# Check the dimensions of the files
test.dimClass <- list()
test.dimClass[[1]] = classA
test.dimClass[[2]] = classB
test.dimClass[[3]] = classAStar
for (i in 1:3) {
if (ncol(test.dimClass[[i]]) < 1 || nrow(test.dimClass[[i]]) < 1) {
stop(simpleError(paste("File", test.names[i], "should have at least one column and two rows (including the row of headers).")))
}
}
test.dimCorr <- list()
test.dimCorr[[1]] = corrAB
test.dimCorr[[2]] = corrAAStar
for (i in 1:2) {
if (ncol(test.dimCorr[[i]]) <= 1 || nrow(test.dimCorr[[i]]) < 1) {
stop(simpleError(paste("File", test.names[i + 3], "should have at least two columns and two rows (including the row of headers).")))
}
}
# Check for unique entries in classifications.
for (i in 1:3) {
if (sum(duplicated(test.dimClass[[i]][, 1])) >= 1) {
stop(simpleError(paste("At least one code of ", colnames(test.dimClass[[i]])[1],
" appears more than once in file ", test.names[i], ". This is an error. Each code must appear only once in the file.",
sep = "")))
}
}
# Check for unique entries in correspondence tables.
for (i in 1:2) {
if (nrow(test.dimCorr[[i]][, 1:2]) != nrow(unique(test.dimCorr[[i]][, 1:2]))) {
stop(simpleError(paste("At least one pair of codes of ", colnames(test.dimCorr[[i]])[1],
" and ", colnames(test.dimCorr[[i]])[2], " appears more than once in file ",
test.names[i + 3], ". This is an error. Each pair of codes must appear only once in the file.",
sep = "")))
}
}
# Check for at least one match in classifications and correspondence
# tables.
if (sum(!is.na(match(classAStar[, 1], corrAAStar[, 2]))) == 0) {
stop(simpleError(paste("There is no code of ", colnames(classAStar)[1], " that appears in both ",
test.names[3], " and ", test.names[5], ". This is an error. The files should have at least one code of ",
colnames(classAStar)[1], " in common to allow the generation of the candidate correspondence table.",
sep = "")))
}
if (sum(!is.na(match(corrAAStar[, 1], corrAB[, 1]))) == 0) {
stop(simpleError(paste("There is no code of ", colnames(corrAAStar)[1], " that appears in both ",
test.names[4], " and ", test.names[5], ". This is an error. The files should have at least one code of ",
colnames(corrAAStar)[1], " in common to allow the generation of the candidate correspondence table.",
sep = "")))
}
if (sum(!is.na(match(classB[, 1], corrAB[, 2]))) == 0) {
stop(simpleError(paste("There is no code of ", colnames(classB)[1], " that appears in both ",
test.names[2], " and ", test.names[4], ". This is an error. The files should have at least one code of ",
colnames(classB)[1], " in common to allow the generation of the candidate correspondence table.",
sep = "")))
}
if (sum(!is.na(match(classA[, 1], corrAAStar[, 1]))) == 0) {
message(paste("WARNING: there is no code of ", colnames(classA)[1], " that appears in both ",
test.names[1], " and ", test.names[5], ". When the execution of the function is over, please check the files to ensure that this is not the result of a mistake in their preparation or declaration.",
sep = ""))
}
if (sum(!is.na(match(classA[, 1], corrAB[, 1]))) == 0) {
message(paste("WARNING: there is no code of ", colnames(classA)[1], " that appears in both ",
test.names[1], " and ", test.names[4], ". When the execution of the function is over, please check the files to ensure that this is not the result of a mistake in their preparation or declaration.",
sep = ""))
}
# The following if statement checks which classifications is the reference
# one (if any). Based on which classification is the reference one (if
# any), idx and idx.thres are created, in order to be used for the creation
# of the 'review' flag and 'label change' flag.
tryCatch({
if (Reference == "A") {
idx <- 2
idx.thres <- 7
} else if (Reference == "B") {
idx <- 1
idx.thres <- 7
} else if (Reference == "none") {
idx <- 3
idx.thres <- 6
}
# The codeChange function constructs the 'code change' flag.
codeChange <- function(X) {
diffX <- as.matrix(X[, 1:2])
Diff <- rep(0, nrow(diffX))
Diff[which(diffX[, 1] != diffX[, 2])] <- 1
difference <- data.frame(diffX, Diff)
dif1 <- difference[order(difference[, 1]), ]
dif1.values <- lapply(split(seq_along(dif1[, 1]), dif1[, 1]), function(x) {
dif1[x, 3]
})
dif1 <- cbind(dif1, unlist(mapply(rep, lapply(dif1.values, max), lapply(dif1.values,
length))))
dif2 <- difference[order(difference[, 2]), ]
dif2.values <- lapply(split(seq_along(dif2[, 2]), dif2[, 2]), function(x) {
dif2[x, 3]
})
dif2 <- cbind(dif2, unlist(mapply(rep, lapply(dif2.values, max), lapply(dif2.values,
length))))
dif2.new <- dif2[order(dif2[, 1]), ]
final.diff <- cbind(dif1, dif2.new[, 4])
final.diff <- cbind(dif1, dif2.new[, 4], apply(final.diff[, 4:5], 1,
max))
return(final.diff)
}
# The review function constructs the 'review' flag.
review <- function(X, Y) {
X1 <- X[!is.na(match(X[, 1], Y[, 1])), ]
Y1 <- Y[!is.na(match(Y[, 1], X[, 1])), ]
X1 <- X1[order(X1[, 1]), ]
Y1 <- Y1[order(Y1[, 1]), ]
x1 <- unlist(lapply(split(seq_along(X1[, 1]), X1[, 1]), length))
y1 <- unlist(lapply(split(seq_along(Y1[, 1]), Y1[, 1]), length))
Review <- data.frame(X1[rep(1:nrow(X1), rep(y1, x1)), 1:2], Y1[unlist(rep(split(seq_along(Y1[,
1]), Y1[, 1]), x1)), 2], 0)
colnames(Review) <- c("R1", "R2", "R3", "R4")
q1 <- 0
if (length(which(is.na(match(Y[, 1], X[, 1])) == TRUE)) >= 1) {
Y2 <- matrix(unlist(Y[is.na(match(Y[, 1], X[, 1])), 1:2]), ncol = 2)
Y2 <- cbind(Y2[, 1], rep("", nrow(Y2)), Y2[, 2], 0)
colnames(Y2) <- c("R1", "R2", "R3", "R4")
q1 <- 1
}
q2 <- 0
if (length(which(is.na(match(X[, 1], Y[, 1])) == TRUE)) >= 1) {
YY2 <- matrix(unlist(X[is.na(match(X[, 1], Y[, 1])), 1:2]), ncol = 2)
YY2 <- cbind(YY2, rep("", nrow(YY2)), 0)
colnames(YY2) <- c("R1", "R2", "R3", "R4")
q2 <- 1
}
if (q1 == 1) {
Review <- rbind(Review, Y2)
}
if (q2 == 1) {
Review <- rbind(Review, YY2)
}
Review <- Review[!duplicated(Review[, c(1:3)]), ]
F1 <- Review[apply(Review, 1, function(x) {
length(which(x == ""))
}) == 0, ]
F2 <- Review[apply(Review, 1, function(x) {
length(which(x == ""))
}) >= 1, ]
f <- stats::aggregate(unique(F1[, 2:3])[, idx], list(num = unique(F1[, 2:3])[,
idx]), length)[which(stats::aggregate(unique(F1[, 2:3])[, idx], list(num = unique(F1[,
2:3])[, idx]), length)[, 2] > 1), 1]
F1[which(F1[, idx + 1] %in% f), 4] <- 1
Review <- rbind(F1, F2)
ChangeReview <- data.frame(Review[, 1:3], 0, Review[, 4])
ChangeReview[which(ChangeReview[, 1] %in% unique(codeChange(X)[which(codeChange(X)[,
6] == 1), 1])), 4] <- 1
ChangeReview[which(ChangeReview[, 2] == ""), 4] <- ""
return(ChangeReview)
}
# From this point, the 'redundancy' flag starts to be constructed.
X1 <- NoCorrAAStar[!is.na(match(NoCorrAAStar[, 1], NoCorrAB[, 1])), ]
Y1 <- NoCorrAB[!is.na(match(NoCorrAB[, 1], NoCorrAAStar[, 1])), ]
X1 <- X1[order(X1[, 1]), ]
Y1 <- Y1[order(Y1[, 1]), ]
x1 <- unlist(lapply(split(seq_along(X1[, 1]), X1[, 1]), length))
y1 <- unlist(lapply(split(seq_along(Y1[, 1]), Y1[, 1]), length))
Redun <- data.frame(X1[rep(1:nrow(X1), rep(y1, x1)), 1:2], Y1[unlist(rep(split(seq_along(Y1[,
1]), Y1[, 1]), x1)), 2], 0)
colnames(Redun) <- c("R1", "R2", "R3", "R4")
q1 <- 0
if (length(which(is.na(match(NoCorrAB[, 1], NoCorrAAStar[, 1])) == TRUE)) >=
1) {
Y2 <- matrix(unlist(NoCorrAB[is.na(match(NoCorrAB[, 1], NoCorrAAStar[,
1])), 1:2]), ncol = 2)
Y2 <- cbind(Y2[, 1], rep("", nrow(Y2)), Y2[, 2], 0)
colnames(Y2) <- c("R1", "R2", "R3", "R4")
q1 <- 1
}
q2 <- 0
if (length(which(is.na(match(NoCorrAAStar[, 1], NoCorrAB[, 1])) == TRUE)) >=
1) {
YY2 <- matrix(unlist(NoCorrAAStar[is.na(match(NoCorrAAStar[, 1], NoCorrAB[,
1])), 1:2]), ncol = 2)
YY2 <- cbind(YY2, rep("", nrow(YY2)), 0)
colnames(YY2) <- c("R1", "R2", "R3", "R4")
q2 <- 1
}
if (q1 == 1) {
Redun <- rbind(Redun, Y2)
}
if (q2 == 1) {
Redun <- rbind(Redun, YY2)
}
Redun <- Redun[!duplicated(Redun[, c(1:3)]), ]
F1 <- Redun[apply(Redun, 1, function(x) {
length(which(x == ""))
}) == 0, ]
F2 <- Redun[apply(Redun, 1, function(x) {
length(which(x == ""))
}) >= 1, ]
if (idx != 3) {
# The 'redundancy' flag is constructed if the reference
# classification is either A or B.
f1 <- stats::aggregate(F1[, 2:3], by = F1[, 2:3], length)[1:(ncol(F1[, 2:3]) +
1)][which(stats::aggregate(F1[, 2:3], by = F1[, 2:3], length)[1:(ncol(F1[,
2:3]) + 1)][, 3] >= 2), 1:2]
F1[which(apply(F1[, 2:3], 1, paste, collapse = " ") %in% apply(f1, 1,
paste, collapse = " ")), 4] <- 1
Redundancy <- rbind(F1, F2)
ChangeReviewRedundancy <- data.frame(review(NoCorrAAStar, NoCorrAB),
Redundancy[, 4])
ChangeReviewRedundancy <- ChangeReviewRedundancy[order(ChangeReviewRedundancy[,
2], ChangeReviewRedundancy[, 3], decreasing = FALSE), ]
List <- cbind(ChangeReviewRedundancy, 0, 0)
List[which(ChangeReviewRedundancy[, 2] == ""), 7] <- 1
List[which(ChangeReviewRedundancy[, 3] == ""), 8] <- 1
colnames(List) <- c(colnames(classA[1]), colnames(classAStar[1]), colnames(classB[1]),
"CodeChange", "Review", "Redundancy", "NoMatchToAStar", "NoMatchToB")
} else {
# The 'redundancy' flag is constructed if none of classifications A
# and B is the reference one.
if (nrow(F2) >= 1) {
F2 <- cbind(F2, 0)
colnames(F2) <- c("R1", "R2", "R3", "R4", "R5")
}
f1 <- stats::aggregate(F1[, 2:3], by = F1[, 2:3], length)[1:(ncol(F1[, 2:3]) +
1)][which(stats::aggregate(F1[, 2:3], by = F1[, 2:3], length)[1:(ncol(F1[,
2:3]) + 1)][, 3] >= 2), 1:2]
F1 <- data.frame(F1[, 1:3], 0, 0)
F1[which(apply(F1[, 2:3], 1, paste, collapse = " ") %in% apply(f1, 1,
paste, collapse = " ")), 5] <- 1
colnames(F1) <- c("R1", "R2", "R3", "R4", "R5")
ChangeRedundancy <- rbind(F1, F2)
ChangeRedundancy[which(ChangeRedundancy[, 1] %in% unique(codeChange(NoCorrAAStar)[which(codeChange(NoCorrAAStar)[,
6] == 1), 1])), 4] <- 1
ChangeRedundancy[which(ChangeRedundancy[, 2] == ""), 4] <- ""
ChangeRedundancy <- ChangeRedundancy[order(ChangeRedundancy[, 2], ChangeRedundancy[,
3], decreasing = FALSE), ]
List <- cbind(ChangeRedundancy, 0, 0)
List[which(ChangeRedundancy[, 2] == ""), 6] <- 1
List[which(ChangeRedundancy[, 3] == ""), 7] <- 1
colnames(List) <- c(colnames(classA[1]), colnames(classAStar[1]), colnames(classB[1]),
"CodeChange", "Redundancy", "NoMatchToAStar", "NoMatchToB")
}
# Final table and final flags
NoMatchFromAStar <- rep("", nrow(List))
NoMatchFromB <- rep("", nrow(List))
List <- cbind(List, NoMatchFromAStar, NoMatchFromB)
inA1 <- which(is.na(match(classA[, 1], corrAAStar[, 1])) == TRUE)
inA2 <- which(is.na(match(classA[, 1], corrAB[, 1])) == TRUE)
inA <- intersect(inA1, inA2)
if (length(inA) >= 1) {
InA <- cbind(matrix(classA[inA, 1], length(inA), 1), matrix("", length(inA),
2), matrix("", length(inA), idx.thres - 2))
InA <- cbind(InA, matrix("", length(inA), 2))
InA <- data.frame(InA)
colnames(InA) <- colnames(List)
List <- rbind(List, InA)
}
inAStar <- which(is.na(match(classAStar[, 1], corrAAStar[, 2])) == TRUE)
if (length(inAStar) >= 1) {
InAStar <- cbind(matrix("", length(inAStar), 1), matrix(classAStar[inAStar,
1], length(inAStar), 1), matrix("", length(inAStar), idx.thres -
1))
InAStar <- cbind(InAStar, matrix("", length(inAStar), 2))
InAStar <- data.frame(InAStar)
colnames(InAStar) <- colnames(List)
List <- rbind(List, InAStar)
}
noInA <- which(corrAAStar[, 1] == "" & corrAAStar[, 2] != "")
if (length(noInA) >= 1) {
NoInA <- cbind(matrix("", length(noInA), 1), matrix(corrAAStar[noInA,
2], length(noInA), 1), matrix("", length(noInA), idx.thres - 1))
NoInA <- cbind(NoInA, matrix("", length(noInA), 2))
NoInA <- data.frame(NoInA)
colnames(NoInA) <- colnames(List)
List <- rbind(List, NoInA)
}
inB <- which(is.na(match(classB[, 1], corrAB[, 2])) == TRUE)
if (length(inB) >= 1) {
InB <- cbind(matrix("", length(inB), 2), matrix(classB[inB, 1], length(inB),
1), matrix("", length(inB), idx.thres - 2))
InB <- cbind(InB, matrix("", length(inB), 2))
InB <- data.frame(InB)
colnames(InB) <- colnames(List)
List <- rbind(List, InB)
}
noInB <- which(corrAB[, 1] == "" & corrAB[, 2] != "")
if (length(noInB) >= 1) {
NoInB <- cbind(matrix("", length(noInB), 2), matrix(corrAB[noInB, 2],
length(noInB), 1), matrix("", length(noInB), idx.thres - 2))
NoInB <- cbind(NoInB, matrix("", length(noInB), 2))
NoInB <- data.frame(NoInB)
colnames(NoInB) <- colnames(List)
List <- rbind(List, NoInB)
}
# The final NoMatchFrom and NoMatchTo flags are created NoMatchFrom
yesAstarClass <- which(!is.na(match(List[, 2], classAStar[, 1])) == TRUE)
yesAstarCorr <- which(!is.na(match(List[, 2], corrAAStar[, 2])) == TRUE)
noAstarCorr <- which(is.na(match(List[, 2], corrAAStar[, 2])) == TRUE)
List$NoMatchFromAStar[intersect(yesAstarClass, yesAstarCorr)] <- 0
List$NoMatchFromAStar[intersect(yesAstarClass, noAstarCorr)] <- 1
yesBClass <- which(!is.na(match(List[, 3], classB[, 1])) == TRUE)
yesBCorr <- which(!is.na(match(List[, 3], corrAB[, 2])) == TRUE)
noBCorr <- which(is.na(match(List[, 3], corrAB[, 2])) == TRUE)
List$NoMatchFromB[intersect(yesBClass, yesBCorr)] <- 0
List$NoMatchFromB[intersect(yesBClass, noBCorr)] <- 1
# NoMatchTo
noA <- which(List[, 1] == "")
yesA <- which(List[, 1] != "")
noAstar <- which(List[, 2] == "")
yesAstar <- which(List[, 2] != "")
noB <- which(List[, 3] == "")
yesB <- which(List[, 3] != "")
List$NoMatchToAStar <- 1
List$NoMatchToAStar[intersect(intersect(noA, noB), yesAstar)] <- ""
List$NoMatchToAStar[intersect(intersect(yesA, yesAstar), noB)] <- 0
List$NoMatchToAStar[intersect(intersect(yesA, yesAstar), yesB)] <- 0
List$NoMatchToB <- 1
List$NoMatchToB[intersect(intersect(noA, noAstar), yesB)] <- ""
List$NoMatchToB[intersect(intersect(yesA, yesB), noAstar)] <- 0
List$NoMatchToB[intersect(intersect(yesA, yesAstar), yesB)] <- 0
# Final review flag
if ((Reference %in% c("A", "B"))) {
List$Review[which(List[, 2] == "")] <- ""
List$Review[which(List[, 3] == "")] <- ""
}
# Final redundancy flag
List$Redundancy <- 0
f1 <- stats::aggregate(List[, 2:3], by = List[, 2:3], length)[1:(ncol(List[, 2:3]) +
1)][which(stats::aggregate(List[, 2:3], by = List[, 2:3], length)[1:(ncol(List[,
2:3]) + 1)][, 3] >= 2), 1:2]
List$Redundancy[which(apply(List[, 2:3], 1, paste, collapse = " ") %in% apply(f1,
1, paste, collapse = " "))] <- 1
List$Redundancy[intersect(which(List[, 2] == ""), which(List[, 3] == ""))] <- ""
}, error = function(e) {
stop(simpleError("An error has occurred and execution needs to stop. Please check the input data."))
})
# The following if statement checks for the maximum acceptable proportion
# of unmatched codes between A and B, and between A and AAStar.
if (length(which(List[, idx.thres] == 1))/length(which(List[, idx.thres] != "")) >
MismatchToleranceAStar) {
StopAAStar <- 1
} else {
StopAAStar <- 0
}
if (length(which(List[, idx.thres + 1] == 1))/length(which(List[, idx.thres +
1] != "")) > MismatchToleranceB) {
StopAB <- 1
} else {
StopAB <- 0
}
if (StopAAStar == 1 && StopAB == 0) {
stop("The updated correspondence table (resulting from the joining of the concordance table A:A* with the correspondence table A:B) contains too many missing values for A*. \n Please review your input data or adjust the MismatchToleranceAStar parameter.")
} else if (StopAAStar == 0 && StopAB == 1) {
stop("The updated correspondence table (resulting from the joining of the concordance table A:A* with the correspondence table A:B) contains too many missing values for B. \n Please review your input data or adjust the MismatchToleranceB parameter.")
} else if (StopAAStar == 1 && StopAB == 1) {
stop("The updated correspondence table (resulting from the joining of the concordance table A:A* with the correspondence table A:B) contains too many missing values for A* and for B. \n Please review your input data or adjust the MismatchToleranceAStar and MismatchToleranceB parameters.")
} else if (StopAAStar == 0 && StopAB == 0) {
# The following if statements checks if there are any label as well as
# supplementary columns in the classifications A, AStar, B, and in the
# correspondence tables AAStar and AB, in order to include them in the
# final table.
tryCatch({
if (ncol(classA) >= 2) {
A1 <- as.matrix(classA[match(List[, 1], unlist(classA[, 1])), 2:ncol(classA)])
A1[is.na(A1)] <- ""
colnames(A1) <- paste(colnames(classA)[1], colnames(classA)[2:ncol(classA)],
sep = "_")
List <- cbind(List, A1)
}
if (ncol(classAStar) >= 2) {
A2 <- as.matrix(classAStar[match(List[, 2], unlist(classAStar[, 1])),
2:ncol(classAStar)])
A2[is.na(A2)] <- ""
colnames(A2) <- paste(colnames(classAStar)[1], colnames(classAStar)[2:ncol(classAStar)],
sep = "_")
List <- cbind(List, A2)
}
if (ncol(classB) >= 2) {
B1 <- as.matrix(classB[match(List[, 3], unlist(classB[, 1])), 2:ncol(classB)])
B1[is.na(B1)] <- ""
colnames(B1) <- paste(colnames(classB)[1], colnames(classB)[2:ncol(classB)],
sep = "_")
List <- cbind(List, B1)
}
if (ncol(corrAAStar) >= 3) {
AA1 <- as.matrix(corrAAStar[match(data.frame(t(List[, c(1, 2)])),
data.frame(t(corrAAStar[, 1:2]))), 3:ncol(corrAAStar)])
AA1[is.na(AA1)] <- ""
colnames(AA1) = paste(paste(colnames(corrAAStar)[1], colnames(corrAAStar)[2],
sep = " - "), colnames(corrAAStar)[3:ncol(corrAAStar)], sep = "_")
List <- cbind(List, AA1)
}
if (ncol(corrAB) >= 3) {
AB1 <- as.matrix(corrAB[match(data.frame(t(List[, c(1, 3)])), data.frame(t(corrAB[,
1:2]))), 3:ncol(corrAB)])
AB1[is.na(AB1)] <- ""
colnames(AB1) = paste(paste(colnames(corrAB)[1], colnames(corrAB)[2],
sep = " - "), colnames(corrAB)[3:ncol(corrAB)], sep = "_")
List <- cbind(List, AB1)
}
}, error = function(e) {
stop(simpleError("An error has occurred and execution needs to stop. Please check the input data."))
})
}
# The 'label change' flag is constructed.
tryCatch({
if (ncol(classA) >= 2 && ncol(classAStar) >= 2) {
LabelChange <- rep(1, nrow(List))
LabelChange[which(tolower(gsub("[[:punct:] ]+", " ", List[, idx.thres +
4])) == tolower(gsub("[[:punct:] ]+", " ", List[, idx.thres + 4 +
ncol(classA) - 1])))] <- 0
LabelChange[which(List[, idx.thres + 4] == "")] <- ""
LabelChange[which(List[, idx.thres + 4 + ncol(classA) - 1] == "")] <- ""
List <- cbind(List[, 1:(idx.thres + 3)], LabelChange, List[, (idx.thres +
4):ncol(List)])
}
# A data frame that contains the names of the classifications A, AStar,
# and B is constructed.
CsvNames <- data.frame(matrix(0, 3, 1))
CsvNames[1, 1] <- paste("A:", colnames(List)[1], sep = " ")
CsvNames[2, 1] <- paste("B:", colnames(List)[3], sep = " ")
CsvNames[3, 1] <- paste("AStar:", colnames(List)[2], sep = " ")
CsvNames <- data.frame(CsvNames)
pos <- regexpr("\\/[^\\/]*$", CSVout)
Name1 <- substr(CSVout, 1, pos[[1]])
Name2 <- substr(CSVout, pos[[1]] + 1, nchar(CSVout))
pos <- regexpr("\\.[^\\.]*$", Name2)
if (pos[[1]] == -1) {
Name <- substr(Name2, pos[[1]] + 1, nchar(Name2))
} else {
Name <- substr(Name2, 1, pos[[1]] - 1)
}
colnames(CsvNames) <- paste("Classification:", "Name", sep = " ")
Final <- apply(List, 2, function(x) {
gsub(" ", " ", x)
})
if (is.null(dim(Final))) {
Final <- t(data.frame(Final))
rownames(Final) <- 1
}
}, error = function(e) {
stop(simpleError("An error has occurred and execution needs to stop. Please check the input data."))
})
# The following if statement check if the user wants a .csv file of the
# final table. If yes, a .csv file that contains the names of the
# classifications A, AStar, and B is created as well.
tryCatch({
if (!is.null(CSVout)) {
data.table::fwrite(data.frame(Final, check.names = FALSE), file = CSVout, quote = TRUE)
utils::write.csv(CsvNames, file = paste0(Name1, "classificationNames_", Name2),
row.names = FALSE)
}
}, error = function(e) {
stop(simpleError("An error occurred while trying to write the output to the specified files. Please check the respective input parameters."))
})
# The following list contains the final table as well as the data frame
# that contains the names of the classifications A, AStar, and B in a list.
# This list is the output of the updateCorrespondenceTable function.
tryCatch({
FinalResults <- list()
FinalResults[[1]] <- data.frame(Final, check.names = FALSE, row.names = NULL)
FinalResults[[2]] <- CsvNames
names(FinalResults) <- c("updateCorrespondenceTable", "classificationNames")
return(FinalResults)
}, error = function(e) {
stop(simpleError("An error has occurred and execution needs to stop. Please check the input data."))
})
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.