#' Parallel coordinate plot for both numeric and categorical data
#'
#' The parallel coordinate plot displays multiple y-axes, and shows the observations across
#' several dimensions as ploi-lines. This function work well with both numeric and categorical
#' variables at the same time after proper scaling.
#'
#' \code{method} is a character string that denotes how to scale the variables
#' in the parallel coordinate plot. Options are named in the same way as the options in `ggparcoord` (GGally):
#' \itemize{
#' \item{\code{raw}}{: raw data used, no scaling will be done.}
#' \item{\code{std}}{: univariately, subtract mean and divide by standard deviation. To get values into a [0,1] interval we use a linear transformation of f(y) = y/4+0.5. }
#' \item{\code{robust}}{: univariately, subtract median and divide by median absolute deviation. To get values into a [0,1] interval we use a linear transformation of f(y) = y/4+0.5. }
#' \item{\code{uniminmax}}{: univariately, scale so the minimum of the variable is zero, and the maximum is one.}
#' \item{\code{globalminmax}}{: gobal scaling; the global maximum is mapped to 1,
#' global minimum across the variables is mapped to 0. }
#' }
#'
#' \code{overplot} is a character string that denotes how to conduct overplotting
#' in the parallel coordinate plot. The lines from \code{geom_pcp()} are drawn according to the order they shown in your data set in default.
#' Note that this argument provides a framework, the order in the original data still has a role in overplotting,
#' especially for lines outside factor blocks(for \code{hierarchical} only), plots with \code{resort} turned on(for methods except \code{hierarchical}):
#' \itemize{
#' \item{\code{original}}{: use the original order, first shown first drawn.}
#' \item{\code{hierarchical}}{: hierarchically drawn according to the combinations of levels of factor variables,
#' which will change according to different level structures of factor variables you provided.
#' This was done separately for each factor block. The right most factor variables have the largest weight across a sequence of factor variables,
#' the last level of a factor variable has the largest weight within a factor variable.
#' Groups of lines with larger weight will be drawn on top. Lines outside of factor blocks still use the original order, which is different from other methods.}
#' \item{\code{smallfirst}}{: smaller groups of lines are drawn first, placing large groups of lines on top.}
#' \item{\code{largefirst}}{: larger groups of lines are drawn first, placing small groups of lines on top.}
#' }
#' @param mapping Set of aesthetic mappings created by [aes()] or
#' [aes_()]. If specified and `inherit.aes = TRUE` (the
#' default), it is combined with the default mapping at the top level of the
#' plot. You must supply `mapping` if there is no plot mapping.
#' @param data The data to be displayed in this layer. There are three
#' options:
#'
#' If `NULL`, the default, the data is inherited from the plot
#' data as specified in the call to [ggplot()].
#'
#' A `data.frame`, or other object, will override the plot
#' data. All objects will be fortified to produce a data frame. See
#' [fortify()] for which variables will be created.
#'
#' A `function` will be called with a single argument,
#' the plot data. The return value must be a `data.frame`, and
#' will be used as the layer data.
#' @param geom The geometric object to use display the data
#' @param position Position adjustment, either as a string, or the result of
#' a call to a position adjustment function.
#' @param na.rm If `FALSE`, the default, missing values are removed with
#' a warning. If `TRUE`, missing values are silently removed.
#' @param show.legend logical. Should this layer be included in the legends?
#' `NA`, the default, includes if any aesthetics are mapped.
#' `FALSE` never includes, and `TRUE` always includes.
#' It can also be a named logical vector to finely select the aesthetics to
#' display.
#' @param inherit.aes If `FALSE`, overrides the default aesthetics,
#' rather than combining with them. This is most useful for helper functions
#' that define both data and aesthetics and shouldn't inherit behaviour from
#' the default plot specification, e.g. [borders()].
#' @param ... Other arguments passed on to [layer()]. These are
#' often aesthetics, used to set an aesthetic to a fixed value, like
#' `colour = "red"` or `size = 3`. They may also be parameters
#' to the paired geom/stat.
#' @param method string specifying the method that should be used for scaling the values
#' in a parallel coordinate plot (see Details).
#' @param freespace A number in 0 to 1 (excluded). The total gap space among levels within each factor variable
#' @param boxwidth A number or a numeric vector (length equal to the number of factor variables) for the widths of the boxes for each factor variable
#' @param rugwidth A number or a numeric vector (length equal to the number of numeric variables) for the widths of the rugs for numeric variable
#' @param interwidth A number or a numeric vector (length equal to the number of variables minus 1) for the width for the lines between every neighboring variables, either
#' a scalar or a vector.
#' @param resort A integer or a integer vector to indicate the positions of vertical axes inside (can't be the boundary of) a sequence of factors.
#' To break three or more factors into sub factor blocks,
#' and conduct resort at the axes. Makes the plot clearer for adjacent factor variables.
#' @param overplot methods used to conduct overplotting when overplotting becomes an issue.
#' @param reverse reverse the plot, useful especially when you want to reverse the structure in factor blocks,
#' i.e. to become more ordered from right to left
#' @import ggplot2
#' @importFrom dplyr %>% group_by ungroup arrange filter
#' @importFrom tidyr spread
#' @importFrom assertthat assert_that has_name
#' @export
stat_pcp <- function(mapping = NULL, data = NULL,
geom = "segment", position = "identity",
...,
freespace = 0.1,
method = "uniminmax",
boxwidth = 0,
rugwidth = 0,
interwidth = 1,
resort = NULL,
overplot = "hierarchical",
reverse = FALSE,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatPcp,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
method = method,
freespace = freespace,
boxwidth = boxwidth,
rugwidth = rugwidth,
interwidth = interwidth,
resort = resort,
overplot = overplot,
reverse = reverse,
...
)
)
}
#' @rdname stat_pcp
#' @export
StatPcp <- ggproto(
"StatPcp", Stat,
# required_aes = c("vars"), # vars is disappeared by the time we get to the stat
default_aes = ggplot2::aes(
width = 0.75, linetype = "solid", fontsize=5,
shape = 19, colour = "grey30",
size = .1, fill = "grey30", alpha = .8, stroke = 0.1,
linewidth=.1, weight = 1, method = "uniminmax"),
setup_data = function (data, params) {
#browser()
idx <- grep("x__", names(data))
names(data) <- gsub("x__[0-9]+__", "", names(data))
# x__labels__ works together with data.frame(..., check.names = TRUE) (default) to allow spaces in names, also won't add .1 .2 after same variables
# x__labels__ is just for x labels
x__labels__ <- names(data)[idx]
data <- data.frame(data, stringsAsFactors = TRUE)
data <- gather_pcp(data, idx)
data <- transform_pcp(data, method = params$method)
data$x__labels__ <- c(x__labels__, rep("nothing", times = nrow(data) - length(x__labels__)))
data
},
# want to figure out the number of observations
# want to figure out the number of different classer of the variables
### setup_params accept params from stat_pcp or geom_pcp?
setup_params = function(data, params) {
# browser()
params$freespace <- ifelse(is.null(params$freespace), 0.1, params$freespace)
# And other parameters
params
},
compute_layer = function(self, data, params, layout, ...) {
# browser()
# adjust function to avoid deleting all data
ggplot2:::check_required_aesthetics(
self$required_aes,
c(names(data), names(params)),
ggplot2:::snake_class(self)
)
# Trim off extra parameters
params <- params[intersect(names(params), self$parameters())]
scales <- layout$get_scales(data$PANEL[1])
layout$panel_scales_x <- list(xscale_pcp(data, params, layout, ...)) # only one scale overall - might need one for each panel
args <- c(list(data = quote(data), scales = quote(scales)), params)
gg <- ggplot2:::dapply(data, "PANEL", function(data) {
tryCatch(do.call(self$compute_panel, args), error = function(e) {
warning("Computation failed in `", ggplot2:::snake_class(self), "()`:\n",
e$message, call. = FALSE)
ggplot2:::new_data_frame()
})
})
gg
},
# want to calculate the parameters directly can be used for geom_segment and geom_ribbon
compute_panel = function(data, scales, method = "uniminmax", freespace = 0.1, boxwidth = 0,
rugwidth = 0 , interwidth = 1,
resort = NULL, overplot = "original", reverse = FALSE) {
# Input check and sensible warning
#browser()
# Data preparation: to convert the input data to the form we can directly use
# nobs, classpcp will also be used in other places, so they are kept
# If the input data is changed, we might need other ways to do the following part
# but these three values/outputs are required, should be *exact* the same things
# number of observations
obs_ids <- unique(data$id__)
nobs <- length(unique(data$id__))
# a vector to tell the class of variables
classpcp <- data$class[1 - nobs + (1:(nrow(data)/nobs))*nobs]
data_spread <- prepare_data(data, classpcp, nobs)
# at this time, data_spread is like the original data set, with columns properly defined
# assume numeric variables are properly scaled into 0-1
# several possible combinations: num to num, num to factor, factor to num, factor to factor
# we use the function: 'classify' here to do this classification
# if the names of orignal varibles have "id", something might be wrong. update, now "id__"
if (is.character(resort)) {
resort <- which(names(data_spread) %in% resort) - 1
}
# check if resort input is sensible
if(!is.null(resort)){
resort_check <- lapply(resort, FUN = function(x){
output <- !any(classpcp[c(x-1, x, x+1)] %in% c("numeric", "integer")) & (x>1) & (x<length(classpcp))
output
}) %>% unlist()
assertthat::assert_that(all(resort_check), msg = "resort should be interior positions of a sequence of factor variables, e.g. 2 for sequence 1:3")
}
classification <- classify(classpcp, resort = resort)
# for num to num, set up
if (!length(classification$num2num) == 0) {
# set up ystart, yend.(sometimes we plus one to adjust for ID column)
# for ystart of lines (seems we can use unlist to data.frame directly)
data_final_ystart_num2num <- unlist(data_spread[, classification$num2num + 1])
# for yend of lines
data_final_yend_num2num <- unlist(data_spread[, classification$num2num + 2])
# for xstart of lines
data_final_xstart_num2num <- rep(classification$num2num, each = nobs)
# for xend of lines
data_final_xend_num2num <- rep(classification$num2num + 1, each = nobs)
} else {
data_final_ystart_num2num <- NULL
data_final_yend_num2num <- NULL
data_final_xstart_num2num <- NULL
data_final_xend_num2num <- NULL
}
# for num to factor, set up
if (!length(classification$num2fac) == 0) {
# Here I want to treat the factor(categorical) variable as bands according to its levels,
# so I'm not going to treat it as several points. the end points uniformly distributed within each band
# I also want to order those end points landing on the bands somehow
# for ystart of lines(same as num2num, use unlist withour as.list first)
data_final_ystart_num2fac <- unlist(data_spread[, classification$num2fac + 1])
# for yend of lines
# first calculete the number of levels and number of observations landing in each level
nlevels_list <- lapply(data_spread[, classification$num2fac + 2, drop = FALSE],
FUN = function(x) list(nlevels = nlevels(x),
table = table(x)))
# uniformly assign space for each level and observations within each level
# inserted some space between every two levels here, called 'freespace' for the space in total
# obs_position is the postion assigned for factors
obs_position <- assign_fac(nlevels_list, nobs, freespace = freespace)
# for yend of lines, continued
# write another function to arrange the positions of the end
# points according to the ystart, and the order of data
data_final_yend_num2fac <- unlist(arrange_fac_by_ystart(data_spread,
start_position = classification$num2fac + 1,
end_position = classification$num2fac + 2,
obs_position = obs_position))
# for xstart of lines
data_final_xstart_num2fac <- rep(classification$num2fac, each = nobs)
# for xend of lines
data_final_xend_num2fac <- rep(classification$num2fac + 1, each = nobs)
} else {
data_final_ystart_num2fac <- NULL
data_final_yend_num2fac <- NULL
data_final_xstart_num2fac <- NULL
data_final_xend_num2fac <- NULL
}
# for factor to num, set up (this should be similar to num2fac)
if (!length(classification$fac2num) == 0) {
# need to do some adjustments to make the functions above more general and can be used here
# for xstart of lines
data_final_xstart_fac2num <- rep(classification$fac2num, each = nobs)
# for xend of lines
data_final_xend_fac2num <- rep(classification$fac2num + 1, each = nobs)
# for yend of lines
data_final_yend_fac2num <- unlist(data_spread[, classification$fac2num + 2])
# for ystart of lines (mimic the calculation to num2fac, be careful about the difference)
nlevels_list_2 <- lapply(data_spread[, classification$fac2num + 1, drop = FALSE],
FUN = function(x) list(nlevels = nlevels(x),
table = table(x)))
obs_position_2 <- assign_fac(nlevels_list_2, nobs, freespace = freespace)
# here arrange_fac_by_ystart, actually arranges fac (ystart) by yend
data_final_ystart_fac2num <- unlist(arrange_fac_by_ystart(data_spread,
start_position = classification$fac2num + 2,
end_position = classification$fac2num + 1,
obs_position = obs_position_2))
} else {
data_final_xstart_fac2num <- NULL
data_final_xend_fac2num <- NULL
data_final_yend_fac2num <- NULL
data_final_ystart_fac2num <- NULL
}
# we have to make sure those postions are consistent, which are on the same vertical axis, but shared by different pairs
# even if it is consistent(same), which I think is very likely ensured by our consitent method of dealing with variables
# we can still make some improvement above, to save some calculation to avoid twice calculation of same objects
# the only variables, we need to care are factor variables.
# for factor block, set up
# make use of the function to calculate level_range inside assign_fac(),
# and nlevel_lists as before when dealing with factors
# write a function for this part to assign and match the factors,
# we may first calculate a table of the possible combinations between every two factors, and then assign position
# with a constant freespace = 0.1, we make sure the lenghs of area taken are the same among factors
if (!length(classification$fac2fac) == 0) {
# some values needed
# to find the factor block(more than one factor together)
# produce continuous_fac for each factor_block
# 0622new: accomodate to new classification method, continuous_fac_all_raw
continuous_fac_all_raw <- as.vector(rbind(classification$fac2fac, classification$fac2fac + 1))
continuous_fac_all <- continuous_fac_all_raw[c(which(diff(continuous_fac_all_raw) != 0 & diff(continuous_fac_all_raw) != -1 ),
length(continuous_fac_all_raw))]
break_position <- c(0, which(diff(continuous_fac_all) != 1), length(continuous_fac_all))
continuous_fac_all_list <- lapply(1:(length(break_position) - 1), FUN = function(x) {
continuous_fac_all[(break_position[x] + 1):break_position[x + 1]]
})
# detect if there is a numeric variable prior to the factor block, after the factor block
start_fac2fac <- continuous_fac_all[break_position[-length(break_position)] + 1]
end_fac2fac <- continuous_fac_all[break_position[-1]]
# to identify which columns should be used to sort factor blocks
bywhich <- prepare_bywhich(start_fac2fac, classpcp)
if (is.na(bywhich[1])) {
start_position <- as.data.frame(matrix(rep(1:nobs, length(bywhich)), ncol = length(bywhich)))
} else {
start_position <- data_spread[,bywhich + 1,drop = FALSE]
}
# use Map to apply the function to every factor_block
arranged_fac_block <- Map(f = function(x, y) {
process_fac2fac(data_spread = data_spread,
continuous_fac = x,
start_position = y,
freespace = freespace,
nobs = nobs)},
continuous_fac_all_list,
as.data.frame(start_position))
# 0622new
# organize the output correctly into one
data_final_xstart_fac2fac <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[1]]$data_final_xstart_fac2fac))
data_final_ystart_fac2fac <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[1]]$data_final_ystart_fac2fac))
data_final_xend_fac2fac <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[1]]$data_final_xend_fac2fac))
data_final_yend_fac2fac <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[1]]$data_final_yend_fac2fac))
# also extract the bandid information
data_final_ystart_fac2fac_bandid <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[2]]$data_final_ystart_fac2fac_bandid))
data_final_yend_fac2fac_bandid <- unlist(lapply(arranged_fac_block,
FUN = function(x) x[[2]]$data_final_yend_fac2fac_bandid))
} else {
data_final_xstart_fac2fac <- NULL
data_final_ystart_fac2fac <- NULL
data_final_xend_fac2fac <- NULL
data_final_yend_fac2fac <- NULL
}
# calculations and modifications for break points in factor blocks
# There are some parts of those modifications above should be integrated to the original calculation for better performance
# is there possibility that this is problem can be sovled using same band assignment at once as in usual case, instead of recursively calculate that?
if ((!is.null(resort)) & (!length(classification$fac2fac) == 0)) {
# identify the groups of sub-factor blocks, labeled for which sub-factor block
sub_fac_block <- which(c(0, end_fac2fac) == c(start_fac2fac, 0))
diff_sub_fac_block <- c(0, which(diff(sub_fac_block) != 1), length(sub_fac_block))
sub_fac_block_list <- lapply(1:(length(diff_sub_fac_block) - 1), FUN = function(x) {
c(sub_fac_block[diff_sub_fac_block[x] + 1] - 1 ,
sub_fac_block[(diff_sub_fac_block[x] + 1):diff_sub_fac_block[x + 1]])
})
# To get the last variable positions and bandids within the first sub-factor block within a group of a sub-factor block
last_y <- lapply(1:length(sub_fac_block_list), FUN = function(x) {
# temp_yend, temp_yendid are used to save some coding for the long formular
temp_yend <- arranged_fac_block[[sub_fac_block_list[[x]][1]]][[1]][["data_final_yend_fac2fac"]]
temp_yendid <- arranged_fac_block[[sub_fac_block_list[[x]][1]]][[2]][["data_final_yend_fac2fac_bandid"]]
# length(temp_yend) = length(temp_yendid)
last_yend <- temp_yend[(length(temp_yend) - nobs + 1):length(temp_yend)]
last_yendid <- temp_yendid[(length(temp_yendid) - nobs + 1):length(temp_yendid)]
output <- list(last_yend = last_yend, last_yendid = last_yendid)
output
})
# use dirty "for loops", we should have a better way of applying this
# the following works for each sub-factors blocks group, we use Map to apply for each group
data_break_y <- Map(f = function(x, y) {
last_yend <- y$last_yend
last_yendid <- y$last_yendid
data_break_ystart <- vector()
data_break_yend <- vector()
data_break_yend_bandid <- vector()
# the following variables used to replace those values for factor block which were calculated before
# actually, we might save some calculation by adding some conditions on the previous calculation
data_replace_block_ystart <- vector()
data_replace_block_yend <- vector()
data_replace_block_xstart <- vector()
data_replace_block_xend <- vector()
for (i in x[-1]) {
# we need to save this ystart here
data_break_ystart <- c(data_break_ystart, last_yend)
# need some modificaation on process_fac2fac to take the exsiting bandid for numeric values into consideration
# and works for the current task for sub-factor blocks
# we shouldn't work on the first sub-factor block
arranged_fac_block_sub<- process_fac2fac(data_spread = data_spread,
continuous_fac = continuous_fac_all_list[[i]],
start_position = last_yend,
freespace = freespace,
nobs = nobs,
subfac = TRUE,
start_bandid = last_yendid)
# save some typing same as in calculate lasy_y
temp_yend <- arranged_fac_block_sub[[1]]$data_final_yend_fac2fac
temp_yendid <- arranged_fac_block_sub[[2]]$data_final_yend_fac2fac_bandid
last_yend <- temp_yend[(length(temp_yend) - nobs + 1):length(temp_yend)]
last_yendid <- temp_yendid[(length(temp_yendid) - nobs + 1):length(temp_yendid)]
# Here, we made a choice to only return the segments between the sub factor block
# Actually, we can return all the lines except the first sub-factor block(we can even also do this by using the original start_position first),
# in that case, we can use these calculation to replace the orignal factor block calculations to remove replicated calculations
# We might do that later, just be careful about the xstart, xend value, since it has 0
# we also don't return xstart, and xend, since they are calculated later
# When use process_fac2fac(subfac = TRUE), the returned xstart, xend positions are not correct, we can change this in that function
# or regenerate afterwards like what I will do in the following, since it is trivil calculation
# the followings are used to replace those values calculated for factor blocks before
data_replace_block_xstart <- c(data_replace_block_xstart, rep(continuous_fac_all_list[[i]][-length(continuous_fac_all_list[[i]])], each = nobs))
data_replace_block_xend <- c(data_replace_block_xend, rep(continuous_fac_all_list[[i]][-length(continuous_fac_all_list[[i]])], each = nobs) + 1)
data_replace_block_ystart <- c(data_replace_block_ystart, temp_yend[1:length(rep(continuous_fac_all_list[[i]][-length(continuous_fac_all_list[[i]])], each = nobs))])
data_replace_block_yend <- c(data_replace_block_yend, temp_yend[-(1:nobs)])
data_break_yend <- c(data_break_yend, temp_yend[1:nobs])
data_break_yend_bandid <- c(data_break_yend_bandid, temp_yendid[1:nobs])
}
data_break_y <- list(data_break_ystart = data_break_ystart,
data_break_yend = data_break_yend,
data_break_yend_bandid = data_break_yend_bandid,
data_replace_block_xstart = data_replace_block_xstart,
data_replace_block_xend = data_replace_block_xend,
data_replace_block_ystart = data_replace_block_ystart,
data_replace_block_yend = data_replace_block_yend)
data_break_y
},
sub_fac_block_list,
last_y)
replace_position_xstart <- unique(unlist(lapply(data_break_y, FUN = function(x) {
x$data_replace_block_xstart
})))
data_final_ystart_fac2fac[data_final_xstart_fac2fac %in% replace_position_xstart] <- unlist(lapply(data_break_y, FUN = function(x) {
x$data_replace_block_ystart
}))
data_final_yend_fac2fac[data_final_xstart_fac2fac %in% replace_position_xstart] <- unlist(lapply(data_break_y, FUN = function(x) {
x$data_replace_block_yend
}))
}
# Modification parts begin--------------------------------------------------
# Because a factor variable can only be ordered according to one side, we do the modification
# Because we arranged both num-fac and factor block separately, we need to overwirte the num-fac when the fac is actually a fac block
# some modification for num2fac_blcok (num2fac, fac is a factor block, more than one factor)
# detect those variables
# consider zero or multiple factor blocks
num2fac_block_relative <- which((classification$num2fac + 1) %in% classification$fac2fac)
num2fac_block_fac_relative <- which((unique(classification$fac2fac) -1) %in% classification$num2fac)
num2fac_change <- unlist(lapply(num2fac_block_relative, FUN = function(x) {
((x - 1)*nobs + 1):(x*nobs)
}))
num2fac_fac_input <- unlist(lapply(num2fac_block_fac_relative, FUN = function(x) {
((x - 1)*nobs + 1):(x*nobs)
}))
# make modification
data_final_yend_num2fac[num2fac_change] <- data_final_ystart_fac2fac[num2fac_fac_input]
# some modification for fac2num_blcok (fac2num, fac is a factor block, more than one factor)
# for multiple factor blocks, and multiple places to modify
# detect those variables
# consider zero or multiple factor blocks
fac2num_block_relative <- which(classification$fac2num %in% (classification$fac2fac + 1))
fac2num_change <- unlist(lapply(fac2num_block_relative, FUN = function(x) {
((x - 1)*nobs + 1):(x*nobs)
}))
fac2num_block_fac_relative <- which((unique(classification$fac2fac) + 1) %in% classification$fac2num)
fac2num_fac_input <- unlist(lapply(fac2num_block_fac_relative, FUN = function(x) {
((x - 1)*nobs + 1):(x*nobs)
}))
# make modification
data_final_ystart_fac2num[fac2num_change] <- data_final_yend_fac2fac[fac2num_fac_input]
# some other modification for fac2num (for num-fac-num case)
# we don't need to adjust fac2num for num-fac-num case, since it is adjusted by num2fac, to keep consistent
# so we need to modify it back
# detect those variables
fac2num_numfacnum <- classification$fac2num[classification$fac2num %in% (classification$num2fac + 1)]
fac2num_numfacnum_relative <- which(classification$fac2num %in% fac2num_numfacnum)
num2fac_numfacnum_relative <- which(classification$num2fac %in% (fac2num_numfacnum - 1))
if ((length(fac2num_numfacnum_relative) != 0)&(length(num2fac_numfacnum_relative) != 0)){
# for ystart of lines, keep it same as the num2fac result; this is the only thing that should be corrected
# data_final_ystart_fac2num[((fac2num_numfacnum_relative - 1)*nobs + 1):(fac2num_numfacnum_relative*nobs)] <-
# data_final_yend_num2fac[((num2fac_numfacnum_relative - 1)*nobs + 1):(num2fac_numfacnum_relative*nobs)]
if (length(fac2num_numfacnum_relative) != length(num2fac_numfacnum_relative)) stop("unexpected error in fac-num-fac case, please report/fix")
for (i in seq_along(fac2num_numfacnum_relative)) {
data_final_ystart_fac2num[((fac2num_numfacnum_relative[i] - 1)*nobs + 1):(fac2num_numfacnum_relative[i]*nobs)] <-
data_final_yend_num2fac[((num2fac_numfacnum_relative[i] - 1)*nobs + 1):(num2fac_numfacnum_relative[i]*nobs)]
}
}
# put everything together
# DO NOT change the order for num2num, num2fac, fac2num, fac2fac, they are used in the creation of id column of data_boxwidth
data_final_xstart <- c(data_final_xstart_num2num,
data_final_xstart_num2fac,
data_final_xstart_fac2num,
data_final_xstart_fac2fac)
data_final_xend <- c(data_final_xend_num2num,
data_final_xend_num2fac,
data_final_xend_fac2num,
data_final_xend_fac2fac)
data_final_ystart <- c(data_final_ystart_num2num,
data_final_ystart_num2fac,
data_final_ystart_fac2num,
data_final_ystart_fac2fac)
data_final_yend <- c(data_final_yend_num2num,
data_final_yend_num2fac,
data_final_yend_fac2num,
data_final_yend_fac2fac)
data_final <- data.frame(x = data_final_xstart,
xend = data_final_xend,
y = data_final_ystart,
yend = data_final_yend)
# This has different length from the original data coming to compute_panel
# interval length, boxwidth, rugwidth ajustment preparation
width_adjusted <- prepare_width_ajustment(classpcp, boxwidth, rugwidth, interwidth, reverse = reverse)
# adjust the data according to the adjusted position
data_boxwidth <- data_final
data_boxwidth$x <- width_adjusted$boxwidth_xend[data_boxwidth$x]
data_boxwidth$xend <- width_adjusted$boxwidth_xstart[data_boxwidth$xend]
# add parallel lines(segments, rugs) after adjusted for boxwidth
# add segments for boxes
segment_which <- which(!(data_final$x %in% resort))
segment_xstart <- data_final$x[segment_which]
data_segment_xstart <- width_adjusted$boxwidth_xstart[segment_xstart]
data_segment_xend <- width_adjusted$boxwidth_xend[segment_xstart]
data_segment_ystart <- data_final$y[segment_which]
data_segment_yend <- data_segment_ystart
# for the last variable seperately
data_segment_xstart <- c(data_segment_xstart, width_adjusted$boxwidth_xstart[rep(length(classpcp), nobs)])
data_segment_xend <- c(data_segment_xend, width_adjusted$boxwidth_xend[rep(length(classpcp), nobs)])
data_segment_ystart <- c(data_segment_ystart,
data_final[which(data_final$xend == length(classpcp)), "yend"])
data_segment_yend <- data_segment_ystart
if (!is.null(resort)) {
# lines for the break points
# a naive try, This is one choice(no sorted lines)
resort_which <- which(data_final$x %in% resort)
resort_xstart <- data_final$x[resort_which]
data_break_xstart <- width_adjusted$boxwidth_xstart[resort_xstart]
data_break_xend <- width_adjusted$boxwidth_xend[resort_xstart]
# we need to distinguish between the replicated variables, also used in above modifications
# is this safe?: using different selections, can they match each other(in this break case, they are all inside factor blocks, they should match)
# data_break_ystart <- data_final$yend[which(data_final$xend %in% resort)]
# data_break_yend <- data_final$y[resort_which]
# we need to make sure the the x, y, xend, yend match each other, now it is ensured by that the calculations in fac2fac related chuncks
data_break_ystart <- unlist(lapply(data_break_y, FUN = function(x) {
x$data_break_ystart
}))
data_break_yend <- unlist(lapply(data_break_y, FUN = function(x) {
x$data_break_yend
}))
data_break_yend_bandid <- unlist(lapply(data_break_y, FUN = function(x) {
x$data_break_yend_bandid
}))
}
# !!!Notice: the following reorder part, only ajusts the order of drawing(geom_segment draws according to the order in the data), and doesn't mean to affect anything else.
# To reorder the observations for factor blocks(seperate for each block) to address undesired overlap
# We need to work here, because the some parts(most parts to do modifications between different sections) rely on the orginal order
# We will reorder the main part first, and then the part inside resort, which used different line arrangement method.
# We are benefited from:
# 1. the work of keeping bandid(cluster id) for factor blocks
# 2. the fact that bandid is hiearchically assigned (some magic is used inside bandid() helper function)
# The resorts make the case more complex, we will see how it will finally work out.(how bandid is carried over resort?)
# we need to reorder the obs_id for those parts too later.
# parallel segments inside boxes are created very late! (after we have data_final, and is based on that)
assert_that(!is.null(overplot))
assert_that(overplot %in% c("original", "hierarchical", "smallfirst", "largefirst"))
if (overplot == "hierarchical") {
turn_on_off <- TRUE
} else {
turn_on_off <- FALSE
}
if ((!length(classification$fac2fac) == 0) & turn_on_off) {
# data_final_ystart_fac2fac_bandid and data_final_yend_fac2fac_bandid are actually same
data_final_ystart_fac2fac_split <- split(data_final_ystart_fac2fac, f = rep(1:(length(data_final_ystart_fac2fac)/nobs), each = nobs))
data_final_yend_fac2fac_split <- split(data_final_yend_fac2fac, f = rep(1:(length(data_final_ystart_fac2fac)/nobs), each = nobs))
data_final_ystart_fac2fac_bandid_split <- split(data_final_ystart_fac2fac_bandid, f = rep(1:(length(data_final_ystart_fac2fac)/nobs), each = nobs))
data_final_ystart_fac2fac <- unlist(Map(f = function(x, y) {
# order() is used here, potetial problem of dealing with ties, it seems to use a method like rank(ties.method = "first"), which we used before
# maybe we can use a rank based method to be consistent: order(rank(,ties.method = "first"))
# This should not be a problem anyway, since the order now is only for drawing segments
x <- x[order(y)]
},
data_final_ystart_fac2fac_split,
data_final_ystart_fac2fac_bandid_split
))
data_final_yend_fac2fac <- unlist(Map(f = function(x, y) {
# order() is used here, potetial problem of dealing with ties, it seems to use a method like rank(ties.method = "first"), which we used before
# maybe we can use a rank based method to be consistent: order(rank(,ties.method = "first"))
x <- x[order(y)]
},
data_final_yend_fac2fac_split,
data_final_ystart_fac2fac_bandid_split
))
# To adjust the id orders accordingly
# a little different calculation but same idea as above
obs_ids_fac2fac_split <- lapply(data_final_ystart_fac2fac_split, FUN = function(x) obs_ids)
obs_ids_fac2fac <- unlist(Map(f = function(x, y) {
# order() is used here, potetial problem of dealing with ties, it seems to use a method like rank(ties.method = "first"), which we used before
# maybe we can use a rank based method to be consistent: order(rank(,ties.method = "first"))
x <- x[order(y)]
},
obs_ids_fac2fac_split,
data_final_ystart_fac2fac_bandid_split
))
# The part for resort:
if(!is.null(resort)) {
data_break_ystart_split <- split(data_break_ystart, f = rep(1:(length(data_break_ystart)/nobs), each = nobs))
data_break_yend_split <- split(data_break_yend, f = rep(1:(length(data_break_ystart)/nobs), each = nobs))
data_break_yend_bandid_split <- split(data_break_yend_bandid, f = rep(1:(length(data_break_ystart)/nobs), each = nobs))
data_break_ystart <- unlist(Map(f = function(x, y) {
x <- x[order(y)]
},
data_break_ystart_split,
data_break_yend_bandid_split
))
data_break_yend <- unlist(Map(f = function(x, y) {
x <- x[order(y)]
},
data_break_yend_split,
data_break_yend_bandid_split
))
obs_ids_break_split <- lapply(data_break_ystart_split, FUN = function(x) obs_ids)
obs_ids_break <- unlist(Map(f = function(x, y) {
# order() is used here, potetial problem of dealing with ties, it seems to use a method like rank(ties.method = "first"), which we used before
# maybe we can use a rank based method to be consistent: order(rank(,ties.method = "first"))
x <- x[order(y)]
},
obs_ids_break_split,
data_break_yend_bandid_split
))
# Some preparation for later combination of all data, to insert correct ids
n_break <- length(data_break_ystart)
} else {
n_break <- 0
obs_ids_break <- NULL
}
# NEXT: to replace those values in data_boxwidth, because of the segment part, we have to make this detour now
# following calculation relies on the order we created data_final
# number of rows in fac2fac parts
n_fac2fac <- length(data_final_ystart_fac2fac)
data_boxwidth$y[(nrow(data_boxwidth) - n_fac2fac + 1):nrow(data_boxwidth)] <- data_final_ystart_fac2fac
data_boxwidth$yend[(nrow(data_boxwidth) - n_fac2fac + 1):nrow(data_boxwidth)] <- data_final_yend_fac2fac
} else {
# number of rows in fac2fac parts
n_fac2fac <- 0
obs_ids_fac2fac <- NULL
# for resort
n_break <- 0
obs_ids_break <- NULL
}
# To combine main part, resort, ordinary segments in boxes
if (!is.null(resort)) {
data_boxwidth <- data.frame(x = c(data_break_xstart, data_segment_xstart, data_boxwidth$x),
y = c(data_break_ystart, data_segment_ystart, data_boxwidth$y),
xend = c(data_break_xend, data_segment_xend, data_boxwidth$xend),
yend = c(data_break_yend, data_segment_yend, data_boxwidth$yend))
} else {
data_boxwidth <- data.frame(x = c(data_segment_xstart, data_boxwidth$x),
y = c(data_segment_ystart, data_boxwidth$y),
xend = c(data_segment_xend, data_boxwidth$xend),
yend = c(data_segment_yend, data_boxwidth$yend))
}
# data_boxwidth$id <- rep(obs_ids, times = nrow(data_boxwidth)/nobs)
# we have put fac2fac section to the very last of the data set, put break in the very beginning
data_boxwidth$id__ <- c(obs_ids_break, rep(obs_ids, times = (nrow(data_boxwidth) - n_fac2fac - n_break)/nobs), obs_ids_fac2fac)
datanames <- setdiff(names(data), c("name", "value", "level", "class", "value_text"))
# don't include the pcp specific variables - those are dealt with
data_boxwidth <- left_join(data_boxwidth, unique(data[,datanames]), by = "id__", suffix=c("",".2"))
# some of the reorder/arrange
if (overplot == "smallfirst") {
output_data <- data_boxwidth %>% group_by(group) %>% mutate(n = n()) %>% arrange(n)
}
if (overplot == "largefirst") {
output_data <- data_boxwidth %>% group_by(group) %>% mutate(n = n()) %>% arrange(desc(n))
}
if (overplot %in% c("original", "hierarchical")) {
output_data <- data_boxwidth
}
# reverse: to make the plot readable from left to right, especially for factor block design
# also adjusted the grids in scales.R and in the "prepare_width_ajustment" (should be adjustment though) part of this file
# also to improve the original output
output_data <- output_data %>% dplyr::filter(x != xend)
# no more needed
# if(reverse == TRUE){
# x_value <- levels(factor(output_data$x))
# xend_value <- levels(factor(output_data$xend))
#
# mirror_x <- factor(output_data$x)
# levels(mirror_x) <- rev(xend_value)
# mirror_x <- as.numeric(as.character(mirror_x))
#
# mirror_xend <- factor(output_data$xend)
# levels(mirror_xend) <- rev(x_value)
# mirror_xend <- as.numeric(as.character(mirror_xend))
#
# output_data$x <- mirror_x
# output_data$xend <- mirror_xend
#
# }
output_data
}
)
# used to identify the type of neighboring classes, return the position of the first one in a pair(like num-num, num-fac)
# for num to num, we use lines
# for num to factor, we use lines
# for factor to num, we use lines
# for factor to factor, we use ribbons
# 0622new: accomodate resort
classify <- function(classpcp, resort) {
classpcp <- as.numeric(classpcp %in% c("numeric", "integer"))
classpcp_diff <- diff(classpcp)
num2fac <- (1:(length(classpcp)-1))[classpcp_diff == -1]
fac2num <- (1:(length(classpcp)-1))[classpcp_diff == 1]
num2num <- (1:(length(classpcp)-1))[classpcp_diff == 0 & classpcp[-length(classpcp)] == TRUE]
fac2fac <- (1:(length(classpcp)-1))[classpcp_diff == 0 & classpcp[-length(classpcp)] == FALSE]
# resort come in, need to check resort are suitable
fac2fac <- sort(c(fac2fac, resort))
classification <- list(num2num = num2num,
num2fac = num2fac,
fac2num = fac2num,
fac2fac = fac2fac)
classification
}
# used to assign postion for the factors
# here is a long calculation formula
# defined another function inside this function, make sure they are correctly nested
assign_fac <- function(nlevels_list, nobs, freespace = 0.1) {
# adjustment offset for freespace
eachobs <- (1 - freespace)/nobs
level_range <- lapply(nlevels_list,
FUN = function(x) {
freespace_offset <- rep(freespace/(x$nlevels - 1), times = x$nlevels - 1)
freespace_offset_cum <- cumsum(freespace_offset)
c(0, rep(cumsum(eachobs*x$table), each = 2)[-2*x$nlevels]) +
c(0, 0, rep(freespace_offset_cum, each = 2))
})
# eachobs <- 1/nobs
# assign each level
# level_range <- lapply(nlevels_list,
# FUN = function(x) c(0, rep(cumsum(eachobs*x$table)[-x$nlevels], each = 2) +
# freespace/(x$nlevels-1)/2*rep(c(-1, 1), times = x$nlevels-1), 1))
# assign each obs
obs_position <- Map(f = function(x, y){
obs_position_each <- vector("list", length = length(x)/2)
for (i in 1:(length(x)/2)) {
obs_position_each[[i]] <- seq(from = x[2*i-1] + 0.5*eachobs,
to = x[2*i] - 0.5*eachobs,
length.out = y$table[i])
}
names(obs_position_each) <- names(y$table)
obs_position_each},
level_range, nlevels_list)
return(obs_position)
}
# used to arrange the factor positions properly to match the ystart, the same observation.
# start_position = classification$num2fac + 1, for num2fac. Just to make it a little more general.
# end_position = start_position + 1, for num2fac. For fac2num, it's different, be careful.
# start_position = classification$num2fac + 2, end_position = start_position - 1, for fac2num.
# start_position is the one we need to refer to, end_position is the one we actually adjust.
# make sure the levels of factors are used correctly here(match the data, match the order). How was it decided before?
arrange_fac_by_ystart <- function(data_spread, start_position, end_position, obs_position) {
# Map is usd here to deal with three lists in parallel
arranged_postion <- Map(f = function(y, x, z) {
# lapply uses like x[[i]] to extract sublist, assume Map is the same. And it works
# is it safe to use name here? it should work in my example, and it does work in my example
for (i in 1:length(z)) {
x <- replace(x,
list = which(y == names(z)[i]),
z[[i]][rank(x[y == names(z)[i]], ties.method="first")])
}
x
},
data_spread[ ,end_position, drop = FALSE],
data_spread[ ,start_position, drop = FALSE],
obs_position)
arranged_postion
}
# a function to assign box in each level in each factor
# fac_table is a summary of the exsiting combination of factors
# level_range_2 is calculated as level_range inside assign_fac to get postions of levels
# names_to_group is the names of those factors, for convenience
# 0622new: we don't actually need this
# assign_box <- function(fac_table, level_range_2, nlevels_list_con_fac, names_to_group) {
#
# box_position <- Map(f = function(x, y, z){
# box_proportion <- fac_table %>%
# group_by_(z) %>%
# mutate(proportion = Freq/sum(Freq)) %>%
# ungroup()
# box_position <- list()
# for(i in 1:(y$nlevels)){
# box_proportion_each <- box_proportion[box_proportion[z] == names(y$table)[i],]
# eachlevel <- x[c(2*i-1, 2*i)]
# eachbox <- eachlevel[1] + (eachlevel[2] - eachlevel[1])*cumsum(box_proportion_each$proportion)
# names(eachlevel) <- NULL
# box_position[[i]] <- c(eachlevel[1], eachbox)
# names(box_position)[i] <- names(y$table)[i]
# }
# box_position
# },
# level_range_2, nlevels_list_con_fac, names_to_group)
#
# box_position
# }
# calculate the bandid (all possible combinations of factors) for the data
# assign observations to different band
# continuous_fac is the position of factor block
bandid <- function(data_spread, continuous_fac, nobs, nlevels_list_con_fac) {
aa <- as.data.frame(lapply(data_spread[,continuous_fac + 1, drop = FALSE],
FUN = function(x) as.numeric(x) - 1))
dd <- vector()
for (i in 1:length(continuous_fac)) {
dd[i] <- nlevels_list_con_fac[[i]][[1]]
}
dd
as.matrix(aa)%*%c(1, cumprod(dd)[-length(continuous_fac)]) + 1
}
# used to arrange observation positions within each bandid by ystart
# the previous arrange_fac_by_ystart arrange observation positions with in each level
# to properly adjust the lines to avoid overlap, to match the positions in a factor to the numeric value
# within each level, arrange the positions according to bandid from small to big, which is consistent with box_position
# notice that, to accomadate no numeric variable case, I changed the stat_position to a data vector
# since this function is only used to one start column and one end column(by lapply and Map to generalize)
# start_position indicates the numeric variable in data_spread, to be adjusted by
# end_position indicates the first factor variable in a factor block, or the last one when adjust backward
# obs_position is the return value of assign_fac(nlevels_list_con_fac, nobs)
# names_to_group for convience
# can potentially works for each factor block, but used for one variable each time in process_fac2fac
# Added bandid for observations as part of output
arrange_fac_by_ystart_bandid <- function(data_spread, start_position, end_position,
obs_position, fac_table, names_to_group) {
# actually, we may not need Map, unless when we deal with more than one factor block
# but we do deal with it
arranged_position <- Map(f = function(y, x, z, l) {
x_bandid <- x
# i for each level
for (i in 1:length(z)) {
aa <- fac_table[fac_table[l]==names(z[i]), ]
aa$position_in_level <- cumsum(aa$Freq)
bb <- aa$position_in_level
# j for each box(smallest band)
for (j in 1:length(bb)) {
position_in_box <- z[[i]][(bb[j]-aa$Freq[j] + 1):bb[j]]
x <- replace(x,
list = which(data_spread$bandid == aa$bandid[j]),
values = position_in_box[rank(x[data_spread$bandid == aa$bandid[j]], ties.method="first")]
)
# 0622new: A new added part to label the bandid in the process
x_bandid <- replace(x_bandid,
list = which(data_spread$bandid == aa$bandid[j]),
values = aa$bandid[j]
)
}
}
output <- list(x = x, x_bandid = x_bandid)
output
},
# notice that, to accomadate no numeric variable case, I changed the stat_position to a data vector
data_spread[, end_position, drop = FALSE],
as.data.frame(start_position),
obs_position,
names_to_group)
arranged_position
}
# summarize the steps for one factor block into a function
# input: continuous_fac, data_spread and bywhich, freespace, nobs
# bywhich: positions, indicate which numeric variable to use for adjustment, prior or after
# output: arranged_position_inband, or: data.frame for the final xstart, xend, ystart, yend
# works for each factor block
# 0622new: also return bandid, and modification for dealing with sub-factor blocks
process_fac2fac <- function(data_spread, continuous_fac, start_position, freespace, nobs, subfac = FALSE, start_bandid = NULL) {
# 0622new: to treat the start_bandid as a factor to replace the data_spread$id place, which is in the first place
# so we may come here if want to deal with original id later(although the id is kept as the order of every output)
if (subfac) {
data_spread[1] <- as.factor(start_bandid)
continuous_fac <- c(0, continuous_fac)
}
# nlevels_list for those "continued" factors, for further use
nlevels_list_con_fac <- lapply(data_spread[, continuous_fac+1, drop = FALSE],
FUN = function(x) list(nlevels = nlevels(x),
table = table(x)))
# to calculate the exsiting combinations of levels, for further use
fac_table <- as.data.frame(table(data_spread[, continuous_fac + 1, drop = FALSE]))
fac_table <- fac_table[fac_table$Freq != 0, ]
fac_table$bandid <- as.numeric(rownames(fac_table))
# names of the factor variables, for convenience
names_to_group <- names(data_spread[, continuous_fac + 1, drop = FALSE])
# the calculation is used to calculate the position for levels within factors, used inside assign_fac()
# freespace is 0.1
# eachobs <- (1 - freespace)/nobs
# level_range_2 <- lapply(nlevels_list_con_fac,
# FUN = function(x) {
# freespace_offset <- rep(freespace/(x$nlevels - 1), times = x$nlevels - 1)
# freespace_offset_cum <- cumsum(freespace_offset)
# c(0, rep(cumsum(eachobs*x$table), each = 2)[-2*x$nlevels]) +
# c(0, 0, rep(freespace_offset_cum, each = 2))
# })
# calculate the positions for boxes(within each level within each factor)
# 0622new: We actually don't use: assign_box and box_position
# box_position <- assign_box(fac_table, level_range_2, nlevels_list_con_fac, names_to_group)
# postion for the observations in the factor block
obs_position_2 <- assign_fac(nlevels_list_con_fac, nobs, freespace = freespace)
# bandid for the original data_spread
data_spread$bandid <- bandid(data_spread, continuous_fac, nobs, nlevels_list_con_fac)
# match the positions in the factor block to the prior numeric variable, considering band
### we need to use similar method to work with the numeric variable after the factor block,
# when there is no numeric variable prior to the factor block
# input position adjusted for id column in data_spread
# it also arranged the positions inside the factor block
### we need to come back for multiple factor blocks
# note, start_position now is a data vector for factors to be adjusted by
arranged_position_inband <- lapply(seq_along(continuous_fac),
FUN = function(x) arrange_fac_by_ystart_bandid(
data_spread,
start_position = as.data.frame(start_position),
continuous_fac[x] + 1,
obs_position_2[x],
fac_table,
names_to_group = names(data_spread)[continuous_fac[x] + 1]))
#0622new: output bandid
arranged_fac_block_x <- list(data_final_xstart_fac2fac = rep(continuous_fac[-length(continuous_fac)], each = nobs),
data_final_ystart_fac2fac = unlist(lapply(arranged_position_inband[-length(arranged_position_inband)],
FUN = function(y) y[[1]][[1]])),
data_final_xend_fac2fac = rep(continuous_fac[-1], each = nobs),
data_final_yend_fac2fac = unlist(lapply(arranged_position_inband[-1],
FUN = function(y) y[[1]][[1]])))
arranged_fac_block_x_bandid <- list(data_final_ystart_fac2fac_bandid = unlist(lapply(arranged_position_inband[-length(arranged_position_inband)],
FUN = function(y) y[[1]][[2]])),
data_final_yend_fac2fac_bandid = unlist(lapply(arranged_position_inband[-1],
FUN = function(y) y[[1]][[2]])))
# arranged_fac_block_x is the originally returned arranged_fac_block before
arranged_fac_block <- list(arranged_fac_block_x = arranged_fac_block_x,
arranged_fac_block_x_bandid = arranged_fac_block_x_bandid)
arranged_fac_block
}
# Function for data preparation: to convert the input data to the form which can be used directly in the following: data_spread
# argument classpcp and nobs are kept because they are also useful outside this function
prepare_data <- function(data, classpcp, nobs) {
# to make R CMD CHECK happy
level <- value <- name <- NULL
# make adjustment to accept proper data set
# make sure the output data_spread has the same correct expected column order
data$name <- factor(data$name, levels = unique(data$name))
data_spread <- spread(data[, c("id__", "name", "value")], key = name, value = value)
ncol <- nrow(data)/nobs
nvar <- length(levels(data$name))
# ncol should be the same as nvar
# this may not work with tibble
num <- classpcp %in% c("numeric", "integer")
fac <- classpcp %in% c("factor", "ordered factor")
# char <- classpcp %in% c("character")
data_spread[, c(FALSE, num)] <- lapply(data_spread[, c(FALSE, num), drop = FALSE],
FUN = function(x) as.numeric(as.character(x)))
# to deal with factors, assign proper levels
# same name, value may break down this,
# if the user choose to put the same variable into the data twive
if (sum(fac) != 0) {
original_levels <- unique(data[which(data$class %in% c("factor", "ordered factor")),c("name", "value", "level")])
original_levels$name <- droplevels(original_levels$name)
original_levels <- original_levels %>%
group_by(name) %>%
arrange(level, .by_group = TRUE) %>%
ungroup()
original_levels <- split(original_levels, f = original_levels$name)
# here it is robust for the change of values(in transform_pcp), because we use the arrange(level) and factor(, level = value)
data_spread[, c(FALSE, fac)] <- Map(f = function(x, y){
factor(x, levels = y$value)
},
data_spread[, c(FALSE, fac), drop = FALSE],
original_levels)
}
# if(sum(char) != 0){
# data_spread[, c(FALSE, char)] <- lapply(data_spread[, c(FALSE, char), drop = FALSE],
# FUN = function(x) as.factor(as.character(x)))
# }
data_spread
}
# The function for the preparation of adjusment for boxwidth, rugwidth, interwidth
# used to calculate the adjusted values, which are used later in other places to actually do the adjustment
prepare_width_ajustment <- function(classpcp, boxwidth, rugwidth, interwidth, reverse = FALSE) {
# interval length, boxwidth, rugwidth
# adjusted for different lengths
if (reverse != TRUE) {
if (length(interwidth) == 1) {
interwidth <- rep(interwidth, times = length(classpcp) - 1)
}
interwidth <- cumsum(c(1, interwidth))
if (length(boxwidth) == 1) {
boxwidth <- rep(boxwidth, times = sum(classpcp == "factor"))
}
if (length(rugwidth) == 1) {
rugwidth <- rep(rugwidth, times = sum(!classpcp == "factor"))
}
# calculate accumulated changes
boxrugwidth <- seq_along(classpcp)
boxrugwidth[classpcp == "factor"] <- boxwidth
boxrugwidth[!classpcp == "factor"] <- rugwidth
cumboxrugwidth <- cumsum(boxrugwidth)
# calculate the adjusted position
boxwidth_xend <- interwidth + cumboxrugwidth
boxwidth_xstart <- boxwidth_xend - boxrugwidth
width_adjusted <- list(boxwidth_xstart = boxwidth_xstart,
boxwidth_xend = boxwidth_xend,
breaks = boxwidth_xend - boxrugwidth/2)
width_adjusted
} else {
if (length(interwidth) == 1) {
interwidth <- rep(interwidth, times = length(classpcp) - 1)
}
interwidth <- cumsum(c(-1, -interwidth))
if (length(boxwidth) == 1) {
boxwidth <- rep(boxwidth, times = sum(classpcp == "factor"))
}
if (length(rugwidth) == 1) {
rugwidth <- rep(rugwidth, times = sum(!classpcp == "factor"))
}
# calculate accumulated changes
boxrugwidth <- seq_along(classpcp)
boxrugwidth[classpcp == "factor"] <- -boxwidth
boxrugwidth[!classpcp == "factor"] <- -rugwidth
cumboxrugwidth <- cumsum(boxrugwidth)
# calculate the adjusted position
boxwidth_xend <- interwidth + cumboxrugwidth
boxwidth_xstart <- boxwidth_xend - boxrugwidth
width_adjusted <- list(boxwidth_xstart = boxwidth_xstart,
boxwidth_xend = boxwidth_xend,
breaks = boxwidth_xend - boxrugwidth/2)
width_adjusted
}
width_adjusted
}
# The function is used to identify which positions factors/factor blocks should use to sort the values
# it returns the positions of which variables should be used, but when use for data_spread, we need bywhich + 1 for the first column adjustment
prepare_bywhich <- function(start_fac2fac, classpcp) {
bywhich <- start_fac2fac - 1
# all sub-factor blocks sorted by a same numeric variable as before
if(sum(classpcp %in% c("numeric", "integer")) != 0){
# work well with bywhich[1] = 0, or go out of the bound(by else)
while(sum(classpcp[bywhich[1]] %in% c("numeric", "integer")) == 0) {
bywhich[1] <- bywhich[1] + 1
}
# work well with classpcp[0]
while(sum(classpcp[bywhich] == "factor") != 0) {
num_of_0 <- sum(bywhich == 0)
bywhich_adjust_position <- c(rep(FALSE, num_of_0), classpcp[bywhich] == "factor")
bywhich[bywhich_adjust_position] <- bywhich[bywhich_adjust_position] - 1
}
bywhich[bywhich == 0] <- bywhich[1]
} else {
bywhich <- rep(NA, length(bywhich))
}
bywhich
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.