pcp_select: Data wrangling for GPCPs: Step 1 variable selection


pcp_select    R Documentation


Description

The pcp_select function selects variables from a data set and transforms them into an embellished long form of the data.

Usage

pcp_select(data, ...)

Arguments

data

a dataframe or tibble

...

choose the columns to be used in the parallel coordinate plot. Variables can be selected by position, name or any of the tidyselect selector functions.
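
The three selection styles mentioned above can be sketched as follows. This is a minimal illustration, assuming the ggpcp and tidyselect packages are installed; the built-in iris data is used only as a stand-in and the object names are hypothetical.

```r
library(ggpcp)

# Selection by position: the first four columns of iris
by_position <- pcp_select(iris, 1:4)

# Selection by name: two columns given as bare names
by_name <- pcp_select(iris, Sepal.Length, Sepal.Width)

# Selection with a tidyselect helper: all columns starting with "Petal"
by_helper <- pcp_select(iris, tidyselect::starts_with("Petal"))

# Inspect the long form produced by one of the calls
head(by_name)
```

The order in which variables are selected is also the horizontal order of the axes, so rearranging the selection is one way to reorder the plot.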

Details

The data pipeline feeding any of the geom layers in the ggpcp package is implemented in a three-step modularized form rather than as the stat functions more typical for ggplot2 extensions. The three steps of data pre-processing are:

command        data processing step
pcp_select     variable selection (and horizontal ordering)
pcp_scale      (vertical) scaling of values
pcp_arrange    dealing with tie-breaks on categorical axes

Note that these data processing steps are executed before the call to ggplot2, and the identity function is used by default in all of the ggpcp-specific layers. Besides the speed-up from executing the processing steps only once for all layers, the separation lets users make specific choices at each step in the process. It also allows for a cleaner user interface: parameters affecting the data preparation process can be moved to the relevant (set of) function(s) only, reducing the number of arguments without any loss of functionality.
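
A minimal sketch of the three steps chained together before the call to ggplot2, assuming ggpcp and ggplot2 are installed. The iris data, the default arguments to pcp_scale() and pcp_arrange(), and the choice of layers are illustrative assumptions, not a recommended configuration.

```r
library(ggplot2)
library(ggpcp)

# Steps 1-3: select, scale, and arrange once, outside of ggplot2
pcp_data <- iris |>
  pcp_select(1:5) |>   # step 1: variable selection and horizontal ordering
  pcp_scale() |>       # step 2: vertical scaling (package defaults)
  pcp_arrange()        # step 3: tie-breaks on categorical axes

# The prepared data feeds every layer; no per-layer processing is repeated
p <- ggplot(pcp_data, aes_pcp()) +
  geom_pcp_axes() +
  geom_pcp(aes(colour = Species))

p
```

Because the processing happens once up front, additional layers can be added to `p` without re-running any of the three steps.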

Value

dataframe of a long form of the selected variables with extra columns:

variable        functionality
pcp_x, pcp_y    values for the mappings to x and y axes
pcp_yend        vertical endpoint of a line segment
pcp_class       type of each of the input variables
pcp_level       preserves order of levels in categorical variables
pcp_id          identifier for each observation

The returned data set has as many columns as there are input variables, plus 6 (the extra columns listed above). Its number of rows is the product of the number of selected variables and the number of rows in the original data.

See Also

pcp_scale(), pcp_arrange()

Examples

library(ggpcp)

data(Carcinoma)
dim(Carcinoma)
# select all variables
pcp_data <- Carcinoma |> pcp_select(1:9)
dim(pcp_data) # 6 more columns, 9 times as many rows
head(pcp_data)

ggpcp documentation built on Nov. 28, 2022, 5:05 p.m.