PlotVar: Create over time variable plots and summary statistics for...

View source: R/plot_print.R

PlotVar R Documentation

Create over time variable plots and summary statistics for one variable

Description

For a numerical variable, the output includes

  • side-by-side boxplots grouped by dateGpBp (left),

  • a trace plot of p1, p50, and p99 percentiles, grouped by dateGp (top right),

  • a trace plot of the mean and ±1 SD control limits, grouped by dateGp (middle right), and

  • a trace plot of missing and zero rates, grouped by dateGp (bottom right).
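
The right-hand panels for a numerical variable track simple per-period statistics. As a rough base R illustration only (this is not the package's implementation: it ignores weights, the data frame and the SummariseGroup helper are hypothetical, and the zero rate is assumed to be computed over all rows), the quantities could be computed like this:

```r
set.seed(1)
# Hypothetical daily data covering January to March 2017 (90 days),
# with one missing value and one exact zero in the last month.
df <- data.frame(
  date = as.Date("2017-01-01") + 0:89,
  x    = c(rnorm(88), NA, 0)
)
df$month <- format(df$date, "%Y-%m")

# Per-group statistics matching the three trace plots described above.
SummariseGroup <- function(g) {
  x <- g$x
  q <- quantile(x, probs = c(0.01, 0.5, 0.99), na.rm = TRUE, names = FALSE)
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(
    month        = g$month[1],
    p1 = q[1], p50 = q[2], p99 = q[3],         # percentile traces
    mean = m, lower = m - s, upper = m + s,    # mean with +/- 1 SD limits
    missing_rate = mean(is.na(x)),             # share of NA rows
    zero_rate    = sum(x == 0, na.rm = TRUE) / length(x)  # assumed denominator: all rows
  )
}

stats <- do.call(rbind, lapply(split(df, df$month), SummariseGroup))
print(stats)
```

Each row of stats corresponds to one point on each trace; the package additionally supports row weights through weightNm, which this sketch leaves out.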

For a categorical variable (including a numerical variable with no more than 2 unique levels not including NA), the output includes

  • a frequency bar plot (left), and

  • a grid of trace plots of the categories' proportions over time (right). If the variable has more than kCategories categories, trace plots are drawn only for the kCategories most prevalent ones.
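
To make the categorical case concrete, here is a minimal base R sketch of per-period category proportions restricted to the most prevalent categories. It is an unweighted illustration under assumed data (the df data frame and its month and job columns are made up), not the code PlotCatVar runs:

```r
set.seed(2)
# Hypothetical data: 50 rows in each of two months.
df <- data.frame(
  month = rep(c("2017-01", "2017-02"), each = 50),
  job   = sample(c("admin", "technician", "services", "retired"),
                 size = 100, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
  stringsAsFactors = FALSE
)

kCategories <- 2

# Rank categories by overall frequency and keep the kCategories largest.
counts <- sort(table(df$job), decreasing = TRUE)
top    <- head(names(counts), kCategories)

# Row-wise proportions: each month's categories sum to 1.
props     <- prop.table(table(df$month, df$job), margin = 1)
props_top <- props[, top, drop = FALSE]
print(props_top)
```

Each column of props_top would correspond to one panel in the grid of trace plots, with one point per period.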

In addition to the plots, a data.table of summary statistics is generated, containing both global and over-time summary statistics.

Usage

PlotVar(dataFl, myVar, weightNm, dateNm, dateGp, dateGpBp = NULL,
  labelFl = NULL, highlightNms = NULL, skewOpt = NULL, kSample = 50000,
  fuzzyLabelFn = NULL, kCategories = 9)

Arguments

dataFl

A data.table containing at least the following columns: myVar, weightNm, dateGp, dateGpBp; usually an output of the PrepData function.

myVar

Name of the variable to be plotted.

weightNm

Name of the variable containing row weights, or NULL for no weights (all rows receiving weight 1).

dateNm

Name of column containing the date variable.

dateGp

Name of the variable that the time series plots should be grouped by. Options are NULL, "weeks", "months", "quarters", "years". See IDate for details. If NULL, then dateNm will be used as dateGp.
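
As an illustration of what grouping dates by period means, the sketch below maps each date to the first day of its month, quarter, or year. It uses base R's cut() as a stand-in rather than the IDate rounding the package refers to, and GroupDates is a hypothetical helper:

```r
dates <- as.Date(c("2017-01-15", "2017-02-03", "2017-05-17"))

# Hypothetical helper: label each date with the start of its period.
# cut() on Date objects accepts "week", "month", "quarter", and "year".
GroupDates <- function(d, unit) {
  as.Date(as.character(cut(d, breaks = unit)))
}

by_month   <- GroupDates(dates, "month")
by_quarter <- GroupDates(dates, "quarter")
by_year    <- GroupDates(dates, "year")

print(data.frame(dates, by_month, by_quarter, by_year))
```

All rows that share a period label are then summarised together, so a coarser grouping gives fewer, smoother points on each trace.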

dateGpBp

Name of variable the boxplots should be grouped by. Same options as dateGp. If NULL, then dateGp will be used.

labelFl

A data.table containing variable labels, or NULL for no labels; usually an output of PrepLabels.

highlightNms

Either NULL or a character vector of variables to receive a red label. Currently, NULL means all variables will get a black legend. This argument is ignored if labelFl is NULL.

skewOpt

Either a numeric constant or NULL. Default is NULL (no transformation). If numeric, say 5, then all box plots of a variable whose skewness exceeds 5 will be on a log10 scale if possible. A negative value of skewOpt will be converted to 3.
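
The decision rule can be sketched as follows. This is an assumption-laden illustration: the moment-based skewness formula, the reading of "if possible" as "all non-missing values are strictly positive", and the helper names Skewness and UseLog10 are all hypothetical and may differ from what the package does internally.

```r
# Moment-based sample skewness (assumed estimator).
Skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: should the boxplots use a log10 scale?
UseLog10 <- function(x, skewOpt = NULL) {
  if (is.null(skewOpt)) return(FALSE)     # NULL: never transform
  if (skewOpt < 0) skewOpt <- 3           # negative input falls back to 3
  positive <- all(x > 0, na.rm = TRUE)    # assumed meaning of "if possible"
  positive && Skewness(x) > skewOpt
}

x <- c(rep(1, 99), 1000)   # heavily right-skewed, all positive
UseLog10(x, skewOpt = 5)
UseLog10(x, skewOpt = NULL)
UseLog10(1:10, skewOpt = 5)
```

With these inputs only the first call returns TRUE: the second has the transformation switched off, and the third is symmetric, so its skewness does not exceed the threshold.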

kSample

Either NULL or a positive integer. If an integer, it gives the sample size used both for drawing boxplots and for ordering numerical graphs by R^2. When the data is large, setting kSample to a reasonable value (the default is 50,000) dramatically improves processing speed. For larger datasets (e.g., more than 10 percent of system memory), this parameter should therefore not be set to NULL, or boxplots may take a very long time to render. This setting has no impact on the accuracy of time series plots of quantiles, mean, SD, and missing and zero rates.
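
The idea of capping the number of rows used for the boxplots can be sketched as below. SampleRows is a hypothetical helper, and simple random sampling without replacement is an assumption; the package's actual sampling scheme may differ.

```r
# Hypothetical helper: choose the row indices used for the boxplots.
SampleRows <- function(n, kSample = 50000) {
  if (is.null(kSample) || n <= kSample) {
    return(seq_len(n))            # sampling disabled or data already small
  }
  sort(sample.int(n, kSample))    # simple random sample without replacement
}

set.seed(42)
idx <- SampleRows(n = 200000)
length(idx)
```

Only the boxplots would be drawn from the sampled rows; as stated above, the time series statistics are unaffected by kSample.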

fuzzyLabelFn

Either NULL or a function of 2 parameters: a label file in the format of an output by PrepLabels, and a string giving a variable name. The function should return the label corresponding to the variable given by the second parameter. This function should describe how fuzzy matching should be performed to find labels (see example below). If NULL, only exact matches will be returned.
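
A fuzzy label function might look like the following sketch, which ignores digits when matching variable names. The column names varCol and labelCol, the toy label table, and the choice to return an empty string when nothing matches are assumptions made for illustration; check the actual output of PrepLabels before adapting it.

```r
# Toy label table; column names are assumed, not confirmed by this page.
labels <- data.frame(
  varCol   = c("balance", "duration"),
  labelCol = c("Account balance", "Call duration (seconds)"),
  stringsAsFactors = FALSE
)

# Hypothetical fuzzy matcher: strip digits from both sides before matching,
# so that "balance12" finds the label stored for "balance".
FuzzyLabel <- function(labelFl, myVar) {
  key        <- gsub("[0-9]", "", myVar)
  candidates <- gsub("[0-9]", "", labelFl$varCol)
  idx <- match(key, candidates)
  if (is.na(idx)) return("")      # assumed behaviour when no label is found
  labelFl$labelCol[idx]
}

FuzzyLabel(labels, "balance12")
FuzzyLabel(labels, "unknownVar")
```

A function of this shape would be passed as fuzzyLabelFn = FuzzyLabel.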

kCategories

If a categorical variable has more than kCategories categories, trace plots of only the kCategories most prevalent categories are plotted.

Value

p

A grob (i.e., ggplot grid) object. See the output p of the function PlotNumVar or PlotCatVar for details.

varSummary

A data.table of summary statistics. See the output numVarSummary of the function PlotNumVar, or the output catVarSummary of the function PlotCatVar for details.

varType

Indicator of the variable's type, either "nmrcl" or "ctgrl".

License

Copyright 2017 Capital One Services, LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See Also

Functions that depend on this function: PrintPlots.

This function depends on: PlotCatVar, PlotNumVar, PrepData.

Examples

library(otvPlots)
data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months", 
                     dateGpBp = "quarters")
data(bankLabels)
bankLabels <- PrepLabels(bankLabels)

## PlotVar will treat numerical and categorical data differently. 
## Binary data is always treated as categorical.
plot(PlotVar(bankData, myVar = "duration", weightNm = NULL, dateNm = "date", 
     dateGp = "months", dateGpBp =  "quarters", labelFl = bankLabels)$p)
plot(PlotVar(bankData, myVar = "job", weightNm = NULL, dateNm = "date", 
     dateGp = "months", dateGpBp =  "quarters", labelFl = bankLabels)$p)
plot(PlotVar(bankData, myVar = "loan", weightNm = NULL, dateNm = "date", 
     dateGp = "months", dateGpBp =  "quarters", labelFl = bankLabels)$p)


capitalone/otvPlots documentation built on March 15, 2024, 8:25 a.m.