PlotNumVar: Create plots and summary statistics for a numerical variable

View source: R/numerical.R

Description

The output plot contains a boxplot on the left, grouped by a coarser time scale (dateGpBp), and three time series (trace) plots on the right, grouped by a finer time scale (dateGp). The trace plots show the p1, p50, and p99 quantiles; the mean with ±1 SD control limits; and the missing and zero rates. In addition to the plots, a data.table of summary statistics is generated, covering both global and over-time statistics.
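The per-period statistics behind the three trace plots can be sketched in base R as follows. This is a minimal, hypothetical illustration, not otvPlots' internal code: the function name `trace_stats` is invented, the data are assumed unweighted, periods are formed with base R's `cut.Date` (which accepts "weeks", "months", "quarters", and "years") instead of IDate, and both rates are assumed to use all rows in the period as the denominator.

```r
# Hypothetical sketch (not otvPlots internals): per-period trace plot statistics.
trace_stats <- function(x, dates, dateGp = "months") {
  # Bucket each date into its period; cut.Date accepts "weeks", "months", etc.
  period <- cut(dates, breaks = dateGp)
  # drop = TRUE skips periods that contain no rows
  rows <- lapply(split(x, period, drop = TRUE), function(v) {
    obs <- v[!is.na(v)]
    mu <- mean(obs)
    s <- sd(obs)
    q <- quantile(obs, probs = c(0.01, 0.5, 0.99), names = FALSE)
    data.frame(p1 = q[1], p50 = q[2], p99 = q[3],
               mean = mu, lower = mu - s, upper = mu + s,  # +-1 SD control limits
               missing_rate = mean(is.na(v)),              # share of rows that are NA
               zero_rate = mean(!is.na(v) & v == 0))       # share of rows equal to 0
  })
  out <- do.call(rbind, rows)
  out$period <- names(rows)  # period labels are the start dates, e.g. "2024-01-01"
  rownames(out) <- NULL
  out
}
```

For example, `trace_stats(c(0, 10, NA, 4), dates, "months")` returns one row per month that has data, ready to be drawn as lines over time.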

Usage

PlotNumVar(myVar, dataFl, weightNm, dateGp, dateGpBp, skewOpt = NULL,
  kSample = 50000)

Arguments

myVar

Name of the numerical variable to be plotted, as a character string.

dataFl

A data.table; must be the output of the PrepData function.

weightNm

Name of the variable containing row weights, or NULL for no weights (all rows receive weight 1).

dateGp

Time scale by which the time series plots are grouped. Options are NULL, "weeks", "months", "quarters", and "years". See IDate for details. If NULL, dateNm (the date variable given to PrepData) is used as dateGp.

dateGpBp

Time scale by which the boxplots are grouped. Same options as dateGp. If NULL, dateGp is used.

skewOpt

Either a numeric constant or NULL. Default is NULL (no transformation). If numeric, say 5, then the boxplots of a variable whose skewness exceeds 5 are drawn on a log10 scale, if possible. A negative skewOpt is converted to 3.

kSample

Either NULL or a positive integer. If an integer, it gives the sample size used both for drawing boxplots and for ordering numerical graphs by R^2. When the data are large, setting kSample to a reasonable value (the default is 50,000) dramatically improves processing speed. Therefore, for larger datasets (e.g., larger than 10 percent of system memory), this parameter should not be set to NULL, or the boxplots may take a very long time to render. This setting has no impact on the accuracy of the time series plots of quantiles, mean, SD, and missing and zero rates.
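The skewOpt and kSample rules for the boxplots can be sketched as a single preparation step. This is a hypothetical illustration under stated assumptions, not otvPlots' internal code: the names `skewness_mom` and `prep_boxplot_data` are invented, skewness uses the simple moment estimator (otvPlots may use a different one), kSample is treated as a simple random sample without replacement, and "if possible" is read as "all non-missing values are positive".

```r
# Hypothetical sketch (not otvPlots internals): boxplot data preparation.

# Moment-based sample skewness; an assumption, otvPlots may use another estimator
skewness_mom <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

prep_boxplot_data <- function(x, dates, dateGpBp = "years",
                              skewOpt = NULL, kSample = 50000) {
  # kSample: keep a random subset of rows when the data exceed kSample rows
  if (!is.null(kSample) && length(x) > kSample) {
    idx <- sample.int(length(x), kSample)
    x <- x[idx]
    dates <- dates[idx]
  }
  use_log10 <- FALSE
  if (!is.null(skewOpt)) {
    if (skewOpt < 0) skewOpt <- 3  # documented rule: negative skewOpt becomes 3
    obs <- x[!is.na(x)]
    # Assumed meaning of "if possible": log10 only when all values are positive
    use_log10 <- isTRUE(skewness_mom(obs) > skewOpt) && all(obs > 0)
  }
  list(data = data.frame(group = cut(dates, breaks = dateGpBp),  # coarse periods
                         value = if (use_log10) log10(x) else x),
       log10 = use_log10)
}
```

Because the sampling step draws random rows, calling `set.seed()` beforehand makes the resulting boxplots reproducible.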

Value

p

A grob (grid graphical object) containing a side-by-side boxplot grouped by dateGpBp, a time series plot of p1, p50 (median), and p99 grouped by dateGp, a time series plot of the mean and ±1 SD control limits grouped by dateGp, and a time series plot of missing and zero rates grouped by dateGp.

numVarSummary

A data.table containing global and over-time summary statistics, including the p1, p25, p50, p75, and p99 quantiles, the mean and SD, and the missing and zero rates.
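A global summary row of this kind, with optional row weights, might be computed as sketched below. The names `wquantile` and `global_summary` are hypothetical; the weighted quantile uses a step (inverse-CDF) definition and the SD treats weights as frequency weights, both assumptions that may differ from what otvPlots does. With no weights, every row gets weight 1, matching the weightNm = NULL behaviour.

```r
# Hypothetical sketch (not otvPlots internals): one global summary row.

# Weighted quantile, step definition: smallest x whose cumulative weight share >= p
wquantile <- function(x, w, probs) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)
  vapply(probs, function(p) x[which(cw >= p)[1]], numeric(1))
}

global_summary <- function(x, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))  # no weights: every row gets weight 1
  miss <- is.na(x)
  xo <- x[!miss]
  wo <- w[!miss]
  mu <- sum(wo * xo) / sum(wo)
  # Frequency-weight SD; with unit weights this equals sd(xo)
  s <- sqrt(sum(wo * (xo - mu)^2) / (sum(wo) - 1))
  q <- wquantile(xo, wo, c(0.01, 0.25, 0.5, 0.75, 0.99))
  data.frame(p1 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p99 = q[5],
             mean = mu, sd = s,
             missing_rate = sum(w[miss]) / sum(w),     # weighted share of NA rows
             zero_rate = sum(wo[xo == 0]) / sum(w))    # weighted share of zero rows
}
```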

License

Copyright 2017 Capital One Services, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See Also

Functions that depend on this function: PlotVar.

This function depends on: SummaryStats, PlotDist, PlotQuantiles, PlotMean, PlotRates, PrepData.

Examples

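library(otvPlots)
data(bankData)
# Prepare the data: monthly groups for trace plots, yearly groups for boxplots
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months",
                     dateGpBp = "years")
# Plot "balance" with no weights, no log scaling, and no subsampling
plot(PlotNumVar("balance", bankData, NULL, "months", "years",
                skewOpt = NULL, kSample = NULL)$p)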

capitalone/otvPlots documentation built on March 15, 2024, 8:25 a.m.