PlotCatVar: Create plots and summary statistics for a categorical...

View source: R/categorical.R

PlotCatVar    R Documentation

Create plots and summary statistics for a categorical variable

Description

Output plots include a bar plot with categories ordered by global counts, and trace plots of categories' proportions over time. This function is also applicable to a binary variable, which is treated as categorical in this package. In addition to plots, a data.table of summary statistics is generated, covering global counts and proportions by category, and proportions by category over time.

Usage

PlotCatVar(myVar, dataFl, weightNm = NULL, dateNm, dateGp, kCategories = 9,
  normBy = "time")

Arguments

myVar

The name of the variable to be plotted

dataFl

A data.table of data; must be the output of the PrepData function.

weightNm

Name of the variable containing row weights, or NULL for no weights (all rows receiving weight 1).

dateNm

Name of column containing the date variable.

dateGp

Name of the variable that the time series plots should be grouped by. Options are NULL, "weeks", "months", "quarters", "years". See IDate for details. If NULL, then dateNm will be used as dateGp.

kCategories

If a categorical variable has more than kCategories categories, trace plots are drawn only for the kCategories most prevalent categories.

normBy

The normalization factor for rate plots; either "time" or "var". If "time", then for each time period of dateGp, counts are normalized by the total counts over all categories in that time period. This illustrates changes of categories' proportions over time. If "var", then for each category, its counts are normalized by the total counts over time from only this category. This illustrates changes of categories' volumes over time.
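
The two normBy options can be illustrated with a minimal base R sketch. It does not call PlotCatVar and is not the package's implementation; the toy data frame and the names toy, period, category, byTime and byVar are hypothetical, chosen only to show which totals each option divides by.

```r
# Toy data: one row per observation, with a time period and a category.
# All names here are illustrative and not part of otvPlots.
toy <- data.frame(
  period   = c("2017-01", "2017-01", "2017-01",
               "2017-02", "2017-02", "2017-02", "2017-02"),
  category = c("a", "a", "b", "a", "b", "b", "b")
)

# Counts with categories as rows and time periods as columns
counts <- table(toy$category, toy$period)

# Analogue of normBy = "time": divide each period's counts by that
# period's total over all categories, so every column sums to 1
byTime <- prop.table(counts, margin = 2)

# Analogue of normBy = "var": divide each category's counts by that
# category's total over all periods, so every row sums to 1
byVar <- prop.table(counts, margin = 1)

print(round(byTime, 3))
print(round(byVar, 3))
```

Under "time", a category's value in a period is its share of that period; under "var", it is the share of the category's own total volume that falls in that period.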

Value

p

A grob (i.e., ggplot grid) object, including a bar plot, and trace plots of categories' proportions. If the number of categories is larger than kCategories, then trace plots of only the kCategories most prevalent categories are plotted. For a binary variable, only the trace plot of the less prevalent category is plotted.

catVarSummary

A data.table containing categories' proportions globally and in each time period of dateGp. Each row is a category of the categorical (or binary) variable myVar. The row whose category == 'NA' corresponds to missing values. Categories are ordered by global prevalence in descending order.
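
The ordering by global prevalence and the kCategories cutoff described above can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's code: the vector job, the variable k, and the choice to count missing values as their own level are all hypothetical.

```r
# Hypothetical toy vector; NA marks a missing value
job <- c("admin", "technician", "admin", NA,
         "services", "admin", "technician")

# Global counts by category, keeping missing values as their own level
# (an assumption made for this sketch)
globalCounts <- table(job, useNA = "ifany")

# Order categories by global prevalence, most frequent first
byPrevalence <- sort(globalCounts, decreasing = TRUE)

# Keep only the k most prevalent categories; k plays the role of kCategories
k <- 2
topK <- byPrevalence[seq_len(min(k, length(byPrevalence)))]

print(topK)
```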

License

Copyright 2017 Capital One Services, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See Also

Functions that depend on this function: PlotVar, PrintPlots, vlm.

This function depends on: PlotBarplot, PlotRatesOverTime, PrepData.

Examples

library(otvPlots)

data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months",
                     dateGpBp = "quarters", weightNm = NULL)

# Single histogram is plotted for job type since there are 12 categories
plot(PlotCatVar(myVar = "job", dataFl = bankData, weightNm = NULL,
                dateNm = "date", dateGp = "months")$p)

# Same variable, with kCategories raised to 12 (the number of job categories)
plot(PlotCatVar(myVar = "job", dataFl = bankData, weightNm = NULL,
                dateNm = "date", dateGp = "months", kCategories = 12)$p)

## Binary data is treated as categorical, and only the less frequent
## category is plotted over time.
plot(PlotCatVar(myVar = "default", dataFl = bankData, weightNm = NULL,
                dateNm = "date", dateGp = "months")$p)

capitalone/otvPlots documentation built on March 15, 2024, 8:25 a.m.