PrepData: Prepare an input dataset for plotting


Source: R/prep.R

Description

This function prepares an input dataset for use by all plotting functions in this package, including the main function vlm. The input data dataFl must contain, at a minimum, a date column dateNm and one variable to be plotted. dataFl is converted to a data.table, and all changes are made to it by reference.
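As a rough illustration of the input handling described above, the following base-R sketch accepts either an in-memory data frame or a path to a CSV file and checks the minimum requirement of a date column plus one other variable. It is a stand-in under stated assumptions: read.csv() and as.data.frame() replace fread() and as.data.table(), nothing is modified by reference, and prep_input() is a hypothetical name, not an otvPlots function.

```r
# Hypothetical helper: a base-R stand-in for PrepData's input step.
# read.csv() replaces data.table::fread() and as.data.frame() replaces
# as.data.table(); unlike PrepData, nothing here is modified by reference.
prep_input <- function(dataFl, dateNm) {
  # A single character string is treated as a path to a CSV file
  if (is.character(dataFl) && length(dataFl) == 1L) {
    if (!file.exists(dataFl)) stop("File not found: ", dataFl)
    dat <- read.csv(dataFl, stringsAsFactors = FALSE)
  } else {
    dat <- as.data.frame(dataFl)
  }
  # Minimum requirements: the date column plus at least one other variable
  if (!dateNm %in% names(dat)) stop("Date column '", dateNm, "' not found")
  if (ncol(dat) < 2L) stop("dataFl needs at least one variable besides ", dateNm)
  dat
}

# Usage: the same call works on an in-memory object or on a file path
df <- data.frame(date = c("2008-05-05", "2008-06-10"), balance = c(2143, 29))
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
from_obj  <- prep_input(df, "date")
from_file <- prep_input(tmp, "date")
```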

Usage

PrepData(dataFl, dateNm, selectCols = NULL, dropCols = NULL,
  dateFt = "%d%h%Y", dateGp = NULL, dateGpBp = NULL, weightNm = NULL,
  varNms = NULL, dropConstants = FALSE, ...)

Arguments

dataFl

Either an object that can be converted using as.data.table (e.g., a data frame), or a character string giving the name of a file that can be read by fread (e.g., a CSV file). If the file is not in your working directory, dataFl must include the relative or absolute path to it.

dateNm

Name of column containing the date variable.

selectCols

Either NULL, or a vector of names or indices of variables to read into memory; it must include dateNm, weightNm (if not NULL), and all variables to be plotted. If both selectCols and dropCols are NULL, all variables are read in.

dropCols

Either NULL, or a vector of names or indices of variables not to read into memory. If both selectCols and dropCols are NULL, all variables are read in.

dateFt

The strptime format of the date variable. The default is the SAS date format "%d%h%Y". Input data in the R date format "%Y-%m-%d" is also detected, so both formats are parsed automatically.

dateGp

Name of the variable that the time series plots should be grouped by. Options are NULL, "weeks", "months", "quarters", or "years"; see IDate for details. If NULL, dateNm is used as dateGp.

dateGpBp

Name of the variable that the boxplots should be grouped by. Same options as dateGp. If NULL, dateGp is used.

weightNm

Name of the variable containing row weights, or NULL for no weights (all rows receive weight 1).

varNms

Either NULL or a vector of names or indices of variables to be plotted. If NULL, defaults to all columns other than dateNm and weightNm. Indices refer to column positions after selectCols or dropCols have been applied (if applicable) and do not count the dateGp and dateGpBp columns, which PrepData adds to dataFl.

dropConstants

Logical; indicates whether constant variables (all values duplicated or NA) should be dropped from dataFl before plotting.

...

Additional parameters to be passed to fread.
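The sketch below shows, in base R, how three of these arguments could behave: date parsing with the fallback described under dateFt, the varNms default, and the dropConstants rule. The helpers parse_dates(), default_var_nms(), and is_constant() are hypothetical, not otvPlots internals, and parsing month names with %h assumes an English or C locale.

```r
# Hypothetical base-R helpers; otvPlots itself operates on data.table objects.

# dateFt: try the SAS-style format first, then fall back to "%Y-%m-%d".
# Month-name matching via %h (same as %b) assumes an English or C locale.
parse_dates <- function(x, dateFt = "%d%h%Y") {
  d <- as.Date(x, format = dateFt)
  if (all(is.na(d))) d <- as.Date(x, format = "%Y-%m-%d")
  if (anyNA(d[!is.na(x)])) stop("Some dates could not be parsed")
  d
}

# varNms = NULL: default to every column other than dateNm and weightNm
default_var_nms <- function(dat, dateNm, weightNm = NULL) {
  setdiff(names(dat), c(dateNm, weightNm))
}

# dropConstants = TRUE: a variable counts as constant if it has at most
# one distinct non-NA value (all values identical, or all NA)
is_constant <- function(v) length(unique(v[!is.na(v)])) <= 1L

dat <- data.frame(
  date    = c("05May2008", "10Jun2008", "02Jul2008"),
  balance = c(2143, 29, 2),
  flag    = c(1, 1, NA),
  wt      = c(1, 2, 1)
)
dat$date <- parse_dates(dat$date)
vars <- default_var_nms(dat, "date", "wt")                 # "balance" "flag"
vars <- vars[!vapply(dat[vars], is_constant, logical(1))]  # "flag" is dropped
```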

Details

If weights (weightNm) are provided, they are normalized to sum to the total sample size, and they are used in all summary statistic calculations and plots.
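A minimal sketch of that normalization, assuming it is a simple rescaling by n / sum(w); normalize_weights() is a hypothetical helper, not an otvPlots function.

```r
# Hypothetical helper: rescale weights so they sum to the number of rows
normalize_weights <- function(w) {
  if (anyNA(w) || any(w < 0)) stop("weights must be non-negative and non-missing")
  if (sum(w) == 0) stop("weights must not all be zero")
  w * length(w) / sum(w)
}

w  <- c(1, 2, 1, 4)
wn <- normalize_weights(w)   # 0.5 1.0 0.5 2.0, which sums to 4
x  <- c(10, 20, 30, 40)
# Dividing the weighted sum by n gives the ordinary weighted mean
sum(wn * x) / length(x)      # 30, the same as weighted.mean(x, w)
```

Because the normalized weights sum to n, weighted counts stay on the same scale as unweighted row counts, while weighted means are unchanged by the rescaling.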

Value

A data.table object, formatted for use by all plotting functions in this package (otvPlots), including the main function vlm and the individual variable plotting function PlotVar.

License

Copyright 2017 Capital One Services, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See Also

Functions that depend on this function: PlotBarplot, PlotRatesOverTime, PlotCatVar, SummaryStats, PlotMean, PlotQuantiles, PlotRates, PlotDist, PlotNumVar, PlotVar, PrintPlots, CalcR2, OrderByR2, vlm.

Examples

## Use the bankData dataset in this package
data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months", 
                     dateGpBp = "quarters")
## Columns have been assigned a plotting class (nmrcl/ctgrl)
str(bankData) 

Example output

The following variables will be plotted:
Numerical: age balance duration campaign pdays previous
Categorical: job marital education default housing loan contact poutcome y

Classes 'data.table' and 'data.frame':	45211 obs. of  18 variables:
 $ age      : int  58 44 33 47 33 35 28 42 58 43 ...
 $ job      : 'character' chr  "management" "technician" "entrepreneur" "blue-collar" ...
 $ marital  : 'character' chr  "married" "single" "married" "married" ...
 $ education: 'character' chr  "tertiary" "secondary" "secondary" "unknown" ...
 $ default  : 'character' chr  "no" "no" "no" "no" ...
 $ balance  : int  2143 29 2 1506 1 231 447 2 121 593 ...
 $ housing  : 'character' chr  "yes" "yes" "yes" "yes" ...
 $ loan     : 'character' chr  "no" "no" "yes" "no" ...
 $ contact  : 'character' chr  "unknown" "unknown" "unknown" "unknown" ...
 $ duration : int  261 151 76 92 198 139 217 380 50 55 ...
 $ campaign : int  1 1 1 1 1 1 1 1 1 1 ...
 $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
 $ poutcome : 'character' chr  "unknown" "unknown" "unknown" "unknown" ...
 $ y        : 'character' chr  "no" "no" "no" "no" ...
 $ date     : IDate, format: "2008-05-05" "2008-05-05" ...
 $ months   : IDate, format: "2008-05-01" "2008-05-01" ...
 $ quarters : IDate, format: "2008-04-01" "2008-04-01" ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "months"
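In the output above, the added months and quarters columns hold the first day of each period (2008-05-05 maps to 2008-05-01 and 2008-04-01). The base-R sketch below reproduces that mapping, together with one plausible numerical/categorical split that matches the printed variable lists (integer columns as Numerical, character columns as Categorical). Both helpers, floor_date_group() and plot_class(), are hypothetical: otvPlots works with data.table IDate columns and may use a different classification rule, and "weeks" is omitted because the week-start convention is not stated here.

```r
# Hypothetical helper: map each date to the first day of its month,
# quarter, or year (base R stand-in for the IDate grouping columns)
floor_date_group <- function(d, unit = c("months", "quarters", "years")) {
  unit <- match.arg(unit)
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  m <- switch(unit,
    months   = m,
    quarters = 3L * ((m - 1L) %/% 3L) + 1L,  # first month of the quarter
    years    = rep(1L, length(m)))
  as.Date(sprintf("%04d-%02d-01", y, m))
}

# Hypothetical helper: numeric columns are "Numerical", all others "Categorical"
plot_class <- function(dat, vars) {
  ifelse(vapply(dat[vars], is.numeric, logical(1)), "Numerical", "Categorical")
}

d <- as.Date(c("2008-05-05", "2008-11-20"))
floor_date_group(d, "months")    # "2008-05-01" "2008-11-01"
floor_date_group(d, "quarters")  # "2008-04-01" "2008-10-01"

toy <- data.frame(age = c(58L, 44L), job = c("management", "technician"))
cls <- plot_class(toy, c("age", "job"))
```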

otvPlots documentation built on May 1, 2019, 6:49 p.m.