plotAvgBy: Plot average of an indicator over bins/categories of another...


Description

Useful for assessing the relationship between two variables of interest. A scatterplot can be difficult to read, especially when outliers or many observations are involved. This function takes the two continuous variables that would be used in a scatterplot, discretizes one of them (the by variable) into bins, and calculates the mean (or other function) of the other variable within each bin. It uses an equal depth binning algorithm to compute these bins on the by variable.
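The idea of binning and then averaging can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper equal_depth_bins is hypothetical and stands in for the depthbin function, whose internals are not shown on this page. Quantile cut points are used here as one common way to get bins of roughly equal size.

```r
# Hypothetical helper, not part of Rsenal: equal-depth (quantile) bins in base R.
equal_depth_bins <- function(x, nbins = 5) {
  # Quantile cut points give each bin roughly the same number of observations.
  probs <- seq(0, 1, length.out = nbins + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Bin the "by" variable (drat) into 5 equal-depth bins.
bins <- equal_depth_bins(mtcars$drat, nbins = 5)

# Mean of mpg and number of observations within each drat bin.
avg_by_bin <- tapply(mtcars$mpg, bins, mean, na.rm = TRUE)
n_by_bin <- table(bins)

# Bar chart of the binned averages.
barplot(avg_by_bin, las = 2, ylab = "mean mpg")
```

Ties in the by variable can produce duplicate quantiles, which is why the sketch keeps only unique cut points; with heavily tied data this yields fewer bins than requested.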

It can be used to assess the average of a binary target variable/prediction over a range of levels of a continuous or categorical variable.
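For a binary (0/1) variable, the mean within each group is simply the proportion of 1s in that group. A minimal base R sketch of this, assuming mtcars$am as a stand-in binary target and mtcars$cyl as the categorical by variable (neither choice comes from the package examples):

```r
# Share of manual-transmission cars (am == 1) within each cylinder count.
rate_by_cyl <- tapply(mtcars$am, mtcars$cyl, mean)

# Number of cars behind each proportion.
n_by_cyl <- table(mtcars$cyl)

print(round(rate_by_cyl, 2))
print(n_by_cyl)
```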

Usage

plotAvgBy(indv, byv, nbins = 5, data = F, plotNbin = T, ...)

Arguments

indv

vector. This is the variable whose mean will be calculated over the binned categories of the byv vector.

byv

vector. This is the variable to be binned, by which the indv variable will be averaged.

nbins

numeric. Number of bins to create when discretizing byv. Passed to depthbin function.

data

logical. TRUE returns the aggregated data.table. FALSE returns nothing. FALSE is default.

plotNbin

logical. TRUE plots the count of observations in each bin on top of each bar. TRUE is default.

...

additional barplot arguments.

Value

Prints a barplot unless data = TRUE, in which case the aggregated data.table is returned.

Examples

# Average mpg over 8 bins of drat
plotAvgBy(mtcars[,'mpg'], mtcars[,'drat'], nbins=8)
# 5 bins, without the per-bin counts on top of the bars
plotAvgBy(mtcars[,'mpg'], mtcars[,'drat'], nbins=5, plotNbin=FALSE)
# Return the aggregated data.table
plotAvgBy(mtcars[,'mpg'], mtcars[,'drat'], nbins=5, plotNbin=FALSE, data=TRUE)

## Example with missing data
df <- mtcars
df$mpg[sample(1:nrow(mtcars), 5)] <- NA
df$drat[sample(1:nrow(mtcars), 5)] <- NA

plotAvgBy(df[,'mpg'], df[,'drat'], nbins=5, plotNbin=FALSE, data=TRUE)

brooksandrew/Rsenal documentation built on May 13, 2019, 7:50 a.m.