balance_dataset: Balance a Dataset for Model Training/Testing

Description

Classification models work best with "balanced" datasets, where the numbers of positive and negative cases are roughly equal. This function takes an input tibble and balances it by randomly downsampling until it has the same number of positive and negative cases.
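
To make the idea of random downsampling concrete, here is a minimal base-R sketch. It is not the package's source: the helper name `downsample_positives`, passing the column name as a string, and coding positive cases as 1 are all assumptions made for illustration.

```r
# Hypothetical helper, not the package's implementation.
# Assumes `var` is a column name given as a string and positives are coded as 1.
downsample_positives <- function(data, var) {
  is_pos   <- data[[var]] == 1
  pos_rows <- which(is_pos)
  neg_rows <- which(!is_pos)
  # Randomly keep as many positive rows as there are negative rows.
  # This errors if there are fewer positives than negatives.
  kept_pos <- pos_rows[sample.int(length(pos_rows), length(neg_rows))]
  # Return the kept rows in their original order.
  data[sort(c(kept_pos, neg_rows)), , drop = FALSE]
}

set.seed(42)
reviews <- data.frame(
  id       = 1:10,
  positive = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
)
balanced <- downsample_positives(reviews, "positive")
table(balanced$positive)
```

With seven positive and three negative rows going in, the result keeps all three negatives and a random three of the positives.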

Usage

balance_dataset(data, var)

Arguments

data

A tibble with input data.

var

The name of the column to balance.

Details

Note: the function assumes there are more positive than negative cases, because this is always true in the customer-review datasets I'm working with. If your data differs, the code is straightforward to modify.
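
One possible modification, sketched in base R, is to downsample whichever class is larger instead of assuming it is the positive one. The helper name `balance_either` and the string column name are assumptions for illustration; this is not code from the package.

```r
# Hypothetical variant, not the package's implementation.
# Downsamples every class to the size of the smallest class,
# so it does not matter which class is in the majority.
balance_either <- function(data, var) {
  # Row indices grouped by the value of the balancing column.
  groups <- split(seq_len(nrow(data)), data[[var]])
  n_min  <- min(lengths(groups))
  keep <- unlist(
    lapply(groups, function(rows) rows[sample.int(length(rows), n_min)]),
    use.names = FALSE
  )
  # Return the kept rows in their original order.
  data[sort(keep), , drop = FALSE]
}

set.seed(1)
mostly_negative <- data.frame(
  id       = 1:7,
  positive = c(1, 1, 0, 0, 0, 0, 0)
)
rebalanced <- balance_either(mostly_negative, "positive")
table(rebalanced$positive)
```

Here the negatives are the majority, so both positives are kept and the negatives are cut down to two.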


chris31415926535/yelpredict documentation built on Jan. 7, 2021, 9:34 p.m.