getCounts: Compute Frequencies for Categorical Variables
In avirkki/synergetr: Synthetic Data Generation

Count, group and arrange categorical values from the most to the least frequent. Optionally, compute the cumulative distribution function (cdf) and suppress entries that occur fewer than min_count times.


getCounts(tbl, cols, compute_cdf = TRUE, min_count = 1, default = "''",
  con = options("synergetr_con")[[1]],
  sample_max = options("synergetr_sample_max")[[1]])

`tbl`	Table name to inspect
`cols`	Vector of table fields
`compute_cdf`	Add cumulative distribution function to result set as a 'cdf' column.
`min_count`	The minimum number of times a distinct value must appear in the raw data to be included in the frequency count (use this parameter to exclude rare values from the result entirely).
`sample_max`	The maximum number of rows to use
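
The con and sample_max defaults in the usage above are read from session options. The snippet below is a small sketch of setting those options before a call; the value 10000 is an arbitrary example, not a recommended setting.

```r
# Make sure no database connection is registered (the option is then NULL).
options(synergetr_con = NULL)
# Cap the number of rows used; 10000 is an arbitrary example value.
options(synergetr_sample_max = 10000)

# The same expressions that appear as defaults in the usage above.
con <- options("synergetr_con")[[1]]
sample_max <- options("synergetr_sample_max")[[1]]

is.null(con)
sample_max
```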

If options("synergetr_con") points to a database connection, the frequencies are computed in the database and tbl should be a character string (e.g. tbl = "schemaname.table_name"). If "synergetr_con" is not set (i.e. is NULL), the computation is done in R memory using the data.table package, and tbl can be either the actual data.frame or its variable name as a character string.
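
To make the in-memory case concrete, here is a minimal data.table sketch of counting, filtering by min_count, sorting and adding a cdf column. It is not the package's implementation: count_sketch and the toy data are hypothetical, and computing the cdf after rare values are dropped is an assumption.

```r
library(data.table)

# Hypothetical helper for illustration only; not part of synergetr.
count_sketch <- function(df, cols, compute_cdf = TRUE, min_count = 1) {
  dt <- as.data.table(df)
  # Count rows per distinct combination of the chosen columns.
  counts <- dt[, .(n = .N), by = cols]
  # Drop rare values, then order from most to least frequent.
  counts <- counts[n >= min_count][order(-n)]
  if (compute_cdf) {
    # Assumption: the cdf is taken over the rows that survive the filter.
    counts[, cdf := cumsum(n) / sum(n)]
  }
  counts[]
}

# Toy data: "green" occurs once and is suppressed by min_count = 2.
toy <- data.frame(color = c("red", "red", "red", "blue", "blue", "green"))
res <- count_sketch(toy, "color", min_count = 2)
print(res)
```

With this toy input, the result has one row for "red" (n = 3) and one for "blue" (n = 2), and the cdf column runs up to 1 over the retained rows.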

avirkki/synergetr documentation built on May 18, 2019, 9:16 p.m.