knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tsg)
Throughout the examples, we will use the person_record sample dataset, which is included in the tsg package. This dataset contains demographic information about individuals, including person_id, sex, age, marital_status, employed status, and functional difficulties.
dim(person_record)
head(person_record)
The generate_frequency() function creates frequency tables for one or more categorical variables in a data frame. It supports sorting, totals and percentages, missing-value handling, and custom labels. It also works with grouped data and can return either a single table or a list of tables.
person_record |> generate_frequency(sex)
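For readers who want to see what a frequency table contains, here is a minimal base R sketch of the same idea. It does not use tsg and is not how generate_frequency() is implemented; the toy vector and the column names category, frequency, and percent are assumptions made for illustration.

```r
# Toy data (hypothetical; not the person_record dataset)
sex <- c("Male", "Female", "Female", "Male", "Female")

# Count each category, then express the counts as percentages
counts <- table(sex)
freq_tbl <- data.frame(
  category  = names(counts),
  frequency = as.integer(counts),
  percent   = 100 * as.integer(counts) / length(sex)
)

# Append a total row at the bottom
freq_tbl <- rbind(
  freq_tbl,
  data.frame(
    category  = "Total",
    frequency = sum(freq_tbl$frequency),
    percent   = 100
  )
)

freq_tbl
```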
If you pass multiple variables, it will generate frequency tables for each variable separately in a list.
person_record |> generate_frequency(sex, age, marital_status)
You can also specify grouping using group_by() from dplyr, and the frequency table will be calculated for each group.
person_record |> dplyr::group_by(sex) |> generate_frequency(marital_status)
By default, the function will generate a single frequency table for the grouped data. If you want to generate a list of frequency tables for each group, you can set group_as_list = TRUE.
person_record |> dplyr::group_by(sex) |> generate_frequency(marital_status, group_as_list = TRUE)
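The difference between one combined table and a list of per-group tables can be sketched in base R as follows. This is only an illustration with toy data; the shapes that tsg actually returns may differ.

```r
# Toy data (hypothetical; not the person_record dataset)
df <- data.frame(
  sex            = c("Male", "Female", "Female", "Male", "Female"),
  marital_status = c("Single", "Married", "Single", "Married", "Married")
)

# One combined table: a row for every sex x marital_status pair
combined <- as.data.frame(
  table(sex = df$sex, marital_status = df$marital_status),
  responseName = "frequency"
)

# A list holding one frequency table per group
by_group <- lapply(split(df$marital_status, df$sex), table)

combined
by_group
```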
By default, the output is sorted by frequency in descending order. If sort_value is set to FALSE, the output will be sorted by the variable values in ascending order.
person_record |> generate_frequency(age, sort_value = TRUE)
person_record |> generate_frequency(age, sort_value = FALSE)
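The two orderings described above can be reproduced in base R. The sketch below uses toy ages and is meant only to show the difference between sorting by frequency and sorting by value.

```r
# Toy data (hypothetical; not the person_record dataset)
age <- c(30, 25, 30, 41, 25, 30)
counts <- table(age)

# Sorted by frequency, most frequent value first
by_frequency <- sort(counts, decreasing = TRUE)

# Sorted by the variable's own values, ascending
by_value <- counts[order(as.numeric(names(counts)))]

by_frequency
by_value
```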
If multiple variables are specified, you can indicate which variables are excluded from sorting using the sort_except argument.
person_record |>
  generate_frequency(
    sex,
    age,
    marital_status,
    # vector of variable names (character) to exclude from sorting
    sort_except = "age"
  )
You can specify the top n most frequent values to display in the frequency table if sort_value is TRUE. By default, the table shows the top n values plus the remaining values grouped into "Others".
person_record |> generate_frequency( marital_status, top_n = 3 )
If you want to show only the top-n values and exclude the rest, set top_n_only = TRUE.
person_record |> generate_frequency( marital_status, top_n = 3, top_n_only = TRUE )
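The "top n plus Others" idea can be sketched in base R. The toy vector, the variable names, and the way ties are broken here are assumptions made for illustration and may not match what tsg does internally.

```r
# Toy data (hypothetical; not the person_record dataset)
marital_status <- c(
  "Married", "Single", "Married", "Widowed",
  "Single", "Married", "Separated", "Divorced"
)
top_n <- 2

# Sort by frequency, keep the first top_n categories,
# and sum everything else into a single "Others" count
counts <- sort(table(marital_status), decreasing = TRUE)
top    <- counts[seq_len(top_n)]
others <- sum(counts[-seq_len(top_n)])

with_others <- data.frame(
  category  = c(names(top), "Others"),
  frequency = c(as.integer(top), others)
)

# Equivalent of showing only the top values: drop the "Others" row
top_only <- with_others[with_others$category != "Others", ]

with_others
top_only
```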
You can also specify whether to include or exclude NAs (missing values) from the frequency table.
person_record |>
  generate_frequency(
    employed,
    include_na = TRUE # default
  )

# Exclude NA values
person_record |>
  generate_frequency(
    employed,
    include_na = FALSE
  )
If all the variables passed to generate_frequency() have the same structure (i.e. the same number of levels or categories), you can collapse them into a single frequency table by setting collapse_list = TRUE.
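Base R's table() makes the same distinction through its useNA argument. The sketch below uses toy data to show how including or excluding missing values changes both the number of rows and the total count.

```r
# Toy data (hypothetical; not the person_record dataset)
employed <- c("Yes", "No", NA, "Yes", NA, "Yes")

# Keep missing values as their own category
with_na <- table(employed, useNA = "ifany")

# Drop missing values before counting
without_na <- table(employed, useNA = "no")

with_na
without_na
```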
person_record |> generate_frequency( seeing, hearing, walking, remembering, self_caring, communicating, collapse_list = TRUE )
Or equivalently using the collapse_list() helper function.
person_record |> generate_frequency( seeing, hearing, walking, remembering, self_caring, communicating ) |> collapse_list()
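To illustrate why the variables need the same categories, here is a base R sketch that stacks several frequency tables into one matrix with one row per variable. The toy data, the category labels, and the resulting layout are assumptions; the table that collapse_list() produces may be arranged differently.

```r
# Toy data (hypothetical; not the person_record dataset)
levels_used <- c("No difficulty", "Some difficulty")
items <- list(
  seeing  = c("No difficulty", "Some difficulty", "No difficulty"),
  hearing = c("No difficulty", "No difficulty", "No difficulty")
)

# Count every variable against the same set of levels,
# so that each one yields a vector of equal length
collapsed <- t(vapply(
  items,
  function(x) as.integer(table(factor(x, levels = levels_used))),
  integer(length(levels_used))
))
colnames(collapsed) <- levels_used

collapsed
```

Using factor() with a fixed set of levels keeps categories that have zero observations, which is what allows the rows to line up.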
You can also add cumulative frequency and percentage to the frequency table.
person_record |> generate_frequency( sex, add_cumulative = TRUE, add_cumulative_percent = TRUE )
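Cumulative columns are running totals of the frequency and percent columns. A minimal base R sketch with toy data follows; the column names are chosen for illustration and are not taken from tsg's output.

```r
# Toy data (hypothetical; not the person_record dataset)
sex <- c("Male", "Female", "Female", "Male", "Female")

counts  <- table(sex)
freq    <- as.integer(counts)
percent <- 100 * freq / sum(freq)

cum_tbl <- data.frame(
  category           = names(counts),
  frequency          = freq,
  percent            = percent,
  cumulative         = cumsum(freq),     # running total of counts
  cumulative_percent = cumsum(percent)   # running total of percentages
)

cum_tbl
```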
You can also specify whether to express the value as a proportion.
person_record |> generate_frequency( marital_status, as_proportion = TRUE )
You can also position the total row at the top of the table.
person_record |> generate_frequency( marital_status, position_total = "top" )
NOTE: For labelled data, the value for the total row is automatically set to the lowest numeric value. The default label for the total row is "Total"; to set a custom label, use the label_total argument.
The generate_crosstab() function allows you to create cross-tabulations between two variables, which is useful for exploring relationships between categorical variables.
person_record |> generate_crosstab(marital_status, sex)
NOTE: If you pass only one variable, it will fall back to generate_frequency() and generate a frequency table for the variable specified.
If you pass multiple variables, it will generate cross-tabulations for each pair of variables separately in a list.
person_record |> generate_crosstab( sex, seeing, hearing, walking, remembering, self_caring, communicating )
You can also specify grouping with group_by() from dplyr and it will calculate the cross-tabulation for each group.
person_record |> dplyr::group_by(sex) |> generate_crosstab(marital_status, employed)
If you want to generate a list of cross-tabulations for each group, you can set group_as_list = TRUE.
person_record |> dplyr::group_by(sex) |> generate_crosstab(marital_status, employed, group_as_list = TRUE)
You can specify whether to calculate the percentage or proportion by row or column using the percent_by_column argument. If it is set to TRUE, the percentage will be calculated by column; if set to FALSE, it will be calculated by row. The default is FALSE.
person_record |> generate_crosstab( marital_status, sex, percent_by_column = TRUE )
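The difference between row and column percentages can be seen with base R's prop.table(). This sketch uses toy data and is independent of tsg: with margin = 1 each row sums to 100, and with margin = 2 each column sums to 100.

```r
# Toy data (hypothetical; not the person_record dataset)
marital_status <- c("Single", "Married", "Married", "Single", "Married", "Single")
sex            <- c("Male", "Female", "Male", "Female", "Female", "Female")

xtab <- table(marital_status, sex)

# Percentages within each row (each marital status sums to 100)
row_pct <- 100 * prop.table(xtab, margin = 1)

# Percentages within each column (each sex sums to 100)
col_pct <- 100 * prop.table(xtab, margin = 2)

row_pct
col_pct
```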
Just like generate_frequency(), you can also specify whether to express the value as a proportion.
person_record |> generate_crosstab( marital_status, sex, as_proportion = TRUE )
You can also position the total row at the top of the table.
person_record |> generate_crosstab( marital_status, sex, position_total = "top" )
You can export your frequency table or cross-tabulation to Excel using the write_xlsx() function.
person_record |> generate_frequency(sex) |> write_xlsx(path = "table-01.xlsx")
You can add a title and subtitle to your table using the add_table_title() and add_table_subtitle() functions.
person_record |> generate_crosstab(marital_status, sex) |> add_table_title("Marital Status by Sex") |> add_table_subtitle("Sample dataset: person_record") |> write_xlsx(path = "table-02.xlsx")
You can also add end notes to your table using the add_source_note() and add_footnote() functions.
person_record |> generate_crosstab(marital_status, sex) |> add_table_title("Marital Status by Sex") |> add_table_subtitle("Sample dataset: person_record") |> add_source_note("Source: person_record dataset") |> add_footnote("This is a footnote for the table") |> write_xlsx(path = "table-03.xlsx")
Alternatively, you can add the table title, subtitle, source note, and footnotes directly by specifying them as arguments to the write_xlsx() function.
person_record |> generate_crosstab(marital_status, sex) |> write_xlsx( path = "table-03.xlsx", table_title = "Marital Status by Sex", table_subtitle = "Sample dataset: person_record", source_note = "Source: person_record dataset", footnotes = "This is a footnote for the table" )
You can use the add_facade() function to apply a facade to your table. A facade is a set of styling options that can be applied to the table to customize its appearance.
person_record |>
  generate_frequency(sex) |>
  add_facade(
    table.offsetRow = 2,
    table.offsetCol = 1
  ) |>
  write_xlsx(
    path = "table-04.xlsx",
    # Using built-in facade
    facade = get_tsg_facade("yolo")
  )
If you want to further customize the appearance of your table, you can use the facade argument to specify a YAML facade file. The facade file contains styling options for the table, such as font size, border style, background color, and text alignment.
person_record |>
  generate_frequency(sex) |>
  write_xlsx(
    path = "table-05.xlsx",
    # Using built-in facade
    facade = get_tsg_facade("yolo")
  )
You can generate a template facade file using the generate_template() function and then customize it to your needs.
The generate_output() function generates and saves the output file in a specified format. It is designed to support multiple formats (e.g., Excel, HTML, PDF, Word) and different data structures.
person_record |> generate_frequency(sex) |> generate_output(path = "table-06.xlsx")
NOTE: At the moment, it only supports Excel output. The other formats are not yet implemented.