knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  echo = TRUE,
  message = FALSE,
  knitr.table.format = "html"
)

options(
  vlkr.fig.settings = list(
    html = list(dpi = 96, scale = 1, width = 910, pxperline = 12)
  )
)
First, load the package, set the plot theme and get some data.
# Load the package
library(volker)

# Set the basic plot theme
theme_set(theme_vlkr())

# Load an example dataset ds from the package
ds <- volker::chatgpt
Decide whether your data is categorical or metric and choose the appropriate function:
- report_counts() shows frequency tables and generates simple and stacked bar charts.
- report_metrics() creates tables with distribution parameters and visualises distributions in density plots, box plots or scatter plots.

The column selection determines whether to analyse single variables, item lists or to compare and correlate multiple variables.
Try out the following examples!
# A single variable
report_counts(ds, use_private)

# A list of variables
report_counts(ds, c(use_private, use_work))

# Variables matched by a pattern
report_counts(ds, starts_with("use_"))

# One metric variable
tab_metrics(ds, sd_age)

# Multiple metric items
tab_metrics(ds, starts_with("cg_adoption_"))
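To make the difference between categorical and metric output more tangible, here is a minimal base R sketch of the kind of numbers such tables contain. The toy data frame and its column names (toy, use, age) are made up for illustration and are not part of the volker package. The exact columns that volker reports may differ.

```r
# Toy data (hypothetical, not from volker): one categorical and one metric column
toy <- data.frame(
  use = c("never", "often", "often", "rarely", "never", "often"),
  age = c(23, 35, 41, 29, 52, 38)
)

# Categorical column: absolute (n) and relative (p, in percent) frequencies
n <- table(toy$use)
p <- round(100 * prop.table(n), 1)
counts <- data.frame(value = names(n), n = as.vector(n), p = as.vector(p))
print(counts)

# Metric column: a few distribution parameters
metrics <- c(
  min = min(toy$age),
  median = median(toy$age),
  mean = mean(toy$age),
  sd = sd(toy$age),
  max = max(toy$age)
)
print(round(metrics, 2))
```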
Provide a grouping column in the third parameter to compare different groups.
report_counts(ds, adopter, sd_gender)
For metric variables, you can compare the mean values. The ci parameter adds confidence intervals.
report_metrics(ds, sd_age, sd_gender, ci = TRUE)
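As a rough illustration of what a group comparison with confidence intervals involves, the following base R sketch computes the mean and a t-based 95% confidence interval per group. The toy data and the helper mean_ci() are hypothetical. Whether volker computes its intervals in exactly this way is an assumption.

```r
# Toy data (hypothetical): a metric outcome and a grouping column
toy <- data.frame(
  age = c(23, 35, 41, 29, 52, 38, 31, 48),
  gender = c("female", "male", "female", "male",
             "female", "male", "female", "male")
)

# Hypothetical helper: mean and confidence interval based on the t distribution
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  margin <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(mean = mean(x), ci_low = mean(x) - margin, ci_high = mean(x) + margin, n = n)
}

# One row per group
by_group <- do.call(rbind, lapply(split(toy$age, toy$gender), mean_ci))
print(round(by_group, 2))
```

The interval bounds agree with those returned by t.test() for a single sample, which is a quick way to check the helper.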
By default, the crossing variable is treated as categorical. You can change this behavior using the metric-parameter to calculate correlations:
tab_metrics(ds, sd_age, use_work, metric = TRUE, ci = TRUE)
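For readers who want to see what a correlation with a confidence interval looks like without the package, here is a small base R sketch using cor.test(). The toy data is made up, and the choice of Pearson's r is an assumption, since the text above does not say which coefficient volker reports.

```r
# Toy data (hypothetical): two metric columns
toy <- data.frame(
  age  = c(23, 35, 41, 29, 52, 38, 31, 45),
  work = c(1, 3, 4, 2, 5, 3, 2, 4)
)

# Pearson correlation with a 95% confidence interval
res <- cor.test(toy$age, toy$work, method = "pearson")

result <- c(
  r = unname(res$estimate),
  ci_low = res$conf.int[1],
  ci_high = res$conf.int[2],
  p = res$p.value
)
print(round(result, 3))
```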
See the function help (F1 key) to learn the options. For example, you can use the prop parameter to grow bars to 100%. The numbers parameter prints frequencies and percentages onto the bars.
ds |>
  filter(sd_gender != "diverse") |>
  report_counts(adopter, sd_gender, prop = "rows", numbers = "n")
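The idea of growing bars to 100% can be sketched in base R with prop.table(). The sketch assumes that prop = "rows" refers to row percentages, meaning each category of the first variable sums to 100. The toy data is hypothetical.

```r
# Toy data (hypothetical): two categorical columns
toy <- data.frame(
  adopter = c("early", "early", "late", "late", "late", "early", "late", "early"),
  gender  = c("female", "male", "female", "male", "female", "female", "male", "male")
)

# Cross table of absolute frequencies
n <- table(toy$adopter, toy$gender)

# Row percentages: every row sums to 100
p_rows <- 100 * prop.table(n, margin = 1)

print(n)
print(round(p_rows, 1))
```

Passing margin = 2 instead would give column percentages.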
Further, the effect-functions conduct statistical tests:
ds |>
  filter(sd_gender != "diverse") |>
  effect_counts(adopter, sd_gender)
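A common test for the association between two categorical variables is the chi-squared test, often accompanied by Cramér's V as an effect size. The following base R sketch shows both on a made-up cross table. Whether effect_counts() reports exactly these statistics is an assumption, so check the function help for the actual output.

```r
# Toy cross table (hypothetical counts)
n <- matrix(
  c(30, 20,
    10, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(adopter = c("early", "late"), gender = c("female", "male"))
)

# Chi-squared test without continuity correction
test <- chisq.test(n, correct = FALSE)

# Cramér's V: sqrt(chi2 / (N * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(test$statistic) / (sum(n) * (min(dim(n)) - 1)))

print(test)
print(round(cramers_v, 3))
```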
Reports combine plots, tables and effect calculations in an RMarkdown document. Optionally, for item batteries, an index, clusters or factors are calculated and reported.
To see an example or develop your own reports, use the volker report template in RStudio.
Have fun developing your own reports!
Without the template, to generate a volker report from any R Markdown document, add volker::html_report to the output options of your Markdown document:
---
title: "How to create reports?"
output: volker::html_report
---
Then, you can generate combined outputs using the report-functions. One advantage of the report-functions is that plots are automatically scaled to fit the page. See the function help for further options (F1 key).
ds %>%
  filter(sd_gender != "diverse") %>%
  report_metrics(starts_with("cg_adoption_"), sd_gender, box = TRUE, ci = TRUE)
By default, a header and tabsheets are automatically created. You can mix in custom content.

- To add content before the report outputs, set the title parameter to FALSE and add your own title.
- To add a custom tabsheet, set the close parameter to FALSE and add a new header on the fifth level (5 x # followed by the tab name). Close your custom new tabsheet with #### {-} (4 x #).

Try out the following pattern in an RMarkdown document!
#> ### Adoption types
#>
#> ```r
#> ds %>%
#>   filter(sd_gender != "diverse") %>%
#>   report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE)
#> ```
#>
#> ##### Method
#> Basis: Only male and female respondents.
#>
#> #### {-}
The theme_vlkr() function lets you customise colors:
theme_set(theme_vlkr(
  base_fill = c("#F0983A", "#3ABEF0", "#95EF39", "#E35FF5", "#7A9B59"),
  base_gradient = c("#FAE2C4", "#F0983A")
))
Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the codebook() function:
codebook(ds)
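The comment attribute is a plain base R feature, so it can be set and read without the package. The sketch below attaches labels to a made-up data frame and collects them into a named vector. It only illustrates the storage mechanism and does not reproduce what codebook() returns.

```r
# Toy data (hypothetical columns and labels)
toy <- data.frame(
  sd_age = c(23, 35, 41),
  use_private = c(1, 3, 5),
  unlabelled = c("a", "b", "c")
)

# Attach labels as comment attributes
comment(toy$sd_age) <- "Age in years"
comment(toy$use_private) <- "Private usage"

# Collect the label of every column; columns without a label yield NA
labels <- vapply(toy, function(col) {
  label <- comment(col)
  if (is.null(label)) NA_character_ else label
}, character(1))

print(labels)
```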
Set specific column labels by providing a named list to the items parameter of labs_apply():
ds %>%
  labs_apply(
    items = list(
      "cg_adoption_advantage_01" = "Allgemeine Vorteile",
      "cg_adoption_advantage_02" = "Finanzielle Vorteile",
      "cg_adoption_advantage_03" = "Vorteile bei der Arbeit",
      "cg_adoption_advantage_04" = "Macht mehr Spaß"
    )
  ) %>%
  report_metrics(starts_with("cg_adoption_advantage_"))
Labels for values inside a column can be adjusted by providing a named list to the values parameter of labs_apply(). In addition, select the columns where value labels should be changed:
ds %>%
  labs_apply(
    cols = starts_with("cg_adoption"),
    values = list(
      "1" = "Stimme überhaupt nicht zu",
      "2" = "Stimme nicht zu",
      "3" = "Unentschieden",
      "4" = "Stimme zu",
      "5" = "Stimme voll und ganz zu"
    )
  ) %>%
  report_metrics(starts_with("cg_adoption"))
To conveniently manage all labels of a dataset, save the result of codebook() to an Excel file, change the labels manually in a copy of the Excel file, and finally call labs_apply() with your revised codebook.
library(readxl)
library(writexl)

# Save codebook to a file
codes <- codebook(ds)
write_xlsx(codes, "codebook.xlsx")

# Load and apply a codebook from a file
# (pass the loaded data frame, not the codebook() function itself)
codes <- read_xlsx("codebook_revised.xlsx")
ds <- labs_apply(ds, codes)
Be aware that some data operations such as mutate() from the tidyverse lose labels on their way. In this case, store the labels (in the codebook attribute of the data frame) before the operation and restore them afterwards:
ds %>%
  labs_store() %>%
  mutate(sd_age = 2024 - sd_age) %>%
  labs_restore() %>%
  report_metrics(sd_age)
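The store-and-restore idea can be mimicked in base R. The sketch below keeps the labels in an ordinary list (a simplification, since the text above says volker uses an attribute of the data frame), applies an operation that strips attributes, and then puts the labels back. The toy data and variable names are hypothetical. How labs_store() and labs_restore() work internally is not shown here.

```r
# Toy data (hypothetical) with a label in the comment attribute
toy <- data.frame(sd_age = c(1990, 1985, 2001))
comment(toy$sd_age) <- "Year of birth"

# Store: remember the comment attribute of every column
stored <- lapply(toy, comment)

# An operation that strips attributes: as.numeric() drops them
toy$sd_age <- as.numeric(2024 - toy$sd_age)
lost <- is.null(comment(toy$sd_age))
print(lost)

# Restore: write the remembered labels back to the columns
for (col in names(stored)) {
  comment(toy[[col]]) <- stored[[col]]
}
print(comment(toy$sd_age))
```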
You can calculate mean indexes from a bunch of items using add_index(). A new column is created with the average value of all selected columns for each case. Provide a custom name for the column using the newcol parameter.
Reliability and number of items are calculated with psych::alpha() and stored as a column attribute named "psych.alpha". The reliability values are printed by report_metrics().
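To illustrate what a mean index and its reliability are, the following base R sketch computes row means and raw Cronbach's alpha from the textbook formula. The toy items are made up. psych::alpha() reports more than this single value, so treat the sketch as an approximation of the idea rather than a replacement.

```r
# Toy items (hypothetical): three 5-point items for five cases
items <- data.frame(
  item_1 = c(1, 2, 3, 4, 5),
  item_2 = c(2, 2, 3, 5, 4),
  item_3 = c(1, 3, 3, 4, 5)
)

# Mean index: average of all items for each case
idx <- rowMeans(items)

# Raw Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of sum score)
k <- ncol(items)
item_vars <- sapply(items, var)
total_var <- var(rowSums(items))
alpha <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(idx, 2))
print(round(alpha, 3))
```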
Add a single index
ds %>%
  add_index(starts_with("cg_adoption_"), newcol = "idx_cg_adoption") %>%
  report_metrics(idx_cg_adoption)
Compare the index values by group
ds %>%
  add_index(starts_with("cg_adoption_"), newcol = "idx_cg_adoption") %>%
  report_metrics(idx_cg_adoption, adopter)
Add multiple indices and summarize them
ds %>%
  add_index(starts_with("cg_adoption_")) %>%
  add_index(starts_with("cg_adoption_advantage")) %>%
  add_index(starts_with("cg_adoption_fearofuse")) %>%
  add_index(starts_with("cg_adoption_social")) %>%
  tab_metrics(starts_with("idx_cg_adoption"))
The easiest way to conduct factor analysis or cluster analyses is to use the respective parameters in the report_metrics() function.
ds |> report_metrics(starts_with("cg_adoption"), factors = TRUE, clusters = TRUE)
Currently, cluster analysis is performed using kmeans and factor analysis is a principal component analysis. Setting the parameters to TRUE automatically generates scree plots and selects the number of factors or clusters. Alternatively, you can explicitly specify the numbers.
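For orientation, here is a minimal sketch of the two underlying techniques with functions from the stats package, run on the built-in iris data. The eigenvalues are what a scree plot displays. The Kaiser criterion (eigenvalues above 1) is used here only as one possible selection rule and is an assumption, because the text above does not say how volker picks the number of factors or clusters.

```r
set.seed(42)

# Standardise four metric columns of a built-in dataset
x <- scale(iris[, 1:4])

# Principal component analysis; squared sdev gives the eigenvalues
pca <- prcomp(x)
eigenvalues <- pca$sdev^2
print(round(eigenvalues, 3))

# One possible rule for the number of components (Kaiser criterion, an assumption)
n_factors <- sum(eigenvalues > 1)

# Component scores for each case
scores <- pca$x[, seq_len(n_factors), drop = FALSE]

# K-means clustering with an explicitly specified number of clusters
km <- kmeans(x, centers = 3, nstart = 10)
print(table(km$cluster))
```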
Add factor analysis results
If you want to work with the results, use add_factors() and add_clusters() respectively.
For factor analysis, new columns prefixed with "fct_" are created to store the factor loadings based on the specified number of factors.
For clustering, an additional column prefixed with "cls_" is added that assigns each observation to a cluster number.
You can use the new columns as shown below.
ds |>
  add_factors(starts_with("cg_adoption"), k = 3) |>
  report_metrics(fct_cg_adoption_1, fct_cg_adoption_2, metric = TRUE)
Automatically determine the number of factors
To automatically determine the optimal number of factors or clusters based on diagnostics, set k = NULL.
ds |>
  add_factors(starts_with("cg_adoption"), k = NULL) |>
  factor_tab(starts_with("fct_cg_adoption"))
Compare values by cluster
ds |>
  add_clusters(starts_with("cg_adoption"), k = 3) |>
  report_counts(sd_gender, cls_cg_adoption, prop = "cols")
The volker package is based on standard methods for data handling and visualisation. You could produce all outputs on your own. The package just keeps your code DRY (don't repeat yourself) and wraps frequently used snippets in a simple interface.
Report functions call subsidiary tab and plot functions, which in turn call functions specifically designed for the provided column selection. In case you only need a table or want to work with the result of a table, call the specific function, for example tab_counts() or plot_counts().
Console and markdown output is polished by specific print and knit functions. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (vlkr_df, vlkr_plt, vlkr_tbl, vlkr_list, vlkr_rprt).
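The mechanism behind such classes is S3 dispatch: an object gets an extra class, and print() picks the method written for that class. The sketch below shows the pattern with a made-up class name (toy_tbl) and a made-up constructor (as_toy_tbl), chosen so they cannot be confused with the real volker classes. It is not the package's actual implementation.

```r
# Hypothetical constructor: prepend a custom class to a data frame
as_toy_tbl <- function(df) {
  class(df) <- c("toy_tbl", class(df))
  df
}

# Custom print method, found by print() through S3 dispatch
print.toy_tbl <- function(x, ...) {
  cat("Table with", nrow(x), "rows\n")
  # Continue with the default data frame printing
  NextMethod()
  invisible(x)
}

tbl <- as_toy_tbl(data.frame(value = c("a", "b"), n = c(3, 5)))
print(tbl)
```

Because the object still inherits from data.frame, all regular data frame operations keep working on it.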
The volker package makes use of common tidyverse functions. Basically, most outputs are generated by three functions:

- count() is used to produce counts.
- skim() is used to produce metrics.
- ggplot() is used to assemble plots.

Statistical tests, clustering and factor analysis are largely based on the stats, psych, car and effectsize packages.
Thanks to all the maintainers, authors and contributors of the packages that make the world of data a magical place.