expss
computes and displays tables with support for 'SPSS'-style labels, multiple / nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', R notebooks, 'Shiny' and 'Jupyter' notebooks. Proper methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package offers useful functions for data processing in marketing research / social surveys - popular data transformation functions from 'SPSS' Statistics and 'Excel' ('RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc.). Package is intended to help people to move data processing from 'Excel'/'SPSS' to R. See examples below. You can get help about any function by typing ?function_name
in the R console.
expss
is on CRAN, so for installation you can print in the console
install.packages("expss")
.
We will use for demonstartion well-known mtcars
dataset. Let's start with adding labels to the dataset. Then we can continue with tables creation.
library(expss) data(mtcars) mtcars = apply_labels(mtcars, mpg = "Miles/(US) gallon", cyl = "Number of cylinders", disp = "Displacement (cu.in.)", hp = "Gross horsepower", drat = "Rear axle ratio", wt = "Weight (1000 lbs)", qsec = "1/4 mile time", vs = "Engine", vs = c("V-engine" = 0, "Straight engine" = 1), am = "Transmission", am = c("Automatic" = 0, "Manual"=1), gear = "Number of forward gears", carb = "Number of carburetors" )
For quick cross-tabulation there are fre
and cross
family of function. For simplicity we demonstrate here only cross_cpct
which calculates column percent. Documentation for other functions, such as cross_cases
for counts, cross_rpct
for row percent, cross_tpct
for table percent and cross_fun
for custom summary functions can be seen by typing ?cross_cpct
and ?cross_fun
in the console.
# 'cross_*' examples # just simple crosstabulation, similar to base R 'table' function cross_cases(mtcars, am, vs) # Table column % with multiple banners cross_cpct(mtcars, cyl, list(total(), am, vs)) # magrittr pipe usage and nested banners mtcars %>% cross_cpct(cyl, list(total(), am %nest% vs))
We have more sophisticated interface for table construction with magrittr
piping. Table construction consists of at least of three functions chained with pipe operator: %>%
. At first we need to specify variables for which statistics will be computed with tab_cells
. Secondary, we calculate statistics with one of the tab_stat_*
functions. And last, we finalize table creation with tab_pivot
, e. g.: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot()
. After that we can optionally sort table with tab_sort_asc
, drop empty rows/columns with drop_rc
and transpose with tab_transpose
. Resulting table is just a data.frame
so we can use usual R operations on it. Detailed documentation for table creation can be seen via ?tables
. For significance testing see ?significance
.
Generally, tables automatically translated to HTML for output in knitr or Jupyter notebooks. However, if we want HTML output in the R notebooks or in the RStudio viewer we need to set options for that: expss_output_rnotebook()
or expss_output_viewer()
.
# simple example mtcars %>% tab_cells(cyl) %>% tab_cols(total(), am) %>% tab_stat_cpct() %>% tab_pivot() # table with caption mtcars %>% tab_cells(mpg, disp, hp, wt, qsec) %>% tab_cols(total(), am) %>% tab_stat_mean_sd_n() %>% tab_last_sig_means(subtable_marks = "both") %>% tab_pivot() %>% set_caption("Table with summary statistics and significance marks.") # Table with the same summary statistics. Statistics labels in columns. mtcars %>% tab_cells(mpg, disp, hp, wt, qsec) %>% tab_cols(total(label = "#Total| |"), am) %>% tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>% tab_pivot() # Different statistics for different variables. mtcars %>% tab_cols(total(), vs) %>% tab_cells(mpg) %>% tab_stat_mean() %>% tab_stat_valid_n() %>% tab_cells(am) %>% tab_stat_cpct(total_row_position = "none", label = "col %") %>% tab_stat_rpct(total_row_position = "none", label = "row %") %>% tab_stat_tpct(total_row_position = "none", label = "table %") %>% tab_pivot(stat_position = "inside_rows") # Table with split by rows and with custom totals. mtcars %>% tab_cells(cyl) %>% tab_cols(total(), vs) %>% tab_rows(am) %>% tab_stat_cpct(total_row_position = "above", total_label = c("number of cases", "row %"), total_statistic = c("u_cases", "u_rpct")) %>% tab_pivot() # Linear regression by groups. mtcars %>% tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>% tab_cols(total(label = "#Total| |"), am) %>% tab_stat_fun_df( function(x){ frm = reformulate(".", response = as.name(names(x)[1])) model = lm(frm, data = x) sheet('Coef.' = coef(model), confint(model) ) } ) %>% tab_pivot()
Here we use truncated dataset with data from product test of two samples of chocolate sweets. 150 respondents tested two kinds of sweets (codenames: VSX123 and SDF546). Sample was divided into two groups (cells) of 75 respondents in each group. In cell 1 product VSX123 was presented first and then SDF546. In cell 2 sweets were presented in reversed order. Questions about respondent impressions about first product are in the block A (and about second tested product in the block B). At the end of the questionnaire there was a question about the preferences between sweets.
List of variables:
id
Respondent Idcell
First tested product (cell number)s2a
Agea1_1-a1_6
What did you like in these sweets? Multiple response. First tested producta22
Overall quality. First tested productb1_1-b1_6
What did you like in these sweets? Multiple response. Second tested productb22
Overall quality. Second tested productc1
Preferencesdata(product_test) w = product_test # shorter name to save some keystrokes # here we recode variables from first/second tested product to separate variables for each product according to their cells # 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample # also we recode preferences from first/second product to true names # for first cell there are no changes, for second cell we should exchange 1 and 2. w = w %>% let_if(cell == 1, h1_1 %to% h1_6 := recode(a1_1 %to% a1_6, other ~ copy), p1_1 %to% p1_6 := recode(b1_1 %to% b1_6, other ~ copy), h22 := recode(a22, other ~ copy), p22 := recode(b22, other ~ copy), c1r = c1 ) %>% let_if(cell == 2, p1_1 %to% p1_6 := recode(a1_1 %to% a1_6, other ~ copy), h1_1 %to% h1_6 := recode(b1_1 %to% b1_6, other ~ copy), p22 := recode(a22, other ~ copy), h22 := recode(b22, other ~ copy), c1r := recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) ) %>% let( # recode age by groups age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2), # count number of likes # codes 2 and 99 are ignored. h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6), p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6) ) # here we prepare labels for future usage codeframe_likes = num_lab(" 1 Liked everything 2 Disliked everything 3 Chocolate 4 Appearance 5 Taste 6 Stuffing 7 Nuts 8 Consistency 98 Other 99 Hard to answer ") overall_liking_scale = num_lab(" 1 Extremely poor 2 Very poor 3 Quite poor 4 Neither good, nor poor 5 Quite good 6 Very good 7 Excellent ") w = apply_labels(w, c1r = "Preferences", c1r = num_lab(" 1 VSX123 2 SDF456 3 Hard to say "), age_cat = "Age", age_cat = c("18 - 25" = 1, "26 - 35" = 2), h1_1 = "Likes. VSX123", p1_1 = "Likes. SDF456", h1_1 = codeframe_likes, p1_1 = codeframe_likes, h_likes = "Number of likes. VSX123", p_likes = "Number of likes. SDF456", h22 = "Overall quality. VSX123", p22 = "Overall quality. SDF456", h22 = overall_liking_scale, p22 = overall_liking_scale )
Are there any significant differences between preferences? Yes, difference is significant.
# 'tab_mis_val(3)' remove 'hard to say' from vector w %>% tab_cols(total(), age_cat) %>% tab_cells(c1r) %>% tab_mis_val(3) %>% tab_stat_cases() %>% tab_last_sig_cases() %>% tab_pivot()
Further we calculate distribution of answers in the survey questions.
# lets specify repeated parts of table creation chains banner = w %>% tab_cols(total(), age_cat, c1r) # column percent with significance tab_cpct_sig = . %>% tab_stat_cpct() %>% tab_last_sig_cpct(sig_labels = paste0("<b>",LETTERS, "</b>")) # means with siginifcance tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("<b><u>Mean</u></b>", "sd", "N")) %>% tab_last_sig_means( sig_labels = paste0("<b>",LETTERS, "</b>"), keep = "means") # Preferences banner %>% tab_cells(c1r) %>% tab_cpct_sig() %>% tab_pivot() # Overall liking banner %>% tab_cells(h22) %>% tab_means_sig() %>% tab_cpct_sig() %>% tab_cells(p22) %>% tab_means_sig() %>% tab_cpct_sig() %>% tab_pivot() # Likes banner %>% tab_cells(h_likes) %>% tab_means_sig() %>% tab_cells(mrset(h1_1 %to% h1_6)) %>% tab_cpct_sig() %>% tab_cells(p_likes) %>% tab_means_sig() %>% tab_cells(mrset(p1_1 %to% p1_6)) %>% tab_cpct_sig() %>% tab_pivot() # below more complicated table where we compare likes side by side # Likes - side by side comparison w %>% tab_cols(total(label = "#Total| |"), c1r) %>% tab_cells(list(unvr(mrset(h1_1 %to% h1_6)))) %>% tab_stat_cpct(label = var_lab(h1_1)) %>% tab_cells(list(unvr(mrset(p1_1 %to% p1_6)))) %>% tab_stat_cpct(label = var_lab(p1_1)) %>% tab_pivot(stat_position = "inside_columns")
We can save labelled dataset as *.csv file with accompanying R code for labelling.
write_labelled_csv(w, file filename = "product_test.csv")
Or, we can save dataset as *.csv file with SPSS syntax to read data and apply labels.
write_labelled_spss(w, file filename = "product_test.csv")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.