normality.tbl_dbi: Performs the Shapiro-Wilk test of normality
In bit2r/kodlookr: Korean Help Resources for the dlookr Package


normality() performs the Shapiro-Wilk test of normality on the numerical (INTEGER, NUMBER, etc.) columns of a DBMS table through tbl_dbi.

## S3 method for class 'tbl_dbi'
normality(.data, ..., sample = 5000, in_database = FALSE, collect_size = Inf)

`.data`	a tbl_dbi.
`...`	one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, normality() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
`sample`	the number of samples used to perform the test.
`in_database`	Specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS. If FALSE, table data is pulled into R and processed in memory. in_database = TRUE is not yet supported.
`collect_size`	an integer. The number of data samples to bring from the DBMS into R. Applies only if in_database = FALSE. See vignette("EDA") for an introduction to these concepts.

This function is useful when used with the group_by function of the dplyr package. If you want to test each level of a categorical variable you are interested in, rather than the observations as a whole, apply group_by first and pass the resulting grouped table (group_tf) to normality(). The test itself is computed with the shapiro.test function.
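As a rough illustration of the grouped behaviour described above, the following sketch runs stats::shapiro.test on each level of a grouping variable using only base R. It is an approximation of the idea, not how normality() is implemented internally. The helper name shapiro_by_group is hypothetical, and the built-in ToothGrowth data set stands in for a DBMS table.

```r
# Hypothetical helper (not part of dlookr): apply the Shapiro-Wilk test
# to one numeric column within each level of a grouping column.
shapiro_by_group <- function(data, value, group) {
  # One numeric vector per level of the grouping column
  pieces <- split(data[[value]], data[[group]])
  rows <- lapply(names(pieces), function(level) {
    res <- stats::shapiro.test(pieces[[level]])
    data.frame(
      group = level,
      statistic = unname(res$statistic),
      p_value = res$p.value,
      sample = length(pieces[[level]])
    )
  })
  do.call(rbind, rows)
}

# ToothGrowth ships with R: tooth length ('len') by supplement type ('supp')
result <- shapiro_by_group(ToothGrowth, value = "len", group = "supp")
print(result)
```

Each row of the result mirrors the statistic, p_value, and sample columns that normality() reports, with one row per group level.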

An object of the same class as .data.

The information derived from the numerical data test is as follows.

statistic : the value of the Shapiro-Wilk statistic.
p_value : an approximate p-value for the test. This is said in Royston (1995) to be adequate for p_value < 0.1.
sample : the number of samples used to perform the test. The number of observations supported by the stats::shapiro.test function is 3 to 5000.
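The 3 to 5000 limit noted above can be seen directly in base R. The sketch below shows stats::shapiro.test rejecting an over-long vector and then succeeding on a random subsample. Drawing the subsample with base::sample() is an assumption made for illustration; the source does not state how normality() selects its observations.

```r
set.seed(1)
x <- rnorm(6000)  # simulated data, larger than shapiro.test accepts

# shapiro.test stops with an error when given more than 5000 observations;
# capture the message instead of aborting
too_many <- tryCatch(
  stats::shapiro.test(x),
  error = function(e) conditionMessage(e)
)
print(too_many)

# Assumed approach: draw at most 5000 observations without replacement
n <- min(length(x), 5000)
res <- stats::shapiro.test(sample(x, n))
print(res$statistic)
print(res$p.value)
```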

normality.data.frame, diagnose_numeric.tbl_dbi, describe.tbl_dbi, plot_normality.tbl_dbi.

library(dplyr)

# connect DBMS
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy heartfailure to the DBMS with a table named TB_HEARTFAILURE
copy_to(con_sqlite, heartfailure, name = "TB_HEARTFAILURE", overwrite = TRUE)

# Using pipes ---------------------------------
# Normality test of all numerical variables
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  normality()

# Positive values select variables; in-memory mode with a collect size of 200
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  normality(platelets, sodium, collect_size = 200)

# Positional values select variables
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  normality(1)

# Using pipes & dplyr -------------------------
# Test all numerical variables by 'smoking' and 'death_event',
# and extract only the rows where the 'smoking' level is "Yes".
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  group_by(smoking, death_event) %>%
  normality() %>%
  filter(smoking == "Yes")

# Extract only the rows where the 'sex' level is "Male",
# and test 'sodium' by 'smoking' and 'death_event'
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  filter(sex == "Male") %>%
  group_by(smoking, death_event) %>%
  normality(sodium)

# Test the log(sodium) variable by 'smoking' and 'death_event',
# and extract only the rows with p_value greater than 0.01.

# SQLite extension functions for log
RSQLite::initExtension(con_sqlite)

con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  mutate(log_sodium = log(sodium)) %>%
  group_by(smoking, death_event) %>%
  normality(log_sodium) %>%
  filter(p_value > 0.01)

# Disconnect DBMS
DBI::dbDisconnect(con_sqlite)