normality() performs the Shapiro-Wilk test of normality on the numerical (INTEGER, NUMBER, etc.) columns of a DBMS table accessed through a tbl_dbi object.
| .data | a tbl_dbi. | 
| ... | one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, normality() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
| sample | the number of observations sampled to perform the test. |
| in_database | specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS; if FALSE, the table data is collected into R and processed in memory. in_database = TRUE is not yet supported. |
| collect_size | an integer. The number of rows collected from the DBMS into R. Applies only if in_database = FALSE. See vignette("EDA") for an introduction to these concepts. |
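The positional rules for `...` (positive values select, negative values drop, and a leading negative value starts from all variables) can be illustrated with a small base-R sketch. `select_positions()` is a hypothetical helper written only for this illustration; it handles integer positions only and is not how dlookr or tidyselect implement selection.

```r
# Hypothetical helper (illustration only): mimics the positional selection
# rules described for `...` using plain integer column positions.
select_positions <- function(n_cols, positions) {
  # A leading negative value starts from all columns; otherwise start from none.
  selected <- if (length(positions) > 0 && positions[1] < 0) seq_len(n_cols) else integer(0)
  for (p in positions) {
    if (p > 0) {
      selected <- union(selected, p)     # positive value: add that column
    } else {
      selected <- setdiff(selected, -p)  # negative value: drop that column
    }
  }
  as.integer(selected)
}

select_positions(5, c(1, 3))  # columns 1 and 3
select_positions(5, c(-2))    # every column except 2
```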
This function is useful in combination with the group_by() function of the dplyr package. To test each level of a categorical variable of interest rather than the whole set of observations, group the table with group_by() before calling normality().
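Conceptually, a grouped call runs one Shapiro-Wilk test per combination of group levels. The base-R sketch below imitates that on a synthetic data frame (the values are random; only the column names mirror the examples) using split() and stats::shapiro.test(). It is an illustration of the idea, not dlookr's implementation.

```r
set.seed(1)
# Synthetic stand-in data: random values, column names borrowed from the examples.
df <- data.frame(
  sodium      = rnorm(200, mean = 137, sd = 4),
  smoking     = sample(c("Yes", "No"), 200, replace = TRUE),
  death_event = sample(c("Yes", "No"), 200, replace = TRUE)
)

# One Shapiro-Wilk test per smoking x death_event combination, analogous in
# spirit to group_by(smoking, death_event) %>% normality(sodium).
groups <- split(df$sodium, list(df$smoking, df$death_event), drop = TRUE)
res <- do.call(rbind, lapply(names(groups), function(g) {
  x  <- groups[[g]]
  st <- stats::shapiro.test(x)
  data.frame(group     = g,
             statistic = unname(st$statistic),
             p_value   = st$p.value,
             sample    = length(x))
}))
print(res)
```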
The test itself is computed with the stats::shapiro.test() function.
An object of the same class as .data.
The test results for each numerical variable contain the following information.
statistic : the value of the Shapiro-Wilk statistic.
p_value : an approximate p-value for the test. Royston (1995) reports this approximation to be adequate for p_value < 0.1.
sample : the number of observations used to perform the test. The stats::shapiro.test function supports between 3 and 5000 observations.
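The per-variable computation behind these three values can be sketched in base R: drop missing values, subsample down to `sample` observations when there are more, enforce the 3 to 5000 range, and call stats::shapiro.test(). `shapiro_summary()` is a hypothetical helper written under those assumptions; dlookr's internal code may differ in details.

```r
# Hypothetical helper (not dlookr's implementation): a minimal sketch of the
# statistic / p_value / sample values for a single numeric vector.
shapiro_summary <- function(x, sample = 5000) {
  x <- x[!is.na(x)]               # shapiro.test() also drops NAs; done first so the count is accurate
  if (length(x) > sample) {
    x <- base::sample(x, sample)  # random subsample of `sample` observations
  }
  if (length(x) < 3 || length(x) > 5000) {
    stop("shapiro.test() requires between 3 and 5000 non-missing values")
  }
  st <- stats::shapiro.test(x)
  data.frame(statistic = unname(st$statistic),  # the W statistic
             p_value   = st$p.value,
             sample    = length(x))
}

set.seed(42)
shapiro_summary(rnorm(8000))             # subsampled to 5000 values before testing
shapiro_summary(rexp(100), sample = 50)  # 50 of the 100 values are tested
```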
normality.data.frame, diagnose_numeric.tbl_dbi,
describe.tbl_dbi, plot_normality.tbl_dbi.
```r
library(dplyr)
library(dlookr)  # provides normality() and the heartfailure data set

# connect DBMS
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy heartfailure to the DBMS with a table named TB_HEARTFAILURE
copy_to(con_sqlite, heartfailure, name = "TB_HEARTFAILURE", overwrite = TRUE)

# Using pipes ---------------------------------
# Normality test of all numerical variables
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  normality()

# Positive values select variables; in-memory mode with a collect size of 200
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  normality(platelets, sodium, collect_size = 200)

# Positional values select variables
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  normality(1)

# Using pipes & dplyr -------------------------
# Test all numerical variables by 'smoking' and 'death_event',
# and keep only rows where the 'smoking' level is "Yes".
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  group_by(smoking, death_event) %>%
  normality() %>%
  filter(smoking == "Yes")

# Keep only rows where the 'sex' level is "Male",
# then test 'sodium' by 'smoking' and 'death_event'
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  filter(sex == "Male") %>%
  group_by(smoking, death_event) %>%
  normality(sodium)

# Test log(sodium) by 'smoking' and 'death_event',
# and keep only results with p_value greater than 0.01.
# Load the SQLite extension functions so that log() is available in SQL
RSQLite::initExtension(con_sqlite)
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  mutate(log_sodium = log(sodium)) %>%
  group_by(smoking, death_event) %>%
  normality(log_sodium) %>%
  filter(p_value > 0.01)

# Disconnect DBMS
DBI::dbDisconnect(con_sqlite)
```