normality.tbl_dbi | R Documentation |
normality() performs the Shapiro-Wilk test of normality on the numerical (INTEGER, NUMBER, etc.) columns of a DBMS table accessed through a tbl_dbi.
## S3 method for class 'tbl_dbi'
normality(.data, ..., sample = 5000, in_database = FALSE, collect_size = Inf)
.data |
a tbl_dbi. |
... |
one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, normality() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
sample |
the number of samples used to perform the test. |
in_database |
Specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS. If FALSE, the table data is collected into R and processed in memory. in_database = TRUE is not yet supported. |
collect_size |
an integer. The number of rows to collect from the DBMS into R. Applies only if in_database = FALSE. See vignette("EDA") for an introduction to these concepts. |
This function is useful when used with the group_by
function of the dplyr package. If you want to run the test for each level
of a categorical variable of interest, rather than on all observations at once,
you can use group_tf as the group_by function.
The test is computed with the shapiro.test
function.
An object of the same class as .data.
The information derived from the numerical data test is as follows.
statistic : the value of the Shapiro-Wilk statistic.
p_value : an approximate p-value for the test. This is said in Royston (1995) to be adequate for p_value < 0.1.
sample : the number of samples used to perform the test. The number of observations supported by the stats::shapiro.test function is 3 to 5000.
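As a rough illustration of where these three values come from, the sketch below calls stats::shapiro.test directly on an in-memory vector. It is not dlookr's implementation: the vector x, the max_n cap, and the random subsampling step are assumptions made for this example, chosen only to respect the documented 3 to 5000 observation limit.

```r
# Illustrative sketch in base R; not dlookr's actual implementation.
set.seed(123)
x <- rnorm(8000, mean = 50, sd = 10)   # hypothetical numeric column

max_n <- 5000                          # upper limit accepted by stats::shapiro.test
x <- x[!is.na(x)]                      # the test counts non-missing values only

# Assumption: draw a random subsample when there are too many observations
x_test <- if (length(x) > max_n) sample(x, max_n) else x

res <- stats::shapiro.test(x_test)

# Arrange the result in the same three columns described above
result <- data.frame(
  statistic = unname(res$statistic),   # Shapiro-Wilk W
  p_value   = res$p.value,             # approximate p-value
  sample    = length(x_test)           # observations actually tested
)
print(result)
```

Because the subsample is random, the statistic and p-value can vary between runs unless a seed is set.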
normality.data.frame, diagnose_numeric.tbl_dbi, describe.tbl_dbi.
# If you have the 'DBI' and 'RSQLite' packages installed, run the following code block:
if (FALSE) {
  library(dlookr)  # provides normality() and the heartfailure data
  library(dplyr)
  # Connect to the DBMS
  con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  # Copy heartfailure to the DBMS as a table named TB_HEARTFAILURE
  copy_to(con_sqlite, heartfailure, name = "TB_HEARTFAILURE", overwrite = TRUE)
  # Using pipes ---------------------------------
  # Normality test of all numerical variables
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    normality()
  # Select variables by name; in-memory mode with a collect size of 200
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    normality(platelets, sodium, collect_size = 200)
  # Select variables by position
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    normality(1)
  # Using pipes & dplyr -------------------------
  # Test all numerical variables by 'smoking' and 'death_event',
  # and keep only the rows where 'smoking' is "Yes".
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    group_by(smoking, death_event) %>%
    normality() %>%
    filter(smoking == "Yes")
  # Keep only the rows where 'sex' is "Male",
  # and test 'sodium' by 'smoking' and 'death_event'.
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    filter(sex == "Male") %>%
    group_by(smoking, death_event) %>%
    normality(sodium)
  # Test log(sodium) by 'smoking' and 'death_event',
  # and keep only the results with p_value greater than 0.01.
  # Load the SQLite extension functions, which provide log()
  RSQLite::initExtension(con_sqlite)
  con_sqlite %>%
    tbl("TB_HEARTFAILURE") %>%
    mutate(log_sodium = log(sodium)) %>%
    group_by(smoking, death_event) %>%
    normality(log_sodium) %>%
    filter(p_value > 0.01)
  # Disconnect from the DBMS
  DBI::dbDisconnect(con_sqlite)
}