Nivesh Elangovanraaj 2020-01-22
This is an introductory guide to the descriptive package. The package creates a descriptive summary of the data and exports it to formatted Excel files that are ready for consumption. It uses the tables and openxlsx packages to generate the formatted descriptive summary reports.
Descriptive statistics are used to describe the basic features of the data in a study. They provide summaries about the sample.
Why descriptive?
In R, there are libraries that create descriptive summary reports in HTML format. This package focuses on reporting the descriptive summary as Excel files (xlsx format). Excel files make the summary easy to interpret because of their tabular layout.
There are four main functions in descriptive: descriptive(), sheet(), wb() and export(). The usage of each function is explained below.
descriptive() - To generate the descriptive summary of the data
sheet() - To bind the tables printed in a sheet
wb() - To bind all the sheets of a workbook
export() - To export the workbook object as an Excel workbook
To start, install and load descriptive with the following code:
devtools::install_github("nivesh22/descriptive")
library("descriptive")
descriptive()
descriptive() is the primary function of descriptive. It produces a descriptive summary for the variables of a dataset given as input. The summary reports frequencies for categorical variables, and central tendency and dispersion for continuous variables.
For more help using descriptive(), see ?descriptive in R, which explains how to use the different arguments of the function.
## survival package for kidney data
library(survival)
data(kidney)
# patient: id
# time: time
# status: event status
# age: in years
# sex: 1=male, 2=female
# disease: disease type (0=GN, 1=AN, 2=PKD, 3=Other)
# frail: frailty estimate from original paper
#Re-coding the sex variable
kidney$sex<-ifelse(kidney$sex==1,"Male","Female")
The summary for categorical variables can be generated as follows:
## Categorical summary
Gender<-descriptive(ADS = kidney,
type="categorical", #type can be categorical, binary or continuous
vars = c("sex"),
vars_label = c("Gender"),
percent = "col")
print(Gender$table)
| Variables | All | % |
|:----------|:----|:-----|
| All | 76 | 1.00 |
| Gender | | |
| Female | 56 | 0.74 |
| Male | 20 | 0.26 |
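As a cross-check, the counts and column proportions in the table above can be reproduced with base R. This is a minimal sketch and not the package's implementation: the sex vector is rebuilt from the counts reported above (56 Female, 20 Male) so that the snippet runs without the kidney data, and the column names of summary_df are illustrative only.

```r
# Rebuild the recoded sex variable from the counts shown in the table above
sex <- rep(c("Female", "Male"), times = c(56, 20))

counts <- table(sex)                     # frequency of each level
props  <- round(prop.table(counts), 2)   # share of the column total

# Assemble a small summary data frame (column names are illustrative)
summary_df <- data.frame(
  Variables = names(counts),
  All       = as.vector(counts),
  Percent   = as.vector(props)
)
print(summary_df)
```

Here prop.table() divides each count by the total, which appears to be what percent = "col" reports when no stratification is used.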
The summary can also be generated with stratification. The stratification variables have to be supplied as a list of vectors.
## Categorical summary with stratification
Gender<-descriptive(ADS = kidney,
type="categorical",
vars = c("disease"),
strata = list(c("sex")),
strata_label = list(c("Gender")),
percent = NULL)
print(Gender$table)
| Variables | Overall \| N | sex \| Female \| N | sex \| Male \| N |
|:----------|:------------|:-----------------|:---------------|
| All | 76 | 56 | 20 |
| disease | | | |
| AN | 24 | 20 | 4 |
| GN | 18 | 12 | 6 |
| Other | 26 | 20 | 6 |
| PKD | 8 | 4 | 4 |
In the case of multiple stratifications, each stratification variable has to be given in a separate vector inside the list.
## Categorical summary with multiple stratifications
multiple_strata<-descriptive(ADS = kidney,
type="categorical",
vars = c("disease"),
strata = list(c("status"),c("sex")),
percent = NULL)
print(multiple_strata$table)
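| Variables | Overall \| N | status \| 0 \| N | status \| 1 \| N | sex \| Female \| N | sex \| Male \| N |
|:----------|:------------|:----------------|:----------------|:-----------------|:---------------|
| All | 76 | 18 | 58 | 56 | 20 |
| disease | | | | | |
| AN | 24 | 6 | 18 | 20 | 4 |
| GN | 18 | 4 | 14 | 12 | 6 |
| Other | 26 | 6 | 20 | 20 | 6 |
| PKD | 8 | 2 | 6 | 4 | 4 |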
In the case of nested stratification, the stratification variables have to be given in the same vector inside the list.
## Categorical summary with sub-stratification
nested_strata<-descriptive(ADS = kidney,
type="categorical",
vars = c("disease"),
strata = list(c("status","sex")),
percent = NULL)
print(nested_strata$table)
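| Variables | Overall \| N | status \| 0 \| All \| N | status \| 0 \| sex \| Female \| N | status \| 0 \| sex \| Male \| N | status \| 1 \| All \| N | status \| 1 \| sex \| Female \| N | status \| 1 \| sex \| Male \| N |
|:----------|:------------|:----|:----|:----|:----|:----|:----|
| All | 76 | 18 | 16 | 2 | 58 | 40 | 18 |
| disease | | | | | | | |
| AN | 24 | 6 | 5 | 1 | 18 | 15 | 3 |
| GN | 18 | 4 | 4 | 0 | 14 | 8 | 6 |
| Other | 26 | 6 | 6 | 0 | 20 | 14 | 6 |
| PKD | 8 | 2 | 1 | 1 | 6 | 3 | 3 |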
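The difference between multiple and nested stratification can be illustrated with base R cross-tabulations. The sketch below is an assumption-laden illustration, not how descriptive() works internally: the cell counts are copied from the nested table above so that the snippet runs on its own, and xtabs() and ftable() stand in for the package's own tabulation.

```r
# Cell counts taken from the nested table above (sex varies fastest,
# then status, then disease)
cells <- expand.grid(
  sex     = c("Female", "Male"),
  status  = c(0, 1),
  disease = c("AN", "GN", "Other", "PKD"),
  stringsAsFactors = FALSE
)
cells$n <- c(5, 1, 15, 3,   # AN
             4, 0,  8, 6,   # GN
             6, 0, 14, 6,   # Other
             1, 1,  3, 3)   # PKD

# Multiple strata, as in list(c("status"), c("sex")): each variable is
# crossed with disease on its own, giving two independent column sets
by_status <- xtabs(n ~ disease + status, data = cells)
by_sex    <- xtabs(n ~ disease + sex, data = cells)
print(cbind(by_status, by_sex))

# Nested strata, as in list(c("status", "sex")): sex is tabulated within
# each level of status, giving one column per status/sex combination
nested <- ftable(xtabs(n ~ disease + status + sex, data = cells),
                 row.vars = "disease")
print(nested)
```

The first printout has one block of columns per stratification variable, while the second has a column for every combination of status and sex.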
At times, when we generate the summary statistics for a binary variable, we would like to report only the 1s or TRUEs. In cases like these, we can use type="binary".
## Binary summary
Status<-descriptive(ADS = kidney,
type="binary",
vars = c("status"),
vars_label = c("status=1"),
strata = c("sex"),
percent = "col")
print(Status$table)
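| Variables | Overall \| All | Overall \| Percent | sex \| Female \| All | sex \| Female \| Percent | sex \| Male \| All | sex \| Male \| Percent |
|:----------|:----|:-----|:----|:-----|:----|:----|
| All | 76 | 1.00 | 56 | 1.00 | 20 | 1.0 |
| status=1 | 58 | 0.76 | 40 | 0.71 | 18 | 0.9 |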
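The idea behind the binary summary, counting only the 1s and expressing them as a share of each group, can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: sex and status are rebuilt from the counts in the nested table earlier in this guide (Female: 16 with status 0 and 40 with status 1; Male: 2 and 18), and the column names of binary_summary are made up for the example.

```r
# Rebuild sex and status from the counts reported earlier in this guide
sex    <- rep(c("Female", "Male"), times = c(56, 20))
status <- c(rep(c(0, 1), times = c(16, 40)),   # Female
            rep(c(0, 1), times = c(2, 18)))    # Male

is_one <- status == 1   # TRUE where the event is recorded

# Group size, number of 1s and their share, overall and within each sex
binary_summary <- data.frame(
  Group   = c("Overall", "Female", "Male"),
  N       = c(length(is_one), as.vector(table(sex))),
  Count   = c(sum(is_one), as.vector(tapply(is_one, sex, sum))),
  Percent = round(c(mean(is_one), as.vector(tapply(is_one, sex, mean))), 2)
)
print(binary_summary)
```

The mean of a logical vector is the proportion of TRUE values, so it gives the within-group share directly.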
For continuous variables, we can generate summary statistics such as mean, median, min, max, etc.
## Continuous summary with stratification
continuous_vars<-descriptive(ADS = kidney,
type="continuous",
vars = c("time","frail"),
strata = list(c("sex")),
strata_label = list(c("Gender")),
numeric_summary ="min+mean+median+sd+max")
print(continuous_vars$table)
| Variables | Labels | All | sex \| Female | sex \| Male |
|:----------|:-------|:-------|:-------------|:-----------|
| All | All | 76.00 | 56.00 | 20.00 |
| time | All | 76.00 | 56.00 | 20.00 |
| | min | 2.00 | 5.00 | 2.00 |
| | mean | 101.63 | 116.75 | 59.30 |
| | median | 39.50 | 62.00 | 16.50 |
| | sd | 130.91 | 130.36 | 126.08 |
| | max | 562.00 | 536.00 | 562.00 |
| frail | All | 76.00 | 56.00 | 20.00 |
| | min | 0.20 | 0.40 | 0.20 |
| | mean | 1.18 | 1.14 | 1.31 |
| | median | 1.10 | 1.05 | 1.20 |
| | sd | 0.69 | 0.64 | 0.82 |
| | max | 3.00 | 2.90 | 3.00 |
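The numeric_summary argument names the statistics in a single string separated by +. One way to picture how such a specification could be turned into a table is sketched below in base R. It is only an assumption about the general approach, not the package's implementation; the values and group vectors are hypothetical and are not the kidney data.

```r
# Hypothetical data: six observations in two groups (not the kidney data)
values <- c(2, 4, 6, 8, 10, 30)
group  <- c("A", "A", "A", "B", "B", "B")

# Split the specification into the individual statistic names
spec  <- "min+mean+median+sd+max"
stats <- strsplit(spec, "+", fixed = TRUE)[[1]]

# Look up each name as a function and apply it to a numeric vector
summarise_one <- function(x) {
  vapply(stats, function(f) match.fun(f)(x), numeric(1))
}

# One column for all rows plus one column per group, rounded to 2 digits
result <- round(cbind(All = summarise_one(values),
                      sapply(split(values, group), summarise_one)), 2)
print(result)
```

Each row of result corresponds to one entry of the specification, in the order given, which mirrors the Labels column of the table above.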
sheet()
This function helps in binding the information needed to generate a sheet.
#sheet1
sheet1<-sheet(tables = list(Gender),
subtitle = "Gender distribution",
sheetname = "Gender")
#sheet2
sheet2<-sheet(tables = list(continuous_vars),
subtitle = "Continuous variables",
sheetname = "cont. variables")
#sheet3
sheet3<-sheet(tables = list(multiple_strata,nested_strata),
subtitle = "multiple tables",
sheetname = "multiple tables",
stack = "sideways")
To print multiple tables in a sheet, the objects returned by descriptive() have to be given to the tables argument as a list. The tables can be printed either one below the other or one next to the other. This is controlled by the stack argument, which takes one of two values: 'sideways' prints the tables next to each other, and 'below' prints them one below the other.
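To make the two stack values concrete, the sketch below computes where each table would start on a sheet. It only illustrates the layout idea: table_origins() and its gap argument are hypothetical names that do not belong to descriptive, the placeholder tables contain no real data, and the package's actual placement may differ.

```r
# Hypothetical helper: top-left cell of each table on a sheet
table_origins <- function(tables, stack = c("below", "sideways"), gap = 1) {
  stack  <- match.arg(stack)
  n_rows <- vapply(tables, nrow, integer(1)) + 1L   # +1 for the header row
  n_cols <- vapply(tables, ncol, integer(1))
  # Start of each table = 1 + total size (plus gaps) of the tables before it
  offset <- function(sizes) 1 + c(0, cumsum(sizes + gap))[seq_along(sizes)]
  if (stack == "below") {
    data.frame(start_row = offset(n_rows), start_col = 1)
  } else {
    data.frame(start_row = 1, start_col = offset(n_cols))
  }
}

# Two placeholder tables: 6 rows x 6 columns and 6 rows x 8 columns
t1 <- as.data.frame(matrix(0, nrow = 6, ncol = 6))
t2 <- as.data.frame(matrix(0, nrow = 6, ncol = 8))

print(table_origins(list(t1, t2), stack = "sideways"))
print(table_origins(list(t1, t2), stack = "below"))
```

Positions like these correspond to the startRow and startCol arguments that openxlsx::writeData() accepts when writing a data frame to a worksheet.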
wb()
This function binds all the sheets given as input together as a list.
#workbook
workbook<-wb(sheet1,sheet2,sheet3)
export()
This function writes all the generated tables to an Excel workbook.
#workbook
export(excel_workbook = workbook, file_name = "Kidney data descriptive", index = TRUE)