
Generate Excel descriptive reports: A Guide to the descriptive Package

Nivesh Elangovanraaj 2020-01-22

This is an introductory guide to the descriptive package. The package creates a descriptive summary of the data and exports it to formatted Excel files that are ready for consumption. It uses the tables and openxlsx packages to generate the formatted descriptive summary reports.

Introduction

Descriptive statistics are used to describe the basic features of the data in a study. They provide summaries about the sample.

Why descriptive?

R already has libraries that create descriptive summary reports in HTML format. This package focuses on reporting the descriptive summary as Excel files (xlsx format). The tabular layout of an Excel file makes the summary easy to interpret.

How to use the package?

There are four main functions in descriptive: descriptive(), sheet(), wb() and export(). The usage of each function is explained below.

  1. descriptive() - To generate the descriptive summary of the data
  2. sheet() - To bind the tables printed in a sheet
  3. wb() - To bind all the sheets of a workbook
  4. export() - To export the workbook object as an excel workbook

To start, install and load descriptive with the following code:

devtools::install_github("nivesh22/descriptive")
library("descriptive")

descriptive()

descriptive() is the primary function of descriptive. It produces a descriptive summary for the variables of a dataset that are given as inputs. The summary consists of frequencies for categorical variables, and of central tendency and dispersion measures for continuous variables.

For more help using descriptive(), see ?descriptive in R, which contains information on how to use the different arguments of the function.
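As a rough base R illustration of the two kinds of summary described above, the sketch below computes frequencies for a categorical column and central tendency and dispersion for a continuous one. It is not the package's implementation, and the data frame `toy` and its values are hypothetical.

```r
# Hypothetical toy data, used only to illustrate the two summary types.
toy <- data.frame(
  group = c("a", "b", "a", "a", "b"),
  value = c(1.5, 2.0, 3.5, 4.0, 9.0)
)

# Categorical variable: frequencies per level.
group_counts <- table(toy$group)

# Continuous variable: central tendency and dispersion.
value_summary <- c(
  mean   = mean(toy$value),
  median = median(toy$value),
  sd     = sd(toy$value)
)

print(group_counts)
print(round(value_summary, 2))
```

descriptive() arranges this kind of information into a single report-ready table, as shown in the sections that follow.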

## survival package for kidney data
library(survival)
data(kidney)
# patient: id
# time: time
# status: event status
# age: in years
# sex: 1=male, 2=female
# disease: disease type (0=GN, 1=AN, 2=PKD, 3=Other)
# frail: frailty estimate from original paper
#Re-coding the sex variable
kidney$sex<-ifelse(kidney$sex==1,"Male","Female")

Categorical summary

The summary for categorical variables can be generated as follows:

## Categorical summary
Gender<-descriptive(ADS = kidney,
                    type="categorical", #type can be categorical,binary or continuous
                    vars = c("sex"),
                    vars_label = c("Gender"),
                    percent = "col")
print(Gender$table)

| Variables | All | %    |
|:----------|:----|:-----|
| All       | 76  | 1.00 |
| Gender    |     |      |
| Female    | 56  | 0.74 |
| Male      | 20  | 0.26 |
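The counts and proportions in this table can be cross-checked with base R. The sketch below assumes that percent = "col" means each count divided by the column total, which is consistent with the values shown but is not confirmed by the package documentation here. The object names `kd`, `counts` and `col_share` are illustrative only.

```r
# Load the kidney data and recode sex the same way as earlier in this guide.
kd <- survival::kidney
kd$sex <- ifelse(kd$sex == 1, "Male", "Female")

# Frequency of each level.
counts <- table(kd$sex)

# Share of the column total (assumed meaning of percent = "col").
col_share <- round(prop.table(counts), 2)

print(counts)
print(col_share)
```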

The summary can also be stratified. The stratification variables have to be supplied as a list of vectors.

## Categorical summary with stratification
Gender<-descriptive(ADS = kidney,
                    type="categorical",
                    vars = c("disease"),
                    strata = list(c("sex")),
                    strata_label = list(c("Gender")),
                    percent = NULL)
print(Gender$table)

| Variables | Overall \| N | sex \| Female \| N | sex \| Male \| N |
|:----------|:-------------|:-------------------|:-----------------|
| All       | 76           | 56                 | 20               |
| disease   |              |                    |                  |
| AN        | 24           | 20                 | 4                |
| GN        | 18           | 12                 | 6                |
| Other     | 26           | 20                 | 6                |
| PKD       | 8            | 4                  | 4                |
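For comparison, a stratified count table of this shape can be sketched in base R with a two-way table plus an overall column. This only illustrates what the numbers represent; it is not how descriptive() builds its output, and the names `kd`, `by_sex` and `with_overall` are illustrative.

```r
# Load the kidney data and recode sex the same way as earlier in this guide.
kd <- survival::kidney
kd$sex <- ifelse(kd$sex == 1, "Male", "Female")

# Two-way counts: one row per disease type, one column per stratum level.
by_sex <- table(disease = kd$disease, sex = kd$sex)

# Prepend the unstratified (overall) count for each disease type.
with_overall <- cbind(Overall = rowSums(by_sex), unclass(by_sex))

print(with_overall)
```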

For multiple stratifications, give each stratification variable in its own vector inside the list.

## Categorical summary with multiple stratifications
multiple_strata<-descriptive(ADS = kidney,
                             type="categorical",
                             vars = c("disease"),
                             strata = list(c("status"),c("sex")),
                             percent = NULL)
print(multiple_strata$table)
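
| Variables | Overall \| N | status \| 0 \| N | status \| 1 \| N | sex \| Female \| N | sex \| Male \| N |
|:----------|:-------------|:-----------------|:-----------------|:-------------------|:-----------------|
| All       | 76           | 18               | 58               | 56                 | 20               |
| disease   |              |                  |                  |                    |                  |
| AN        | 24           | 6                | 18               | 20                 | 4                |
| GN        | 18           | 4                | 14               | 12                 | 6                |
| Other     | 26           | 6                | 20               | 20                 | 6                |
| PKD       | 8            | 2                | 6                | 4                  | 4                |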

For nested stratification, give the stratification variables in the same vector inside the list.

## Categorical summary with sub-stratification
nested_strata<-descriptive(ADS = kidney,
                           type="categorical",
                           vars = c("disease"),
                           strata = list(c("status","sex")),
                           percent = NULL)
print(nested_strata$table)
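
| Variables | Overall \| N | status \| 0 \| All \| N | status \| 0 \| sex \| Female \| N | status \| 0 \| sex \| Male \| N | status \| 1 \| All \| N | status \| 1 \| sex \| Female \| N | status \| 1 \| sex \| Male \| N |
|:----------|:----|:----|:----|:----|:----|:----|:----|
| All       | 76  | 18  | 16  | 2   | 58  | 40  | 18  |
| disease   |     |     |     |     |     |     |     |
| AN        | 24  | 6   | 5   | 1   | 18  | 15  | 3   |
| GN        | 18  | 4   | 4   | 0   | 14  | 8   | 6   |
| Other     | 26  | 6   | 6   | 0   | 20  | 14  | 6   |
| PKD       | 8   | 2   | 1   | 1   | 6   | 3   | 3   |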
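The difference between multiple and nested stratification can be sketched in base R. Multiple stratification corresponds to independent two-way tables, one per stratum variable, while nested stratification crosses the stratum variables with each other. This is a conceptual illustration with illustrative object names, not the package's internals.

```r
# Load the kidney data and recode sex the same way as earlier in this guide.
kd <- survival::kidney
kd$sex <- ifelse(kd$sex == 1, "Male", "Female")

# Multiple stratification: one independent two-way table per stratum variable.
by_status <- table(disease = kd$disease, status = kd$status)
by_sex    <- table(disease = kd$disease, sex = kd$sex)

# Nested stratification: sex is crossed within each level of status.
crossed <- xtabs(~ disease + status + sex, data = kd)
nested  <- ftable(crossed, row.vars = "disease")

print(by_status)
print(by_sex)
print(nested)
```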

Binary summary

When summarising a binary variable, we often want to report only the 1s or TRUEs. In such cases, use type="binary".

## Binary summary
Status<-descriptive(ADS = kidney,
                    type="binary",
                    vars = c("status"),
                    vars_label = c("status=1"),
                    strata = c("sex"),
                    percent = "col")
print(Status$table)
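
| Variables | Overall \| All | Overall \| Percent | sex \| Female \| All | sex \| Female \| Percent | sex \| Male \| All | sex \| Male \| Percent |
|:----------|:----|:-----|:----|:-----|:----|:----|
| All       | 76  | 1.00 | 56  | 1.00 | 20  | 1.0 |
| status=1  | 58  | 0.76 | 40  | 0.71 | 18  | 0.9 |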
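The status=1 row can be cross-checked in base R by counting and averaging a logical indicator within each stratum. This is a sketch with illustrative object names, not the package's implementation.

```r
# Load the kidney data and recode sex the same way as earlier in this guide.
kd <- survival::kidney
kd$sex <- ifelse(kd$sex == 1, "Male", "Female")

# TRUE where the event occurred; only these rows are reported.
is_event <- kd$status == 1

# Number of events and share of events within each sex.
n_events    <- tapply(is_event, kd$sex, sum)
event_share <- round(tapply(is_event, kd$sex, mean), 2)

print(n_events)
print(event_share)
```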

Continuous summary

For continuous variables, we can generate summary statistics such as mean, median, min and max.

## Continuous summary with stratification
continuous_vars<-descriptive(ADS = kidney,
                             type="continuous",
                             vars = c("time","frail"),
                             strata = list(c("sex")),
                             strata_label = list(c("Gender")),
                             numeric_summary ="min+mean+median+sd+max")
print(continuous_vars$table)

| Variables | Labels | All    | sex \| Female | sex \| Male |
|:----------|:-------|:-------|:--------------|:------------|
| All       | All    | 76.00  | 56.00         | 20.00       |
| time      | All    | 76.00  | 56.00         | 20.00       |
|           | min    | 2.00   | 5.00          | 2.00        |
|           | mean   | 101.63 | 116.75        | 59.30       |
|           | median | 39.50  | 62.00         | 16.50       |
|           | sd     | 130.91 | 130.36        | 126.08      |
|           | max    | 562.00 | 536.00        | 562.00      |
| frail     | All    | 76.00  | 56.00         | 20.00       |
|           | min    | 0.20   | 0.40          | 0.20        |
|           | mean   | 1.18   | 1.14          | 1.31        |
|           | median | 1.10   | 1.05          | 1.20        |
|           | sd     | 0.69   | 0.64          | 0.82        |
|           | max    | 3.00   | 2.90          | 3.00        |
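The same five statistics can be computed per stratum in base R as a cross-check for the time rows. The helper `five_stats` and the other object names are hypothetical, introduced only for this sketch.

```r
# Load the kidney data and recode sex the same way as earlier in this guide.
kd <- survival::kidney
kd$sex <- ifelse(kd$sex == 1, "Male", "Female")

# Hypothetical helper mirroring numeric_summary = "min+mean+median+sd+max".
five_stats <- function(x) {
  c(min = min(x), mean = mean(x), median = median(x), sd = sd(x), max = max(x))
}

# One column of statistics per sex, for the time variable.
time_by_sex <- sapply(split(kd$time, kd$sex), five_stats)

print(round(time_by_sex, 2))
```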

sheet()

This function binds together the information needed to generate a sheet.

#sheet1
sheet1<-sheet(tables = list(Gender),
              subtitle = "Gender distribution",
              sheetname = "Gender")
#sheet2
sheet2<-sheet(tables = list(continuous_vars),
              subtitle = "Continuous variables",
              sheetname = "cont. variables")
#sheet3
sheet3<-sheet(tables = list(multiple_strata,nested_strata),
              subtitle = "multiple tables",
              sheetname = "multiple tables",
              stack = "sideways")

To print multiple tables in a sheet, pass the objects returned by descriptive() to the tables argument as a list. The tables can be printed either one below the other or side by side, which is controlled by the stack argument. stack takes one of two values: 'sideways' prints the tables next to each other, and 'below' prints them one below the other.
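To make the two layouts concrete, the sketch below places two small tables on a sheet using openxlsx directly, once stacked vertically and once side by side. This is only an illustration of the idea under the assumption that placement comes down to start rows and start columns. It does not show how sheet() or export() work internally, and the tables `tab_a` and `tab_b` and the gap sizes are hypothetical.

```r
library(openxlsx)

# Two small hypothetical tables standing in for descriptive() output.
tab_a <- data.frame(Variables = c("All", "x"), N = c(10, 4))
tab_b <- data.frame(Variables = c("All", "y"), N = c(10, 6))

book <- createWorkbook()

# "below": the second table starts a few rows under the first.
addWorksheet(book, "below")
writeData(book, "below", tab_a, startRow = 1, startCol = 1)
writeData(book, "below", tab_b, startRow = nrow(tab_a) + 3, startCol = 1)

# "sideways": the second table starts a few columns to the right of the first.
addWorksheet(book, "sideways")
writeData(book, "sideways", tab_a, startRow = 1, startCol = 1)
writeData(book, "sideways", tab_b, startRow = 1, startCol = ncol(tab_a) + 2)

# Write to a temporary file so the sketch leaves nothing behind.
out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(book, out_file, overwrite = TRUE)
```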

wb()

This function binds all the sheets given as input together as a list.

#workbook
workbook<-wb(sheet1,sheet2,sheet3)

export()

This function writes all the generated tables to an Excel workbook.

#export
export(excel_workbook = workbook,file_name = "Kidney data descriptive",index = TRUE)

