Report.md
In sbalci/histopathology-template: Template of R Codes Used in Histopathology Research.

title: "Histopathology Research Template"
description: |
  Codes Used in Histopathology Research
  Data Report for Histopathology Research
  Example Using Random Generated Fakedata
author:
  - name: Serdar Balci, MD, Pathologist
    url: https://sbalci.github.io/histopathology-template/
    affiliation: serdarbalci.com
    affiliation_url: https://www.serdarbalci.com/
date: "2020-05-13"
mail: drserdarbalci@gmail.com
linkedin: "serdar-balci-md-pathologist"
twitter: "serdarbalci"
github: "sbalci"
home: "https://www.serdarbalci.com/"
header-includes:
  - \usepackage{pdflscape}
  - \newcommand{\blandscape}{\begin{landscape}}
  - \newcommand{\elandscape}{\end{landscape}}
  - \usepackage{xcolor}
  - \usepackage{afterpage}
  - \renewcommand{\linethickness}{0.05em}
  - \usepackage{booktabs}
  - \usepackage{sectsty} \allsectionsfont{\nohang\centering \emph}
  - \usepackage{float}
  - \usepackage{svg}
always_allow_html: yes
output:
  html_document:
    toc: yes
    toc_float: yes
    number_sections: yes
    fig_caption: yes
    keep_md: yes
    highlight: kate
    theme: readable
    code_folding: "hide"
    includes:
      after_body: _footer.html
    css: css/style.css
  prettydoc::html_pretty:
    theme: leonids
    highlight: vignette
    toc: true
    number_sections: yes
    css: css/style.css
    includes:
      after_body: _footer.html
  rmarkdown::html_vignette:
    css:
      - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
  redoc::redoc:
    highlight_outputs: TRUE
    margins: 1
    line_numbers: FALSE
  distill::distill_article:
    toc: true
  pdf_document:
    fig_caption: yes
    highlight: kate
    number_sections: yes
    toc: yes
    latex_engine: lualatex
    toc_depth: 5
    keep_tex: yes
    includes:
      in_header: highlight_echo.tex
vignette: >
  %\VignetteIndexEntry{Histopathology Research Template}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
bibliography: bib/template.bib

h1{ text-align: center; } h2{ text-align: center; } h3{ text-align: center; } h4{ text-align: center; } h4.date{ text-align: center; }

https://doi.org/10.5281/zenodo.3635430

https://osf.io/3tjfk/

Histopathology Research Template 🔬

Introduction

State the marker of interest, the study objectives, and hypotheses [@Knijn2015].^[From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460276/]

Materials & Methods

Describe Materials and Methods as highlighted in [@Knijn2015].^[From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460276/]

- Describe patient characteristics, and inclusion and exclusion criteria
- Describe treatment details
- Describe the type of material used
- Specify how expression of the biomarker was assessed
- Describe the number of independent (blinded) scorers and how they scored
- State the method of case selection, study design, origin of the cases, and time frame
- Describe the end of the follow-up period and median follow-up time
- Define all clinical endpoints examined
- Specify all applied statistical methods
- Describe how interactions with other clinical/pathological factors were analyzed

Codes for general settings.^[See childRmd/_01header.Rmd file for other general settings]

Set up global chunk settings.^[Set echo = FALSE to hide code after knitting, cache = TRUE to knit faster, and error = TRUE to continue rendering when a chunk raises an error.]

knitr::opts_chunk$set(
    eval = TRUE,
    echo = TRUE,
    fig.path = here::here("figs/"),
    message = FALSE,
    warning = FALSE,
    error = TRUE,
    cache = TRUE,
    comment = NA,
    tidy = TRUE,
    fig.width = 6,
    fig.height = 4
)

library(knitr)
hook_output = knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
    # this hook is used only when the linewidth option is not NULL
    if (!is.null(n <- options$linewidth)) {
        x = knitr:::split_lines(x)
        # any lines wider than n should be wrapped
        if (any(nchar(x) > n)) 
            x = strwrap(x, width = n)
        x = paste(x, collapse = "\n")
    }
    hook_output(x, options)
})
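As a rough illustration of what the hook above does to printed output, the wrapping step can be reproduced in base R. `wrap_output()` is a hypothetical helper introduced here for demonstration; it is not part of knitr, and it uses `strsplit()` as a stand-in for the internal `knitr:::split_lines()`.

```r
# Hypothetical stand-alone version of the wrapping step inside the output hook.
wrap_output <- function(x, n) {
    # split the output into individual lines (stand-in for knitr:::split_lines)
    x <- unlist(strsplit(x, "\n", fixed = TRUE))
    # wrap only if at least one line is wider than n characters
    if (any(nchar(x) > n)) {
        x <- strwrap(x, width = n)
    }
    paste(x, collapse = "\n")
}

# One long line of repeated words, wider than the chosen limit
long_line <- paste(rep("histopathology", 12), collapse = " ")
wrapped <- wrap_output(long_line, n = 40)
cat(wrapped, "\n")
```

Inside a report, the hook only takes effect for chunks that set the `linewidth` chunk option (for example `linewidth = 60`). Note that `strwrap()` also collapses repeated whitespace, so column-aligned console output may lose its alignment when wrapped.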

# linewidth css
  pre:not([class]) {
    color: #333333;
    background-color: #cccccc;
  }

# linewidth css

pre.jamovitable{
  color:black;
  background-color: white;
  margin-bottom: 35px;  
}

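# Render a jamovi (jmv) results table as an HTML table
jtable <- function(jobject, digits = 3) {
    # look up the display title of each column
    snames <- sapply(jobject$columns, function(a) a$title)
    asDF <- jobject$asDF
    tnames <- unlist(lapply(names(asDF), function(n) snames[[n]]))
    names(asDF) <- tnames
    # pass the digits argument through instead of hard-coding 3
    kableExtra::kable(asDF, "html", table.attr = "class=\"jmv-results-table-table\"", 
        row.names = FALSE, digits = digits)
}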

Block rmdnote

Block rmdtip

Block warning

Load Library

See R/loadLibrary.R for the libraries loaded.

source(file = here::here("R", "loadLibrary.R"))

Codes for generating fake data.^[See childRmd/_02fakeData.Rmd file for other codes]

Generate Fake Data

This code generates fake histopathological data. Some sources for fake data generation are listed here^[Synthea The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures. BMC Med Inform Decis Mak 19, 44 (2019) doi:10.1186/s12911-019-0793-0] , here^[https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0793-0] , here^[Synthetic Patient Generation] , here^[Basic Setup and Running] , here^[intelligent patient data generator (iPDG)] , here^[https://medium.com/free-code-camp/how-our-test-data-generator-makes-fake-data-look-real-ace01c5bde4a] , here^[https://forums.librehealth.io/t/demo-data-generation/203] , here^[https://mihin.org/services/patient-generator/] , and here^[To do: merge with the lung, cancer, and breast datasets] .

Use this code to generate fake clinicopathologic data

source(file = here::here("R", "gc_fake_data.R"))

wakefield::table_heat(x = fakedata, palette = "Set1", flip = TRUE, print = TRUE)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/plot fake data-1.png)

Codes for importing data.^[See childRmd/_03importData.Rmd file for other codes]

Read the data

library(readxl)
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
# View(mydata) # Use to view data after importing

To do: add code for importing multiple data files and combining them with purrr::reduce().
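A minimal sketch of the combining step mentioned above, written in base R so that it runs without extra packages. The three toy tables and their column values are made up for illustration, and the shared `ID` column is an assumption. `Reduce()` with `merge()` plays the same role that `purrr::reduce()` with `dplyr::full_join()` would play in a tidyverse workflow.

```r
# Toy tables standing in for separately imported files (illustrative values only)
clinical <- data.frame(ID = c("001", "002", "003"), Age = c(30, 32, 53),
    stringsAsFactors = FALSE)
pathology <- data.frame(ID = c("001", "002", "003"), Grade = c("1", "1", "2"),
    stringsAsFactors = FALSE)
followup <- data.frame(ID = c("001", "003"), Death = c(FALSE, TRUE),
    stringsAsFactors = FALSE)

tables <- list(clinical, pathology, followup)

# Fold the list into one table, keeping every ID (a full outer join at each step)
combined <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), tables)
combined
```

With real files, the list could be built by reading each path (for example with `readxl::read_excel()`) before the fold; cases missing from one file end up with `NA` in that file's columns.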

Codes for reporting general features.^[See childRmd/_04briefSummary.Rmd file for other codes]

Dataframe Report

# Dataframe report
mydata %>% dplyr::select(-contains("Date")) %>% report::report(.)

The data contains 250 observations of the following variables:
  - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)
  - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)
  - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)
  - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing
  - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)
  - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)
  - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)
  - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
  - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)
  - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)
  - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)
  - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)
  - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing
  - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing
  - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)
  - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)
  - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)
  - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)
  - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)

mydata %>% explore::describe_tbl()

250 observations with 21 variables
18 variables containing missings (NA)
0 variables with no variance

div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}

**Always Respect Patient Privacy** - Health Information Privacy^[https://www.hhs.gov/hipaa/index.html] - Protection of Personal Data (Kişisel Verilerin Korunması)^[[Recording personal data, and unlawfully disclosing or obtaining personal data, are defined as crimes in the Turkish legal system under Articles 135 and 136 of the Turkish Penal Code. The penalty for recording personal data is 1 to 3 years of imprisonment. The aggravated form of the offence is committing it as a public official abusing the authority of the office, or by exploiting the convenience provided by a particular profession or trade, in which case the penalty is 1.5 to 4.5 years of imprisonment.](https://barandogan.av.tr/blog/ceza-hukuku/kisisel-verilerin-ele-gecirilmesi-yayilmasi-baskasina-verilmesi-sucu.html)]

Codes for defining variable types.^[See childRmd/_06variableTypes.Rmd file for other codes]

Print column names as a vector

dput(names(mydata))

c("ID", "Name", "Sex", "Age", "Race", "PreinvasiveComponent", 
"LVI", "PNI", "LastFollowUpDate", "Death", "Group", "Grade", 
"TStage", "AntiX_intensity", "AntiY_intensity", "LymphNodeMetastasis", 
"Valid", "Smoker", "Grade_Level", "SurgeryDate", "DeathTime")

Find ID and key columns to exclude from analysis

vctrs::vec_assert()

dplyr::all_equal()

arsenal::compare()

visdat::vis_compare()

See the code as function in R/find_key.R.

keycolumns <- mydata %>% sapply(., FUN = dataMaid::isKey) %>% tibble::as_tibble() %>% 
    dplyr::select(which(.[1, ] == TRUE)) %>% names()
keycolumns

[1] "ID"   "Name"
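For readers without dataMaid installed, the idea behind key detection can be sketched in base R: a column is a key candidate when every row has a different value. The helper `all_unique()` and the toy data are hypothetical, and this simplified rule may not match `dataMaid::isKey()` in every case.

```r
# Toy data: ID and Name are unique per row, Sex is not (illustrative values only)
toy <- data.frame(
    ID = c("001", "002", "003", "004"),
    Name = c("Ann", "Bob", NA, "Dee"),
    Sex = c("Female", "Male", "Female", "Male"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when no value repeats (a single NA counts as one value)
all_unique <- function(x) length(unique(x)) == length(x)

key_candidates <- names(toy)[vapply(toy, all_unique, logical(1))]
key_candidates
```

Columns flagged this way are usually identifiers rather than measurements, which is why they are dropped before summarising.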

Get variable types

mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% inspectdf::inspect_types()

# A tibble: 4 x 4
  type             cnt  pcnt col_name  
  <chr>          <int> <dbl> <list>    
1 character         11  57.9 <chr [11]>
2 logical            3  15.8 <chr [3]> 
3 numeric            3  15.8 <chr [3]> 
4 POSIXct POSIXt     2  10.5 <chr [2]>

mydata %>% dplyr::select(-dplyr::all_of(keycolumns), -dplyr::contains("Date")) %>% 
    describer::describe() %>% knitr::kable(format = "markdown")

|.column_name         |.column_class |.column_type | .count_elements| .mean_value| .sd_value|.q0_value     | .q25_value| .q50_value| .q75_value|.q100_value |
|:--------------------|:-------------|:------------|---------------:|-----------:|----------:|:-------------|----------:|----------:|----------:|:-----------|
|Sex                  |character     |character    |             250|          NA|         NA|Female        |         NA|         NA|         NA|Male        |
|Age                  |numeric       |double       |             250|   49.538153| 14.1595015|25            |         37|         49|         61|73          |
|Race                 |character     |character    |             250|          NA|         NA|Asian         |         NA|         NA|         NA|White       |
|PreinvasiveComponent |character     |character    |             250|          NA|         NA|Absent        |         NA|         NA|         NA|Present     |
|LVI                  |character     |character    |             250|          NA|         NA|Absent        |         NA|         NA|         NA|Present     |
|PNI                  |character     |character    |             250|          NA|         NA|Absent        |         NA|         NA|         NA|Present     |
|Death                |logical       |logical      |             250|          NA|         NA|FALSE         |         NA|         NA|         NA|TRUE        |
|Group                |character     |character    |             250|          NA|         NA|Control       |         NA|         NA|         NA|Treatment   |
|Grade                |character     |character    |             250|          NA|         NA|1             |         NA|         NA|         NA|3           |
|TStage               |character     |character    |             250|          NA|         NA|1             |         NA|         NA|         NA|4           |
|AntiX_intensity      |numeric       |double       |             250|    2.389558|  0.6636071|1             |          2|          2|          3|3           |
|AntiY_intensity      |numeric       |double       |             250|    2.016064|  0.7980211|1             |          1|          2|          3|3           |
|LymphNodeMetastasis  |character     |character    |             250|          NA|         NA|Absent        |         NA|         NA|         NA|Present     |
|Valid                |logical       |logical      |             250|          NA|         NA|FALSE         |         NA|         NA|         NA|TRUE        |
|Smoker               |logical       |logical      |             250|          NA|         NA|FALSE         |         NA|         NA|         NA|TRUE        |
|Grade_Level          |character     |character    |             250|          NA|         NA|high          |         NA|         NA|         NA|moderate    |
|DeathTime            |character     |character    |             250|          NA|         NA|MoreThan1Year |         NA|         NA|         NA|Within1Year |

Plot variable types

mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% inspectdf::inspect_types() %>% 
    inspectdf::show_plot()

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot inspectdf-1.png)

# https://github.com/ropensci/visdat
# http://visdat.njtierney.com/articles/using_visdat.html
# https://cran.r-project.org/web/packages/visdat/index.html
# http://visdat.njtierney.com/

# visdat::vis_guess(mydata)

visdat::vis_dat(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot visdat-1.png)

mydata %>% explore::explore_tbl()

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot explore-1.png)

Find `character` variables

characterVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% 
    inspectdf::inspect_types() %>% dplyr::filter(type == "character") %>% 
    dplyr::select(col_name) %>% dplyr::pull() %>% unlist()

characterVariables

 [1] "Sex"                  "Race"                 "PreinvasiveComponent"
 [4] "LVI"                  "PNI"                  "Group"               
 [7] "Grade"                "TStage"               "LymphNodeMetastasis" 
[10] "Grade_Level"          "DeathTime"

Find `categorical` variables

categoricalVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns), 
    -dplyr::contains("Date")) %>% describer::describe() %>% janitor::clean_names() %>% 
    dplyr::filter(column_type == "factor") %>% dplyr::select(column_name) %>% dplyr::pull()

categoricalVariables

character(0)

Find `continuous` variables

continiousVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns), 
    -dplyr::contains("Date")) %>% describer::describe() %>% janitor::clean_names() %>% 
    dplyr::filter(column_type == "numeric" | column_type == "double") %>% 
    dplyr::select(column_name) %>% dplyr::pull()

continiousVariables

[1] "Age"             "AntiX_intensity" "AntiY_intensity"

Find `numeric` variables

numericVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% 
    inspectdf::inspect_types() %>% dplyr::filter(type == "numeric") %>% 
    dplyr::select(col_name) %>% dplyr::pull() %>% unlist()

numericVariables

[1] "Age"             "AntiX_intensity" "AntiY_intensity"

Find `integer` variables

integerVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% 
    inspectdf::inspect_types() %>% dplyr::filter(type == "integer") %>% 
    dplyr::select(col_name) %>% dplyr::pull() %>% unlist()

integerVariables

NULL

Find `list` variables

listVariables <- mydata %>% dplyr::select(-dplyr::all_of(keycolumns)) %>% 
    inspectdf::inspect_types() %>% dplyr::filter(type == "list") %>% 
    dplyr::select(col_name) %>% dplyr::pull() %>% unlist()
listVariables

NULL

Find `date` variables

is_date <- function(x) inherits(x, c("POSIXct", "POSIXt"))

dateVariables <- names(which(sapply(mydata, FUN = is_date) == TRUE))
dateVariables

[1] "LastFollowUpDate" "SurgeryDate"
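The `is_date()` helper above matches date-times (`POSIXct`) but not plain `Date` columns, which can appear when dates are parsed with `as.Date()`. The sketch below shows a slightly broader check; `is_date_like()` and the toy data are hypothetical names and values introduced for illustration.

```r
# Same check as in the text: matches POSIXct/POSIXt columns only
is_date <- function(x) inherits(x, c("POSIXct", "POSIXt"))

# Hypothetical broader helper: also accepts columns of class Date
is_date_like <- function(x) inherits(x, c("Date", "POSIXct", "POSIXt"))

toy <- data.frame(
    Age = c(30, 32),
    SurgeryDate = as.Date(c("2019-01-05", "2019-02-10")),
    LastFollowUpDate = as.POSIXct(c("2020-01-05", "2020-02-10"), tz = "UTC")
)

names(toy)[vapply(toy, is_date, logical(1))]       # misses SurgeryDate
names(toy)[vapply(toy, is_date_like, logical(1))]  # finds both date columns
```

In the example data of this report both date columns are `POSIXct`, so the two helpers would return the same result there.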

Codes for overviewing the data.^[See childRmd/_07overView.Rmd file for other codes]

View(mydata)

reactable::reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE, 
    searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE, 
    highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE, 
    showSortIcon = TRUE, showSortable = TRUE)

Summary of Data via summarytools 📦

summarytools::view(summarytools::dfSummary(mydata %>% dplyr::select(-dplyr::all_of(keycolumns))))

if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}

summarytools::view(x = summarytools::dfSummary(mydata %>% dplyr::select(-dplyr::all_of(keycolumns))), 
    file = here::here("out", "mydata_summary.html"))

Summary via dataMaid 📦

if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}

dataMaid::makeDataReport(data = mydata, file = here::here("out", "dataMaid_mydata.Rmd"), 
    replace = TRUE, openResult = FALSE, render = FALSE, quiet = TRUE)

Summary via explore 📦

if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}

mydata %>% dplyr::select(-dplyr::all_of(dateVariables)) %>% 
    explore::report(output_file = "mydata_report.html", output_dir = here::here("out"))

Glimpse of Data

dplyr::glimpse(mydata %>% dplyr::select(-dplyr::all_of(keycolumns), -dplyr::all_of(dateVariables)))

Observations: 250
Variables: 17
$ Sex                  <chr> "Female", "Female", "Female", "Female", "Male", …
$ Age                  <dbl> 30, 32, 53, 57, 47, 58, 59, 54, 35, 27, 53, 55, …
$ Race                 <chr> "White", "White", "White", "Hispanic", "White", …
$ PreinvasiveComponent <chr> "Absent", "Absent", "Absent", "Absent", "Absent"…
$ LVI                  <chr> "Present", "Absent", "Absent", "Present", "Absen…
$ PNI                  <chr> "Absent", "Absent", "Absent", "Present", "Absent…
$ Death                <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRU…
$ Group                <chr> "Control", "Control", "Control", "Control", "Con…
$ Grade                <chr> "1", "1", "2", "1", "2", "2", "3", "1", "2", "1"…
$ TStage               <chr> "4", "4", "3", "3", "1", "3", "3", "3", "4", "4"…
$ AntiX_intensity      <dbl> 2, 2, 2, 2, 3, 1, 1, 3, 2, 3, 2, 3, 1, 3, 1, 2, …
$ AntiY_intensity      <dbl> 2, 2, 2, 3, 2, 1, 2, 3, 3, 1, 1, 2, 1, 3, 1, 2, …
$ LymphNodeMetastasis  <chr> "Present", "Absent", "Present", "Present", "Pres…
$ Valid                <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRU…
$ Smoker               <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR…
$ Grade_Level          <chr> "moderate", "moderate", "high", "low", "high", "…
$ DeathTime            <chr> "Within1Year", "Within1Year", "Within1Year", "Wi…

mydata %>% explore::describe()

# A tibble: 21 x 8
   variable             type     na na_pct unique   min  mean   max
   <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 ID                   chr       0    0      250    NA NA       NA
 2 Name                 chr       1    0.4    250    NA NA       NA
 3 Sex                  chr       1    0.4      3    NA NA       NA
 4 Age                  dbl       1    0.4     50    25 49.5     73
 5 Race                 chr       1    0.4      8    NA NA       NA
 6 PreinvasiveComponent chr       1    0.4      3    NA NA       NA
 7 LVI                  chr       1    0.4      3    NA NA       NA
 8 PNI                  chr       1    0.4      3    NA NA       NA
 9 LastFollowUpDate     dat       1    0.4     13    NA NA       NA
10 Death                lgl       1    0.4      3     0  0.67     1
# … with 11 more rows

Explore

explore::explore(mydata)

Check whether the data match expectations

visdat::vis_expect(data = mydata, expectation = ~.x == -1, show_perc = TRUE)

visdat::vis_expect(mydata, ~.x >= 25)

See missing values

visdat::vis_miss(airquality, cluster = TRUE)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/missing values visdat-1.png)

visdat::vis_miss(airquality, sort_miss = TRUE)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/missing values visdat 2-1.png)

xray::anomalies(mydata)

$variables
               Variable   q qNA  pNA qZero pZero qBlank pBlank qInf pInf
1                Smoker 250   1 0.4%   130   52%      0      -    0    -
2                 Valid 250   1 0.4%   116 46.4%      0      -    0    -
3                 Death 250   1 0.4%    83 33.2%      0      -    0    -
4                   Sex 250   1 0.4%     0     -      0      -    0    -
5  PreinvasiveComponent 250   1 0.4%     0     -      0      -    0    -
6                   LVI 250   1 0.4%     0     -      0      -    0    -
7                   PNI 250   1 0.4%     0     -      0      -    0    -
8                 Group 250   1 0.4%     0     -      0      -    0    -
9   LymphNodeMetastasis 250   1 0.4%     0     -      0      -    0    -
10                Grade 250   1 0.4%     0     -      0      -    0    -
11      AntiX_intensity 250   1 0.4%     0     -      0      -    0    -
12      AntiY_intensity 250   1 0.4%     0     -      0      -    0    -
13          Grade_Level 250   1 0.4%     0     -      0      -    0    -
14                 Race 250   1 0.4%     0     -      0      -    0    -
15     LastFollowUpDate 250   1 0.4%     0     -      0      -    0    -
16                  Age 250   1 0.4%     0     -      0      -    0    -
17          SurgeryDate 250   1 0.4%     0     -      0      -    0    -
18                 Name 250   1 0.4%     0     -      0      -    0    -
19            DeathTime 250   0    -     0     -      0      -    0    -
20               TStage 250   0    -     0     -      0      -    0    -
21                   ID 250   0    -     0     -      0      -    0    -
   qDistinct      type anomalous_percent
1          3   Logical             52.4%
2          3   Logical             46.8%
3          3   Logical             33.6%
4          3 Character              0.4%
5          3 Character              0.4%
6          3 Character              0.4%
7          3 Character              0.4%
8          3 Character              0.4%
9          3 Character              0.4%
10         4 Character              0.4%
11         4   Numeric              0.4%
12         4   Numeric              0.4%
13         4 Character              0.4%
14         8 Character              0.4%
15        13 Timestamp              0.4%
16        50   Numeric              0.4%
17       233 Timestamp              0.4%
18       250 Character              0.4%
19         2 Character                 -
20         4 Character                 -
21       250 Character                 -

$problem_variables
 [1] Variable          q                 qNA               pNA              
 [5] qZero             pZero             qBlank            pBlank           
 [9] qInf              pInf              qDistinct         type             
[13] anomalous_percent problems         
<0 rows> (or 0-length row.names)

xray::distributions(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-2.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-3.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-4.png)

[1] "Ignoring variable LastFollowUpDate: Unsupported type for visualization."

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-5.png)

[1] "Ignoring variable SurgeryDate: Unsupported type for visualization."

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-6.png)

         Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99
1 AntiX_intensity   1  1.8    2    2    3    3    3
2 AntiY_intensity   1    1    1    2    3    3    3
3             Age  25 30.8   37   49   61   70   73

Summary of Data via DataExplorer 📦

DataExplorer::plot_str(mydata)

DataExplorer::plot_str(mydata, type = "r")

DataExplorer::introduce(mydata)

# A tibble: 1 x 9
   rows columns discrete_columns continuous_colu… all_missing_col…
  <int>   <int>            <int>            <int>            <int>
1   250      21               18                3                0
# … with 4 more variables: total_missing_values <int>, complete_rows <int>,
#   total_observations <int>, memory_usage <dbl>

DataExplorer::plot_intro(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 4-1.png)

DataExplorer::plot_missing(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 5-1.png)

Drop columns

mydata2 <- DataExplorer::drop_columns(mydata, "TStage")

DataExplorer::plot_bar(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 7-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 7-2.png)

DataExplorer::plot_bar(mydata, with = "Death")

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 8-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 8-2.png)

DataExplorer::plot_histogram(mydata)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 9-1.png)

Statistical Analysis

Learn these tests as highlighted in [@Schmidt2017].^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA]

Results

Write results as described in [@Knijn2015]^[From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460276/]

- Describe the number of patients included in the analysis and reason for dropout
- Report patient/disease characteristics (including the biomarker of interest) with the number of missing values
- Describe the interaction of the biomarker of interest with established prognostic variables
- Include at least 90 % of initial cases included in univariate and multivariate analyses
- Report the estimated effect (relative risk/odds ratio, confidence interval, and p value) in univariate analysis
- Report the estimated effect (hazard rate/odds ratio, confidence interval, and p value) in multivariate analysis
- Report the estimated effects (hazard ratio/odds ratio, confidence interval, and p value) of other prognostic factors included in multivariate analysis

Codes for generating data dictionary.^[See childRmd/_08dataDictionary.Rmd file for other codes]

Codes for clean and recode data.^[See childRmd/_09cleanRecode.Rmd file for other codes]

questionr::irec()

questionr::iorder()

questionr::icut()

iris %>% mutate(sumVar = rowSums(.[1:4]))

iris %>% mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% head

iris %>% mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% head

iRenameColumn.R

iSelectColumn.R

<= 22 Low
>= 23 & <= 41 Average 
>=42 High
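The three cut-offs above can be applied with base R's `cut()`. This is a sketch under the assumption that the scores are whole numbers, so that "<= 22", "23 to 41", and ">= 42" leave no gaps; the vector `score` is made-up example data.

```r
# Example scores chosen to sit on both sides of each cut-off
score <- c(10, 22, 23, 41, 42, 60)

# Intervals are right-closed: (-Inf, 22], (22, 41], (41, Inf)
score_level <- cut(
    score,
    breaks = c(-Inf, 22, 41, Inf),
    labels = c("Low", "Average", "High"),
    right = TRUE
)

data.frame(score, score_level)
```

The result is a factor with levels in the order Low, Average, High. If scores can be fractional, a value such as 22.5 would be labelled Average here, which the original rule does not define.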

Codes for missing data and impute.^[See childRmd/_10impute.Rmd file for other codes]

Multiple imputation support in Finalfit https://www.datasurg.net/2019/09/25/multiple-imputation-support-in-finalfit/
Missing data https://finalfit.org/articles/missing.html

Plot missing data

visdat::vis_miss(mydata)
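Alongside the plot, a per-variable count of missing values can be tabulated in base R. The toy data frame and the name `missing_summary` are illustrative assumptions; the same lines should work on `mydata`.

```r
# Toy data with a few missing cells (illustrative values only)
toy <- data.frame(
    Age = c(30, NA, 53, 57),
    Sex = c("Female", "Male", NA, NA),
    TStage = c("4", "4", "3", "3"),
    stringsAsFactors = FALSE
)

# Count and percentage of missing values per column
n_missing <- colSums(is.na(toy))
pct_missing <- round(100 * n_missing / nrow(toy), 1)

missing_summary <- data.frame(
    variable = names(n_missing),
    n_missing = as.integer(n_missing),
    pct_missing = as.numeric(pct_missing)
)

# Most incomplete variables first
missing_summary[order(-missing_summary$n_missing), ]
```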

\pagebreak

Codes for Descriptive Statistics.^[See childRmd/_11descriptives.Rmd file for other codes]

Report Data properties via report 📦

mydata %>% dplyr::select(-dplyr::contains("Date")) %>% report::report()

The data contains 250 observations of the following variables:
  - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)
  - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)
  - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)
  - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing
  - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)
  - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)
  - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)
  - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
  - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)
  - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)
  - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)
  - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)
  - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing
  - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing
  - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)
  - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)
  - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)
  - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)
  - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)

Table 1 via arsenal 📦

# cat(names(mydata), sep = " + \n")
library(arsenal)
tab1 <- arsenal::tableby(
  ~ Sex +
    Age +
    Race +
    PreinvasiveComponent +
    LVI +
    PNI +
    Death +
    Group +
    Grade +
    TStage +
    # `Anti-X-intensity` +
    # `Anti-Y-intensity` +
    LymphNodeMetastasis +
    Valid +
    Smoker +
    Grade_Level
  ,
  data = mydata 
)
summary(tab1)

|                            | Overall (N=250) |
|:---------------------------|:---------------:|
|Sex                         |                 |
|   N-Miss                   |        1        |
|   Female                   |   122 (49.0%)   |
|   Male                     |   127 (51.0%)   |
|Age                         |                 |
|   N-Miss                   |        1        |
|   Mean (SD)                | 49.538 (14.160) |
|   Range                    | 25.000 - 73.000 |
|Race                        |                 |
|   N-Miss                   |        1        |
|   Asian                    |    15 (6.0%)    |
|   Bi-Racial                |    5 (2.0%)     |
|   Black                    |   30 (12.0%)    |
|   Hispanic                 |   38 (15.3%)    |
|   Native                   |    2 (0.8%)     |
|   Other                    |    1 (0.4%)     |
|   White                    |   158 (63.5%)   |
|PreinvasiveComponent        |                 |
|   N-Miss                   |        1        |
|   Absent                   |   203 (81.5%)   |
|   Present                  |   46 (18.5%)    |
|LVI                         |                 |
|   N-Miss                   |        1        |
|   Absent                   |   147 (59.0%)   |
|   Present                  |   102 (41.0%)   |
|PNI                         |                 |
|   N-Miss                   |        1        |
|   Absent                   |   171 (68.7%)   |
|   Present                  |   78 (31.3%)    |
|Death                       |                 |
|   N-Miss                   |        1        |
|   FALSE                    |   83 (33.3%)    |
|   TRUE                     |   166 (66.7%)   |
|Group                       |                 |
|   N-Miss                   |        1        |
|   Control                  |   118 (47.4%)   |
|   Treatment                |   131 (52.6%)   |
|Grade                       |                 |
|   N-Miss                   |        1        |
|   1                        |   78 (31.3%)    |
|   2                        |   62 (24.9%)    |
|   3                        |   109 (43.8%)   |
|TStage                      |                 |
|   1                        |    24 (9.6%)    |
|   2                        |   43 (17.2%)    |
|   3                        |   65 (26.0%)    |
|   4                        |   118 (47.2%)   |
|LymphNodeMetastasis         |                 |
|   N-Miss                   |        1        |
|   Absent                   |   144 (57.8%)   |
|   Present                  |   105 (42.2%)   |
|Valid                       |                 |
|   N-Miss                   |        1        |
|   FALSE                    |   116 (46.6%)   |
|   TRUE                     |   133 (53.4%)   |
|Smoker                      |                 |
|   N-Miss                   |        1        |
|   FALSE                    |   130 (52.2%)   |
|   TRUE                     |   119 (47.8%)   |
|Grade_Level                 |                 |
|   N-Miss                   |        1        |
|   high                     |   109 (43.8%)   |
|   low                      |   77 (30.9%)    |
|   moderate                 |   63 (25.3%)    |

Table 1 via tableone 📦

library(tableone)
mydata %>% dplyr::select(-dplyr::all_of(keycolumns), -dplyr::all_of(dateVariables)) %>% 
    tableone::CreateTableOne(data = .)


                                     Overall      
  n                                    250        
  Sex = Male (%)                       127 (51.0) 
  Age (mean (SD))                    49.54 (14.16)
  Race (%)                                        
     Asian                              15 ( 6.0) 
     Bi-Racial                           5 ( 2.0) 
     Black                              30 (12.0) 
     Hispanic                           38 (15.3) 
     Native                              2 ( 0.8) 
     Other                               1 ( 0.4) 
     White                             158 (63.5) 
  PreinvasiveComponent = Present (%)    46 (18.5) 
  LVI = Present (%)                    102 (41.0) 
  PNI = Present (%)                     78 (31.3) 
  Death = TRUE (%)                     166 (66.7) 
  Group = Treatment (%)                131 (52.6) 
  Grade (%)                                       
     1                                  78 (31.3) 
     2                                  62 (24.9) 
     3                                 109 (43.8) 
  TStage (%)                                      
     1                                  24 ( 9.6) 
     2                                  43 (17.2) 
     3                                  65 (26.0) 
     4                                 118 (47.2) 
  AntiX_intensity (mean (SD))         2.39 (0.66) 
  AntiY_intensity (mean (SD))         2.02 (0.80) 
  LymphNodeMetastasis = Present (%)    105 (42.2) 
  Valid = TRUE (%)                     133 (53.4) 
  Smoker = TRUE (%)                    119 (47.8) 
  Grade_Level (%)                                 
     high                              109 (43.8) 
     low                                77 (30.9) 
     moderate                           63 (25.3) 
  DeathTime = Within1Year (%)          149 (59.6)

Descriptive Statistics of Continuous Variables

mydata %>% dplyr::select(continiousVariables, numericVariables, integerVariables) %>% 
    summarytools::descr(., style = "rmarkdown")

print(summarytools::descr(mydata), method = "render", table.classes = "st-small")

mydata %>% summarytools::descr(., stats = "common", transpose = TRUE, headings = FALSE)

mydata %>% summarytools::descr(stats = "common") %>% summarytools::tb()

mydata$Sex %>% summarytools::freq(cumul = FALSE, report.nas = FALSE) %>% summarytools::tb()

mydata %>% explore::describe() %>% dplyr::filter(unique < 5)

# A tibble: 15 x 8
   variable             type     na na_pct unique   min  mean   max
   <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 Sex                  chr       1    0.4      3    NA NA       NA
 2 PreinvasiveComponent chr       1    0.4      3    NA NA       NA
 3 LVI                  chr       1    0.4      3    NA NA       NA
 4 PNI                  chr       1    0.4      3    NA NA       NA
 5 Death                lgl       1    0.4      3     0  0.67     1
 6 Group                chr       1    0.4      3    NA NA       NA
 7 Grade                chr       1    0.4      4    NA NA       NA
 8 TStage               chr       0    0        4    NA NA       NA
 9 AntiX_intensity      dbl       1    0.4      4     1  2.39     3
10 AntiY_intensity      dbl       1    0.4      4     1  2.02     3
11 LymphNodeMetastasis  chr       1    0.4      3    NA NA       NA
12 Valid                lgl       1    0.4      3     0  0.53     1
13 Smoker               lgl       1    0.4      3     0  0.48     1
14 Grade_Level          chr       1    0.4      4    NA NA       NA
15 DeathTime            chr       0    0        2    NA NA       NA

mydata %>% explore::describe() %>% dplyr::filter(na > 0)

# A tibble: 18 x 8
   variable             type     na na_pct unique   min  mean   max
   <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 Name                 chr       1    0.4    250    NA NA       NA
 2 Sex                  chr       1    0.4      3    NA NA       NA
 3 Age                  dbl       1    0.4     50    25 49.5     73
 4 Race                 chr       1    0.4      8    NA NA       NA
 5 PreinvasiveComponent chr       1    0.4      3    NA NA       NA
 6 LVI                  chr       1    0.4      3    NA NA       NA
 7 PNI                  chr       1    0.4      3    NA NA       NA
 8 LastFollowUpDate     dat       1    0.4     13    NA NA       NA
 9 Death                lgl       1    0.4      3     0  0.67     1
10 Group                chr       1    0.4      3    NA NA       NA
11 Grade                chr       1    0.4      4    NA NA       NA
12 AntiX_intensity      dbl       1    0.4      4     1  2.39     3
13 AntiY_intensity      dbl       1    0.4      4     1  2.02     3
14 LymphNodeMetastasis  chr       1    0.4      3    NA NA       NA
15 Valid                lgl       1    0.4      3     0  0.53     1
16 Smoker               lgl       1    0.4      3     0  0.48     1
17 Grade_Level          chr       1    0.4      4    NA NA       NA
18 SurgeryDate          dat       1    0.4    233    NA NA       NA

mydata %>% explore::describe()

# A tibble: 21 x 8
   variable             type     na na_pct unique   min  mean   max
   <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 ID                   chr       0    0      250    NA NA       NA
 2 Name                 chr       1    0.4    250    NA NA       NA
 3 Sex                  chr       1    0.4      3    NA NA       NA
 4 Age                  dbl       1    0.4     50    25 49.5     73
 5 Race                 chr       1    0.4      8    NA NA       NA
 6 PreinvasiveComponent chr       1    0.4      3    NA NA       NA
 7 LVI                  chr       1    0.4      3    NA NA       NA
 8 PNI                  chr       1    0.4      3    NA NA       NA
 9 LastFollowUpDate     dat       1    0.4     13    NA NA       NA
10 Death                lgl       1    0.4      3     0  0.67     1
# … with 11 more rows

Use R/gc_desc_cat.R to generate gc_desc_cat.Rmd containing descriptive statistics for categorical variables

source(here::here("R", "gc_desc_cat.R"))
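The generator script itself is not reproduced in this report. As a rough illustration only, a script of this kind could loop over variable names and write one chunk per variable into a child Rmd file. The sketch below is an assumption about the approach, not the contents of `R/gc_desc_cat.R`; `make_desc_chunk()`, `cat_vars` and the output location are hypothetical.

```r
# Hypothetical sketch: build one R Markdown chunk per categorical variable.
# make_desc_chunk() and cat_vars are illustrative names, not part of the template.
fence <- strrep("`", 3)  # three backticks, built at run time

make_desc_chunk <- function(var) {
  paste0(
    "Descriptive Statistics ", var, "\n\n",
    fence, "{r Descriptive Statistics ", var, "}\n",
    "mydata %>%\n",
    "  janitor::tabyl(", var, ") %>%\n",
    "  janitor::adorn_pct_formatting(rounding = \"half up\", digits = 1) %>%\n",
    "  knitr::kable()\n",
    fence, "\n\n",
    "\\pagebreak\n"
  )
}

cat_vars <- c("Sex", "Race", "LVI")            # assumed variable names
chunks <- vapply(cat_vars, make_desc_chunk, character(1))

# Written to a temporary directory here so the sketch has no side effects
out_file <- file.path(tempdir(), "gc_desc_cat.Rmd")
writeLines(chunks, out_file)
```

Generating the chunks from a vector of names keeps the per-variable sections identical, which is why the tables that follow share one layout.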

Descriptive Statistics Sex

mydata %>% janitor::tabyl(Sex) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|Sex    |   n|percent |valid_percent |
|:------|---:|:-------|:-------------|
|Female | 122|48.8%   |49.0%         |
|Male   | 127|50.8%   |51.0%         |
|NA     |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics Race

mydata %>% janitor::tabyl(Race) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|Race      |   n|percent |valid_percent |
|:---------|---:|:-------|:-------------|
|Asian     |  15|6.0%    |6.0%          |
|Bi-Racial |   5|2.0%    |2.0%          |
|Black     |  30|12.0%   |12.0%         |
|Hispanic  |  38|15.2%   |15.3%         |
|Native    |   2|0.8%    |0.8%          |
|Other     |   1|0.4%    |0.4%          |
|White     | 158|63.2%   |63.5%         |
|NA        |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics PreinvasiveComponent

mydata %>% janitor::tabyl(PreinvasiveComponent) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|PreinvasiveComponent |   n|percent |valid_percent |
|:--------------------|---:|:-------|:-------------|
|Absent               | 203|81.2%   |81.5%         |
|Present              |  46|18.4%   |18.5%         |
|NA                   |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics LVI

mydata %>% janitor::tabyl(LVI) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|LVI     |   n|percent |valid_percent |
|:-------|---:|:-------|:-------------|
|Absent  | 147|58.8%   |59.0%         |
|Present | 102|40.8%   |41.0%         |
|NA      |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics PNI

mydata %>% janitor::tabyl(PNI) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|PNI     |   n|percent |valid_percent |
|:-------|---:|:-------|:-------------|
|Absent  | 171|68.4%   |68.7%         |
|Present |  78|31.2%   |31.3%         |
|NA      |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics Group

mydata %>% janitor::tabyl(Group) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|Group     |   n|percent |valid_percent |
|:---------|---:|:-------|:-------------|
|Control   | 118|47.2%   |47.4%         |
|Treatment | 131|52.4%   |52.6%         |
|NA        |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics Grade

mydata %>% janitor::tabyl(Grade) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|Grade |   n|percent |valid_percent |
|:-----|---:|:-------|:-------------|
|1     |  78|31.2%   |31.3%         |
|2     |  62|24.8%   |24.9%         |
|3     | 109|43.6%   |43.8%         |
|NA    |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics TStage

mydata %>% janitor::tabyl(TStage) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|TStage |   n|percent |
|:------|---:|:-------|
|1      |  24|9.6%    |
|2      |  43|17.2%   |
|3      |  65|26.0%   |
|4      | 118|47.2%   |

\pagebreak

Descriptive Statistics LymphNodeMetastasis

mydata %>% janitor::tabyl(LymphNodeMetastasis) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|LymphNodeMetastasis |   n|percent |valid_percent |
|:-------------------|---:|:-------|:-------------|
|Absent              | 144|57.6%   |57.8%         |
|Present             | 105|42.0%   |42.2%         |
|NA                  |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics Grade_Level

mydata %>% janitor::tabyl(Grade_Level) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|Grade_Level |   n|percent |valid_percent |
|:-----------|---:|:-------|:-------------|
|high        | 109|43.6%   |43.8%         |
|low         |  77|30.8%   |30.9%         |
|moderate    |  63|25.2%   |25.3%         |
|NA          |   1|0.4%    |-             |

\pagebreak

Descriptive Statistics DeathTime

mydata %>% janitor::tabyl(DeathTime) %>% janitor::adorn_pct_formatting(rounding = "half up", 
    digits = 1) %>% knitr::kable()

|DeathTime     |   n|percent |
|:-------------|---:|:-------|
|MoreThan1Year | 101|40.4%   |
|Within1Year   | 149|59.6%   |

\pagebreak

race_stats <- summarytools::freq(mydata$Race)
print(race_stats, report.nas = FALSE, totals = FALSE, display.type = FALSE, Variable.label = "Race Group")

mydata %>% explore::describe(PreinvasiveComponent)

variable = PreinvasiveComponent
type     = character
na       = 1 of 250 (0.4%)
unique   = 3
 Absent  = 203 (81.2%)
 Present = 46 (18.4%)
 NA      = 1 (0.4%)

## Frequency or custom tables for categorical variables
SmartEDA::ExpCTable(mydata, Target = NULL, margin = 1, clim = 10, nlim = 5, round = 2, 
    bin = NULL, per = T)

               Variable         Valid Frequency Percent CumPercent
1                   Sex        Female       122    48.8       48.8
2                   Sex          Male       127    50.8       99.6
3                   Sex            NA         1     0.4      100.0
4                   Sex         TOTAL       250      NA         NA
5                  Race         Asian        15     6.0        6.0
6                  Race     Bi-Racial         5     2.0        8.0
7                  Race         Black        30    12.0       20.0
8                  Race      Hispanic        38    15.2       35.2
9                  Race            NA         1     0.4       35.6
10                 Race        Native         2     0.8       36.4
11                 Race         Other         1     0.4       36.8
12                 Race         White       158    63.2      100.0
13                 Race         TOTAL       250      NA         NA
14 PreinvasiveComponent        Absent       203    81.2       81.2
15 PreinvasiveComponent            NA         1     0.4       81.6
16 PreinvasiveComponent       Present        46    18.4      100.0
17 PreinvasiveComponent         TOTAL       250      NA         NA
18                  LVI        Absent       147    58.8       58.8
19                  LVI            NA         1     0.4       59.2
20                  LVI       Present       102    40.8      100.0
21                  LVI         TOTAL       250      NA         NA
22                  PNI        Absent       171    68.4       68.4
23                  PNI            NA         1     0.4       68.8
24                  PNI       Present        78    31.2      100.0
25                  PNI         TOTAL       250      NA         NA
26                Group       Control       118    47.2       47.2
27                Group            NA         1     0.4       47.6
28                Group     Treatment       131    52.4      100.0
29                Group         TOTAL       250      NA         NA
30                Grade             1        78    31.2       31.2
31                Grade             2        62    24.8       56.0
32                Grade             3       109    43.6       99.6
33                Grade            NA         1     0.4      100.0
34                Grade         TOTAL       250      NA         NA
35               TStage             1        24     9.6        9.6
36               TStage             2        43    17.2       26.8
37               TStage             3        65    26.0       52.8
38               TStage             4       118    47.2      100.0
39               TStage         TOTAL       250      NA         NA
40  LymphNodeMetastasis        Absent       144    57.6       57.6
41  LymphNodeMetastasis            NA         1     0.4       58.0
42  LymphNodeMetastasis       Present       105    42.0      100.0
43  LymphNodeMetastasis         TOTAL       250      NA         NA
44          Grade_Level          high       109    43.6       43.6
45          Grade_Level           low        77    30.8       74.4
46          Grade_Level      moderate        63    25.2       99.6
47          Grade_Level            NA         1     0.4      100.0
48          Grade_Level         TOTAL       250      NA         NA
49            DeathTime MoreThan1Year       101    40.4       40.4
50            DeathTime   Within1Year       149    59.6      100.0
51            DeathTime         TOTAL       250      NA         NA
52      AntiX_intensity             1        25    10.0       10.0
53      AntiX_intensity             2       102    40.8       50.8
54      AntiX_intensity             3       122    48.8       99.6
55      AntiX_intensity            NA         1     0.4      100.0
56      AntiX_intensity         TOTAL       250      NA         NA
57      AntiY_intensity             1        77    30.8       30.8
58      AntiY_intensity             2        91    36.4       67.2
59      AntiY_intensity             3        81    32.4       99.6
60      AntiY_intensity            NA         1     0.4      100.0
61      AntiY_intensity         TOTAL       250      NA         NA

inspectdf::inspect_cat(mydata)

# A tibble: 16 x 5
   col_name               cnt common      common_pcnt levels            
   <chr>                <int> <chr>             <dbl> <named list>      
 1 Death                    3 TRUE               66.4 <tibble [3 × 3]>  
 2 DeathTime                2 Within1Year        59.6 <tibble [2 × 3]>  
 3 Grade                    4 3                  43.6 <tibble [4 × 3]>  
 4 Grade_Level              4 high               43.6 <tibble [4 × 3]>  
 5 Group                    3 Treatment          52.4 <tibble [3 × 3]>  
 6 ID                     250 001                 0.4 <tibble [250 × 3]>
 7 LVI                      3 Absent             58.8 <tibble [3 × 3]>  
 8 LymphNodeMetastasis      3 Absent             57.6 <tibble [3 × 3]>  
 9 Name                   250 Aceyn               0.4 <tibble [250 × 3]>
10 PNI                      3 Absent             68.4 <tibble [3 × 3]>  
11 PreinvasiveComponent     3 Absent             81.2 <tibble [3 × 3]>  
12 Race                     8 White              63.2 <tibble [8 × 3]>  
13 Sex                      3 Male               50.8 <tibble [3 × 3]>  
14 Smoker                   3 FALSE              52   <tibble [3 × 3]>  
15 TStage                   4 4                  47.2 <tibble [4 × 3]>  
16 Valid                    3 TRUE               53.2 <tibble [3 × 3]>

inspectdf::inspect_cat(mydata)$levels$Group

# A tibble: 3 x 3
  value      prop   cnt
  <chr>     <dbl> <int>
1 Treatment 0.524   131
2 Control   0.472   118
3 <NA>      0.004     1

Split-Group Stats Categorical

library(summarytools)

grouped_freqs <- stby(data = mydata$Smoker, INDICES = mydata$Sex, FUN = freq, cumul = FALSE, 
    report.nas = FALSE)

grouped_freqs %>% tb(order = 2)

Grouped Categorical

summarytools::stby(list(x = mydata$LVI, y = mydata$LymphNodeMetastasis), mydata$PNI, 
    summarytools::ctable)

with(mydata, summarytools::stby(list(x = LVI, y = LymphNodeMetastasis), PNI, summarytools::ctable))

mydata %>% dplyr::select(characterVariables) %>% dplyr::select(PreinvasiveComponent, 
    PNI, LVI) %>% reactable::reactable(data = ., groupBy = c("PreinvasiveComponent", 
    "PNI"), columns = list(LVI = reactable::colDef(aggregate = "count")))

\pagebreak

questionr:::icut()

source(here::here("R", "gc_desc_cont.R"))

Descriptive Statistics Age

mydata %>% jmv::descriptives(data = ., vars = "Age", hist = TRUE, dens = TRUE, box = TRUE, 
    violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE, 
    kurt = TRUE, quart = TRUE)


 DESCRIPTIVES

 Descriptives                       
 ────────────────────────────────── 
                          Age       
 ────────────────────────────────── 
   N                          249   
   Missing                      1   
   Mean                      49.5   
   Median                    49.0   
   Mode                      72.0   
   Standard deviation        14.2   
   Variance                   200   
   Minimum                   25.0   
   Maximum                   73.0   
   Skewness               0.00389   
   Std. error skewness      0.154   
   Kurtosis                 -1.15   
   Std. error kurtosis      0.307   
   25th percentile           37.0   
   50th percentile           49.0   
   75th percentile           61.0   
 ──────────────────────────────────

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics Age-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics Age-2.png)

\pagebreak

Descriptive Statistics AntiX_intensity

mydata %>% jmv::descriptives(data = ., vars = "AntiX_intensity", hist = TRUE, dens = TRUE, 
    box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, 
    skew = TRUE, kurt = TRUE, quart = TRUE)


 DESCRIPTIVES

 Descriptives                               
 ────────────────────────────────────────── 
                          AntiX_intensity   
 ────────────────────────────────────────── 
   N                                  249   
   Missing                              1   
   Mean                              2.39   
   Median                            2.00   
   Mode                              3.00   
   Standard deviation               0.664   
   Variance                         0.440   
   Minimum                           1.00   
   Maximum                           3.00   
   Skewness                        -0.631   
   Std. error skewness              0.154   
   Kurtosis                        -0.640   
   Std. error kurtosis              0.307   
   25th percentile                   2.00   
   50th percentile                   2.00   
   75th percentile                   3.00   
 ──────────────────────────────────────────

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiX_intensity-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiX_intensity-2.png)

\pagebreak

Descriptive Statistics AntiY_intensity

mydata %>% jmv::descriptives(data = ., vars = "AntiY_intensity", hist = TRUE, dens = TRUE, 
    box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, 
    skew = TRUE, kurt = TRUE, quart = TRUE)


 DESCRIPTIVES

 Descriptives                               
 ────────────────────────────────────────── 
                          AntiY_intensity   
 ────────────────────────────────────────── 
   N                                  249   
   Missing                              1   
   Mean                              2.02   
   Median                            2.00   
   Mode                              2.00   
   Standard deviation               0.798   
   Variance                         0.637   
   Minimum                           1.00   
   Maximum                           3.00   
   Skewness                       -0.0289   
   Std. error skewness              0.154   
   Kurtosis                         -1.43   
   Std. error kurtosis              0.307   
   25th percentile                   1.00   
   50th percentile                   2.00   
   75th percentile                   3.00   
 ──────────────────────────────────────────

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiY_intensity-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiY_intensity-2.png)

\pagebreak

tab <- tableone::CreateTableOne(data = mydata)
# ?print.ContTable
tab$ContTable


                              Overall      
  n                           250          
  Age (mean (SD))             49.54 (14.16)
  AntiX_intensity (mean (SD))  2.39 (0.66) 
  AntiY_intensity (mean (SD))  2.02 (0.80)

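# "Anti-X-intensity" does not match the column name AntiX_intensity, so the
# output below is unchanged; pass nonnormal = "AntiX_intensity" for median [IQR]
print(tab$ContTable, nonnormal = c("Anti-X-intensity"))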


                              Overall      
  n                           250          
  Age (mean (SD))             49.54 (14.16)
  AntiX_intensity (mean (SD))  2.39 (0.66) 
  AntiY_intensity (mean (SD))  2.02 (0.80)
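For a variable flagged as non-normal, tableone reports the median with the interquartile range instead of mean (SD). A minimal base R sketch of the two summaries follows, using a made-up vector `x` of ordinal intensity scores (an assumption for illustration, not the template data).

```r
# Made-up 1-3 intensity scores with one missing value (illustrative only)
x <- c(1, 2, 2, 2, 3, 3, 3, 3, NA)

# Summary suited to roughly symmetric data
mean_sd <- c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

# Summary suited to skewed or ordinal data: median with 25th/75th percentiles
med_iqr <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

mean_sd
med_iqr
```

For a three-level score such as staining intensity, the median and quartiles are often the more honest summary, since the mean implies equal spacing between levels.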

mydata %>% explore::describe(Age)

variable = Age
type     = double
na       = 1 of 250 (0.4%)
unique   = 50
min|max  = 25 | 73
q05|q95  = 28 | 72
q25|q75  = 37 | 61
median   = 49
mean     = 49.53815

mydata %>% dplyr::select(continiousVariables) %>% SmartEDA::ExpNumStat(data = ., 
    by = "A", gp = NULL, Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)

inspectdf::inspect_num(mydata, breaks = 10)

# A tibble: 3 x 10
  col_name        min    q1 median  mean    q3   max     sd pcnt_na hist        
  <chr>         <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <named list>
1 Age              25    37     49 49.5     61    73 14.2       0.4 <tibble [12…
2 AntiX_intens…     1     2      2  2.39     3     3  0.664     0.4 <tibble [12…
3 AntiY_intens…     1     1      2  2.02     3     3  0.798     0.4 <tibble [12…

inspectdf::inspect_num(mydata)$hist$Age

# A tibble: 27 x 2
   value        prop
   <chr>       <dbl>
 1 [-Inf, 24) 0     
 2 [24, 26)   0.0201
 3 [26, 28)   0.0281
 4 [28, 30)   0.0361
 5 [30, 32)   0.0361
 6 [32, 34)   0.0602
 7 [34, 36)   0.0482
 8 [36, 38)   0.0241
 9 [38, 40)   0.0161
10 [40, 42)   0.0602
# … with 17 more rows

inspectdf::inspect_num(mydata, breaks = 10) %>% inspectdf::show_plot()

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/inspectdf 5-1.png)

Split-Group Stats Continuous

grouped_descr <- summarytools::stby(data = mydata, INDICES = mydata$Sex, FUN = summarytools::descr, 
    stats = "common")
# grouped_descr %>% summarytools::tb(order = 2)
grouped_descr %>% summarytools::tb()

Grouped Continuous

summarytools::stby(data = mydata, INDICES = mydata$PreinvasiveComponent, FUN = summarytools::descr, 
    stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)

# stats and transpose must be passed to stby() (and on to descr()), not to with()
with(mydata, summarytools::stby(Age, PreinvasiveComponent, summarytools::descr, 
    stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE))

mydata %>% group_by(PreinvasiveComponent) %>% summarytools::descr(stats = "fivenum")

## Summary statistics by – category
SmartEDA::ExpNumStat(mydata, by = "GA", gp = "PreinvasiveComponent", Qnt = seq(0, 
    1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)

  Vname                        Group  TN nNeg nZero nPos NegInf PosInf NA_Value
1   Age     PreinvasiveComponent:All 250    0     0  249      0      0        1
2   Age  PreinvasiveComponent:Absent 203    0     0  203      0      0        0
3   Age PreinvasiveComponent:Present  46    0     0   45      0      0        1
4   Age      PreinvasiveComponent:NA   0    0     0    0      0      0        0
  Per_of_Missing   sum min  max  mean median    SD   CV  IQR Skewness Kurtosis
1           0.40 12335  25   73 49.54     49 14.16 0.29 24.0     0.00    -1.16
2           0.00 10117  25   73 49.84     51 14.34 0.29 23.5    -0.02    -1.20
3           2.17  2170  25   72 48.22     49 13.55 0.28 22.0     0.08    -0.98
4            NaN     0 Inf -Inf   NaN     NA    NA   NA   NA      NaN      NaN
  0%  10%  20%  30%  40% 50%  60%  70% 80%  90% 100% LB.25% UB.75% nOutliers
1 25 30.8 34.0 40.4 45.0  49 54.0 59.0  64 70.0   73   1.00  97.00         0
2 25 31.0 34.0 40.6 45.0  51 54.0 59.0  65 70.8   73   2.25  96.25         0
3 25 30.8 34.8 40.2 43.6  49 51.8 56.8  59 68.6   72   3.00  91.00         0
4 NA   NA   NA   NA   NA  NA   NA   NA  NA   NA   NA     NA     NA         0

\pagebreak

\newpage \blandscape

Codes for cross tables.^[See childRmd/_12crossTables.Rmd file for other codes]

library(finalfit)

# dependent <- c('dependent1', 'dependent2' )

# explanatory <- c('explanatory1', 'explanatory2' )

dependent <- "PreinvasiveComponent"

explanatory <- c("Sex", "Age", "Grade", "TStage")

Set the `column` argument to `TRUE` for column percentages or to `FALSE` for row percentages.

source(here::here("R", "gc_table_cross.R"))

Cross Table PreinvasiveComponent

mydata %>%
    summary_factorlist(dependent = 'PreinvasiveComponent', 
                       explanatory = explanatory,
                       # column = TRUE,
                       total_col = TRUE,
                       p = TRUE,
                       add_dependent_label = TRUE,
                       na_include=FALSE
                       # catTest = catTestfisher
                       ) -> table

knitr::kable(table, row.names = FALSE, align = c('l', 'l', 'r', 'r', 'r'))

|Dependent: PreinvasiveComponent |          |      Absent|     Present|       Total|     p|
|:-------------------------------|:---------|-----------:|-----------:|-----------:|-----:|
|Sex                             |Female    |  104 (51.2)|   17 (37.8)|  121 (48.8)| 0.102|
|                                |Male      |   99 (48.8)|   28 (62.2)|  127 (51.2)|      |
|Age                             |Mean (SD) | 49.8 (14.3)| 48.2 (13.6)| 49.5 (14.2)| 0.492|
|Grade                           |1         |   68 (33.7)|    9 (19.6)|   77 (31.0)| 0.100|
|                                |2         |   46 (22.8)|   16 (34.8)|   62 (25.0)|      |
|                                |3         |   88 (43.6)|   21 (45.7)|  109 (44.0)|      |
|TStage                          |1         |    18 (8.9)|    6 (13.0)|    24 (9.6)| 0.117|
|                                |2         |   38 (18.7)|     4 (8.7)|   42 (16.9)|      |
|                                |3         |   48 (23.6)|   17 (37.0)|   65 (26.1)|      |
|                                |4         |   99 (48.8)|   19 (41.3)|  118 (47.4)|      |
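The percentages in the cross table above are column percentages: 104 of the 203 "Absent" cases are female, which gives 51.2%. A minimal base R sketch of the difference between column and row percentages, re-entering only the Sex counts from that table by hand (the object names are illustrative):

```r
# Counts copied from the Sex rows of the cross table (Absent / Present columns)
counts <- matrix(
  c(104, 99, 17, 28), nrow = 2,
  dimnames = list(Sex = c("Female", "Male"),
                  PreinvasiveComponent = c("Absent", "Present"))
)

# Column percentages: each column sums to 100 (share of each sex within Absent/Present)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Row percentages: each row sums to 100 (share of Absent/Present within each sex)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

col_pct
row_pct
```

Which margin to report depends on the question: column percentages describe the composition of each outcome group, row percentages describe how often the outcome occurs within each explanatory level.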

\pagebreak

library(DT)
datatable(mtcars, rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=T) )

rmngb

RVAideMemoire

\newpage \blandscape

\elandscape

Codes for generating Plots.^[See childRmd/_13plots.Rmd file for other codes]

R can build almost any type of interactive graphic. My favourite library is plotly, which turns any ggplot2 graphic into an interactive one with a single extra line of code. Try hovering over points, selecting a zone, or clicking on the legend.

library(ggplot2)
library(plotly)
library(gapminder)

p <- gapminder %>% filter(year == 1977) %>% ggplot(aes(gdpPercap, lifeExp, size = pop, 
    color = continent)) + geom_point() + scale_x_log10() + theme_bw()

ggplotly(p)

scales::show_col(colours(), cex_label = 0.35)

embedgist <- gistr::gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e") %>% 
    gistr::embed()

# https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004

library(tidyr)
library(dplyr)
library(alluvial)
library(ggplot2)
library(forcats)

set.seed(42)
individual <- rep(LETTERS[1:10], each = 2)
timeperiod <- paste0("time_", rep(1:2, 10))
discretechoice <- factor(paste0("choice_", sample(letters[1:3], 20, replace = T)))
continuouschoice <- ceiling(runif(20, 0, 100))
d <- data.frame(individual, timeperiod, discretechoice, continuouschoice)

# stacked bar diagram of discrete choice by individual
g <- ggplot(data = d, aes(timeperiod, fill = fct_rev(discretechoice)))
g + geom_bar(position = "stack") + guides(fill = guide_legend(title = NULL))

# alluvial diagram of discrete choice by individual
# Explicit namespaces avoid clashes with same-named functions from other attached packages
d_alluvial <- d %>% dplyr::select(individual, timeperiod, discretechoice) %>% 
    tidyr::spread(timeperiod, discretechoice) %>% dplyr::group_by(time_1, time_2) %>% 
    dplyr::summarize(count = dplyr::n()) %>% dplyr::ungroup()

alluvial(dplyr::select(d_alluvial, -count), freq = d_alluvial$count)

# stacked bar diagram of discrete choice, weighting by continuous choice
g + geom_bar(position = "stack", aes(weight = continuouschoice))

library(ggalluvial)
ggplot(data = d, aes(x = timeperiod, stratum = discretechoice, alluvium = individual, 
    y = continuouschoice)) + geom_stratum(aes(fill = discretechoice)) + geom_flow()

CD44changes <- mydata %>% dplyr::select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>% 
    dplyr::filter(complete.cases(.)) %>% dplyr::group_by(TumorCD44, TomurcukCD44, 
    PeritumoralTomurcukGr4) %>% dplyr::tally()

Error: Can't subset columns that don't exist.
x The column `TumorCD44` doesn't exist.

library(ggalluvial)

ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, y = n)) + 
    scale_x_discrete(limits = c("TumorCD44", "TomurcukCD44"), expand = c(0.1, 0.05)) + 
    xlab("Tumor Tomurcuk") + geom_alluvium(aes(fill = PeritumoralTomurcukGr4, colour = PeritumoralTomurcukGr4)) + 
    geom_stratum(alpha = 0.5) + geom_text(stat = "stratum", infer.label = TRUE) + 
    # geom_text(stat = 'alluvium', infer.label = TRUE) +
theme_minimal() + ggtitle("Changes in CD44")

Error in ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, : object 'CD44changes' not found

Codes for generating paired tests.^[See childRmd/_14pairedTests.Rmd file for other codes]

Codes for generating hypothesis tests.^[See childRmd/_15hypothesisTests.Rmd file for other codes]

Hypothesis Tests

mytable <- jmv::ttestIS(formula = HindexCTLA4 ~ PeritumoralTomurcukGr4, data = mydata, 
    vars = HindexCTLA4, students = FALSE, mann = TRUE, norm = TRUE, meanDiff = TRUE, 
    desc = TRUE, plots = TRUE)

Error: Argument 'vars' contains 'HindexCTLA4' which is not present in the dataset

cat("<pre class='jamovitable'>")


https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html

t.test(mtcars$mpg ~ mtcars$am) %>% report::report()

report(t.test(iris$Sepal.Length, iris$Petal.Length))

Frequently Used Statistical Tests By Pathologists

Frequently Used Statistical Tests^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA] by [@Schmidt2017]

- Student t test
- Regression/ANOVA
- Chi-square test
- Mann-Whitney test (rank sum)
- Fisher exact test
- Survival analysis
    - Kaplan-Meier/log-rank
    - Cox regression
- Multiple comparison adjustment
    - Tukey
    - Bonferroni
    - Newman-Keuls
- Kappa Statistic
- ROC analysis
- Logistic regression
- Spearman rank correlation
- Kruskal-Wallis test
- Pearson correlation statistic
- Normality test
- McNemar test
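Several of the tests listed above are available in base R's `stats` package. The sketch below runs a handful of them on the built-in `mtcars` data purely as an illustration; the choice of variables is arbitrary and is not taken from the survey or from the template data.

```r
# Built-in mtcars data, used only for illustration
tt <- t.test(mpg ~ am, data = mtcars)      # Welch t test (var.equal = TRUE gives Student's)
mw <- wilcox.test(mpg ~ am, data = mtcars,
                  exact = FALSE)           # Mann-Whitney / rank sum test

tab2x2 <- table(mtcars$am, mtcars$vs)      # 2 x 2 table of two binary variables
chi <- chisq.test(tab2x2)                  # Chi-square test (Yates-corrected for 2 x 2)
fis <- fisher.test(tab2x2)                 # Fisher exact test

kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)  # Kruskal-Wallis test
sp <- cor.test(mtcars$mpg, mtcars$wt,
               method = "spearman", exact = FALSE)    # Spearman rank correlation
pe <- cor.test(mtcars$mpg, mtcars$wt)                 # Pearson correlation
sw <- shapiro.test(mtcars$mpg)                        # Shapiro-Wilk normality test

# Collect the p-values and apply a Bonferroni adjustment
p_values <- c(t = tt$p.value, mann_whitney = mw$p.value, chisq = chi$p.value,
              fisher = fis$p.value, kruskal = kw$p.value,
              spearman = sp$p.value, pearson = pe$p.value, shapiro = sw$p.value)
p_bonferroni <- p.adjust(p_values, method = "bonferroni")
round(cbind(p_values, p_bonferroni), 4)
```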

Consider Adding:

\newpage \blandscape

Codes for ROC.^[See childRmd/_16ROC.Rmd file for other codes]

ROC

Codes for Decision Tree.^[See childRmd/_17decisionTree.Rmd]

Decision Tree

Explore

explore::explore(mydata)

Codes for Survival Analysis^[See childRmd/_18survival.Rmd file for other codes, and childRmd/_19shinySurvival.Rmd for shiny application]

Survival analysis with strata, clusters, frailties and competing risks in in Finalfit

https://www.datasurg.net/2019/09/12/survival-analysis-with-strata-clusters-frailties-and-competing-risks-in-in-finalfit/

Intracranial WHO grade I meningioma: a competing risk analysis of progression and disease-specific survival

https://link.springer.com/article/10.1007/s00701-019-04096-9

Calculate survival time

mydata$int <- lubridate::interval(lubridate::ymd(mydata$SurgeryDate), lubridate::ymd(mydata$LastFollowUpDate))
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
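Follow-up times computed from dates are worth checking before modelling: a missing date gives `NA`, and swapped dates give a negative time. The sketch below uses made-up dates and a base R approximation (days divided by 30.44) in place of `lubridate::time_length()`, so its values can differ slightly from the template's calculation; `toy` and `time_problem` are hypothetical names.

```r
# Made-up dates: row 2 has swapped dates, row 3 has a missing surgery date
toy <- data.frame(
  SurgeryDate      = as.Date(c("2019-01-15", "2019-03-01", NA, "2019-06-10")),
  LastFollowUpDate = as.Date(c("2019-12-15", "2019-02-01", "2019-11-30", "2020-01-10"))
)

# Approximate months of follow-up (30.44 = average days per month)
days <- as.numeric(difftime(toy$LastFollowUpDate, toy$SurgeryDate, units = "days"))
toy$OverallTime <- round(days / 30.44, digits = 1)

# Flag rows whose follow-up time is missing or negative
toy$time_problem <- is.na(toy$OverallTime) | toy$OverallTime < 0
toy[toy$time_problem, ]
```

Rows flagged this way should be corrected at the source rather than silently dropped by the survival functions.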

Recode death status as a numeric outcome for survival analysis

## Recoding mydata$Death into mydata$Outcome
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death), `1` = "TRUE", `0` = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))

It is always good practice to double-check after recoding^[JAMA retraction after miscoding – new Finalfit function to check recoding]

table(mydata$Death, mydata$Outcome)


          0   1
  FALSE  83   0
  TRUE    0 166
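The visual check in the cross-tabulation above can also be made to fail loudly. The following is a minimal sketch on a made-up logical vector; `death` and `outcome` are stand-ins for `mydata$Death` and `mydata$Outcome`, and `as.numeric()` is used here as a base R shortcut rather than the `fct_recode()` route taken in the template.

```r
# Made-up status vector including a missing value
death   <- c(TRUE, FALSE, TRUE, NA, FALSE)
outcome <- as.numeric(death)   # TRUE -> 1, FALSE -> 0, NA stays NA

# Cross-tabulate original against recoded values
xtab <- table(death, outcome)

# Every FALSE must map to 0 and every TRUE to 1, so the off-diagonal cells must be empty
off_diagonal <- xtab["FALSE", "1"] + xtab["TRUE", "0"]
stopifnot(off_diagonal == 0)

# Missing values must stay missing after recoding
stopifnot(identical(is.na(death), is.na(outcome)))
```

Because `stopifnot()` halts knitting on a mismatch, a miscoded outcome cannot reach the survival models unnoticed.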

library(survival)
# data(lung)
# km <- with(lung, Surv(time, status))
km <- with(mydata, Surv(OverallTime, Outcome))
head(km, 80)

 [1]  4.5+  7.8   7.1   7.9  10.6   6.9+  8.4+ 11.0   3.5   7.6   8.4   6.0 
[13]   NA   9.5  11.2  11.7   9.2   7.6?  4.1   4.7   9.7+  8.3+  6.0+  5.5+
[25]  6.4  11.4   3.8+ 10.2   3.0   6.4  11.3   6.5+  9.7   6.7   3.3+ 11.2+
[37]  7.8   7.0   6.3  10.2   7.0  11.2   9.7+  6.8   3.1   3.6   7.8   9.5+
[49]  6.0  10.4+ 11.2+  3.3+  7.4   9.2+  9.9  11.2+ 10.0   5.4   9.5   5.4 
[61]  5.9   8.4   4.1   9.2   7.3+  6.6   7.0+  8.6+  4.0   4.1  10.7   4.7 
[73]  6.9   6.6   5.3   8.0   9.3   8.4+  8.6+  8.8

plot(km)
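In the printed `Surv` object above, a trailing `+` marks a censored time and a trailing `?` marks an observation whose status is missing. The sketch below reproduces both markers on made-up values (`toy_time` and `toy_status` are assumptions, not template data):

```r
library(survival)

# Hypothetical follow-up times (months) and status (1 = event, 0 = censored)
toy_time   <- c(4.5, 7.8, 7.6)
toy_status <- c(0, 1, NA)

toy_km <- Surv(toy_time, toy_status)
labels <- as.character(toy_km)
print(labels)

# Censored observations end in "+", unknown status ends in "?"
censored <- grepl("\\+$", labels)
unknown  <- grepl("\\?$", labels)
```

Rows flagged in `unknown` are worth reviewing before modelling, because they are dropped as missing.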

Kaplan-Meier Plot Log-Rank Test

# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "LVI"

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = dependentKM,
                      explanatory = explanatoryKM,
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test-1.png)
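The p-value printed on the plot with `pval = TRUE` is a log-rank test. As a rough sketch of how such a p-value can be obtained directly, the example below uses `survival::survdiff()` on the `lung` data shipped with the survival package (a stand-in dataset, not the template data):

```r
library(survival)

# Log-rank test comparing survival between the two sexes in survival::lung
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Chi-square statistic on (number of groups - 1) degrees of freedom
df <- length(lr$n) - 1
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
p_value
```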

# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = "Surv(OverallTime, Outcome)",
                      explanatory = "LVI",
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test 2-1.png)

library(finalfit)
library(survival)
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"

tUni <- mydata %>% finalfit::finalfit(dependentUni, explanatoryUni)

knitr::kable(tUni[, 1:4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))

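| Dependent: Surv(OverallTime, Outcome) |         |         all |          HR (univariable) |
|:--------------------------------------|:--------|------------:|--------------------------:|
| LVI                                   | Absent  | 147 (100.0) |                         - |
|                                       | Present | 102 (100.0) | 1.59 (1.15-2.20, p=0.005) |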
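The hazard ratio column can be cross-checked against a plain `survival::coxph()` fit. The sketch below uses the `lung` data from the survival package as a stand-in and formats the result in the same "HR (CI, p)" style; it is an illustration, not the code finalfit runs internally.

```r
library(survival)

# Univariable Cox model on a stand-in dataset
cox_fit <- coxph(Surv(time, status) ~ sex, data = lung)

# Hazard ratio, 95% confidence interval and Wald p-value
hr <- exp(coef(cox_fit))
ci <- exp(confint(cox_fit))
p  <- summary(cox_fit)$coefficients[, "Pr(>|z|)"]

sprintf("%.2f (%.2f-%.2f, p=%.3f)", hr, ci[1], ci[2], p)
```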

tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% janitor::clean_names()

tUni_df_descr <- paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], 
    " is ", tUni_df$x[2], ", there is ", tUni_df$hr_univariable[2], " times risk than ", 
    "when ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], 
    ".")


When LVI is Present, there is 1.59 (1.15-2.20, p=0.005) times risk than when LVI is Absent.

\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

$ When LVI is Present, there is 1.59 (1.15-2.20, p=0.005) times risk than when LVI is Absent. $

} }

km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

   4 observations deleted due to missingness 
              n events median 0.95LCL 0.95UCL
LVI=Absent  144    100   22.0    14.3    31.0
LVI=Present 102     64   10.5     9.9    13.8

plot(km_fit)

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Median Survivals-1.png)
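Median survival is the first time at which the Kaplan-Meier curve falls to 0.5 or below. The sketch below computes it by hand on the `lung` stand-in data and places it next to the value reported by `survfit()`; the two should normally agree, except when the curve sits exactly at 0.5 over an interval.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Median survival: first time at which the Kaplan-Meier curve is at or below 0.5
manual_median <- min(fit$time[fit$surv <= 0.5])

# Value reported by survfit itself, for comparison
reported_median <- unname(summary(fit)$table["median"])

c(manual = manual_median, reported = reported_median)
```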

# summary(km_fit)

km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names() %>% 
    tibble::rownames_to_column()

km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {rowname}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>% 
    dplyr::select(description) %>% dplyr::pull()


When LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months., When LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.

\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

When LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months., When LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.

} }

summary(km_fit, times = c(12, 36, 60))

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

4 observations deleted due to missingness 
                LVI=Absent 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   12     75      52    0.617  0.0421        0.539        0.705
   36     19      35    0.252  0.0452        0.177        0.358

                LVI=Present 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   12     23      49    0.383  0.0566       0.2870        0.512
   36      4      12    0.134  0.0488       0.0657        0.274
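The output above has no 60-month row, presumably because follow-up ended before that time in both groups. `summary()` on a `survfit` object drops requested times beyond the end of follow-up unless `extend = TRUE` is set. A sketch on the `lung` stand-in data, where follow-up ends well before day 2000:

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Day 2000 lies beyond the last follow-up, so it is dropped by default
default_rows  <- summary(fit, times = c(365, 2000))
extended_rows <- summary(fit, times = c(365, 2000), extend = TRUE)

# Number of reported rows across both strata
length(default_rows$time)
length(extended_rows$time)
```

Extended rows repeat the last known estimate, so they should be reported with care.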

km_fit_summary <- summary(km_fit, times = c(12, 36, 60))

km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", 
    "surv", "std.err", "lower", "upper")])

km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% 
    dplyr::select(description) %>% dplyr::pull()


When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]., When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]., When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]., When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].

\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]., When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]., When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]., When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].

} }

source(here::here("R", "gc_survival.R"))

Kaplan-Meier Plot Log-Rank Test

library(survival)
library(survminer)
library(finalfit)

mydata %>%
  finalfit::surv_plot('Surv(OverallTime, Outcome)', 'LVI',
                      xlab = 'Time (months)', pval = TRUE, legend = 'none',
                      break.time.by = 12, xlim = c(0, 60)
                      # legend.labs = c('a','b')
                      )

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier LVI-1.png)

Univariate Cox-Regression

explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"
# With metrics = TRUE, finalfit() returns a list: the results table first, the model metrics second
tUni <- mydata %>% finalfit(dependentUni, explanatoryUni, metrics = TRUE)

# Index the table element; tUni[, 1:4] fails on a list with "incorrect number of dimensions"
knitr::kable(tUni[[1]][, 1:4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))

Univariate Cox-Regression Summary

tUni_df <- tibble::as_tibble(tUni[[1]], .name_repair = "minimal") %>% janitor::clean_names(dat = ., 
    case = "snake")

# Number of rows in the table, one per factor level
n_level <- dim(tUni_df)[1]

# Describe level n + 1 relative to the reference level in row 1
tUni_df_descr <- function(n) {
    paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n + 
        1], ", there is ", tUni_df$hr_univariable[n + 1], " times risk than ", "when ", 
        tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")

}

results5 <- purrr::map(.x = seq_len(n_level - 1), .f = tUni_df_descr)

print(unlist(results5))

[1] "When LVI is Present, there is 1.59 (1.15-2.20, p=0.005) times risk than when LVI is Absent."

\pagebreak

Median Survival

km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

   4 observations deleted due to missingness 
              n events median 0.95LCL 0.95UCL
LVI=Absent  144    100   22.0    14.3    31.0
LVI=Present 102     64   10.5     9.9    13.8

km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = ., 
    case = "snake") %>% tibble::rownames_to_column(.data = ., var = "LVI")

# Row names look like "LVI=Absent"; turn the "=" into " is " for readable sentences
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>% 
    dplyr::mutate(description = gsub(pattern = "LVI=", replacement = "LVI is ", 
        x = description)) %>% dplyr::select(description) %>% dplyr::pull()

km_fit_median_definition

When LVI is Absent, median survival is 22 [14.3 - 31, 95% CI] months.
When LVI is Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.

1-3-5-yr survival

summary(km_fit, times = c(12, 36, 60))

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

4 observations deleted due to missingness 
                LVI=Absent 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   12     75      52    0.617  0.0421        0.539        0.705
   36     19      35    0.252  0.0452        0.177        0.358

                LVI=Present 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   12     23      49    0.383  0.0566       0.2870        0.512
   36      4      12    0.134  0.0488       0.0657        0.274

km_fit_summary <- summary(km_fit, times = c(12, 36, 60))

km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", 
    "surv", "std.err", "lower", "upper")])

km_fit_df

       strata time n.risk n.event      surv    std.err      lower     upper
1  LVI=Absent   12     75      52 0.6165782 0.04211739 0.53931696 0.7049078
2  LVI=Absent   36     19      35 0.2520087 0.04515881 0.17737163 0.3580528
3 LVI=Present   12     23      49 0.3833784 0.05662684 0.28701265 0.5120993
4 LVI=Present   36      4      12 0.1340646 0.04881983 0.06566707 0.2737036

km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% 
    dplyr::select(description) %>% dplyr::pull()

km_fit_definition

When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI].
When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI].
When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI].
When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].
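The sentences above mix whole percentages with one-decimal percentages. One way to get consistent rounding is to format every proportion explicitly. The sketch below uses base `sprintf()` on the first row of `km_fit_df`; the helper name `pct` is an assumption introduced for illustration.

```r
# Values copied from the first row of km_fit_df above
surv  <- 0.6165782
lower <- 0.53931696
upper <- 0.7049078

# Format every proportion as a percentage with exactly one decimal place
pct <- function(x) sprintf("%.1f%%", 100 * x)

sentence <- sprintf("When LVI=Absent, 12 month survival is %s [%s-%s, 95%% CI].",
                    pct(surv), pct(lower), pct(upper))
sentence
```

The `accuracy` argument of `scales::percent()` should give a similar result inside the existing `glue()` template.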

\pagebreak

summary(km_fit)$table

            records n.max n.start events   *rmean *se(rmean) median 0.95LCL
LVI=Absent      144   144     144    100 24.71341   1.571856   22.0    14.3
LVI=Present     102   102     102     64 17.48672   1.904576   10.5     9.9
            0.95UCL
LVI=Absent     31.0
LVI=Present    13.8

km_fit_median_df <- summary(km_fit)
results1html <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = ., 
    case = "snake") %>% tibble::rownames_to_column(.data = ., var = "LVI")

results1html[, 1] <- gsub(pattern = "thefactor=", replacement = "", x = results1html[, 
    1])

knitr::kable(results1html, row.names = FALSE, align = c("l", rep("r", 9)), format = "html", 
    digits = 1)

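| LVI         | records | n_max | n_start | events | rmean | se_rmean | median | x0_95lcl | x0_95ucl |
|:------------|--------:|------:|--------:|-------:|------:|---------:|-------:|---------:|---------:|
| LVI=Absent  |     144 |   144 |     144 |    100 |  24.7 |      1.6 |   22.0 |     14.3 |     31.0 |
| LVI=Present |     102 |   102 |     102 |     64 |  17.5 |      1.9 |   10.5 |      9.9 |     13.8 |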

\pagebreak

Pairwise Comparisons

\pagebreak

dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "TStage"

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = dependentKM,
                      explanatory = explanatoryKM,
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test TStage-1.png)

km_fit

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

   4 observations deleted due to missingness 
              n events median 0.95LCL 0.95UCL
LVI=Absent  144    100   22.0    14.3    31.0
LVI=Present 102     64   10.5     9.9    13.8

print(km_fit, 
      scale=1,
      digits = max(options()$digits - 4,3),
      print.rmean=getOption("survfit.print.rmean"),
      rmean = getOption('survfit.rmean'),
      print.median=getOption("survfit.print.median"),
      median = getOption('survfit.median')

      )

Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)

   4 observations deleted due to missingness 
              n events median 0.95LCL 0.95UCL
LVI=Absent  144    100   22.0    14.3    31.0
LVI=Present 102     64   10.5     9.9    13.8

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
# hr_plot() fits a Cox model, so the dependent must be a survival object
dependent = "Surv(time, status)"
colon_s %>%
  hr_plot(dependent, explanatory)

library(survival)
library(survminer)
library(finalfit)
mb_followup %>%
  finalfit::surv_plot('Surv(OverallTime, Outcome)', 'Operation',
                      xlab = 'Time (months)', pval = TRUE, legend = 'none',
                      # pval.coord
                      break.time.by = 12, xlim = c(0, 60), ylim = c(0.8, 1)
                      # legend.labs = c('a','b')
                      )

Univariate Cox-Regression

explanatoryUni <- "Operation"
dependentUni <- "Surv(OverallTime, Outcome)"
tUni <- mb_followup %>% finalfit(dependentUni, explanatoryUni)

knitr::kable(tUni[, 1:4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))

Univariate Cox-Regression Summary

tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% janitor::clean_names(dat = ., 
    case = "snake")


n_level <- dim(tUni_df)[1]

tUni_df_descr <- function(n) {
    paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n + 
        1], ", there is ", tUni_df$hr_univariable[n + 1], " times risk than ", "when ", 
        tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")

}



results5 <- purrr::map(.x = c(2:n_level - 1), .f = tUni_df_descr)

print(unlist(results5))

\pagebreak

Median Survival

km_fit <- survfit(Surv(OverallTime, Outcome) ~ Operation, data = mb_followup)

# km_fit

# summary(km_fit)

km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = ., 
    case = "snake") %>% tibble::rownames_to_column(.data = ., var = "Operation")

km_fit_median_df

# km_fit_median_df %>% knitr::kable(format = 'latex') %>%
# kableExtra::kable_styling(latex_options='scale_down')

# Row names look like "Operation=<level>"; turn the "=" into " is " for readable sentences
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {Operation}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>% 
    dplyr::mutate(description = gsub(pattern = "Operation=", replacement = "Operation is ", 
        x = description)) %>% dplyr::select(description) %>% dplyr::pull()

# km_fit_median_definition

1-3-5-yr survival

summary(km_fit, times = c(12, 36, 60))

km_fit_summary <- summary(km_fit, times = c(12, 36, 60))

km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", 
    "surv", "std.err", "lower", "upper")])

km_fit_df %>% knitr::kable(format = "latex") %>% kableExtra::kable_styling(latex_options = "scale_down")





km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% 
    dplyr::select(description) %>% dplyr::pull()

km_fit_definition

\pagebreak

Pairwise Comparisons

survminer::pairwise_survdiff(formula = Surv(OverallTime, Outcome) ~ Operation, data = mb_followup, 
    p.adjust.method = "BH")
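Conceptually, a pairwise comparison runs a log-rank test for every pair of groups and then adjusts the p-values for multiple testing. The sketch below spells this out with `survival::survdiff()` and `stats::p.adjust()` on the `lung` stand-in data; it illustrates the idea and is not claimed to match the internals of `pairwise_survdiff()`.

```r
library(survival)

# Keep the three well-populated ECOG performance groups in survival::lung
d <- subset(lung, ph.ecog %in% 0:2)
group_pairs <- combn(sort(unique(d$ph.ecog)), 2, simplify = FALSE)

# Raw log-rank p-value for each pair of groups
raw_p <- vapply(group_pairs, function(pair) {
  pair_data <- d[d$ph.ecog %in% pair, ]
  lr <- survdiff(Surv(time, status) ~ ph.ecog, data = pair_data)
  pchisq(lr$chisq, df = 1, lower.tail = FALSE)
}, numeric(1))
names(raw_p) <- vapply(group_pairs, paste, character(1), collapse = " vs ")

# Benjamini-Hochberg adjustment across the three comparisons
adj_p <- p.adjust(raw_p, method = "BH")
round(cbind(raw = raw_p, BH = adj_p), 4)
```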

\pagebreak

library(gt)
library(gtsummary)

library(survival)
fit1 <- survfit(Surv(ttdeath, death) ~ trt, trial)
tbl_strata_ex1 <- tbl_survival(fit1, times = c(12, 24), label = "{time} Months")

fit2 <- survfit(Surv(ttdeath, death) ~ 1, trial)
tbl_nostrata_ex2 <- tbl_survival(fit2, probs = c(0.1, 0.2, 0.5), header_estimate = "**Months**")

Interactive Survival Analysis

Codes for generating Survival Analysis.^[See childRmd/_18survival.Rmd file for other codes]

Codes for generating Shiny Survival Analysis.^[See childRmd/_19shinySurvival.Rmd file for other codes]

\elandscape

Correlation

Codes for generating correlation analysis.^[See childRmd/_20correlation.Rmd file for other codes]

https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.test.html
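A minimal base R sketch of `cor.test()` on the built-in `mtcars` data (a stand-in dataset), pulling out the pieces usually reported: the coefficient, its 95% confidence interval and the p-value.

```r
# Pearson correlation between fuel consumption and weight
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

r  <- unname(ct$estimate)  # correlation coefficient
ci <- ct$conf.int          # 95% confidence interval
p  <- ct$p.value

sprintf("r = %.2f [%.2f, %.2f], p = %.3g", r, ci[1], ci[2], p)
```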

https://neuropsychology.github.io/psycho.R/2018/05/20/correlation.html

devtools::install_github("neuropsychology/psycho.R")  # Install the newest version

remove.packages("psycho")
renv::install("neuropsychology/psycho.R@0.4.0")
# devtools::install_github("neuropsychology/psycho.R@0.4.0")

library(psycho)
# library(tidyverse)

cor <- psycho::affective %>% 
  correlation()

summary(cor)


plot(cor)


print(cor)

summary(cor) %>% 
  knitr::kable(format = "latex") %>% 
  kableExtra::kable_styling(latex_options="scale_down")


library(ggplot2)

# tx_zamani_verici_yasi: donor age at transplantation; trombosit: platelet count
ggplot(mydata, aes(x = tx_zamani_verici_yasi, y = trombosit)) +
  geom_point() + 
  geom_smooth(method = lm, size = 1)

Models

Codes used in models.^[See childRmd/_21models.Rmd file for other codes] Use these descriptions to add autoreporting of new models.

Generate automatic reporting of a model via easystats/report 📦

library(report)
model <- lm(Sepal.Length ~ Species, data = iris)
report::report(model)

We fitted a linear model (estimated using OLS) to predict Sepal.Length with Species (formula = Sepal.Length ~ Species). Standardized parameters were obtained by fitting the model on a standardized version of the dataset. Effect sizes were labelled following Funder's (2019) recommendations.

The model explains a significant and substantial proportion of variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to Sepal.Length = 0 and Species = setosa, is at 5.01 (SE = 0.07, 95% CI [4.86, 5.15], p < .001). Within this model:

  - The effect of Speciesversicolor is positive and can be considered as very large and significant (beta = 1.12, SE = 0.12, 95% CI [0.88, 1.37], std. beta = 1.12, p < .001).
  - The effect of Speciesvirginica is positive and can be considered as very large and significant (beta = 1.91, SE = 0.12, 95% CI [1.66, 2.16], std. beta = 1.91, p < .001).
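The headline numbers in the generated text can be checked against base R. A short sketch that refits the same model and extracts R2, adjusted R2 and the intercept:

```r
model <- lm(Sepal.Length ~ Species, data = iris)
fit_summary <- summary(model)

# Quantities quoted in the automatic report
r2        <- fit_summary$r.squared
adj_r2    <- fit_summary$adj.r.squared
intercept <- unname(coef(model)["(Intercept)"])

round(c(R2 = r2, adj_R2 = adj_r2, intercept = intercept), 2)
```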

Table report for a linear model

model <- lm(Sepal.Length ~ Petal.Length + Species, data=iris)
r <- report(model)
to_text(r)
to_table(r)

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html

model <- glm(vs ~ mpg + cyl, data=mtcars, family="binomial")
r <- report(model)

to_fulltext(r)
to_fulltable(r)

Where a multivariable model contains a subset of the variables specified in the full univariable set, this can be specified.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, explanatory.multi)

Random effects.

e.g. lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, explanatory.multi, random.effect)

metrics=TRUE provides common model metrics.

colon_s %>%
  finalfit(dependent, explanatory, explanatory.multi, metrics=TRUE)

Cox proportional hazards

e.g. survival::coxph(dependent ~ explanatory)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"

colon_s %>%
    finalfit(dependent, explanatory)
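The string-based `dependent` and `explanatory` arguments correspond to an ordinary model formula. The sketch below shows one way to build that formula by hand and pass it to `survival::coxph()`, using the `lung` stand-in data; it is an illustration of the idea, not necessarily how the wrapper does it internally.

```r
library(survival)

dependent   <- "Surv(time, status)"
explanatory <- c("age", "sex", "ph.ecog")

# Paste the pieces into "Surv(time, status) ~ age + sex + ph.ecog"
cox_formula <- as.formula(paste(dependent, "~", paste(explanatory, collapse = " + ")))

cox_fit <- coxph(cox_formula, data = lung)

# Hazard ratios for each explanatory variable
round(exp(coef(cox_fit)), 2)
```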

Rather than going all-in-one, any number of subset models can be manually added to a summary_factorlist() table using finalfit_merge(). This is particularly useful when models take a long time to run or are complicated.

Note the requirement for fit_id=TRUE. fit2df() is a subfunction that extracts the most common models to a data frame.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>%
  summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

# The multivariable and multilevel models use the reduced variable set
colon_s %>%
  glmmulti(dependent, explanatory.multi) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

colon_s %>%
  glmmixed(dependent, explanatory.multi, random.effect) %>%
  fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>%
  finalfit_merge(example.univariable) %>%
  finalfit_merge(example.multivariable) %>%
  finalfit_merge(example.multilevel) %>%
  dplyr::select(-c(fit_id, index)) -> example.final
example.final

Cox Proportional Hazards example with separate tables merged together.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = "Surv(time, status)"

# Separate tables
colon_s %>%
    summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example2.summary

colon_s %>%
    coxphuni(dependent, explanatory) %>%
    fit2df(estimate_suffix=" (univariable)") -> example2.univariable

colon_s %>%
  coxphmulti(dependent, explanatory.multi) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example2.multivariable

# Pipe together
example2.summary %>%
    finalfit_merge(example2.univariable) %>%
    finalfit_merge(example2.multivariable) %>%
    dplyr::select(-c(fit_id, index)) -> example2.final
example2.final







# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  or_plot(dependent, explanatory)
# Previously fitted models (`glmmulti()` or `glmmixed()`) can be provided directly to `glmfit`

# HR plot (not fully tested)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  hr_plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`

# Full report for a Bayesian logistic mixed model with effect sizes
library(rstanarm)

stan_glmer(vs ~ mpg + (1|cyl), data=mtcars, family="binomial") %>% 
  report(standardize="smart", effsize="cohen1988") %>% 
  to_fulltext()

https://github.com/lme4/lme4/

Test if your model is a good model

https://easystats.github.io/performance/

\pagebreak


No association was found between Some Text and survival (p = 0.22).

\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

No association was found between Some Text and survival (p = 0.22).

} }

my_text <- kableExtra::text_spec("Some Text", color = "red", background = "yellow")
# `r my_text`

\pagebreak


\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

} }

\pagebreak


Text Here

\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{

Text Here

} }

\pagebreak

\pagecolor{yellow}\afterpage{\nopagecolor}

\pagebreak

R Markdown uses the [Bootstrap framework](https://getbootstrap.com/docs/4.0/layout/grid/) under the hood, so you can take advantage of its grid system. Each row is divided into 12 subunits of equal width, and a column can span any number of these subunits.

Here, I use 3 columns of 4 subunits each (3 x 4 = 12). The last column is used for a plot. You can read more about the grid system [here](https://getbootstrap.com/docs/4.0/layout/grid/). I got this result using the following code in my R Markdown document.

![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/unnamed-chunk-78-1.png)

Tabs for sub-chapters {#buttons .tabset .tabset-fade .tabset-pills}

content of sub-chapter #1

content of sub-chapter #2

content of sub-chapter #3

Block rmdnote

Block rmdtip

Block warning

\pagebreak

Discussion

Interpret the results in context of the working hypothesis elaborated in the introduction and other relevant studies; include a discussion of limitations of the study.
Discuss potential clinical applications and implications for future research.

\pagebreak

Footer

Codes for explaining the software and the packages that are used in the analysis^[See childRmd/_23footer.Rmd file for other codes]

projectName <- list.files(path = here::here(), pattern = "Rproj")
# Escape the dot and anchor at the end so only the file extension is removed
projectName <- gsub(pattern = "\\.Rproj$", replacement = "", x = projectName)

analysisDate <- as.character(Sys.Date())

imageName <- paste0(projectName, analysisDate, ".RData")

save.image(file = here::here("data", imageName))

rdsName <- paste0(projectName, analysisDate, ".rds")

readr::write_rds(x = mydata, path = here::here("data", rdsName))

saveRDS(object = mydata, file = here::here("data", rdsName))

excelName <- paste0(projectName, analysisDate, ".xlsx")

rio::export(x = mydata, file = here::here("data", excelName), format = "xlsx")

# writexl::write_xlsx(mydata, here::here('data', excelName))

print(glue::glue("saved data after analysis to ", rownames(file.info(here::here("data", 
    excelName))), " : ", as.character(file.info(here::here("data", excelName))$ctime)))

saved data after analysis to /Users/serdarbalciold/histopathRprojects/histopathology-template/data/histopathology-template2020-02-26.xlsx : 2020-02-26 15:31:04
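The naming and saving steps can be tried in isolation before pointing them at the project's `data` folder. The sketch below writes to a temporary directory and reads the file back to confirm the round trip; the project file name and the `toy` data frame are assumptions for illustration.

```r
# Hypothetical project file name; in the report this comes from list.files()
project_file <- "histopathology-template.Rproj"
project_name <- sub("\\.Rproj$", "", project_file)

# Date-stamped file name, written to a temporary directory for the demo
stamp    <- format(Sys.Date(), "%Y-%m-%d")
rds_path <- file.path(tempdir(), paste0(project_name, "_", stamp, ".rds"))

toy <- data.frame(id = 1:3, LVI = c("Absent", "Present", "Absent"))
saveRDS(toy, file = rds_path)

# Read the file back and confirm nothing changed on the way
restored <- readRDS(rds_path)
stopifnot(identical(toy, restored))
```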

mydata %>% downloadthis::download_this(output_name = excelName, output_extension = ".csv", 
    button_label = "Download data as csv", button_type = "default")

Download data as csv

mydata %>% downloadthis::download_this(output_name = excelName, output_extension = ".xlsx", 
    button_label = "Download data as xlsx", button_type = "primary")

Download data as xlsx

\pagebreak

# use summarytools to generate final data summary
# summarytools::view(summarytools::dfSummary(x = mydata
#                                            , style = "markdown"))

\pagebreak

Why and how to cite software and packages?^[Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86 https://www.force11.org/software-citation-principles]

citation()


To cite R in publications use:

  R Core Team (2019). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  URL https://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2019},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
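Citations can also be retrieved programmatically, for a single package or as BibTeX for a bibliography file. A short base R sketch (the survival package is used only as an example of an installed package):

```r
# Citation for R itself, converted to a BibTeX entry
r_cite <- citation()
bib <- toBibtex(r_cite)
print(bib)

# Citation for one specific installed package
surv_cite <- citation("survival")
print(surv_cite, style = "text")
```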

The jamovi project (2019). jamovi. (Version 0.9) [Computer Software]. Retrieved from https://www.jamovi.org.

R Core Team (2018). R: A Language and environment for statistical computing. [Computer software]. Retrieved from https://cran.r-project.org/.

Fox, J., & Weisberg, S. (2018). car: Companion to Applied Regression. [R package]. Retrieved from https://cran.r-project.org/package=car.

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

Data processing was carried out with R (R Core Team, 2019) and the easystats ecosystem (Lüdecke, Waggoner, & Makowski, 2019; Makowski, Ben-Shachar, & Lüdecke, 2019).

report::cite_packages(session = sessionInfo())

- Alastair Rushworth (2019). inspectdf: Inspection, Comparison and Visualisation of Data Frames. R package version 0.0.7. https://CRAN.R-project.org/package=inspectdf
- Alboukadel Kassambara (2020). ggpubr: 'ggplot2' Based Publication Ready Plots. R package version 0.2.5. https://CRAN.R-project.org/package=ggpubr
- Alboukadel Kassambara, Marcin Kosinski and Przemyslaw Biecek (2019). survminer: Drawing Survival Curves using 'ggplot2'. R package version 0.4.6. https://CRAN.R-project.org/package=survminer
- Benjamin Elbers (2020). tidylog: Logging for 'dplyr' and 'tidyr' Functions. R package version 1.0.0. https://CRAN.R-project.org/package=tidylog
- Boxuan Cui (2020). DataExplorer: Automate Data Exploration and Treatment. R package version 0.8.1. https://CRAN.R-project.org/package=DataExplorer
- Chung-hong Chan, Geoffrey CH Chan, Thomas J. Leeper, and Jason Becker (2018). rio: A Swiss-army knife for data file I/O. R package version 0.5.16.
- David Robinson and Alex Hayes (2020). broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.4. https://CRAN.R-project.org/package=broom
- Dayanand Ubrangala, Kiran R, Ravi Prasad Kondapalli and Sayan Putatunda (2020). SmartEDA: Summarize and Explore the Data. R package version 0.3.3. https://CRAN.R-project.org/package=SmartEDA
- Dirk Eddelbuettel and Romain Francois (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. URL http://www.jstatsoft.org/v40/i08/.
- Dirk Eddelbuettel with contributions by Antoine Lucas, Jarek Tuszynski, Henrik Bengtsson, Simon Urbanek, Mario Frasca, Bryan Lewis, Murray Stokely, Hannes Muehleisen, Duncan Murdoch, Jim Hester, Wush Wu, Qiang Kou, Thierry Onkelinx, Michel Lang, Viliam Simko, Kurt Hornik, Radford Neal, Kendon Bell, Matthew de Queljoe, Ion Suruceanu and Bill Denney. (2020). digest: Create Compact Hash Digests of R Objects. R package version 0.6.24. https://CRAN.R-project.org/package=digest
- Ethan Heinzen, Jason Sinnwell, Elizabeth Atkinson, Tina Gunderson and Gregory Dougherty (2020). arsenal: An Arsenal of 'R' Functions for Large-Scale Statistical Summaries. R package version 3.4.0. https://CRAN.R-project.org/package=arsenal
- Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. R package version 0.9.7. https://CRAN.R-project.org/package=finalfit
- Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/.
- H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Hadley Wickham (2019). feather: R Bindings to the Feather 'API'. R package version 0.3.5. https://CRAN.R-project.org/package=feather
- Hadley Wickham (2019). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.4.0. https://CRAN.R-project.org/package=forcats
- Hadley Wickham (2019). httr: Tools for Working with URLs and HTTP. R package version 1.4.1. https://CRAN.R-project.org/package=httr
- Hadley Wickham (2019). modelr: Modelling Functions that Work with the Pipe. R package version 0.1.5. https://CRAN.R-project.org/package=modelr
- Hadley Wickham (2019). rvest: Easily Harvest (Scrape) Web Pages. R package version 0.3.5. https://CRAN.R-project.org/package=rvest
- Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
- Hadley Wickham and Evan Miller (2019). haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files. R package version 2.2.0. https://CRAN.R-project.org/package=haven
- Hadley Wickham and Jennifer Bryan (2019). readxl: Read Excel Files. R package version 1.3.1. https://CRAN.R-project.org/package=readxl
- Hadley Wickham and Lionel Henry (2020). tidyr: Tidy Messy Data. R package version 1.0.2. https://CRAN.R-project.org/package=tidyr
- Hadley Wickham and Yihui Xie (2019). evaluate: Parsing and Evaluation Tools that Provide More Details than the Default. R package version 0.14. https://CRAN.R-project.org/package=evaluate
- Hadley Wickham, Jim Hester and Jeroen Ooms (2019). xml2: Parse XML. R package version 1.2.2. https://CRAN.R-project.org/package=xml2
- Hadley Wickham, Jim Hester and Romain Francois (2018). readr: Read Rectangular Text Data. R package version 1.3.1. https://CRAN.R-project.org/package=readr
- Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2020). dplyr: A Grammar of Data Manipulation. R package version 0.8.4. https://CRAN.R-project.org/package=dplyr
- Jeremy Stephens, Kirill Simonov, Yihui Xie, Zhuoer Dong, Hadley Wickham, Jeffrey Horner, reikoch, Will Beasley, Brendan O'Connor and Gregory R. Warnes (2020). yaml: Methods to Convert R Data to YAML and Back. R package version 2.2.1. https://CRAN.R-project.org/package=yaml
- Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO] URL https://arxiv.org/abs/1403.2805.
- Jim Hester (2019). glue: Interpreted String Literals. R package version 1.3.1. https://CRAN.R-project.org/package=glue
- Jim Hester and Gábor Csárdi (2019). pak: Another Approach to Package Installation. R package version 0.1.2. https://CRAN.R-project.org/package=pak
- Jim Hester, Gábor Csárdi, Hadley Wickham, Winston Chang, Martin Morgan and Dan Tenenbaum (2020). remotes: R Package Installation from Remote Repositories, Including 'GitHub'. R package version 2.1.1. https://CRAN.R-project.org/package=remotes
- JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2020). rmarkdown: Dynamic Documents for R. R package version 2.1. URL https://rmarkdown.rstudio.com.
- JJ Allaire, Jeffrey Horner, Yihui Xie, Vicent Marti and Natacha Porte (2019). markdown: Render Markdown with the C Library 'Sundown'. R package version 1.1. https://CRAN.R-project.org/package=markdown
- Kazuki Yoshida (2019). tableone: Create 'Table 1' to Describe Baseline Characteristics. R package version 0.10.0. https://CRAN.R-project.org/package=tableone
- Kevin Ushey (2020). renv: Project Environments. R package version 0.9.3. https://CRAN.R-project.org/package=renv
- Kirill Müller (2017). here: A Simpler Way to Find Your Files. R package version 0.1. https://CRAN.R-project.org/package=here
- Kirill Müller (2020). hms: Pretty Time of Day. R package version 0.5.3. https://CRAN.R-project.org/package=hms
- Kirill Müller and Hadley Wickham (2019). tibble: Simple Data Frames. R package version 2.1.3. https://CRAN.R-project.org/package=tibble
- Koji Makiyama (2016). magicfor: Magic Functions to Obtain Results from for Loops. R package version 0.1.0. https://CRAN.R-project.org/package=magicfor
- Lionel Henry and Hadley Wickham (2019). purrr: Functional Programming Tools. R package version 0.3.3. https://CRAN.R-project.org/package=purrr
- Lionel Henry and Hadley Wickham (2020). rlang: Functions for Base Types and Core R and 'Tidyverse' Features. R package version 0.4.4. https://CRAN.R-project.org/package=rlang
- Makowski, D. & Lüdecke, D. (2019). The report package for R: Ensuring the use of best practices for results reporting. CRAN. Available from https://github.com/easystats/report. doi: .
- Pablo Seibelt (2017). xray: X Ray Vision on your Datasets. R package version 0.2. https://CRAN.R-project.org/package=xray
- Paul Hendricks (2015). describer: Describe Data in R Using Common Descriptive Statistics. R package version 0.2.0. https://CRAN.R-project.org/package=describer
- Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." Journal of Statistical Software, 90(6), 1-38. doi: 10.18637/jss.v090.i06 (URL: https://doi.org/10.18637/jss.v090.i06).
- Rinker, T. W. (2018). wakefield: Generate Random Data. version 0.3.3. Buffalo, New York. https://github.com/trinker/wakefield
- Rinker, T. W. & Kurkiewicz, D. (2017). pacman: Package Management for R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
- Roland Krasser (2020). explore: Simplifies Exploratory Data Analysis. R package version 0.5.4. https://CRAN.R-project.org/package=explore
- RStudio and Inc. (2019). htmltools: Tools for HTML. R package version 0.4.0. https://CRAN.R-project.org/package=htmltools
- Sam Firke (2020). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 1.2.1. https://CRAN.R-project.org/package=janitor
- Simon Garnier (2018). viridis: Default Color Maps from 'matplotlib'. R package version 0.5.1. https://CRAN.R-project.org/package=viridis
- Simon Garnier (2018). viridisLite: Default Color Maps from 'matplotlib' (Lite Version). R package version 0.3.0. https://CRAN.R-project.org/package=viridisLite
- Simon Urbanek (2015). base64enc: Tools for base64 encoding. R package version 0.1-3. https://CRAN.R-project.org/package=base64enc
- Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr
- Therneau T (2015). A Package for Survival Analysis in S. version 2.38, .
- Tierney N (2017). "visdat: Visualising Whole Data Frames." JOSS, 2(16), 355. doi: 10.21105/joss.00355 (URL: https://doi.org/10.21105/joss.00355), .
- Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2019). shiny: Web Application Framework for R. R package version 1.4.0. https://CRAN.R-project.org/package=shiny
- Yihui Xie (2019). formatR: Format R Code Automatically. R package version 1.7. https://CRAN.R-project.org/package=formatR
- Yihui Xie (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.28.
- Yihui Xie (2020). 
mime: Map Filenames to MIME Types. R package version 0.9. https://CRAN.R-project.org/package=mime Yixuan Qiu and Yihui Xie (2019). highr: Syntax Highlighting for R Source Code. R package version 0.8. https://CRAN.R-project.org/package=highr

report::show_packages(session = sessionInfo()) %>% kableExtra::kable()

# citation('tidyverse')
citation("readxl")


To cite package 'readxl' in publications use:

  Hadley Wickham and Jennifer Bryan (2019). readxl: Read Excel Files. R
  package version 1.3.1. https://CRAN.R-project.org/package=readxl

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readxl: Read Excel Files},
    author = {Hadley Wickham and Jennifer Bryan},
    year = {2019},
    note = {R package version 1.3.1},
    url = {https://CRAN.R-project.org/package=readxl},
  }

citation("janitor")


To cite package 'janitor' in publications use:

  Sam Firke (2020). janitor: Simple Tools for Examining and Cleaning
  Dirty Data. R package version 1.2.1.
  https://CRAN.R-project.org/package=janitor

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {janitor: Simple Tools for Examining and Cleaning Dirty Data},
    author = {Sam Firke},
    year = {2020},
    note = {R package version 1.2.1},
    url = {https://CRAN.R-project.org/package=janitor},
  }

# citation('report')
citation("finalfit")


To cite package 'finalfit' in publications use:

  Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly
  Create Elegant Regression Results Tables and Plots when Modelling. R
  package version 0.9.7. https://CRAN.R-project.org/package=finalfit

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {finalfit: Quickly Create Elegant Regression Results Tables and Plots when
Modelling},
    author = {Ewen Harrison and Tom Drake and Riinu Ots},
    year = {2019},
    note = {R package version 0.9.7},
    url = {https://CRAN.R-project.org/package=finalfit},
  }

# citation('ggstatsplot')

if (!dir.exists(here::here("bib"))) {
    dir.create(here::here("bib"))
}

knitr::write_bib(x = c(.packages(), "knitr", "shiny"), file = here::here("bib", "packages.bib"))

\pagebreak

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] survminer_0.4.6    ggpubr_0.2.5       viridis_0.5.1      viridisLite_0.3.0 
 [5] shiny_1.4.0        survival_3.1-8     magrittr_1.5       report_0.1.0      
 [9] wakefield_0.3.3    SmartEDA_0.3.3     magicfor_0.1.0     tableone_0.10.0   
[13] arsenal_3.4.0      DataExplorer_0.8.1 xray_0.2           visdat_0.5.3      
[17] inspectdf_0.0.7    describer_0.2.0    dataMaid_1.4.0     finalfit_0.9.7    
[21] explore_0.5.4      rio_0.5.16         janitor_1.2.1      formatR_1.7       
[25] renv_0.9.3         rlang_0.4.4        glue_1.3.1         tidylog_1.0.0     
[29] broom_0.5.4        modelr_0.1.5       rvest_0.3.5        xml2_1.2.2        
[33] readxl_1.3.1       httr_1.4.1         haven_2.2.0        feather_0.3.5     
[37] lubridate_1.7.4    hms_0.5.3          forcats_0.4.0      stringr_1.4.0     
[41] tibble_2.1.3       purrr_0.3.3        readr_1.3.1        tidyr_1.0.2       
[45] dplyr_0.8.4        ggplot2_3.2.1      rmarkdown_2.1      mime_0.9          
[49] base64enc_0.1-3    jsonlite_1.6.1     knitr_1.28         htmltools_0.4.0   
[53] Rcpp_1.0.3         yaml_2.2.1         markdown_1.1       highr_0.8         
[57] digest_0.6.24      evaluate_0.14      here_0.1           pak_0.1.2         
[61] pacman_0.5.1       remotes_2.1.1     

loaded via a namespace (and not attached):
  [1] utf8_1.1.4          tidyselect_1.0.0    lme4_1.1-21        
  [4] htmlwidgets_1.5.1   grid_3.6.0          lpSolve_5.6.15     
  [7] munsell_0.5.0       effectsize_0.1.2    codetools_0.2-16   
 [10] DT_0.12.1           withr_2.1.2         ISLR_1.2           
 [13] colorspace_1.4-1    rstudioapi_0.11     robustbase_0.93-5  
 [16] ggsignif_0.6.0      labeling_0.3        KMsurv_0.1-5       
 [19] farver_2.0.3        rprojroot_1.3-2     vctrs_0.2.3        
 [22] generics_0.0.2      xfun_0.12           downloadthis_0.1.0 
 [25] R6_2.4.1            reshape_0.8.8       assertthat_0.2.1   
 [28] promises_1.1.0      networkD3_0.4       scales_1.1.0       
 [31] nnet_7.3-12         gtable_0.3.0        clisymbols_1.2.0   
 [34] whoami_1.3.0        splines_3.6.0       lazyeval_0.2.2     
 [37] acepack_1.4.1       bsplus_0.1.1        checkmate_2.0.0    
 [40] backports_1.1.5     httpuv_1.5.2        Hmisc_4.3-1        
 [43] tools_3.6.0         ellipsis_0.3.0      RColorBrewer_1.1-2 
 [46] plyr_1.8.5          jmvcore_1.2.5       progress_1.2.2     
 [49] prettyunits_1.1.1   rpart_4.1-15        sampling_2.8       
 [52] zoo_1.8-7           reactR_0.4.2        cluster_2.1.0      
 [55] fs_1.3.1            survey_3.37         data.table_1.12.8  
 [58] openxlsx_4.1.4      mitml_0.3-7         reactable_0.1.0    
 [61] xtable_1.8-4        jpeg_0.1-8.1        gridExtra_2.3      
 [64] compiler_3.6.0      mice_3.7.0          writexl_1.2        
 [67] crayon_1.3.4        minqa_1.2.4         later_1.0.0        
 [70] Formula_1.2-3       DBI_1.1.0           jmv_1.2.5          
 [73] MASS_7.3-51.5       boot_1.3-24         Matrix_1.2-18      
 [76] cli_2.0.1           mitools_2.4         parallel_3.6.0     
 [79] insight_0.8.1       pan_1.6             igraph_1.2.4.2     
 [82] pkgconfig_2.0.3     km.ci_0.5-2         foreign_0.8-75     
 [85] foreach_1.4.8       snakecase_0.11.0    parameters_0.5.0.1 
 [88] cellranger_1.1.0    survMisc_0.5.5      htmlTable_1.13.3   
 [91] curl_4.3            jomo_2.6-10         rjson_0.2.20       
 [94] nloptr_1.2.1        lifecycle_0.1.0     nlme_3.1-144       
 [97] fansi_0.4.1         labelled_2.2.2      pillar_1.4.3       
[100] ggsci_2.9           lattice_0.20-38     GGally_1.4.0       
[103] fastmap_1.0.1       DEoptimR_1.0-8      bayestestR_0.5.2   
[106] zip_2.0.4           png_0.1-7           iterators_1.0.12   
[109] pander_0.6.3        performance_0.4.4   class_7.3-15       
[112] stringi_1.4.6       ggfittext_0.8.1     latticeExtra_0.6-29
[115] e1071_1.7-3

\pagebreak

pacman::p_loaded(all = TRUE)

\pagebreak

Last update on 2020-05-13 15:20:11

Serdar Balci, MD, Pathologist serdarbalci@serdarbalci.com https://rpubs.com/sbalci/CV https://github.com/sbalci https://sbalci.github.io/ Patoloji Notları ParaPathology https://twitter.com/serdarbalci

\pagebreak

Use the following chunk options to print all of the code used in the report at the end of the document.

{r, echo=TRUE, eval=FALSE, ref.label=knitr::all_labels()}

# installing necessary packages
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
}
if (!require("remotes")) install.packages("remotes")
if (!require("pacman")) install.packages("pacman")
if (!require("pak")) install.packages("pak")
if (!require("here")) install.packages("here")
source_rmd <- function(rmd_file){
  knitr::knit(rmd_file, output = tempfile(), envir = globalenv())
}

# pattern is a regular expression: match only file names ending in ".Rmd"
list_of_Rmd <- list.files(path = here::here("childRmd"), pattern = "\\.Rmd$")

list_of_Rmd <- list_of_Rmd[!list_of_Rmd %in% c("_19shinySurvival.Rmd")]

purrr::map(.x = here::here("childRmd", list_of_Rmd), .f = source_rmd)

source(file = here::here("R", "force_git.R"))
knitr::opts_chunk$set(
    eval = TRUE,
    echo = TRUE,
    fig.path = here::here("figs/"),
    message = FALSE,
    warning = FALSE,
    error = TRUE,
    cache = TRUE,
    comment = NA,
    tidy = TRUE,
    fig.width = 6,
    fig.height = 4
)
library(knitr)
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
  # this hook is used only when the linewidth option is not NULL
  if (!is.null(n <- options$linewidth)) {
    x = knitr:::split_lines(x)
    # any lines wider than n should be wrapped
    if (any(nchar(x) > n)) x = strwrap(x, width = n)
    x = paste(x, collapse = '\n')
  }
  hook_output(x, options)
})
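The hook above only acts on chunks that set a `linewidth` option. As a minimal sketch of the wrapping step in isolation, the base R function below mirrors what the hook does to a chunk's printed output. `wrap_output` is a hypothetical helper name, not part of knitr, and it splits on newlines with `strsplit()` instead of the internal `knitr:::split_lines()`.

```r
# Hypothetical helper (not part of knitr): wrap printed output to a maximum
# width, mirroring the body of the output hook above.
wrap_output <- function(x, linewidth) {
  lines <- unlist(strsplit(x, "\n", fixed = TRUE))
  # strwrap() breaks only at whitespace, so a single long word is left intact
  if (any(nchar(lines) > linewidth)) {
    lines <- strwrap(lines, width = linewidth)
  }
  paste(lines, collapse = "\n")
}

long_line <- paste(rep("word", 30), collapse = " ")
wrapped <- wrap_output(long_line, linewidth = 40)
cat(wrapped, "\n")
```

In an R Markdown document the hook itself would be triggered by a chunk option such as `linewidth=60`.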
# linewidth css
  pre:not([class]) {
    color: #333333;
    background-color: #cccccc;
  }
# linewidth css
pre.jamovitable{
  color:black;
  background-color: white;
  margin-bottom: 35px;  
}
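# Render a jamovi (jmv) results object as an HTML table
jtable <- function(jobject, digits = 3) {
  # column titles, indexed by column name
  snames <- sapply(jobject$columns, function(a) a$title)
  asDF <- jobject$asDF
  tnames <- unlist(lapply(names(asDF), function(n) snames[[n]]))
  names(asDF) <- tnames
  kableExtra::kable(asDF, "html",
                    table.attr = 'class="jmv-results-table-table"',
                    row.names = FALSE,
                    digits = digits)  # use the argument instead of a hard-coded 3
}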
# https://cran.r-project.org/web/packages/exploreR/vignettes/exploreR.html
# exploreR::reset()
Block rmdnote

Block rmdtip

Block warning

source(file = here::here("R", "loadLibrary.R"))
source(file = here::here("R", "gc_fake_data.R"))
wakefield::table_heat(x = fakedata, palette = "Set1", flip = TRUE, print = TRUE)
library(readxl)
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
# View(mydata) # Use to view data after importing
# https://cran.r-project.org/web/packages/rio/vignettes/rio.html
# rio::install_formats()

x <- rio::import("mtcars.csv")
y <- rio::import("mtcars.rds")
z <- rio::import("mtcars.dta")

rio::import("mtcars_noext", format = "csv")

rio::export(mtcars, "mtcars.csv")
rio::export(mtcars, "mtcars.rds")
rio::export(mtcars, "mtcars.dta")

rio::export(list(mtcars = mtcars, iris = iris), "multi.xlsx")

# Dataframe report
mydata %>% 
  dplyr::select(-contains("Date")) %>%
  report::report(.)
mydata %>% explore::describe_tbl()
dput(names(mydata))
keycolumns <-  
    mydata %>%  
    sapply(., FUN = dataMaid::isKey) %>%  
    tibble::as_tibble() %>%  
    dplyr::select(  
        which(.[1, ] == TRUE)  
    ) %>%   
    names()  
keycolumns  
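To make the idea of a key column concrete, the sketch below flags columns whose values are all distinct and non-missing, using base R only. It is a simplified stand-in and not necessarily the exact rule implemented by `dataMaid::isKey()`; the helper name `is_unique_key` and the toy data frame are assumptions made for illustration.

```r
# Hypothetical helper: TRUE when every value is present and occurs exactly once
is_unique_key <- function(x) !anyNA(x) && anyDuplicated(x) == 0L

# Made-up data: only `id` identifies rows uniquely
toy <- data.frame(
  id  = c("a1", "a2", "a3", "a4"),
  sex = c("F", "M", "F", "M"),
  age = c(54, 61, 54, 47),
  stringsAsFactors = FALSE
)

key_cols <- names(toy)[vapply(toy, is_unique_key, logical(1))]
key_cols

# Drop the key columns before summarising, as done with `keycolumns` above
toy_without_keys <- toy[, setdiff(names(toy), key_cols), drop = FALSE]
```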
mydata %>% 
  dplyr::select(-keycolumns) %>% 
inspectdf::inspect_types()
mydata %>% 
    dplyr::select(-keycolumns,
           -contains("Date")) %>% 
  describer::describe() %>% 
  knitr::kable(format = "markdown")
mydata %>% 
    dplyr::select(-keycolumns) %>% 
  inspectdf::inspect_types() %>% 
  inspectdf::show_plot()
# https://github.com/ropensci/visdat
# http://visdat.njtierney.com/articles/using_visdat.html
# https://cran.r-project.org/web/packages/visdat/index.html
# http://visdat.njtierney.com/

# visdat::vis_guess(mydata)

visdat::vis_dat(mydata)
mydata %>% explore::explore_tbl()
mydata %>% 
    dplyr::select(-keycolumns) %>% 
    inspectdf::inspect_types() %>% 
    dplyr::filter(type == "character") %>% 
    dplyr::select(col_name) %>% 
    dplyr::pull() %>% 
    unlist() -> characterVariables

characterVariables
mydata %>%
    dplyr::select(-keycolumns,
                  -contains("Date")
                  ) %>%
  describer::describe() %>% 
    janitor::clean_names() %>% 
    dplyr::filter(column_type == "factor") %>% 
    dplyr::select(column_name) %>% 
    dplyr::pull() -> categoricalVariables

categoricalVariables
mydata %>%
    dplyr::select(-keycolumns,
                  -contains("Date")) %>%
  describer::describe() %>% 
    janitor::clean_names() %>% 
    dplyr::filter(column_type == "numeric" | column_type == "double") %>% 
    dplyr::select(column_name) %>% 
    dplyr::pull() -> continiousVariables

continiousVariables
mydata %>% 
    dplyr::select(-keycolumns) %>% 
inspectdf::inspect_types() %>% 
  dplyr::filter(type == "numeric") %>% 
  dplyr::select(col_name) %>% 
  dplyr::pull() %>% 
  unlist() -> numericVariables

numericVariables
mydata %>% 
    dplyr::select(-keycolumns) %>% 
inspectdf::inspect_types() %>% 
  dplyr::filter(type == "integer") %>% 
  dplyr::select(col_name) %>% 
  dplyr::pull() %>% 
  unlist() -> integerVariables

integerVariables
mydata %>% 
    dplyr::select(-keycolumns) %>% 
inspectdf::inspect_types() %>% 
  dplyr::filter(type == "list") %>% 
  dplyr::select(col_name) %>% 
  dplyr::pull() %>% 
  unlist() -> listVariables
listVariables
is_date <- function(x) inherits(x, c("POSIXct", "POSIXt"))

dateVariables <- 
names(which(sapply(mydata, FUN = is_date) == TRUE))
dateVariables
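`is_date` above recognises only date-times (`POSIXct`/`POSIXt`), which is the class `readxl` returns for Excel date cells. If a data set may also hold plain `Date` columns, a slightly broader check could look like the following sketch; `is_date_like` and the toy data frame are hypothetical names used only here.

```r
# Hypothetical variant of is_date that also accepts the Date class
is_date_like <- function(x) inherits(x, c("Date", "POSIXct", "POSIXt"))

# Made-up data with one Date column and one POSIXct column
toy <- data.frame(
  id = 1:2,
  surgery_date = as.Date(c("2019-03-01", "2019-04-15")),
  last_seen = as.POSIXct(c("2020-01-10 09:00:00", "2020-02-20 14:30:00"), tz = "UTC")
)

date_like_vars <- names(toy)[vapply(toy, is_date_like, logical(1))]
date_like_vars
```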
View(mydata)
reactable::reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE, searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE, highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE, showSortIcon = TRUE, showSortable = TRUE)
summarytools::view(summarytools::dfSummary(mydata %>% dplyr::select(-keycolumns)))
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}

summarytools::view(
  x = summarytools::dfSummary(
    mydata %>% 
      dplyr::select(-keycolumns)
    ),
  file = here::here("out", "mydata_summary.html")
)
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}

dataMaid::makeDataReport(data = mydata, 
                         file = here::here("out", "dataMaid_mydata.Rmd"),
                         replace = TRUE,
                         openResult = FALSE, 
                         render = FALSE,
                         quiet = TRUE
                         )
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}

mydata %>% 
  dplyr::select(
    -dateVariables
  ) %>% 
  explore::report(
    output_file = "mydata_report.html",
    output_dir = here::here("out") 
    )
dplyr::glimpse(mydata %>% dplyr::select(-keycolumns, -dateVariables))
mydata %>% explore::describe()
explore::explore(mydata)
mydata %>%
  explore::explore_all()
visdat::vis_expect(data = mydata,
                   expectation = ~.x == -1,
                   show_perc = TRUE)

visdat::vis_expect(mydata, ~.x >= 25)
visdat::vis_miss(airquality,
                 cluster = TRUE)
visdat::vis_miss(airquality,
         sort_miss = TRUE)
xray::anomalies(mydata)
xray::distributions(mydata)
DataExplorer::plot_str(mydata)
DataExplorer::plot_str(mydata, type = "r")
DataExplorer::introduce(mydata)
DataExplorer::plot_intro(mydata)
DataExplorer::plot_missing(mydata)
mydata2 <- DataExplorer::drop_columns(mydata, "TStage")
DataExplorer::plot_bar(mydata)
DataExplorer::plot_bar(mydata, with = "Death")
DataExplorer::plot_histogram(mydata)
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}

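# https://cran.r-project.org/web/packages/dataMaid/vignettes/extending_dataMaid.html
# Note: isID, countZeros, meanSummary, mosaicVisual, prettierHist and identifyColons
# are user-defined functions from the vignette above; define them before running this chunk.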
library("dataMaid")
dataMaid::makeDataReport(mydata,
  #add extra precheck function
  preChecks = c("isKey", "isSingular", "isSupported", "isID"),

  #Add the extra summaries - countZeros() for character, factor,
  #integer, labelled and numeric variables and meanSummary() for integer,
  #numeric and logical variables:
  summaries = setSummaries(
    character = defaultCharacterSummaries(add = "countZeros"),
    factor = defaultFactorSummaries(add = "countZeros"),
    labelled = defaultLabelledSummaries(add = "countZeros"),
    numeric = defaultNumericSummaries(add = c("countZeros", "meanSummary")),
    integer = defaultIntegerSummaries(add = c("countZeros", "meanSummary")),
    logical = defaultLogicalSummaries(add =  c("meanSummary"))
  ),

  #choose mosaicVisual() for categorical variables,
  #prettierHist() for all others:
  visuals = setVisuals(
    factor = "mosaicVisual",
    numeric = "prettierHist",
    integer = "prettierHist",
    Date = "prettierHist"
  ),

  #Add the new checkFunction, identifyColons, for character, factor and
  #labelled variables:
  checks = setChecks(
    character = defaultCharacterChecks(add = "identifyColons"),
    factor = defaultFactorChecks(add = "identifyColons"),
    labelled = defaultLabelledChecks(add = "identifyColons")
  ),

  #overwrite old versions of the report, render to html and don't
  #open the html file automatically:
  replace = TRUE,
  output = "html",
  open = FALSE,
  file = here::here("out/dataMaid_mydata.Rmd")
)
# https://cran.r-project.org/web/packages/summarytools/vignettes/Recommendations-rmarkdown.html
# https://github.com/dcomtois/summarytools
library(knitr)
opts_chunk$set(comment=NA,
               prompt=FALSE,
               cache=FALSE,
               echo=TRUE,
               results='asis' # add to individual summarytools chunks
               )
library(summarytools)
st_css()
st_options(bootstrap.css     = FALSE,       # Already part of the theme so no need for it
           plain.ascii       = FALSE,       # One of the essential settings
           style             = "rmarkdown", # Idem.
           dfSummary.silent  = TRUE,        # Suppresses messages about temporary files
           footnote          = NA,          # Keeping the results minimalistic
           subtitle.emphasis = FALSE)       # For the vignette theme, this gives
                                            # much better results. Your mileage may vary.
summarytools::freq(iris$Species, plain.ascii = FALSE, style = "rmarkdown")
summarytools::freq(iris$Species, report.nas = FALSE, headings = FALSE, cumul = TRUE, totals = TRUE)
summarytools::freq(tobacco$gender, style = 'rmarkdown')
summarytools::freq(tobacco[ ,c("gender", "age.gr", "smoker")])
print(freq(tobacco$gender), method = 'render')
view(dfSummary(iris))
dfSummary(tobacco, style = 'grid', graph.magnif = 0.75, tmp.img.dir = "/tmp")
dfSummary(tobacco, plain.ascii = FALSE, style = "grid",
          graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp")
print(dfSummary(tobacco, graph.magnif = 0.75), method = 'render')
# https://github.com/rolkra/explore
# https://cran.r-project.org/web/packages/explore/vignettes/explore.html
# https://cran.r-project.org/web/packages/explore/vignettes/explore_mtcars.html


# library(dplyr)
# library(explore)

explore::explore(mydata)

# iris %>% report(output_file = "report.html", output_dir = here::here())


# is_versicolor is required by the `target = is_versicolor` examples below
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
# iris %>%
#   report(output_file = "report.html",
#          output_dir = here::here(),
#          target = is_versicolor
# # , split = FALSE
# )
iris %>% explore::explore_tbl()

iris %>% explore::describe_tbl()

iris %>% explore::explore(Species)
iris %>% explore::explore(Sepal.Length)
iris %>% explore::explore(Sepal.Length, target = is_versicolor)
iris %>% explore::explore(Sepal.Length, target = is_versicolor, split = FALSE)
iris %>% explore::explore(Sepal.Length, target = Species)
iris %>% explore::explore(Sepal.Length, target = Petal.Length)


iris %>%
  explore::explore_all()

iris %>%
  dplyr::select(Sepal.Length, Sepal.Width) %>%
  explore::explore_all()

iris %>%
  dplyr::select(Sepal.Length, Sepal.Width, is_versicolor) %>%
  explore::explore_all(target = is_versicolor)

iris %>%
  dplyr::select(Sepal.Length, Sepal.Width, is_versicolor) %>%
  explore::explore_all(target = is_versicolor, split = FALSE)

iris %>%
  dplyr::select(Sepal.Length, Sepal.Width, Species) %>%
  explore::explore_all(target = Species)

iris %>%
  dplyr::select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  explore::explore_all(target = Petal.Length)
iris %>%
  explore::explore_all()

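# opts_current is an object, not a function; set the height for the following chunks
# here, or pass fig.height = explore::total_fig_height(iris, target = Species) as a chunk option
knitr::opts_chunk$set(fig.height = explore::total_fig_height(iris, target = Species))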

explore::total_fig_height(iris, target = Species)

iris %>% explore::explore_all(target = Species)
iris %>% explore::explore(Sepal.Length, min_val = 4.5, max_val = 7)
iris %>% explore::explore(Sepal.Length, auto_scale = FALSE)
mtcars %>% explore::describe()
# https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html

dlookr::describe(mydata
                 # ,
                 # cols = c(statistic)
                 )

# dlookr::describe(carseats, Sales, CompPrice, Income)
# dlookr::describe(carseats, Sales:Income)
# dlookr::describe(carseats, -(Sales:Income))
mydata %>%
  dlookr::describe() %>%
  dplyr::select(variable, skewness, mean, p25, p50, p75) %>%
  dplyr::filter(!is.na(skewness)) %>%
  dplyr::arrange(dplyr::desc(abs(skewness)))
# https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html

# the vignette works on a lowercase copy of the ISLR Carseats data
carseats <- ISLR::Carseats

carseats %>%
  dlookr::eda_report(target = Sales,
                     output_format = "pdf",
             output_file = "EDA.pdf"
             )
carseats %>%
  dlookr::eda_report(target = Sales,
             output_format = "html",
             output_file = "EDA.html"
             )
# install.packages("ISLR")
library("ISLR")
# install.packages("SmartEDA")
library("SmartEDA")
## Load sample dataset from ISLR package
Carseats <- ISLR::Carseats

## overview of the data;
SmartEDA::ExpData(data=Carseats,type=1)
## structure of the data
SmartEDA::ExpData(data=Carseats,type=2)


# iris %>% explore::data_dict_md(output_dir = here::here())
# description <- data.frame(
#                   variable = c("Species"), 
#                   description = c("Species of Iris flower"))

# explore::data_dict_md(data = mydata, 
#              title = "Data Set", 
#              # description =  description, 
#              output_file = "data_dict.md",
#              output_dir = here::here("out"))
mydata <- janitor::clean_names(mydata)
# cat(names(mydata), sep = ",\n")
# names(mydata) <- c(names(mydata)[1:21], paste0("Soru", 1:30))
iris %>% 
  explore::clean_var(data = ., 
                     var = Sepal.Length,  
            min_val = 4.5, 
            max_val = 7.0, 
            na = 5.8, 
            name = "sepal_length") %>% 
  explore::describe()
summarytools::view(summarytools::dfSummary(mydata))
dplyr::glimpse(mydata)
library(finalfit)
# https://www.datasurg.net/2019/10/15/jama-retraction-after-miscoding-new-finalfit-function-to-check-recoding/
# intentionally miscoded
colon_s %>%
  mutate(
    sex.factor2 = forcats::fct_recode(sex.factor,
      "F" = "Male",
      "M" = "Female")
  ) %>%
  count(sex.factor, sex.factor2)
# Install
# devtools::install_github('ewenharrison/finalfit')
library(finalfit)
library(dplyr)
# Recode example
colon_s_small = colon_s %>%
  select(-id, -rx, -rx.factor) %>%
  mutate(
    age.factor2 = forcats::fct_collapse(age.factor,
      "<60 years" = c("<40 years", "40-59 years")),
    sex.factor2 = forcats::fct_recode(sex.factor,
    # Intentional miscode
      "F" = "Male",
      "M" = "Female")
  )
# Check
colon_s_small %>%
  finalfit::check_recode()
out = colon_s_small %>%
  select(-extent, -extent.factor,-time, -time.years) %>% # choose to exclude variables
  check_recode(include_numerics = TRUE)
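The same kind of check can be approximated without `finalfit`: cross-tabulating the original variable against its recoded version puts every old/new combination in one table, so a swapped label shows up as counts in the wrong cells. The sketch below uses made-up data and reproduces the deliberate miscode from above in base R; all object names are illustrative.

```r
# Made-up data, not the colon_s data set
sex <- factor(c("Male", "Female", "Female", "Male", "Male"))

# Intentional miscode, as in the example above: Male -> "F", Female -> "M"
sex_recoded <- factor(sex, levels = c("Male", "Female"), labels = c("F", "M"))

# Every original/recoded combination with its count
recode_check <- table(original = sex, recoded = sex_recoded, useNA = "ifany")
recode_check
```

All three `Male` rows land in the `F` column, which is the signal that the labels were swapped.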
## Recoding mydata$cinsiyet into mydata$Cinsiyet
mydata$Cinsiyet <- recode(mydata$cinsiyet,
               "K" = "Kadin",
               "E" = "Erkek")
mydata$Cinsiyet <- factor(mydata$Cinsiyet)
## Recoding mydata$tumor_yerlesimi into mydata$TumorYerlesimi
mydata$TumorYerlesimi <- recode(mydata$tumor_yerlesimi,
               "proksimal" = "Proksimal",
               "distal" = "Distal",
               "yaygın" = "Yaygin",
               "gö bileşke" = "GEJ",
               "antrum" = "Antrum")
mydata$TumorYerlesimi <- factor(mydata$TumorYerlesimi)

## Reordering mydata$TumorYerlesimi
mydata$TumorYerlesimi <- factor(mydata$TumorYerlesimi, levels=c("GEJ", "Proksimal", "Antrum", "Distal", "Yaygin"))
## Recoding mydata$histolojik_alt_tip into mydata$HistolojikAltTip
mydata$HistolojikAltTip <- recode(mydata$histolojik_alt_tip,
               "medüller benzeri" = "meduller benzeri")
mydata$HistolojikAltTip <- factor(mydata$HistolojikAltTip)

## Recoding mydata$lauren_siniflamasi into mydata$Lauren
mydata$Lauren <- recode(mydata$lauren_siniflamasi,
               "diffüz" = "diffuse",
               "???" = "medullary")
mydata$Lauren <- factor(mydata$Lauren)

## Recoding mydata$histolojik_derece into mydata$Grade
mydata$Grade <- recode(mydata$histolojik_derece,
               "az diferansiye" = "az",
               "iyi diferansiye" = "iyi",
               "orta diferansiye" = "orta")
mydata$Grade <- factor(mydata$Grade)

## Reordering mydata$Grade
mydata$Grade <- factor(mydata$Grade, levels=c("iyi", "orta", "az"))
mydata$Tstage <- stringr::str_match(mydata$patolojik_evre, paste('(.+)', "N", sep=''))[,2]

mydata$Nstage <- paste0("N",
    stringr::str_match(mydata$patolojik_evre, paste( "N", '(.+)', "M", sep=''))[,2]
    )

mydata$Mstage <- paste0("M", 
    stringr::str_match(mydata$patolojik_evre, paste("M", '(.+)', sep=''))[,2]
)
mydata <- mydata %>% 
    dplyr::mutate(
        T_stage = dplyr::case_when(
            grepl(pattern = "T1", x = .$Tstage) == TRUE ~ "T1",
            grepl(pattern = "T2", x = .$Tstage) == TRUE ~ "T2",
            grepl(pattern = "T3", x = .$Tstage) == TRUE ~ "T3",
            grepl(pattern = "T4", x = .$Tstage) == TRUE ~ "T4",
            TRUE ~ "Tx"
        )
    ) %>% 
dplyr::mutate(
        N_stage = dplyr::case_when(
            grepl(pattern = "N0", x = .$Nstage) == TRUE ~ "N0",
            grepl(pattern = "N1", x = .$Nstage) == TRUE ~ "N1",
            grepl(pattern = "N2", x = .$Nstage) == TRUE ~ "N2",
            grepl(pattern = "N3", x = .$Nstage) == TRUE ~ "N3",
            TRUE ~ "Nx"
        )
    ) %>% 
dplyr::mutate(
        M_stage = dplyr::case_when(
            grepl(pattern = "M0", x = .$Mstage) == TRUE ~ "M0",
            grepl(pattern = "M1", x = .$Mstage) == TRUE ~ "M1",
            TRUE ~ "Mx"
        )
    )
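
As an alternative sketch of the staging logic above, the three components can be pulled out of a combined stage string with a single base R regular expression. The function names `parse_tnm` and `main_t`, the column names, and the example strings are assumptions for illustration; the pattern presumes the order T, N, M with no separators, which may not hold for every value in `patolojik_evre`.

```r
# Hypothetical helper: split strings such as "T4aN3bM1" into T, N and M parts
parse_tnm <- function(stage) {
  pattern <- "^(p?T[^N]*)(N[^M]*)(M.*)$"
  parts <- regmatches(stage, regexec(pattern, stage))
  out <- t(vapply(
    parts,
    # a successful match has 4 elements: the full match plus three groups
    function(p) if (length(p) == 4) p[2:4] else rep(NA_character_, 3),
    character(3)
  ))
  colnames(out) <- c("t_part", "n_part", "m_part")
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Collapse a detailed T value such as "T4a" to its main category, "Tx" if unknown
main_t <- function(t_part) {
  ifelse(grepl("T[1-4]", t_part), sub("^.*(T[1-4]).*$", "\\1", t_part), "Tx")
}

stages <- parse_tnm(c("T3N1M0", "T4aN3bM1", "unknown"))
stages$t_main <- main_t(stages$t_part)
stages
```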


## Recoding mydata$cd44_oran into mydata$CD44
mydata$CD44 <- recode(mydata$cd44_oran,
               "2" = "positive",
               "0" = "negative",
               "1" = "negative",
               "3" = "positive")
mydata$CD44 <- factor(mydata$CD44)

## Recoding mydata$her2_skor into mydata$Her2
mydata$Her2 <- recode(mydata$her2_skor,
               "+3" = "3",
               "+1" = "1",
               "+2" = "2")
mydata$Her2 <- factor(mydata$Her2)
## Reordering mydata$Her2
mydata$Her2 <- factor(mydata$Her2, levels=c("0", "1", "2", "3"))
## Recoding mydata$msi into mydata$MMR
mydata$MMR <- recode(mydata$msi,
               "MSS" = "pMMR",
               "MSİ(PMS2,MLH1)" = "dMMR(PMS2,MLH1)",
               "MSİ(MSH2,MSH6)" = "dMMR(MSH2,MSH6)",
               "MSİ(PMS2)" = "dMMR(PMS2)")
mydata$MMR <- factor(mydata$MMR)

## Recoding mydata$msi into mydata$MMR2
mydata$MMR2 <- recode(mydata$msi,
               "MSS" = "pMMR",
               "MSİ(PMS2,MLH1)" = "dMMR",
               "MSİ(MSH2,MSH6)" = "dMMR",
               "MSİ(PMS2)" = "dMMR")
mydata$MMR2 <- factor(mydata$MMR2)


mydata <- mydata %>% 
    dplyr::mutate(
TumorPDL1gr1 = dplyr::case_when(
        t_pdl1 < 1 ~ "kucuk1",
        t_pdl1 >= 1 ~ "buyukesit1"
    )
    ) %>% 
dplyr::mutate(
TumorPDL1gr5 = dplyr::case_when(
        t_pdl1 < 5 ~ "kucuk5",
        t_pdl1 >= 5 ~ "buyukesit5"
    )
    )   %>% 
dplyr::mutate(
inflPDL1gr1 = dplyr::case_when(
        i_pdl1 < 1 ~ "kucuk1",
        i_pdl1 >= 1 ~ "buyukesit1"
    )
    ) %>% 
dplyr::mutate(
inflPDL1gr5 = dplyr::case_when(
        i_pdl1 < 5 ~ "kucuk5",
        i_pdl1 >= 5 ~ "buyukesit5"
    )
    )
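
Because each PD-L1 grouping above is a single threshold, the same dichotomisation can also be sketched with base R's `cut()`. The helper `group_by_cutoff`, its labels, and the example percentages are hypothetical; the Turkish labels used above (`kucuk`, "less than", and `buyukesit`, "greater than or equal to") are replaced by symbols here.

```r
# Hypothetical helper: two groups, below the cutoff and at or above it
group_by_cutoff <- function(x, cutoff) {
  cut(x,
      breaks = c(-Inf, cutoff, Inf),
      labels = c(paste0("<", cutoff), paste0(">=", cutoff)),
      right = FALSE)  # intervals are closed on the left: [cutoff, Inf)
}

# Made-up PD-L1 percentages, including the boundary values and a missing value
pdl1 <- c(0, 0.5, 1, 4.9, 5, 60, NA)
pdl1_gr1 <- group_by_cutoff(pdl1, 1)
pdl1_gr5 <- group_by_cutoff(pdl1, 5)
table(pdl1_gr1, pdl1_gr5, useNA = "ifany")
```

Unlike free-text labels, the resulting factors keep their levels in ascending order, and missing values stay `NA`.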

## Recoding mydata$lvi into mydata$LVI
mydata$LVI <- recode(mydata$lvi,
               "var" = "Var",
               "yok" = "Yok")
mydata$LVI <- factor(mydata$LVI)
## Reordering mydata$LVI
mydata$LVI <- factor(mydata$LVI, levels=c("Yok", "Var"))
## Recoding mydata$pni into mydata$PNI
mydata$PNI <- recode(mydata$pni,
               "var" = "Var",
               "yok" = "Yok")
mydata$PNI <- factor(mydata$PNI)
## Reordering mydata$PNI
mydata$PNI <- factor(mydata$PNI, levels=c("Yok", "Var"))
## Recoding mydata$ln into mydata$LenfNoduMetastazi
mydata$LenfNoduMetastazi <- recode(mydata$ln,
               "var" = "Var",
               "yok" = "Yok")
mydata$LenfNoduMetastazi <- factor(mydata$LenfNoduMetastazi)
## Reordering mydata$LenfNoduMetastazi
mydata$LenfNoduMetastazi <- factor(mydata$LenfNoduMetastazi, levels=c("Yok", "Var"))
mydata$sontarih <- janitor::excel_numeric_to_date(as.numeric(mydata$olum_tarihi))
mydata$Outcome <- "Dead"
mydata$Outcome[mydata$olum_tarihi == "yok"] <- "Alive"
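
The steps above rely on `olum_tarihi` holding either an Excel date serial stored as text or the word `yok` ("none", meaning no recorded death). A base R sketch of the same idea, on made-up values, is shown below; the 1899-12-30 origin is the usual one for serials from Excel on Windows, and the variable names are illustrative only.

```r
# Made-up values: two Excel date serials stored as text and one "yok"
death_raw <- c("43831", "yok", "43739")

# Non-numeric entries become NA; the coercion warning is expected here
serial <- suppressWarnings(as.numeric(death_raw))

death_date <- as.Date(serial, origin = "1899-12-30")
outcome <- ifelse(is.na(serial), "Alive", "Dead")

data.frame(death_raw, death_date, outcome)
```

Note that this sketch treats every non-numeric entry as alive, whereas the code above marks only the literal `yok` as alive.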
# cat(names(mydata), sep = ",\n")

mydata <- mydata %>% 
    select(
# sira_no,
# no,
# x3,
# hasta_biyopsi_no,
# cinsiyet,
        Cinsiyet,
        Yas = hasta_yasi,
        TumorYerlesimi,
        TumorCapi = tumor_capi,
HistolojikAltTip,
Lauren,
Grade,
TNM = patolojik_evre,
Tstage,
T_stage,
Nstage,
N_stage,
Mstage,
M_stage,
CD44,
Her2,
MMR,
MMR2,
TumorPDL1gr1,
TumorPDL1gr5,
inflPDL1gr1,
inflPDL1gr5,
LVI,
PNI,
LenfNoduMetastazi,
Outcome,        
# tumor_yerlesimi,

# histolojik_alt_tip,
# lauren_siniflamasi,
# histolojik_derece,
# cd44_oran,
# cd44_intense,
# her2_skor,
# msi,
# t_pdl1,
# i_pdl1,
# lvi,
# pni,
# ln,
CerrahiTarih = cerrahi_tarih,
# olum_tarihi,
genel_sagkalim,
SonTarih = sontarih
    )

mydata <- janitor::clean_names(mydata)
# cat(names(mydata), sep = ",\n")

names(mydata) <- c(names(mydata)[1:21], paste0("Soru", 1:30))

library(arsenal)
tab1 <- tableby(~ katilim_durumu
                ,
                data = mydata
)
summary(tab1)
mydata <- mydata %>% 
  filter(katilim_durumu == "katılmış ve tamamlamış")
# summarytools::view(summarytools::dfSummary(mydata))
# dplyr::glimpse(mydata)
# mydata %>%
#   select(starts_with("Soru")) %>% 
#   pivot_longer(everything()) %>% 
#   select(value) %>% 
#   pull() %>% 
#   unique() %>% 
#   cat(sep = "\n")
## Recoding mydata$x3_yasiniz_nedir into mydata$YasGrup
mydata$YasGrup <- factor(mydata$x3_yasiniz_nedir)
## Reordering mydata$YasGrup
mydata$YasGrup <- factor(mydata$YasGrup, levels=c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"))
## Recoding mydata$x4_cinsiyetiniz_nedir into mydata$Cinsiyet
mydata$Cinsiyet <- recode(mydata$x4_cinsiyetiniz_nedir,
               "Kadın" = "Kadin")
mydata$Cinsiyet <- factor(mydata$Cinsiyet)

## Recoding mydata$x5_kac_yildir_genel_cerrahi_uzmanisiniz into mydata$UzmanlikSuresi
mydata$UzmanlikSuresi <- recode(mydata$x5_kac_yildir_genel_cerrahi_uzmanisiniz,
               "43739" = "10-19")
mydata$UzmanlikSuresi <- factor(mydata$UzmanlikSuresi)

## Reordering mydata$UzmanlikSuresi
mydata$UzmanlikSuresi <- factor(mydata$UzmanlikSuresi, levels=c("0-9", "10-19", "20-29", "30-39", "40-49"))
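
# Why the odd "43739" level shows up (a sketch, assuming the survey answers
# passed through Excel): Excel reads the answer "10-19" as a date, October 2019,
# and stores it as a serial day number. Converting the serial back with base R
# shows the date Excel had in mind. `demo_serial` and `demo_date` are
# hypothetical names used only here.
demo_serial <- 43739
demo_date <- as.Date(demo_serial, origin = "1899-12-30")
format(demo_date, "%m-%y")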


## Recoding mydata$x6_unvaniniz_nedir into mydata$Unvan
mydata$Unvan <- factor(mydata$x6_unvaniniz_nedir)

## Reordering mydata$Unvan
mydata$Unvan <- factor(mydata$Unvan, levels=c("Op.Dr.", "Doktor Öğretim Üyesi", "Doç.Dr.", "Prof.Dr"))

## Recoding mydata$x8_hangi_kurumda_calisiyorsunuz into mydata$Kurum
mydata$Kurum <- recode(mydata$x8_hangi_kurumda_calisiyorsunuz,
               "Eğitim Araştırma Hastanesi" = "Eğitim Araştırma",
               "İlçe Devlet Hastanesi" = "İlçe Devlet",
               "Üniversite Hastanesi" = "Üniversite",
               "İl Devlet Hastanesi" = "İl Devlet",
               "Özel Hastane ve Kurumlar" = "Özel")
mydata$Kurum <- factor(mydata$Kurum)

## Reordering mydata$Kurum
mydata$Kurum <- factor(mydata$Kurum, levels=c("Özel", "İlçe Devlet", "İl Devlet", "Eğitim Araştırma", "Üniversite"))
tersSorular <- c("Soru1",
                 "Soru4",
                 "Soru15",
                 "Soru17",
                 "Soru29")

CSS <- c(
  "Soru3",
  "Soru6",
  "Soru12",
  "Soru16",
  "Soru18",
  "Soru20",
  "Soru22",
  "Soru24",
  "Soru27",
  "Soru30"
)


BS <- c(
  "Soru1",
  "Soru4",
  "Soru8",
  "Soru10",
  "Soru15",
  "Soru17",
  "Soru19",
  "Soru21",
  "Soru26",
  "Soru29"
)


STSS <- c(
  "Soru2",
  "Soru5",
  "Soru7",
  "Soru9",
  "Soru11",
  "Soru13",
  "Soru14",
  "Soru23",
  "Soru25",
  "Soru28"
)
# Map Likert responses (including spelling variants found in the data) to scores 1-5
recode_numberize <- function(x, ...) {
  dplyr::recode(
    x,
    "Bazı zamanlar" = 3,
    "Çoksık" = 5,
    "Hiçbir zaman" = 1,
    "Nadiren" = 2,
    "Sık sık" = 4,
    "Sıkça" = 4,
    "Bazı zamanlarda" = 3,
    "Çok sık" = 5,
    "Sıksık" = 4
  )
}


mydata <- mydata %>% 
    mutate_at(.tbl = .,
              .vars = vars(starts_with("Soru"), -tersSorular),
              .funs = recode_numberize
    )



# Reverse mapping for reverse-scored items ("ters" = reversed): 5 becomes 1, 4 becomes 2, and so on
recode_numberize_ters <- function(x, ...) {
  recode(
    x,
    "Bazı zamanlar" = 3,
    "Çoksık" = 1,
    "Hiçbir zaman" = 5,
    "Nadiren" = 4,
    "Sık sık" = 2,
    "Sıkça" = 2,
    "Bazı zamanlarda" = 3,
    "Çok sık" = 1,
    "Sıksık" = 2
  )
}


# Reverse-scored items must use the reversed mapping, not the forward one
mydata <- mydata %>% 
    mutate_at(.tbl = .,
              .vars = vars(tersSorular),
              .funs = recode_numberize_ters
    )
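
# Quick check of the reverse-scoring idea (a sketch, assuming a 5-point scale
# scored 1 to 5): a reversed item equals 6 minus the forward score, which is
# the relationship between the forward and reversed mappings defined above.
# `demo_forward` and `demo_reverse` are hypothetical names used only here.
demo_forward <- c("Hiçbir zaman" = 1, "Nadiren" = 2, "Bazı zamanlar" = 3,
                  "Sık sık" = 4, "Çok sık" = 5)
demo_reverse <- 6 - demo_forward
demo_reverse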

mydata <- mydata %>% 
  # written this way, the total is not computed (stays NA) when any item is missing
  # mutate(
  #   CSS_total = rowSums(select(., CSS), na.rm = FALSE)
  # ) %>% 
  mutate(
    CSS_total = rowSums(select(., CSS), na.rm = TRUE)
  ) %>% 
mutate(
    BS_total = rowSums(select(., BS), na.rm = TRUE)
  ) %>% 
  mutate(
    STSS_total = rowSums(select(., STSS), na.rm = TRUE)
  )


mydata <- mydata %>% 
  naniar::replace_with_na_at(
    .vars = vars(ends_with("_total")),
    condition = ~.x == 0
    )
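
# Why totals of 0 are turned into NA above (a minimal base R illustration with
# made-up numbers): with na.rm = TRUE, a row whose items are all missing sums
# to 0 rather than NA, and 0 is not a reachable score when every item is 1 to 5.
# Note that partially answered rows are still summed over the items present.
# The `demo_` objects are hypothetical names used only here.
demo_items <- data.frame(q1 = c(1, NA, NA), q2 = c(2, 3, NA))
demo_strict <- rowSums(demo_items, na.rm = FALSE)  # any missing item gives NA
demo_lenient <- rowSums(demo_items, na.rm = TRUE)  # missing items are skipped
demo_lenient[demo_lenient == 0] <- NA              # all-missing rows back to NA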

mydata <- mydata %>% 
  mutate_at(.tbl = .,
            .vars = vars(ends_with("_total")),
            .funs = list(Gr = 
                           ~ case_when(
                             . <= 22 ~ "Low",
                             . >= 23 & . <= 41 ~ "Average",
                             . >= 42 ~ "High",
                             TRUE ~ NA_character_
                           )
                           )

  ) %>% 
  mutate_at(.tbl = .,
            .vars = vars(ends_with("_Gr")),
            .funs = ~ factor(., levels=c("Low", "Average", "High"))
              )
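
# Another way to express the Low / Average / High cut-offs (a sketch using
# base R cut(); the thresholds 22 and 41 come from the case_when() above, the
# `demo_` names are hypothetical). For whole-number totals this gives the same
# groups; a fractional total such as 22.5 would be "Average" here but NA in the
# case_when() version.
demo_totals <- c(10, 22, 23, 41, 42, 50, NA)
demo_groups <- cut(demo_totals,
                   breaks = c(-Inf, 22, 41, Inf),
                   labels = c("Low", "Average", "High"))
table(demo_groups, useNA = "ifany")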

# ## Reordering mydata$CSS_total_Gr
# mydata$CSS_total_Gr <- factor(mydata$CSS_total_Gr, )
# 
# ## Reordering mydata$BS_total_Gr
# mydata$BS_total_Gr <- factor(mydata$BS_total_Gr, levels=c("Low", "Average", "High"))
# 
# 
# ## Reordering mydata$STSS_total_Gr
# mydata$STSS_total_Gr <- factor(mydata$STSS_total_Gr, levels=c("Low", "Average", "High"))

visdat::vis_miss(mydata)
visdat::vis_miss(airquality,
                 cluster = TRUE)
visdat::vis_miss(airquality,
         sort_miss = TRUE)
# https://cran.r-project.org/web/packages/dlookr/vignettes/transformation.html

income <- dlookr::imputate_na(carseats, Income, US, method = "rpart")
income
attr(income,"var_type")
attr(income,"method")
attr(income,"na_pos")
attr(income,"type")
attr(income,"message")
attr(income,"success")
attr(income,"class")

summary(income)

plot(income)
carseats %>%
  mutate(Income_imp = dlookr::imputate_na(carseats, Income, US, method = "knn")) %>%
  group_by(US) %>%
  summarise(orig = mean(Income, na.rm = TRUE),
    imputation = mean(Income_imp))
library(mice)
urban <- dlookr::imputate_na(carseats, Urban, US, method = "mice")
urban 
summary(urban)
plot(urban)
price <- dlookr::imputate_outlier(carseats, Price, method = "capping")
price
summary(price)
plot(price)
carseats %>%
  mutate(Price_imp = dlookr::imputate_outlier(carseats, Price, method = "capping")) %>%
  group_by(US) %>%
  summarise(orig = mean(Price, na.rm = TRUE),
    imputation = mean(Price_imp, na.rm = TRUE))
carseats %>% 
  mutate(Income_minmax = dlookr::transform(carseats$Income, method = "minmax"),
    Sales_minmax = dlookr::transform(carseats$Sales, method = "minmax")) %>% 
  select(Income_minmax, Sales_minmax) %>% 
  boxplot()
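
# What a "minmax" rescaling amounts to (a base R sketch, assuming the usual
# definition (x - min) / (max - min); this is not dlookr's own code).
# `minmax_scale()` and `demo_x` are hypothetical names used only here.
minmax_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)      # smallest and largest observed value
  (x - rng[1]) / (rng[2] - rng[1])   # smallest maps to 0, largest to 1
}
demo_x <- c(21, 50, 120, NA)
demo_scaled <- minmax_scale(demo_x)
demo_scaled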
dlookr::find_skewness(carseats)

dlookr::find_skewness(carseats, index = FALSE)

dlookr::find_skewness(carseats, value = TRUE)

dlookr::find_skewness(carseats, value = TRUE, thres = 0.1)
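# call dlookr::transform explicitly; a bare transform() resolves to base::transform unless dlookr is attached
Advertising_log <- dlookr::transform(carseats$Advertising, method = "log")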
# Advertising_log <- transform(carseats$Advertising, method = "log+1")
head(Advertising_log)
summary(Advertising_log)
plot(Advertising_log)
bin <- dlookr::binning(carseats$Income)
bin <- dlookr::binning(carseats$Income, nbins = 4,
              labels = c("LQ1", "UQ1", "LQ3", "UQ3"))
dlookr::binning(carseats$Income, nbins = 5, type = "equal")
dlookr::binning(carseats$Income, nbins = 5, type = "pretty")
dlookr::binning(carseats$Income, nbins = 5, type = "kmeans")
dlookr::binning(carseats$Income, nbins = 5, type = "bclust")

bin
summary(bin)
plot(bin)



carseats %>%
 mutate(Income_bin = dlookr::binning(carseats$Income)) %>%
 group_by(ShelveLoc, Income_bin) %>%
 summarise(freq = n()) %>%
 arrange(desc(freq)) %>%
 head(10)

bin <- dlookr::binning_by(carseats, "US", "Advertising")
bin
summary(bin)
attr(bin, "iv") # information value 
attr(bin, "ivtable") # information value table

plot(bin, sub = "bins of Advertising variable")

# https://cran.r-project.org/web/packages/exploreR/vignettes/exploreR.html

(regressResults <- exploreR::masslm(iris,
                                   "Sepal.Length",
                                   ignore = "Species")
)

exploreR::massregplot(iris, "Sepal.Length", ignore = "Species")

(stand.Petals <- exploreR::standardize(iris,
                                      c("Petal.Width", "Petal.Length"))
)
carseats %>%
  dlookr::transformation_report(target = US)

carseats %>%
  dlookr::transformation_report(target = US, output_format = "html", 
    output_file = "transformation.html")

inspectdf::inspect_na(starwars)

inspectdf::inspect_na(starwars) %>% inspectdf::show_plot()

inspectdf::inspect_na(star_1, star_2)

inspectdf::inspect_na(star_1, star_2) %>% inspectdf::show_plot()
mydata %>%
  dplyr::select(-dplyr::contains("Date")) %>%
  report::report()
# cat(names(mydata), sep = " + \n")
library(arsenal)
tab1 <- arsenal::tableby(
  ~ Sex +
    Age +
    Race +
    PreinvasiveComponent +
    LVI +
    PNI +
    Death +
    Group +
    Grade +
    TStage +
    # `Anti-X-intensity` +
    # `Anti-Y-intensity` +
    LymphNodeMetastasis +
    Valid +
    Smoker +
    Grade_Level
  ,
  data = mydata 
)
summary(tab1)
library(tableone)
mydata %>% 
  dplyr::select(-keycolumns,
         -dateVariables
         ) %>% 
tableone::CreateTableOne(data = .)
# CreateTableOne(vars = myVars, data = mydata, factorVars = characterVariables)
# tab <- CreateTableOne(vars = myVars, data = pbc, factorVars = catVars)
# print(tab, showAllLevels = TRUE)
# ?print.TableOne
# summary(tab)
# print(tab, nonnormal = biomarkers)
# print(tab, nonnormal = biomarkers, exact = "stage", quote = TRUE, noSpaces = TRUE)
# tab3Mat <- print(tab3, nonnormal = biomarkers, exact = "stage", quote = FALSE, noSpaces = TRUE, printToggle = FALSE)
# write.csv(tab3Mat, file = "myTable.csv")
mydata %>% 
  dplyr::select(
    continiousVariables,
    numericVariables,
    integerVariables
  ) %>% 
summarytools::descr(., style = 'rmarkdown')
print(summarytools::descr(mydata), method = 'render', table.classes = 'st-small')
mydata %>% 
  summarytools::descr(.,
                      stats = "common",
                      transpose = TRUE,
                      headings = FALSE
                      )
mydata %>% 
  summarytools::descr(stats = "common") %>%
  summarytools::tb()
mydata$Sex %>% 
  summarytools::freq(cumul = FALSE, report.nas = FALSE) %>%
  summarytools::tb()
mydata %>%
  explore::describe() %>%
  dplyr::filter(unique < 5)
mydata %>%
  explore::describe() %>%
  dplyr::filter(na > 0)
mydata %>% explore::describe()
source(here::here("R", "gc_desc_cat.R"))
tab <- 
  mydata %>% 
  dplyr::select(
    -keycolumns
    ) %>% 
  tableone::CreateTableOne(data = .)
?print.CatTable
tab$CatTable
race_stats <- summarytools::freq(mydata$Race) 
print(race_stats,
      report.nas = FALSE,
      totals = FALSE,
      display.type = FALSE,
      Variable.label = "Race Group"
      )
mydata %>% explore::describe(PreinvasiveComponent)
## Frequency or custom tables for categorical variables
SmartEDA::ExpCTable(
  mydata,
  Target = NULL,
  margin = 1,
  clim = 10,
  nlim = 5,
  round = 2,
  bin = NULL,
  per = T
)
inspectdf::inspect_cat(mydata)

inspectdf::inspect_cat(mydata)$levels$Group
library(summarytools)

grouped_freqs <- stby(data = mydata$Smoker, 
                      INDICES = mydata$Sex, 
                      FUN = freq, cumul = FALSE, report.nas = FALSE)

grouped_freqs %>% tb(order = 2)
summarytools::stby(
  list(x = mydata$LVI, y = mydata$LymphNodeMetastasis), 
  mydata$PNI,
  summarytools::ctable
  )
with(mydata, 
     summarytools::stby(
       list(x = LVI, y = LymphNodeMetastasis), PNI,
       summarytools::ctable
       )
     )
SmartEDA::ExpCTable(
  mydata,
  Target = "Sex",
  margin = 1,
  clim = 10,
  nlim = NULL,
  round = 2,
  bin = 4,
  per = F
)
mydata %>% 
  dplyr::select(characterVariables) %>% 
  dplyr::select(PreinvasiveComponent,
         PNI,
         LVI
         ) %>% 
reactable::reactable(data = ., groupBy = c("PreinvasiveComponent", "PNI"), columns = list(
  LVI = reactable::colDef(aggregate = "count")
))
questionr:::icut()
source(here::here("R", "gc_desc_cont.R"))
tab <- tableone::CreateTableOne(data = mydata)
# ?print.ContTable
tab$ContTable
print(tab$ContTable, nonnormal = c("Anti-X-intensity"))
mydata %>% explore::describe(Age)
mydata %>% 
  dplyr::select(continiousVariables) %>% 
SmartEDA::ExpNumStat(
  data = .,
  by = "A",
  gp = NULL,
  Qnt = seq(0, 1, 0.1),
  MesofShape = 2,
  Outlier = TRUE,
  round = 2
)
inspectdf::inspect_num(mydata, breaks = 10)
inspectdf::inspect_num(mydata)$hist$Age
inspectdf::inspect_num(mydata, breaks = 10) %>%
  inspectdf::show_plot()
grouped_descr <- summarytools::stby(data = mydata, 
                      INDICES = mydata$Sex, 
                      FUN = summarytools::descr, stats = "common")
# grouped_descr %>% summarytools::tb(order = 2)
grouped_descr %>% summarytools::tb()
carseats %>%
  group_by(US) %>% 
  dlookr::describe(Sales, Income) 

carseats %>%
  group_by(US, Urban) %>% 
  dlookr::describe(Sales, Income) 

categ <- dlookr::target_by(carseats, US)
cat_num <- dlookr::relate(categ, Sales)
cat_num
summary(cat_num)
plot(cat_num)
summarytools::stby(data = mydata, 
                   INDICES = mydata$PreinvasiveComponent, 
                   FUN = summarytools::descr,
                   stats = c("mean", "sd", "min", "med", "max"), 
                   transpose = TRUE)
# stats and transpose belong to stby() (passed on to descr), not to with()
with(mydata, 
     summarytools::stby(Age, PreinvasiveComponent, summarytools::descr,
                        stats = c("mean", "sd", "min", "med", "max"),
                        transpose = TRUE)
     )
mydata %>% 
  group_by(PreinvasiveComponent) %>% 
  summarytools::descr(stats = "fivenum")
## Summary statistics by – category
SmartEDA::ExpNumStat(
  mydata,
  by = "GA",
  gp = "PreinvasiveComponent",
  Qnt = seq(0, 1, 0.1),
  MesofShape = 2,
  Outlier = TRUE,
  round = 2
)
mydata %>% 
  janitor::tabyl(Sex) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(Race) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(PreinvasiveComponent) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(LVI) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(PNI) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(Group) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(Grade) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(TStage) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(LymphNodeMetastasis) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(Grade_Level) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
  janitor::tabyl(DeathTime) %>%
  janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
  knitr::kable()
mydata %>% 
jmv::descriptives(
    data = .,
    vars = 'Age',
    hist = TRUE,
    dens = TRUE,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    mode = TRUE,
    sd = TRUE,
    variance = TRUE,
    skew = TRUE,
    kurt = TRUE,
    quart = TRUE)
mydata %>% 
jmv::descriptives(
    data = .,
    vars = 'AntiX_intensity',
    hist = TRUE,
    dens = TRUE,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    mode = TRUE,
    sd = TRUE,
    variance = TRUE,
    skew = TRUE,
    kurt = TRUE,
    quart = TRUE)
mydata %>% 
jmv::descriptives(
    data = .,
    vars = 'AntiY_intensity',
    hist = TRUE,
    dens = TRUE,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    mode = TRUE,
    sd = TRUE,
    variance = TRUE,
    skew = TRUE,
    kurt = TRUE,
    quart = TRUE)
library(finalfit)
# dependent <- c("dependent1",
#                "dependent2"
#               )

# explanatory <- c("explanatory1",
#                  "explanatory2"
#                  )

dependent <- "PreinvasiveComponent"

explanatory <- c("Sex", "Age", "Grade", "TStage")

source(here::here("R", "gc_table_cross.R"))
CreateTableOne(vars = myVars, strata = "columnname", data = pbc, factorVars = catVars)
print(tab, nonnormal = biomarkers, exact = "exactVariable", smd = TRUE)

write2html(
  knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"),
  quiet = TRUE, # passed to rmarkdown::render
  keep.rmd = TRUE
)
ctable(tobacco$gender, tobacco$smoker, style = 'rmarkdown')
print(ctable(tobacco$gender, tobacco$smoker), method = 'render')
print(ctable(tobacco$smoker, tobacco$diseased, prop = "r"), method = "render")
with(tobacco, 
     print(ctable(smoker, diseased, prop = 'n', totals = FALSE, chisq = TRUE),
           headings = FALSE, method = "render"))
# devtools::install_github("ewenharrison/summarizer")
# library(summarizer)
# data(colon_s)
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary.factorlist(dependent, explanatory, p=TRUE) %>% 
    knitr::kable(row.names=FALSE, align=c("l", "l", "r", "r", "r"))

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary.factorlist(dependent, explanatory) %>% 
    knitr::kable(row.names=FALSE, align=c("l", "l", "r", "r", "r"))


library("rmngb")
# rmngb::pairwise.chisq.test(mydata$StageGr2, mydata$Ki67Gr)
rmngb::pairwise.fisher.test(mydata$StageGr2, mydata$Ki67Gr)

# rmngb::pairwise.chisq.test(mydata$LiverDistantMets, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$LiverDistantMets, mydata$Ki67Gr, p.adj = "BH")
# rmngb::pairwise.chisq.test(mydata$PNI, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$PNI, mydata$Ki67Gr, p.adj = "BH")
# rmngb::pairwise.chisq.test(mydata$LVI, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$LVI, mydata$Ki67Gr, p.adj = "BH")
MBStudy <- 
tibble::tribble(
           ~Grup,                           ~Diagnosis,   ~Number,
   "\"Grup1\"",           "\"Diseased\"", 1383L,
  "\"Grup2A\"",           "\"Diseased\"",   58L,
  "\"Grup2B\"",           "\"Diseased\"",  349L,
   "\"Grup3\"",           "\"Diseased\"", 5217L,
   "\"Grup1\"", "\"Stromal   Diseased\"",   13L,
  "\"Grup2A\"", "\"Stromal   Diseased\"",    2L,
  "\"Grup2B\"", "\"Stromal   Diseased\"",   47L,
   "\"Grup3\"", "\"Stromal   Diseased\"",  476L,
   "\"Grup1\"",   "\"Inflammation fibrosis\"",   56L,
  "\"Grup2A\"",   "\"Inflammation fibrosis\"",   52L,
  "\"Grup2B\"",   "\"Inflammation fibrosis\"",  267L,
   "\"Grup3\"",   "\"Inflammation fibrosis\"", 1387L
  )


MBStudy <- 
data.frame(
  stringsAsFactors = FALSE,
                V1 = c("\"Grup1\"","\"Grup2A\"",
                       "\"Grup2B\"","\"Grup3\"","\"Grup1\"","\"Grup2A\"",
                       "\"Grup2B\"","\"Grup3\"","\"Grup1\"","\"Grup2A\"",
                       "\"Grup2B\"","\"Grup3\""),
                V2 = c("\"Diseased\"",
                       "\"Diseased\"","\"Diseased\"","\"Diseased\"",
                       "\"Stromal   Diseased\"","\"Stromal   Diseased\"",
                       "\"Stromal   Diseased\"",
                       "\"Stromal   Diseased\"","\"Inflammation fibrosis\"",
                       "\"Inflammation fibrosis\"","\"Inflammation fibrosis\"",
                       "\"Inflammation fibrosis\""),
                V3 = c(1383L,58L,349L,5217L,13L,
                       2L,47L,476L,56L,52L,267L,1387L)
)

MBStudy <- matrix(c(
1383L,                    13L,                    56L,
58L,                     2L,                    52L,
349L,                    47L,                   267L,
5217L,                   476L,                  1387L
  ), byrow = TRUE, nrow = 4, dimnames = list(c("Grup1", "Grup2A", "Grup2B", "Grup3"), c("Diseased", "Stromal Diseased", "Inflammation")))


RVAideMemoire::chisq.multcomp(MBStudy)
MBStudy
MB_table <- RVAideMemoire::fisher.multcomp(tab.cont = MBStudy)

MB_table$p.value %>% 
  as.data.frame() %>%
  tibble::rownames_to_column(var = "Grup") %>% 
  gt::gt(.) %>% 
  gt::fmt_number(., columns = dplyr::contains("Diseased"), decimals = 4)

rmngb::pairwise.fisher.test.table(MBStudy)

MBStudy2 <- matrix(c(
13L,    53L,
9L, 5L,
3L, 26L),
byrow = TRUE,
nrow = 3,
dimnames = list(
c("Diseased", "Inflammation", "Fibrosis"),
c("sw", "cds")
))

MBStudy2

MBStudy2_analysis <-  RVAideMemoire::fisher.multcomp(tab.cont = t(MBStudy2))

MBStudy2_analysis$p.value
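
# The idea behind the pairwise comparisons above, sketched in base R: run
# Fisher's exact test on every pair of rows, then adjust the p-values for
# multiple testing. This is an illustration only, not the code used inside
# RVAideMemoire or rmngb; all `demo_` objects are hypothetical names.
demo_tab <- matrix(c(13, 53,
                      9,  5,
                      3, 26),
                   byrow = TRUE, nrow = 3,
                   dimnames = list(c("Diseased", "Inflammation", "Fibrosis"),
                                   c("sw", "cds")))
# every unordered pair of row labels
demo_pairs <- combn(rownames(demo_tab), 2, simplify = FALSE)
# one 2 x 2 Fisher test per pair
demo_p <- vapply(demo_pairs,
                 function(pr) fisher.test(demo_tab[pr, ])$p.value,
                 numeric(1))
names(demo_p) <- vapply(demo_pairs, paste, character(1), collapse = " vs ")
# Benjamini-Hochberg adjustment, as with p.adj = "BH" in the calls above
demo_p_adj <- p.adjust(demo_p, method = "BH")
demo_p_adj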

mydata %>%
    summary_factorlist(dependent = 'PreinvasiveComponent', 
                       explanatory = explanatory,
                       # column = TRUE,
                       total_col = TRUE,
                       p = TRUE,
                       add_dependent_label = TRUE,
                       na_include=FALSE
                       # catTest = catTestfisher
                       ) -> table

knitr::kable(table, row.names = FALSE, align = c('l', 'l', 'r', 'r', 'r'))

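# explanatory is a character vector of column names, so build the formula from it
table1 <- arsenal::tableby(
  stats::reformulate(explanatory, response = "PreinvasiveComponent"),
  data = mydata
)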

summary(table1)

knitr::kable(table1,
                         row.names = FALSE,
                         align = c('l', 'l', 'r', 'r', 'r', 'r'),
                         format = 'html') %>%
                kableExtra::kable_styling(kable_input = .,
                                          bootstrap_options = 'striped',
                                          full_width = F,
                                          position = 'left')
tangram::tangram(PreinvasiveComponent ~ explanatory, mydata)

tangram::html5(tangram::tangram(PreinvasiveComponent ~ explanatory, mydata),
                    fragment = TRUE,
                    inline = 'nejm.css',
                    caption = 'Cross Table PreinvasiveComponent NEJM Style',
                    id = 'tbl3')
tangram::html5(tangram::tangram(PreinvasiveComponent ~ explanatory, mydata),
                    fragment = TRUE,
                    inline = 'lancet.css',
                    caption = 'Cross Table PreinvasiveComponent Lancet Style',
                    id = 'tbl3')
dependent <- c("dependent1",
               "dependent2"
                 )

explanatory <- c("explanatory1",
                 "explanatory2"
                 )
mydataCategorical <- mydata %>% 
    select(-var1,
           -var2
    )
mydataCategorical_variable <- explanatory[1]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[2]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[3]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[4]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[5]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[6]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[7]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[8]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[9]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[10]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[11]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[12]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[13]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[14]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[15]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
mydataCategorical_variable <- NA
dependent2 <- NA
mydataCategorical_variable <- explanatory[16]
dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
source(here::here("R", "gc_plot_cat.R"))
## column chart
SmartEDA::ExpCatViz(
  Carseats,
  target = NULL,
  fname = NULL,
  clim = 10,
  col = NULL,
  margin = 2,
  Page = c(2, 1),
  sample = 2
)
## Stacked bar graph
SmartEDA::ExpCatViz(
  Carseats,
  target = "Urban",
  fname = NULL,
  clim = 10,
  col = NULL,
  margin = 2,
  Page = c(2, 1),
  sample = 2
)
## Variable importance graph using information values
SmartEDA::ExpCatStat(
  Carseats,
  Target = "Urban",
  result = "Stat",
  Pclass = "Yes",
  plot = TRUE,
  top = 20,
  Round = 2
)
inspectdf::inspect_cat(starwars) %>% inspectdf::show_plot()
inspectdf::inspect_cat(starwars) %>% 
  inspectdf::show_plot(high_cardinality = 1)
inspectdf::inspect_cat(star_1, star_2) %>% inspectdf::show_plot()
# mydataContinious
mydata %>%
    select(institution, starts_with("Slide")) %>%
    pivot_longer(cols = starts_with("Slide")) %>%
    ggplot(., aes(name, value)) -> p
p + geom_jitter() 
p + geom_jitter(aes(colour = institution)) 
dxchanges <- mydata %>%
    select(bx_no, starts_with("Slide")) %>% 
    filter(complete.cases(.)) %>%
    group_by(Slide1_infiltrative, Slide2_Medium, Slide3_Demarcated) %>% 
    tally()

library(ggalluvial)

ggplot(data = dxchanges,
       aes(axis1 = Slide1_infiltrative, axis2 = Slide2_Medium, axis3 = Slide3_Demarcated,
           y = n)) +
  scale_x_discrete(limits = c("Slide1", "Slide2", "Slide3"),
                   expand = c(.1, .05)
                   ) +
  xlab("Slide") +
  geom_alluvium(aes(fill = Slide1_infiltrative,
                    colour = Slide1_infiltrative
                    )) +
  geom_stratum() +
  geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("PanNET")

## Generate Boxplot by category
SmartEDA::ExpNumViz(
  mtcars,
  target = "gear",
  type = 2,
  nlim = 25,
  fname = file.path(here::here(), "Mtcars2"),
  Page = c(2, 2)
)
## Generate Density plot
SmartEDA::ExpNumViz(
  mtcars,
  target = NULL,
  type = 3,
  nlim = 25,
  fname = file.path(here::here(), "Mtcars3"),
  Page = c(2, 2)
)
## Generate Scatter plot
SmartEDA::ExpNumViz(
  mtcars,
  target = "carb",
  type = 3,
  nlim = 25,
  fname = file.path(here::here(), "Mtcars4"),
  Page = c(2, 2)
)
SmartEDA::ExpNumViz(mtcars, target = "am", scatter = TRUE)
library(ggplot2)
library(plotly)
library(gapminder)

p <- gapminder %>%
  filter(year==1977) %>%
  ggplot( aes(gdpPercap, lifeExp, size = pop, color=continent)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()

ggplotly(p)
scales::show_col(colours(), cex_label = .35)
gistr::gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e") %>% 
  gistr::embed() -> embedgist

# https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004

library(tidyr)
library(dplyr)
library(alluvial)
library(ggplot2)
library(forcats)

set.seed(42)
individual <- rep(LETTERS[1:10],each=2)
timeperiod <- paste0("time_",rep(1:2,10))
discretechoice <- factor(paste0("choice_",sample(letters[1:3],20, replace=T)))
continuouschoice <- ceiling(runif(20, 0, 100))
d <- data.frame(individual, timeperiod, discretechoice, continuouschoice)

# stacked bar diagram of discrete choice by individual
g <- ggplot(data=d,aes(timeperiod,fill=fct_rev(discretechoice)))
g + geom_bar(position="stack") + guides(fill=guide_legend(title=NULL))
# alluvial diagram of discrete choice by individual
d_alluvial <- d %>%
  select(individual,timeperiod,discretechoice) %>%
  spread(timeperiod,discretechoice) %>%
  group_by(time_1,time_2) %>%
  summarize(count=n()) %>%
  ungroup()
alluvial(select(d_alluvial,-count),freq=d_alluvial$count)
# stacked bar diagram of discrete choice, weighting by continuous choice
g + geom_bar(position="stack",aes(weight=continuouschoice))
library(ggalluvial)
ggplot(
  data = d,
  aes(
    x = timeperiod,
    stratum = discretechoice,
    alluvium = individual,
    y = continuouschoice
  )
) +
  geom_stratum(aes(fill = discretechoice)) +
  geom_flow()
 # use of strata and labels
ggplot(as.data.frame(Titanic),
       aes(y = Freq,axis1 = Class, axis2 = Sex, axis3 = Age)) +
  geom_flow() +
  scale_x_discrete(limits = c("Class", "Sex", "Age")) +
  geom_stratum() + 
  geom_text(stat = "stratum", infer.label = TRUE) +
  ggtitle("Alluvial plot of Titanic passenger demographic data")

# use of facets
ggplot(as.data.frame(Titanic),
       aes(y = Freq, axis1 = Class, axis2 = Sex)) +
  geom_flow(aes(fill = Age), width = .4) +
  geom_stratum(width = .4) +
  geom_text(stat = "stratum", infer.label = TRUE, size = 3) +
  scale_x_discrete(limits = c("Class", "Sex")) +
  facet_wrap(~ Survived, scales = "fixed")
# time series alluvia of WorldPhones 
wph <- as.data.frame(as.table(WorldPhones))
names(wph) <- c("Year", "Region", "Telephones")
ggplot(wph, aes(x = Year, alluvium = Region, y = Telephones)) +
  geom_flow(aes(fill = Region, colour = Region), width = 0)
# rightward flow aesthetics for vaccine survey data
data(vaccinations)
# reverse the level order without relabelling the observations
vaccinations$response <- factor(vaccinations$response,
                                levels = rev(levels(vaccinations$response)))

ggplot(vaccinations,
       aes(x = survey, 
           stratum = response, 
           alluvium = subject,
           y = freq, 
           fill = response,
           label = round(a, 3)
           )
       ) +
  geom_lode() + 
  geom_flow() +
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum")

CD44changes <- mydata %>%
    dplyr::select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>% 
    dplyr::filter(complete.cases(.)) %>%
    dplyr::group_by(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>% 
    dplyr::tally()

library(ggalluvial)

ggplot(data = CD44changes,
       aes(axis1 = TumorCD44, axis2 = TomurcukCD44,
           y = n)) +
  scale_x_discrete(limits = c("TumorCD44", "TomurcukCD44"),
                   expand = c(.1, .05)
                   ) +
  xlab("Tumor Tomurcuk") +
  geom_alluvium(aes(fill = PeritumoralTomurcukGr4,
                    colour = PeritumoralTomurcukGr4                    )) +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", infer.label = TRUE) +
  # geom_text(stat = 'alluvium', infer.label = TRUE) +
  theme_minimal() +
  ggtitle("Changes in CD44")
library(arsenal)
dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)


p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
summary(p)
dlookr::normality(carseats)
dlookr::normality(carseats, Sales, CompPrice, Income)
dlookr::normality(carseats, Sales:Income)
dlookr::normality(carseats, -(Sales:Income))
carseats %>%
  dlookr::normality() %>%
  dplyr::filter(p_value <= 0.01) %>% 
  arrange(abs(p_value))
carseats %>%
  group_by(ShelveLoc, US) %>%
  dlookr::normality(Income) %>% 
  arrange(desc(p_value))
carseats %>%
  mutate(log_income = log(Income)) %>%
  group_by(ShelveLoc, US) %>%
  dlookr::normality(log_income) %>%
  dplyr::filter(p_value > 0.01)
dlookr::plot_normality(carseats, Sales, CompPrice)
carseats %>%
  dplyr::filter(ShelveLoc == "Good") %>%
  group_by(US) %>%
  dlookr::plot_normality(Income)
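
The normality tables above depend on dlookr and the `carseats` data. As a rough base-R counterpart, the sketch below assumes that one Shapiro-Wilk test per numeric column is the quantity of interest and runs it on the built-in `mtcars` data. The column names `vars`, `statistic` and `p_value` are chosen only to resemble the calls above; they are an assumption, not dlookr's output format.

```r
# Built-in example data, used here only for illustration
numeric_cols <- mtcars[, c("mpg", "hp", "wt")]

# One Shapiro-Wilk test per column, collected into a small table
normality_tbl <- do.call(rbind, lapply(names(numeric_cols), function(v) {
  res <- shapiro.test(numeric_cols[[v]])
  data.frame(vars = v,
             statistic = unname(res$statistic),
             p_value = res$p.value)
}))

# Smallest p-values first: strongest evidence against normality
normality_tbl[order(normality_tbl$p_value), ]
```
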

mytable <- jmv::ttestIS(
    formula = HindexCTLA4 ~ PeritumoralTomurcukGr4,
    data = mydata,
    vars = HindexCTLA4,
    students = FALSE,
    mann = TRUE,
    norm = TRUE,
    meanDiff = TRUE,
    desc = TRUE,
    plots = TRUE)

cat("<pre class='jamovitable'>")
print(jtable(mytable$ttest))
cat("</pre>")
categ <- dlookr::target_by(carseats, US)
cat_cat <- dlookr::relate(categ, ShelveLoc)
cat_cat
summary(cat_cat)
plot(cat_cat)
## Summary statistics of categorical variables
SmartEDA::ExpCatStat(
  Carseats,
  Target = "Urban",
  result = "Stat",
  clim = 10,
  nlim = 5,
  Pclass = "Yes"
)
inspectdf::inspect_cat(star_1, star_2)
num <- dlookr::target_by(carseats, Sales)
num_num <- dlookr::relate(num, Price)
num_num
summary(num_num)
plot(num_num)
plot(num_num, hex_thres = 350)
## Information value and odds value
SmartEDA::ExpCatStat(
  Carseats,
  Target = "Urban",
  result = "IV",
  clim = 10,
  nlim = 5,
  Pclass = "Yes"
)
# library(OptimalCutpoints)
# https://tidymodels.github.io/yardstick/reference/roc_curve.html

roc_fit <- yardstick::roc_curve(mydata,
                                truth = "classification", 
                                estimate = "test",
                                na_rm = TRUE,
                                  options = list(
                                    smooth = FALSE,
                                    print.auc = TRUE,
                                    ret = c("all_coords")
                                    )
                                )

ggplot2::autoplot(roc_fit)

library(pROC)

m1 <- pROC::roc(mydata,
          "classification",
          "test",
          auc = TRUE, 
          ci = TRUE,
          # plot = TRUE,
          # percent=TRUE, 
          na.rm=TRUE,
          # smooth = TRUE,
          ret = "all_coords",
          # ret = "roc",
          quiet = FALSE,
          legacy.axes = TRUE,
          print.auc = TRUE
          # xlab = "False Positive",
          # ylab = "True Positive"
          )

m1

pROC::roc(mydata,
          "polyp_rec",
          "size",
          auc = TRUE, 
          ci = TRUE,
          # plot = TRUE,
          # percent=TRUE, 
          na.rm=TRUE,
          # smooth = TRUE,
          # ret = "all_coords",
          ret = "roc",
          quiet = FALSE,
          legacy.axes = TRUE,
          print.auc = TRUE
          # xlab = "False Positive",
          # ylab = "True Positive"
          )


which.max(m1$youden)
m1[which.max(m1$youden),]
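
The `which.max(m1$youden)` step picks the cut-off with the largest Youden index, conventionally defined as J = sensitivity + specificity - 1. The base-R sketch below computes it by hand on made-up values; `test_value`, `disease` and `roc_table` are hypothetical names, and a case is assumed to be positive when the test value is at or above the threshold.

```r
# Hypothetical toy data: test values and true classes (not from the study data)
test_value <- c(1, 2, 3, 4, 5, 6, 7, 8)
disease    <- c(0, 0, 0, 1, 0, 1, 1, 1)

# Candidate cut-offs: a case is called positive when test_value >= threshold
thresholds <- sort(unique(test_value))

roc_table <- do.call(rbind, lapply(thresholds, function(th) {
  predicted   <- as.integer(test_value >= th)
  sensitivity <- sum(predicted == 1 & disease == 1) / sum(disease == 1)
  specificity <- sum(predicted == 0 & disease == 0) / sum(disease == 0)
  data.frame(threshold   = th,
             sensitivity = sensitivity,
             specificity = specificity,
             youden      = sensitivity + specificity - 1)
}))

# which.max() returns the first row when several thresholds tie
best <- roc_table[which.max(roc_table$youden), ]
best
```

When two cut-offs share the maximum, as they do in this toy data, `which.max()` keeps only the first, so it can be worth inspecting the full table before reporting a single threshold.
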

roc_obj <- pROC::roc(polyp_rec ~ size,
          data = mydata,
          auc = TRUE,
          ci = TRUE,
          plot = TRUE,
          # percent=TRUE, 
          na.rm=TRUE,
          # smooth = TRUE,
          # ret = "all_coords",
          ret = "roc",
          quiet = FALSE,
          legacy.axes = TRUE,
          print.auc = TRUE,
          xlab = "False Positive",
          ylab = "True Positive"
          )
# devtools::install_github("sachsmc/plotROC")
library(plotROC)
# shiny_plotROC()


iris %>% explore::explain_tree(target = Species)


iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris %>% dplyr::select(-Species) %>% explore::explain_tree(target = is_versicolor)

iris %>% explore::explain_tree(target = Sepal.Length)
explore::explore(mydata)
mydata$int <- lubridate::interval(
  lubridate::ymd(mydata$SurgeryDate),
  lubridate::ymd(mydata$LastFollowUpDate)
  )
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
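# Alternative: if the data already contain an overall survival time
# (column genel_sagkalim), use it directly; this overwrites the value computed above
mydata$OverallTime <- mydata$genel_sagkalim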
## Recoding mydata$Death into mydata$Outcome
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death),
               "1" = "TRUE",
               "0" = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
table(mydata$Death, mydata$Outcome)
library(survival)
# data(lung)
# km <- with(lung, Surv(time, status))
km <- with(mydata, Surv(OverallTime, Outcome))
head(km,80)
plot(km)
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "LVI"

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = dependentKM,
                      explanatory = explanatoryKM,
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = "Surv(OverallTime, Outcome)",
                      explanatory = "LVI",
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )
library(finalfit)
library(survival)
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"

mydata %>%
finalfit::finalfit(dependentUni, explanatoryUni) -> tUni

knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))
tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% 
  janitor::clean_names() 

tUni_df_descr <- paste0("When ",
                        tUni_df$dependent_surv_overall_time_outcome[1],
                        " is ",
                        tUni_df$x[2],
                        ", there is ",
                        tUni_df$hr_univariable[2],
                        " times risk than ",
                        "when ",
                        tUni_df$dependent_surv_overall_time_outcome[1],
                        " is ",
                        tUni_df$x[1],
                        "."
                        )
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit
plot(km_fit)
# summary(km_fit)
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% 
  janitor::clean_names() %>% 
  tibble::rownames_to_column()
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
  tibble::rownames_to_column()

names(km_fit_median_df) <- paste0("m", 1:dim(km_fit_median_df)[2])

km_fit_median_definition <- km_fit_median_df %>%
  dplyr::mutate(
    description =
      glue::glue(
        "When {m1}, median survival is {m8} [{m9} - {m10}, 95% CI] months."
      )
  ) %>%
  dplyr::select(description) %>%
  dplyr::pull()
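sTable <- summary(km_fit)$table
# summary()$table is a named vector for a single curve and a matrix otherwise;
# coerce to a one-row matrix so both cases are handled the same way
if (is.null(dim(sTable))) sTable <- t(sTable)

st <- data.frame()

for (i in seq_len(nrow(sTable))) {
    g <- sTable[i, ]
    nevents <- unname(g["events"])
    n <- unname(g["n.max"])
    # the restricted-mean column is named "*rmean" or "rmean",
    # depending on the installed survival version
    rmean <- unname(g[grep("rmean$", names(g))[1]])

    # collect one row per stratum instead of printing it
    st <- rbind(st, data.frame(
        rowNo = i,
        censored = n - nevents,
        events = nevents,
        n = n,
        prop = nevents / n,
        median = unname(g["median"]),
        mean = rmean))
}

results1 <- st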
km_fit
broom::tidy(km_fit)
km_fit_median_df %>% 
  dplyr::mutate(
    description = 
      glue::glue(
      "When {rowname}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months."
    )
  ) %>% 
  dplyr::select(description) %>% 
  dplyr::pull() -> km_fit_median_definition
summary(km_fit, times = c(12,36,60))
km_fit_summary <- summary(km_fit, times = c(12,36,60))

km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")])
km_fit_df %>% 
  dplyr::mutate(
    description = 
      glue::glue(
      "When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."
    )
  ) %>% 
  dplyr::select(description) %>% 
  dplyr::pull() -> km_fit_definition
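
Because `mydata` is not distributed with this page, the same idea can be tried on public data. The sketch below assumes the `lung` data set shipped with the survival package (time in days, so the time points are 180 and 365 days rather than 12, 36 and 60 months) and uses base `sprintf()` in place of glue and scales; it illustrates the pattern and does not reproduce the report's results.

```r
library(survival)  # recommended package, normally installed with R

# Public example data from the survival package, not the study data
fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit, times = c(180, 365))

# One sentence per stratum and time point
sentences <- sprintf(
  "When %s, %d-day survival is %.1f%% [%.1f%%-%.1f%%, 95%% CI].",
  as.character(s$strata), as.integer(s$time),
  100 * s$surv, 100 * s$lower, 100 * s$upper
)
sentences
```
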
library(survival)
surv_fit <- survival::survfit(Surv(time, status) ~ ph.ecog, data=lung)
insight::is_model_supported(surv_fit)
insight::find_formula(surv_fit)
report::report_participants(mydata)
dependentKM <-  "Surv(OverallTime, Outcome2)"

explanatoryKM <- c("explanatory1",
               "explanatory2"
               )
source(here::here("R", "gc_survival.R"))
mydependent <-  "Surv(time, status)"
explanatory <- "Organ"

mysurvival <- function(mydata, mydependent, explanatory) {
    {{mydata}} %>%
        finalfit::surv_plot(dependent = {{mydependent}},
                            explanatory = {{explanatory}},
                            xlab='Time (months)',
                            pval=TRUE,
                            legend = 'none',
                            break.time.by = 12,
                            xlim = c(0,60)
        )
}


# library(tidyverse)
mysurvival(mydata = whippleables, mydependent = mydependent, explanatory = explanatory)

explanatory <- c("Organ", "LVI")

deneme <- purrr::map(explanatory, mysurvival, mydata = whippleables, mydependent = mydependent)

dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "TStage"

mydata %>%
  finalfit::surv_plot(.data = .,
                      dependent = dependentKM,
                      explanatory = explanatoryKM,
                      xlab='Time (months)',
                      pval=TRUE,
                      legend = 'none',
                      break.time.by = 12,
                      xlim = c(0,60)
                      # legend.labs = c('a','b')
                      )
survminer::pairwise_survdiff(
  formula = Surv(OverallTime, Outcome) ~ TStage, 
                             data = mydata,
                             p.adjust.method = "BH"
  )
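
The `p.adjust.method = "BH"` argument asks for a Benjamini-Hochberg correction of the pairwise p-values. A minimal base-R illustration of that correction, using made-up p-values rather than results from the data (`raw_p` and `adjusted_p` are hypothetical names):

```r
# Hypothetical raw p-values from three pairwise comparisons
raw_p <- c(0.01, 0.04, 0.03)

# Benjamini-Hochberg adjustment; results are returned in the input order
adjusted_p <- p.adjust(raw_p, method = "BH")
adjusted_p
```

Adjusted values are never smaller than the raw ones, so a comparison that is borderline before correction may no longer fall below the chosen significance level afterwards.
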
km_fit
print(km_fit, 
      scale=1,
      digits = max(options()$digits - 4,3),
      print.rmean=getOption("survfit.print.rmean"),
      rmean = getOption('survfit.rmean'),
      print.median=getOption("survfit.print.median"),
      median = getOption('survfit.median')

      )
library(finalfit)
library(survival)
explanatoryMultivariate <- explanatoryKM
dependentMultivariate <- dependentKM

mydata %>%
  finalfit(dependentMultivariate, explanatoryMultivariate, metrics=TRUE) -> tMultivariate

knitr::kable(tMultivariate, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
# https://tidymodels.github.io/parsnip/reference/surv_reg.html
library(parsnip)
surv_reg()
#> Parametric Survival Regression Model Specification (regression)
#>
# Parameters can be represented by a placeholder:
surv_reg(dist = varying())

#> Parametric Survival Regression Model Specification (regression)
#> 
#> Main Arguments:
#>   dist = varying()
#> 
model <- surv_reg(dist = "weibull")
model
#> Parametric Survival Regression Model Specification (regression)
#> 
#> Main Arguments:
#>   dist = weibull
#>
update(model, dist = "lnorm")
#> Parametric Survival Regression Model Specification (regression)
#> 
#> Main Arguments:
#>   dist = lnorm
#> 


# From randomForest
rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 12, 
  num.trees = 2000, 
  importance = 'impurity'
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 12,
  num.trees = 2000
)




rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("ranger", importance = 'impurity') %>%
  fit(y ~ ., data = dat)


rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("spark") %>%
  fit(y ~ ., data = dat)



mb_followup$OverallTime <- mb_followup$months
mb_followup$Outcome <- mb_followup$`rec(1,0)`
mb_followup$Operation <- mb_followup$`op type (1,2,3)`

## Recoding mb_followup$Operation
mb_followup$Operation <- as.character(mb_followup$Operation)
mb_followup$Operation <- forcats::fct_recode(mb_followup$Operation,
               "Type3" = "3",
               "Type2" = "2",
               "Type1" = "1")


## Reordering mb_followup$Operation
mb_followup$Operation <- factor(mb_followup$Operation, levels=c("Type3", "Type2", "Type1"))

library(magrittr)
mb_followup %$% table(Operation, `op type (1,2,3)`) 

library(survival)
library(survminer)
library(finalfit)
mb_followup %>%
  finalfit::surv_plot('Surv(OverallTime, Outcome)', 'Operation', 
  xlab='Time (months)', pval=TRUE, legend = 'none',
  # pval.coord
    break.time.by = 12, xlim = c(0,60), ylim = c(0.8, 1)

# legend.labs = c('a','b')

)
explanatoryUni <- 'Operation'
dependentUni <- 'Surv(OverallTime, Outcome)'
mb_followup %>%
finalfit(dependentUni, explanatoryUni) -> tUni

knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))

tUni_df <- tibble::as_tibble(tUni, .name_repair = 'minimal') %>%
janitor::clean_names(dat = ., case = 'snake')


n_level <- dim(tUni_df)[1]

tUni_df_descr <- function(n) {
    paste0(
        'When ',
        tUni_df$dependent_surv_overall_time_outcome[1],
        ' is ',
        tUni_df$x[n + 1],
        ', there is ',
        tUni_df$hr_univariable[n + 1],
        ' times risk than ',
        'when ',
        tUni_df$dependent_surv_overall_time_outcome[1],
        ' is ',
        tUni_df$x[1],
        '.'
    )

}



# one description per non-reference level (rows 2..n_level of tUni_df)
results5 <- purrr::map(.x = seq_len(n_level - 1), .f = tUni_df_descr)

print(unlist(results5))

km_fit <- survfit(Surv(OverallTime, Outcome) ~ Operation, data = mb_followup)

# km_fit

# summary(km_fit)

km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
    janitor::clean_names(dat = ., case = 'snake') %>%
    tibble::rownames_to_column(.data = ., var = 'Derece')

km_fit_median_df

# km_fit_median_df %>% 
#   knitr::kable(format = "latex") %>% 
#   kableExtra::kable_styling(latex_options="scale_down")

km_fit_median_df %>%
    dplyr::mutate(
        description =
        glue::glue(
        'When, Derece, {Derece}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.'
)
    ) %>%
        dplyr::mutate(
description = gsub(pattern = 'thefactor=', replacement = ' is ', x = description)
        ) %>%
    dplyr::select(description) %>%
    dplyr::pull() -> km_fit_median_definition

# km_fit_median_definition




summary(km_fit, times = c(12,36,60))

km_fit_summary <- summary(km_fit, times = c(12,36,60))

km_fit_df <- as.data.frame(km_fit_summary[c('strata', 'time', 'n.risk', 'n.event', 'surv', 'std.err', 'lower', 'upper')])

km_fit_df %>% 
  knitr::kable(format = "latex") %>% 
  kableExtra::kable_styling(latex_options="scale_down")




km_fit_df %>%
    dplyr::mutate(
        description =
glue::glue(
    'When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].'
)
    ) %>%
    dplyr::select(description) %>%
    dplyr::pull() -> km_fit_definition

km_fit_definition

    survminer::pairwise_survdiff(
    formula = Surv(OverallTime, Outcome) ~ Operation,
    data = mb_followup,
    p.adjust.method = 'BH'
)
library(gt)
library(gtsummary)

library(survival)
fit1 <- survfit(Surv(ttdeath, death) ~ trt, trial)
tbl_strata_ex1 <-
  tbl_survival(
    fit1,
    times = c(12, 24),
    label = "{time} Months"
  )

fit2 <- survfit(Surv(ttdeath, death) ~ 1, trial)
tbl_nostrata_ex2 <-
  tbl_survival(
    fit2,
    probs = c(0.1, 0.2, 0.5),
    header_estimate = "**Months**"
  )





library(survival)
library(survminer)
library(finalfit)

mydata %>%
  finalfit::surv_plot('Surv(OverallTime, Outcome)', 'LVI', 
  xlab='Time (months)', pval=TRUE, legend = 'none',
    break.time.by = 12, xlim = c(0,60)

# legend.labs = c('a','b')

)
explanatoryUni <- 'LVI'
dependentUni <- 'Surv(OverallTime, Outcome)'
mydata %>%
finalfit(dependentUni, explanatoryUni, metrics=TRUE) -> tUni

knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))

tUni_df <- tibble::as_tibble(tUni, .name_repair = 'minimal') %>%
janitor::clean_names(dat = ., case = 'snake')


n_level <- dim(tUni_df)[1]

tUni_df_descr <- function(n) {
    paste0(
        'When ',
        tUni_df$dependent_surv_overall_time_outcome[1],
        ' is ',
        tUni_df$x[n + 1],
        ', there is ',
        tUni_df$hr_univariable[n + 1],
        ' times risk than ',
        'when ',
        tUni_df$dependent_surv_overall_time_outcome[1],
        ' is ',
        tUni_df$x[1],
        '.'
    )

}



# one description per non-reference level (rows 2..n_level of tUni_df)
results5 <- purrr::map(.x = seq_len(n_level - 1), .f = tUni_df_descr)

print(unlist(results5))

km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit

km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
    janitor::clean_names(dat = ., case = 'snake') %>%
    tibble::rownames_to_column(.data = ., var = 'LVI')



km_fit_median_df %>%
    dplyr::mutate(
        description =
        glue::glue(
        'When, LVI, {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.'
)
    ) %>%
        dplyr::mutate(
description = gsub(pattern = 'thefactor=', replacement = ' is ', x = description)
        ) %>%
    dplyr::select(description) %>%
    dplyr::pull() -> km_fit_median_definition

km_fit_median_definition




summary(km_fit, times = c(12,36,60))

km_fit_summary <- summary(km_fit, times = c(12,36,60))

km_fit_df <- as.data.frame(km_fit_summary[c('strata', 'time', 'n.risk', 'n.event', 'surv', 'std.err', 'lower', 'upper')])

km_fit_df




km_fit_df %>%
    dplyr::mutate(
        description =
glue::glue(
    'When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].'
)
    ) %>%
    dplyr::select(description) %>%
    dplyr::pull() -> km_fit_definition

km_fit_definition


summary(km_fit)$table

km_fit_median_df <- summary(km_fit)
results1html <- as.data.frame(km_fit_median_df$table) %>%
    janitor::clean_names(dat = ., case = 'snake') %>%
    tibble::rownames_to_column(.data = ., var = 'LVI')

results1html[,1] <- gsub(pattern = 'thefactor=',
 replacement = '',
 x = results1html[,1])

knitr::kable(results1html,
 row.names = FALSE,
 align = c('l', rep('r', 9)),
 format = 'html',
 digits = 1)

survminer::pairwise_survdiff(
    formula = Surv(OverallTime, Outcome) ~ LVI,
    data = mydata,
    p.adjust.method = 'BH'
)
library("shiny")
library("dplyr")
library("magrittr")
library("viridis")
library("readxl")
library("survival")
library("survminer")
library("finalfit")
library("glue")
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
mydata$int <- lubridate::interval(
  lubridate::ymd(mydata$SurgeryDate),
  lubridate::ymd(mydata$LastFollowUpDate)
  )
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)

mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death),
               "1" = "TRUE",
               "0" = "FALSE")

mydata$Outcome <- as.numeric(as.character(mydata$Outcome))

mydata %>% 
  select(-ID,
         -Name) %>% 
  inspectdf::inspect_types() %>% 
  dplyr::filter(type == "character") %>% 
  dplyr::select(col_name) %>% 
  pull() %>% 
  unlist() -> characterVariables
selectInput(
  inputId = "Factor",
  label = "Choose a Factor Affecting Survival",
  choices = characterVariables,
  selected = "LVI"
)


dependentKM <- "Surv(OverallTime, Outcome)"


renderPrint({

  print(input$Factor)


})

tags$b("Kaplan-Meier Plot, Log-Rank Test")
tags$br()

renderPlot({


  mydata %>%
    finalfit::surv_plot(
      .data = .,
      dependent = dependentKM,
      explanatory = input$Factor,
      xlab = 'Time (months)',
      pval = TRUE,
      legend = 'none',
      break.time.by = 12,
      xlim = c(0, 60)
    )

})



tags$b("Univariate Cox-Regression")
tags$br()


renderPrint({

  mydata %>%
    finalfit::finalfit(dependentKM, input$Factor) -> tUni

  knitr::kable(tUni[, 1:4],
               row.names = FALSE,
               align = c('l', 'l', 'r', 'r', 'r', 'r'))


})




tags$b("Median Survival")
tags$br()


renderPrint({

  formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)

  km_fit <- survfit(as.formula(formula_text),
                              data = mydata)

  km_fit

})




tags$b("1-3-5-yr Survival")
tags$br()


renderPrint({

  formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)

  km_fit <- survfit(as.formula(formula_text),
                              data = mydata)

  summary(km_fit, times = c(12, 36, 60))

})


renderPrint({


  formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)

  km_fit <- survfit(as.formula(formula_text),
                              data = mydata)

km_fit_summary <- summary(km_fit, times = c(12,36,60))

km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")])

km_fit_df %>% 
  dplyr::mutate(
    description = 
      glue::glue(
      "When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."
    )
  ) %>% 
  dplyr::select(description) %>% 
  pull()

})

#  https://easystats.github.io/correlation/
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
correlation::correlation(iris)
library(dplyr)

iris %>% 
  select(Species, starts_with("Sepal")) %>% 
  group_by(Species) %>% 
  correlation::correlation() %>% 
  filter(r < 0.9)

correlation::correlation(select(iris, Species, starts_with("Sepal")),
            select(iris, Species, starts_with("Petal")),
            partial=TRUE)

correlation(iris, bayesian=TRUE)
library(report)
iris %>% 
  select(starts_with("Sepal")) %>% 
  correlation::correlation(bayesian=TRUE) %>% 
  report()
report::report(cor.test(iris$Sepal.Length, iris$Petal.Length))

iris %>% 
  group_by(Species) %>% 
  correlation() %>% 
  report() %>% 
  to_table()
iris %>% explore(Sepal.Length, Petal.Length)

iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris %>% explore(Sepal.Length, Petal.Length, target = is_versicolor)
dlookr::correlate(carseats)
dlookr::correlate(carseats, Sales, CompPrice, Income)
dlookr::correlate(carseats, Sales:Income)
dlookr::correlate(carseats, -(Sales:Income))
carseats %>%
  dlookr::correlate(Sales:Income) %>%
  dplyr::filter(as.integer(var1) > as.integer(var2))
carseats %>%
  dplyr::filter(ShelveLoc == "Good") %>%
  group_by(Urban, US) %>%
  dlookr::correlate(Sales) %>%
  dplyr::filter(abs(coef_corr) > 0.5)
dlookr::plot_correlate(carseats)
dlookr::plot_correlate(carseats, Sales, Price)
carseats %>%
  dplyr::filter(ShelveLoc == "Good") %>%
  dplyr::group_by(Urban, US) %>%
  dlookr::plot_correlate(Sales)
## Summary statistics by – overall with correlation
SmartEDA::ExpNumStat(
  Carseats,
  by = "A",
  gp = "Price",
  Qnt = seq(0, 1, 0.1),
  MesofShape = 1,
  Outlier = TRUE,
  round = 2
)
# https://alastairrushworth.github.io/inspectdf/articles/pkgdown/inspect_cor_exampes.html
inspectdf::inspect_cor(storms)

inspectdf::inspect_cor(storms) %>% inspectdf::show_plot()

inspectdf::inspect_cor(storms, storms[-c(1:200), ])

inspectdf::inspect_cor(storms, storms[-c(1:200), ]) %>% 
  slice(1:20) %>%
  inspectdf::show_plot()

cor %>% 
    report::to_values()
mydata %>%
  select(continiousVariables,
         -dateVariables) %>% 
visdat::vis_cor()
library(report)
model <- lm(Sepal.Length ~ Species, data = iris)
report::report(model)
# Table report for a linear model
lm(Sepal.Length ~ Petal.Length + Species, data=iris) %>% 
  report::report() %>% 
  report::to_table() %>% 
  kableExtra::kable()
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory)
num_cat <- dlookr::relate(num, ShelveLoc)
num_cat
summary(num_cat)
plot(num_cat)

my_text <- kableExtra::text_spec("Some Text", 
                                 color = "red",
                                 background = "yellow"
                                 )
# `r my_text`
mylongtext <- paste("Statistical Methods:

Mean, standard deviation, median, minimum and maximum values were reported for continuous data. Frequency tables were produced for categorical data and for grouped continuous data. For the overall survival analysis, the date of death and the date of the last visit were obtained from patient records.
Kaplan-Meier plots, the log-rank test and Cox regression were used in the survival analysis. Analyses were performed with R-project (version 3.6.0) and RStudio, using the survival and finalfit packages. A p value of 0.05 was accepted as the threshold for significance.


R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Therneau T (2015). A Package for Survival Analysis in S. version 2.38, https://CRAN.R-project.org/package=survival

Terry M. Therneau, Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.

Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. R package version 0.9.6. https://github.com/ewenharrison/finalfit"
)


mylongtext <- strwrap(mylongtext)

# `r mylongtext`

boxplot(1:10)
plot(rnorm(10))
ggplot2::ggplot(mtcars,
                ggplot2::aes(x=mpg)
                ) + 
ggplot2::geom_histogram(fill="skyblue", alpha=0.5) + 
ggplot2::theme_minimal()
Block rmdnote

Block rmdtip

Block warning

projectName <- list.files(path = here::here(), pattern = "Rproj")
projectName <- gsub(pattern = ".Rproj", replacement = "", x = projectName)

analysisDate <- as.character(Sys.Date())

imageName <- paste0(projectName, analysisDate, ".RData")

save.image(file = here::here("data", imageName))

rdsName <- paste0(projectName, analysisDate, ".rds")

readr::write_rds(x = mydata, path = here::here("data", rdsName))

saveRDS(object = mydata, file = here::here("data", rdsName))

excelName <- paste0(projectName, analysisDate, ".xlsx")

rio::export(
  x = mydata,
  file = here::here("data", excelName),
  format = "xlsx"
)

# writexl::write_xlsx(mydata, here::here("data", excelName))

print(glue::glue(
    "saved data after analysis to ",
    rownames(file.info(here::here("data", excelName))),
    " : ",
    as.character(
        file.info(here::here("data", excelName))$ctime
    )
    )
)
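
The dated file-name pattern used above can be tried without touching the project's `data` folder. The sketch below is a base-R round trip that assumes a hypothetical project name and writes to a temporary directory, with the built-in `mtcars` data standing in for `mydata`.

```r
# Hypothetical project name; the report derives it from the .Rproj file instead
project_name <- "example-project"
analysis_date <- as.character(Sys.Date())

# Write to a temporary directory so nothing in the project is overwritten
rds_path <- file.path(tempdir(), paste0(project_name, "_", analysis_date, ".rds"))

saveRDS(object = mtcars, file = rds_path)
restored <- readRDS(rds_path)

# TRUE when the object read back matches the one that was saved
identical(restored, mtcars)
```
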
mydata %>% 
  downloadthis::download_this(
    output_name = excelName,
    output_extension = ".csv",
    button_label = "Download data as csv",
    button_type = "default"
  )

mydata %>% 
  downloadthis::download_this(
    output_name = excelName,
    output_extension = ".xlsx",
    button_label = "Download data as xlsx",
    button_type = "primary"
  )
# pacman::p_load(here, lubridate, glue)
# here::here("data", glue("{today()}_trends.csv"))
# mydata %>% select(
#     -c(
#         rapor_yil,
#         rapor_no,
#         protokol_no,
#         istek_tarihi,
#         nux_yada_met_varsa_tarihi,
#         son_hastane_vizit_tarihi,
#         Outcome
#     )
# ) -> finalSummary
# 
# summarytools::view(summarytools::dfSummary(x = finalSummary
#                                            , style = "markdown"))
citation()
report::cite_packages(session = sessionInfo())
report::show_packages(session = sessionInfo()) %>% 
    kableExtra::kable()
# citation("tidyverse")
citation("readxl")
citation("janitor")
# citation("report")
citation("finalfit")
# citation("ggstatsplot")
if(!dir.exists(here::here("bib"))) {dir.create(here::here("bib"))}

knitr::write_bib(x = c(.packages(), "knitr", "shiny"),
                 file = here::here("bib", "packages.bib")
)
sessionInfo()
pacman::p_loaded(all = TRUE)
search()
library()
installed.packages()[1:5, c("Package", "Version")]
installed.packages()

\pagebreak

push all changes to GitHub repository

source(file = here::here("R", "force_git.R"))

References

sbalci/histopathology-template documentation built on June 29, 2023, 5:52 a.m.


