---
title: "Histopathology Research Template"
description: |
  Codes Used in Histopathology Research
  Data Report for Histopathology Research
  Example Using Random Generated Fakedata
author:
  - name: Serdar Balci, MD, Pathologist
    url: https://sbalci.github.io/histopathology-template/
    affiliation: serdarbalci.com
    affiliation_url: https://www.serdarbalci.com/
date: "2020-05-13"
mail: drserdarbalci@gmail.com
linkedin: "serdar-balci-md-pathologist"
twitter: "serdarbalci"
github: "sbalci"
home: "https://www.serdarbalci.com/"
header-includes:
  - \usepackage{pdflscape}
  - \newcommand{\blandscape}{\begin{landscape}}
  - \newcommand{\elandscape}{\end{landscape}}
  - \usepackage{xcolor}
  - \usepackage{afterpage}
  - \renewcommand{\linethickness}{0.05em}
  - \usepackage{booktabs}
  - \usepackage{sectsty} \allsectionsfont{\nohang\centering \emph}
  - \usepackage{float}
  - \usepackage{svg}
always_allow_html: yes
output:
  html_document:
    toc: yes
    toc_float: yes
    number_sections: yes
    fig_caption: yes
    keep_md: yes
    highlight: kate
    theme: readable
    code_folding: "hide"
    includes:
      after_body: _footer.html
    css: css/style.css
  prettydoc::html_pretty:
    theme: leonids
    highlight: vignette
    toc: true
    number_sections: yes
    css: css/style.css
    includes:
      after_body: _footer.html
  rmarkdown::html_vignette:
    css:
      - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
  redoc::redoc:
    highlight_outputs: TRUE
    margins: 1
    line_numbers: FALSE
  distill::distill_article:
    toc: true
  pdf_document:
    fig_caption: yes
    highlight: kate
    number_sections: yes
    toc: yes
    latex_engine: lualatex
    toc_depth: 5
    keep_tex: yes
    includes:
      in_header: highlight_echo.tex
vignette: >
  %\VignetteIndexEntry{Histopathology Research Template}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
bibliography: bib/template.bib
---
https://doi.org/10.5281/zenodo.3635430
Histopathology Research Template 🔬
Describe Materials and Methods as highlighted in [@Knijn2015].^[From Table 1, "Proposed items for reporting histopathology studies", in: Recommendations for reporting histopathology studies: a proposal. Virchows Arch (2015) 466:611–615. DOI 10.1007/s00428-015-1762-3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460276/]
Describe patient characteristics, and inclusion and exclusion criteria
Describe treatment details
Describe the type of material used
Specify how expression of the biomarker was assessed
Describe the number of independent (blinded) scorers and how they scored
State the method of case selection, study design, origin of the cases, and time frame
Describe the end of the follow-up period and median follow-up time
Define all clinical endpoints examined
Specify all applied statistical methods
Describe how interactions with other clinical/pathological factors were analyzed
Code for general settings.^[See `childRmd/_01header.Rmd` for other general settings.]
Set up global chunk settings.^[Set `echo = FALSE` to hide code after knitting, `cache = TRUE` to knit faster, and `error = TRUE` to continue rendering when errors occur.]
knitr::opts_chunk$set(
eval = TRUE,
echo = TRUE,
fig.path = here::here("figs/"),
message = FALSE,
warning = FALSE,
error = TRUE,
cache = TRUE,
comment = NA,
tidy = TRUE,
fig.width = 6,
fig.height = 4
)
library(knitr)
hook_output = knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
# this hook is used only when the linewidth option is not NULL
if (!is.null(n <- options$linewidth)) {
x = knitr:::split_lines(x)
# any lines wider than n should be wrapped
if (any(nchar(x) > n))
x = strwrap(x, width = n)
x = paste(x, collapse = "\n")
}
hook_output(x, options)
})
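The hook above only acts on chunks that set a `linewidth` chunk option (for example `linewidth = 60`). The wrapping step itself can be tried outside knitr with a small stand-alone sketch. The helper name `wrap_long_lines` is hypothetical and uses base R `strsplit()` in place of the internal `knitr:::split_lines()`.

```r
# Hypothetical helper (not part of knitr): wrap lines longer than `n` characters.
wrap_long_lines <- function(x, n) {
  # split a single string into its individual lines
  lines <- unlist(strsplit(x, "\n", fixed = TRUE))
  # only wrap when at least one line is wider than n
  if (any(nchar(lines) > n)) {
    lines <- strwrap(lines, width = n)
  }
  paste(lines, collapse = "\n")
}

# A single long line of 30 repeated words
long_text <- paste(rep("word", 30), collapse = " ")
wrapped <- wrap_long_lines(long_text, n = 40)
cat(wrapped, "\n")
```

Note that `strwrap()` re-flows text at word boundaries, so it suits prose-like output better than aligned tables.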
# linewidth css
pre:not([class]) {
color: #333333;
background-color: #cccccc;
}
# linewidth css
pre.jamovitable{
color:black;
background-color: white;
margin-bottom: 35px;
}
jtable <- function(jobject, digits = 3) {
    # map the column names of the jamovi results object to their display titles
    snames <- sapply(jobject$columns, function(a) a$title)
    asDF <- jobject$asDF
    tnames <- unlist(lapply(names(asDF), function(n) snames[[n]]))
    names(asDF) <- tnames
    # pass the digits argument through instead of hard-coding 3
    kableExtra::kable(asDF, "html", table.attr = "class=\"jmv-results-table-table\"",
        row.names = FALSE, digits = digits)
}
Block rmdnote
Block rmdtip
Block warning
Load Library
See `R/loadLibrary.R` for the libraries loaded.
source(file = here::here("R", "loadLibrary.R"))
Code for generating fake data.^[See `childRmd/_02fakeData.Rmd` for other code.]
Generate Fake Data
This code generates fake histopathological data. Some sources for fake data generation are here^[Synthea. The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures. BMC Med Inform Decis Mak 19, 44 (2019). doi:10.1186/s12911-019-0793-0], here^[https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0793-0], here^[Synthetic Patient Generation], here^[Basic Setup and Running], here^[intelligent patient data generator (iPDG)], here^[https://medium.com/free-code-camp/how-our-test-data-generator-makes-fake-data-look-real-ace01c5bde4a], here^[https://forums.librehealth.io/t/demo-data-generation/203], here^[https://mihin.org/services/patient-generator/], and here^[To do: merge with the lung, cancer, and breast datasets].
Use this code to generate fake clinicopathologic data
source(file = here::here("R", "gc_fake_data.R"))
wakefield::table_heat(x = fakedata, palette = "Set1", flip = TRUE, print = TRUE)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/plot fake data-1.png)
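The script `R/gc_fake_data.R` is not reproduced in this report. As a rough idea of what such a generator can look like, here is a minimal base R sketch. The data frame name `fake_sketch`, the column subset, and the sampling probabilities are assumptions chosen for illustration, not the template's actual code.

```r
set.seed(123)  # fixed seed so the sketch is reproducible
n <- 250

# Hypothetical stand-in for R/gc_fake_data.R, which is not shown in this report
fake_sketch <- data.frame(
  ID    = sprintf("%03d", seq_len(n)),
  Sex   = sample(c("Male", "Female"), n, replace = TRUE),
  Age   = sample(25:73, n, replace = TRUE),
  # the 60/40 split is an arbitrary illustrative choice
  LVI   = sample(c("Absent", "Present"), n, replace = TRUE, prob = c(0.6, 0.4)),
  Death = sample(c(TRUE, FALSE), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Blank out one random cell per selected column to mimic missing values
for (col in c("Sex", "Age", "LVI", "Death")) {
  fake_sketch[[col]][sample(n, 1)] <- NA
}

str(fake_sketch)
```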
Code for importing data.^[See `childRmd/_03importData.Rmd` for other code.]
Read the data
library(readxl)
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
# View(mydata) # Use to view data after importing
TODO: add code for importing multiple data files and combining them with `purrr::reduce()`.
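One possible way to import several files and merge them on a shared key is sketched here. It assumes every file has an `ID` column. The file names, the temporary folder, and the use of CSV instead of Excel are hypothetical choices that keep the example self-contained.

```r
library(purrr)
library(dplyr)

# Write two small hypothetical tables to a temporary folder so the sketch runs on its own
tmp_dir <- file.path(tempdir(), "multi_import_sketch")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(ID = c("001", "002"), Age = c(30, 32)),
          file.path(tmp_dir, "clinical.csv"), row.names = FALSE)
write.csv(data.frame(ID = c("001", "002"), Grade = c("3", "1")),
          file.path(tmp_dir, "pathology.csv"), row.names = FALSE)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, keeping ID as character so leading zeros survive
tables <- purrr::map(files, read.csv, colClasses = c(ID = "character"))

# Join all tables pairwise on the shared key column
combined <- purrr::reduce(tables, dplyr::full_join, by = "ID")
combined
```

For Excel files, `read.csv` could be swapped for `readxl::read_excel`, with the file pattern changed to match `.xlsx`.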
Code for reporting general features.^[See `childRmd/_04briefSummary.Rmd` for other code.]
Dataframe Report
# Dataframe report
mydata %>% dplyr::select(-contains("Date")) %>% report::report(.)
The data contains 250 observations of the following variables:
- ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)
- Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)
- Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)
- Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing
- Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)
- PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)
- LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)
- PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
- Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)
- Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)
- Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)
- TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)
- AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing
- AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing
- LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)
- Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)
- Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)
- Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)
- DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)
mydata %>% explore::describe_tbl()
250 observations with 21 variables
18 variables containing missings (NA)
0 variables with no variance
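The same kind of summary can be approximated in base R. This is a sketch on a small hypothetical data frame `df`. The definitions used here (any `NA` in a column, at most one distinct non-missing value) are assumptions and may differ from what `explore::describe_tbl()` computes internally.

```r
# Small hypothetical data frame standing in for `mydata`
df <- data.frame(
  Sex = c("Male", NA, "Female"),
  Age = c(30, 32, NA),
  ID  = c("001", "002", "003"),
  stringsAsFactors = FALSE
)

# Number of missing values per column
na_per_column <- colSums(is.na(df))

# Columns with at least one NA
names(na_per_column)[na_per_column > 0]

# Columns with no variance (at most one distinct non-missing value)
no_variance <- vapply(df, function(x) length(unique(x[!is.na(x)])) <= 1, logical(1))
names(no_variance)[no_variance]
```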
\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
Always Respect Patient Privacy - Health Information Privacy^[https://www.hhs.gov/hipaa/index.html] - Protection of Personal Data (Kişisel Verilerin Korunması)^[Recording personal data, and unlawfully disclosing or obtaining personal data, are defined as crimes in the Turkish legal system under Articles 135 and 136 of the Turkish Penal Code. The penalty for recording personal data is 1 to 3 years of imprisonment. The aggravated form of the offense is when it is committed by a public official abusing the authority of their office, or by exploiting the convenience afforded by a particular profession or trade; in that case the penalty is 1.5 to 4.5 years of imprisonment.]
} }
Code for defining variable types.^[See `childRmd/_06variableTypes.Rmd` for other code.]
Print column names as a vector
dput(names(mydata))
c("ID", "Name", "Sex", "Age", "Race", "PreinvasiveComponent",
"LVI", "PNI", "LastFollowUpDate", "Death", "Group", "Grade",
"TStage", "AntiX_intensity", "AntiY_intensity", "LymphNodeMetastasis",
"Valid", "Smoker", "Grade_Level", "SurgeryDate", "DeathTime")
vctrs::vec_assert()
dplyr::all_equal()
arsenal::compare()
visdat::vis_compare()
See the code as a function in `R/find_key.R`.
keycolumns <- mydata %>% sapply(., FUN = dataMaid::isKey) %>% tibble::as_tibble() %>%
dplyr::select(which(.[1, ] == TRUE)) %>% names()
keycolumns
[1] "ID" "Name"
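To make the idea of a key column concrete, here is a minimal base R sketch. The helper `is_key_candidate` is hypothetical, and its rule (non-numeric, with no repeated values) is an assumption for illustration; it is not the implementation of `dataMaid::isKey()`.

```r
# Hypothetical helper: a column is a key candidate when it is non-numeric
# and every row has a distinct value. This is an assumption, not dataMaid's code.
is_key_candidate <- function(x) {
  !is.numeric(x) && anyDuplicated(x) == 0
}

# Small hypothetical data frame standing in for `mydata`
df <- data.frame(
  ID   = c("001", "002", "003"),
  Name = c("Aceyn", "Adalaide", NA),
  Sex  = c("Male", "Female", "Male"),
  Age  = c(30, 32, 53),
  stringsAsFactors = FALSE
)

key_candidates <- names(df)[vapply(df, is_key_candidate, logical(1))]
key_candidates
```

Under this rule a single missing value still counts as one distinct value, which is why a column like `Name` can qualify despite one `NA`.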
Get variable types
mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types()
# A tibble: 4 x 4
type cnt pcnt col_name
<chr> <int> <dbl> <list>
1 character 11 57.9 <chr [11]>
2 logical 3 15.8 <chr [3]>
3 numeric 3 15.8 <chr [3]>
4 POSIXct POSIXt 2 10.5 <chr [2]>
mydata %>% dplyr::select(-keycolumns, -contains("Date")) %>% describer::describe() %>%
knitr::kable(format = "markdown")
|.column_name |.column_class |.column_type | .count_elements| .mean_value| .sd_value|.q0_value | .q25_value| .q50_value| .q75_value|.q100_value |
|:--------------------|:-------------|:------------|---------------:|-----------:|----------:|:-------------|----------:|----------:|----------:|:-----------|
|Sex |character |character | 250| NA| NA|Female | NA| NA| NA|Male |
|Age |numeric |double | 250| 49.538153| 14.1595015|25 | 37| 49| 61|73 |
|Race |character |character | 250| NA| NA|Asian | NA| NA| NA|White |
|PreinvasiveComponent |character |character | 250| NA| NA|Absent | NA| NA| NA|Present |
|LVI |character |character | 250| NA| NA|Absent | NA| NA| NA|Present |
|PNI |character |character | 250| NA| NA|Absent | NA| NA| NA|Present |
|Death |logical |logical | 250| NA| NA|FALSE | NA| NA| NA|TRUE |
|Group |character |character | 250| NA| NA|Control | NA| NA| NA|Treatment |
|Grade |character |character | 250| NA| NA|1 | NA| NA| NA|3 |
|TStage |character |character | 250| NA| NA|1 | NA| NA| NA|4 |
|AntiX_intensity |numeric |double | 250| 2.389558| 0.6636071|1 | 2| 2| 3|3 |
|AntiY_intensity |numeric |double | 250| 2.016064| 0.7980211|1 | 1| 2| 3|3 |
|LymphNodeMetastasis |character |character | 250| NA| NA|Absent | NA| NA| NA|Present |
|Valid |logical |logical | 250| NA| NA|FALSE | NA| NA| NA|TRUE |
|Smoker |logical |logical | 250| NA| NA|FALSE | NA| NA| NA|TRUE |
|Grade_Level |character |character | 250| NA| NA|high | NA| NA| NA|moderate |
|DeathTime |character |character | 250| NA| NA|MoreThan1Year | NA| NA| NA|Within1Year |
Plot variable types
mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types() %>% inspectdf::show_plot()
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot inspectdf-1.png)
# https://github.com/ropensci/visdat
# http://visdat.njtierney.com/articles/using_visdat.html
# https://cran.r-project.org/web/packages/visdat/index.html
# http://visdat.njtierney.com/
# visdat::vis_guess(mydata)
visdat::vis_dat(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot visdat-1.png)
mydata %>% explore::explore_tbl()
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/variable type plot explore-1.png)
character variables
characterVariables <- mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types() %>%
dplyr::filter(type == "character") %>% dplyr::select(col_name) %>% dplyr::pull() %>%
unlist()
characterVariables
[1] "Sex" "Race" "PreinvasiveComponent"
[4] "LVI" "PNI" "Group"
[7] "Grade" "TStage" "LymphNodeMetastasis"
[10] "Grade_Level" "DeathTime"
categorical variables
categoricalVariables <- mydata %>% dplyr::select(-keycolumns, -contains("Date")) %>%
describer::describe() %>% janitor::clean_names() %>% dplyr::filter(column_type ==
"factor") %>% dplyr::select(column_name) %>% dplyr::pull()
categoricalVariables
character(0)
continuous variables
continiousVariables <- mydata %>% dplyr::select(-keycolumns, -contains("Date")) %>%
describer::describe() %>% janitor::clean_names() %>% dplyr::filter(column_type ==
"numeric" | column_type == "double") %>% dplyr::select(column_name) %>% dplyr::pull()
continiousVariables
[1] "Age" "AntiX_intensity" "AntiY_intensity"
numeric variables
numericVariables <- mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types() %>%
dplyr::filter(type == "numeric") %>% dplyr::select(col_name) %>% dplyr::pull() %>%
unlist()
numericVariables
[1] "Age" "AntiX_intensity" "AntiY_intensity"
integer variables
integerVariables <- mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types() %>%
dplyr::filter(type == "integer") %>% dplyr::select(col_name) %>% dplyr::pull() %>%
unlist()
integerVariables
NULL
list variables
listVariables <- mydata %>% dplyr::select(-keycolumns) %>% inspectdf::inspect_types() %>%
dplyr::filter(type == "list") %>% dplyr::select(col_name) %>% dplyr::pull() %>%
unlist()
listVariables
NULL
date variables
is_date <- function(x) inherits(x, c("POSIXct", "POSIXt"))
dateVariables <- names(which(sapply(mydata, FUN = is_date) == TRUE))
dateVariables
[1] "LastFollowUpDate" "SurgeryDate"
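The per-type lookups above repeat the same pattern. As an alternative sketch, base R can group all column names by their first class in one step. The data frame `df` and its values are hypothetical stand-ins for `mydata`.

```r
# Small hypothetical data frame standing in for `mydata`
df <- data.frame(
  Sex         = c("Male", "Female"),
  Age         = c(30, 32),
  Death       = c(TRUE, FALSE),
  SurgeryDate = as.POSIXct(c("2019-01-15", "2019-02-20"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# First class of each column (e.g. "POSIXct" for date-times)
first_class <- vapply(df, function(x) class(x)[1], character(1))

# Named list: one character vector of column names per type
columns_by_type <- split(names(first_class), unname(first_class))
columns_by_type
```

Individual groups can then be read off by name, for example `columns_by_type$numeric` or `columns_by_type$POSIXct`. A type with no columns is simply absent from the list, so such a lookup returns `NULL`.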
Code for overviewing the data.^[See `childRmd/_07overView.Rmd` for other code.]
View(mydata)
reactable::reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE,
searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE,
highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE,
showSortIcon = TRUE, showSortable = TRUE)
03-26T00:00:00","2020-01-26T00:00:00","2019-08-26T00:00:00","2020-02-26T00:00:00","2019-04-26T00:00:00","2019-08-26T00:00:00","2019-07-26T00:00:00","2020-01-26T00:00:00","2019-10-26T00:00:00","2019-06-26T00:00:00","2020-01-26T00:00:00","2019-10-26T00:00:00","2019-11-26T00:00:00","2019-03-26T00:00:00","2019-05-26T00:00:00","2019-10-26T00:00:00","2019-04-26T00:00:00","2019-10-26T00:00:00","2020-02-26T00:00:00","2019-09-26T00:00:00","2019-04-26T00:00:00","2019-12-26T00:00:00","2019-09-26T00:00:00","2019-11-26T00:00:00","2019-03-26T00:00:00","2019-07-26T00:00:00","2019-03-26T00:00:00","2019-06-26T00:00:00","2019-07-26T00:00:00","2019-07-26T00:00:00","2019-08-26T00:00:00","2019-05-26T00:00:00","2019-03-26T00:00:00","2019-06-26T00:00:00","2019-05-26T00:00:00","2019-10-26T00:00:00","2020-02-26T00:00:00","2019-09-26T00:00:00","2019-08-26T00:00:00","2020-01-26T00:00:00","2020-02-26T00:00:00","2020-01-26T00:00:00","2019-09-26T00:00:00","2019-10-26T00:00:00","2019-03-26T00:00:00","2019-05-26T00:00:00","2020-01-26T00:00:00","2019-12-26T00:00:00","2019-08-26T00:00:00","2019-06-26T00:00:00",null,"2019-05-26T00:00:00","2019-09-26T00:00:00","2020-02-26T00:00:00","2020-02-26T00:00:00","2020-02-26T00:00:00","2019-07-26T00:00:00","2019-03-26T00:00:00","2020-02-26T00:00:00","2019-04-26T00:00:00","2020-01-26T00:00:00","2019-10-26T00:00:00","2019-12-26T00:00:00","2019-08-26T00:00:00"],"Death":[false,true,true,true,true,false,false,true,true,true,true,true,true,true,true,true,true,null,true,true,false,false,false,false,true,true,false,true,true,true,true,false,true,true,false,false,true,true,true,true,true,true,false,true,true,true,true,false,true,false,false,false,true,false,true,false,true,true,true,true,true,true,true,true,false,true,false,false,true,true,true,true,true,true,true,true,true,false,false,true,true,true,false,true,false,false,true,true,false,false,true,false,true,true,true,false,true,false,false,true,false,false,true,true,false,true,false,true,true,true,true,true,true,true
,true,false,true,true,true,true,true,true,false,true,true,false,false,true,true,true,false,true,false,true,true,true,true,true,false,true,true,false,true,false,false,true,true,true,true,false,false,true,true,false,true,true,false,true,false,false,true,true,true,false,false,false,true,true,false,false,true,false,true,true,true,true,true,true,true,false,false,false,false,false,true,true,false,false,true,true,false,false,true,false,true,true,true,true,true,false,true,true,true,false,true,true,true,true,true,true,true,false,true,true,false,false,false,false,true,false,true,true,true,true,true,true,true,false,true,false,true,true,true,true,true,true,true,true,false,false,true,false,true,true,true,false,true,false,true,true],"Group":["Control","Control","Control","Control","Control","Control","Control","Control","Treatment","Control","Control","Treatment","Control","Treatment","Treatment","Control","Control","Treatment","Treatment","Treatment","Control","Treatment","Treatment","Control","Treatment","Control","Treatment","Treatment","Control","Treatment","Control","Treatment","Treatment","Treatment","Control","Control","Treatment","Treatment","Control","Control","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Control","Control","Treatment","Treatment","Treatment","Treatment","Control","Control","Treatment","Control","Treatment","Control","Treatment","Treatment","Control","Control","Treatment","Control","Control","Treatment","Control","Control","Control","Treatment","Treatment","Treatment","Treatment","Control","Treatment","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Control","Control","Control","Control","Treatment","Control","Control","Treatment","Treatment","Control","Control","Control","Treatment","Treatment","Control","Control","Treatment","Treatment","Control","Treatment","Control","Control","Control","Treatment","Control","Treatment","Treatment","Control","Treatment","Control","Treat
ment","Treatment","Treatment","Control","Treatment","Control","Treatment","Control","Control","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Control","Control","Treatment","Control","Control","Treatment","Control","Treatment","Treatment","Treatment","Treatment","Control","Treatment","Control","Treatment","Control","Control","Treatment","Treatment","Treatment","Treatment","Control","Treatment","Control","Treatment","Control","Control","Treatment","Control","Treatment","Treatment","Control","Control","Control","Treatment","Treatment","Treatment","Control","Control","Treatment","Treatment","Treatment","Treatment","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Control","Control","Control","Control","Control","Control","Control","Control","Control","Treatment","Control","Control","Treatment","Treatment","Treatment","Treatment","Control","Control","Treatment","Treatment","Control","Treatment","Control","Control","Control","Control","Control","Treatment","Treatment","Treatment","Control","Treatment","Treatment","Control","Control","Control","Control","Control","Treatment","Treatment","Control","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Control",null,"Control","Control","Control","Control","Control","Control","Treatment","Treatment","Control","Treatment","Control","Treatment","Treatment"],"Grade":["1","1","2","1","2","2","3","1","2","1","2","3","3","3","2","3","3","3","3","3","3","1","1","1","3","3","2","3","3","3","1","3","1","3","1","3","3","3","3","2","3","3","3","1","2","1","1","3","3","2","2","1","1","3","1","3","2","1","3","3","3","3","3","1","3","3","1","1","3","3","1","3","2","1","3","1","3","3","3","1","2","1","2","2","3","2","1","1","3","2","1","3","2","1","2","1","3","3",null,"1","2","1","3","3","3","3","1","3","3","1","2","2","2","3","2","2","3","1","1","3","2","3","3","2
","3","1","2","1","1","1","3","3","3","1","2","3","1","1","3","2","1","2","2","1","3","2","1","1","2","2","3","3","3","2","2","1","3","2","3","1","2","1","3","1","3","1","2","3","3","1","2","1","1","1","3","3","1","1","2","1","3","2","1","1","2","1","3","3","3","3","3","1","2","2","3","1","3","2","2","1","3","3","3","3","3","2","1","3","1","3","3","3","3","2","2","2","3","2","2","1","3","2","1","1","1","3","3","3","3","2","1","2","2","3","1","2","2","1","1","1","3","3","3","3","2","3","1","3","3","3"],"TStage":["4","4","3","3","1","3","3","3","4","4","4","3","2","4","2","3","3","4","4","2","2","4","1","4","3","4","3","4","4","4","4","4","4","4","3","4","2","2","4","3","4","1","4","1","4","4","2","4","4","4","4","4","2","4","4","4","1","2","4","4","4","4","4","3","4","4","4","4","1","3","4","4","4","4","4","2","4","3","2","4","4","3","3","1","4","4","2","4","4","2","2","4","1","4","4","3","3","4","2","4","4","4","4","4","3","4","4","1","3","3","3","4","4","3","3","4","2","3","4","4","4","4","4","1","4","2","3","2","3","3","2","4","4","4","4","2","4","4","3","2","4","4","4","4","3","2","4","4","4","4","1","2","3","4","3","3","2","1","4","2","3","2","4","3","3","1","3","4","4","2","4","3","4","3","2","1","1","3","4","3","1","3","3","2","4","4","3","3","3","4","2","3","4","1","4","3","1","4","1","1","1","1","2","3","4","3","3","4","4","4","3","4","2","2","3","3","4","4","3","2","4","3","3","3","4","1","3","1","4","3","2","2","4","4","2","3","3","4","2","3","4","2","2","2","4","3","2","4","2","4"],"AntiX_intensity":[2,2,2,2,3,1,1,3,2,3,2,3,1,3,1,2,3,2,3,3,3,3,2,3,2,2,3,3,3,2,3,3,2,2,1,3,2,3,3,2,3,2,3,2,3,2,3,1,3,1,2,3,2,2,2,2,2,3,2,3,2,3,3,2,1,3,2,3,3,3,2,3,3,3,3,3,3,3,2,2,3,2,3,3,3,2,3,1,3,1,2,3,3,3,3,2,3,3,2,"NA",3,3,3,3,3,2,3,2,3,3,3,3,2,3,2,1,3,2,2,2,2,2,1,1,3,3,3,3,3,2,3,3,2,2,3,3,3,3,2,2,2,2,3,2,2,3,2,2,3,3,2,1,2,3,2,2,3,3,2,3,2,1,2,3,2,3,3,1,2,2,3,3,2,2,3,3,3,3,2,2,2,3,2,2,3,3,2,3,2,2,2,2,1,1,1,3,1,2,3,3,1,2,2,3,1,2,3,3,3,2,1,3,2,3,3,3,3,3,3,3,2,2,2,2,3,2,2,3,1,3,
2,2,3,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,2,1],"AntiY_intensity":[2,2,2,3,2,1,2,3,3,1,1,2,1,3,1,2,1,3,2,1,1,3,1,1,2,1,2,3,2,1,1,3,2,3,3,2,1,3,3,3,2,3,3,2,2,3,2,3,2,2,2,"NA",3,2,1,3,3,3,2,3,2,2,3,1,2,2,3,3,3,1,1,2,2,1,1,1,3,2,2,1,1,2,3,3,1,3,1,2,1,2,1,3,2,1,3,2,2,2,2,2,2,2,1,2,3,1,3,1,1,3,3,3,3,3,2,3,2,2,3,1,3,3,2,1,1,3,1,3,3,2,1,3,2,1,3,2,2,3,2,2,3,3,3,2,3,1,2,3,2,1,1,2,3,3,2,3,1,1,1,1,1,1,2,3,3,3,2,2,1,2,2,1,1,1,2,1,3,1,3,2,1,3,2,2,1,3,1,2,2,3,2,2,2,1,3,2,1,3,3,2,1,3,3,3,3,1,3,2,1,3,1,1,2,1,2,2,2,2,2,3,2,1,3,1,1,2,2,2,1,2,1,3,1,1,1,3,2,1,2,2,1,3,1,1,2,3,2,2,3,1],"LymphNodeMetastasis":["Present","Absent","Present","Present","Present","Absent","Absent","Absent","Absent","Present","Present","Present","Present","Absent","Absent","Absent","Absent","Absent","Present","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Present",null,"Present","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Present","Present","Absent","Present","Absent","Absent","Present","Present","Absent","Absent","Present","Present","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Present","Absent","Present","Present","Present","Present","Absent","Absent","Absent","Absent","Present","Present","Absent","Absent","Present","Present","Absent","Present","Absent","Present","Absent","Absent","Absent","Absent","Absent","Present","Present","Absent","Absent","Absent","Present","Absent","Absent","Absent","Absent","Absent","Absent","Absent","Present","Absent","Absent","Absent","Absent","Present","Absent","Present","Present","Present","Absent","Absent","Absent","Absent","Absent","Present","Present","Present","Present","Absent","Absent","Absent","Present","Present","Present","Absent","Present","Absent","Present","Present","Present","Absent","Present","Present","Present","Present","Present","Absent","Absent","Present","Present","Absent","Present","Absent","Present","Present","Absent","Present","Absent","Present","Absent","Present","Absent","Prese
nt","Absent","Present","Present","Present","Present","Present","Present","Absent","Present","Present","Present","Absent","Absent","Absent","Absent","Present","Absent","Absent","Absent","Present","Present","Absent","Absent","Present","Present","Present","Absent","Absent","Absent","Present","Present","Absent","Absent","Absent","Present","Absent","Absent","Absent","Present","Absent","Absent","Present","Present","Present","Absent","Present","Absent","Absent","Absent","Absent","Present","Absent","Absent","Absent","Absent","Present","Present","Present","Absent","Absent","Present","Absent","Absent","Absent","Present","Absent","Absent","Present","Absent","Absent","Absent","Absent","Present","Absent","Present","Absent","Present","Absent","Absent","Absent","Present","Absent","Absent","Absent","Present","Present","Absent","Absent","Present","Present","Present","Present","Absent","Present","Present","Present","Absent","Absent","Absent"],"Valid":[true,false,true,true,false,false,true,true,true,true,false,true,false,false,false,true,true,false,false,true,false,false,false,false,false,false,true,false,false,true,true,true,false,true,false,false,false,false,false,false,true,true,true,true,true,false,false,true,false,false,false,true,true,true,false,false,true,false,true,true,true,true,true,false,true,false,false,true,false,false,true,false,true,true,true,true,false,false,false,true,false,false,true,true,false,false,false,false,true,false,false,true,true,false,true,true,true,true,true,true,false,false,false,false,true,false,true,true,true,false,true,true,true,false,false,false,true,true,true,false,true,true,false,false,true,false,true,true,true,true,false,false,false,true,false,true,true,false,true,true,null,false,false,false,true,false,false,true,true,true,true,true,true,false,false,false,true,true,true,false,true,true,true,false,true,true,false,true,true,true,true,true,false,false,false,true,true,false,false,true,false,true,false,false,false,true,true,false,false,true,false,false,
false,true,true,true,false,true,false,true,false,false,true,false,false,true,true,false,true,true,false,true,false,true,true,true,true,true,true,false,false,false,false,true,true,true,false,true,true,false,true,true,false,false,false,false,true,true,true,true,true,true,true,true,true,true,false,false,false,false],"Smoker":[true,true,false,false,false,false,true,true,true,true,true,false,true,true,true,false,false,false,true,false,true,true,true,false,false,false,true,true,false,true,true,false,true,false,true,false,false,false,true,true,true,false,false,false,true,true,false,true,false,false,true,false,false,false,true,false,false,true,true,false,false,true,false,null,true,true,false,false,true,true,true,false,false,false,true,true,false,false,true,true,true,true,true,true,false,false,true,true,true,true,false,false,true,false,true,false,true,false,false,false,true,true,true,true,true,true,true,true,false,true,false,true,false,true,false,true,true,false,true,true,false,true,false,false,false,false,false,false,true,false,true,false,true,true,false,true,false,true,false,true,false,false,false,true,true,false,true,true,false,false,true,false,true,true,true,false,true,true,false,true,false,true,false,false,true,false,false,false,false,false,false,false,true,false,true,true,false,false,false,false,true,false,true,true,false,true,false,true,false,true,false,true,false,true,false,false,true,false,true,false,true,false,true,false,true,false,false,false,false,false,false,true,true,false,true,false,false,false,true,false,false,false,false,true,true,false,false,false,false,true,true,true,false,true,false,true,false,true,false,false,true,true,true,false,false,true,false,false,false,false],"Grade_Level":["moderate","moderate","high","low","high","moderate","high","high","high","moderate","high","high","moderate","low","high","high","low",null,"high","low","high","low","low","moderate","low","high","moderate","moderate","high","high","high","low","low","high","high","high","low",
"low","moderate","moderate","low","high","high","low","low","high","low","high","high","low","moderate","low","low","low","high","low","low","moderate","high","high","high","high","high","low","moderate","low","high","high","low","high","low","high","low","high","low","moderate","high","moderate","high","low","high","moderate","high","moderate","low","moderate","high","high","moderate","moderate","moderate","high","low","low","low","high","high","high","moderate","low","low","moderate","moderate","moderate","moderate","high","high","high","high","moderate","high","high","moderate","moderate","moderate","low","moderate","moderate","low","low","moderate","moderate","high","high","low","high","high","moderate","low","moderate","high","low","low","high","low","moderate","moderate","high","moderate","moderate","low","high","high","low","moderate","low","low","moderate","high","high","high","low","high","high","low","high","high","high","moderate","high","low","high","high","high","moderate","high","low","moderate","high","high","high","low","low","high","high","high","low","moderate","low","moderate","high","high","low","high","moderate","moderate","low","high","low","low","moderate","high","high","high","high","low","low","high","high","high","low","moderate","low","moderate","moderate","moderate","moderate","moderate","high","high","high","moderate","low","high","low","low","moderate","low","moderate","low","high","high","low","high","low","moderate","low","low","moderate","high","high","high","high","low","low","high","high","high","high","moderate","high","low","moderate","high","low","high","high","high","low","low"],"SurgeryDate":["2019-05-10T00:00:00","2018-09-03T00:00:00","2019-03-22T00:00:00","2018-09-28T00:00:00","2018-10-07T00:00:00","2018-10-28T00:00:00","2018-08-15T00:00:00","2018-08-27T00:00:00","2019-03-10T00:00:00","2019-03-06T00:00:00","2019-04-13T00:00:00","2018-10-25T00:00:00",null,"2019-02-11T00:00:00","2018-11-20T00:00:00","2018-10-06T00:00:00","2019
-02-21T00:00:00","2018-09-09T00:00:00","2018-12-23T00:00:00","2019-03-05T00:00:00","2018-06-03T00:00:00","2018-07-16T00:00:00","2019-07-25T00:00:00","2018-10-12T00:00:00","2019-08-13T00:00:00","2018-05-13T00:00:00","2019-03-02T00:00:00","2018-06-20T00:00:00","2019-11-26T00:00:00","2019-01-14T00:00:00","2018-11-16T00:00:00","2019-06-09T00:00:00","2019-04-05T00:00:00","2019-06-04T00:00:00","2019-05-16T00:00:00","2018-05-19T00:00:00","2018-11-02T00:00:00","2019-01-25T00:00:00","2019-07-16T00:00:00","2019-03-19T00:00:00","2019-07-25T00:00:00","2019-01-19T00:00:00","2019-02-06T00:00:00","2019-04-01T00:00:00","2019-07-23T00:00:00","2019-07-08T00:00:00","2019-01-02T00:00:00","2018-11-11T00:00:00","2019-03-25T00:00:00","2018-05-14T00:00:00","2018-07-19T00:00:00","2019-02-18T00:00:00","2018-09-15T00:00:00","2018-11-21T00:00:00","2018-05-28T00:00:00","2018-10-19T00:00:00","2019-01-25T00:00:00","2019-07-14T00:00:00","2018-06-10T00:00:00","2019-02-14T00:00:00","2019-08-30T00:00:00","2018-10-15T00:00:00","2019-05-22T00:00:00","2018-09-19T00:00:00","2018-11-17T00:00:00","2019-05-09T00:00:00","2019-07-27T00:00:00","2019-05-06T00:00:00","2019-10-26T00:00:00","2019-02-24T00:00:00","2018-06-05T00:00:00","2019-10-07T00:00:00","2018-12-30T00:00:00","2018-12-09T00:00:00","2019-05-17T00:00:00","2018-12-26T00:00:00","2018-08-18T00:00:00","2018-11-14T00:00:00","2018-11-07T00:00:00","2018-12-01T00:00:00","2018-05-01T00:00:00","2019-09-16T00:00:00","2019-03-29T00:00:00","2018-11-04T00:00:00","2019-03-05T00:00:00","2018-10-27T00:00:00","2018-11-30T00:00:00","2019-05-28T00:00:00","2019-01-30T00:00:00","2018-09-28T00:00:00","2018-08-20T00:00:00","2019-04-18T00:00:00","2019-02-06T00:00:00","2018-12-14T00:00:00","2019-11-22T00:00:00","2019-08-06T00:00:00","2018-10-07T00:00:00","2019-08-30T00:00:00","2019-01-23T00:00:00","2019-02-20T00:00:00","2019-05-18T00:00:00","2019-05-23T00:00:00","2018-07-03T00:00:00","2018-12-24T00:00:00","2019-04-19T00:00:00","2018-09-17T00:00:00","2019-03-03T00:00:00","20
18-12-10T00:00:00","2018-10-16T00:00:00","2019-02-20T00:00:00","2019-09-17T00:00:00","2018-08-13T00:00:00","2019-05-29T00:00:00","2018-11-30T00:00:00","2019-06-21T00:00:00","2019-04-21T00:00:00","2018-10-05T00:00:00","2018-12-12T00:00:00","2018-12-29T00:00:00","2019-01-21T00:00:00","2018-07-26T00:00:00","2019-01-28T00:00:00","2019-01-12T00:00:00","2019-01-29T00:00:00","2019-03-13T00:00:00","2019-03-01T00:00:00","2019-03-30T00:00:00","2019-04-21T00:00:00","2018-06-22T00:00:00","2019-01-22T00:00:00","2018-10-21T00:00:00","2019-01-04T00:00:00","2018-07-07T00:00:00","2019-06-14T00:00:00","2019-02-26T00:00:00","2018-08-28T00:00:00","2018-04-03T00:00:00","2018-08-13T00:00:00","2019-07-20T00:00:00","2019-04-17T00:00:00","2019-02-07T00:00:00","2019-09-02T00:00:00","2019-03-12T00:00:00","2019-07-05T00:00:00","2019-01-06T00:00:00","2018-07-21T00:00:00","2019-10-08T00:00:00","2018-10-11T00:00:00","2019-01-20T00:00:00","2018-10-23T00:00:00","2017-08-04T00:00:00","2018-02-23T00:00:00","2017-09-07T00:00:00","2018-04-08T00:00:00","2018-01-14T00:00:00","2016-04-13T00:00:00","2017-07-13T00:00:00","2018-08-15T00:00:00","2017-05-23T00:00:00","2017-07-03T00:00:00","2016-10-21T00:00:00","2017-01-14T00:00:00","2017-01-13T00:00:00","2017-05-21T00:00:00","2018-04-20T00:00:00","2017-11-21T00:00:00","2017-03-09T00:00:00","2018-02-26T00:00:00","2017-10-13T00:00:00","2017-03-08T00:00:00","2017-12-15T00:00:00","2017-11-17T00:00:00","2016-11-16T00:00:00","2016-10-23T00:00:00","2018-10-18T00:00:00","2018-06-04T00:00:00","2017-09-03T00:00:00","2016-08-26T00:00:00","2018-04-18T00:00:00","2017-11-16T00:00:00","2017-07-05T00:00:00","2018-12-05T00:00:00","2017-07-21T00:00:00","2018-01-13T00:00:00","2018-07-29T00:00:00","2017-11-07T00:00:00","2016-08-29T00:00:00","2018-07-16T00:00:00","2017-09-20T00:00:00","2019-02-04T00:00:00","2017-11-04T00:00:00","2017-10-23T00:00:00","2018-07-12T00:00:00","2017-07-26T00:00:00","2017-08-30T00:00:00","2018-05-03T00:00:00","2018-06-05T00:00:00","2017-10-04T00:00:00","
2017-06-10T00:00:00","2017-03-08T00:00:00","2017-09-01T00:00:00","2018-06-17T00:00:00","2017-12-19T00:00:00","2018-08-24T00:00:00","2018-06-13T00:00:00","2017-08-21T00:00:00","2017-01-07T00:00:00","2017-11-20T00:00:00","2016-11-16T00:00:00","2018-01-27T00:00:00","2016-12-21T00:00:00","2016-09-27T00:00:00","2017-07-18T00:00:00","2016-08-15T00:00:00","2018-06-14T00:00:00","2016-08-29T00:00:00","2018-02-21T00:00:00","2016-08-16T00:00:00","2018-02-14T00:00:00","2017-08-28T00:00:00","2017-03-28T00:00:00","2017-10-25T00:00:00","2017-06-05T00:00:00","2017-08-25T00:00:00","2016-11-30T00:00:00","2016-05-09T00:00:00","2015-12-07T00:00:00","2016-12-16T00:00:00","2015-11-16T00:00:00","2015-05-23T00:00:00","2014-07-21T00:00:00","2016-04-21T00:00:00","2017-01-12T00:00:00","2015-09-01T00:00:00","2016-03-04T00:00:00","2015-01-09T00:00:00","2015-10-23T00:00:00","2015-09-12T00:00:00","2015-12-08T00:00:00","2016-05-02T00:00:00","2016-06-20T00:00:00","2015-12-18T00:00:00","2015-06-28T00:00:00","2016-04-08T00:00:00","2016-12-03T00:00:00","2014-06-22T00:00:00","2015-09-13T00:00:00","2015-01-27T00:00:00","2016-08-11T00:00:00","2015-01-26T00:00:00"],"DeathTime":["Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","MoreThan1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","
Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","Within1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","More
Than1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year","MoreThan1Year"]},"columns":[{"accessor":"ID","name":"ID","type":"character"},{"accessor":"Name","name":"Name","type":"character"},{"accessor":"Sex","name":"Sex","type":"character"},{"accessor":"Age","name":"Age","type":"numeric"},{"accessor":"Race","name":"Race","type":"character"},{"accessor":"PreinvasiveComponent","name":"PreinvasiveComponent","type":"character"},{"accessor":"LVI","name":"LVI","type":"character"},{"accessor":"PNI","name":"PNI","type":"character"},{"accessor":"LastFollowUpDate","name":"LastFollowUpDate","type":"Date"},{"accessor":"Death","name":"Death","type":"logical"},{"accessor":"Group","name":"Group","type":"character"},{"accessor":"Grade","name":"Grade","type":"character"},{"accessor":"TStage","name":"TStage","type":"character"},{"accessor":"AntiX_intensity","name":"AntiX_intensity","type":"numeric"},{"accessor":"AntiY_intensity","name":"AntiY_intensity","type":"numeric"},{"accessor":"LymphNodeMetastasis","name":"LymphNodeMetastasis","type":"character"},{"accessor":"Valid","name":"Valid","type":"logical"},{"accessor":"Smoker","name":"Smoker","type":"logical"},{"accessor":"Grade_Level","name":"Grade_Level","type":"ch
aracter"},{"accessor":"SurgeryDate","name":"SurgeryDate","type":"Date"},{"accessor":"DeathTime","name":"DeathTime","type":"character"}],"resizable":true,"filterable":true,"searchable":true,"defaultPageSize":10,"showPageSizeOptions":true,"pageSizeOptions":[10,25,50,100],"paginationType":"numbers","showPageInfo":true,"minRows":1,"highlight":true,"outlined":true,"striped":true,"compact":true,"nowrap":true,"showSortable":true,"dataKey":"d702932b3c8d62ad0a262b06ff754689"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}
Summary of Data via summarytools 📦
summarytools::view(summarytools::dfSummary(mydata %>% dplyr::select(-keycolumns)))
# Create the output folder if needed, then save the summary as an HTML file
if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}
summarytools::view(x = summarytools::dfSummary(mydata %>% dplyr::select(-keycolumns)),
    file = here::here("out", "mydata_summary.html"))
Summary via dataMaid 📦
# Write the dataMaid report as an R Markdown file without rendering or opening it
if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}
dataMaid::makeDataReport(data = mydata, file = here::here("out", "dataMaid_mydata.Rmd"),
    replace = TRUE, openResult = FALSE, render = FALSE, quiet = TRUE)
Summary via explore 📦
# Date columns are dropped before the explore report is written to the output folder
if (!dir.exists(here::here("out"))) {
    dir.create(here::here("out"))
}
mydata %>% dplyr::select(-dateVariables) %>% explore::report(output_file = "mydata_report.html",
    output_dir = here::here("out"))
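The three summary chunks above each repeat the same "create the folder if it is missing" check. A minimal base R sketch of how that check could be wrapped once is given here; `ensure_dir()` is a hypothetical helper name that does not exist in the template, and the example writes to `tempdir()` rather than the project's `out` folder so that it can be run anywhere.

```r
# Hypothetical helper (not part of the template): create a directory only if
# it is missing and return its path invisibly.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Demonstrated on a temporary location so nothing is written to the project.
out_dir <- ensure_dir(file.path(tempdir(), "out"))
ensure_dir(out_dir)  # calling it again is harmless: the folder already exists
dir.exists(out_dir)
```

In the template itself the call would presumably take `here::here("out")` as its argument, which keeps the path relative to the project root.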
Glimpse of Data
dplyr::glimpse(mydata %>% dplyr::select(-keycolumns, -dateVariables))
Observations: 250
Variables: 17
$ Sex                  <chr> "Female", "Female", "Female", "Female", "Male", …
$ Age                  <dbl> 30, 32, 53, 57, 47, 58, 59, 54, 35, 27, 53, 55, …
$ Race                 <chr> "White", "White", "White", "Hispanic", "White", …
$ PreinvasiveComponent <chr> "Absent", "Absent", "Absent", "Absent", "Absent"…
$ LVI                  <chr> "Present", "Absent", "Absent", "Present", "Absen…
$ PNI                  <chr> "Absent", "Absent", "Absent", "Present", "Absent…
$ Death                <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRU…
$ Group                <chr> "Control", "Control", "Control", "Control", "Con…
$ Grade                <chr> "1", "1", "2", "1", "2", "2", "3", "1", "2", "1"…
$ TStage               <chr> "4", "4", "3", "3", "1", "3", "3", "3", "4", "4"…
$ AntiX_intensity      <dbl> 2, 2, 2, 2, 3, 1, 1, 3, 2, 3, 2, 3, 1, 3, 1, 2, …
$ AntiY_intensity      <dbl> 2, 2, 2, 3, 2, 1, 2, 3, 3, 1, 1, 2, 1, 3, 1, 2, …
$ LymphNodeMetastasis  <chr> "Present", "Absent", "Present", "Present", "Pres…
$ Valid                <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRU…
$ Smoker               <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR…
$ Grade_Level          <chr> "moderate", "moderate", "high", "low", "high", "…
$ DeathTime            <chr> "Within1Year", "Within1Year", "Within1Year", "Wi…
mydata %>% explore::describe()
# A tibble: 21 x 8
   variable             type     na na_pct unique   min  mean   max
   <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 ID                   chr       0    0      250    NA    NA    NA
 2 Name                 chr       1    0.4    250    NA    NA    NA
 3 Sex                  chr       1    0.4      3    NA    NA    NA
 4 Age                  dbl       1    0.4     50    25  49.5    73
 5 Race                 chr       1    0.4      8    NA    NA    NA
 6 PreinvasiveComponent chr       1    0.4      3    NA    NA    NA
 7 LVI                  chr       1    0.4      3    NA    NA    NA
 8 PNI                  chr       1    0.4      3    NA    NA    NA
 9 LastFollowUpDate     dat       1    0.4     13    NA    NA    NA
10 Death                lgl       1    0.4      3     0  0.67     1
# … with 11 more rows
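The `na`, `na_pct`, and `unique` columns above can be reproduced with a few lines of base R, which helps when reading the table: one missing value in 250 rows gives `na_pct = 0.4`. The sketch below uses a made-up `toy` data frame, not `mydata`, and it assumes that `unique` counts `NA` as a distinct value, which is what the output suggests (for example, `Death` is logical yet reports 3 unique values).

```r
# Toy data frame (made up for illustration, not the template's mydata):
# 250 rows with exactly one missing value in each column.
toy <- data.frame(
  Age = c(NA, seq(25, 73, length.out = 249)),
  Sex = c(rep(c("Female", "Male"), length.out = 249), NA),
  stringsAsFactors = FALSE
)

na_count <- colSums(is.na(toy))         # number of missing values per column
na_pct   <- 100 * colMeans(is.na(toy))  # percentage of missing values per column

# Number of distinct values per column; unique() keeps NA as its own value.
n_unique <- vapply(toy, function(x) length(unique(x)), integer(1))

data.frame(variable = names(toy), na = na_count, na_pct = na_pct,
           unique = n_unique, row.names = NULL)
```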
Explore
explore::explore(mydata)
Check Whether Data Match Expectations
visdat::vis_expect(data = mydata, expectation = ~.x == -1, show_perc = TRUE)
visdat::vis_expect(mydata, ~.x >= 25)
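In the two calls above, the expectation is a one-sided formula in which `.x` stands for the values being checked, so `~.x >= 25` asks "is this value at least 25?". The following base R sketch shows the same idea without plotting, under the assumption that the condition is evaluated column by column; the `toy` data frame and the names `expectation`, `met`, and `prop_met` are illustrative and are not part of visdat.

```r
# Toy data (made up): one age below 25 and one missing age.
toy <- data.frame(
  Age             = c(30, 32, 24, NA),
  AntiX_intensity = c(2, 3, 1, 2)
)

# The expectation written as an ordinary function of .x
expectation <- function(.x) .x >= 25

# Apply it to every column: TRUE = met, FALSE = not met, NA = missing value
met <- as.data.frame(lapply(toy, expectation))
met

# Share of non-missing values in each column that meet the expectation
prop_met <- vapply(met, mean, numeric(1), na.rm = TRUE)
prop_met
```

A numeric condition such as `>= 25` is only meaningful for columns like `Age`; for character columns the comparison is done on text, so it is usually better to select the relevant columns first.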
See missing values
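# Note: airquality is R's built-in example dataset; replace it with mydata to inspect the study data
visdat::vis_miss(airquality, cluster = TRUE)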
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/missing values visdat-1.png)
visdat::vis_miss(airquality, sort_miss = TRUE)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/missing values visdat 2-1.png)
xray::anomalies(mydata)
$variables
               Variable   q qNA  pNA qZero pZero qBlank pBlank qInf pInf qDistinct      type anomalous_percent
1                Smoker 250   1 0.4%   130   52%      0      -    0    -         3   Logical             52.4%
2                 Valid 250   1 0.4%   116 46.4%      0      -    0    -         3   Logical             46.8%
3                 Death 250   1 0.4%    83 33.2%      0      -    0    -         3   Logical             33.6%
4                   Sex 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
5  PreinvasiveComponent 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
6                   LVI 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
7                   PNI 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
8                 Group 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
9   LymphNodeMetastasis 250   1 0.4%     0     -      0      -    0    -         3 Character              0.4%
10                Grade 250   1 0.4%     0     -      0      -    0    -         4 Character              0.4%
11      AntiX_intensity 250   1 0.4%     0     -      0      -    0    -         4   Numeric              0.4%
12      AntiY_intensity 250   1 0.4%     0     -      0      -    0    -         4   Numeric              0.4%
13          Grade_Level 250   1 0.4%     0     -      0      -    0    -         4 Character              0.4%
14                 Race 250   1 0.4%     0     -      0      -    0    -         8 Character              0.4%
15     LastFollowUpDate 250   1 0.4%     0     -      0      -    0    -        13 Timestamp              0.4%
16                  Age 250   1 0.4%     0     -      0      -    0    -        50   Numeric              0.4%
17          SurgeryDate 250   1 0.4%     0     -      0      -    0    -       233 Timestamp              0.4%
18                 Name 250   1 0.4%     0     -      0      -    0    -       250 Character              0.4%
19            DeathTime 250   0    -     0     -      0      -    0    -         2 Character                 -
20               TStage 250   0    -     0     -      0      -    0    -         4 Character                 -
21                   ID 250   0    -     0     -      0      -    0    -       250 Character                 -

$problem_variables
 [1] Variable          q                 qNA               pNA
 [5] qZero             pZero             qBlank            pBlank
 [9] qInf              pInf              qDistinct         type
[13] anomalous_percent problems
<0 rows> (or 0-length row.names)
xray::distributions(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-2.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-3.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-4.png)
[1] "Ignoring variable LastFollowUpDate: Unsupported type for visualization."
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-5.png)
[1] "Ignoring variable SurgeryDate: Unsupported type for visualization."
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/xray 2-6.png)
Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99
1 AntiX_intensity 1 1.8 2 2 3 3 3
2 AntiY_intensity 1 1 1 2 3 3 3
3 Age 25 30.8 37 49 61 70 73
Summary of Data via DataExplorer 📦
DataExplorer::plot_str(mydata)
DataExplorer::plot_str(mydata, type = "r")
DataExplorer::introduce(mydata)
# A tibble: 1 x 9
rows columns discrete_columns continuous_colu… all_missing_col…
<int> <int> <int> <int> <int>
1 250 21 18 3 0
# … with 4 more variables: total_missing_values <int>, complete_rows <int>,
# total_observations <int>, memory_usage <dbl>
DataExplorer::plot_intro(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 4-1.png)
DataExplorer::plot_missing(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 5-1.png)
Drop columns
mydata2 <- DataExplorer::drop_columns(mydata, "TStage")
DataExplorer::plot_bar(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 7-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 7-2.png)
DataExplorer::plot_bar(mydata, with = "Death")
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 8-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 8-2.png)
DataExplorer::plot_histogram(mydata)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/DataExplorer 9-1.png)
Learn these tests as highlighted in [@Schmidt2017].^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA]
Write results as described in [@Knijn2015]^[From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460276/]
- Describe the number of patients included in the analysis and the reasons for dropout
- Report patient/disease characteristics (including the biomarker of interest) with the number of missing values
- Describe the interaction of the biomarker of interest with established prognostic variables
- Include at least 90% of the initial cases in univariate and multivariate analyses
- Report the estimated effect (relative risk/odds ratio, confidence interval, and p value) in univariate analysis
- Report the estimated effect (hazard ratio/odds ratio, confidence interval, and p value) in multivariate analysis
- Report the estimated effects (hazard ratio/odds ratio, confidence interval, and p value) of other prognostic factors included in multivariate analysis
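As a rough illustration of the effect-size items above, the sketch below fits a univariable logistic regression and extracts the odds ratio, a 95% confidence interval, and the p value. It uses the built-in `mtcars` data rather than `mydata`, and Wald intervals from `confint.default()`; both are choices made here to keep the example self-contained, not part of the recommendations in [@Knijn2015].

```r
# Univariable logistic regression on a built-in dataset (illustrative only).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios and Wald 95% confidence intervals on the exponentiated scale.
or <- exp(coef(fit))
ci <- exp(confint.default(fit))
p  <- summary(fit)$coefficients[, "Pr(>|z|)"]

effect <- data.frame(
  term       = names(or),
  odds_ratio = unname(or),
  ci_low     = ci[, 1],
  ci_high    = ci[, 2],
  p_value    = unname(p),
  row.names  = NULL
)
print(effect)
```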
Code for generating a data dictionary.^[See `childRmd/_08dataDictionary.Rmd` for other code.]
Code for cleaning and recoding data.^[See `childRmd/_09cleanRecode.Rmd` for other code.]
questionr::irec()
questionr::iorder()
questionr::icut()
iris %>% mutate(sumVar = rowSums(.[1:4]))
iris %>% mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% head
iris %>% mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% head
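For comparison, the same row-wise sum can be sketched in base R without the pipe. The column name `sumVar` follows the calls above; `iris2` is a copy introduced here so that `iris` itself is left untouched.

```r
iris2 <- iris

# Select the columns whose names contain "Sepal", then sum them row by row.
sepal_cols <- grepl("Sepal", names(iris2), fixed = TRUE)
iris2$sumVar <- rowSums(iris2[, sepal_cols])

head(iris2)
```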
iRenameColumn.R
iSelectColumn.R
<= 22 Low
>= 23 & <= 41 Average
>=42 High
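The three cut-offs above could be applied with base R's `cut()`. This is a sketch that assumes whole-number scores; the vector `score` and the name `score_group` are hypothetical and do not come from `mydata`.

```r
score <- c(10, 22, 23, 41, 42, 60, NA)

# Intervals are closed on the right: (-Inf, 22], (22, 41], (41, Inf).
score_group <- cut(
  score,
  breaks = c(-Inf, 22, 41, Inf),
  labels = c("Low", "Average", "High"),
  right  = TRUE
)

table(score_group, useNA = "ifany")
```

For non-integer scores the boundaries between 22 and 23, and between 41 and 42, would need an explicit decision, since the rules above leave those gaps undefined.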
Code for handling missing data and imputation.^[See `childRmd/_10impute.Rmd` for other code.]
Multiple imputation support in Finalfit https://www.datasurg.net/2019/09/25/multiple-imputation-support-in-finalfit/
Missing data https://finalfit.org/articles/missing.html
Plot missing data
visdat::vis_miss(mydata)
\pagebreak
Code for descriptive statistics.^[See `childRmd/_11descriptives.Rmd` for other code.]
Report Data properties via report 📦
mydata %>% dplyr::select(-dplyr::contains("Date")) %>% report::report()
The data contains 250 observations of the following variables:
- ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)
- Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)
- Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)
- Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing
- Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)
- PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)
- LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)
- PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
- Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)
- Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)
- Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)
- TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)
- AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing
- AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing
- LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)
- Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)
- Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)
- Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)
- DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)
Table 1 via arsenal 📦
# cat(names(mydata), sep = " + \n")
library(arsenal)
tab1 <- arsenal::tableby(
~ Sex +
Age +
Race +
PreinvasiveComponent +
LVI +
PNI +
Death +
Group +
Grade +
TStage +
# `Anti-X-intensity` +
# `Anti-Y-intensity` +
LymphNodeMetastasis +
Valid +
Smoker +
Grade_Level
,
data = mydata
)
summary(tab1)
|                            | Overall (N=250) |
|:---------------------------|:---------------:|
|Sex                         |                 |
| N-Miss                     | 1               |
| Female                     | 122 (49.0%)     |
| Male                       | 127 (51.0%)     |
|Age                         |                 |
| N-Miss                     | 1               |
| Mean (SD)                  | 49.538 (14.160) |
| Range                      | 25.000 - 73.000 |
|Race                        |                 |
| N-Miss                     | 1               |
| Asian                      | 15 (6.0%)       |
| Bi-Racial                  | 5 (2.0%)        |
| Black                      | 30 (12.0%)      |
| Hispanic                   | 38 (15.3%)      |
| Native                     | 2 (0.8%)        |
| Other                      | 1 (0.4%)        |
| White                      | 158 (63.5%)     |
|PreinvasiveComponent        |                 |
| N-Miss                     | 1               |
| Absent                     | 203 (81.5%)     |
| Present                    | 46 (18.5%)      |
|LVI                         |                 |
| N-Miss                     | 1               |
| Absent                     | 147 (59.0%)     |
| Present                    | 102 (41.0%)     |
|PNI                         |                 |
| N-Miss                     | 1               |
| Absent                     | 171 (68.7%)     |
| Present                    | 78 (31.3%)      |
|Death                       |                 |
| N-Miss                     | 1               |
| FALSE                      | 83 (33.3%)      |
| TRUE                       | 166 (66.7%)     |
|Group                       |                 |
| N-Miss                     | 1               |
| Control                    | 118 (47.4%)     |
| Treatment                  | 131 (52.6%)     |
|Grade                       |                 |
| N-Miss                     | 1               |
| 1                          | 78 (31.3%)      |
| 2                          | 62 (24.9%)      |
| 3                          | 109 (43.8%)     |
|TStage                      |                 |
| 1                          | 24 (9.6%)       |
| 2                          | 43 (17.2%)      |
| 3                          | 65 (26.0%)      |
| 4                          | 118 (47.2%)     |
|LymphNodeMetastasis         |                 |
| N-Miss                     | 1               |
| Absent                     | 144 (57.8%)     |
| Present                    | 105 (42.2%)     |
|Valid                       |                 |
| N-Miss                     | 1               |
| FALSE                      | 116 (46.6%)     |
| TRUE                       | 133 (53.4%)     |
|Smoker                      |                 |
| N-Miss                     | 1               |
| FALSE                      | 130 (52.2%)     |
| TRUE                       | 119 (47.8%)     |
|Grade_Level                 |                 |
| N-Miss                     | 1               |
| high                       | 109 (43.8%)     |
| low                        | 77 (30.9%)      |
| moderate                   | 63 (25.3%)      |
Table 1 via tableone 📦
library(tableone)
mydata %>% dplyr::select(-keycolumns, -dateVariables) %>% tableone::CreateTableOne(data = .)
Overall
n 250
Sex = Male (%) 127 (51.0)
Age (mean (SD)) 49.54 (14.16)
Race (%)
Asian 15 ( 6.0)
Bi-Racial 5 ( 2.0)
Black 30 (12.0)
Hispanic 38 (15.3)
Native 2 ( 0.8)
Other 1 ( 0.4)
White 158 (63.5)
PreinvasiveComponent = Present (%) 46 (18.5)
LVI = Present (%) 102 (41.0)
PNI = Present (%) 78 (31.3)
Death = TRUE (%) 166 (66.7)
Group = Treatment (%) 131 (52.6)
Grade (%)
1 78 (31.3)
2 62 (24.9)
3 109 (43.8)
TStage (%)
1 24 ( 9.6)
2 43 (17.2)
3 65 (26.0)
4 118 (47.2)
AntiX_intensity (mean (SD)) 2.39 (0.66)
AntiY_intensity (mean (SD)) 2.02 (0.80)
LymphNodeMetastasis = Present (%) 105 (42.2)
Valid = TRUE (%) 133 (53.4)
Smoker = TRUE (%) 119 (47.8)
Grade_Level (%)
high 109 (43.8)
low 77 (30.9)
moderate 63 (25.3)
DeathTime = Within1Year (%) 149 (59.6)
Descriptive Statistics of Continuous Variables
mydata %>% dplyr::select(continiousVariables, numericVariables, integerVariables) %>%
summarytools::descr(., style = "rmarkdown")
print(summarytools::descr(mydata), method = "render", table.classes = "st-small")
mydata %>% summarytools::descr(., stats = "common", transpose = TRUE, headings = FALSE)
mydata %>% summarytools::descr(stats = "common") %>% summarytools::tb()
mydata$Sex %>% summarytools::freq(cumul = FALSE, report.nas = FALSE) %>% summarytools::tb()
mydata %>% explore::describe() %>% dplyr::filter(unique < 5)
# A tibble: 15 x 8
variable type na na_pct unique min mean max
<chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
1 Sex chr 1 0.4 3 NA NA NA
2 PreinvasiveComponent chr 1 0.4 3 NA NA NA
3 LVI chr 1 0.4 3 NA NA NA
4 PNI chr 1 0.4 3 NA NA NA
5 Death lgl 1 0.4 3 0 0.67 1
6 Group chr 1 0.4 3 NA NA NA
7 Grade chr 1 0.4 4 NA NA NA
8 TStage chr 0 0 4 NA NA NA
9 AntiX_intensity dbl 1 0.4 4 1 2.39 3
10 AntiY_intensity dbl 1 0.4 4 1 2.02 3
11 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
12 Valid lgl 1 0.4 3 0 0.53 1
13 Smoker lgl 1 0.4 3 0 0.48 1
14 Grade_Level chr 1 0.4 4 NA NA NA
15 DeathTime chr 0 0 2 NA NA NA
mydata %>% explore::describe() %>% dplyr::filter(na > 0)
# A tibble: 18 x 8
variable type na na_pct unique min mean max
<chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
1 Name chr 1 0.4 250 NA NA NA
2 Sex chr 1 0.4 3 NA NA NA
3 Age dbl 1 0.4 50 25 49.5 73
4 Race chr 1 0.4 8 NA NA NA
5 PreinvasiveComponent chr 1 0.4 3 NA NA NA
6 LVI chr 1 0.4 3 NA NA NA
7 PNI chr 1 0.4 3 NA NA NA
8 LastFollowUpDate dat 1 0.4 13 NA NA NA
9 Death lgl 1 0.4 3 0 0.67 1
10 Group chr 1 0.4 3 NA NA NA
11 Grade chr 1 0.4 4 NA NA NA
12 AntiX_intensity dbl 1 0.4 4 1 2.39 3
13 AntiY_intensity dbl 1 0.4 4 1 2.02 3
14 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
15 Valid lgl 1 0.4 3 0 0.53 1
16 Smoker lgl 1 0.4 3 0 0.48 1
17 Grade_Level chr 1 0.4 4 NA NA NA
18 SurgeryDate dat 1 0.4 233 NA NA NA
mydata %>% explore::describe()
# A tibble: 21 x 8
variable type na na_pct unique min mean max
<chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
1 ID chr 0 0 250 NA NA NA
2 Name chr 1 0.4 250 NA NA NA
3 Sex chr 1 0.4 3 NA NA NA
4 Age dbl 1 0.4 50 25 49.5 73
5 Race chr 1 0.4 8 NA NA NA
6 PreinvasiveComponent chr 1 0.4 3 NA NA NA
7 LVI chr 1 0.4 3 NA NA NA
8 PNI chr 1 0.4 3 NA NA NA
9 LastFollowUpDate dat 1 0.4 13 NA NA NA
10 Death lgl 1 0.4 3 0 0.67 1
# … with 11 more rows
Use `R/gc_desc_cat.R` to generate `gc_desc_cat.Rmd`, which contains descriptive statistics for categorical variables.
source(here::here("R", "gc_desc_cat.R"))
mydata %>% janitor::tabyl(Sex) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| Sex    |   n | percent | valid_percent |
|:-------|----:|:--------|:--------------|
| Female | 122 | 48.8%   | 49.0%         |
| Male   | 127 | 50.8%   | 51.0%         |
| NA     |   1 | 0.4%    | -             |
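The two percentage columns differ only in the denominator: `percent` divides by all rows, including the missing one, while `valid_percent` divides by the non-missing rows. A short base R sketch using the Sex counts from the table above (the object names are illustrative):

```r
# Counts taken from the Sex table above.
n         <- c(Female = 122, Male = 127)
n_missing <- 1

percent       <- round(100 * n / (sum(n) + n_missing), 1)  # denominator 250
valid_percent <- round(100 * n / sum(n), 1)                # denominator 249

rbind(percent, valid_percent)
```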
\pagebreak
mydata %>% janitor::tabyl(Race) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
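| Race      |   n | percent | valid_percent |
|:----------|----:|:--------|:--------------|
| Asian     |  15 | 6.0%    | 6.0%          |
| Bi-Racial |   5 | 2.0%    | 2.0%          |
| Black     |  30 | 12.0%   | 12.0%         |
| Hispanic  |  38 | 15.2%   | 15.3%         |
| Native    |   2 | 0.8%    | 0.8%          |
| Other     |   1 | 0.4%    | 0.4%          |
| White     | 158 | 63.2%   | 63.5%         |
| NA        |   1 | 0.4%    | -             |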
\pagebreak
mydata %>% janitor::tabyl(PreinvasiveComponent) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| PreinvasiveComponent |   n | percent | valid_percent |
|:---------------------|----:|:--------|:--------------|
| Absent               | 203 | 81.2%   | 81.5%         |
| Present              |  46 | 18.4%   | 18.5%         |
| NA                   |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(LVI) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| LVI     |   n | percent | valid_percent |
|:--------|----:|:--------|:--------------|
| Absent  | 147 | 58.8%   | 59.0%         |
| Present | 102 | 40.8%   | 41.0%         |
| NA      |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(PNI) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| PNI     |   n | percent | valid_percent |
|:--------|----:|:--------|:--------------|
| Absent  | 171 | 68.4%   | 68.7%         |
| Present |  78 | 31.2%   | 31.3%         |
| NA      |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(Group) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| Group     |   n | percent | valid_percent |
|:----------|----:|:--------|:--------------|
| Control   | 118 | 47.2%   | 47.4%         |
| Treatment | 131 | 52.4%   | 52.6%         |
| NA        |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(Grade) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| Grade |   n | percent | valid_percent |
|:------|----:|:--------|:--------------|
| 1     |  78 | 31.2%   | 31.3%         |
| 2     |  62 | 24.8%   | 24.9%         |
| 3     | 109 | 43.6%   | 43.8%         |
| NA    |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(TStage) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| TStage |   n | percent |
|:-------|----:|:--------|
| 1      |  24 | 9.6%    |
| 2      |  43 | 17.2%   |
| 3      |  65 | 26.0%   |
| 4      | 118 | 47.2%   |
\pagebreak
mydata %>% janitor::tabyl(LymphNodeMetastasis) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| LymphNodeMetastasis |   n | percent | valid_percent |
|:--------------------|----:|:--------|:--------------|
| Absent              | 144 | 57.6%   | 57.8%         |
| Present             | 105 | 42.0%   | 42.2%         |
| NA                  |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(Grade_Level) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| Grade_Level |   n | percent | valid_percent |
|:------------|----:|:--------|:--------------|
| high        | 109 | 43.6%   | 43.8%         |
| low         |  77 | 30.8%   | 30.9%         |
| moderate    |  63 | 25.2%   | 25.3%         |
| NA          |   1 | 0.4%    | -             |
\pagebreak
mydata %>% janitor::tabyl(DeathTime) %>% janitor::adorn_pct_formatting(rounding = "half up",
digits = 1) %>% knitr::kable()
| DeathTime     |   n | percent |
|:--------------|----:|:--------|
| MoreThan1Year | 101 | 40.4%   |
| Within1Year   | 149 | 59.6%   |
\pagebreak
race_stats <- summarytools::freq(mydata$Race)
print(race_stats, report.nas = FALSE, totals = FALSE, display.type = FALSE, Variable.label = "Race Group")
mydata %>% explore::describe(PreinvasiveComponent)
variable = PreinvasiveComponent
type = character
na = 1 of 250 (0.4%)
unique = 3
Absent = 203 (81.2%)
Present = 46 (18.4%)
NA = 1 (0.4%)
## Frequency or custom tables for categorical variables
SmartEDA::ExpCTable(mydata, Target = NULL, margin = 1, clim = 10, nlim = 5, round = 2,
bin = NULL, per = T)
Variable Valid Frequency Percent CumPercent
1 Sex Female 122 48.8 48.8
2 Sex Male 127 50.8 99.6
3 Sex NA 1 0.4 100.0
4 Sex TOTAL 250 NA NA
5 Race Asian 15 6.0 6.0
6 Race Bi-Racial 5 2.0 8.0
7 Race Black 30 12.0 20.0
8 Race Hispanic 38 15.2 35.2
9 Race NA 1 0.4 35.6
10 Race Native 2 0.8 36.4
11 Race Other 1 0.4 36.8
12 Race White 158 63.2 100.0
13 Race TOTAL 250 NA NA
14 PreinvasiveComponent Absent 203 81.2 81.2
15 PreinvasiveComponent NA 1 0.4 81.6
16 PreinvasiveComponent Present 46 18.4 100.0
17 PreinvasiveComponent TOTAL 250 NA NA
18 LVI Absent 147 58.8 58.8
19 LVI NA 1 0.4 59.2
20 LVI Present 102 40.8 100.0
21 LVI TOTAL 250 NA NA
22 PNI Absent 171 68.4 68.4
23 PNI NA 1 0.4 68.8
24 PNI Present 78 31.2 100.0
25 PNI TOTAL 250 NA NA
26 Group Control 118 47.2 47.2
27 Group NA 1 0.4 47.6
28 Group Treatment 131 52.4 100.0
29 Group TOTAL 250 NA NA
30 Grade 1 78 31.2 31.2
31 Grade 2 62 24.8 56.0
32 Grade 3 109 43.6 99.6
33 Grade NA 1 0.4 100.0
34 Grade TOTAL 250 NA NA
35 TStage 1 24 9.6 9.6
36 TStage 2 43 17.2 26.8
37 TStage 3 65 26.0 52.8
38 TStage 4 118 47.2 100.0
39 TStage TOTAL 250 NA NA
40 LymphNodeMetastasis Absent 144 57.6 57.6
41 LymphNodeMetastasis NA 1 0.4 58.0
42 LymphNodeMetastasis Present 105 42.0 100.0
43 LymphNodeMetastasis TOTAL 250 NA NA
44 Grade_Level high 109 43.6 43.6
45 Grade_Level low 77 30.8 74.4
46 Grade_Level moderate 63 25.2 99.6
47 Grade_Level NA 1 0.4 100.0
48 Grade_Level TOTAL 250 NA NA
49 DeathTime MoreThan1Year 101 40.4 40.4
50 DeathTime Within1Year 149 59.6 100.0
51 DeathTime TOTAL 250 NA NA
52 AntiX_intensity 1 25 10.0 10.0
53 AntiX_intensity 2 102 40.8 50.8
54 AntiX_intensity 3 122 48.8 99.6
55 AntiX_intensity NA 1 0.4 100.0
56 AntiX_intensity TOTAL 250 NA NA
57 AntiY_intensity 1 77 30.8 30.8
58 AntiY_intensity 2 91 36.4 67.2
59 AntiY_intensity 3 81 32.4 99.6
60 AntiY_intensity NA 1 0.4 100.0
61 AntiY_intensity TOTAL 250 NA NA
inspectdf::inspect_cat(mydata)
# A tibble: 16 x 5
col_name cnt common common_pcnt levels
<chr> <int> <chr> <dbl> <named list>
1 Death 3 TRUE 66.4 <tibble [3 × 3]>
2 DeathTime 2 Within1Year 59.6 <tibble [2 × 3]>
3 Grade 4 3 43.6 <tibble [4 × 3]>
4 Grade_Level 4 high 43.6 <tibble [4 × 3]>
5 Group 3 Treatment 52.4 <tibble [3 × 3]>
6 ID 250 001 0.4 <tibble [250 × 3]>
7 LVI 3 Absent 58.8 <tibble [3 × 3]>
8 LymphNodeMetastasis 3 Absent 57.6 <tibble [3 × 3]>
9 Name 250 Aceyn 0.4 <tibble [250 × 3]>
10 PNI 3 Absent 68.4 <tibble [3 × 3]>
11 PreinvasiveComponent 3 Absent 81.2 <tibble [3 × 3]>
12 Race 8 White 63.2 <tibble [8 × 3]>
13 Sex 3 Male 50.8 <tibble [3 × 3]>
14 Smoker 3 FALSE 52 <tibble [3 × 3]>
15 TStage 4 4 47.2 <tibble [4 × 3]>
16 Valid 3 TRUE 53.2 <tibble [3 × 3]>
inspectdf::inspect_cat(mydata)$levels$Group
# A tibble: 3 x 3
value prop cnt
<chr> <dbl> <int>
1 Treatment 0.524 131
2 Control 0.472 118
3 <NA> 0.004 1
library(summarytools)
grouped_freqs <- stby(data = mydata$Smoker, INDICES = mydata$Sex, FUN = freq, cumul = FALSE,
report.nas = FALSE)
grouped_freqs %>% tb(order = 2)
summarytools::stby(list(x = mydata$LVI, y = mydata$LymphNodeMetastasis), mydata$PNI,
summarytools::ctable)
with(mydata, summarytools::stby(list(x = LVI, y = LymphNodeMetastasis), PNI, summarytools::ctable))
mydata %>% dplyr::select(characterVariables) %>% dplyr::select(PreinvasiveComponent,
PNI, LVI) %>% reactable::reactable(data = ., groupBy = c("PreinvasiveComponent",
"PNI"), columns = list(LVI = reactable::colDef(aggregate = "count")))
\pagebreak
questionr::icut()
source(here::here("R", "gc_desc_cont.R"))
Descriptive Statistics Age
mydata %>% jmv::descriptives(data = ., vars = "Age", hist = TRUE, dens = TRUE, box = TRUE,
violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE,
kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────
Age
──────────────────────────────────
N 249
Missing 1
Mean 49.5
Median 49.0
Mode 72.0
Standard deviation 14.2
Variance 200
Minimum 25.0
Maximum 73.0
Skewness 0.00389
Std. error skewness 0.154
Kurtosis -1.15
Std. error kurtosis 0.307
25th percentile 37.0
50th percentile 49.0
75th percentile 61.0
──────────────────────────────────
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics Age-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics Age-2.png)
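The standard errors of skewness and kurtosis in the table depend only on the number of non-missing observations. The sketch below uses the standard formulas for a sample from a normal distribution; that `jmv` computes them this way is an assumption, although the result agrees with the values printed above for N = 249.

```r
n <- 249  # non-missing Age values

# Standard error of skewness.
se_skew <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Standard error of kurtosis, expressed through se_skew.
se_kurt <- 2 * se_skew * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))

round(c(se_skew = se_skew, se_kurt = se_kurt), 3)
```

Because the other two continuous variables also have 249 non-missing values, they share the same two standard errors.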
\pagebreak
Descriptive Statistics AntiX_intensity
mydata %>% jmv::descriptives(data = ., vars = "AntiX_intensity", hist = TRUE, dens = TRUE,
box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE,
skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────
AntiX_intensity
──────────────────────────────────────────
N 249
Missing 1
Mean 2.39
Median 2.00
Mode 3.00
Standard deviation 0.664
Variance 0.440
Minimum 1.00
Maximum 3.00
Skewness -0.631
Std. error skewness 0.154
Kurtosis -0.640
Std. error kurtosis 0.307
25th percentile 2.00
50th percentile 2.00
75th percentile 3.00
──────────────────────────────────────────
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiX_intensity-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiX_intensity-2.png)
\pagebreak
Descriptive Statistics AntiY_intensity
mydata %>% jmv::descriptives(data = ., vars = "AntiY_intensity", hist = TRUE, dens = TRUE,
box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE,
skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────
AntiY_intensity
──────────────────────────────────────────
N 249
Missing 1
Mean 2.02
Median 2.00
Mode 2.00
Standard deviation 0.798
Variance 0.637
Minimum 1.00
Maximum 3.00
Skewness -0.0289
Std. error skewness 0.154
Kurtosis -1.43
Std. error kurtosis 0.307
25th percentile 1.00
50th percentile 2.00
75th percentile 3.00
──────────────────────────────────────────
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiY_intensity-1.png)![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Descriptive Statistics AntiY_intensity-2.png)
\pagebreak
tab <- tableone::CreateTableOne(data = mydata)
# ?print.ContTable
tab$ContTable
Overall
n 250
Age (mean (SD)) 49.54 (14.16)
AntiX_intensity (mean (SD)) 2.39 (0.66)
AntiY_intensity (mean (SD)) 2.02 (0.80)
print(tab$ContTable, nonnormal = c("AntiX_intensity"))
Overall
n 250
Age (mean (SD)) 49.54 (14.16)
AntiX_intensity (median [IQR]) 2.00 [2.00, 3.00]
AntiY_intensity (mean (SD)) 2.02 (0.80)
mydata %>% explore::describe(Age)
variable = Age
type = double
na = 1 of 250 (0.4%)
unique = 50
min|max = 25 | 73
q05|q95 = 28 | 72
q25|q75 = 37 | 61
median = 49
mean = 49.53815
mydata %>% dplyr::select(continiousVariables) %>% SmartEDA::ExpNumStat(data = .,
by = "A", gp = NULL, Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
inspectdf::inspect_num(mydata, breaks = 10)
# A tibble: 3 x 10
col_name min q1 median mean q3 max sd pcnt_na hist
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <named list>
1 Age 25 37 49 49.5 61 73 14.2 0.4 <tibble [12…
2 AntiX_intens… 1 2 2 2.39 3 3 0.664 0.4 <tibble [12…
3 AntiY_intens… 1 1 2 2.02 3 3 0.798 0.4 <tibble [12…
inspectdf::inspect_num(mydata)$hist$Age
# A tibble: 27 x 2
value prop
<chr> <dbl>
1 [-Inf, 24) 0
2 [24, 26) 0.0201
3 [26, 28) 0.0281
4 [28, 30) 0.0361
5 [30, 32) 0.0361
6 [32, 34) 0.0602
7 [34, 36) 0.0482
8 [36, 38) 0.0241
9 [38, 40) 0.0161
10 [40, 42) 0.0602
# … with 17 more rows
inspectdf::inspect_num(mydata, breaks = 10) %>% inspectdf::show_plot()
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/inspectdf 5-1.png)
grouped_descr <- summarytools::stby(data = mydata, INDICES = mydata$Sex, FUN = summarytools::descr,
stats = "common")
# grouped_descr %>% summarytools::tb(order = 2)
grouped_descr %>% summarytools::tb()
summarytools::stby(data = mydata, INDICES = mydata$PreinvasiveComponent, FUN = summarytools::descr,
stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)
with(mydata, summarytools::stby(Age, PreinvasiveComponent, summarytools::descr,
stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE))
mydata %>% group_by(PreinvasiveComponent) %>% summarytools::descr(stats = "fivenum")
## Summary statistics by – category
SmartEDA::ExpNumStat(mydata, by = "GA", gp = "PreinvasiveComponent", Qnt = seq(0,
1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
Vname Group TN nNeg nZero nPos NegInf PosInf NA_Value
1 Age PreinvasiveComponent:All 250 0 0 249 0 0 1
2 Age PreinvasiveComponent:Absent 203 0 0 203 0 0 0
3 Age PreinvasiveComponent:Present 46 0 0 45 0 0 1
4 Age PreinvasiveComponent:NA 0 0 0 0 0 0 0
Per_of_Missing sum min max mean median SD CV IQR Skewness Kurtosis
1 0.40 12335 25 73 49.54 49 14.16 0.29 24.0 0.00 -1.16
2 0.00 10117 25 73 49.84 51 14.34 0.29 23.5 -0.02 -1.20
3 2.17 2170 25 72 48.22 49 13.55 0.28 22.0 0.08 -0.98
4 NaN 0 Inf -Inf NaN NA NA NA NA NaN NaN
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% LB.25% UB.75% nOutliers
1 25 30.8 34.0 40.4 45.0 49 54.0 59.0 64 70.0 73 1.00 97.00 0
2 25 31.0 34.0 40.6 45.0 51 54.0 59.0 65 70.8 73 2.25 96.25 0
3 25 30.8 34.8 40.2 43.6 49 51.8 56.8 59 68.6 72 3.00 91.00 0
4 NA NA NA NA NA NA NA NA NA NA NA NA NA 0
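The `LB.25%` and `UB.75%` columns appear to be the usual 1.5 × IQR outlier fences. A short sketch, using the overall Age quartiles reported above (Q1 = 37, Q3 = 61); treating the columns as Tukey fences is an assumption, but it is consistent with the printed values for the first row.

```r
q1  <- 37
q3  <- 61
iqr <- q3 - q1

# Values outside [lower, upper] would be flagged as outliers.
fences <- c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
fences
```

Since Age ranges from 25 to 73, no observation falls outside these fences, which matches `nOutliers = 0`.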
\pagebreak
\newpage \blandscape
Code for cross tables.^[See `childRmd/_12crossTables.Rmd` for other code.]
library(finalfit)
# dependent <- c('dependent1', 'dependent2' )
# explanatory <- c('explanatory1', 'explanatory2' )
dependent <- "PreinvasiveComponent"
explanatory <- c("Sex", "Age", "Grade", "TStage")
Change the `column = TRUE` argument to switch between row and column percentages.
source(here::here("R", "gc_table_cross.R"))
Cross Table PreinvasiveComponent
mydata %>%
summary_factorlist(dependent = 'PreinvasiveComponent',
explanatory = explanatory,
# column = TRUE,
total_col = TRUE,
p = TRUE,
add_dependent_label = TRUE,
na_include=FALSE
# catTest = catTestfisher
) -> table
knitr::kable(table, row.names = FALSE, align = c('l', 'l', 'r', 'r', 'r'))
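| Dependent: PreinvasiveComponent |           |      Absent |     Present |       Total |     p |
|:--------------------------------|:----------|------------:|------------:|------------:|------:|
| Sex                             | Female    |  104 (51.2) |   17 (37.8) |  121 (48.8) | 0.102 |
|                                 | Male      |   99 (48.8) |   28 (62.2) |  127 (51.2) |       |
| Age                             | Mean (SD) | 49.8 (14.3) | 48.2 (13.6) | 49.5 (14.2) | 0.492 |
| Grade                           | 1         |   68 (33.7) |    9 (19.6) |   77 (31.0) | 0.100 |
|                                 | 2         |   46 (22.8) |   16 (34.8) |   62 (25.0) |       |
|                                 | 3         |   88 (43.6) |   21 (45.7) |  109 (44.0) |       |
| TStage                          | 1         |    18 (8.9) |    6 (13.0) |    24 (9.6) | 0.117 |
|                                 | 2         |   38 (18.7) |     4 (8.7) |   42 (16.9) |       |
|                                 | 3         |   48 (23.6) |   17 (37.0) |   65 (26.1) |       |
|                                 | 4         |   99 (48.8) |   19 (41.3) |  118 (47.4) |       |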
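To see where a p value in the cross table might come from, the Sex row can be re-tested by hand. The sketch assumes the value is from a Pearson chi-squared test without continuity correction; whether `summary_factorlist()` uses exactly this test is not confirmed here.

```r
# Sex by PreinvasiveComponent counts from the cross table above.
sex_tab <- matrix(
  c(104, 99, 17, 28),
  nrow = 2,
  dimnames = list(
    Sex = c("Female", "Male"),
    PreinvasiveComponent = c("Absent", "Present")
  )
)

# Pearson chi-squared test without Yates' continuity correction.
res <- chisq.test(sex_tab, correct = FALSE)
round(res$p.value, 3)
```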
\pagebreak
library(DT)
datatable(mtcars, rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=T) )
\newpage \blandscape
\elandscape
\elandscape
Code for generating plots.^[See `childRmd/_13plots.Rmd` for other code.]
R lets you build many types of interactive graphics. My favourite library is plotly, which makes any ggplot2 graphic interactive with one extra line of code. Try hovering over points, selecting a zone, or clicking on the legend.
library(ggplot2)
library(plotly)
library(gapminder)
p <- gapminder %>% filter(year == 1977) %>% ggplot(aes(gdpPercap, lifeExp, size = pop,
color = continent)) + geom_point() + scale_x_log10() + theme_bw()
ggplotly(p)
scales::show_col(colours(), cex_label = 0.35)
embedgist <- gistr::gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e") %>%
gistr::embed()
# https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004
library(tidyr)
library(dplyr)
library(alluvial)
library(ggplot2)
library(forcats)
set.seed(42)
individual <- rep(LETTERS[1:10], each = 2)
timeperiod <- paste0("time_", rep(1:2, 10))
discretechoice <- factor(paste0("choice_", sample(letters[1:3], 20, replace = T)))
continuouschoice <- ceiling(runif(20, 0, 100))
d <- data.frame(individual, timeperiod, discretechoice, continuouschoice)
# stacked bar diagram of discrete choice by individual
g <- ggplot(data = d, aes(timeperiod, fill = fct_rev(discretechoice)))
g + geom_bar(position = "stack") + guides(fill = guide_legend(title = NULL))
# alluvial diagram of discrete choice by individual
# Namespace-qualify the verbs: the errors in the original run suggest that
# summarize() was masked by another attached package.
d_alluvial <- d %>% dplyr::select(individual, timeperiod, discretechoice) %>% tidyr::spread(timeperiod,
discretechoice) %>% dplyr::group_by(time_1, time_2) %>% dplyr::summarize(count = dplyr::n()) %>% dplyr::ungroup()
alluvial::alluvial(dplyr::select(d_alluvial, -count), freq = d_alluvial$count)
# stacked bar diagram of discrete choice, weighting by continuous choice
g + geom_bar(position = "stack", aes(weight = continuouschoice))
library(ggalluvial)
ggplot(data = d, aes(x = timeperiod, stratum = discretechoice, alluvium = individual,
y = continuouschoice)) + geom_stratum(aes(fill = discretechoice)) + geom_flow()
CD44changes <- mydata %>% dplyr::select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>%
dplyr::filter(complete.cases(.)) %>% dplyr::group_by(TumorCD44, TomurcukCD44,
PeritumoralTomurcukGr4) %>% dplyr::tally()
Error: Can't subset columns that don't exist.
x The column `TumorCD44` doesn't exist.
library(ggalluvial)
ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, y = n)) +
scale_x_discrete(limits = c("TumorCD44", "TomurcukCD44"), expand = c(0.1, 0.05)) +
xlab("Tumor Tomurcuk") + geom_alluvium(aes(fill = PeritumoralTomurcukGr4, colour = PeritumoralTomurcukGr4)) +
geom_stratum(alpha = 0.5) + geom_text(stat = "stratum", infer.label = TRUE) +
# geom_text(stat = 'alluvium', infer.label = TRUE) +
theme_minimal() + ggtitle("Changes in CD44")
Error in ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, : object 'CD44changes' not found
Code for paired tests.^[See `childRmd/_14pairedTests.Rmd` for other code.]
Code for hypothesis tests.^[See `childRmd/_15hypothesisTests.Rmd` for other code.]
mytable <- jmv::ttestIS(formula = HindexCTLA4 ~ PeritumoralTomurcukGr4, data = mydata,
vars = HindexCTLA4, students = FALSE, mann = TRUE, norm = TRUE, meanDiff = TRUE,
desc = TRUE, plots = TRUE)
Error: Argument 'vars' contains 'HindexCTLA4' which is not present in the dataset
cat("<pre class='jamovitable'>")
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html
t.test(mtcars$mpg ~ mtcars$am) %>% report::report()
report(t.test(iris$Sepal.Length, iris$Petal.Length))
Frequently Used Statistical Tests^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA] by [@Schmidt2017]
\newpage \blandscape
Code for ROC analysis.^[See `childRmd/_16ROC.Rmd` for other code.]
Code for decision trees.^[See `childRmd/_17decisionTree.Rmd`.]
Explore
explore::explore(mydata)
Code for survival analysis.^[See `childRmd/_18survival.Rmd` for other code, and `childRmd/_19shinySurvival.Rmd` for the `shiny` application.]
https://link.springer.com/article/10.1007/s00701-019-04096-9
Calculate survival time
mydata$int <- lubridate::interval(lubridate::ymd(mydata$SurgeryDate), lubridate::ymd(mydata$LastFollowUpDate))
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
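For readers without `lubridate`, follow-up time can be approximated in base R. This sketch divides the number of days by an average month length of 30.4375 days, so it can differ slightly from the `lubridate::time_length()` result used above; the two dates are made up for illustration.

```r
surgery   <- as.Date("2019-01-15")
follow_up <- as.Date("2019-07-15")

# Elapsed days, then an approximate conversion to months.
days   <- as.numeric(difftime(follow_up, surgery, units = "days"))
months <- round(days / 30.4375, digits = 1)
months
```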
Recode the death status outcome as numbers for survival analysis
## Recoding mydata$Death into mydata$Outcome
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death), `1` = "TRUE", `0` = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
It is always good practice to double-check after recoding.^[JAMA retraction after miscoding – new Finalfit function to check recoding]
table(mydata$Death, mydata$Outcome)
0 1
FALSE 83 0
TRUE 0 166
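The same check can be turned into an assertion that stops the script if the recoding ever goes wrong. This is a sketch on a small hypothetical vector rather than on `mydata`, and it converts the logical directly with `as.numeric()` instead of going through `forcats::fct_recode()`.

```r
death <- c(TRUE, FALSE, NA, TRUE, FALSE)

# A logical converts directly: TRUE -> 1, FALSE -> 0, NA stays NA.
outcome <- as.numeric(death)

# Cross-tabulate with NA kept visible, then fail loudly on any mismatch.
print(table(death, outcome, useNA = "ifany"))
stopifnot(identical(outcome == 1, death))
```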
library(survival)
# data(lung)
# km <- with(lung, Surv(time, status))
km <- with(mydata, Surv(OverallTime, Outcome))
head(km, 80)
[1] 4.5+ 7.8 7.1 7.9 10.6 6.9+ 8.4+ 11.0 3.5 7.6 8.4 6.0
[13] NA 9.5 11.2 11.7 9.2 7.6? 4.1 4.7 9.7+ 8.3+ 6.0+ 5.5+
[25] 6.4 11.4 3.8+ 10.2 3.0 6.4 11.3 6.5+ 9.7 6.7 3.3+ 11.2+
[37] 7.8 7.0 6.3 10.2 7.0 11.2 9.7+ 6.8 3.1 3.6 7.8 9.5+
[49] 6.0 10.4+ 11.2+ 3.3+ 7.4 9.2+ 9.9 11.2+ 10.0 5.4 9.5 5.4
[61] 5.9 8.4 4.1 9.2 7.3+ 6.6 7.0+ 8.6+ 4.0 4.1 10.7 4.7
[73] 6.9 6.6 5.3 8.0 9.3 8.4+ 8.6+ 8.8
plot(km)
Kaplan-Meier Plot Log-Rank Test
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "LVI"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test-1.png)
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
mydata %>%
finalfit::surv_plot(.data = .,
dependent = "Surv(OverallTime, Outcome)",
explanatory = "LVI",
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test 2-1.png)
library(finalfit)
library(survival)
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"
tUni <- mydata %>% finalfit::finalfit(dependentUni, explanatoryUni)
knitr::kable(tUni[, 1:4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))
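| Dependent: Surv(OverallTime, Outcome) |         |         all |          HR (univariable) |
|:--------------------------------------|:--------|------------:|--------------------------:|
| LVI                                   | Absent  | 147 (100.0) |                         - |
|                                       | Present | 102 (100.0) | 1.59 (1.15-2.20, p=0.005) |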
tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% janitor::clean_names()
tUni_df_descr <- paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1],
" is ", tUni_df$x[2], ", the hazard ratio is ", tUni_df$hr_univariable[2], " relative to when ",
tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
When LVI is Present, the hazard ratio is 1.59 (1.15-2.20, p=0.005) relative to when LVI is Absent.
} }
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
n events median 0.95LCL 0.95UCL
LVI=Absent 144 100 22.0 14.3 31.0
LVI=Present 102 64 10.5 9.9 13.8
plot(km_fit)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Median Survivals-1.png)
# summary(km_fit)
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names() %>%
tibble::rownames_to_column()
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {rowname}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>%
dplyr::select(description) %>% dplyr::pull()
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
When LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months., When LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.
} }
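The median survival reported by `survfit()` is the first time at which the Kaplan-Meier estimate falls to 0.5 or below. The following sketch reproduces that by hand on hypothetical toy data (`time` and `status` are made-up values, not study data) and compares the result with `survfit()`.

```r
library(survival)

# Hypothetical toy data: 1 = event, 0 = censored
time   <- c(2, 4, 6, 8, 10)
status <- c(1, 0, 1, 1, 0)

fit <- survfit(Surv(time, status) ~ 1)

# Kaplan-Meier by hand: multiply (1 - events / at risk) over the event times
event_times <- sort(unique(time[status == 1]))
surv_manual <- cumprod(sapply(event_times, function(t) {
  at_risk <- sum(time >= t)
  events  <- sum(time == t & status == 1)
  1 - events / at_risk
}))

# Median: first event time where the estimate is at or below 0.5
median_manual  <- event_times[which(surv_manual <= 0.5)[1]]
median_survfit <- unname(summary(fit)$table["median"])

c(manual = median_manual, survfit = median_survfit)
```

When the curve never drops to 0.5, the median is not reached and `survfit()` reports `NA`, which is worth handling before pasting the value into a sentence.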
summary(km_fit, times = c(12, 36, 60))
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
LVI=Absent
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 75 52 0.617 0.0421 0.539 0.705
36 19 35 0.252 0.0452 0.177 0.358
LVI=Present
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 23 49 0.383 0.0566 0.2870 0.512
36 4 12 0.134 0.0488 0.0657 0.274
km_fit_summary <- summary(km_fit, times = c(12, 36, 60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event",
"surv", "std.err", "lower", "upper")])
km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv, accuracy = 0.1)} [{scales::percent(lower, accuracy = 0.1)}-{scales::percent(upper, accuracy = 0.1)}, 95% CI].")) %>%
dplyr::select(description) %>% dplyr::pull()
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
When LVI=Absent, 12 month survival is 61.7% [53.9%-70.5%, 95% CI]., When LVI=Absent, 36 month survival is 25.2% [17.7%-35.8%, 95% CI]., When LVI=Present, 12 month survival is 38.3% [28.7%-51.2%, 95% CI]., When LVI=Present, 36 month survival is 13.4% [6.6%-27.4%, 95% CI].
} }
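The summary above was requested for 12, 36 and 60 months but lists only 12 and 36; presumably no group has follow-up reaching 60 months, in which case `summary()` drops that time point. The sketch below shows this behaviour and the `extend = TRUE` argument of `summary.survfit()` on hypothetical toy data (values are made up for illustration).

```r
library(survival)

# Hypothetical toy data: follow-up ends at 10 months
time   <- c(2, 4, 6, 8, 10)
status <- c(1, 0, 1, 1, 0)
fit <- survfit(Surv(time, status) ~ 1)

# Default: requested times beyond the last observation are dropped
s_default <- summary(fit, times = c(5, 20))

# extend = TRUE keeps every requested time, carrying the last estimate forward
s_extended <- summary(fit, times = c(5, 20), extend = TRUE)

length(s_default$time)
length(s_extended$time)
```

An extended estimate has nobody left at risk behind it, so it is better reported as "not reached within follow-up" than as a genuine 5-year survival figure.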
source(here::here("R", "gc_survival.R"))
Kaplan-Meier Plot Log-Rank Test
library(survival)
library(survminer)
library(finalfit)
mydata %>%
finalfit::surv_plot('Surv(OverallTime, Outcome)', 'LVI',
xlab='Time (months)', pval=TRUE, legend = 'none',
break.time.by = 12, xlim = c(0,60)
# legend.labs = c('a','b')
)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier LVI-1.png)
Univariate Cox-Regression
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"
tUni <- mydata %>% finalfit(dependentUni, explanatoryUni, metrics = TRUE)
# With metrics = TRUE, finalfit() returns a list (results table first, then model metrics),
# so the table has to be extracted with [[1]] before subsetting or printing
knitr::kable(tUni[[1]], row.names = FALSE, align = c("l", "l", "r", "r"))
Univariate Cox-Regression Summary
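# tUni was created with metrics = TRUE, so the results table is the first list element
tUni_df <- tibble::as_tibble(tUni[[1]], .name_repair = "minimal") %>% janitor::clean_names(dat = .,
case = "snake")
n_level <- dim(tUni_df)[1]
# Describe level n + 1 of the explanatory variable relative to the reference level (row 1)
tUni_df_descr <- function(n) {
paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n +
1], ", the hazard ratio is ", tUni_df$hr_univariable[n + 1], " relative to when ",
tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")
}
# seq_len(n_level - 1) makes the intent explicit; 2:n_level - 1 gave the same indices only through operator precedence
results5 <- purrr::map(.x = seq_len(n_level - 1), .f = tUni_df_descr)
print(unlist(results5))
[1] "When LVI is Present, the hazard ratio is 1.59 (1.15-2.20, p=0.005) relative to when LVI is Absent."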
\pagebreak
Median Survival
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
n events median 0.95LCL 0.95UCL
LVI=Absent 144 100 22.0 14.3 31.0
LVI=Present 102 64 10.5 9.9 13.8
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = .,
case = "snake") %>% tibble::rownames_to_column(.data = ., var = "LVI")
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>%
dplyr::select(description) %>% dplyr::pull()
km_fit_median_definition
When LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months.
When LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.
1-3-5-yr survival
summary(km_fit, times = c(12, 36, 60))
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
LVI=Absent
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 75 52 0.617 0.0421 0.539 0.705
36 19 35 0.252 0.0452 0.177 0.358
LVI=Present
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 23 49 0.383 0.0566 0.2870 0.512
36 4 12 0.134 0.0488 0.0657 0.274
km_fit_summary <- summary(km_fit, times = c(12, 36, 60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event",
"surv", "std.err", "lower", "upper")])
km_fit_df
strata time n.risk n.event surv std.err lower upper
1 LVI=Absent 12 75 52 0.6165782 0.04211739 0.53931696 0.7049078
2 LVI=Absent 36 19 35 0.2520087 0.04515881 0.17737163 0.3580528
3 LVI=Present 12 23 49 0.3833784 0.05662684 0.28701265 0.5120993
4 LVI=Present 36 4 12 0.1340646 0.04881983 0.06566707 0.2737036
km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv, accuracy = 0.1)} [{scales::percent(lower, accuracy = 0.1)}-{scales::percent(upper, accuracy = 0.1)}, 95% CI].")) %>%
dplyr::select(description) %>% dplyr::pull()
km_fit_definition
When LVI=Absent, 12 month survival is 61.7% [53.9%-70.5%, 95% CI].
When LVI=Absent, 36 month survival is 25.2% [17.7%-35.8%, 95% CI].
When LVI=Present, 12 month survival is 38.3% [28.7%-51.2%, 95% CI].
When LVI=Present, 36 month survival is 13.4% [6.6%-27.4%, 95% CI].
\pagebreak
summary(km_fit)$table
records n.max n.start events *rmean *se(rmean) median 0.95LCL
LVI=Absent 144 144 144 100 24.71341 1.571856 22.0 14.3
LVI=Present 102 102 102 64 17.48672 1.904576 10.5 9.9
0.95UCL
LVI=Absent 31.0
LVI=Present 13.8
km_fit_median_df <- summary(km_fit)
results1html <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = .,
case = "snake") %>% tibble::rownames_to_column(.data = ., var = "LVI")
results1html[, 1] <- gsub(pattern = "thefactor=", replacement = "", x = results1html[,
1])
knitr::kable(results1html, row.names = FALSE, align = c("l", rep("r", 9)), format = "html",
digits = 1)
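| LVI         | records | n_max | n_start | events | rmean | se_rmean | median | x0_95lcl | x0_95ucl |
|:------------|--------:|------:|--------:|-------:|------:|---------:|-------:|---------:|---------:|
| LVI=Absent  |     144 |   144 |     144 |    100 |  24.7 |      1.6 |   22.0 |     14.3 |     31.0 |
| LVI=Present |     102 |   102 |     102 |     64 |  17.5 |      1.9 |   10.5 |      9.9 |     13.8 |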
\pagebreak
Pairwise Comparisons
\pagebreak
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "TStage"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
![](/Users/serdarbalciold/histopathRprojects/histopathology-template/figs/Kaplan-Meier Plot Log-Rank Test TStage-1.png)
km_fit
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
n events median 0.95LCL 0.95UCL
LVI=Absent 144 100 22.0 14.3 31.0
LVI=Present 102 64 10.5 9.9 13.8
print(km_fit,
scale=1,
digits = max(options()$digits - 4,3),
print.rmean=getOption("survfit.print.rmean"),
rmean = getOption('survfit.rmean'),
print.median=getOption("survfit.print.median"),
median = getOption('survfit.median')
)
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
4 observations deleted due to missingness
n events median 0.95LCL 0.95UCL
LVI=Absent 144 100 22.0 14.3 31.0
LVI=Present 102 64 10.5 9.9 13.8
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
# hr_plot() fits a Cox model, so the dependent must be a survival object
dependent = "Surv(time, status)"
colon_s %>% hr_plot(dependent, explanatory)
library(survival)
library(survminer)
library(finalfit)
mb_followup %>%
finalfit::surv_plot('Surv(OverallTime, Outcome)', 'Operation',
xlab='Time (months)', pval=TRUE, legend = 'none',
# pval.coord
break.time.by = 12, xlim = c(0,60), ylim = c(0.8, 1)
# legend.labs = c('a','b')
)
Univariate Cox-Regression
explanatoryUni <- "Operation"
dependentUni <- "Surv(OverallTime, Outcome)"
tUni <- mb_followup %>% finalfit(dependentUni, explanatoryUni)
knitr::kable(tUni[, 1:4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))
Univariate Cox-Regression Summary
tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% janitor::clean_names(dat = .,
case = "snake")
n_level <- dim(tUni_df)[1]
# Describe level n + 1 of the explanatory variable relative to the reference level (row 1)
tUni_df_descr <- function(n) {
paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n +
1], ", the hazard ratio is ", tUni_df$hr_univariable[n + 1], " relative to when ",
tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")
}
results5 <- purrr::map(.x = seq_len(n_level - 1), .f = tUni_df_descr)
print(unlist(results5))
\pagebreak
Median Survival
km_fit <- survfit(Surv(OverallTime, Outcome) ~ Operation, data = mb_followup)
# km_fit
# summary(km_fit)
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names(dat = .,
case = "snake") %>% tibble::rownames_to_column(.data = ., var = "Derece")
km_fit_median_df
# km_fit_median_df %>% knitr::kable(format = 'latex') %>%
# kableExtra::kable_styling(latex_options='scale_down')
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {Derece}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>%
dplyr::select(description) %>% dplyr::pull()
# km_fit_median_definition
1-3-5-yr survival
summary(km_fit, times = c(12, 36, 60))
km_fit_summary <- summary(km_fit, times = c(12, 36, 60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event",
"surv", "std.err", "lower", "upper")])
km_fit_df %>% knitr::kable(format = "latex") %>% kableExtra::kable_styling(latex_options = "scale_down")
km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv, accuracy = 0.1)} [{scales::percent(lower, accuracy = 0.1)}-{scales::percent(upper, accuracy = 0.1)}, 95% CI].")) %>%
dplyr::select(description) %>% dplyr::pull()
km_fit_definition
\pagebreak
Pairwise Comparisons
survminer::pairwise_survdiff(formula = Surv(OverallTime, Outcome) ~ Operation, data = mb_followup,
p.adjust.method = "BH")
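`pairwise_survdiff()` passes `p.adjust.method` to the base-R multiplicity correction. The sketch below applies the Benjamini-Hochberg ("BH") adjustment directly with `stats::p.adjust()` to three hypothetical unadjusted p-values (the group names and numbers are made up), to show what the adjustment does to a set of pairwise log-rank results.

```r
# Hypothetical unadjusted p-values from three pairwise log-rank tests
p_raw <- c(A_vs_B = 0.01, A_vs_C = 0.04, B_vs_C = 0.03)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

# Compare raw and adjusted values side by side
data.frame(comparison = names(p_raw), p_raw = unname(p_raw), p_bh = unname(p_bh))
```

Each adjusted value is at least as large as its raw value, so a comparison that is borderline before adjustment may no longer pass a 0.05 threshold afterwards.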
\pagebreak
library(gt)
library(gtsummary)
library(survival)
# tbl_survival() was replaced by tbl_survfit() in later gtsummary releases
fit1 <- survfit(Surv(ttdeath, death) ~ trt, trial)
tbl_strata_ex1 <- tbl_survfit(fit1, times = c(12, 24), label_header = "**{time} Months**")
fit2 <- survfit(Surv(ttdeath, death) ~ 1, trial)
tbl_nostrata_ex2 <- tbl_survfit(fit2, probs = c(0.1, 0.2, 0.5), label_header = "**{prob} Percentile**")
Codes for generating Survival Analysis.^[See childRmd/_18survival.Rmd
file for other codes]
Codes for generating Shiny Survival Analysis.^[See childRmd/_19shinySurvival.Rmd
file for other codes]
\elandscape
Codes for generating correlation analysis.^[See childRmd/_20correlation.Rmd
file for other codes]
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.test.html
https://neuropsychology.github.io/psycho.R/2018/05/20/correlation.html
devtools::install_github("neuropsychology/psycho.R") # Install the newest version
remove.packages("psycho")
renv::install("neuropsychology/psycho.R@0.4.0")
# devtools::install_github("neuropsychology/psycho.R@0.4.0")
library(psycho)
# library(tidyverse)
cor <- psycho::affective %>%
correlation()
summary(cor)
plot(cor)
print(cor)
summary(cor) %>%
knitr::kable(format = "latex") %>%
kableExtra::kable_styling(latex_options="scale_down")
ggplot(mydata, aes(x = tx_zamani_verici_yasi, y = trombosit)) +
geom_point() +
geom_smooth(method = lm, size = 1)
Codes used in models^[See childRmd/_21models.Rmd
file for other codes]
Use these descriptions to add automatic reporting of new models.
Generate an automatic report of a model with the easystats/report 📦 package.
library(report)
model <- lm(Sepal.Length ~ Species, data = iris)
report::report(model)
We fitted a linear model (estimated using OLS) to predict Sepal.Length with Species (formula = Sepal.Length ~ Species). Standardized parameters were obtained by fitting the model on a standardized version of the dataset. Effect sizes were labelled following Funder's (2019) recommendations.
The model explains a significant and substantial proportion of variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to Sepal.Length = 0 and Species = setosa, is at 5.01 (SE = 0.07, 95% CI [4.86, 5.15], p < .001). Within this model:
- The effect of Speciesversicolor is positive and can be considered as very large and significant (beta = 1.12, SE = 0.12, 95% CI [0.88, 1.37], std. beta = 1.12, p < .001).
- The effect of Speciesvirginica is positive and can be considered as very large and significant (beta = 1.91, SE = 0.12, 95% CI [1.66, 2.16], std. beta = 1.91, p < .001).
Table report for a linear model
model <- lm(Sepal.Length ~ Petal.Length + Species, data=iris)
r <- report(model)
to_text(r)
to_table(r)
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html
model <- glm(vs ~ mpg + cyl, data=mtcars, family="binomial")
r <- report(model)
to_fulltext(r)
to_fulltable(r)
Where a multivariable model contains a subset of the variables specified in the full univariable set, this can be specified.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
finalfit(dependent, explanatory, explanatory.multi)
Random effects.
e.g. lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
finalfit(dependent, explanatory, explanatory.multi, random.effect)
metrics=TRUE provides common model metrics.
colon_s %>%
finalfit(dependent, explanatory, explanatory.multi, metrics=TRUE)
Cox proportional hazards
e.g. survival::coxph(dependent ~ explanatory)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
finalfit(dependent, explanatory)
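For readers who want to see what the table wraps, the sketch below fits a single Cox model directly with `survival::coxph()` and extracts the hazard ratio and its confidence interval. It uses the `lung` dataset that ships with the survival package as a stand-in (an assumption for illustration; the report itself uses `colon_s` and `mydata`).

```r
library(survival)

# lung ships with survival; sex is coded 1 = male, 2 = female
fit <- coxph(Surv(time, status) ~ sex, data = lung)

# Coefficients are log hazard ratios, so exponentiate them
hr    <- exp(coef(fit))
hr_ci <- exp(confint(fit))

round(c(HR = unname(hr), lower = hr_ci[1, 1], upper = hr_ci[1, 2]), 2)
```

A hazard ratio below 1 means a lower hazard in the higher-coded group relative to the reference; the finalfit table reports the same quantity in its HR columns.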
Rather than going all-in-one, any number of subset models can be manually added to a summary_factorlist() table using finalfit_merge(). This is particularly useful when models take a long time to run or are complicated.
Note the requirement for fit_id=TRUE. fit2df is a subfunction that extracts the most common models to a data frame.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
# Separate tables
colon_s %>%
summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary
colon_s %>%
glmuni(dependent, explanatory) %>%
fit2df(estimate_suffix=" (univariable)") -> example.univariable
colon_s %>%
glmmulti(dependent, explanatory) %>%
fit2df(estimate_suffix=" (multivariable)") -> example.multivariable
colon_s %>%
glmmixed(dependent, explanatory, random.effect) %>%
fit2df(estimate_suffix=" (multilevel)") -> example.multilevel
# Pipe together
example.summary %>%
finalfit_merge(example.univariable) %>%
finalfit_merge(example.multivariable) %>%
finalfit_merge(example.multilevel) %>%
dplyr::select(-c(fit_id, index)) -> example.final
example.final
Cox Proportional Hazards example with separate tables merged together.
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = "Surv(time, status)"
# Separate tables
colon_s %>%
summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example2.summary
colon_s %>%
coxphuni(dependent, explanatory) %>%
fit2df(estimate_suffix=" (univariable)") -> example2.univariable
colon_s %>%
coxphmulti(dependent, explanatory.multi) %>%
fit2df(estimate_suffix=" (multivariable)") -> example2.multivariable
# Pipe together
example2.summary %>%
finalfit_merge(example2.univariable) %>%
finalfit_merge(example2.multivariable) %>%
dplyr::select(-c(fit_id, index)) -> example2.final
example2.final
# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
or_plot(dependent, explanatory)
# Previously fitted models (`glmmulti()` or `glmmixed()`) can be provided directly to `glmfit`
# HR plot (not fully tested)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
hr_plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti()`) can be provided directly using `coxfit`
# Full report for a Bayesian logistic mixed model with effect sizes
library(rstanarm)
stan_glmer(vs ~ mpg + (1|cyl), data=mtcars, family="binomial") %>%
report(standardize="smart", effsize="cohen1988") %>%
to_fulltext()
Test if your model is a good model
https://easystats.github.io/performance/
\pagebreak
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
No association was found between Some Text and survival (p = 0.22).
} }
my_text <- kableExtra::text_spec("Some Text", color = "red", background = "yellow")
# `r my_text`
\pagebreak
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
} }
\pagebreak
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}\noindent\colorbox{yellow}{ \parbox{\dimexpr\linewidth-2\fboxsep}{
Text Here
} }
\pagebreak
\pagecolor{yellow}\afterpage{\nopagecolor}
\pagebreak
content of sub-chapter #1
content of sub-chapter #2
content of sub-chapter #3
Block rmdnote
Block rmdtip
Block warning
\pagebreak
Interpret the results in context of the working hypothesis elaborated in the introduction and other relevant studies; include a discussion of limitations of the study.
Discuss potential clinical applications and implications for future research.
\pagebreak
Codes for explaining the software and the packages that are used in the analysis^[See childRmd/_23footer.Rmd
file for other codes]
projectName <- list.files(path = here::here(), pattern = "Rproj")
# Escape the dot and anchor at the end so only the literal ".Rproj" extension is removed
projectName <- gsub(pattern = "\\.Rproj$", replacement = "", x = projectName)
analysisDate <- as.character(Sys.Date())
imageName <- paste0(projectName, analysisDate, ".RData")
save.image(file = here::here("data", imageName))
rdsName <- paste0(projectName, analysisDate, ".rds")
readr::write_rds(x = mydata, path = here::here("data", rdsName))
saveRDS(object = mydata, file = here::here("data", rdsName))
excelName <- paste0(projectName, analysisDate, ".xlsx")
rio::export(x = mydata, file = here::here("data", excelName), format = "xlsx")
# writexl::write_xlsx(mydata, here::here('data', excelName))
print(glue::glue("saved data after analysis to ", rownames(file.info(here::here("data",
excelName))), " : ", as.character(file.info(here::here("data", excelName))$ctime)))
saved data after analysis to /Users/serdarbalciold/histopathRprojects/histopathology-template/data/histopathology-template2020-02-26.xlsx : 2020-02-26 15:31:04
mydata %>% downloadthis::download_this(output_name = excelName, output_extension = ".csv",
button_label = "Download data as csv", button_type = "default")
mydata %>% downloadthis::download_this(output_name = excelName, output_extension = ".xlsx",
button_label = "Download data as xlsx", button_type = "primary")
\pagebreak
# use summarytools to generate final data summary
# summarytools::view(summarytools::dfSummary(x = mydata
# , style = "markdown"))
\pagebreak
Why and how to cite software and packages?^[Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86 https://www.force11.org/software-citation-principles]
citation()
To cite R in publications use:
R Core Team (2019). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
URL https://www.R-project.org/.
A BibTeX entry for LaTeX users is
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2019},
url = {https://www.R-project.org/},
}
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
- The jamovi project (2019). jamovi. (Version 0.9) [Computer Software]. Retrieved from https://www.jamovi.org.
- R Core Team (2018). R: A Language and environment for statistical computing. [Computer software]. Retrieved from https://cran.r-project.org/.
- Fox, J., & Weisberg, S. (2018). car: Companion to Applied Regression. [R package]. Retrieved from https://cran.r-project.org/package=car.
- Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

Data processing was carried out with R (R Core Team, 2019) and the easystats ecosystem (Lüdecke, Waggoner, & Makowski, 2019; Makowski, Ben-Shachar, & Lüdecke, 2019).
report::cite_packages(session = sessionInfo())
- Alastair Rushworth (2019). inspectdf: Inspection, Comparison and Visualisation of Data Frames. R package version 0.0.7. https://CRAN.R-project.org/package=inspectdf
- Alboukadel Kassambara (2020). ggpubr: 'ggplot2' Based Publication Ready Plots. R package version 0.2.5. https://CRAN.R-project.org/package=ggpubr
- Alboukadel Kassambara, Marcin Kosinski and Przemyslaw Biecek (2019). survminer: Drawing Survival Curves using 'ggplot2'. R package version 0.4.6. https://CRAN.R-project.org/package=survminer
- Benjamin Elbers (2020). tidylog: Logging for 'dplyr' and 'tidyr' Functions. R package version 1.0.0. https://CRAN.R-project.org/package=tidylog
- Boxuan Cui (2020). DataExplorer: Automate Data Exploration and Treatment. R package version 0.8.1. https://CRAN.R-project.org/package=DataExplorer
- Chung-hong Chan, Geoffrey CH Chan, Thomas J. Leeper, and Jason Becker (2018). rio: A Swiss-army knife for data file I/O. R package version 0.5.16.
- David Robinson and Alex Hayes (2020). broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.4. https://CRAN.R-project.org/package=broom
- Dayanand Ubrangala, Kiran R, Ravi Prasad Kondapalli and Sayan Putatunda (2020). SmartEDA: Summarize and Explore the Data. R package version 0.3.3. https://CRAN.R-project.org/package=SmartEDA
- Dirk Eddelbuettel and Romain Francois (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. URL http://www.jstatsoft.org/v40/i08/.
- Dirk Eddelbuettel with contributions by Antoine Lucas, Jarek Tuszynski, Henrik Bengtsson, Simon Urbanek, Mario Frasca, Bryan Lewis, Murray Stokely, Hannes Muehleisen, Duncan Murdoch, Jim Hester, Wush Wu, Qiang Kou, Thierry Onkelinx, Michel Lang, Viliam Simko, Kurt Hornik, Radford Neal, Kendon Bell, Matthew de Queljoe, Ion Suruceanu and Bill Denney. (2020). digest: Create Compact Hash Digests of R Objects. R package version 0.6.24. https://CRAN.R-project.org/package=digest
- Ethan Heinzen, Jason Sinnwell, Elizabeth Atkinson, Tina Gunderson and Gregory Dougherty (2020). arsenal: An Arsenal of 'R' Functions for Large-Scale Statistical Summaries. R package version 3.4.0. https://CRAN.R-project.org/package=arsenal
- Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. R package version 0.9.7. https://CRAN.R-project.org/package=finalfit
- Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/.
- H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Hadley Wickham (2019). feather: R Bindings to the Feather 'API'. R package version 0.3.5. https://CRAN.R-project.org/package=feather
- Hadley Wickham (2019). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.4.0. https://CRAN.R-project.org/package=forcats
- Hadley Wickham (2019). httr: Tools for Working with URLs and HTTP. R package version 1.4.1. https://CRAN.R-project.org/package=httr
- Hadley Wickham (2019). modelr: Modelling Functions that Work with the Pipe. R package version 0.1.5. https://CRAN.R-project.org/package=modelr
- Hadley Wickham (2019). rvest: Easily Harvest (Scrape) Web Pages. R package version 0.3.5. https://CRAN.R-project.org/package=rvest
- Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
- Hadley Wickham and Evan Miller (2019). haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files. R package version 2.2.0. https://CRAN.R-project.org/package=haven
- Hadley Wickham and Jennifer Bryan (2019). readxl: Read Excel Files. R package version 1.3.1. https://CRAN.R-project.org/package=readxl
- Hadley Wickham and Lionel Henry (2020). tidyr: Tidy Messy Data. R package version 1.0.2. https://CRAN.R-project.org/package=tidyr
- Hadley Wickham and Yihui Xie (2019). evaluate: Parsing and Evaluation Tools that Provide More Details than the Default. R package version 0.14. https://CRAN.R-project.org/package=evaluate
- Hadley Wickham, Jim Hester and Jeroen Ooms (2019). xml2: Parse XML. R package version 1.2.2. https://CRAN.R-project.org/package=xml2
- Hadley Wickham, Jim Hester and Romain Francois (2018). readr: Read Rectangular Text Data. R package version 1.3.1. https://CRAN.R-project.org/package=readr
- Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2020). dplyr: A Grammar of Data Manipulation. R package version 0.8.4. https://CRAN.R-project.org/package=dplyr
- Jeremy Stephens, Kirill Simonov, Yihui Xie, Zhuoer Dong, Hadley Wickham, Jeffrey Horner, reikoch, Will Beasley, Brendan O'Connor and Gregory R. Warnes (2020). yaml: Methods to Convert R Data to YAML and Back. R package version 2.2.1. https://CRAN.R-project.org/package=yaml
- Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO] URL https://arxiv.org/abs/1403.2805.
- Jim Hester (2019). glue: Interpreted String Literals. R package version 1.3.1. https://CRAN.R-project.org/package=glue
- Jim Hester and Gábor Csárdi (2019). pak: Another Approach to Package Installation. R package version 0.1.2. https://CRAN.R-project.org/package=pak
- Jim Hester, Gábor Csárdi, Hadley Wickham, Winston Chang, Martin Morgan and Dan Tenenbaum (2020). remotes: R Package Installation from Remote Repositories, Including 'GitHub'. R package version 2.1.1. https://CRAN.R-project.org/package=remotes
- JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2020). rmarkdown: Dynamic Documents for R. R package version 2.1. URL https://rmarkdown.rstudio.com.
- JJ Allaire, Jeffrey Horner, Yihui Xie, Vicent Marti and Natacha Porte (2019). markdown: Render Markdown with the C Library 'Sundown'. R package version 1.1. https://CRAN.R-project.org/package=markdown
- Kazuki Yoshida (2019). tableone: Create 'Table 1' to Describe Baseline Characteristics. R package version 0.10.0. https://CRAN.R-project.org/package=tableone
- Kevin Ushey (2020). renv: Project Environments. R package version 0.9.3. https://CRAN.R-project.org/package=renv
- Kirill Müller (2017). here: A Simpler Way to Find Your Files. R package version 0.1. https://CRAN.R-project.org/package=here
- Kirill Müller (2020). hms: Pretty Time of Day. R package version 0.5.3. https://CRAN.R-project.org/package=hms
- Kirill Müller and Hadley Wickham (2019). tibble: Simple Data Frames. R package version 2.1.3. https://CRAN.R-project.org/package=tibble
- Koji Makiyama (2016). magicfor: Magic Functions to Obtain Results from for Loops. R package version 0.1.0. https://CRAN.R-project.org/package=magicfor
- Lionel Henry and Hadley Wickham (2019). purrr: Functional Programming Tools. R package version 0.3.3. https://CRAN.R-project.org/package=purrr
- Lionel Henry and Hadley Wickham (2020). rlang: Functions for Base Types and Core R and 'Tidyverse' Features. R package version 0.4.4. https://CRAN.R-project.org/package=rlang
- Makowski, D. & Lüdecke, D. (2019). The report package for R: Ensuring the use of best practices for results reporting. CRAN. Available from https://github.com/easystats/report.
- Pablo Seibelt (2017). xray: X Ray Vision on your Datasets. R package version 0.2. https://CRAN.R-project.org/package=xray
- Paul Hendricks (2015). describer: Describe Data in R Using Common Descriptive Statistics. R package version 0.2.0. https://CRAN.R-project.org/package=describer
- Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." Journal of Statistical Software, 90(6), 1-38. doi: 10.18637/jss.v090.i06 (URL: https://doi.org/10.18637/jss.v090.i06).
- Rinker, T. W. (2018). wakefield: Generate Random Data. version 0.3.3. Buffalo, New York. https://github.com/trinker/wakefield
- Rinker, T. W. & Kurkiewicz, D. (2017). pacman: Package Management for R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
- Roland Krasser (2020). explore: Simplifies Exploratory Data Analysis. R package version 0.5.4. https://CRAN.R-project.org/package=explore
- RStudio and Inc. (2019). htmltools: Tools for HTML. R package version 0.4.0. https://CRAN.R-project.org/package=htmltools
- Sam Firke (2020). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 1.2.1. https://CRAN.R-project.org/package=janitor
- Simon Garnier (2018). viridis: Default Color Maps from 'matplotlib'. R package version 0.5.1. https://CRAN.R-project.org/package=viridis
- Simon Garnier (2018). viridisLite: Default Color Maps from 'matplotlib' (Lite Version). R package version 0.3.0. https://CRAN.R-project.org/package=viridisLite
- Simon Urbanek (2015). base64enc: Tools for base64 encoding. R package version 0.1-3. https://CRAN.R-project.org/package=base64enc
- Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr
- Therneau T (2015). A Package for Survival Analysis in S. version 2.38.
- Tierney N (2017). "visdat: Visualising Whole Data Frames." JOSS, 2(16), 355. doi: 10.21105/joss.00355 (URL: https://doi.org/10.21105/joss.00355).
- Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2019). shiny: Web Application Framework for R. R package version 1.4.0. https://CRAN.R-project.org/package=shiny
- Yihui Xie (2019). formatR: Format R Code Automatically. R package version 1.7. https://CRAN.R-project.org/package=formatR
- Yihui Xie (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.28.
- Yihui Xie (2020). mime: Map Filenames to MIME Types. R package version 0.9. https://CRAN.R-project.org/package=mime
- Yixuan Qiu and Yihui Xie (2019). highr: Syntax Highlighting for R Source Code. R package version 0.8. https://CRAN.R-project.org/package=highr
report::show_packages(session = sessionInfo()) %>% kableExtra::kable()
# citation('tidyverse')
citation("readxl")
To cite package 'readxl' in publications use:
Hadley Wickham and Jennifer Bryan (2019). readxl: Read Excel Files. R
package version 1.3.1. https://CRAN.R-project.org/package=readxl
A BibTeX entry for LaTeX users is
@Manual{,
title = {readxl: Read Excel Files},
author = {Hadley Wickham and Jennifer Bryan},
year = {2019},
note = {R package version 1.3.1},
url = {https://CRAN.R-project.org/package=readxl},
}
citation("janitor")
To cite package 'janitor' in publications use:
Sam Firke (2020). janitor: Simple Tools for Examining and Cleaning
Dirty Data. R package version 1.2.1.
https://CRAN.R-project.org/package=janitor
A BibTeX entry for LaTeX users is
@Manual{,
title = {janitor: Simple Tools for Examining and Cleaning Dirty Data},
author = {Sam Firke},
year = {2020},
note = {R package version 1.2.1},
url = {https://CRAN.R-project.org/package=janitor},
}
# citation('report')
citation("finalfit")
To cite package 'finalfit' in publications use:
Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly
Create Elegant Regression Results Tables and Plots when Modelling. R
package version 0.9.7. https://CRAN.R-project.org/package=finalfit
A BibTeX entry for LaTeX users is
@Manual{,
title = {finalfit: Quickly Create Elegant Regression Results Tables and Plots when
Modelling},
author = {Ewen Harrison and Tom Drake and Riinu Ots},
year = {2019},
note = {R package version 0.9.7},
url = {https://CRAN.R-project.org/package=finalfit},
}
# citation('ggstatsplot')
if (!dir.exists(here::here("bib"))) {
dir.create(here::here("bib"))
}
knitr::write_bib(x = c(.packages(), "knitr", "shiny"), file = here::here("bib", "packages.bib"))
\pagebreak
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.15.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] survminer_0.4.6 ggpubr_0.2.5 viridis_0.5.1 viridisLite_0.3.0
[5] shiny_1.4.0 survival_3.1-8 magrittr_1.5 report_0.1.0
[9] wakefield_0.3.3 SmartEDA_0.3.3 magicfor_0.1.0 tableone_0.10.0
[13] arsenal_3.4.0 DataExplorer_0.8.1 xray_0.2 visdat_0.5.3
[17] inspectdf_0.0.7 describer_0.2.0 dataMaid_1.4.0 finalfit_0.9.7
[21] explore_0.5.4 rio_0.5.16 janitor_1.2.1 formatR_1.7
[25] renv_0.9.3 rlang_0.4.4 glue_1.3.1 tidylog_1.0.0
[29] broom_0.5.4 modelr_0.1.5 rvest_0.3.5 xml2_1.2.2
[33] readxl_1.3.1 httr_1.4.1 haven_2.2.0 feather_0.3.5
[37] lubridate_1.7.4 hms_0.5.3 forcats_0.4.0 stringr_1.4.0
[41] tibble_2.1.3 purrr_0.3.3 readr_1.3.1 tidyr_1.0.2
[45] dplyr_0.8.4 ggplot2_3.2.1 rmarkdown_2.1 mime_0.9
[49] base64enc_0.1-3 jsonlite_1.6.1 knitr_1.28 htmltools_0.4.0
[53] Rcpp_1.0.3 yaml_2.2.1 markdown_1.1 highr_0.8
[57] digest_0.6.24 evaluate_0.14 here_0.1 pak_0.1.2
[61] pacman_0.5.1 remotes_2.1.1
loaded via a namespace (and not attached):
[1] utf8_1.1.4 tidyselect_1.0.0 lme4_1.1-21
[4] htmlwidgets_1.5.1 grid_3.6.0 lpSolve_5.6.15
[7] munsell_0.5.0 effectsize_0.1.2 codetools_0.2-16
[10] DT_0.12.1 withr_2.1.2 ISLR_1.2
[13] colorspace_1.4-1 rstudioapi_0.11 robustbase_0.93-5
[16] ggsignif_0.6.0 labeling_0.3 KMsurv_0.1-5
[19] farver_2.0.3 rprojroot_1.3-2 vctrs_0.2.3
[22] generics_0.0.2 xfun_0.12 downloadthis_0.1.0
[25] R6_2.4.1 reshape_0.8.8 assertthat_0.2.1
[28] promises_1.1.0 networkD3_0.4 scales_1.1.0
[31] nnet_7.3-12 gtable_0.3.0 clisymbols_1.2.0
[34] whoami_1.3.0 splines_3.6.0 lazyeval_0.2.2
[37] acepack_1.4.1 bsplus_0.1.1 checkmate_2.0.0
[40] backports_1.1.5 httpuv_1.5.2 Hmisc_4.3-1
[43] tools_3.6.0 ellipsis_0.3.0 RColorBrewer_1.1-2
[46] plyr_1.8.5 jmvcore_1.2.5 progress_1.2.2
[49] prettyunits_1.1.1 rpart_4.1-15 sampling_2.8
[52] zoo_1.8-7 reactR_0.4.2 cluster_2.1.0
[55] fs_1.3.1 survey_3.37 data.table_1.12.8
[58] openxlsx_4.1.4 mitml_0.3-7 reactable_0.1.0
[61] xtable_1.8-4 jpeg_0.1-8.1 gridExtra_2.3
[64] compiler_3.6.0 mice_3.7.0 writexl_1.2
[67] crayon_1.3.4 minqa_1.2.4 later_1.0.0
[70] Formula_1.2-3 DBI_1.1.0 jmv_1.2.5
[73] MASS_7.3-51.5 boot_1.3-24 Matrix_1.2-18
[76] cli_2.0.1 mitools_2.4 parallel_3.6.0
[79] insight_0.8.1 pan_1.6 igraph_1.2.4.2
[82] pkgconfig_2.0.3 km.ci_0.5-2 foreign_0.8-75
[85] foreach_1.4.8 snakecase_0.11.0 parameters_0.5.0.1
[88] cellranger_1.1.0 survMisc_0.5.5 htmlTable_1.13.3
[91] curl_4.3 jomo_2.6-10 rjson_0.2.20
[94] nloptr_1.2.1 lifecycle_0.1.0 nlme_3.1-144
[97] fansi_0.4.1 labelled_2.2.2 pillar_1.4.3
[100] ggsci_2.9 lattice_0.20-38 GGally_1.4.0
[103] fastmap_1.0.1 DEoptimR_1.0-8 bayestestR_0.5.2
[106] zip_2.0.4 png_0.1-7 iterators_1.0.12
[109] pander_0.6.3 performance_0.4.4 class_7.3-15
[112] stringi_1.4.6 ggfittext_0.8.1 latticeExtra_0.6-29
[115] e1071_1.7-3
\pagebreak
pacman::p_loaded(all = TRUE)
\pagebreak
Last update on 2020-05-13 15:20:11
Serdar Balci, MD, Pathologist serdarbalci@serdarbalci.com https://rpubs.com/sbalci/CV https://github.com/sbalci https://sbalci.github.io/ Patoloji Notları ParaPathology https://twitter.com/serdarbalci
\pagebreak
Use the following chunk options to include all the code used in the report below it.
{r, echo=TRUE, eval=FALSE, ref.label=knitr::all_labels()}
# installing necessary packages
if (requireNamespace("magrittr", quietly = TRUE)) {
`%>%` <- magrittr::`%>%`
}
if (!require("remotes")) install.packages("remotes")
if (!require("pacman")) install.packages("pacman")
if (!require("pak")) install.packages("pak")
if (!require("here")) install.packages("here")
source_rmd <- function(rmd_file){
knitr::knit(rmd_file, output = tempfile(), envir = globalenv())
}
list_of_Rmd <- list.files(path = here::here("childRmd"), pattern = "Rmd")
list_of_Rmd <- list_of_Rmd[!list_of_Rmd %in% c("_19shinySurvival.Rmd")]
purrr::map(.x = here::here("childRmd", list_of_Rmd), .f = source_rmd)
source(file = here::here("R", "force_git.R"))
knitr::opts_chunk$set(
eval = TRUE,
echo = TRUE,
fig.path = here::here("figs/"),
message = FALSE,
warning = FALSE,
error = TRUE,
cache = TRUE,
comment = NA,
tidy = TRUE,
fig.width = 6,
fig.height = 4
)
library(knitr)
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
# this hook is used only when the linewidth option is not NULL
if (!is.null(n <- options$linewidth)) {
x = knitr:::split_lines(x)
# any lines wider than n should be wrapped
if (any(nchar(x) > n)) x = strwrap(x, width = n)
x = paste(x, collapse = '\n')
}
hook_output(x, options)
})
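The wrapping step inside the hook above can be tried outside knitr. The following is a minimal sketch in base R, assuming a hypothetical helper `wrap_output()` that is not part of the original template; it splits on newlines with `strsplit()` instead of the internal `knitr:::split_lines()`.

```r
# Hypothetical helper (not in the original template): wrap text output
# the same way the hook does, without depending on knitr internals.
wrap_output <- function(x, linewidth = NULL) {
  # mirror the hook: do nothing unless a line width is given
  if (is.null(linewidth)) return(x)
  lines <- unlist(strsplit(x, "\n", fixed = TRUE))
  # if any line is wider than linewidth, re-wrap the lines with strwrap()
  if (any(nchar(lines) > linewidth)) lines <- strwrap(lines, width = linewidth)
  paste(lines, collapse = "\n")
}

# One long line of made-up text (119 characters)
long_line <- paste(rep("histopathology", 8), collapse = " ")
wrapped <- wrap_output(long_line, linewidth = 40)
cat(wrapped, "\n")
```

In a chunk, the hook itself only acts when the `linewidth` chunk option is set, for example `{r, linewidth = 60}`.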
# linewidth css
pre:not([class]) {
color: #333333;
background-color: #cccccc;
}
# linewidth css
pre.jamovitable{
color:black;
background-color: white;
margin-bottom: 35px;
}
jtable<-function(jobject,digits=3) {
snames<-sapply(jobject$columns,function(a) a$title)
asDF<-jobject$asDF
tnames<-unlist(lapply(names(asDF) ,function(n) snames[[n]]))
names(asDF)<-tnames
kableExtra::kable(asDF,"html",
table.attr='class="jmv-results-table-table"',
row.names = F,
digits=digits)
}
# https://cran.r-project.org/web/packages/exploreR/vignettes/exploreR.html
# exploreR::reset()
Block rmdnote
Block rmdtip
Block warning
source(file = here::here("R", "loadLibrary.R"))
source(file = here::here("R", "gc_fake_data.R"))
wakefield::table_heat(x = fakedata, palette = "Set1", flip = TRUE, print = TRUE)
library(readxl)
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
# View(mydata) # Use to view data after importing
# https://cran.r-project.org/web/packages/rio/vignettes/rio.html
# rio::install_formats()
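# export first so that the files imported below exist
rio::export(mtcars, "mtcars.csv")
rio::export(mtcars, "mtcars.rds")
rio::export(mtcars, "mtcars.dta")
rio::export(list(mtcars = mtcars, iris = iris), "multi.xlsx")
x <- rio::import("mtcars.csv")
y <- rio::import("mtcars.rds")
z <- rio::import("mtcars.dta")
# a file without an extension needs an explicit format
file.copy("mtcars.csv", "mtcars_noext", overwrite = TRUE)
rio::import("mtcars_noext", format = "csv")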
# Dataframe report
mydata %>%
dplyr::select(-contains("Date")) %>%
report::report(.)
mydata %>% explore::describe_tbl()
dput(names(mydata))
keycolumns <-
mydata %>%
sapply(., FUN = dataMaid::isKey) %>%
tibble::as_tibble() %>%
dplyr::select(
which(.[1, ] == TRUE)
) %>%
names()
keycolumns
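The idea behind the key-column search above can be illustrated without dataMaid. This is a simplified sketch in base R, assuming a hypothetical `is_candidate_key()` helper and a made-up toy data frame; it is a stand-in for the concept and does not reproduce the exact rules of `dataMaid::isKey`.

```r
# Hypothetical stand-in for the key check: a column is treated as a
# candidate key when it has no missing values and no duplicates.
is_candidate_key <- function(x) {
  !anyNA(x) && anyDuplicated(x) == 0L
}

# Made-up example data (not from the original report)
toy <- data.frame(
  id = c("a", "b", "c"),
  grade = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Names of the columns that uniquely identify each row
key_cols <- names(toy)[vapply(toy, is_candidate_key, logical(1))]
key_cols
```

Columns found this way are identifiers rather than measurements, which is why the report drops them before summarising the data.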
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types()
mydata %>%
dplyr::select(-keycolumns,
-contains("Date")) %>%
describer::describe() %>%
knitr::kable(format = "markdown")
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types() %>%
inspectdf::show_plot()
# https://github.com/ropensci/visdat
# http://visdat.njtierney.com/articles/using_visdat.html
# https://cran.r-project.org/web/packages/visdat/index.html
# http://visdat.njtierney.com/
# visdat::vis_guess(mydata)
visdat::vis_dat(mydata)
mydata %>% explore::explore_tbl()
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types() %>%
dplyr::filter(type == "character") %>%
dplyr::select(col_name) %>%
dplyr::pull() %>%
unlist() -> characterVariables
characterVariables
mydata %>%
dplyr::select(-keycolumns,
-contains("Date")
) %>%
describer::describe() %>%
janitor::clean_names() %>%
dplyr::filter(column_type == "factor") %>%
dplyr::select(column_name) %>%
dplyr::pull() -> categoricalVariables
categoricalVariables
mydata %>%
dplyr::select(-keycolumns,
-contains("Date")) %>%
describer::describe() %>%
janitor::clean_names() %>%
dplyr::filter(column_type == "numeric" | column_type == "double") %>%
dplyr::select(column_name) %>%
dplyr::pull() -> continiousVariables
continiousVariables
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types() %>%
dplyr::filter(type == "numeric") %>%
dplyr::select(col_name) %>%
dplyr::pull() %>%
unlist() -> numericVariables
numericVariables
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types() %>%
dplyr::filter(type == "integer") %>%
dplyr::select(col_name) %>%
dplyr::pull() %>%
unlist() -> integerVariables
integerVariables
mydata %>%
dplyr::select(-keycolumns) %>%
inspectdf::inspect_types() %>%
dplyr::filter(type == "list") %>%
dplyr::select(col_name) %>%
dplyr::pull() %>%
unlist() -> listVariables
listVariables
is_date <- function(x) inherits(x, c("POSIXct", "POSIXt"))
dateVariables <-
names(which(sapply(mydata, FUN = is_date) == TRUE))
dateVariables
View(mydata)
reactable::reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE, searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE, highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE, showSortIcon = TRUE, showSortable = TRUE)
summarytools::view(summarytools::dfSummary(mydata %>% dplyr::select(-keycolumns)))
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}
summarytools::view(
x = summarytools::dfSummary(
mydata %>%
dplyr::select(-keycolumns)
),
file = here::here("out", "mydata_summary.html")
)
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}
dataMaid::makeDataReport(data = mydata,
file = here::here("out", "dataMaid_mydata.Rmd"),
replace = TRUE,
openResult = FALSE,
render = FALSE,
quiet = TRUE
)
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}
mydata %>%
dplyr::select(
-dateVariables
) %>%
explore::report(
output_file = "mydata_report.html",
output_dir = here::here("out")
)
dplyr::glimpse(mydata %>% dplyr::select(-keycolumns, -dateVariables))
mydata %>% explore::describe()
explore::explore(mydata)
mydata %>%
explore::explore_all()
visdat::vis_expect(data = mydata,
expectation = ~.x == -1,
show_perc = TRUE)
visdat::vis_expect(mydata, ~.x >= 25)
visdat::vis_miss(airquality,
cluster = TRUE)
visdat::vis_miss(airquality,
sort_miss = TRUE)
xray::anomalies(mydata)
xray::distributions(mydata)
DataExplorer::plot_str(mydata)
DataExplorer::plot_str(mydata, type = "r")
DataExplorer::introduce(mydata)
DataExplorer::plot_intro(mydata)
DataExplorer::plot_missing(mydata)
mydata2 <- DataExplorer::drop_columns(mydata, "TStage")
DataExplorer::plot_bar(mydata)
DataExplorer::plot_bar(mydata, with = "Death")
DataExplorer::plot_histogram(mydata)
if(!dir.exists(here::here("out"))) {dir.create(here::here("out"))}
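# https://cran.r-project.org/web/packages/dataMaid/vignettes/extending_dataMaid.html
# NOTE: isID, countZeros, meanSummary, mosaicVisual, prettierHist and identifyColons
# are custom functions defined in the vignette above; define them before running this call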
library("dataMaid")
dataMaid::makeDataReport(mydata,
#add extra precheck function
preChecks = c("isKey", "isSingular", "isSupported", "isID"),
#Add the extra summaries - countZeros() for character, factor,
#integer, labelled and numeric variables and meanSummary() for integer,
#numeric and logical variables:
summaries = setSummaries(
character = defaultCharacterSummaries(add = "countZeros"),
factor = defaultFactorSummaries(add = "countZeros"),
labelled = defaultLabelledSummaries(add = "countZeros"),
numeric = defaultNumericSummaries(add = c("countZeros", "meanSummary")),
integer = defaultIntegerSummaries(add = c("countZeros", "meanSummary")),
logical = defaultLogicalSummaries(add = c("meanSummary"))
),
#choose mosaicVisual() for categorical variables,
#prettierHist() for all others:
visuals = setVisuals(
factor = "mosaicVisual",
numeric = "prettierHist",
integer = "prettierHist",
Date = "prettierHist"
),
#Add the new checkFunction, identifyColons, for character, factor and
#labelled variables:
checks = setChecks(
character = defaultCharacterChecks(add = "identifyColons"),
factor = defaultFactorChecks(add = "identifyColons"),
labelled = defaultLabelledChecks(add = "identifyColons")
),
#overwrite old versions of the report, render to html and don't
#open the html file automatically:
replace = TRUE,
output = "html",
openResult = FALSE,
file = here::here("out/dataMaid_mydata.Rmd")
)
# https://cran.r-project.org/web/packages/summarytools/vignettes/Recommendations-rmarkdown.html
# https://github.com/dcomtois/summarytools
library(knitr)
opts_chunk$set(comment=NA,
prompt=FALSE,
cache=FALSE,
echo=TRUE,
results='asis' # add to individual summarytools chunks
)
library(summarytools)
st_css()
st_options(bootstrap.css = FALSE, # Already part of the theme so no need for it
plain.ascii = FALSE, # One of the essential settings
style = "rmarkdown", # Idem.
dfSummary.silent = TRUE, # Suppresses messages about temporary files
footnote = NA, # Keeping the results minimalistic
subtitle.emphasis = FALSE) # For the vignette theme, this gives
# much better results. Your mileage may vary.
summarytools::freq(iris$Species, plain.ascii = FALSE, style = "rmarkdown")
summarytools::freq(iris$Species, report.nas = FALSE, headings = FALSE, cumul = TRUE, totals = TRUE)
summarytools::freq(tobacco$gender, style = 'rmarkdown')
summarytools::freq(tobacco[ ,c("gender", "age.gr", "smoker")])
print(freq(tobacco$gender), method = 'render')
view(dfSummary(iris))
dfSummary(tobacco, style = 'grid', graph.magnif = 0.75, tmp.img.dir = "/tmp")
dfSummary(tobacco, plain.ascii = FALSE, style = "grid",
graph.magnif = 0.75, valid.col = FALSE, tmp.img.dir = "/tmp")
print(dfSummary(tobacco, graph.magnif = 0.75), method = 'render')
# https://github.com/rolkra/explore
# https://cran.r-project.org/web/packages/explore/vignettes/explore.html
# https://cran.r-project.org/web/packages/explore/vignettes/explore_mtcars.html
# library(dplyr)
# library(explore)
explore::explore(mydata)
# iris %>% report(output_file = "report.html", output_dir = here::here())
# is_versicolor is used as a target in the examples below, so it must be created
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
# iris %>%
# report(output_file = "report.html",
# output_dir = here::here(),
# target = is_versicolor
# # , split = FALSE
# )
iris %>% explore::explore_tbl()
iris %>% explore::describe_tbl()
iris %>% explore::explore(Species)
iris %>% explore::explore(Sepal.Length)
iris %>% explore::explore(Sepal.Length, target = is_versicolor)
iris %>% explore::explore(Sepal.Length, target = is_versicolor, split = FALSE)
iris %>% explore::explore(Sepal.Length, target = Species)
iris %>% explore::explore(Sepal.Length, target = Petal.Length)
iris %>%
explore::explore_all()
iris %>%
dplyr::select(Sepal.Length, Sepal.Width) %>%
explore::explore_all()
iris %>%
dplyr::select(Sepal.Length, Sepal.Width, is_versicolor) %>%
explore::explore_all(target = is_versicolor)
iris %>%
dplyr::select(Sepal.Length, Sepal.Width, is_versicolor) %>%
explore::explore_all(target = is_versicolor, split = FALSE)
iris %>%
dplyr::select(Sepal.Length, Sepal.Width, Species) %>%
explore::explore_all(target = Species)
iris %>%
dplyr::select(Sepal.Length, Sepal.Width, Petal.Length) %>%
explore::explore_all(target = Petal.Length)
iris %>%
explore::explore_all()
# opts_current is an object, not a function; set the figure height through opts_chunk instead
knitr::opts_chunk$set(fig.height = explore::total_fig_height(iris, target = Species))
explore::total_fig_height(iris, target = Species)
iris %>% explore::explore_all(target = Species)
iris %>% explore::explore(Sepal.Length, min_val = 4.5, max_val = 7)
iris %>% explore::explore(Sepal.Length, auto_scale = FALSE)
mtcars %>% explore::describe()
# https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html
dlookr::describe(mydata
# ,
# cols = c(statistic)
)
# dlookr::describe(carseats, Sales, CompPrice, Income)
# dlookr::describe(carseats, Sales:Income)
# dlookr::describe(carseats, -(Sales:Income))
mydata %>%
dlookr::describe() %>%
dplyr::select(variable, skewness, mean, p25, p50, p75) %>%
dplyr::filter(!is.na(skewness)) %>%
arrange(desc(abs(skewness)))
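# https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html
# the dlookr examples work on a lowercase copy of the ISLR Carseats data
carseats <- ISLR::Carseats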
carseats %>%
dlookr::eda_report(target = Sales,
output_format = "pdf",
output_file = "EDA.pdf"
)
carseats %>%
dlookr::eda_report(target = Sales,
output_format = "html",
output_file = "EDA.html"
)
# install.packages("ISLR")
library("ISLR")
# install.packages("SmartEDA")
library("SmartEDA")
## Load sample dataset from ISLR package
Carseats <- ISLR::Carseats
## overview of the data;
SmartEDA::ExpData(data=Carseats,type=1)
## structure of the data
SmartEDA::ExpData(data=Carseats,type=2)
# iris %>% explore::data_dict_md(output_dir = here::here())
# description <- data.frame(
# variable = c("Species"),
# description = c("Species of Iris flower"))
# explore::data_dict_md(data = mydata,
# title = "Data Set",
# # description = description,
# output_file = "data_dict.md",
# output_dir = here::here("out"))
mydata <- janitor::clean_names(mydata)
# cat(names(mydata), sep = ",\n")
# names(mydata) <- c(names(mydata)[1:21], paste0("Soru", 1:30))
iris %>%
explore::clean_var(data = .,
var = Sepal.Length,
min_val = 4.5,
max_val = 7.0,
na = 5.8,
name = "sepal_length") %>%
explore::describe()
summarytools::view(summarytools::dfSummary(mydata))
dplyr::glimpse(mydata)
library(finalfit)
# https://www.datasurg.net/2019/10/15/jama-retraction-after-miscoding-new-finalfit-function-to-check-recoding/
# intentionally miscoded
colon_s %>%
mutate(
sex.factor2 = forcats::fct_recode(sex.factor,
"F" = "Male",
"M" = "Female")
) %>%
count(sex.factor, sex.factor2)
# Install
# devtools::install_github('ewenharrison/finalfit')
library(finalfit)
library(dplyr)
# Recode example
colon_s_small = colon_s %>%
select(-id, -rx, -rx.factor) %>%
mutate(
age.factor2 = forcats::fct_collapse(age.factor,
"<60 years" = c("<40 years", "40-59 years")),
sex.factor2 = forcats::fct_recode(sex.factor,
# Intentional miscode
"F" = "Male",
"M" = "Female")
)
# Check
colon_s_small %>%
finalfit::check_recode()
out = colon_s_small %>%
select(-extent, -extent.factor,-time, -time.years) %>% # choose to exclude variables
check_recode(include_numerics = TRUE)
## Recoding mydata$cinsiyet into mydata$Cinsiyet
mydata$Cinsiyet <- recode(mydata$cinsiyet,
"K" = "Kadin",
"E" = "Erkek")
mydata$Cinsiyet <- factor(mydata$Cinsiyet)
## Recoding mydata$tumor_yerlesimi into mydata$TumorYerlesimi
mydata$TumorYerlesimi <- recode(mydata$tumor_yerlesimi,
"proksimal" = "Proksimal",
"distal" = "Distal",
"yaygın" = "Yaygin",
"gö bileşke" = "GEJ",
"antrum" = "Antrum")
mydata$TumorYerlesimi <- factor(mydata$TumorYerlesimi)
## Reordering mydata$TumorYerlesimi
mydata$TumorYerlesimi <- factor(mydata$TumorYerlesimi, levels=c("GEJ", "Proksimal", "Antrum", "Distal", "Yaygin"))
## Recoding mydata$histolojik_alt_tip into mydata$HistolojikAltTip
mydata$HistolojikAltTip <- recode(mydata$histolojik_alt_tip,
"medüller benzeri" = "meduller benzeri")
mydata$HistolojikAltTip <- factor(mydata$HistolojikAltTip)
## Recoding mydata$lauren_siniflamasi into mydata$Lauren
mydata$Lauren <- recode(mydata$lauren_siniflamasi,
"diffüz" = "diffuse",
"???" = "medullary")
mydata$Lauren <- factor(mydata$Lauren)
## Recoding mydata$histolojik_derece into mydata$Grade
mydata$Grade <- recode(mydata$histolojik_derece,
"az diferansiye" = "az",
"iyi diferansiye" = "iyi",
"orta diferansiye" = "orta")
mydata$Grade <- factor(mydata$Grade)
## Reordering mydata$Grade
mydata$Grade <- factor(mydata$Grade, levels=c("iyi", "orta", "az"))
mydata$Tstage <- stringr::str_match(mydata$patolojik_evre, paste('(.+)', "N", sep=''))[,2]
mydata$Nstage <- paste0("N",
stringr::str_match(mydata$patolojik_evre, paste( "N", '(.+)', "M", sep=''))[,2]
)
mydata$Mstage <- paste0("M",
stringr::str_match(mydata$patolojik_evre, paste("M", '(.+)', sep=''))[,2]
)
mydata <- mydata %>%
dplyr::mutate(
T_stage = dplyr::case_when(
grepl(pattern = "T1", x = .$Tstage) == TRUE ~ "T1",
grepl(pattern = "T2", x = .$Tstage) == TRUE ~ "T2",
grepl(pattern = "T3", x = .$Tstage) == TRUE ~ "T3",
grepl(pattern = "T4", x = .$Tstage) == TRUE ~ "T4",
TRUE ~ "Tx"
)
) %>%
dplyr::mutate(
N_stage = dplyr::case_when(
grepl(pattern = "N0", x = .$Nstage) == TRUE ~ "N0",
grepl(pattern = "N1", x = .$Nstage) == TRUE ~ "N1",
grepl(pattern = "N2", x = .$Nstage) == TRUE ~ "N2",
grepl(pattern = "N3", x = .$Nstage) == TRUE ~ "N3",
TRUE ~ "Nx"
)
) %>%
dplyr::mutate(
M_stage = dplyr::case_when(
grepl(pattern = "M0", x = .$Mstage) == TRUE ~ "M0",
grepl(pattern = "M1", x = .$Mstage) == TRUE ~ "M1",
TRUE ~ "Mx"
)
)
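The stage parsing above can be checked on a few strings before it is run on real data. The following is a sketch in base R, with `sub()` and `grepl()` swapped in for `stringr::str_match()` and `dplyr::case_when()`; the stage strings and the `collapse_stage()` helper are hypothetical and only serve the illustration.

```r
# Made-up pathological stage strings (not from the original data)
tnm <- c("T3N1M0", "T1aN0M0", "T4bN3aM1", "TisN0M0")

# Split each string into its T, N and M components
t_raw <- sub("^(T[^N]*)N.*$", "\\1", tnm)
n_raw <- sub("^T[^N]*(N[^M]*)M.*$", "\\1", tnm)
m_raw <- sub("^.*(M.*)$", "\\1", tnm)

# Hypothetical helper: collapse sub-stages (e.g. "T1a") to the main stage;
# anything that matches no level gets the `unknown` label.
collapse_stage <- function(x, levels, unknown) {
  out <- rep(unknown, length(x))
  # iterate in reverse so the first listed level wins, like case_when()
  for (lv in rev(levels)) out[grepl(lv, x, fixed = TRUE)] <- lv
  out
}

t_stage <- collapse_stage(t_raw, c("T1", "T2", "T3", "T4"), "Tx")
n_stage <- collapse_stage(n_raw, c("N0", "N1", "N2", "N3"), "Nx")
m_stage <- collapse_stage(m_raw, c("M0", "M1"), "Mx")

data.frame(tnm, t_stage, n_stage, m_stage)
```

Note that a value such as `Tis` falls through to `Tx` under this scheme, so it may be worth tabulating the raw and collapsed stages side by side before analysis.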
## Recoding mydata$cd44_oran into mydata$CD44
mydata$CD44 <- recode(mydata$cd44_oran,
"2" = "positive",
"0" = "negative",
"1" = "negative",
"3" = "positive")
mydata$CD44 <- factor(mydata$CD44)
## Recoding mydata$her2_skor into mydata$Her2
mydata$Her2 <- recode(mydata$her2_skor,
"+3" = "3",
"+1" = "1",
"+2" = "2")
mydata$Her2 <- factor(mydata$Her2)
## Reordering mydata$Her2
mydata$Her2 <- factor(mydata$Her2, levels=c("0", "1", "2", "3"))
## Recoding mydata$msi into mydata$MMR
mydata$MMR <- recode(mydata$msi,
"MSS" = "pMMR",
"MSİ(PMS2,MLH1)" = "dMMR(PMS2,MLH1)",
"MSİ(MSH2,MSH6)" = "dMMR(MSH2,MSH6)",
"MSİ(PMS2)" = "dMMR(PMS2)")
mydata$MMR <- factor(mydata$MMR)
## Recoding mydata$msi into mydata$MMR2
mydata$MMR2 <- recode(mydata$msi,
"MSS" = "pMMR",
"MSİ(PMS2,MLH1)" = "dMMR",
"MSİ(MSH2,MSH6)" = "dMMR",
"MSİ(PMS2)" = "dMMR")
mydata$MMR2 <- factor(mydata$MMR2)
mydata <- mydata %>%
dplyr::mutate(
TumorPDL1gr1 = dplyr::case_when(
t_pdl1 < 1 ~ "kucuk1",
t_pdl1 >= 1 ~ "buyukesit1"
)
) %>%
dplyr::mutate(
TumorPDL1gr5 = dplyr::case_when(
t_pdl1 < 5 ~ "kucuk5",
t_pdl1 >= 5 ~ "buyukesit5"
)
) %>%
dplyr::mutate(
inflPDL1gr1 = dplyr::case_when(
i_pdl1 < 1 ~ "kucuk1",
i_pdl1 >= 1 ~ "buyukesit1"
)
) %>%
dplyr::mutate(
inflPDL1gr5 = dplyr::case_when(
i_pdl1 < 5 ~ "kucuk5",
i_pdl1 >= 5 ~ "buyukesit5"
)
)
## Recoding mydata$lvi into mydata$LVI
mydata$LVI <- recode(mydata$lvi,
"var" = "Var",
"yok" = "Yok")
mydata$LVI <- factor(mydata$LVI)
## Reordering mydata$LVI
mydata$LVI <- factor(mydata$LVI, levels=c("Yok", "Var"))
## Recoding mydata$pni into mydata$PNI
mydata$PNI <- recode(mydata$pni,
"var" = "Var",
"yok" = "Yok")
mydata$PNI <- factor(mydata$PNI)
## Reordering mydata$PNI
mydata$PNI <- factor(mydata$PNI, levels=c("Yok", "Var"))
## Recoding mydata$ln into mydata$LenfNoduMetastazi
mydata$LenfNoduMetastazi <- recode(mydata$ln,
"var" = "Var",
"yok" = "Yok")
mydata$LenfNoduMetastazi <- factor(mydata$LenfNoduMetastazi)
## Reordering mydata$LenfNoduMetastazi
mydata$LenfNoduMetastazi <- factor(mydata$LenfNoduMetastazi, levels=c("Yok", "Var"))
mydata$sontarih <- janitor::excel_numeric_to_date(as.numeric(mydata$olum_tarihi))
mydata$Outcome <- "Dead"
mydata$Outcome[mydata$olum_tarihi == "yok"] <- "Alive"
# cat(names(mydata), sep = ",\n")
mydata <- mydata %>%
select(
# sira_no,
# no,
# x3,
# hasta_biyopsi_no,
# cinsiyet,
Cinsiyet,
Yas = hasta_yasi,
TumorYerlesimi,
TumorCapi = tumor_capi,
HistolojikAltTip,
Lauren,
Grade,
TNM = patolojik_evre,
Tstage,
T_stage,
Nstage,
N_stage,
Mstage,
M_stage,
CD44,
Her2,
MMR,
MMR2,
TumorPDL1gr1,
TumorPDL1gr5,
inflPDL1gr1,
inflPDL1gr5,
LVI,
PNI,
LenfNoduMetastazi,
Outcome,
# tumor_yerlesimi,
# histolojik_alt_tip,
# lauren_siniflamasi,
# histolojik_derece,
# cd44_oran,
# cd44_intense,
# her2_skor,
# msi,
# t_pdl1,
# i_pdl1,
# lvi,
# pni,
# ln,
CerrahiTarih = cerrahi_tarih,
# olum_tarihi,
genel_sagkalim,
SonTarih = sontarih
)
mydata <- janitor::clean_names(mydata)
# cat(names(mydata), sep = ",\n")
names(mydata) <- c(names(mydata)[1:21], paste0("Soru", 1:30))
library(arsenal)
tab1 <- tableby(~ katilim_durumu
,
data = mydata
)
summary(tab1)
mydata <- mydata %>%
filter(katilim_durumu == "katılmış ve tamamlamış")
# summarytools::view(summarytools::dfSummary(mydata))
# dplyr::glimpse(mydata)
# mydata %>%
# select(starts_with("Soru")) %>%
# pivot_longer(everything()) %>%
# select(value) %>%
# pull() %>%
# unique() %>%
# cat(sep = "\n")
## Recoding mydata$x3_yasiniz_nedir into mydata$YasGrup
mydata$YasGrup <- factor(mydata$x3_yasiniz_nedir)
## Reordering mydata$YasGrup
mydata$YasGrup <- factor(mydata$YasGrup, levels=c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"))
## Recoding mydata$x4_cinsiyetiniz_nedir into mydata$Cinsiyet
mydata$Cinsiyet <- recode(mydata$x4_cinsiyetiniz_nedir,
"Kadın" = "Kadin")
mydata$Cinsiyet <- factor(mydata$Cinsiyet)
## Recoding mydata$x5_kac_yildir_genel_cerrahi_uzmanisiniz into mydata$UzmanlikSuresi
mydata$UzmanlikSuresi <- recode(mydata$x5_kac_yildir_genel_cerrahi_uzmanisiniz,
"43739" = "10-19")
mydata$UzmanlikSuresi <- factor(mydata$UzmanlikSuresi)
## Reordering mydata$UzmanlikSuresi
mydata$UzmanlikSuresi <- factor(mydata$UzmanlikSuresi, levels=c("0-9", "10-19", "20-29", "30-39", "40-49"))
## Recoding mydata$x6_unvaniniz_nedir into mydata$Unvan
mydata$Unvan <- factor(mydata$x6_unvaniniz_nedir)
## Reordering mydata$Unvan
mydata$Unvan <- factor(mydata$Unvan, levels=c("Op.Dr.", "Doktor Öğretim Üyesi", "Doç.Dr.", "Prof.Dr"))
## Recoding mydata$x8_hangi_kurumda_calisiyorsunuz into mydata$Kurum
mydata$Kurum <- recode(mydata$x8_hangi_kurumda_calisiyorsunuz,
"Eğitim Araştırma Hastanesi" = "Eğitim Araştırma",
"İlçe Devlet Hastanesi" = "İlçe Devlet",
"Üniversite Hastanesi" = "Üniversite",
"İl Devlet Hastanesi" = "İl Devlet",
"Özel Hastane ve Kurumlar" = "Özel")
mydata$Kurum <- factor(mydata$Kurum)
## Reordering mydata$Kurum
mydata$Kurum <- factor(mydata$Kurum, levels=c("Özel", "İlçe Devlet", "İl Devlet", "Eğitim Araştırma", "Üniversite"))
tersSorular <- c("Soru1",
"Soru4",
"Soru15",
"Soru17",
"Soru29")
CSS <- c(
"Soru3",
"Soru6",
"Soru12",
"Soru16",
"Soru18",
"Soru20",
"Soru22",
"Soru24",
"Soru27",
"Soru30"
)
BS <- c(
"Soru1",
"Soru4",
"Soru8",
"Soru10",
"Soru15",
"Soru17",
"Soru19",
"Soru21",
"Soru26",
"Soru29"
)
STSS <- c(
"Soru2",
"Soru5",
"Soru7",
"Soru9",
"Soru11",
"Soru13",
"Soru14",
"Soru23",
"Soru25",
"Soru28"
)
recode_numberize <- function(x, ...) {
dplyr::recode(
x,
"Bazı zamanlar" = 3,
"Çoksık" = 5,
"Hiçbir zaman" = 1,
"Nadiren" = 2,
"Sık sık" = 4,
"Sıkça" = 4,
"Bazı zamanlarda" = 3,
"Çok sık" = 5,
"Sıksık" = 4
)
}
mydata <- mydata %>%
mutate_at(.tbl = .,
.vars = vars(starts_with("Soru"), -tersSorular),
.funs = recode_numberize
)
recode_numberize_ters <- function(x, ...) {
recode(
x,
"Bazı zamanlar" = 3,
"Çoksık" = 1,
"Hiçbir zaman" = 5,
"Nadiren" = 4,
"Sık sık" = 2,
"Sıkça" = 2,
"Bazı zamanlarda" = 3,
"Çok sık" = 1,
"Sıksık" = 2
)
}
mydata <- mydata %>%
mutate_at(.tbl = .,
.vars = vars(tersSorular),
.funs = recode_numberize_ters
)
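On a 1 to 5 scale, the reversed mapping used for the reverse-keyed items is equivalent to `6 - score`. The sketch below shows this in base R with a named lookup vector; the English labels and the `score_likert()` helper are hypothetical substitutes for the Turkish response labels in the data.

```r
# Hypothetical English labels standing in for the survey responses
likert_scores <- c(
  "Never" = 1, "Rarely" = 2, "Sometimes" = 3, "Often" = 4, "Very often" = 5
)

# Hypothetical helper: look up the score for each response;
# reverse-keyed items are flipped around the scale midpoint.
score_likert <- function(x, reverse = FALSE) {
  scores <- unname(likert_scores[x])  # unknown labels become NA
  if (reverse) scores <- 6 - scores
  scores
}

answers <- c("Never", "Often", "Sometimes")
score_likert(answers)                  # regular items
score_likert(answers, reverse = TRUE)  # reverse-keyed items
```

Responses that are not in the lookup vector come back as `NA`, which makes spelling variants in the raw data easy to spot.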
mydata <- mydata %>%
# written this way (na.rm = FALSE), the total is not computed when any item is missing
# mutate(
# CSS_total = rowSums(select(., CSS), na.rm = FALSE)
# ) %>%
mutate(
CSS_total = rowSums(select(., CSS), na.rm = TRUE)
) %>%
mutate(
BS_total = rowSums(select(., BS), na.rm = TRUE)
) %>%
mutate(
STSS_total = rowSums(select(., STSS), na.rm = TRUE)
)
mydata <- mydata %>%
naniar::replace_with_na_at(
.vars = vars(ends_with("_total")),
condition = ~.x == 0
)
mydata <- mydata %>%
mutate_at(.tbl = .,
.vars = vars(ends_with("_total")),
.funs = list(Gr =
~ case_when(
. <= 22 ~ "Low",
. >= 23 & . <= 41 ~ "Average",
. >= 42 ~ "High",
TRUE ~ NA_character_
)
)
) %>%
mutate_at(.tbl = .,
.vars = vars(ends_with("_Gr")),
.funs = ~ factor(., levels=c("Low", "Average", "High"))
)
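The scoring steps above (sum the items, treat an all-missing row as missing, then band the total) can be sketched in base R. The item matrix below is made up, and `cut()` is used here in place of `case_when()`; the cut-offs are the ones from the code above.

```r
# Made-up item scores: 4 respondents x 10 items on a 1-5 scale
items <- rbind(
  rep(5, 10),        # total 50
  rep(3, 10),        # total 30
  c(2, rep(NA, 9)),  # only one item answered, total 2
  rep(NA, 10)        # nothing answered
)

# With na.rm = TRUE an all-missing row sums to 0 rather than NA
totals <- rowSums(items, na.rm = TRUE)

# A total of 0 is impossible for answered items, so treat it as missing
totals[totals == 0] <- NA

# Band the totals: <= 22 Low, 23-41 Average, >= 42 High
groups <- cut(
  totals,
  breaks = c(-Inf, 22, 41, Inf),
  labels = c("Low", "Average", "High")
)

data.frame(totals, groups)
```

The third row shows a caveat of `na.rm = TRUE`: a respondent who answered a single item still receives a total and lands in the "Low" band, so a minimum number of answered items may be worth requiring.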
# ## Reordering mydata$CSS_total_Gr
# mydata$CSS_total_Gr <- factor(mydata$CSS_total_Gr, )
#
# ## Reordering mydata$BS_total_Gr
# mydata$BS_total_Gr <- factor(mydata$BS_total_Gr, levels=c("Low", "Average", "High"))
#
#
# ## Reordering mydata$STSS_total_Gr
# mydata$STSS_total_Gr <- factor(mydata$STSS_total_Gr, levels=c("Low", "Average", "High"))
visdat::vis_miss(mydata)
visdat::vis_miss(airquality,
cluster = TRUE)
visdat::vis_miss(airquality,
sort_miss = TRUE)
# https://cran.r-project.org/web/packages/dlookr/vignettes/transformation.html
income <- dlookr::imputate_na(carseats, Income, US, method = "rpart")
income
attr(income,"var_type")
attr(income,"method")
attr(income,"na_pos")
attr(income,"type")
attr(income,"message")
attr(income,"success")
attr(income,"class")
summary(income)
plot(income)
carseats %>%
mutate(Income_imp = dlookr::imputate_na(carseats, Income, US, method = "knn")) %>%
group_by(US) %>%
summarise(orig = mean(Income, na.rm = TRUE),
imputation = mean(Income_imp))
library(mice)
urban <- dlookr::imputate_na(carseats, Urban, US, method = "mice")
urban
summary(urban)
plot(urban)
price <- dlookr::imputate_outlier(carseats, Price, method = "capping")
price
summary(price)
plot(price)
carseats %>%
mutate(Price_imp = dlookr::imputate_outlier(carseats, Price, method = "capping")) %>%
group_by(US) %>%
summarise(orig = mean(Price, na.rm = TRUE),
imputation = mean(Price_imp, na.rm = TRUE))
carseats %>%
mutate(Income_minmax = dlookr::transform(carseats$Income, method = "minmax"),
Sales_minmax = dlookr::transform(carseats$Sales, method = "minmax")) %>%
select(Income_minmax, Sales_minmax) %>%
boxplot()
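Min-max scaling maps each value to `(x - min) / (max - min)`, so the result lies between 0 and 1. The sketch below is a base R illustration with a hypothetical `minmax()` helper and made-up numbers; it is not dlookr's implementation.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

income <- c(10, 20, 30, NA)  # made-up values
scaled <- minmax(income)
scaled
```

Because both variables end up on the same scale, their boxplots can be drawn side by side, which is the purpose of the comparison above.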
dlookr::find_skewness(carseats)
dlookr::find_skewness(carseats, index = FALSE)
dlookr::find_skewness(carseats, value = TRUE)
dlookr::find_skewness(carseats, value = TRUE, thres = 0.1)
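# call dlookr::transform explicitly; an unqualified transform() resolves to base::transform
Advertising_log <- dlookr::transform(carseats$Advertising, method = "log")
# Advertising_log <- dlookr::transform(carseats$Advertising, method = "log+1")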
head(Advertising_log)
summary(Advertising_log)
plot(Advertising_log)
bin <- dlookr::binning(carseats$Income)
bin <- dlookr::binning(carseats$Income, nbins = 4,
labels = c("LQ1", "UQ1", "LQ3", "UQ3"))
dlookr::binning(carseats$Income, nbins = 5, type = "equal")
dlookr::binning(carseats$Income, nbins = 5, type = "pretty")
dlookr::binning(carseats$Income, nbins = 5, type = "kmeans")
dlookr::binning(carseats$Income, nbins = 5, type = "bclust")
bin
summary(bin)
plot(bin)
carseats %>%
mutate(Income_bin = dlookr::binning(carseats$Income)) %>%
group_by(ShelveLoc, Income_bin) %>%
summarise(freq = n()) %>%
arrange(desc(freq)) %>%
head(10)
bin <- dlookr::binning_by(carseats, "US", "Advertising")
bin
summary(bin)
attr(bin, "iv") # information value
attr(bin, "ivtable") # information value table
plot(bin, sub = "bins of Advertising variable")
# https://cran.r-project.org/web/packages/exploreR/vignettes/exploreR.html
(regressResults <- exploreR::masslm(iris,
"Sepal.Length",
ignore = "Species")
)
exploreR::massregplot(iris, "Sepal.Length", ignore = "Species")
(stand.Petals <- exploreR::standardize(iris,
c("Petal.Width", "Petal.Length"))
)
carseats %>%
dlookr::transformation_report(target = US)
carseats %>%
dlookr::transformation_report(target = US, output_format = "html",
output_file = "transformation.html")
inspectdf::inspect_na(starwars)
inspectdf::inspect_na(starwars) %>% inspectdf::show_plot()
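# star_1 and star_2 are not defined in the original; assumed here to be two random subsets of starwars
star_1 <- dplyr::starwars %>% dplyr::sample_n(50)
star_2 <- dplyr::starwars %>% dplyr::sample_n(50) %>% dplyr::select(-1, -2)
inspectdf::inspect_na(star_1, star_2)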
inspectdf::inspect_na(star_1, star_2) %>% inspectdf::show_plot()
mydata %>%
dplyr::select(-dplyr::contains("Date")) %>%
report::report()
# cat(names(mydata), sep = " + \n")
library(arsenal)
tab1 <- arsenal::tableby(
~ Sex +
Age +
Race +
PreinvasiveComponent +
LVI +
PNI +
Death +
Group +
Grade +
TStage +
# `Anti-X-intensity` +
# `Anti-Y-intensity` +
LymphNodeMetastasis +
Valid +
Smoker +
Grade_Level
,
data = mydata
)
summary(tab1)
library(tableone)
mydata %>%
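  # all_of() makes it explicit that these are character vectors of column names
  dplyr::select(-dplyr::all_of(keycolumns),
                -dplyr::all_of(dateVariables)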
) %>%
tableone::CreateTableOne(data = .)
# CreateTableOne(vars = myVars, data = mydata, factorVars = characterVariables)
# tab <- CreateTableOne(vars = myVars, data = pbc, factorVars = catVars)
# print(tab, showAllLevels = TRUE)
# ?print.TableOne
# summary(tab)
# print(tab, nonnormal = biomarkers)
# print(tab, nonnormal = biomarkers, exact = "stage", quote = TRUE, noSpaces = TRUE)
# tab3Mat <- print(tab3, nonnormal = biomarkers, exact = "stage", quote = FALSE, noSpaces = TRUE, printToggle = FALSE)
# write.csv(tab3Mat, file = "myTable.csv")
mydata %>%
dplyr::select(
    dplyr::all_of(continiousVariables),
    dplyr::all_of(numericVariables),
    dplyr::all_of(integerVariables)
) %>%
summarytools::descr(., style = 'rmarkdown')
print(summarytools::descr(mydata), method = 'render', table.classes = 'st-small')
mydata %>%
summarytools::descr(.,
stats = "common",
transpose = TRUE,
headings = FALSE
)
mydata %>%
summarytools::descr(stats = "common") %>%
summarytools::tb()
mydata$Sex %>%
summarytools::freq(cumul = FALSE, report.nas = FALSE) %>%
summarytools::tb()
mydata %>%
explore::describe() %>%
dplyr::filter(unique < 5)
mydata %>%
explore::describe() %>%
dplyr::filter(na > 0)
mydata %>% explore::describe()
source(here::here("R", "gc_desc_cat.R"))
tab <-
mydata %>%
dplyr::select(
    -dplyr::all_of(keycolumns)
) %>%
tableone::CreateTableOne(data = .)
# ?print.CatTable
tab$CatTable
race_stats <- summarytools::freq(mydata$Race)
print(race_stats,
report.nas = FALSE,
totals = FALSE,
display.type = FALSE,
Variable.label = "Race Group"
)
mydata %>% explore::describe(PreinvasiveComponent)
## Frequency or custom tables for categorical variables
SmartEDA::ExpCTable(
mydata,
Target = NULL,
margin = 1,
clim = 10,
nlim = 5,
round = 2,
bin = NULL,
  per = TRUE
)
inspectdf::inspect_cat(mydata)
inspectdf::inspect_cat(mydata)$levels$Group
library(summarytools)
grouped_freqs <- stby(data = mydata$Smoker,
INDICES = mydata$Sex,
FUN = freq, cumul = FALSE, report.nas = FALSE)
grouped_freqs %>% tb(order = 2)
summarytools::stby(
list(x = mydata$LVI, y = mydata$LymphNodeMetastasis),
mydata$PNI,
summarytools::ctable
)
with(mydata,
summarytools::stby(
list(x = LVI, y = LymphNodeMetastasis), PNI,
summarytools::ctable
)
)
SmartEDA::ExpCTable(
mydata,
Target = "Sex",
margin = 1,
clim = 10,
nlim = NULL,
round = 2,
bin = 4,
  per = FALSE
)
mydata %>%
  dplyr::select(dplyr::all_of(characterVariables)) %>%
dplyr::select(PreinvasiveComponent,
PNI,
LVI
) %>%
reactable::reactable(data = ., groupBy = c("PreinvasiveComponent", "PNI"), columns = list(
LVI = reactable::colDef(aggregate = "count")
))
questionr:::icut()
source(here::here("R", "gc_desc_cont.R"))
tab <- tableone::CreateTableOne(data = mydata)
# ?print.ContTable
tab$ContTable
print(tab$ContTable, nonnormal = c("Anti-X-intensity"))
mydata %>% explore::describe(Age)
mydata %>%
  dplyr::select(dplyr::all_of(continiousVariables)) %>%
SmartEDA::ExpNumStat(
data = .,
by = "A",
gp = NULL,
Qnt = seq(0, 1, 0.1),
MesofShape = 2,
Outlier = TRUE,
round = 2
)
inspectdf::inspect_num(mydata, breaks = 10)
inspectdf::inspect_num(mydata)$hist$Age
inspectdf::inspect_num(mydata, breaks = 10) %>%
inspectdf::show_plot()
grouped_descr <- summarytools::stby(data = mydata,
INDICES = mydata$Sex,
FUN = summarytools::descr, stats = "common")
# grouped_descr %>% summarytools::tb(order = 2)
grouped_descr %>% summarytools::tb()
# US, Sales and Income are columns of carseats, not of mydata
carseats %>%
group_by(US) %>%
dlookr::describe(Sales, Income)
carseats %>%
group_by(US, Urban) %>%
dlookr::describe(Sales, Income)
categ <- dlookr::target_by(carseats, US)
cat_num <- dlookr::relate(categ, Sales)
cat_num
summary(cat_num)
plot(cat_num)
summarytools::stby(data = mydata,
INDICES = mydata$PreinvasiveComponent,
FUN = summarytools::descr,
stats = c("mean", "sd", "min", "med", "max"),
transpose = TRUE)
# stats and transpose must be passed to stby() (and on to descr()), not to with()
with(mydata,
     summarytools::stby(Age, PreinvasiveComponent, summarytools::descr,
                        stats = c("mean", "sd", "min", "med", "max"),
                        transpose = TRUE)
)
mydata %>%
group_by(PreinvasiveComponent) %>%
summarytools::descr(stats = "fivenum")
## Summary statistics by – category
SmartEDA::ExpNumStat(
mydata,
by = "GA",
gp = "PreinvasiveComponent",
Qnt = seq(0, 1, 0.1),
MesofShape = 2,
Outlier = TRUE,
round = 2
)
mydata %>%
janitor::tabyl(Sex) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(Race) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(PreinvasiveComponent) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(LVI) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(PNI) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(Group) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(Grade) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(TStage) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(LymphNodeMetastasis) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(Grade_Level) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
mydata %>%
janitor::tabyl(DeathTime) %>%
janitor::adorn_pct_formatting(rounding = 'half up', digits = 1) %>%
knitr::kable()
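The frequency tables above repeat the same three-step pipeline for every variable. As a sketch of how that repetition could be folded into one helper, the base-R example below builds a count-and-percent table and applies it to several columns at once. `freq_table()` is a hypothetical helper (not part of janitor), and `mtcars` stands in for `mydata`.

```r
# Hypothetical helper (not from janitor): counts and percentages for one variable.
freq_table <- function(x) {
  counts <- table(x, useNA = "no")
  data.frame(
    level   = names(counts),
    n       = as.vector(counts),
    percent = sprintf("%.1f%%", 100 * as.vector(counts) / sum(counts))
  )
}

# Single variable.
ft <- freq_table(mtcars$cyl)
ft

# Same helper over several columns instead of one hand-written block per variable.
all_tables <- lapply(mtcars[c("cyl", "gear", "am")], freq_table)
all_tables
```

Each element of `all_tables` can then be passed to `knitr::kable()` in the same way as the individual tables above.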
mydata %>%
jmv::descriptives(
data = .,
vars = 'Age',
hist = TRUE,
dens = TRUE,
box = TRUE,
violin = TRUE,
dot = TRUE,
mode = TRUE,
sd = TRUE,
variance = TRUE,
skew = TRUE,
kurt = TRUE,
quart = TRUE)
mydata %>%
jmv::descriptives(
data = .,
vars = 'AntiX_intensity',
hist = TRUE,
dens = TRUE,
box = TRUE,
violin = TRUE,
dot = TRUE,
mode = TRUE,
sd = TRUE,
variance = TRUE,
skew = TRUE,
kurt = TRUE,
quart = TRUE)
mydata %>%
jmv::descriptives(
data = .,
vars = 'AntiY_intensity',
hist = TRUE,
dens = TRUE,
box = TRUE,
violin = TRUE,
dot = TRUE,
mode = TRUE,
sd = TRUE,
variance = TRUE,
skew = TRUE,
kurt = TRUE,
quart = TRUE)
library(finalfit)
# dependent <- c("dependent1",
# "dependent2"
# )
# explanatory <- c("explanatory1",
# "explanatory2"
# )
dependent <- "PreinvasiveComponent"
explanatory <- c("Sex", "Age", "Grade", "TStage")
source(here::here("R", "gc_table_cross.R"))
CreateTableOne(vars = myVars, strata = "columnname", data = pbc, factorVars = catVars)
print(tab, nonnormal = biomarkers, exact = "exactVariable", smd = TRUE)
write2html(
knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"),
quiet = TRUE, # passed to rmarkdown::render
keep.rmd = TRUE
)
ctable(tobacco$gender, tobacco$smoker, style = 'rmarkdown')
print(ctable(tobacco$gender, tobacco$smoker), method = 'render')
print(ctable(tobacco$smoker, tobacco$diseased, prop = "r"), method = "render")
with(tobacco,
print(ctable(smoker, diseased, prop = 'n', totals = FALSE, chisq = TRUE),
headings = FALSE, method = "render"))
# The summarizer package has since been released as finalfit;
# summary.factorlist() is now finalfit::summary_factorlist()
# library(finalfit)
# data(colon_s)
explanatory <- c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent <- "perfor.factor"
colon_s %>%
  finalfit::summary_factorlist(dependent, explanatory, p = TRUE) %>%
  knitr::kable(row.names = FALSE, align = c("l", "l", "r", "r", "r"))
explanatory <- c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent <- "mort_5yr"
colon_s %>%
  finalfit::summary_factorlist(dependent, explanatory) %>%
  knitr::kable(row.names = FALSE, align = c("l", "l", "r", "r", "r"))
library("rmngb")
# rmngb::pairwise.chisq.test(mydata$StageGr2, mydata$Ki67Gr)
rmngb::pairwise.fisher.test(mydata$StageGr2, mydata$Ki67Gr)
# rmngb::pairwise.chisq.test(mydata$LiverDistantMets, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$LiverDistantMets, mydata$Ki67Gr, p.adj = "BH")
# rmngb::pairwise.chisq.test(mydata$PNI, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$PNI, mydata$Ki67Gr, p.adj = "BH")
# rmngb::pairwise.chisq.test(mydata$LVI, mydata$Ki67Gr, p.adj = "BH")
rmngb::pairwise.fisher.test(mydata$LVI, mydata$Ki67Gr, p.adj = "BH")
MBStudy <-
tibble::tribble(
~Grup, ~Diagnosis, ~Number,
"\"Grup1\"", "\"Diseased\"", 1383L,
"\"Grup2A\"", "\"Diseased\"", 58L,
"\"Grup2B\"", "\"Diseased\"", 349L,
"\"Grup3\"", "\"Diseased\"", 5217L,
"\"Grup1\"", "\"Stromal Diseased\"", 13L,
"\"Grup2A\"", "\"Stromal Diseased\"", 2L,
"\"Grup2B\"", "\"Stromal Diseased\"", 47L,
"\"Grup3\"", "\"Stromal Diseased\"", 476L,
"\"Grup1\"", "\"Inflammation fibrosis\"", 56L,
"\"Grup2A\"", "\"Inflammation fibrosis\"", 52L,
"\"Grup2B\"", "\"Inflammation fibrosis\"", 267L,
"\"Grup3\"", "\"Inflammation fibrosis\"", 1387L
)
MBStudy <-
data.frame(
stringsAsFactors = FALSE,
V1 = c("\"Grup1\"","\"Grup2A\"",
"\"Grup2B\"","\"Grup3\"","\"Grup1\"","\"Grup2A\"",
"\"Grup2B\"","\"Grup3\"","\"Grup1\"","\"Grup2A\"",
"\"Grup2B\"","\"Grup3\""),
V2 = c("\"Diseased\"",
"\"Diseased\"","\"Diseased\"","\"Diseased\"",
"\"Stromal Diseased\"","\"Stromal Diseased\"",
"\"Stromal Diseased\"",
"\"Stromal Diseased\"","\"Inflammation fibrosis\"",
"\"Inflammation fibrosis\"","\"Inflammation fibrosis\"",
"\"Inflammation fibrosis\""),
V3 = c(1383L,58L,349L,5217L,13L,
2L,47L,476L,56L,52L,267L,1387L)
)
MBStudy <- matrix(c(
1383L, 13L, 56L,
58L, 2L, 52L,
349L, 47L, 267L,
5217L, 476L, 1387L
), byrow = TRUE, nrow = 4,
dimnames = list(
  c("Grup1", "Grup2A", "Grup2B", "Grup3"),
  c("Diseased", "Stromal Diseased", "Inflammation")
))
RVAideMemoire::chisq.multcomp(MBStudy)
MBStudy
MB_table <- RVAideMemoire::fisher.multcomp(tab.cont = MBStudy)
MB_table$p.value %>%
as.data.frame() %>%
tibble::rownames_to_column(var = "Grup") %>%
gt::gt(.) %>%
gt::fmt_number(., columns = dplyr::contains("Diseased"), decimals = 4)
rmngb::pairwise.fisher.test.table(MBStudy)
MBStudy2 <- matrix(c(
13L, 53L,
9L, 5L,
3L, 26L),
byrow = TRUE,
nrow = 3,
dimnames = list(
c("Diseased", "Inflammation", "Fibrosis"),
c("sw", "cds")
))
MBStudy2
MBStudy2_analysis <- RVAideMemoire::fisher.multcomp(tab.cont = t(MBStudy2))
MBStudy2_analysis$p.value
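For readers who want to see what a pairwise Fisher comparison amounts to, the sketch below does it by hand in base R: every pair of rows forms a 2 x 2 table, each gets Fisher's exact test, and the p-values are adjusted for multiple testing. It is an illustration rather than the RVAideMemoire implementation; the FDR correction is an assumption, and the counts are re-declared from `MBStudy2` so the block runs on its own.

```r
# Same counts as MBStudy2 above, re-declared so the sketch is self-contained.
counts <- matrix(c(13L, 53L,
                    9L,  5L,
                    3L, 26L),
                 byrow = TRUE, nrow = 3,
                 dimnames = list(c("Diseased", "Inflammation", "Fibrosis"),
                                 c("sw", "cds")))

# All pairs of rows; each pair gives a 2 x 2 table for Fisher's exact test.
row_pairs <- combn(rownames(counts), 2, simplify = FALSE)
raw_p <- vapply(row_pairs,
                function(pr) fisher.test(counts[pr, ])$p.value,
                numeric(1))

pairwise_p <- data.frame(
  group1 = vapply(row_pairs, `[`, character(1), 1),
  group2 = vapply(row_pairs, `[`, character(1), 2),
  p_raw  = raw_p,
  p_adj  = p.adjust(raw_p, method = "fdr")  # assumption: FDR correction
)
pairwise_p
```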
mydata %>%
summary_factorlist(dependent = 'PreinvasiveComponent',
explanatory = explanatory,
# column = TRUE,
total_col = TRUE,
p = TRUE,
add_dependent_label = TRUE,
na_include=FALSE
# catTest = catTestfisher
) -> table
knitr::kable(table, row.names = FALSE, align = c('l', 'l', 'r', 'r', 'r'))
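# Build the formula from the character vector `explanatory`;
# `~ explanatory` would look for a column literally named "explanatory"
table1 <- arsenal::tableby(
  stats::reformulate(explanatory, response = "PreinvasiveComponent"),
  data = mydata
)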
summary(table1)
knitr::kable(table1,
row.names = FALSE,
align = c('l', 'l', 'r', 'r', 'r', 'r'),
format = 'html') %>%
kableExtra::kable_styling(kable_input = .,
bootstrap_options = 'striped',
full_width = FALSE,
position = 'left')
tangram::tangram(PreinvasiveComponent ~ explanatory, mydata)
tangram::html5(tangram::tangram(PreinvasiveComponent ~ explanatory, mydata),
fragment = TRUE,
inline = 'nejm.css',
caption = 'Cross Table PreinvasiveComponent NEJM Style',
id = 'tbl3')
tangram::html5(tangram::tangram(PreinvasiveComponent ~ explanatory, mydata),
fragment = TRUE,
inline = 'lancet.css',
caption = 'Cross Table PreinvasiveComponent Lancet Style',
id = 'tbl3')
dependent <- c("dependent1",
"dependent2"
)
explanatory <- c("explanatory1",
"explanatory2"
)
mydataCategorical <- mydata %>%
select(-var1,
-var2
)
# Run the categorical plotting script once per explanatory variable.
# source() evaluates in the global environment by default, so the script
# sees mydataCategorical_variable and dependent2 on every pass.
for (mydataCategorical_variable in explanatory) {
  dependent2 <- dependent[!dependent %in% mydataCategorical_variable]
  source(here::here("R", "gc_plot_cat.R"))
}
## column chart
SmartEDA::ExpCatViz(
Carseats,
  target = NULL, # no target: plain column charts, one per categorical variable
fname = NULL,
clim = 10,
col = NULL,
margin = 2,
Page = c(2, 1),
sample = 2
)
## Stacked bar graph
SmartEDA::ExpCatViz(
Carseats,
target = "Urban",
fname = NULL,
clim = 10,
col = NULL,
margin = 2,
Page = c(2, 1),
sample = 2
)
## Variable importance graph using information values
SmartEDA::ExpCatStat(
Carseats,
Target = "Urban",
result = "Stat",
Pclass = "Yes",
plot = TRUE,
top = 20,
Round = 2
)
inspectdf::inspect_cat(starwars) %>% inspectdf::show_plot()
inspectdf::inspect_cat(starwars) %>%
inspectdf::show_plot(high_cardinality = 1)
inspectdf::inspect_cat(star_1, star_2) %>% inspectdf::show_plot()
# mydataContinuous
mydata %>%
select(institution, starts_with("Slide")) %>%
pivot_longer(cols = starts_with("Slide")) %>%
ggplot(., aes(name, value)) -> p
p + geom_jitter()
p + geom_jitter(aes(colour = institution))
dxchanges <- mydata %>%
select(bx_no, starts_with("Slide")) %>%
filter(complete.cases(.)) %>%
group_by(Slide1_infiltrative, Slide2_Medium, Slide3_Demarcated) %>%
tally()
library(ggalluvial)
ggplot(data = dxchanges,
aes(axis1 = Slide1_infiltrative, axis2 = Slide2_Medium, axis3 = Slide3_Demarcated,
y = n)) +
scale_x_discrete(limits = c("Slide1", "Slide2", "Slide3"),
expand = c(.1, .05)
) +
xlab("Slide") +
geom_alluvium(aes(fill = Slide1_infiltrative,
colour = Slide1_infiltrative
)) +
geom_stratum() +
geom_text(stat = "stratum", label.strata = TRUE) +
theme_minimal() +
ggtitle("PanNET")
## Generate Boxplot by category
SmartEDA::ExpNumViz(
mtcars,
target = "gear",
type = 2,
nlim = 25,
fname = file.path(here::here(), "Mtcars2"),
Page = c(2, 2)
)
## Generate Density plot
SmartEDA::ExpNumViz(
mtcars,
target = NULL,
type = 3,
nlim = 25,
fname = file.path(here::here(), "Mtcars3"),
Page = c(2, 2)
)
## Generate Scatter plot
SmartEDA::ExpNumViz(
mtcars,
target = "carb",
type = 3,
nlim = 25,
fname = file.path(here::here(), "Mtcars4"),
Page = c(2, 2)
)
SmartEDA::ExpNumViz(mtcars, target = "am", scatter = TRUE)
library(ggplot2)
library(plotly)
library(gapminder)
p <- gapminder %>%
filter(year==1977) %>%
ggplot( aes(gdpPercap, lifeExp, size = pop, color=continent)) +
geom_point() +
scale_x_log10() +
theme_bw()
ggplotly(p)
scales::show_col(colours(), cex_label = .35)
gistr::gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e") %>%
gistr::embed() -> embedgist
# https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004
library(tidyr)
library(dplyr)
library(alluvial)
library(ggplot2)
library(forcats)
set.seed(42)
individual <- rep(LETTERS[1:10],each=2)
timeperiod <- paste0("time_",rep(1:2,10))
discretechoice <- factor(paste0("choice_", sample(letters[1:3], 20, replace = TRUE)))
continuouschoice <- ceiling(runif(20, 0, 100))
d <- data.frame(individual, timeperiod, discretechoice, continuouschoice)
# stacked bar diagram of discrete choice by individual
g <- ggplot(data=d,aes(timeperiod,fill=fct_rev(discretechoice)))
g + geom_bar(position="stack") + guides(fill=guide_legend(title=NULL))
# alluvial diagram of discrete choice by individual
d_alluvial <- d %>%
select(individual,timeperiod,discretechoice) %>%
spread(timeperiod,discretechoice) %>%
group_by(time_1,time_2) %>%
summarize(count=n()) %>%
ungroup()
alluvial(select(d_alluvial,-count),freq=d_alluvial$count)
# stacked bar diagram of discrete choice, weighting by continuous choice
g + geom_bar(position="stack",aes(weight=continuouschoice))
library(ggalluvial)
ggplot(
data = d,
aes(
x = timeperiod,
stratum = discretechoice,
alluvium = individual,
y = continuouschoice
)
) +
geom_stratum(aes(fill = discretechoice)) +
geom_flow()
# use of strata and labels
ggplot(as.data.frame(Titanic),
aes(y = Freq,axis1 = Class, axis2 = Sex, axis3 = Age)) +
geom_flow() +
scale_x_discrete(limits = c("Class", "Sex", "Age")) +
geom_stratum() +
geom_text(stat = "stratum", infer.label = TRUE) +
ggtitle("Alluvial plot of Titanic passenger demographic data")
# use of facets
ggplot(as.data.frame(Titanic),
       aes(y = Freq, axis1 = Class, axis2 = Sex)) +
  geom_flow(aes(fill = Age), width = .4) +
  geom_stratum(width = .4) +
  geom_text(stat = "stratum", infer.label = TRUE, size = 3) +
  scale_x_discrete(limits = c("Class", "Sex")) +
  facet_wrap(~ Survived, scales = "fixed")
# time series alluvia of WorldPhones
wph <- as.data.frame(as.table(WorldPhones))
names(wph) <- c("Year", "Region", "Telephones")
ggplot(wph, aes(x = Year, alluvium = Region, y = Telephones)) +
  geom_flow(aes(fill = Region, colour = Region), width = 0)
# rightward flow aesthetics for vaccine survey data
data(vaccinations)
# Reverse the level order without relabelling the observations
vaccinations$response <- factor(vaccinations$response,
                                levels = rev(levels(vaccinations$response)))
ggplot(vaccinations,
       aes(x = survey,
           stratum = response,
           alluvium = subject,
           y = freq,
           fill = response,
           label = round(a, 3)
       )
) +
geom_lode() +
geom_flow() +
geom_stratum(alpha = 0) +
geom_text(stat = "stratum")
CD44changes <- mydata %>%
dplyr::select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>%
dplyr::filter(complete.cases(.)) %>%
dplyr::group_by(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>%
dplyr::tally()
library(ggalluvial)
ggplot(data = CD44changes,
aes(axis1 = TumorCD44, axis2 = TomurcukCD44,
y = n)) +
scale_x_discrete(limits = c("TumorCD44", "TomurcukCD44"),
expand = c(.1, .05)
) +
xlab("Tumor Tomurcuk") +
geom_alluvium(aes(fill = PeritumoralTomurcukGr4,
colour = PeritumoralTomurcukGr4 )) +
geom_stratum(alpha = .5) +
geom_text(stat = "stratum", infer.label = TRUE) +
# geom_text(stat = 'alluvium', infer.label = TRUE) +
theme_minimal() +
ggtitle("Changes in CD44")
library(arsenal)
dat <- data.frame(
tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
stringsAsFactors = FALSE
)
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
summary(p)
dlookr::normality(carseats)
dlookr::normality(carseats, Sales, CompPrice, Income)
dlookr::normality(carseats, Sales:Income)
dlookr::normality(carseats, -(Sales:Income))
carseats %>%
dlookr::normality() %>%
dplyr::filter(p_value <= 0.01) %>%
arrange(abs(p_value))
carseats %>%
group_by(ShelveLoc, US) %>%
dlookr::normality(Income) %>%
arrange(desc(p_value))
carseats %>%
mutate(log_income = log(Income)) %>%
group_by(ShelveLoc, US) %>%
dlookr::normality(log_income) %>%
dplyr::filter(p_value > 0.01)
dlookr::plot_normality(carseats, Sales, CompPrice)
carseats %>%
dplyr::filter(ShelveLoc == "Good") %>%
group_by(US) %>%
dlookr::plot_normality(Income)
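The grouped normality checks above boil down to running a normality test within each subgroup and tabulating the results. A minimal base-R sketch of that pattern follows, assuming the Shapiro-Wilk test and using the built-in `iris` data as a stand-in for `carseats`.

```r
# Shapiro-Wilk test of Sepal.Length within each species (built-in iris data).
by_group <- split(iris$Sepal.Length, iris$Species)

normality_tbl <- data.frame(
  group     = names(by_group),
  statistic = vapply(by_group, function(x) shapiro.test(x)$statistic, numeric(1)),
  p_value   = vapply(by_group, function(x) shapiro.test(x)$p.value, numeric(1)),
  row.names = NULL
)

# Smallest p-values first: the groups that deviate most from normality.
normality_tbl[order(normality_tbl$p_value), ]
```

`shapiro.test()` only accepts between 3 and 5000 observations, so very small or very large subgroups need separate handling.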
mytable <- jmv::ttestIS(
formula = HindexCTLA4 ~ PeritumoralTomurcukGr4,
data = mydata,
vars = HindexCTLA4,
students = FALSE,
mann = TRUE,
norm = TRUE,
meanDiff = TRUE,
desc = TRUE,
plots = TRUE)
cat("<pre class='jamovitable'>")
print(jtable(mytable$ttest))
cat("</pre>")
categ <- dlookr::target_by(carseats, US)
cat_cat <- dlookr::relate(categ, ShelveLoc)
cat_cat
summary(cat_cat)
plot(cat_cat)
## Summary statistics of categorical variables
SmartEDA::ExpCatStat(
Carseats,
Target = "Urban",
result = "Stat",
clim = 10,
nlim = 5,
Pclass = "Yes"
)
inspectdf::inspect_cat(star_1, star_2)
num <- dlookr::target_by(carseats, Sales)
num_num <- dlookr::relate(num, Price)
num_num
summary(num_num)
plot(num_num)
plot(num_num, hex_thres = 350)
## Information value and Odds value
SmartEDA::ExpCatStat(
Carseats,
Target = "Urban",
result = "IV",
clim = 10,
nlim = 5,
Pclass = "Yes"
)
# library(OptimalCutpoints)
# https://tidymodels.github.io/yardstick/reference/roc_curve.html
roc_fit <- yardstick::roc_curve(mydata,
truth = "classification",
estimate = "test",
na_rm = TRUE,
options = list(
smooth = FALSE,
print.auc = TRUE,
ret = c("all_coords")
)
)
ggplot2::autoplot(roc_fit)
library(pROC)
m1 <- pROC::roc(mydata,
"classification",
"test",
auc = TRUE,
ci = TRUE,
# plot = TRUE,
# percent=TRUE,
na.rm=TRUE,
# smooth = TRUE,
ret = "all_coords",
# ret = "roc",
quiet = FALSE,
legacy.axes = TRUE,
print.auc = TRUE
# xlab = "False Positive",
# ylab = "True Positive"
)
m1
pROC::roc(mydata,
"polyp_rec",
"size",
auc = TRUE,
ci = TRUE,
# plot = TRUE,
# percent=TRUE,
na.rm=TRUE,
# smooth = TRUE,
# ret = "all_coords",
ret = "roc",
quiet = FALSE,
legacy.axes = TRUE,
print.auc = TRUE
# xlab = "False Positive",
# ylab = "True Positive"
)
which.max(m1$youden)
m1[which.max(m1$youden),]
roc_obj <- pROC::roc(polyp_rec ~ size,
data = mydata,
auc = TRUE,
ci = TRUE,
plot = TRUE,
# percent=TRUE,
na.rm=TRUE,
# smooth = TRUE,
# ret = "all_coords",
ret = "roc",
quiet = FALSE,
legacy.axes = TRUE,
print.auc = TRUE,
xlab = "False Positive",
ylab = "True Positive"
)
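The `which.max(m1$youden)` step above picks the cut-off that maximises Youden's J (sensitivity + specificity - 1). The base-R sketch below works that calculation through by hand on simulated data. `status` and `marker` are invented stand-ins for `polyp_rec` and `size`, and every observed value is tried as a threshold, which is a simplification and may differ from the thresholds pROC evaluates.

```r
set.seed(1)
# Simulated stand-ins: `status` plays the role of polyp_rec, `marker` of size.
status <- rep(c(0, 1), each = 50)
marker <- c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 13, sd = 2))

# Try every observed value as a cut-off ("positive" if marker >= cut-off).
cuts <- sort(unique(marker))
sens <- vapply(cuts, function(k) mean(marker[status == 1] >= k), numeric(1))
spec <- vapply(cuts, function(k) mean(marker[status == 0] <  k), numeric(1))

# Youden's J for each cut-off, and the cut-off that maximises it.
youden <- sens + spec - 1
best <- which.max(youden)
best_cut <- data.frame(cutoff = cuts[best], sensitivity = sens[best],
                       specificity = spec[best], youden = youden[best])
best_cut
```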
# devtools::install_github("sachsmc/plotROC")
library(plotROC)
# shiny_plotROC()
iris %>% explore::explain_tree(target = Species)
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris %>% dplyr::select(-Species) %>% explore::explain_tree(target = is_versicolor)
iris %>% explore::explain_tree(target = Sepal.Length)
explore::explore(mydata)
mydata$int <- lubridate::interval(
lubridate::ymd(mydata$SurgeryDate),
lubridate::ymd(mydata$LastFollowUpDate)
)
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
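# Alternative: if follow-up time is already stored in a column
# (here `genel_sagkalim`, Turkish for "overall survival"), use it directly
# instead of the interval computed above.
# mydata$OverallTime <- mydata$genel_sagkalim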
## Recoding mydata$Death into mydata$Outcome
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death),
"1" = "TRUE",
"0" = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
table(mydata$Death, mydata$Outcome)
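The two preparation steps above (follow-up time in months, and a 0/1 event indicator) can be checked on a toy data frame. The sketch below uses base R only; the column names mirror the ones used above, the values are invented for illustration, and months are approximated as days / (365.25 / 12), so results can differ slightly from what the lubridate code gives.

```r
# Toy data; column names mirror the ones used above, values are invented.
toy <- data.frame(
  SurgeryDate      = as.Date(c("2015-01-10", "2016-03-05", "2017-07-20")),
  LastFollowUpDate = as.Date(c("2018-01-10", "2016-09-05", "2020-07-20")),
  Death            = c(TRUE, FALSE, TRUE)
)

# Follow-up in (approximate) months, rounded to one decimal.
toy$OverallTime <- round(
  as.numeric(toy$LastFollowUpDate - toy$SurgeryDate) / (365.25 / 12), 1)

# Event indicator expected by Surv(): 1 = death observed, 0 = censored.
toy$Outcome <- as.integer(toy$Death)

# Cross-check the recoding, as done above for mydata.
table(toy$Death, toy$Outcome)
```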
library(survival)
# data(lung)
# km <- with(lung, Surv(time, status))
km <- with(mydata, Surv(OverallTime, Outcome))
head(km,80)
plot(km)
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "LVI"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
mydata %>%
finalfit::surv_plot(.data = .,
dependent = "Surv(OverallTime, Outcome)",
explanatory = "LVI",
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
library(finalfit)
library(survival)
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"
mydata %>%
finalfit::finalfit(dependentUni, explanatoryUni) -> tUni
knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))
tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>%
janitor::clean_names()
tUni_df_descr <- paste0("When ",
tUni_df$dependent_surv_overall_time_outcome[1],
" is ",
tUni_df$x[2],
", there is ",
tUni_df$hr_univariable[2],
" times risk than ",
"when ",
tUni_df$dependent_surv_overall_time_outcome[1],
" is ",
tUni_df$x[1],
"."
)
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit
plot(km_fit)
# summary(km_fit)
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
janitor::clean_names() %>%
tibble::rownames_to_column()
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
tibble::rownames_to_column()
names(km_fit_median_df) <- paste0("m", 1:dim(km_fit_median_df)[2])
km_fit_median_definition <-
km_fit_median_df %>%
dplyr::mutate(
description =
glue::glue(
"When {m1}, median survival is {m8} [{m9} - {m10}, 95% CI] months."
)
) %>%
dplyr::select(description) %>%
dplyr::pull()
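sTable <- summary(km_fit)$table
# summary()$table is a named vector for a single curve and a matrix
# with one row per stratum otherwise
if (is.null(dim(sTable))) {
  sTable <- matrix(sTable, nrow = 1, dimnames = list("All", names(sTable)))
}
# The restricted-mean column is "rmean" or "*rmean" depending on the survival version
rmean_col <- intersect(c("rmean", "*rmean"), colnames(sTable))[1]
st <- data.frame()
for (i in seq_len(nrow(sTable))) {
  g <- sTable[i, ]
  nevents <- unname(g["events"])
  n <- unname(g["n.max"])
  # One row per stratum: censored and event counts, event proportion, median and mean survival
  st <- rbind(st, data.frame(
    stratum = rownames(sTable)[i],
    censored = n - nevents,
    events = nevents,
    n = n,
    prop = nevents / n,
    median = unname(g["median"]),
    mean = unname(g[rmean_col])
  ))
}
results1 <- st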
km_fit
broom::tidy(km_fit)
km_fit_median_df %>%
dplyr::mutate(
description =
glue::glue(
"When {rowname}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months."
)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_median_definition
summary(km_fit, times = c(12,36,60))
km_fit_summary <- summary(km_fit, times = c(12,36,60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")])
km_fit_df %>%
dplyr::mutate(
description =
glue::glue(
"When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."
)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_definition
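The same "survival at fixed time points, turned into a sentence" pattern can be tried without `mydata`. The sketch below uses the `lung` data shipped with the survival package as a stand-in. Follow-up there is recorded in days, so the time points 365 and 730 are an assumption chosen to match one and two years, and `sprintf()` replaces glue to keep the example dependency-free.

```r
library(survival)

# Built-in lung data; time is in days, so 1 and 2 years are 365 and 730.
fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit, times = c(365, 730))

# Collect the pieces needed for the sentence, one row per stratum and time point.
surv_df <- data.frame(strata = as.character(s$strata), time = s$time,
                      surv = s$surv, lower = s$lower, upper = s$upper)

sentences <- sprintf(
  "When %s, %d-day survival is %.1f%% [%.1f%%-%.1f%%, 95%% CI].",
  surv_df$strata, as.integer(surv_df$time),
  100 * surv_df$surv, 100 * surv_df$lower, 100 * surv_df$upper)
sentences
```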
library(survival)
surv_fit <- survival::survfit(Surv(time, status) ~ ph.ecog, data=lung)
insight::is_model_supported(surv_fit)
insight::find_formula(surv_fit)
report::report_participants(mydata)
dependentKM <- "Surv(OverallTime, Outcome2)"
explanatoryKM <- c("explanatory1",
"explanatory2"
)
source(here::here("R", "gc_survival.R"))
mydependent <- "Surv(time, status)"
explanatory <- "Organ"
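mysurvival <- function(mydata, mydependent, explanatory) {
  # surv_plot() takes dependent and explanatory as character strings,
  # so plain arguments suffice and no {{ }} embracing is needed
  mydata %>%
    finalfit::surv_plot(dependent = mydependent,
                        explanatory = explanatory,
                        xlab = 'Time (months)',
                        pval = TRUE,
                        legend = 'none',
                        break.time.by = 12,
                        xlim = c(0, 60)
    )
}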
# library(tidyverse)
mysurvival(mydata = whippleables, mydependent = mydependent, explanatory = explanatory)
explanatory <- c("Organ", "LVI")
deneme <- purrr::map(explanatory, mysurvival, mydata = whippleables, mydependent = mydependent)
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "TStage"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
survminer::pairwise_survdiff(
formula = Surv(OverallTime, Outcome) ~ TStage,
data = mydata,
p.adjust.method = "BH"
)
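Conceptually, the pairwise comparison above runs a log-rank test for every pair of groups and then adjusts the p-values. The sketch below spells that out with `survival::survdiff()` and `p.adjust()`. It is an illustration, not survminer's implementation: `lung$ph.ecog` stands in for `TStage`, and level 3 is dropped because it contains a single patient.

```r
library(survival)

# lung$ph.ecog stands in for TStage; keep levels 0-2 only.
d <- subset(lung, ph.ecog %in% 0:2)
lev_pairs <- combn(sort(unique(d$ph.ecog)), 2, simplify = FALSE)

# Log-rank test for each pair of groups (1 degree of freedom per comparison).
raw_p <- vapply(lev_pairs, function(pr) {
  sd_fit <- survdiff(Surv(time, status) ~ ph.ecog,
                     data = d[d$ph.ecog %in% pr, ])
  pchisq(sd_fit$chisq, df = 1, lower.tail = FALSE)
}, numeric(1))

pairwise_logrank <- data.frame(
  comparison = vapply(lev_pairs, paste, character(1), collapse = " vs "),
  p_raw = raw_p,
  p_BH  = p.adjust(raw_p, method = "BH")  # same adjustment method as above
)
pairwise_logrank
```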
km_fit
print(km_fit,
scale=1,
digits = max(options()$digits - 4,3),
print.rmean=getOption("survfit.print.rmean"),
rmean = getOption('survfit.rmean'),
print.median=getOption("survfit.print.median"),
median = getOption('survfit.median')
)
library(finalfit)
library(survival)
explanatoryMultivariate <- explanatoryKM
dependentMultivariate <- dependentKM
mydata %>%
finalfit(dependentMultivariate, explanatoryMultivariate, metrics=TRUE) -> tMultivariate
knitr::kable(tMultivariate, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
# https://tidymodels.github.io/parsnip/reference/surv_reg.html
library(parsnip)
surv_reg()
#> Parametric Survival Regression Model Specification (regression)
#>
# Parameters can be represented by a placeholder:
surv_reg(dist = varying())
#> Parametric Survival Regression Model Specification (regression)
#>
#> Main Arguments:
#> dist = varying()
#>
model <- surv_reg(dist = "weibull")
model
#> Parametric Survival Regression Model Specification (regression)
#>
#> Main Arguments:
#> dist = weibull
#>
update(model, dist = "lnorm")
#> Parametric Survival Regression Model Specification (regression)
#>
#> Main Arguments:
#> dist = lnorm
#>
# From randomForest
rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)
# From ranger
rf_2 <- ranger(
y ~ .,
data = dat,
mtry = 12,
num.trees = 2000,
importance = 'impurity'
)
# From sparklyr
rf_3 <- ml_random_forest(
dat,
intercept = FALSE,
response = "y",
features = names(dat)[names(dat) != "y"],
col.sample.rate = 12,
num.trees = 2000
)
rand_forest(mtry = 12, trees = 2000) %>%
set_engine("ranger", importance = 'impurity') %>%
fit(y ~ ., data = dat)
rand_forest(mtry = 12, trees = 2000) %>%
set_engine("spark") %>%
fit(y ~ ., data = dat)
mb_followup$OverallTime <- mb_followup$months
mb_followup$Outcome <- mb_followup$`rec(1,0)`
mb_followup$Operation <- mb_followup$`op type (1,2,3)`
## Recoding mb_followup$Operation
mb_followup$Operation <- as.character(mb_followup$Operation)
mb_followup$Operation <- forcats::fct_recode(mb_followup$Operation,
"Type3" = "3",
"Type2" = "2",
"Type1" = "1")
## Reordering mb_followup$Operation
mb_followup$Operation <- factor(mb_followup$Operation, levels=c("Type3", "Type2", "Type1"))
library(magrittr)
mb_followup %$% table(Operation, `op type (1,2,3)`)
library(survival)
library(survminer)
library(finalfit)
mb_followup %>%
finalfit::surv_plot('Surv(OverallTime, Outcome)', 'Operation',
xlab='Time (months)', pval=TRUE, legend = 'none',
# pval.coord
break.time.by = 12, xlim = c(0,60), ylim = c(0.8, 1)
# legend.labs = c('a','b')
)
explanatoryUni <- 'Operation'
dependentUni <- 'Surv(OverallTime, Outcome)'
mb_followup %>%
finalfit(dependentUni, explanatoryUni) -> tUni
knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))
tUni_df <- tibble::as_tibble(tUni, .name_repair = 'minimal') %>%
janitor::clean_names(dat = ., case = 'snake')
n_level <- dim(tUni_df)[1]
tUni_df_descr <- function(n) {
paste0(
'When ',
tUni_df$dependent_surv_overall_time_outcome[1],
' is ',
tUni_df$x[n + 1],
', there is ',
tUni_df$hr_univariable[n + 1],
' times risk than ',
'when ',
tUni_df$dependent_surv_overall_time_outcome[1],
' is ',
tUni_df$x[1],
'.'
)
}
# `2:n_level-1` parses as `(2:n_level) - 1`; the equivalent form below makes that explicit
results5 <- purrr::map(.x = 1:(n_level - 1), .f = tUni_df_descr)
print(unlist(results5))
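The sentence built above reads the hazard ratio out of the `finalfit` table as text. As a rough cross-check, the same quantity can be computed directly with `survival::coxph()`. The sketch below is only illustrative: it uses the `lung` data shipped with the survival package as a stand-in for the study data, and the names `dat`, `sex_f`, `cox_fit`, and `hr_sentence` are hypothetical.

```r
library(survival)

# Stand-in data: `lung` ships with the survival package
# (time in days; status 1 = censored, 2 = dead).
dat <- lung
dat$sex_f <- factor(dat$sex, levels = c(1, 2), labels = c("Male", "Female"))

# Univariable Cox model; the first factor level is the reference group.
cox_fit <- coxph(Surv(time, status) ~ sex_f, data = dat)

# Hazard ratio and 95% confidence interval on the ratio scale.
hr <- exp(coef(cox_fit))
ci <- exp(confint(cox_fit))

# One sentence in the same spirit as tUni_df_descr().
hr_sentence <- sprintf(
  "When sex is %s, the hazard is %.2f times [%.2f-%.2f, 95%% CI] that of %s.",
  levels(dat$sex_f)[2], hr, ci[1, 1], ci[1, 2], levels(dat$sex_f)[1]
)
print(hr_sentence)
```

Working from the numeric estimate rather than the formatted table string avoids depending on how `finalfit` names and formats its columns.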
km_fit <- survfit(Surv(OverallTime, Outcome) ~ Operation, data = mb_followup)
# km_fit
# summary(km_fit)
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
janitor::clean_names(dat = ., case = 'snake') %>%
tibble::rownames_to_column(.data = ., var = 'Operation')
km_fit_median_df
# km_fit_median_df %>%
# knitr::kable(format = "latex") %>%
# kableExtra::kable_styling(latex_options="scale_down")
km_fit_median_df %>%
dplyr::mutate(
description =
glue::glue(
'When {Operation}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.'
)
) %>%
dplyr::mutate(
# strata labels look like 'Operation=Type3'; turn '=' into ' is '
description = gsub(pattern = '=', replacement = ' is ', x = description, fixed = TRUE)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_median_definition
# km_fit_median_definition
summary(km_fit, times = c(12,36,60))
km_fit_summary <- summary(km_fit, times = c(12,36,60))
km_fit_df <- as.data.frame(km_fit_summary[c('strata', 'time', 'n.risk', 'n.event', 'surv', 'std.err', 'lower', 'upper')])
km_fit_df %>%
knitr::kable(format = "latex") %>%
kableExtra::kable_styling(latex_options="scale_down")
km_fit_df %>%
dplyr::mutate(
description =
glue::glue(
'When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].'
)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_definition
km_fit_definition
survminer::pairwise_survdiff(
formula = Surv(OverallTime, Outcome) ~ Operation,
data = mb_followup,
p.adjust.method = 'BH'
)
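The fixed-time-point step above (`summary(km_fit, times = ...)` followed by a sentence per stratum) can be tried without the study data. The following is a minimal sketch under stated assumptions: it uses the `lung` data from the survival package as a stand-in, so time is in days rather than months, and it builds the sentences with base `sprintf()` instead of `glue`. The object names `fit`, `fit_summary`, and `fit_df` are hypothetical.

```r
library(survival)

# Stand-in data: `lung` ships with the survival package
# (time in days; status 1 = censored, 2 = dead).
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Kaplan-Meier estimates at fixed time points (180 and 365 days).
fit_summary <- summary(fit, times = c(180, 365))

# Collect the per-stratum columns into a plain data frame.
fit_df <- as.data.frame(
  fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "lower", "upper")]
)

# One descriptive sentence per stratum and time point.
fit_df$description <- sprintf(
  "When %s, %d-day survival is %.1f%% [%.1f%%-%.1f%%, 95%% CI].",
  as.character(fit_df$strata), as.integer(fit_df$time),
  100 * fit_df$surv, 100 * fit_df$lower, 100 * fit_df$upper
)
print(fit_df$description)
```

If a requested time lies beyond the last observation in a stratum, `summary()` drops that row by default, so the number of sentences can be smaller than strata times time points.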
library(gt)
library(gtsummary)
library(survival)
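# Note: tbl_survival() is deprecated in newer gtsummary releases; tbl_survfit() is its replacement.
fit1 <- survfit(Surv(ttdeath, death) ~ trt, trial)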
tbl_strata_ex1 <-
tbl_survival(
fit1,
times = c(12, 24),
label = "{time} Months"
)
fit2 <- survfit(Surv(ttdeath, death) ~ 1, trial)
tbl_nostrata_ex2 <-
tbl_survival(
fit2,
probs = c(0.1, 0.2, 0.5),
header_estimate = "**Months**"
)
library(survival)
library(survminer)
library(finalfit)
mydata %>%
finalfit::surv_plot('Surv(OverallTime, Outcome)', 'LVI',
xlab='Time (months)', pval=TRUE, legend = 'none',
break.time.by = 12, xlim = c(0,60)
# legend.labs = c('a','b')
)
explanatoryUni <- 'LVI'
dependentUni <- 'Surv(OverallTime, Outcome)'
mydata %>%
finalfit(dependentUni, explanatoryUni, metrics=TRUE) -> tUni
knitr::kable(tUni[, 1:4], row.names=FALSE, align=c('l', 'l', 'r', 'r', 'r', 'r'))
tUni_df <- tibble::as_tibble(tUni, .name_repair = 'minimal') %>%
janitor::clean_names(dat = ., case = 'snake')
n_level <- dim(tUni_df)[1]
tUni_df_descr <- function(n) {
paste0(
'When ',
tUni_df$dependent_surv_overall_time_outcome[1],
' is ',
tUni_df$x[n + 1],
', there is ',
tUni_df$hr_univariable[n + 1],
' times risk than ',
'when ',
tUni_df$dependent_surv_overall_time_outcome[1],
' is ',
tUni_df$x[1],
'.'
)
}
# `2:n_level-1` parses as `(2:n_level) - 1`; the equivalent form below makes that explicit
results5 <- purrr::map(.x = 1:(n_level - 1), .f = tUni_df_descr)
print(unlist(results5))
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata)
km_fit
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>%
janitor::clean_names(dat = ., case = 'snake') %>%
tibble::rownames_to_column(.data = ., var = 'LVI')
km_fit_median_df %>%
dplyr::mutate(
description =
glue::glue(
'When {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.'
)
) %>%
dplyr::mutate(
# strata labels look like 'LVI=Present'; turn '=' into ' is '
description = gsub(pattern = '=', replacement = ' is ', x = description, fixed = TRUE)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_median_definition
km_fit_median_definition
summary(km_fit, times = c(12,36,60))
km_fit_summary <- summary(km_fit, times = c(12,36,60))
km_fit_df <- as.data.frame(km_fit_summary[c('strata', 'time', 'n.risk', 'n.event', 'surv', 'std.err', 'lower', 'upper')])
km_fit_df
km_fit_df %>%
dplyr::mutate(
description =
glue::glue(
'When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].'
)
) %>%
dplyr::select(description) %>%
dplyr::pull() -> km_fit_definition
km_fit_definition
summary(km_fit)$table
km_fit_median_df <- summary(km_fit)
results1html <- as.data.frame(km_fit_median_df$table) %>%
janitor::clean_names(dat = ., case = 'snake') %>%
tibble::rownames_to_column(.data = ., var = 'LVI')
# strip the 'LVI=' prefix that survfit() adds to the strata labels
results1html[,1] <- gsub(pattern = 'LVI=',
replacement = '',
x = results1html[,1])
knitr::kable(results1html,
row.names = FALSE,
align = c('l', rep('r', 9)),
format = 'html',
digits = 1)
survminer::pairwise_survdiff(
formula = Surv(OverallTime, Outcome) ~ LVI,
data = mydata,
p.adjust.method = 'BH'
)
library("shiny")
library("dplyr")
library("magrittr")
library("viridis")
library("readxl")
library("survival")
library("survminer")
library("finalfit")
library("glue")
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
mydata$int <- lubridate::interval(
lubridate::ymd(mydata$SurgeryDate),
lubridate::ymd(mydata$LastFollowUpDate)
)
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death),
"1" = "TRUE",
"0" = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
mydata %>%
select(-ID,
-Name) %>%
inspectdf::inspect_types() %>%
dplyr::filter(type == "character") %>%
dplyr::select(col_name) %>%
pull() %>%
unlist() -> characterVariables
selectInput(
inputId = "Factor",
label = "Choose a Factor Affecting Survival",
choices = characterVariables,
selected = "LVI"
)
dependentKM <- "Surv(OverallTime, Outcome)"
renderPrint({
print(input$Factor)
})
tags$b("Kaplan-Meier Plot, Log-Rank Test")
tags$br()
renderPlot({
mydata %>%
finalfit::surv_plot(
.data = .,
dependent = dependentKM,
explanatory = input$Factor,
xlab = 'Time (months)',
pval = TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0, 60)
)
})
tags$b("Univariate Cox-Regression")
tags$br()
renderPrint({
mydata %>%
finalfit::finalfit(dependentKM, input$Factor) -> tUni
knitr::kable(tUni[, 1:4],
row.names = FALSE,
align = c('l', 'l', 'r', 'r', 'r', 'r'))
})
tags$b("Median Survival")
tags$br()
renderPrint({
formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)
km_fit <- survfit(as.formula(formula_text),
data = mydata)
km_fit
})
tags$b("1-3-5-yr Survival")
tags$br()
renderPrint({
formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)
km_fit <- survfit(as.formula(formula_text),
data = mydata)
summary(km_fit, times = c(12, 36, 60))
})
renderPrint({
formula_text <- paste0("Surv(OverallTime, Outcome) ~ ",input$Factor)
km_fit <- survfit(as.formula(formula_text),
data = mydata)
km_fit_summary <- summary(km_fit, times = c(12,36,60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")])
km_fit_df %>%
dplyr::mutate(
description =
glue::glue(
"When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."
)
) %>%
dplyr::select(description) %>%
pull()
})
# https://easystats.github.io/correlation/
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
correlation::correlation(iris)
library(dplyr)
iris %>%
select(Species, starts_with("Sepal")) %>%
group_by(Species) %>%
correlation::correlation() %>%
filter(r < 0.9)
correlation::correlation(select(iris, Species, starts_with("Sepal")),
select(iris, Species, starts_with("Petal")),
partial=TRUE)
correlation(iris, bayesian=TRUE)
library(report)
iris %>%
select(starts_with("Sepal")) %>%
correlation::correlation(bayesian=TRUE) %>%
report()
report::report(cor.test(iris$Sepal.Length, iris$Petal.Length))
iris %>%
group_by(Species) %>%
correlation() %>%
report() %>%
to_table()
library(explore)
iris %>% explore(Sepal.Length, Petal.Length)
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris %>% explore(Sepal.Length, Petal.Length, target = is_versicolor)
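# assumption: `carseats` is the Carseats data from the ISLR package, as in the dlookr vignettes
carseats <- ISLR::Carseats
dlookr::correlate(carseats)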
dlookr::correlate(carseats, Sales, CompPrice, Income)
dlookr::correlate(carseats, Sales:Income)
dlookr::correlate(carseats, -(Sales:Income))
carseats %>%
dlookr::correlate(Sales:Income) %>%
dplyr::filter(as.integer(var1) > as.integer(var2))
carseats %>%
dplyr::filter(ShelveLoc == "Good") %>%
group_by(Urban, US) %>%
dlookr::correlate(Sales) %>%
dplyr::filter(abs(coef_corr) > 0.5)
dlookr::plot_correlate(carseats)
dlookr::plot_correlate(carseats, Sales, Price)
carseats %>%
dplyr::filter(ShelveLoc == "Good") %>%
dplyr::group_by(Urban, US) %>%
dlookr::plot_correlate(Sales)
## Summary statistics by – overall with correlation
SmartEDA::ExpNumStat(
Carseats,
by = "A",
gp = "Price",
Qnt = seq(0, 1, 0.1),
MesofShape = 1,
Outlier = TRUE,
round = 2
)
# https://alastairrushworth.github.io/inspectdf/articles/pkgdown/inspect_cor_exampes.html
inspectdf::inspect_cor(storms)
inspectdf::inspect_cor(storms) %>% inspectdf::show_plot()
inspectdf::inspect_cor(storms, storms[-c(1:200), ])
inspectdf::inspect_cor(storms, storms[-c(1:200), ]) %>%
slice(1:20) %>%
inspectdf::show_plot()
cor %>%
report::to_values()
mydata %>%
select(continiousVariables,
-dateVariables) %>%
visdat::vis_cor()
library(report)
model <- lm(Sepal.Length ~ Species, data = iris)
report::report(model)
# Table report for a linear model
lm(Sepal.Length ~ Petal.Length + Species, data=iris) %>%
report::report() %>%
report::to_table() %>%
kableExtra::kable()
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
summarizer(dependent, explanatory)
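# relate() expects a target_by() object; Sales is the numeric target here
num <- dlookr::target_by(carseats, Sales)
num_cat <- dlookr::relate(num, ShelveLoc)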
num_cat
summary(num_cat)
plot(num_cat)
my_text <- kableExtra::text_spec("Some Text",
color = "red",
background = "yellow"
)
# `r my_text`
mylongtext <- paste("Statistical Method:
Mean, standard deviation, median, minimum and maximum values were reported for continuous variables. Frequency tables were created for categorical variables and for grouped continuous variables. For the overall survival analysis, the date of death and the date of last visit were obtained from patient records.
Kaplan-Meier plots, the log-rank test and Cox regression were used in the survival analysis. Analyses were performed with R (version 3.6.0) and RStudio, using the survival and finalfit packages. A p-value of 0.05 was taken as the threshold for statistical significance.
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Therneau T (2015). A Package for Survival Analysis in S. version 2.38, https://CRAN.R-project.org/package=survival
Terry M. Therneau, Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.
Ewen Harrison, Tom Drake and Riinu Ots (2019). finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. R package version 0.9.6. https://github.com/ewenharrison/finalfit"
)
mylongtext <- strwrap(mylongtext)
# `r mylongtext`
boxplot(1:10)
plot(rnorm(10))
ggplot2::ggplot(mtcars,
ggplot2::aes(x=mpg)
) +
ggplot2::geom_histogram(fill="skyblue", alpha=0.5) +
ggplot2::theme_minimal()
Block rmdnote
Block rmdtip
Block warning
projectName <- list.files(path = here::here(), pattern = "Rproj")
# escape the dot and anchor at the end so only the file extension is removed
projectName <- gsub(pattern = "\\.Rproj$", replacement = "", x = projectName)
analysisDate <- as.character(Sys.Date())
imageName <- paste0(projectName, analysisDate, ".RData")
save.image(file = here::here("data", imageName))
rdsName <- paste0(projectName, analysisDate, ".rds")
# second argument passed by position: it is `path` in older readr and `file` in newer releases
readr::write_rds(mydata, here::here("data", rdsName))
saveRDS(object = mydata, file = here::here("data", rdsName))
excelName <- paste0(projectName, analysisDate, ".xlsx")
rio::export(
x = mydata,
file = here::here("data", excelName),
format = "xlsx"
)
# writexl::write_xlsx(mydata, here::here("data", excelName))
print(glue::glue(
"saved data after analysis to ",
rownames(file.info(here::here("data", excelName))),
" : ",
as.character(
file.info(here::here("data", excelName))$ctime
)
)
)
mydata %>%
downloadthis::download_this(
output_name = tools::file_path_sans_ext(excelName),
output_extension = ".csv",
button_label = "Download data as csv",
button_type = "default"
)
mydata %>%
downloadthis::download_this(
output_name = tools::file_path_sans_ext(excelName),
output_extension = ".xlsx",
button_label = "Download data as xlsx",
button_type = "primary"
)
# pacman::p_load(here, lubridate, glue)
# here::here("data", glue("{today()}_trends.csv"))
# mydata %>% select(
# -c(
# rapor_yil,
# rapor_no,
# protokol_no,
# istek_tarihi,
# nux_yada_met_varsa_tarihi,
# son_hastane_vizit_tarihi,
# Outcome
# )
# ) -> finalSummary
#
# summarytools::view(summarytools::dfSummary(x = finalSummary
# , style = "markdown"))
citation()
report::cite_packages(session = sessionInfo())
report::show_packages(session = sessionInfo()) %>%
kableExtra::kable()
# citation("tidyverse")
citation("readxl")
citation("janitor")
# citation("report")
citation("finalfit")
# citation("ggstatsplot")
if(!dir.exists(here::here("bib"))) {dir.create(here::here("bib"))}
knitr::write_bib(x = c(.packages(), "knitr", "shiny"),
file = here::here("bib", "packages.bib")
)
sessionInfo()
pacman::p_loaded(all = TRUE)
search()
library()
installed.packages()[1:5, c("Package", "Version")]
installed.packages()
\pagebreak
push all changes to GitHub repository
source(file = here::here("R", "force_git.R"))