library(knitr)
options(htmltools.dir.version = FALSE, cache=TRUE)
opts_chunk$set(comment = NA, prompt=TRUE)
#opts_chunk$set(dev.args=list(bg="transparent"), fig.width=15, fig.height=7)
source("kutheme.R")
background-image: url(pics/redcard+sociology.jpg)
background-size: 96%
class: center, middle
???
Science and statistics have been under pressure in recent years.
In psychology in particular, it has proven difficult to replicate important results.
This has led to a discussion about what counts as scientific knowledge, and how different groups can view the same data so differently.
background-image: url(pics/soccer.png)
background-size: 98%
class: center, middle
???
Soccer la la la la
Red cards given to players of colour
29 research groups
From idea ...
library(DiagrammeR)
library(DiagrammeRsvg)
library(svglite)
library(rsvg)
#svg <- export_svg(
grViz("
digraph dot {
  graph [layout = dot, rankdir = LR, bgcolor='#000000', size=2]
  node [shape = circle, style = filled, fillcolor = DimGray, fontcolor = White, fontsize=15, fontname=Helvetica, label = '', penwidth=4, margin=0.05, color=White]
  a [label='Design']
  b [label='Collect']
  c [label='Analyze']
  d [label='Publish']
  edge [color = White, penwidth=4]
  a -> b -> c
  c -> d [color=red, penwidth=4]
}")
#)
#svg %>%
#  charToRaw %>% rsvg %>% png::writePNG('graph.png')
#knitr::include_graphics("graph.png")
# html_print(HTML(svg), background="transparent", viewer=NULL)
.pull-right[... to publication.]
???
The red line shows where peer review comes in. Total summary. We need to document the steps we took throughout, as shown in the soccer example.
We want reproducible research
Statistical analysis
.large[All of the data were analyzed with data processing software and figures with Microsoft excel 2007.]
.pull-right[-- Tayefe et al, Advances in Bioresearch, 2014]
???
Full statistical analysis section from a scientific paper.
Clearly impossible to reproduce
Reproducibility
Given code/data/materials, can I get the same (=identical) numbers that you did?
Replicability
Given scientific protocol, can I get the same (=in agreement) result that you did in my own study?
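A minimal sketch of the reproducibility definition above, assuming a hypothetical analysis step `run_analysis()` that involves random numbers. Fixing the seed is one way to make a rerun of the same code return identical numbers.

```r
# Hypothetical analysis step: simulate data and estimate a mean.
run_analysis <- function(seed) {
  set.seed(seed)             # fix the state of the random number generator
  x <- rnorm(100, mean = 5)  # simulated measurements (illustration only)
  mean(x)
}

first  <- run_analysis(2018)
second <- run_analysis(2018)

# Same code, same seed: the two results are identical, not just close.
identical(first, second)
```

Replicability, in contrast, would mean collecting new data and reaching a conclusion that agrees, which no amount of seed setting can guarantee.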
???
However, what do we really do?
.large[Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.]
.right[.small[-- "For Big-Data Scientists, 'Janitor Work' Is Key Hurdle to Insight" - The New York Times, 2014]]
???
Notes
library(DiagrammeR)
library(DiagrammeRsvg)
library(svglite)
library(rsvg)
#svg <- export_svg(
grViz("
digraph dot {
  graph [layout = dot, rankdir = LR, bgcolor='#000000', size=2]
  node [shape = circle, style = filled, fillcolor = DimGray, fontcolor = White, fontsize=15, fontname=Helvetica, label = '', penwidth=4, margin=0.05, color=White]
  a [label='Design']
  b [label='Collect']
  c [label='Analyze']
  d [label='Publish']
  edge [color = White, penwidth=4]
  a -> b
  c -> d
  b -> c [color=red, penwidth=16]
}")
#)
#svg %>%
#  charToRaw %>% rsvg %>% png::writePNG('graph.png')
#knitr::include_graphics("graph.png")
# html_print(HTML(svg), background="transparent", viewer=NULL)
knitr::include_graphics("pics/cartoon-metadata.png")
???
GIGO
Huge impact here
class: middle
dataMaid
Extending dataMaid
validate
Exercises.
If you haven't already: go to www.biostatistics.dk/CSP2018/ and install the required packages.
background-image: url(pics/flower.png)
background-size: 60%
class: center, middle
background-image: url(pics/structure.png)
background-position: right
background-size: 30%
.small[
.pull-left[
* Wrangle to put into correct format and type (validity)
* Screen to look for consistency, accuracy and uniqueness
* Validate to check for consistency, accuracy and uniqueness
* Clean data
  * Check (screen/validate) again
]]
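As a rough sketch of the screening step, the base R code below looks for duplicated identifiers (uniqueness) and implausible values (accuracy). The toy data frame `dat` and the plausibility limits of 10 and 80 for BMI are assumptions made for illustration only.

```r
# Hypothetical toy data with a duplicated id and an implausible BMI value
dat <- data.frame(id  = c(1, 2, 2, 3),
                  bmi = c(35.2, 31.1, 31.1, 310))

# Uniqueness: ids that occur more than once
dup_ids <- unique(dat$id[duplicated(dat$id)])

# Accuracy: ids with a BMI outside the assumed plausible range
out_of_range <- dat$id[dat$bmi < 10 | dat$bmi > 80]

dup_ids
out_of_range
```

Screening only flags candidates. Whether id 3 is a typo for 31.0 or a genuine value is a question for someone who knows the data.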
--
.
???
Note the complete overlap between context and content.
Crucial: someone must know the topic!
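library(reshape2)
library(readr)
library(dplyr)  # provides %>%, mutate() and select()
DF <- data.frame(id=c(1, 2), bmi0=c(35.2, 31.1), bmi52=c(24.2, 27.0))
# Reshape from wide to long and extract the week number from the variable name
DFm <- DF %>%
  melt(id.vars="id") %>%
  mutate(time=readr::parse_number(as.character(variable))) %>%
  select(-variable)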
.pull-left[
knitr::kable(DF, format = 'markdown')
]
.pull-right[
knitr::kable(DFm, format = 'markdown')
]
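Going the other way, from long back to wide, might look like the sketch below. It uses base R `reshape()` so that it runs without extra packages; `reshape2::dcast()` is the counterpart of `melt()` if you prefer to stay within reshape2. The long data frame is retyped here by hand so the block is self-contained.

```r
# Long format: one row per id and time point (same values as the tables above)
DFm <- data.frame(id    = c(1, 2, 1, 2),
                  value = c(35.2, 31.1, 24.2, 27.0),
                  time  = c(0, 0, 52, 52))

# Wide format: one row per id, one column per time point
wide <- reshape(DFm, idvar = "id", timevar = "time", direction = "wide")

names(wide)  # "id" "value.0" "value.52"
wide
```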
--
???
Both formats have advantages and disadvantages. You need to be aware of which one you are dealing with.
class: center, middle
.Large[Thou shalt never manually modify your raw data.]
There are no exceptions to this rule.
class: center, middle
.Large[Thou shalt never overwrite your raw data.]
There are no exceptions to this rule either.
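One possible way to follow both rules in practice is sketched below: keep the raw file read-only, make every correction in code, and write the result to a separate file. The file names and the rule "negative BMI means missing" are hypothetical.

```r
# Hypothetical file locations (a temporary directory is used for illustration)
raw_file   <- file.path(tempdir(), "raw_data.csv")
clean_file <- file.path(tempdir(), "clean_data.csv")

# Stand-in for the raw data as delivered
write.csv(data.frame(id = 1:3, bmi = c(35.2, -1, 27.0)),
          raw_file, row.names = FALSE)

# Mark the raw file read-only to guard against accidental edits
Sys.chmod(raw_file, mode = "0444")

# All changes happen in code, on a copy held in memory
raw   <- read.csv(raw_file)
clean <- raw
clean$bmi[clean$bmi < 0] <- NA   # assumed cleaning rule: negative BMI is an error

# The cleaned data go to a new file; the raw file is left untouched
write.csv(clean, clean_file, row.names = FALSE)
```

Because the corrections live in a script, they are documented and can be rerun or reversed at any time.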
Format for writing reproducible, dynamic reports with R. Embed R code and results into slideshows, PDFs, HTML documents, Word files and more. See cheat sheet at RStudio.
install.packages(c("rmarkdown", "knitr"))
knitr::include_graphics("pics/rmarkdown.png")
Technically correct data requires that the data formats are correct
.pull-left[
DF
]
.pull-right[
lapply(DF, class)
]
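The class check above can be turned into a small comparison against the types you expect. This is only a sketch: the data frame with numbers stored as text and the `expected` vector are made up for illustration.

```r
# Hypothetical data where numeric columns were read in as text
DF <- data.frame(id   = c("1", "2"),
                 bmi0 = c("35.2", "31.1"),
                 stringsAsFactors = FALSE)

# The classes we expect each variable to have (an assumption for this example)
expected <- c(id = "numeric", bmi0 = "numeric")

# Actual classes, one per column
actual <- vapply(DF, function(col) class(col)[1], character(1))

# Variables whose class differs from what was expected
mismatch <- names(actual)[actual != expected[names(actual)]]
mismatch

# Convert the offending columns in code, never by hand
DF[mismatch] <- lapply(DF[mismatch], as.numeric)
lapply(DF, class)
```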
class: inverse, middle
Get the bigPresidentData from the dataMaid package:
library(dataMaid)
data("bigPresidentData")
Hunt for errors!