library(knitr)
options(htmltools.dir.version = FALSE)
opts_chunk$set(comment = NA, prompt = TRUE, cache = TRUE)
#opts_chunk$set(dev.args=list(bg="transparent"), fig.width=15, fig.height=7)
source("kutheme.R")
library(dataMaid)
toyData <- as.data.frame(toyData)
knitr::include_graphics("pics/datacleaning.jpg")
.pull-right[Not the best term ... and should not be unsupervised]
In an R script, use `NA` to mark that information is missing in this spot.

Two systems for selecting observations in `data.frame`s in R:
by index (row number) or by using a logical vector.
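A small base-R sketch of how `NA` behaves once it is in the data. The vector `x` is made up for illustration. Comparisons involving `NA` return `NA`, so missingness has to be tested with `is.na()`, and summary functions propagate `NA` unless told otherwise.

```r
# Made-up vector with one missing value
x <- c(4, NA, -1)

is.na(x)               # FALSE  TRUE FALSE: the proper test for missingness
x == NA                # NA NA NA: comparing with NA never gives TRUE/FALSE
x > 0                  # TRUE NA FALSE: NA carries over into logical vectors
mean(x)                # NA: missing values propagate
mean(x, na.rm = TRUE)  # 1.5: drop the NAs explicitly
```

Since a logical vector built from a variable with missing values will itself contain `NA`s, it is worth checking `is.na()` before using such a vector to select or overwrite observations.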
(tD <- head(toyData, 3))
Four equivalent ways to get the second row of `tD`:
library(dplyr)               # provides %>% and filter() for the last line
tD[2, ]                      # indexing by row number
tD[c(FALSE, TRUE, FALSE), ]  # manual logical vector
tD[tD$id == 2, ]             # informative logical vector
tD %>% filter(id == 2)       # using the tidyverse (dplyr)
tD[tD$id == 2, ] #informative logical vector
Use informative logical vectors as much as possible!
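One reason to prefer informative logical vectors: row numbers refer to positions, not observations, so they silently point at a different observation as soon as the data are sorted or filtered. A minimal sketch with a made-up data frame `df` (not the `toyData` used above):

```r
# Made-up data frame for illustration
df <- data.frame(id = c(1, 2, 3), change = c(0.5, -1.2, 2.0))

df[2, ]                          # the observation with id 2

# Sort by change: the row positions now hold different observations
df_sorted <- df[order(df$change), ]
df_sorted[2, ]                   # now the observation with id 1!
df_sorted[df_sorted$id == 2, ]   # still the observation with id 2
```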
tD
# Mark positive values of change as missing:
tD[tD$change > 0, "change"] <- NA
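A caveat when overwriting values this way: if the variable already contains `NA`, the condition `change > 0` is `NA` for those rows, and base R stops with an error ("missing values are not allowed in subscripted assignments of data frames"). A sketch of one workaround on a made-up data frame: wrap the condition in `which()`, which keeps only the rows where the condition is `TRUE`.

```r
# Made-up data frame; change is already missing for id 2
df <- data.frame(id = 1:4, change = c(0.5, NA, -1.2, 2.0))

# df$change > 0 is TRUE NA FALSE TRUE; which() turns it into the
# row numbers 1 and 4, skipping the NA
df[which(df$change > 0), "change"] <- NA
df$change              # NA NA -1.2 NA
```

The row numbers from `which()` are computed from the informative condition at the moment of assignment, so they cannot go stale. A fully logical alternative is `!is.na(df$change) & df$change > 0`.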
ALWAYS use variable names.
# Readable, informative code:
tD[tD$change > 0, "change"] <- NA

# Indexing by numbers easily becomes
# a source of error by itself:
tD[tD$change > 0, 4] <- NA
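To see why numeric column positions are fragile, here is a sketch with a made-up data frame: adding a column in front (say, during cleaning or after combining datasets) shifts every position, so column 2 no longer means what it did, while code that uses the variable name keeps working.

```r
# Made-up data: change is column 2 here
df <- data.frame(id = 1:3, change = c(0.5, -1.2, 2.0))

# A new first column shifts all positions by one
df2 <- data.frame(note = c("a", "b", "c"), df)
names(df2)[2]                          # "id", not "change"

# Selecting by name still hits the intended variable
df2[df2$change > 0, "change"] <- NA
df2$change                             # NA -1.2 NA
```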
class: inverse
Correct the errors you have found so far.
Make sure to make the cleaning process reproducible.
Remember rules 1 and 2!
background-image: url(pics/structure.png)
background-size: 30%
We should now have a cleaned dataset that can form the basis for future analyses, along with documentation of how we got there!
Produce a summary document for subsequent analyses.
.footnotesize[
makeCodebook(bigPresidentData)
]
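By default `makeCodebook()` renders the codebook and tries to open it. A sketch of a few arguments we believe are available: `file` and `reportTitle` on `makeCodebook()` itself, and `output`, `render`, `replace` and `openResult` passed on to `makeDataReport()`. Please check `?makeCodebook` and `?makeDataReport` for your installed version of dataMaid; the file name and title below are made up. With `render = FALSE` only the R Markdown source is written, which can be kept next to the cleaning script and rendered later.

```r
library(dataMaid)

# Write (but do not render) the codebook's R Markdown source.
# The file name and title are hypothetical.
makeCodebook(bigPresidentData,
             file = "codebook_presidents.Rmd",
             reportTitle = "Codebook for bigPresidentData",
             output = "html",      # target format when it is rendered
             render = FALSE,       # only produce the .Rmd file
             replace = TRUE,       # overwrite an existing file
             openResult = FALSE)   # do not open anything afterwards
```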
Add a label (similar to the `labelled` package) or extra information:
.footnotesize[
bPD <- bigPresidentData
attr(bPD$presidencyYears, "label") <- "Full years as president"
]
.footnotesize[
attr(bPD$dateOfDeath, "shortDescription") <- "Missing means that the person is still alive"
]
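Both `label` and `shortDescription` are ordinary R attributes, so they can be set and read with base R's `attr()`. One thing to keep in mind, sketched below on a made-up data frame standing in for `bPD`: subsetting rows with `[` drops such attributes from plain numeric and character columns, so it is safest to add labels and descriptions as the last step, after the cleaning is done.

```r
# Made-up data frame standing in for bPD
df <- data.frame(id = 1:3, presidencyYears = c(8, 4, 4))
attr(df$presidencyYears, "label") <- "Full years as president"

attr(df$presidencyYears, "label")          # "Full years as president"

# Row subsetting (e.g. removing an observation) drops the attribute
df_sub <- df[df$id != 2, ]
attr(df_sub$presidencyYears, "label")      # NULL
```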
class: inverse
Create the final codebook with additional information about some of the variables.
makeCodebook(myCleanedData)
class: middle, center
Please grab hold of us here or via email
.pull-left[Anne
ahpe@sund.ku.dk]
.pull-right[Claus
ekstrom@sund.ku.dk]