knitr::opts_chunk$set(echo = TRUE, comment = NA, error = TRUE)
activity dataset
library(RepDataPeerAssessment1)
data("activity")
head(activity)
steps <- activity$steps
summary(steps)
We have `r sum(is.na(steps))` NAs in this variable.
We have `r round(mean(is.na(steps)) * 100, 0)` percent of missing values!
library(dplyr)
# share of non-missing observations with a nonzero step count
activity %>%
  summarize(nonzero = mean(steps != 0, na.rm = TRUE))
Source: http://thomasleeper.com/Rcourse/Tutorials/NAhandling.html
Analyze a vector or variable of the dataframe:
length(steps[is.na(steps) == TRUE]) # number of NAs
length(steps[is.na(steps) == FALSE]) # number of non-NAs
Analyze for NAs in the whole dataframe:
summary(activity)
image(is.na(activity),
      xaxt = "n", # suppress plotting of x axis
      yaxt = "n", # suppress plotting of y axis
      bty = "n")  # do not draw a box around
# custom labels for x and y axis
axis(1, seq(0, 1, length.out = nrow(activity)), 1:nrow(activity), col = "white")
axis(2, # y axis
     c(0, 0.5, 1), # sequence of labels
     names(activity), # variable names of data frame
     col = "white", # color
     las = 2) # 2: perpendicular to the axis. Defaults = 0
box(lty = '1373', col = 'black') # draw a box around
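As a numeric complement to the plot, the missing values can also be tallied per column. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical stand-ins for `activity`, chosen only so the block runs without the package installed):

```r
# Hypothetical toy data standing in for `activity` (values are made up)
toy <- data.frame(
  steps    = c(NA, 0, 12, NA, 7, 0),
  date     = as.Date("2012-10-01") + c(0, 0, 0, 1, 1, 1),
  interval = c(0, 5, 10, 0, 5, 10)
)

na_count   <- colSums(is.na(toy))         # number of NAs per column
na_share   <- colMeans(is.na(toy)) * 100  # percent of NAs per column
n_complete <- sum(complete.cases(toy))    # rows without any NA

na_count
round(na_share, 1)
n_complete
```

Replacing `toy` with `activity` should give the same kind of per-column overview for the real data.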
Source: https://www.r-bloggers.com/example-2014-5-simple-mean-imputation/
df = data.frame(x = 1:20, y = c(1:10, rep(NA, 10)))
head(df)
tail(df)
df$y[is.na(df$y)] = mean(df$y, na.rm = TRUE)
tail(df)
transform and ifelse
# alternative
df = data.frame(x = 1:20, y = c(1:10, rep(NA, 10)))
head(df)
tail(df)
df = transform(df, y = ifelse(is.na(y), mean(y, na.rm = TRUE), y))
tail(df)
In the first example, we identify the elements of y that are NA and replace them with the mean of the non-missing values. In the second, we test each element of y: if it is NA, we replace it with the mean; otherwise we keep the original value.
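To check that the two approaches agree, one can run both on copies of the same data and compare the imputed columns. This is only a sketch; the names `df1` and `df2` are introduced here for illustration and do not appear in the source.

```r
df <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)))

# Approach 1: logical indexing, replace only the NA positions
df1 <- df
df1$y[is.na(df1$y)] <- mean(df1$y, na.rm = TRUE)

# Approach 2: transform() with ifelse(), test every element
df2 <- transform(df, y = ifelse(is.na(y), mean(y, na.rm = TRUE), y))

# Compare the imputed columns and confirm no NA is left
all.equal(df1$y, df2$y)
anyNA(df1$y)
```

Both versions fill the ten missing entries with the mean of the observed values, so the comparison is expected to report that the columns are equal.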