library(learnr)
library(JM)
library(memisc)
library(foreign)
pregdata <- read.spss('www/pregdata.sav', to.data.frame = TRUE)
knitr::opts_chunk$set(echo = FALSE)
It is important to load all necessary packages before starting.
Load the JM and memisc packages. Press the 'Run Code' button when you are done. Click the 'Solution' button when you want to see the answers.
library(JM)
library(memisc)
Always check your data.
Explore the pbc2.id and the pbc2 data sets (print the first six and last six rows). Again press the 'Run Code' button when you are done.
head(pbc2.id)
head(pbc2)
tail(pbc2.id)
tail(pbc2)
Calculate the mean and median age from the pbc2.id data set.
mean(pbc2.id$age)
median(pbc2.id$age)
Hint: Use the functions mean(...) and median(...) to investigate the data set. Replace the dots with the name of the data set and the variable name, separated by a dollar sign, for example mean(pbc2.id$age).
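As a small illustration of the dollar-sign syntax from the hint, the sketch below uses a made-up data frame (toy, with a hypothetical column age) instead of the pbc2.id data, so the numbers are for demonstration only.

```r
# Hypothetical toy data frame; not part of the JM package
toy <- data.frame(id = 1:5, age = c(34, 51, 47, 62, 51))

# dataset$variable extracts one column as a vector
toy$age

mean_age <- mean(toy$age)      # arithmetic mean of the column
median_age <- median(toy$age)  # middle value after sorting
mean_age
median_age
```

The same pattern applies to any data frame: the name before the dollar sign is the data set, the name after it is the variable.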
Some questions have more than one correct answer.
quiz(
  question("The primary R system is available from the:",
    answer("GNU"),
    answer("GitHub"),
    answer("CRAN", correct = TRUE),
    answer("All of the above")
  ),
  question("Which other program do I need to install in order to use R?",
    answer("None", correct = TRUE),
    answer("RStudio", message = "RStudio is very useful when you work with R. It is not absolutely required, however."),
    answer("RWinEdt"),
    answer("RStudio and WinEdt")
  ),
  question("The interface of RStudio consists of several panes (sections of the program window). How many panes do we have at most? (Hint: this maximum is also the usual number that we see once we have opened an .R file.)",
    answer("1"),
    answer("2"),
    answer("3"),
    answer("4", correct = TRUE),
    answer("5"),
    answer("6"),
    answer("More than 6")
  ),
  question("With which of the following functions can we load packages?",
    answer("library()", correct = TRUE),
    answer("install.packages()", message = "install.packages() can be used to install packages. You still need to load them using library() or require(), however."),
    answer("require()", correct = TRUE),
    answer("None of the above")
  )
)
Categorize serum bilirubin as low_val: [0.3 until 3.2) and high_val: [3.2 until 28]. Give this variable the name "serBilCat". Print the first 6 rows of the data:
pbc2.id$serBilCat <- as.numeric(pbc2.id$serBilir >= 3.2)
pbc2.id$serBilCat <- factor(pbc2.id$serBilCat, levels = c(0, 1),
                            labels = c("low_val", "high_val"))
head(pbc2.id)
Hint: Use the function factor(...) to create a categorical variable. Use the function as.numeric(...) to convert a variable to a numeric variable.
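To show how the two functions in the hint work together, here is a minimal sketch on made-up values. The vector values and the cut-off of 3 are assumptions chosen for illustration and are unrelated to the pbc2.id data.

```r
# Hypothetical toy values; the cut-off of 3 is chosen only for illustration
values <- c(0.5, 2.9, 3.0, 7.4, 1.2)

# A logical comparison gives TRUE/FALSE; as.numeric() turns that into 1/0
above <- as.numeric(values >= 3)

# factor() attaches readable labels to the 0/1 codes
valuesCat <- factor(above, levels = c(0, 1),
                    labels = c("low_val", "high_val"))

# Count how many values fall in each category
table(valuesCat)
```

Because the comparison uses >=, a value exactly equal to the cut-off ends up in the upper category.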
Categorize serum bilirubin as low_val: all values from the lowest up to but not including the mean, and high_val: the mean and everything above it. Give this variable the name "serBilCat2". Print the first 6 rows of the data:
pbc2.id$serBilCat2 <- as.numeric(pbc2.id$serBilir >= mean(pbc2.id$serBilir))
pbc2.id$serBilCat2 <- factor(pbc2.id$serBilCat2, levels = c(0, 1),
                             labels = c("low_val", "high_val"))
head(pbc2.id)
Hint: Use the function factor(...) to create a categorical variable. Use the function as.numeric(...) to convert a variable to a numeric variable.
Categorize serum bilirubin as follows: low_val: [0.3, 2), med_val: [2, 4) and high_val: [4, 28].
Give this variable the name "serBilCat3". Print the first 6 rows of the data:
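# include.lowest = TRUE keeps the maximum value in the last interval;
# without it the largest observation would become NA when right = FALSE
pbc2.id$serBilCat3 <- cut(pbc2.id$serBilir,
                          c(min(pbc2.id$serBilir), 2, 4, max(pbc2.id$serBilir)),
                          right = FALSE, include.lowest = TRUE)
pbc2.id$serBilCat3 <- factor(pbc2.id$serBilCat3,
                             levels = c("[0.3,2)", "[2,4)", "[4,28]"),
                             labels = c("low_val", "med_val", "high_val"))
head(pbc2.id)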
Hint: Use the function factor(...) to create a categorical variable. Use the function cut(..., right = FALSE) to convert a continuous variable to a categorical variable. If needed, consult the documentation for the cut function using ?cut.
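The behaviour of cut(..., right = FALSE) at the interval boundaries is easy to get wrong, so the sketch below shows it on made-up values. The vector values and its break points are assumptions for illustration only.

```r
# Hypothetical toy values, including the minimum (1) and the maximum (9)
values <- c(1, 2, 3.5, 4, 9)
breaks <- c(min(values), 2, 4, max(values))

# right = FALSE makes every interval closed on the left: [a, b)
# Without include.lowest = TRUE the maximum falls outside the last interval
withoutMax <- cut(values, breaks = breaks, right = FALSE)

# include.lowest = TRUE also closes the last interval on the right
withMax <- cut(values, breaks = breaks, right = FALSE, include.lowest = TRUE,
               labels = c("low_val", "med_val", "high_val"))

is.na(withoutMax)      # only the maximum is missing
as.character(withMax)  # every value is assigned a category
```

Supplying labels directly to cut() is an alternative to relabelling with factor() afterwards; both give the same categories.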
Check whether the variable "hepatomegaly" contains missing values:
sum(is.na(pbc2.id$hepatomegaly))
# or
any(is.na(pbc2.id$hepatomegaly))
Hint: Use the function is.na(...) to investigate if a vector has missing values.
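A short sketch of what is.na(...) returns and how it combines with other functions may help here. The vector x is made up for illustration.

```r
# Hypothetical toy vector with two missing values
x <- c("Yes", NA, "No", "Yes", NA)

is.na(x)         # TRUE where a value is missing
sum(is.na(x))    # number of missing values (TRUE counts as 1)
any(is.na(x))    # TRUE if at least one value is missing
which(is.na(x))  # positions of the missing values
```

sum() answers "how many are missing", while any() only answers "is anything missing".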
Explore the data.
Obtain the mean and sd for the follow-up years in females:
mean(pbc2$years[pbc2$sex == "female"])
sd(pbc2$years[pbc2$sex == "female"])
Obtain the median and interquartile range for age, rounded to 2 decimals:
round(median(pbc2.id$age), digits = 2)
round(IQR(pbc2.id$age), digits = 2)
Obtain the mean and sd for the baseline serum bilirubin:
mean(pbc2.id$serBilir)
sd(pbc2.id$serBilir)
Obtain the percentage of drug and placebo:
percent(pbc2.id$drug)
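Percentages and per-group summaries can also be obtained with base R functions only. The sketch below is one possible way, using a made-up data frame (toy, with hypothetical columns group and value); it is not the tutorial's solution.

```r
# Hypothetical toy data; group and value are made-up names
toy <- data.frame(group = factor(c("a", "b", "a", "a")),
                  value = c(1, 4, 2, 3))

# table() counts each level; prop.table() converts counts to proportions
pct <- 100 * prop.table(table(toy$group))
pct

# tapply() applies a function to value separately within each group
groupMeans <- tapply(toy$value, toy$group, mean)
groupMeans
```

tapply() takes the variable to summarise first, then the grouping variable, then the function to apply.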
Some questions have more than one correct answer.
quiz(
  question("Which functions can we use to obtain the mean and standard deviation?",
    answer("mean(...) and sd(...)", correct = TRUE),
    answer("Mean(...) and SD(...)"),
    answer("mean_val(...) and Sd_val(...)"),
    answer("The two first options")
  ),
  question("Which is a useful function to explore subsets of data?",
    answer("library(...)"),
    answer("tapply(...)", correct = TRUE),
    answer("percent(...)", correct = TRUE),
    answer("combination of tapply(...) and functions such as mean(...)")
  )
)
In the next quiz you can test your knowledge about R's data types.
quiz(
  question("Which of the following are R's elementary data types? (Multiple answers are possible.)",
    answer("numeric", correct = TRUE),
    answer("vector"),
    answer("character", correct = TRUE),
    answer("logical", correct = TRUE)
  ),
  question("For which of the following objects do all elements have to be of the same type? (Multiple answers are possible.)",
    answer("vector", correct = TRUE),
    answer("list"),
    answer("matrix", correct = TRUE),
    answer("data.frame")
  )
)
Understanding how indexing works.
Select the first row of the pbc2.id data set:
pbc2.id[1, ]
Select the first column of the pbc2.id data set:
pbc2.id[, 1]
Select column "id" from the pbc2.id data set:
pbc2.id["id"] # OR pbc2.id[["id"]] OR pbc2.id[,"id"]
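The three forms above select the same column but do not return the same kind of object. The sketch below shows the difference on a made-up data frame (toy is a hypothetical name used only for illustration).

```r
# Hypothetical toy data frame
toy <- data.frame(id = 1:3, age = c(31, 32, 33))

a <- toy["id"]                  # single bracket with a name: a data frame
b <- toy[["id"]]                # double bracket: the column as a vector
c1 <- toy[, "id"]               # matrix-style indexing: a vector by default
d <- toy[, "id", drop = FALSE]  # drop = FALSE keeps the data frame

class(a)
class(b)
class(c1)
class(d)
```

Which form to prefer depends on whether the next step expects a data frame or a plain vector.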
Select only the patients that received the active treatment from the pbc2.id data set:
pbc2.id[pbc2.id$drug == "D-penicil", ]
Select the sex of the 10th patient:
pbc2.id$sex[10]
Select the baseline details of the 5th patient:
pbc2.id[5, ]
Select the serum cholesterol values for all males (use the long format data set):
pbc2$serChol[pbc2$sex == "male"]
Select only the baseline details for females:
pbc2.id[pbc2.id$sex == "female", ]
Select the age for patients that have serum bilirubin more than 2 (use the short format data set):
pbc2.id$age[pbc2.id$serBilir > 2]
Select the follow-up years for female patients that have serum bilirubin more than 1 at the end of the study:
pbc2.id$years[pbc2.id$serBilir > 1 & pbc2.id$sex == "female"]
Select patients that have no missing values in serum cholesterol at the end of the study:
pbc2.id[!is.na(pbc2.id$serChol), ]
Create a vector x that takes values from -20 to 10 with step 1. Select a) the elements of x that are larger than 0 and smaller than or equal to 8 and b) the elements of x that are larger than 5 or smaller than -5:
x <- -20:10
x[x > 0 & x <= 8]
x[x > 5 | x < -5]
Create a vector x that takes values from -20 to 10 with step 1. Select all the elements that are not zero:
x <- -20:10
x[x != 0]
Create a matrix with the name mat that takes values from 1 to 6 with the first column as c(1, 2, 3). Select a) the first and second rows and b) the first and second columns:
# Six values fill a 3 x 2 matrix column by column, so the first column is c(1, 2, 3)
mat <- matrix(1:6, nrow = 3, ncol = 2)
mat[1:2, ]
mat[ , 1:2]
Create a vector x that takes values from -20 to 10 with step 1. Create a matrix with the name mat that takes values from 1 to 6 with the first column as c(1, 2, 3). Create a list (with the name myList) that includes as elements the vector x and the matrix mat. Select a) the first element of the list and b) the first and second columns from the second element of the list:
x <- -20:10
# Six values fill a 3 x 2 matrix column by column
mat <- matrix(1:6, nrow = 3, ncol = 2)
myList <- list(x, mat)
myList[[1]]
myList[[2]][, 1:2]
Some questions have more than one correct answer.
quiz(
  question("We run the following code: \n v <- c(1, 5, 8) \n v[2] \n What is the value returned by the last statement?",
    answer("1"),
    answer("2"),
    answer("5", correct = TRUE),
    answer("8")
  ),
  question("We run the following code: \n L <- list(1, 2, 'A', 'B', FALSE) \n L[1] \n What is the value returned by the last statement?",
    answer("A numeric vector of length 1"),
    answer("A numeric vector of length 2"),
    answer("A list of length 1", correct = TRUE),
    answer("A list of length 2")
  ),
  question("ages is a vector of the ages in years of some individuals. How can we select all ages that are less than or equal to 40?",
    answer("ages[1:40]"),
    answer("ages[40]"),
    answer("ages[ages=<40]"),
    answer("ages[ages<=40]", correct = TRUE)
  ),
  question("We define a data.frame: \n mydata <- data.frame(ages=c(31, 32, 33), Sex=factor(c('M', 'V', 'M')) \n ) \n How can we select the 2nd row of mydata?",
    answer("mydata[2]"),
    answer("mydata[[2]]"),
    answer("mydata[2, ]", correct = TRUE),
    answer("mydata[ ,2]")
  ),
  question("We run the following code in R: \n A <- matrix(1:9, nrow = 3) \n B <- A[2,,drop=FALSE] \n Which of the following is correct?",
    answer("B is a vector of length 3"),
    answer("B is a 1 by 3 matrix", correct = TRUE),
    answer("B is a 3 by 1 matrix"),
    answer("B is a list of length 3")
  ),
  question("How can we select the first row and first column of the pbc2.id data set?",
    answer("pbc2.id[c(1,2), ]"),
    answer("pbc2.id[1, 1]", correct = TRUE),
    answer("pbc2.id[[c(1,2)]]"),
    answer("pbc2.id[[1]][1]", correct = TRUE)
  ),
  question("Which of the following lines of code will print the age of the patients that received the placebo?",
    answer("pbc2.id[pbc2.id$drug == \"placebo\", 5]", correct = TRUE),
    answer("pbc2.id$age[pbc2.id$drug == \"placebo\"]", correct = TRUE),
    answer("pbc2.id[pbc2.id$drug == \"placebo\", \"age\"]", correct = TRUE),
    answer("pbc2.id[[\"age\"]][pbc2.id$drug == \"placebo\"]", correct = TRUE)
  )
)