library(plyr) library(xtable) library(knitr) library(ggplot2) library(reshape2) opts_chunk$set(echo=T) plot.flag = T
This is an R Markdown and knitr document. The document itself is a .rmd file, which you can consider the "input script". The "output" is a formatted HTML, Word, or pdf document. R Markdown is very easy to use from RStudio, which provides point and click compilation. See online resources for how to install Rstudio, the knitr package (required for processing code chunks), and how to knit a .rmd file. What we'll talk about in this document is why Rmarkdown/knitr is useful and how to use some of its features.
Think about the data analyses you've run and the problems you've encountered. We can classify those in at least a couple categories:
Markdown provides a way to write document with richer formatting than text (e.g. we can put in italic, bold, bulleted lists, hyperlinks, graphics). We can even include equations due to an HTML plug-in called MathJax, using LaTeX notation, e.g.:
$C_p = Dose \frac{F}{V} \frac{k_a}{k_a - k_{e}} \left(e^{-k_e t} - e^{-k_a t} \right)$
Now the real cool thing... we can include R code in our document by writing "chunks".
We can write code to do a computation, like simulate some data:
theta = c(V=10, ke=.5, ka=1) dose=100 omega = diag(.1,3) N.subj = 24 rand = matrix(rnorm(N.subj*3,0,1),ncol=3) eta=exp( rand %*% t(chol(omega))) phi = sweep(eta,MARGIN=2,theta,'*') colnames(phi)=names(theta) phi.df = data.frame(cbind(id=1:N.subj,phi)) time.df = data.frame(time = c(.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 24)) cp.df = merge(phi.df,time.df) cp.df = arrange(cp.df,id,time) cp.df = within(cp.df, { dv = dose*ka/(ka-ke)*(exp(-ke*time)-exp(-ka*time)) })
What did we just do? We made a dataframe of simulated subjects using a 1 compartment, extravascular model.
Let's look at the data:
ggplot(cp.df, aes(x=time,y=dv,group=id)) + geom_point() + geom_line() + theme_bw() + scale_y_log10()
How about a summary of the data? Cmax, AUC
# get parameters summary.df = ddply(cp.df,.(id),summarise, Cmax = max(dv), AUC = dose/(ke*V)[1]) Cmax.mean = mean(summary.df$Cmax) #save it for easy reference #stack and do summaries summary.df = melt(summary.df,id="id") summary.df = ddply(summary.df,.(variable),summarise, mean=mean(value), log.mean=exp(mean(log(value))), sd = sd(value) ) kable(summary.df, row.names=F, digits=2)
The mean Cmax is (computed live): r format(Cmax.mean,digits=4)
Here's the value pasted from when it was run from the command line:
# Copy Cmax from console # 48.48
Neat, we ran a simulation, made a plot and a summary table. All the results are right in front of us (in the HTML output)
So, with this small example maybe we can already see how knitr and markdown address the problems above:
There are some ways to use Markdown and knitr that make a lot of sense. Ask yourself these questions:
If you answered yes to any of these, Markdown and Knitr may be useful. Here are some common things you can use it for:
RStudio provides a help button with two options when the active tab is an rmd file. Have a look at the Markdown Quick Reference. There are some other good references online:
This is a suggestion only
HTML doesn't have real estate limits like documents do. So, if you want to plot 1000 subjects in a 10 x 100 grid, go nuts. Just make the output dimensions of the plot larger to control how big each panel is on screen. If you want panels about 1inch square, make it 10in by 100in.
There are a couple useful chunk options for plots:
Plots are saved in the fig.path directory. This can be changed by chunk or set for the entire document by calling opts_chunk() in any chunk (best done in preamble though). Plots will be named after the chunk they are created from. If you want to easily find the files consider naming chunks with plots in them.
You can still save plots to a file the same way you always have.
There a few ways to make a nice table with Rmarkdown. Tables come out in HTML documents as html tables (as opposed to images) and can easily be copied into powerpoint or word documents.
Tables can be 'hard coded' in the Markdown using dashes '-' and pipes '|'.
Table: Lord of the Rings Characters
Name | Species | Origin ------|---------|-------- Bilbo | Hobbit | Shire Gandalf | Maia | Valinor Thorin | Dwarf | Ered Luin Boromir | Human | Gondor
Generate a table from a data.frame very easily using the kable function in the knitr package. It's not very flexible, but usually does fine if formatting isn't the main goal. By default, kable will create a table in Markdown. Set results='asis' to have knitr process the generated Markdown into html. Set row.names=F to omit row names.
cyls.summary = ddply(mpg,.(cyl),summarise,N=length(year),mean.displacement=mean(displ),mean.hwy=mean(hwy)) kable(cyls.summary, row.names=F, digits=2)
You can also use xtable. Set the output format to HTML (not latex). HTML won't have the latex style formatting you may be used to. This method is less convenient because of the two step process required. But xtable offers more flexibility in formatting cells contents.
cyls.x = xtable(cyls.summary,digits=c(0,1,2,2,1)) print.xtable(cyls.x, include.rownames=F, type='html')
We are working on a qP stylesheet that makes the tables look more like the booktabs option in LaTeX.
We usually want to load libraries for our analyses. This invariably generates a wall of text telling us about the libraries, directories, missing bits, warnings, errors, etc... Sometimes we want to see this, like in an analysis notebook, so we can ensure nothing suspicious has happened. Other times we want to hide this bit of code entirely, maybe to show results to a client. Use the "include" flag.
If a chunk takes a long time to run, try caching it. Just put 'cache=T' in the chunk options. A subdirectory will be created with cache files. If the inputs or code of the chunk change, the chunk will be computed. Otherwise it will load from cache. Try not to isolate the time consuming bit and put it in its own chunk so that you won't be changing things too often. The best candidates are file loading and simulations.
Sometimes caching screws up. If code in a cached chunk is running wierdly, delete the cache directory.
Maybe you don't want caching, but just want to avoid running a chunk. Here's a scenario, with run times:
So, we can set the chunk option 'eval=F' for steps 1 and 2 (after running them once). That R code will be skipped and any variables modified or created there will not be present in the knitr session. Nonetheless, if we've written modular code steps 3 and 4 should still work fine.
You can set a global variable in the preamble (sim.flag=F) and use it to toggle the eval option, 'eval=sim.flag' Try it with plot.flag in this example. The code chunk is still echoed, but the plot doesn't show up in the output.
Maybe you want to show output to a client, but don't want to take a tour of 1000 lines of code. You can set a the default for the chunk option 'echo' in the preable. This example sets it to T. Set it to F and see what happens.
Develop your code just like you normally would. Run the lines (ctrl+enter) or the chunks (ctrl+alt+c) to put variables into the Rstudio workspace. If you get to a point where you want to see plots in larger windows than the viewer screen, knit the file.
Occasionally knit the file to make sure you don't have a problem with phantom code. Review the results in the popup viewer or in your browser (right click on the html output in the Files tab for a popup menu with the options)
For data management activities, try to write out results of important steps. Show the QC person that the step worked. Pick subject to use and follow that subject through the transformations.
For EDA and analyses, add comments in the markdown at decision points. Use inline R to capture important things like estimates, p-values, counts, whatever. What you do here can easily be copied to a presentation or report.
For QC, make a copy of the originator RMD or R file (convert to RMD by breaking it into chunks) and run it. Compare results with originator. Insert QC checks in the QC RMD file. If you find something, you have a record of how the originator code ran, and what you found. At the end of the QC you have an artifact that shows the QC due diligence. When QCing reports, if necessary, copy tables or figures as image files and import them into the RMD file to contrast with the QC calculations.
If knitr throws an error, it is usually due to one of these things:
R Markdown and Knitr provide a way to combine results of R based analyses into a readable, sharable document. The learning curve is pretty shallow and there's a lot that you can do with very little skill, like putting plots and tables together in an analysis workbook. Try it on your next project.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.