knitr::opts_chunk$set(cache=TRUE,fig.width=6,fig.height=4)
Package provenance provides functions for visualising data typically used in sediment provenance analysis in the geosciences, as well as helper functions for data analysis. Plotting builds on the framework provided by the ggplot2 package. The output plots are ggplot objects, so they can be modified further with ggplot2's functions and saved easily. This document is intended for geoscientists who want step-by-step guidance on using provenance's plotting functions and who may not be experienced users of the R language.
The "workhorse functions" of the package are:
plotKDE(): plots kernel density estimates (KDEs) of geochronological data for single samples or suites of samples
plotMDS(): plots multidimensional scaling (MDS) maps for quick visual identification of trends and similar groups within a set of sample data
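To illustrate what an MDS map represents, here is a minimal base-R sketch. It assumes the Kolmogorov-Smirnov (KS) statistic as the dissimilarity between two age distributions and uses stats::cmdscale() for classical MDS. This is not necessarily what plotMDS() does internally, and the sample names and ages are hypothetical.

```r
# Hypothetical detrital age samples (Ma); not package data
toyages <- list(
  A = c(95, 110, 250, 260, 1850, 1900, 2500),
  B = c(100, 105, 255, 270, 1870, 1880, 2450),
  C = c(450, 480, 500, 820, 950, 1000, 1020),
  D = c(460, 470, 510, 800, 960, 990, 1050)
)

# Pairwise dissimilarity: KS statistic D (maximum vertical distance between ECDFs)
n <- length(toyages)
diss <- matrix(0, n, n, dimnames = list(names(toyages), names(toyages)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d <- suppressWarnings(ks.test(toyages[[i]], toyages[[j]])$statistic)
    diss[i, j] <- diss[j, i] <- d
  }
}

# Classical MDS into two dimensions: similar samples end up close together
conf <- cmdscale(as.dist(diss), k = 2)
conf
```

Here A and B share age populations, as do C and D, so each pair plots close together on the map while the two pairs plot far apart.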
Working with provenance data using the ggprovenance package in R typically involves three main steps:
1. loading the data
2. plotting the data
3. saving the plots
Of these, step 2 is usually an iterative "trial-and-error" process of gradual improvement to find the best graphical representation of the data, with a few additional calculations here and there. Thanks to ggplot2's functionality, step 3 is a trivial call to ggsave(). Step 1, loading the data, is usually the most involved and may require users to write their own scripts, as it cannot easily be standardised across the wide range of possible data file formats and the individual requirements of each user. This step can be further subdivided into:
1a. loading data files
1b. reformatting data
1c. adding information for visualisation
All basic steps are presented in working examples in section 2 of this document. Section 3 illustrates the effects of different settings for the parameters of the main plotting functions. Section 4 gives a basic generic script that should be easily adaptable to the user's needs. Section 5 contains additional tips & tricks on plotting and data import.
\pagebreak
First, we need to load the package:
library(ggprovenance)
For this document, we use example data files installed with ggprovenance, but the principles are the same for any other data file. Simple example data files are provided in the /extdata subfolder of the package installation folder; these can serve as examples of how to prepare data. Alternatively, the data loading process can be adapted to the existing data files, and the data restructured in R to obtain objects suitable for the plotting functions. The latter approach is preferable, as it leaves the original data files untouched, and new or altered data can be replotted simply by re-running the R script instead of preparing intermediate data files by hand. See also section Tips & Tricks for notes on how to import lists of individual data files and MS Excel files.
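To see which example files are installed, base R's system.file() and list.files() are enough. A minimal sketch; system.file() returns an empty string if the package is not installed, so the code checks for that before listing.

```r
# Locate the extdata folder of the installed package ("" if not found)
extdir <- system.file("extdata", package = "ggprovenance")

if (nzchar(extdir)) {
  # List the example data files shipped with the package
  print(list.files(extdir))
} else {
  message("ggprovenance is not installed, or has no extdata folder")
}
```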
In the /extdata folder, there is a simple example file of detrital zircon U-Pb ages. The individual measured ages are arranged column-wise, one column per sample, with the sample names in the first row.
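```r
# show the first six rows of the example file (read.xls() comes from package gdata)
tab <- gdata::read.xls(xls = system.file("extdata", "Tarim.xls", package = "ggprovenance"),
                       stringsAsFactors = FALSE)
knitr::kable(tab[1:6, ], format = "markdown")
```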
To load this data, you can use the included function read.xls.flat():
agedata<-read.xls.flat(system.file("extdata", "Tarim.xls", package="ggprovenance"))
Two small helper functions are provided with ggprovenance, read.xls.flat() and read.xls.tabbed(). They are wrappers around gdata's read.xls() function, adapted for two common cases: Excel files with all data in one worksheet, one column per sample, and files with one worksheet per sample, in which case the column holding the data of interest is assumed to have the same name in every worksheet. Please note that gdata needs a working Perl installation, which is usually present on *nix-based operating systems but requires extra installation steps under Windows; see Section 5.
Another common file format is comma-separated values (csv), which can be read with read.table() and its derivatives, such as read.csv(), from the utils package. utils is part of every base installation of R, so no further packages or dependencies need to be installed.
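A minimal sketch of reading such a csv file, assuming the same layout as the Excel example (one column per sample, sample names in the first row). Converting to a named list of numeric vectors is an assumption about a convenient structure, not a confirmed description of what read.xls.flat() returns; check str() of your own data object to compare. The file content below is hypothetical.

```r
# Hypothetical csv content: one column per sample, sample names in the first row.
# For a real file, use read.csv("path/to/ages.csv", stringsAsFactors = FALSE) instead.
csv_text <- "Tb04,Tb22
250,310
1850,
2500,"
ages_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Blank fields in shorter columns are read as NA; drop this padding to get
# a named list with one numeric vector of ages per sample
ages_list <- lapply(ages_df, function(x) x[!is.na(x)])
str(ages_list)
```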
Before plotting, we can have a look at some properties of the data:
names(agedata)
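A few more base-R calls give a quick overview of sample sizes and age ranges. The sketch uses a small hypothetical stand-in for agedata (replace toyages with agedata for the real data) and counts only non-missing values, so it works whether shorter samples are padded with NA or not.

```r
# Hypothetical stand-in for agedata; with the real data, use agedata directly
toyages <- list(Tb04 = c(250, 480, 1850, 2500), Tb22 = c(310, 920, 1860, NA))

names(toyages)                                          # sample names
n_ages <- sapply(toyages, function(x) sum(!is.na(x)))   # ages per sample, ignoring NA
n_ages
sapply(toyages, range, na.rm = TRUE)                    # youngest and oldest age per sample
```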
Additional information can be added to the data, e.g. for colouring purposes. For example, we might classify our samples by general source area. plotKDE() accepts a data.frame in parameter categories for this purpose. We prepare a data.frame with the desired properties in columns (here: area) and the sample names set as row.names (see section 3 for an easier way to achieve this).
```r
# one row per sample, sample names as row.names, default area "n/a"
cats <- data.frame(area = rep("n/a", length(agedata)), stringsAsFactors = FALSE)
row.names(cats) <- names(agedata)
# assign source areas by matching (parts of) the sample names
cats$area[grep("Tb04", row.names(cats))] <- "Tarim"
cats$area[grep("Tb21", row.names(cats))] <- "Taklamakan"
cats$area[grep("Tb22", row.names(cats))] <- "Taklamakan"
cats$area[grep("Tb35", row.names(cats))] <- "Kunlun"
cats$area[grep("Tb38", row.names(cats))] <- "Kunlun"
cats$area[grep("Tb50", row.names(cats))] <- "Tian Shan"
```
The area of each sample in agedata is now recorded in cats$area (we'll use this later):
knitr::kable(cats,format="markdown")
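The repetitive grep() assignments above can be condensed into a named lookup vector and a loop. A sketch; the sample names are hypothetical stand-ins for names(agedata), and, as with grep(), each pattern matches anywhere within a sample name.

```r
# Hypothetical sample names standing in for names(agedata)
smpl <- c("Tb04", "Tb21", "Tb22", "Tb35", "Tb38", "Tb50")

# Lookup table: name pattern -> source area
area_of <- c(Tb04 = "Tarim", Tb21 = "Taklamakan", Tb22 = "Taklamakan",
             Tb35 = "Kunlun", Tb38 = "Kunlun", Tb50 = "Tian Shan")

cats2 <- data.frame(area = rep("n/a", length(smpl)), row.names = smpl,
                    stringsAsFactors = FALSE)
for (pattern in names(area_of)) {
  # every sample whose name contains the pattern gets the corresponding area
  cats2$area[grepl(pattern, row.names(cats2))] <- area_of[[pattern]]
}
cats2
```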
To get a quick look at the loaded data, plotKDE() is a good starting point.
plotKDE(agedata)
With the cats variable we generated and some additional adjustments, we can make this plot clearer. See the examples and help files for details.
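```r
names(cats)       # column names available for aesthetic mappings
library(ggplot2)  # for aes(), in case it is not already attached
plotKDE(agedata, categories = cats, markers = "dash", stack = "close",
        limits = c(0, 3500), mapping = aes(fill = area))
```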
Since plots generated with ggprovenance are also ggplot objects, ggsave() from package ggplot2 can be used to save them in a wide range of file formats.
ggsave("~/test.pdf",width=10,height=8)
Adapt the output path, file format (indicated simply by the file extension) and size (in inches, ggsave()'s default unit) to your needs. You can also save a specific plot if you stored it in a variable earlier:
```r
p1 <- plotKDE(agedata)
#
# ... a lot of clever code here, generating other plots (p2, p3, ...)
#
ggsave(plot = p1, filename = "~/test.png", dpi = 300, width = 10, height = 8)
# save p2, p3, ...
```
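When several plots need saving, collecting them in a named list and looping over it avoids repeating the ggsave() call. A sketch under the assumption that the plots are ordinary ggplot objects (as plotKDE() results are); the two plots here are hypothetical placeholders built from R's built-in mtcars data so the block runs on its own.

```r
library(ggplot2)

# Hypothetical placeholder plots; in practice these would be plotKDE() results
plots <- list(
  p1 = ggplot(mtcars, aes(wt, mpg)) + geom_point(),
  p2 = ggplot(mtcars, aes(hp, mpg)) + geom_point()
)

outdir <- tempdir()  # replace with your own output folder
for (nm in names(plots)) {
  # one PDF per plot, named after its list element; width/height in inches
  ggsave(filename = file.path(outdir, paste0(nm, ".pdf")),
         plot = plots[[nm]], width = 10, height = 8)
}
```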
\pagebreak
In the following, the effects of the many parameters of the plotting functions are illustrated with example plots. The examples assume agedata has been loaded as described previously. Many of the parameters can be combined (see the examples at the end of this section), but not all possible combinations can be covered here.
plotKDE()
The full signature of plotKDE() is:
```r
plotKDE(ages, title, limits = c(0, max(unlist(agedata), na.rm = TRUE)),
        plotonly = names(ages), categories, mapping, breaks = NA,
        bandwidth = NA, splitat = NA, markers = c("none", "dash", "circle"),
        logx = FALSE, histogram = FALSE, binwidth = bandwidth, adaptive = TRUE,
        stack = c("equal", "close", "dense"),
        normalise = c("area", "height", "none"), lowcount = 80, ...)
```
See also the help files for details.
Disclaimer: as of the time of writing (`r Sys.Date()`), not all parameters are functional yet.
The simplest call plots KDEs for all elements of ages. Each KDE is assigned an individual colour, the default age range spans from 0 to the oldest age in the data, and the bandwidth is chosen automatically.
plotKDE(agedata)
The plotonly parameter selects specific data sets (samples) by name. The example below has the same effect as subsetting the data passed to the function, e.g. plotKDE(agedata[["Tb22"]]) or plotKDE(agedata$Tb22). Note that the legend is removed automatically, as it is redundant here.
plotKDE(agedata,plotonly=c("Tb22"))
Add histograms. The optional binwidth parameter sets the histogram bin width independently of bandwidth.
plotKDE(agedata,plotonly=c("Tb22"),histogram=TRUE,binwidth=50)
Add data markers. Possible values are "dash" for small tick marks and "circle" for semi-transparent circles.
plotKDE(agedata,plotonly=c("Tb22"),markers="dash")
Custom breaks. The breaks parameter overrides limits, i.e. if any breaks lie outside the limits, the limits are expanded accordingly.
plotKDE(agedata,plotonly=c("Tb22"),breaks=seq(200,2800,200))
Change the x (time) limits. If limits is omitted, the axis spans from 0 to the oldest age in the data.
plotKDE(agedata, limits=c(50,600))
Logarithmic x-scale.
plotKDE(agedata,logx=TRUE)
Split the plot at a given age; the two sub-ranges occupy equal space (useful to emphasise the younger ages).
#plotKDE(agedata,splitat=600)
Splitting can be combined with custom ranges for the two halves of the plot. A limits parameter of length 4 overrides splitat.
#plotKDE(agedata,limits=c(50,600,1600,2800))
By default, an "optimal bandwidth" is calculated for each data series, and the median of these optimal bandwidths is applied to all KDEs. The bandwidth can also be set manually.
plotKDE(agedata, bandwidth=15)
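The default rule described above (an optimal bandwidth per sample, then the median applied to every KDE) can be sketched in base R. This is not the package's code: stats::bw.nrd0() (Silverman's rule of thumb) stands in for whatever optimal-bandwidth estimator plotKDE() actually uses, and the ages are hypothetical.

```r
# Hypothetical age samples (Ma)
toyages <- list(
  A = c(95, 110, 250, 260, 1850, 1900, 2500),
  B = c(450, 480, 500, 820, 950, 1000, 1020, 1800)
)

# "Optimal" bandwidth per sample (Silverman's rule of thumb as a stand-in)
bws <- sapply(toyages, bw.nrd0)

# The median of the per-sample bandwidths is used for every KDE
common_bw <- median(bws)
kdes <- lapply(toyages, density, bw = common_bw, from = 0, to = 3000)
```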
To plot each KDE with its own optimal bandwidth, set bandwidth=-1. Note that this sometimes leads to unexpected results, e.g. severe oversmoothing in data sets with very few data points or a very regular distribution. However, it gives a good impression of the statistical significance of age peaks: if a peak does not show up unless a small bandwidth is set manually, there is probably not enough data to support this age population in the first place.
plotKDE(agedata,bandwidth=-1)
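The link between bandwidth and visible peaks can be checked numerically. A sketch using base R's density() with its default Gaussian kernel rather than plotKDE()'s internals; the sample and the helper count_peaks() are hypothetical, and maxima below 1 % of the highest density are ignored as numerical noise.

```r
# Hypothetical sample with two tight age clusters (Ma)
x <- c(seq(90, 110, by = 2), seq(480, 520, by = 4))

# Count local maxima of a KDE, ignoring maxima below 1 % of the highest value
count_peaks <- function(ages, bw) {
  y <- density(ages, bw = bw)$y
  is_max <- diff(sign(diff(y))) == -2        # slope changes from + to -
  sum(is_max & y[-c(1, length(y))] > 0.01 * max(y))
}

count_peaks(x, bw = 10)   # small bandwidth: both clusters resolved
count_peaks(x, bw = 500)  # large bandwidth: clusters merged into one peak
```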
The categories parameter allows the data to be classified based on the values in the supplied data.frame. categories should have as many rows as there are entries in ages, and its row.names should match names(ages). We can use the cats data.frame created in Basic workflow. In parameter mapping, we can then provide an aesthetic mapping for any variable within categories. plotKDE() understands the fill, colour, size and linetype aesthetics, which work much like aesthetics in ggplot2.
plotKDE(agedata,categories=cats, mapping=aes(fill=area))
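Because categories must line up with ages, a quick sanity check before plotting can save some confusion. A sketch with hypothetical stand-ins for agedata and cats; whether plotKDE() also requires the rows to be in the same order as the samples is not stated here, so the reordering step is only a precaution.

```r
# Hypothetical stand-ins for agedata and cats
toyages <- list(Tb04 = c(250, 1850), Tb22 = c(310, 920))
toycats <- data.frame(area = c("Taklamakan", "Tarim"),
                      row.names = c("Tb22", "Tb04"),
                      stringsAsFactors = FALSE)

# categories needs one row per sample, with row.names matching names(ages)
stopifnot(nrow(toycats) == length(toyages),
          setequal(row.names(toycats), names(toyages)))

# Precaution: put the rows in the same order as the samples
toycats <- toycats[names(toyages), , drop = FALSE]
toycats
```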
It is often easier to prepare the categories as a separate table or file and load them from there. While searching for the best visualisation, this table can easily be adapted and reloaded until the desired plot is produced:
```r
# load categories:
cats <- read.table(file = system.file("extdata", "categories.csv", package = "ggprovenance"),
                   header = TRUE, row.names = 1, sep = ",", stringsAsFactors = FALSE)
plotKDE(agedata, categories = cats, mapping = aes(fill = area, linetype = type),
        limits = c(0, 1200), stack = "close", bandwidth = -1)
# adapt categories.csv to your needs, re-run, repeat...
```
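To start such a file for your own samples, an existing categories data.frame can be written out once with write.csv() and then edited by hand. A sketch; the table contents (including the type values) are hypothetical, and tempfile() stands in for your own path. Reading it back uses the same read.table() settings as above.

```r
# Hypothetical categories table; the "type" values are made up for illustration
cats_template <- data.frame(area = c("Tarim", "Kunlun"),
                            type = c("sand", "river"),
                            row.names = c("Tb04", "Tb35"),
                            stringsAsFactors = FALSE)

# Write it once as a starting point (row names become the first column)
catfile <- tempfile(fileext = ".csv")
write.csv(cats_template, catfile)

# Read it back with the same settings as used for categories.csv
cats_back <- read.table(file = catfile, header = TRUE, row.names = 1, sep = ",",
                        stringsAsFactors = FALSE)
cats_back
```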
order
title
Nice combined colour plot
t.b.a.
Custom breaks on a log scale
t.b.a.
Publication-quality pure black & white plot
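```r
library(ggplot2)  # for theme() and the element_*() functions
# pass the aesthetic by name: the second positional argument of plotKDE() is title
plotKDE(agedata, mapping = aes(fill = "black")) +
  theme(panel.background = element_blank(), panel.grid = element_blank(),
        axis.ticks = element_line(colour = "black"),
        axis.text = element_text(size = rel(0.8), colour = "black"))
```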
Add annotation to plots
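```r
g <- plotKDE(agedata, bandwidth = -1, logx = TRUE, limits = c(50, 3200),
             plotonly = c("Tb38", "Tb35", "Tb50", "Tb22"), normalise = "height")
# shade three age intervals with semi-transparent rectangles (last hex pair = alpha)
g + annotate("rect", xmin = c(90, 260, 385), xmax = c(125, 320, 490),
             ymin = 0, ymax = 1.1,
             fill = c("#FF000022", "#00FF0022", "#0000FF22"), colour = "black")
```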
plotMDS()
plotMDS(mds, diss, col="", sym="", nearest=TRUE, labels=TRUE, symbols=TRUE, fcolour=NA, stretch=FALSE)
plotShepard()
plotShepard(mds, diss, xlab="dissimilarity", ylab="distance", title="")
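A Shepard plot compares the input dissimilarities with the distances between points on the MDS map; points close to a monotonic trend indicate that the map represents the data faithfully. A minimal base-R sketch with hypothetical data, using KS dissimilarities and cmdscale() as stand-ins; it does not reproduce plotShepard()'s internals or its expected inputs.

```r
# Hypothetical detrital age samples (Ma)
toyages <- list(
  A = c(95, 110, 250, 260, 1850, 1900, 2500),
  B = c(100, 105, 255, 270, 1870, 1880, 2450),
  C = c(450, 480, 500, 820, 950, 1000, 1020),
  D = c(460, 470, 510, 800, 960, 990, 1050)
)

# KS dissimilarity matrix
n <- length(toyages)
diss <- matrix(0, n, n, dimnames = list(names(toyages), names(toyages)))
for (i in seq_len(n - 1)) for (j in (i + 1):n) {
  diss[i, j] <- diss[j, i] <-
    suppressWarnings(ks.test(toyages[[i]], toyages[[j]])$statistic)
}

# 2-D classical MDS configuration
conf <- cmdscale(as.dist(diss), k = 2)

# Shepard data: one row per sample pair, in the same pair order for both columns
shep <- data.frame(dissimilarity = as.vector(as.dist(diss)),
                   distance = as.vector(dist(conf)))
plot(shep$dissimilarity, shep$distance, xlab = "dissimilarity", ylab = "distance")
```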
plotDendrogram()
plotTernary()