The unpakathonJan2016 package is primarily a datafreeze of the unpak database at the end of Jan 2016.
This vignette shows how to interact with these data, once the package is installed and loaded into the current R session
load the library
library(unpakathonJan2016)
There are several tables available, each with some sort of help description that you can access by typing ?
Here are the tables at the moment and a list of their current column names
#phenolong: long format for all the phenotypic data names(phenolong) #phenowide: wide (typical) format for phenotypic data in phenolong names(phenowide) #independent: a table of indepenent data, mostly gene expression and protein expression #indexed by both SALK_Line and Gene names(independent) ##tdna: tdna data we know what it is. indexed by accession names(tdna) #geneont The tair10 gene ontology indexed by gene names(geneont)
The experiment table is sort of "subexperiment-oriented". the database now provides a way to lump all the data associated with experiments 1a1, 1a2, 1a3, etc into a meta.experiment called "1". This provides a quick way to do subsetting of the huge data set:
unique(phenolong[,c("experiment","meta.experiment")])
So, to subset the data to exclude farms one could do a couple of easy lines of code. I'm using dplyr throughout on this document, btw.
library(dplyr) unique(phenolong$experiment) nofarm.long <- phenolong %>% filter(meta.experiment!='farm') unique(nofarm.long$experiment)
Or maybe you only want to look at the distribution of individual fruitnums for experiment1
exp1 <- phenolong %>% filter(meta.experiment=="1") %>% filter(variable=="fruitnum") library(ggplot2) #might as well use ggplot to start with ggplot(data=exp1, aes(value, fill=experiment)) + geom_histogram(binwidth=50)
Here's the same exercise but getting averages for every replicate within a growth chamber within an experiment
exp1mn <- phenolong %>% filter(meta.experiment=="1") %>% filter(variable=="fruitnum") %>% group_by(experiment,accession) %>% summarise(value=mean(value,na.rm=T)) ggplot(data=exp1mn, aes(value, fill=experiment)) + geom_histogram(binwidth=50)
I've done all of the above with the long format, but similar approaches could be used with the wide format in phenowide
For example, it is arguably easier to compare phenotypes when they are in the wide format
exp1 <- phenowide %>% filter(meta.experiment=="1") ggplot(data=exp1,aes(x=days.to.bolt,y=fruitnum))+geom_point()
might be nice to see whats with that point at 765 fruits
The information in the other tables can be compared to phenotypes through merging.
Say you wanted to look at how insertion location might influence phenotype. You could merge in the 'independent' table and look at the result
newdf <- left_join(phenolong,independent) %>% filter(variable=="fruitnum") ggplot(data=newdf, aes(x=InsertLocation, y=log(value+1)))+geom_boxplot()
or you could look at expression at flower stage 9
ggplot(data=newdf, aes(x=AverageExpressFlowerStage9, y=log(value+1)))+geom_point()
Everything so far has been performed on uncorrected phenotypes.
We can correct phenotypes based on the Treatment, Experiment, and Facility combinations
Here's an example correcting for phytometers
library(reshape) plotdf = phenolong %>% filter(variable %in% c("fruitnum","days.to.bolt")) %>% cast(fun.aggregate=mean) plotadj <- phytcorrect(phenolong,c("fruitnum","days.to.bolt"),c("plantID","treatment","facility","experiment"),lineid="accession") %>% cast(fun.aggregate=mean) plot(fruitnum~days.to.bolt,data=plotdf,main="unadjusted") plot(fruitnum~days.to.bolt,data=plotadj,main="adjusted")
And here correcting by all plant means
plotadj <- allcorrect(phenolong,c("fruitnum","days.to.bolt"),c("plantID","treatment","facility","experiment"),lineid="accession") %>% cast(fun.aggregate=mean) plot(fruitnum~days.to.bolt,data=plotdf,main="unadjusted") plot(fruitnum~days.to.bolt,data=plotadj,main="adjusted")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.