library(unpakathon)
The phenotypic data frames already contain a large amount of information about factors that might affect how Columbia responds to having a gene knocked out. Many of these describe differences in the environment: for example, 'institution', 'facility', 'treatment', and 'block'. In addition, genetic differences and similarities can be assessed by comparing the responses of different accessions to the same environment.
Here is an example of the differences that might occur among environments and different lines.
Each line in the plot connects the same accession across different growth-chamber/greenhouse runs.
# Mean fruit number per accession within each experiment/facility combination
plotdf <- phenolong %>%
  filter(variable %in% c("fruitnum")) %>%
  group_by(experiment, facility, variable, accession) %>%
  summarise(value = mean(value)) %>%
  mutate(salk = grepl("SALK", accession),
         env = paste(experiment, facility, sep = "-"))

# Attach the mean of each environment
# (sometimes dplyr needs reminding that it works with data.frames)
plotdf2 <- data.frame(
  left_join(plotdf,
            plotdf %>%
              group_by(env, variable) %>%
              summarise(envmean = mean(value)))
)

# Order environments along the x axis by their mean
plotdf2$env <- as.factor(plotdf2$env)
plotdf2$env <- reorder(plotdf2$env, plotdf2$envmean, mean, na.rm = TRUE)

ggplot(plotdf2, aes(x = env, y = value, color = salk, group = accession)) +
  geom_line() +
  scale_y_continuous(trans = "log1p") +
  scale_x_discrete() +
  ylab("fruitnum") +
  xlab("environment (experiment/growthchamber combinations)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = rel(0.75)))
Environment clearly plays a huge role in trait value. But look at all the crossing lines: accessions do not keep the same rank order from one environment to the next.
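One way to put a number on the crossing lines is a rank correlation of accession means between two environments. The sketch below is only an illustration: the accession names, environment labels, and values are made up and are not taken from unpakathon, and it uses base R rather than the dplyr pipeline above.

```r
# Toy data: hypothetical accessions, environments and values (not from unpakathon).
toy <- data.frame(
  accession = rep(c("CS70000", "SALK_0001", "SALK_0002", "SALK_0003"), times = 2),
  env       = rep(c("envA", "envB"), each = 4),
  value     = c(10, 12, 8, 15,    # mean fruit number in envA
                30, 22, 35, 25)   # mean fruit number in envB
)

# Accession-by-environment matrix of means.
m <- tapply(toy$value, list(toy$accession, toy$env), mean)

# Spearman rank correlation between the two environments.
# Values near 1: accessions keep their order (few crossing lines).
# Values near 0 or negative: the order changes a lot (many crossing lines).
rho <- cor(m[, "envA"], m[, "envB"], method = "spearman")
rho
```

With real data the same idea could be applied to each pair of environments in plotdf2, keeping in mind that not every accession is grown in every environment.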
There are a number of dataframes that comprise the majority of unpakathon. These are (not necessarily exhaustively):
phenolong, independent, geneont, allfeat, allmeth, SalkPos, SalkInsert, and tdna
Each of these dataframes can be inspected with head(). For example, for the dataframe 'independent':
head(independent)
So, to examine the effect of gene age on fruitnum when genes are knocked out, grouping by 'ConservedGroup', the code would look like this:
# Correct and scale fruit number within facility/experiment, then average per accession
plotdf <- phenolong %>%
  filter(variable %in% "fruitnum") %>%
  phytcorrect(classifier = c("facility", "experiment")) %>%
  scalePhenos(classifier = c("facility", "experiment")) %>%
  left_join(independent) %>%
  group_by(ConservedGroup, variable, accession) %>%
  summarise(value = mean(value)) %>%
  spread(variable, value)  # one column per trait, so 'fruitnum' becomes a column

ggplot(plotdf, aes(x = ConservedGroup, y = fruitnum)) +
  geom_boxplot()

table(plotdf$ConservedGroup)  # but small samples in the left hand boxes
Say you wanted to look at how insertion location might influence phenotype. You could merge in the 'independent' table and look at the result:
newdf <- left_join(phenolong, independent) %>%
  filter(variable == "fruitnum")

ggplot(data = newdf, aes(x = InsertLocation, y = log(value + 1))) +
  geom_boxplot()
Or you could plot fruit number against expression at flower stage 9:
ggplot(data=newdf, aes(x=AverageExpressFlowerStage9, y=log(value+1)))+geom_point()
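The joins above call left_join() without a `by` argument, in which case dplyr matches on every column name the two tables share. A quick check of which columns those are, and of which rows found no match, can prevent surprises. The sketch below uses small made-up tables and base R's merge() as a stand-in; the column names and values are hypothetical and may not match the real 'phenolong' and 'independent' tables.

```r
# Toy stand-ins for the two tables; names and values are hypothetical.
pheno <- data.frame(
  accession = c("SALK_0001", "SALK_0001", "SALK_0002", "SALK_0009"),
  variable  = "fruitnum",
  value     = c(12, 15, 8, 20)
)
indep <- data.frame(
  accession      = c("SALK_0001", "SALK_0002", "SALK_0003"),
  InsertLocation = c("exon", "intron", "promoter")
)

# Columns present in both tables: a join without `by` matches on these.
shared <- intersect(names(pheno), names(indep))

# Base-R equivalent of a left join on an explicit key.
joined <- merge(pheno, indep, by = "accession", all.x = TRUE)

# Accessions with no match keep NA in the joined columns.
unmatched <- as.character(unique(joined$accession[is.na(joined$InsertLocation)]))

shared
unmatched
```

Rows that come back with NA in the joined columns are dropped from plots with a warning, so it is worth knowing how many there are.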