library(unpakathon)

Classification information (and possible independent variables)

built right into the phenotypic dataframes

The phenotypic data frame(s) already have a large amount of information about things that might affect the response of Columbia to having a gene knocked out. A lot of these describe differences in the environment. For example 'institution', 'facility', 'treatment', and 'block' fall into this category. In addition genetic differences and similarities can be assessed by examining the response of different accessions to the same environment.

Here is an example of the differences that might occur among environments and different lines.

Lines connect the same line in different growth-chamber/greenhouse runs

plotdf <- phenolong%>%filter(variable %in% c("fruitnum"))%>%
  group_by(experiment, facility, variable, accession) %>%
  summarise(value=mean(value)) %>% 
  mutate(salk=grepl("SALK",accession), env=paste(experiment,facility,sep="-")) 

plotdf2 <- data.frame(left_join(plotdf,plotdf %>% group_by(env,variable) %>% summarise(envmean=mean(value))) ) #sometimes dplyr needs reminding that it works with data.frames

plotdf2$env = as.factor(plotdf2$env)
plotdf2$env = reorder(plotdf2$env,plotdf2$envmean,mean,na.rm=T)

ggplot(plotdf2,aes(x=env,y=value,color=salk,group=accession)) + geom_line() + 
  scale_y_continuous(trans="log1p") +
  scale_x_discrete() + 
  ylab("fruitnum")+xlab("environment (experiment/growthchamber combinations)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size=rel(0.75)))

Environment (but look at all the crossing lines!) sure plays a huge role in trait value...

Folding in other types of explanatory variables.

There are a number of dataframes that comprise the majority of unpakathon. These are (not necessarily exhaustively):

phenolong, independent, geneont, allfeat, allmeth, SalkPos, SalkInsert, and tdna

Each of these dataframes can be examined by (for example the dataframe 'independent'):

head(independent)

So if one wanted to examine the effects of age of gene on fruitnum resulting from knocking out genes, it would look like this:

plotdf <- phenolong %>%
  filter(variable %in% "fruitnum") %>%
  phytcorrect(classifier = c("facility","experiment")) %>%
  scalePhenos(classifier = c("facility","experiment")) %>% 
  left_join(independent) %>%
  group_by(ConservedGroup,variable,accession) %>%
  summarise(value=mean(value)) %>% spread()

ggplot(plotdf,aes(x=ConservedGroup,y=fruitnum)) + geom_boxplot()

table(plotdf$ConservedGroup) #but small samples in the left hand boxes

Say you wanted to look at how insertion location might influence phenotype. You could merge in the 'independent' table and look at the result

newdf <- left_join(phenolong,independent) %>% filter(variable=="fruitnum")
ggplot(data=newdf, aes(x=InsertLocation, y=log(value+1)))+geom_boxplot()

or you could look at expression at flower stage 9

ggplot(data=newdf, aes(x=AverageExpressFlowerStage9, y=log(value+1)))+geom_point()


stranda/unpakathon documentation built on Nov. 9, 2021, 7:48 a.m.