knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
library(brentlabRnaSeqTools)
The 2016 grant summary is stored in a variable, grant_df
which is available when you execute library(brentlabRnaSeqTools)
In general, the first step to creating an experimental set is to ensure that all 'empty' values in the data are the same. This may be accomplished by setting any empty strings to NA like so (using dplyr):
metadata = metadata %>% # replace empty strings with NA mutate_if(is.character, list(~na_if(.,"")))
Once this is accomplished, you may easily replace the NAs with a specific entry. An example of this will be in the text of the createNinetyMinuteInductionSet()
function.
Filtering with dplyr is quite simple. Extensive documentation may be found here:
https://dplyr.tidyverse.org/
And you may also use the functions listed below as templates.
One handy trick: use the %in% operator to filter on multiple values in a given field. For example:
metadata %>% filter(temperature %in% c(37, 30))
would return any record with temperature either 37 or 30.
Another trick is to use grepl to search for a pattern in a field. For example:
combined_df %>% filter(grepl("PLAG", s2cDNAPreparer))
which may simiarly be achived with
combined_df %>% filter(str_detect(s2cDNAPreparer, "PLAG"))
The definition of the 90 minute induction set is:
createNinetyMinuteInductionSet
This includes all double deletions of genotypes which are included in the 2016 grant summary
createNinetyMinuteInductionWithDoubles
To create a set with all doubles in the database, simply filter for samples which have a double deletion. For example:
# see database_interaction for instructions on creating the combined_df combined_df %>% filter(perturbation1=='deletion' & perturbation2=='deletion') # at this point, you may choose to filter further. For example, you may use the same conditions as the 90minuteInduction # see createNinetyMinuteInductionSet for those conditions
This is the set of single perturbations which are not included in the 2016 grant summary
# get wildtypes from the createNinetyMinuteInduction set grant_induction_set = createNinetyMinuteInductionSet(combined_df, grant_df) induction_set_wt = grant_induction_set %>% filter(genotype1 == 'CNAG_00000') # note: this isn't as explicit as the createNinetyMinuteInductionSet definition. It likely should be, which would mean replacing empty strings with a definite value, and then filtering exlicitly in those columns. This prevents any strange behavior with empty strings and NA regulators_not_in_2016_grant = combined_df %>% filter(purpose == "fullRNASeq", !genotype1 %in% grant_df$GENOTYPE1, perturbation1 == 'deletion', !fastqFileName %in% grant_induction_set, medium %in% c("DMEM"), temperature %in% c(37), atmosphere %in% c("CO2"), timePoint == 90, !is.na(fastqFileName)) other_regulators_set = bind_rows(induction_set_wt, regulators_not_in_2016_grant)
Extracting overexpressions is similar
The definition of the Environmental Perturbation set is:
createEnvPertSet
To filter out all samples which fail auto audit or manual audit (but retaining those which are manual pass regardless of auto audit status), use the following function:
qual_assess_1_passing = qualityAssessmentFilter(metadata)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.