Maximilian Krause, Adnan M. Niazi, Kornel Labun, Yamila N. Torres Cleuren, Florian S. Müller, Eivind Valen
This repository is organized roughly like an R package, although it is not one per se. It provides the functions and raw data needed to reproduce and extend the analyses reported in the publication. By raw data, we mean the output of tools such as tailfindr and Nanopolish.
This project is set up with a drake workflow, ensuring reproducibility. Intermediate targets/objects will be stored in a hidden .drake directory.
The R library of this project is managed by packrat. This makes sure that the exact same package versions are used when recreating the project.
Please note that this project was built with R version 3.6.0 on macOS Mojave. The packrat packages from this project are not compatible with R versions prior to 3.6.0. In general, it should be possible to reproduce the analysis on any other operating system.
Before starting, please ensure that you have:
A working installation of git
R (version 3.6.0 or above)
A working installation of pandoc. You can install it using instructions here.
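As a quick sanity check of the prerequisites above, the following base-R sketch reports whether R is recent enough and whether git and pandoc can be found on the PATH. The helper name check_prerequisites() is hypothetical and not part of this repository. Note that a pandoc bundled with RStudio may not be on the PATH, so a FALSE for pandoc here is not conclusive.

```r
# check_prerequisites() is a hypothetical helper, not part of this repository.
# It returns a named logical vector: TRUE means the requirement is met.
check_prerequisites <- function(min_r = "3.6.0", tools = c("git", "pandoc")) {
  # Sys.which() returns "" for executables that are not on the PATH
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  c(R = getRversion() >= min_r, found)
}

status <- check_prerequisites()
print(status)
if (!all(status)) {
  message("Missing or outdated: ",
          paste(names(status)[!status], collapse = ", "))
}
```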
To clone the project, open a terminal in the directory of your choice and execute:
git clone https://github.com/adnaniazi/krauseNiazi2019Analyses.git
Then go into the krauseNiazi2019Analyses directory using:
cd krauseNiazi2019Analyses
Now start R in this location in the terminal:
R
Now, in the R console, type:
# restore all R packages with their specific version (won't run in R < 3.6.0)
packrat::restore()
Next execute:
drake::r_make() # recreates the analysis
This command runs a series of steps:
It downloads the outputs of tailfindr, Nanopolish, barcode assignment, and eGFP alignment for the DNA and RNA data (both ours and Workman et al.’s) as .csv files into the data folder. This step may take some time because these files are large. All the scripts that generated these csv files are in the scripts folder, and you can run them yourself if you want to work your way up from the Fast5 files. To save time, however, we have already generated their results, and these pre-computed results are what gets downloaded to the data directory. The data directory has a README file containing detailed information about each file and its columns.
Once all csv files are downloaded, they are consolidated into dataframes. The code that does this is located in the R directory. This step results in three dataframes, rna_kr_data, dna_kr_data, and rna_wo_data, corresponding to the RNA data of Krause/Niazi et al., the DNA data of Krause/Niazi et al., and the RNA data of Workman et al., respectively. If you wish, you can access these datasets manually using drake’s loadd command.
Using the rna_kr_data, dna_kr_data, and rna_wo_data datasets, three R Markdown files (krause_niazi_et_al_rna_analysis.Rmd, krause_niazi_et_al_dna_analysis.Rmd, and workman_et_al_rna_analysis.Rmd) located in the reports directory are knit. These R Markdown files contain the code for all the figures used in the manuscript. Their html outputs are generated in the reports directory. Go to the reports directory and open these html files to view the rendered reports.
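The three dataframes described above can be pulled into an interactive session roughly as follows. This is a minimal sketch that assumes drake::r_make() has already completed and that R was started in the project directory, so that the .drake cache is present; it does nothing except print a message otherwise.

```r
# Names of the consolidated dataframes, as given in this README
targets <- c("rna_kr_data", "dna_kr_data", "rna_wo_data")

if (requireNamespace("drake", quietly = TRUE) && dir.exists(".drake")) {
  # The list argument takes target names as a character vector
  drake::loadd(list = targets)
  for (name in targets) {
    cat(name, ":", nrow(get(name)), "rows,", ncol(get(name)), "columns\n")
  }
} else {
  message("No drake cache found here; run drake::r_make() first.")
}
```

For a single dataset, drake::readd(rna_kr_data) returns the object directly instead of assigning it in the workspace.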
If you want to extend the analysis, open the relevant R Markdown file, edit it, and re-knit it in RStudio. You will need to open the krauseNiazi2019Analyses directory as a project in RStudio. The knitting should work, provided steps 1 and 2 have been executed without any errors. Alternatively, you can run drake::r_make(), which will automatically rerun anything downstream of whatever you changed.
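Before rebuilding after an edit, it can help to see which targets drake considers out of date. The sketch below uses drake::r_outdated() for that and then calls drake::r_make() only if something is stale. The wrapper rebuild_if_outdated() is hypothetical, and the check for a _drake.R file assumes the project uses drake's default script name, which is not confirmed here.

```r
# rebuild_if_outdated() is a hypothetical wrapper, not part of this repository.
# It assumes the drake plan is defined in _drake.R (drake's default for r_make()).
rebuild_if_outdated <- function(project_dir = ".") {
  ready <- requireNamespace("drake", quietly = TRUE) &&
    file.exists(file.path(project_dir, "_drake.R"))
  if (!ready) {
    message("drake or _drake.R not found; nothing was run.")
    return(invisible(NULL))
  }
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)

  stale <- drake::r_outdated()  # names of targets that need rebuilding
  print(stale)
  if (length(stale) > 0) {
    drake::r_make()             # rebuilds only the outdated targets
  }
  invisible(stale)
}

rebuild_if_outdated()
```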
Contains helper functions for downloading the data and consolidating them.
Contains calls to helper functions in the R directory.
Contains all the data generated by tailfindr, Nanopolish, and other tools as csv files. These files are downloaded once drake::r_make() is run as mentioned above.
Contains scripts that generated the data in the data directory. These scripts are not run at any point in the analyses done here; they have been provided only for reference.
Contains R Markdown files and their knitted html versions.
Contains documentation of functions in the R directory.