Home

/

GitHub

/

In adnaniazi/nanoporePractical: Interactive squiggle browser for Oxford Nanopore data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)

tailfindr

What is tailfindr?

tailfindr is a R package for estimating poly(A)-tail lengths in Oxford Nanopore reads.

Features of tailfindr

Works for both RNA and DNA reads. In the case of DNA reads, it estimates both poly(A)- and poly(T)-tail lengths.
Supports data that has been basecalled with Albacore or Guppy. It also support data that has been basecalled using the newer 'flipflop' model.
Can work on single or multi-fast5 file reads.

tailfindr has been developed at Valen Lab in Computational Biology Unit at the University of Bergen, Norway.

Installation

Step 1. Installing HDF5 library

tailfindr depends on the HDF5 library for reading Fast5 files. For OS X and Linux, the HDF5 library needs to be installed via one of the (shell) commands specified below:

| System | Command |:------------------------------------------|:---------------------------------| |OS X (using Homebrew) | brew install hdf5 |Debian-based systems (including Ubuntu)| sudo apt-get install libhdf5-dev |Systems supporting yum and RPMs | sudo yum install hdf5-devel

HDF5 1.8.14 has been pre-compiled for Windows and is available here — thus no manual installation is required.

Step 2. Installing devtools

Currently, tailfindr is not listed on CRAN/Bioconductor, so you need to install it using devtools. To install devtools use the following command:

install.packages("devtools")

Step 3. Installing tailfindr

If you want to install tailfindr without building the vignette, then run the command below: ````r devtools::install_github("adnaniazi/tailfindr")

If you also want to build the vignette while installing tailfindr, then run the command below:
```r
remotes::install_github('adnaniazi/tailfindr', build = TRUE, build_opts = c("--no-resave-data", "--no-manual"), force = TRUE)

Now you are ready to use tailfindr.

Usage

1. Minimal working example

find_tails() is the main function that you can use to find tail lengths in both RNA and DNA reads. It saves a CSV file containing all the tail-length data. Furthermore, it also returns the same data as a tibble.

Give below is a minimal use case in which we will run tailfindr on example RNA reads present in the tailfindr package.

library(tailfindr)
df <- find_tails(fast5_dir = system.file('extdata', 'rna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'rna_tails.csv',
                 num_cores = 2)

In the above example, tailfindr returns a tibble containing the tail data which is then stored in the variable df. tailfindr also savs this dataframe as a csv file (rna_tails.csv) in the user-specified save_dir, which in this case is set to ~/Downloads. A logfile is also saved in the save_dir. The parameter num_cores can be increased depending on the number of physical cores at your disposal.

2. Plotting the tail

Additionally, tailfindr allows you to generate plots that show the tail location in the raw squiggle. You can save these plots as interactive .html files by using 'rbokeh' as the plotting_library. You can zoom in on the tail region in the squiggle and see the exact location of the tail.

Give below is a minimal use case in which we will run tailfindr on example cDNA reads present in the tailfindr package, and also save the plots:

df <- find_tails(fast5_dir = fast5_dir <- system.file('extdata', 'cdna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 2,
                 save_plots = TRUE,
                 plotting_library = 'rbokeh')

Poly(T) read squiggle plot

However, note that generating plots can slow down the performace of tailfindr. We recommend that you generate these plots only for a small subset of your reads.

3. Plotting the tail and debug traces

tailfindr can plot additional information that it used while deriving the tail boundaries. Please read our preprint to learn how tailfindr works. To plot this information, set the plot_debug_traces parameter to TRUE.

df <- find_tails(fast5_dir = fast5_dir <- system.file('extdata', 'cdna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 2,
                 save_plots = TRUE,
                 plot_debug_traces = TRUE,
                 plotting_library = 'rbokeh')

Poly(A) read squiggle plot

There are more options available in the find_tails() function. Please see its documentation.

Description of the CSV/Dataframe columns

tailfindr returns tail data in a dataframe and also saves this information in a user-specified CSV file. The columns generated depend on the whether tailfindr was run on RNA or DNA data. Below is a description of columns for both thses scenarios:

When input data is RNA

| Column Names | Datatype | Description | |:---------------|:----------|:-----------------------------------------------------------------------------------------------------------| | read_id | character | Read ID as given in the Fast5 file | | tail_start | numeric | Sample index of start site of the tail in raw data | | tail_end | numeric | Sample index of end site of the tail in raw data | | samples_per_nt | numeric | Read rate in terms of samples per nucleotide | | tail_length | numeric | Tail length in nucleotides. It is the difference between tail_end and tail_start divided by samples_per_nt | | file_path | character | Absolute path of the Fast5 file |

When input data is DNA

Here are the columns that you will get from tailfindr if you have run it on DNA data:

| Column Names | Datatype | Description | |----------------|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | read_id | character | Read ID as given in the Fast5 file | | read_type | character factor | Whether a read is "polyA", "polyT", or "invalid". Invalid reads are those in which tailfindr wasn't able to find Nanopore primers with high confidence. | | tail_is_valid | logical | Whether a poly(A) tail is a full-length read or not. This is important because a poly(A) tail is at the end of the read, and premature termination of reads is prevelant in cDNA. | | tail_start | numeric | Sample index of start site of the tail in raw data | | tail_end | numeric | Sample index of end site of the tail in raw data | | samples_per_nt | numeric | Read rate in terms of samples per nucleotide | | tail_length | numeric | Tail length in nucleotides. It is the difference between tail_end and tail_start divided by samples_per_nt | | file_path | character | Absolute path of the Fast5 file |

The devil👹 in the details

tailfindr currently works on data in the /Analyses/Basecall_1D_000/BaseCalled_template/ path of the Fast5 file data hierarchy. It won't work on data present in, lets say, /Analyses/Basecall_1D_001/BaseCalled_template/ path or /Analyses/Basecall_1D_002/BaseCalled_template/ path; such paths are generated if you re-basecall already-basecalled data. To avoid this problem, use tailfindr on files that have been basecalled from the raw Fast5 files.
For DNA data, tailfindr decides whether a read is poly(A) or poly(T) based on finding Nanopore primers/adaptors. If you are using the flipflop model to basecall DNA data, please ensure that the nanopore adaptors are not trimmed off while basecalling. This can be done by turning off enabling_trimming option in the basecalling script. The script below shows you how we have basecalled our reads using the flipflop model

#!/bin/sh
INPUT=/raw/fast5/files/path/
OUTPUT=/output/folder/path/
guppy_basecaller \
    --config dna_r9.4.1_450bps_flipflop.cfg \
    --input $INPUT \
    --save_path $OUTPUT \
    --recursive \
    --fast5_out \
    --hp_correct 1 \
    --disable_pings 1 \
    --enable_trimming 0

Getting help

If you encounter a clear bug, please file a minimal reproducible example on github. For questions and other discussion, email me at adnan.niazi@uib.no.

License

And of course:

GPL-3: https://www.gnu.org/licenses/gpl-3.0.en.html

adnaniazi/nanoporePractical documentation built on May 14, 2019, 3:05 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

adnaniazi/nanoporePractical
Interactive squiggle browser for Oxford Nanopore data

In adnaniazi/nanoporePractical: Interactive squiggle browser for Oxford Nanopore data

tailfindr

What is tailfindr?

Features of tailfindr

Installation

Step 1. Installing HDF5 library

Step 2. Installing devtools

Step 3. Installing tailfindr

Usage

1. Minimal working example

2. Plotting the tail

3. Plotting the tail and debug traces

Description of the CSV/Dataframe columns

When input data is RNA

When input data is DNA

The devil👹 in the details

Getting help

License

R Package Documentation

Browse R Packages

We want your feedback!

adnaniazi/nanoporePractical Interactive squiggle browser for Oxford Nanopore data

In adnaniazi/nanoporePractical: Interactive squiggle browser for Oxford Nanopore data

tailfindr

What is tailfindr?

Features of tailfindr

Installation

Step 1. Installing HDF5 library

Step 2. Installing devtools

Step 3. Installing tailfindr

Usage

1. Minimal working example

2. Plotting the tail

3. Plotting the tail and debug traces

Description of the CSV/Dataframe columns

When input data is RNA

When input data is DNA

The devil👹 in the details

Getting help

License

R Package Documentation

Browse R Packages

We want your feedback!

adnaniazi/nanoporePractical
Interactive squiggle browser for Oxford Nanopore data