find_tails | R Documentation |
This function estimates poly(A) tail length in RNA reads, and both poly(A) and poly(T) tail lengths in DNA reads. It can operate on reads basecalled with any version of Albacore or Guppy, using either the standard or the more recent 'flip-flop' model. The function writes a CSV file containing poly(A) tail information organised by read ID; it also returns the same information as a tibble for further processing by the end-user. Currently, the algorithm works only on ONT 1D reads.
find_tails(fast5_dir, save_dir, csv_filename = "tails.csv",
num_cores = 1, basecall_group = "Basecall_1D_000",
save_plots = FALSE, plot_debug_traces = FALSE,
plotting_library = "rbokeh", ...)
fast5_dir |
character string. Full path of the directory in which to search for the basecalled fast5 files. The fast5 files can be single- or multi-fast5 files. The directory is searched recursively. |
save_dir |
character string. Full path of the directory where the CSV
file containing the tail-length information should be stored. If save_plots
is set to |
csv_filename |
character string ["tails.csv"]. Filename of the CSV file in which to store the tail-length data. |
num_cores |
numeric [1]. Number of physical cores to use in processing the data. Always use one fewer than the number of cores at your disposal. |
basecall_group |
character string ["Basecall_1D_000"]. Name of the level in the Fast5 file hierarchy from which tailfindr should read the data. |
save_plots |
logical [FALSE]. If set to |
plot_debug_traces |
logical [FALSE]. This option works only
if |
plotting_library |
character string ["rbokeh"]. |
... |
list. A list of optional parameters. This is currently reserved for internal use only. |
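The num_cores guidance above (one fewer than the cores at your disposal) can be computed instead of hard-coded. The following is a minimal sketch using base R's parallel package; pick_num_cores is a hypothetical helper introduced here for illustration and is not part of tailfindr. Note that detectCores(logical = FALSE) may still report logical cores, or NA, on some platforms.

```r
# Hypothetical helper (not part of tailfindr): choose a value for num_cores
# that leaves one core free, as the num_cores argument description advises.
pick_num_cores <- function(available = parallel::detectCores(logical = FALSE)) {
  # detectCores() can return NA when the core count cannot be determined
  if (is.na(available)) {
    return(1L)
  }
  # Never go below one core, even on a single-core machine
  max(1L, as.integer(available) - 1L)
}

# Possible usage (assumes tailfindr is installed):
# df <- find_tails(fast5_dir = "/path/to/fast5",
#                  save_dir = "/path/to/output",
#                  num_cores = pick_num_cores())
```

With 11 cores available, this helper yields 10, matching the first example on this page.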
A tibble containing tail information organised by read ID is returned. Always save the returned tibble in a variable (see the examples below); otherwise the long tibble will be printed to the console, which may hang your R session.
A CSV file containing the same information is also saved on disk in the save_dir.
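Because the same information is written to disk, the results can be reloaded in a later session without re-running the analysis. The following is a minimal sketch using only base R; read_tails_csv is a hypothetical helper introduced here for illustration, not a tailfindr function, and it makes no assumption about the column names in the file.

```r
# Hypothetical helper (not part of tailfindr): reload the CSV that
# find_tails() wrote into save_dir.
read_tails_csv <- function(save_dir, csv_filename = "tails.csv") {
  # Expand "~" so that paths such as '~/Downloads' resolve correctly
  csv_path <- file.path(path.expand(save_dir), csv_filename)
  if (!file.exists(csv_path)) {
    stop("No CSV file found at: ", csv_path)
  }
  utils::read.csv(csv_path, stringsAsFactors = FALSE)
}

# Possible usage, after running the first example on this page:
# tails <- read_tails_csv("~/Downloads", "rna_tails.csv")
# head(tails)   # inspect a few rows instead of printing the whole table
```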
## Not run:
library(tailfindr)
# 1. Suppose you have 11 cores at your disposal; then you should run tailfindr
# on your data as follows:
df <- find_tails(fast5_dir = system.file('extdata', 'rna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'rna_tails.csv',
                 num_cores = 10)
# In the above example, we have used tailfindr on example RNA reads
# present in the tailfindr package. You should substitute the path of
# your data for the fast5_dir parameter.
# 2. If you want to save interactive HTML plots using rbokeh,
# then you should run tailfindr as follows:
df <- find_tails(fast5_dir = system.file('extdata', 'cdna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 10,
                 save_plots = TRUE,
                 plotting_library = 'rbokeh')
# 3. If you also want to plot debug traces, then you should run tailfindr as
# below:
df <- find_tails(fast5_dir = system.file('extdata', 'cdna', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 10,
                 save_plots = TRUE,
                 plot_debug_traces = TRUE,
                 plotting_library = 'rbokeh')
# N.B.: Making and saving plots is a computationally slow process.
# Only generate plots by running tailfindr on a small subset of your reads.
# 4. By default, tailfindr uses the Events/Move table in the Basecall_1D_000
# section of the FAST5 file. If you want tailfindr to pick the Events/Move table
# from some other section of the FAST5 file (say, Basecall_1D_001),
# then you should use tailfindr as below:
df <- find_tails(fast5_dir = system.file('extdata', 'rna_basecall_1D_001', package = 'tailfindr'),
                 save_dir = '~/Downloads',
                 csv_filename = 'rna_tails.csv',
                 num_cores = 2,
                 basecall_group = 'Basecall_1D_001',
                 save_plots = TRUE,
                 plot_debug_traces = TRUE,
                 plotting_library = 'rbokeh')
# N.B.: tailfindr cannot work if it cannot find the Events or Move table in
# your FAST5 files. MinKNOW Live Basecalling currently does not save the
# Events/Move table in the FAST5 file. If your reads have been live
# basecalled, then you should re-basecall them using Albacore or Guppy, and
# subsequently run tailfindr with the basecall_group parameter specified. Most
# probably, the second round of basecalling stores the Events/Move table
# in the 'Basecall_1D_001' section, so set this as the value of the
# basecall_group parameter. You can confirm this by viewing your
# re-basecalled reads in HDFView.
## End(Not run)