amplicanPipeline: Wraps main package functionality into one function.

amplicanPipelineR Documentation

Wraps main package functionality into one function.

Description

amplicanPipeline is convenient wrapper around all functionality of the package with the most robust settings. It will generate all results in the result_folder and also knit prepared reports into 'reports' folder.

Usage

amplicanPipeline(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)

Arguments

config

(string) The path to your configuration file. For example: system.file("extdata", "config.txt", package = "amplican"). Configuration file can contain additional columns, but first 11 columns have to follow the example config specification.

fastq_folder

(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file.

results_folder

(string) Where do you want to store results? The package will create files in that folder so make sure you have writing permissions.

knit_reports

(boolean) whether function should "knit" all reports automatically for you (it is time consuming, be patient), when false reports will be prepared, but not knitted

write_alignments_format

(character vector) Whether amplicanPipeline should write alignments results to separate files. Alignments are also always saved as .rds object of AlignmentsExperimentSet class. Possible options are:

  • "fasta" outputs alignments in fasta format where header indicates experiment ID, read id and number of reads

  • "txt" simple format, read information followed by forward read and amplicon sequence followed by reverse read with its amplicon sequence eg.:

    ID: ID_1 Count: 7
    ACTGAAAAA--------
    ACTG-----ACTGACTG
    
    ------G-ACTG
    ACTGACTGACTG
    
  • "None" Don't write any alignments to files.

  • c("fasta", "txt") There are also possible combinations of above formats, pass a vector to get alignments in multiple formats.

average_quality

(numeric) The FASTQ file have a quality for each nucleotide, depending on sequencing technology there exist many formats. This package uses readFastq to parse the reads. If the average quality of the reads fall below value of average_quality then sequence is filtered. Default is 0.

min_quality

(numeric) Similar as in average_quality, but depicts the minimum quality for ALL nucleotides in given read. If one of nucleotides has quality BELLOW min_quality, then the sequence is filtered. Default is 20.

filter_n

(boolean) Whether to filter out reads containing N base.

batch_size

(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files.

use_parallel

(boolean) Set to TRUE, if you have registered multicore back-end.

scoring_matrix

(matrix) Default is 'NUC44'. Pass desired matrix using nucleotideSubstitutionMatrix.

gap_opening

(numeric) The opening gap score.

gap_extension

(numeric) The gap extension score.

fastqfiles

(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options:

  • 0 Use both FASTQ files.

  • 0.5 Use both FASTQ files, but only for one of the reads (forward or reverse) is required to have primer perfectly matched to sequence - eg. use when reverse reads are trimmed of primers, but forward reads have forward primer in the sequence.

  • 1 Use only the forward FASTQ file.

  • 2 Use only the reverse FASTQ file.

primer_mismatch

(numeric) Decide how many mismatches are allowed during primer matching of the reads, that groups reads by experiments. When primer_mismatch = 0 no mismatches are allowed, which can increase number of unasssigned read.

donor_mismatch

(numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning toward the donor template. This parameter is only used when donor template is specified. The higher the parameter the less strict will be algorithm accepting read as HDR. Set to 0 if only perfect alignments to the donor template marked as HDR, unadvised due to error rate of the sequencers.

donor_strict

(logical) Applies more strict algorithm for HDR detection. Only these reads that have all of the donor events will count as HDR. Tolerates 'donor_mismatch' level of noise, but no indels are allowed. Use this when your reads should span over the whole window of the donor events. Might be more time consuming.

PRIMER_DIMER

(numeric) Value specifying buffer for PRIMER DIMER detection. For a given read it will be recognized as PRIMER DIMER when alignment will introduce gap of size bigger than:
length of amplicon - (lengths of PRIMERS + PRIMER_DIMER value)

event_filter

(logical) Whether detection of offtarget reads, should be enabled.

cut_buffer

The number of bases by which extend expected cut sites (specified as UPPER case letters in the amplicon) in 5' and 3' directions.

promiscuous_consensus

(boolean) Whether rules of amplicanConsensus should be promiscuous. When promiscuous, we allow indels that have no confirmation on the other strand.

normalize

(character vector) If column 'Control' in config table has all FALSE/0 values then normalization is skipped. Otherwise, normalization is strict, which means events that are found in 'Control' TRUE group will be removed in 'Control' FALSE group. This parameter by default uses columns 'guideRNA' and 'Group' to impose additional restrictions on normalized events eg. only events created by the same 'guideRNA' in the same 'Group' will be normalized.

min_freq

(numeric) All events below this frequency are treated as sequencing errors and rejected. This parameter is used during normalization through amplicanNormalize.

continue

(boolean) Default TRUE, decides whether to continue failed ampliCan runs. In case of FALSE, all contents in 'results' folder will be removed.

Value

(invisible) results_folder path

See Also

Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()

Examples

# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
# output folder
results_folder <- tempdir()

#full analysis, not knitting files automatically
amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)


valenlab/amplican documentation built on Jan. 28, 2024, 5:10 a.m.