estimateOffTargets: Estimate primer off-targets using BLAST

estimateOffTargetsR Documentation

Estimate primer off-targets using BLAST

Description

Functions to use BLAST to align TAP-seq primers against a genome and chromosome reference to estimate potential off-target binding sites.

Usage

createBLASTDb(
  genome,
  annot,
  blastdb,
  standard_chromosomes = TRUE,
  tx_id = "transcript_id",
  tx_name = "transcript_name",
  gene_name = "gene_name",
  gene_id = "gene_id",
  title = "TAP-seq_GT_DB",
  verbose = FALSE,
  makeblastdb = getOption("TAPseq.makeblastdb")
)

blastPrimers(
  object,
  blastdb,
  max_mismatch = 0,
  min_aligned = 0.75,
  primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"),
  tmpdir = tempdir(),
  blastn = getOption("TAPseq.blastn")
)

## S4 method for signature 'TsIO'
blastPrimers(
  object,
  blastdb,
  max_mismatch = 0,
  min_aligned = 0.75,
  primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"),
  tmpdir = tempdir(),
  blastn = getOption("TAPseq.blastn")
)

## S4 method for signature 'TsIOList'
blastPrimers(
  object,
  blastdb,
  max_mismatch = 0,
  min_aligned = 0.75,
  primer_targets = c("transcript_id", "transcript_name", "gene_id", "gene_name"),
  tmpdir = tempdir(),
  blastn = getOption("TAPseq.blastn")
)

Arguments

genome

A BSgenome (or DNAStringSet) object containing the sequences of all chromosomes to obtain genome and transcript sequences.

annot

A GRanges object containing all exons of transcripts to be considered.

blastdb

TAP-seq BLAST database created with createBLASTDb.

standard_chromosomes

(logical) Specifies whether only standard chromosomes should be included in output genome sequences (e.g. chr1-22, chrX, chrY, chrM for homo sapiens).

tx_id, tx_name, gene_name, gene_id

(character) Column names in annot metadata containing transcript id, transcript name, gene name and gene id information.

title

Optional title for BLAST database.

verbose

(logical) If TRUE, additional information from makeblastdb is printed to the console. Default: FALSE.

makeblastdb

Path to the makeblastdb executable. Usually this is inferred when loading/attaching the package.

object

A TsIO or TsIOList object containing designed primers.

max_mismatch

Maximum number of mismatches allowed for off-target hits (default: 0).

min_aligned

Minimum portion of the primer sequence starting from the 3' end that must align for off-target hits (default: 0.75).

primer_targets

Specifies what should be used to identify primer targets for off-target identification. I.e. to what does the sequence_id in TsIO objects refer? Can be a subset of transcript_id, transcript_name, gene_id or gene_name. By default all 4 are checked. Set to NULL to disable any off-target identification. See Details for more information.

tmpdir

Directory needed to store temporary files.

blastn

Path (character) to the blastn executable. Usually this is inferred when loading/attaching the package.

Details

createBLASTDb creates a BLAST database containing genome and transcriptome sequences, which is required by blastPrimers. The created database contains both sequence files for BLAST and annotations to process the results.

Use blastPrimers to align designed TAP-seq primers against the created database to estimate off-target priming potential. Only hits where at least a specified portion of the sequence involving the 3' end of the primer aligns with not more than a certain number of mismatches are considered.

blastPrimers counts the number of genes in which a primer has 1) exonic hits or 2) intronic hits, or 3) the number of hits in intergenic regions of the genome. The exonic and intronic counts should be interptreted as: "In how many genes does a primer have exonic (or intronic) hits?".

If a BLAST hit falls in both intronic and exonic regions of a given gene (i.e. exonic for one transcript, intronic for another transcript), only the exonic hit is counted for that gene. If a primer has for instance 3 BLAST hits in one gene, 2 exonic and 1 intronic, then one exonic hit and one intronic hit is counted for that gene.

If sequence IDs of the designed primers (sequence_id) refer to the target gene/transcripts and can be found in the BLAST database annotations via primer_targets, then only off-target hits are counted. This is usually the case if input for primer design was produced from target gene annotations.

Value

For createBLASTDb a directory containing the BLAST database. For blastPrimers a TsIO or TsIOList object with the number of potential off-targets added to the TAP-seq primer metadata.

Functions

  • createBLASTDb(): Create a genome and transcriptome TAP-seq BLAST database

  • blastPrimers(TsIO): BLAST primers in a TsIO object

  • blastPrimers(TsIOList): BLAST primers in a TsIOList object

Examples

## Not run: 
library(BSgenome)

# human genome (hg38) BSgenome object
hg38 <- getBSgenome("BSgenome.Hsapiens.UCSC.hg38")

# get annotations for BLAST
annot_url <- paste0("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/",
                    "gencode.v32.annotation.gtf.gz")
annot <- import(annot_url, format = "gtf")
blast_exons <- annot[annot$type == "exon" & annot$gene_type == "protein_coding"]

# build BLAST database
blastdb <- file.path(tempdir(), "blastdb")
createBLASTDb(genome = hg38, annot = blast_exons, blastdb = blastdb)

# chr11 primers example data (already contains off-targets, but we can overwrite them)
data("chr11_primers")
chr11_primers <- chr11_primers[1:3]  # only use a small subset for this example

# run blast to identify potential off-targets
chr11_primers <- blastPrimers(chr11_primers, blastdb = blastdb)
tapseq_primers(chr11_primers)

# allow 1 mismatch between primer and off-target
chr11_primers <- blastPrimers(chr11_primers, blastdb = blastdb, max_mismatch = 1)
tapseq_primers(chr11_primers)

## End(Not run)

argschwind/TAPseq documentation built on Feb. 9, 2024, 8:20 p.m.