offTargetAnalysisWithoutBSgenome: Design of target-specific guide RNAs for CRISPR-Cas9 system...

View source: R/offTargetAnalysisWithoutBSgenome.R

offTargetAnalysisWithoutBSgenomeR Documentation

Design of target-specific guide RNAs for CRISPR-Cas9 system in one function without BSgenome

Description

Design of target-specific guide RNAs (gRNAs) for CRISPR-Cas9 system without BSgenome by automatically calling findgRNAs, filtergRNAs, searchHits, buildFeatureVectorForScoring, getOfftargetScore, filterOfftarget, calculating gRNA cleavage efficiency and generate reports.

Usage

offTargetAnalysisWithoutBSgenome(
  inputFilePath,
  format = "fasta",
  header = FALSE,
  gRNAoutputName,
  findgRNAs = TRUE,
  exportAllgRNAs = c("all", "fasta", "genbank", "no"),
  findgRNAsWithREcutOnly = FALSE,
  REpatternFile = system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek"),
  minREpatternSize = 4,
  overlap.gRNA.positions = c(17, 18),
  findPairedgRNAOnly = FALSE,
  annotatePaired = TRUE,
  paired.orientation = c("PAMout", "PAMin"),
  enable.multicore = FALSE,
  n.cores.max = 6,
  min.gap = 0,
  max.gap = 20,
  gRNA.name.prefix = "",
  PAM.size = 3,
  gRNA.size = 20,
  PAM = "NGG",
  BSgenomeName,
  chromToSearch = "all",
  chromToExclude = c("chr17_ctg5_hap1", "chr4_ctg9_hap1", "chr6_apd_hap1",
    "chr6_cox_hap2", "chr6_dbb_hap3", "chr6_mann_hap4", "chr6_mcf_hap5", "chr6_qbl_hap6",
    "chr6_ssto_hap7"),
  max.mismatch = 3,
  PAM.pattern = "NNG$|NGN$",
  allowed.mismatch.PAM = 1,
  gRNA.pattern = "",
  baseEditing = FALSE,
  targetBase = "C",
  editingWindow = 4:8,
  editingWindow.offtargets = 4:8,
  primeEditing = FALSE,
  PBS.length = 13L,
  RT.template.length = 8:28,
  RT.template.pattern = "D$",
  corrected.seq,
  targeted.seq.length.change,
  bp.after.target.end = 15L,
  target.start,
  target.end,
  primeEditingPaired.output = "pairedgRNAsForPE.xls",
  min.score = 0,
  topN = 1000,
  topN.OfftargetTotalScore = 10,
  annotateExon = TRUE,
  txdb,
  orgAnn,
  ignore.strand = TRUE,
  outputDir,
  fetchSequence = TRUE,
  upstream = 200,
  downstream = 200,
  upstream.search = 0,
  downstream.search = 0,
  weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613,
    0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583),
  baseBeforegRNA = 4,
  baseAfterPAM = 3,
  featureWeightMatrixFile = system.file("extdata", "DoenchNBT2014.csv", package =
    "CRISPRseek"),
  useScore = TRUE,
  useEfficacyFromInputSeq = FALSE,
  outputUniqueREs = TRUE,
  foldgRNAs = FALSE,
 
    gRNA.backbone = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU",
  temperature = 37,
  overwrite = FALSE,
  scoring.method = c("Hsu-Zhang", "CFDscore"),
  subPAM.activity = hash(AA = 0, AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG =
    0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA
    = 0, TC = 0, TG = 0.038961039, TT = 0),
  subPAM.position = c(22, 23),
  PAM.location = "3prime",
  rule.set = c("Root_RuleSet1_2014", "Root_RuleSet2_2016", "CRISPRscan", "DeepCpf1"),
  chrom_acc,
  mismatch.activity.file = system.file("extdata",
    "NatureBiot2016SuppTable19DoenchRoot.csv", package = "CRISPRseek"),
  useBSgenome = FALSE,
  genomeSeqFile,
  predIndelFreq = FALSE,
  predictIndelFreq.onTargetOnly = TRUE,
  method.indelFreq = "Lindel",
  baseBeforegRNA.indelFreq = 13,
  baseAfterPAM.indelFreq = 24
)

Arguments

inputFilePath

Sequence input file path or a DNAStringSet object that contains sequences to be searched for potential gRNAs

format

Format of the input file, fasta, fastq and bed are supported, default fasta

header

Indicate whether the input file contains header, default FALSE, only applies to bed format

gRNAoutputName

Specify the name of the gRNA outupt file when inputFilePath is DNAStringSet object instead of file path

findgRNAs

Indicate whether to find gRNAs from the sequences in the input file or skip the step of finding gRNAs, default TRUE. Set it to FALSE if the input file contains user selected gRNAs plus PAM already.

exportAllgRNAs

Indicate whether to output all potential gRNAs to a file in fasta format, genbank format or both. Default to both.

findgRNAsWithREcutOnly

Indicate whether to find gRNAs overlap with restriction enzyme recognition pattern

REpatternFile

File path containing restriction enzyme cut patterns

minREpatternSize

Minimum restriction enzyme recognition pattern length required for the enzyme pattern to be searched for, default 4

overlap.gRNA.positions

The required overlap positions of gRNA and restriction enzyme cut site, default 17 and 18

findPairedgRNAOnly

Choose whether to only search for paired gRNAs in such an orientation that the first one is on minus strand called reverse gRNA and the second one is on plus strand called forward gRNA. TRUE or FALSE, default FALSE

annotatePaired

Indicate whether to output paired information, default TRUE

paired.orientation

PAMin orientation means the two adjacent PAMs on the sense and antisense strands face inwards towards each other like N21GG and CCN21 whereas PAMout orientation means they face away from each other like CCN21 and N21GG

enable.multicore

Indicate whether enable parallel processing, default FALSE. For super long sequences with lots of gRNAs, suggest set it to TRUE

n.cores.max

Indicating maximum number of cores to use in multi core mode, i.e., parallel processing, default 6. Please set it to 1 to disable multicore processing for small dataset.

min.gap

Minimum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 0

max.gap

Maximum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 20

gRNA.name.prefix

The prefix used when assign name to found gRNAs, default gRNA, short for guided RNA.

PAM.size

PAM length, default 3

gRNA.size

The size of the gRNA, default 20

PAM

PAM sequence after the gRNA, default NGG

BSgenomeName

BSgenome object. Please refer to available.genomes in BSgenome package. For example,

  • BSgenome.Hsapiens.UCSC.hg19 - for hg19,

  • BSgenome.Mmusculus.UCSC.mm10 - for mm10

  • BSgenome.Celegans.UCSC.ce6 - for ce6

  • BSgenome.Rnorvegicus.UCSC.rn5 - for rn5

  • BSgenome.Drerio.UCSC.danRer7 - for Zv9

  • BSgenome.Dmelanogaster.UCSC.dm3 - for dm3

chromToSearch

Specify the chromosome to search, default to all, meaning search all chromosomes. For example, chrX indicates searching for matching in chromosome X only

chromToExclude

Specify the chromosome not to search. If specified as "", meaning to search chromosomes specified by chromToSearch. By default, to exclude haplotype blocks from offtarget search in hg19, i.e., chromToExclude = c("chr17_ctg5_hap1","chr4_ctg9_hap1", "chr6_apd_hap1", "chr6_cox_hap2", "chr6_dbb_hap3", "chr6_mann_hap4", "chr6_mcf_hap5","chr6_qbl_hap6", "chr6_ssto_hap7")

max.mismatch

Maximum mismatch allowed in off target search, default 3. Warning: will be considerably slower if set > 3

PAM.pattern

Regular expression of protospacer-adjacent motif (PAM), default NNG$|NGN$ for spCas9. For cpf1, ^TTTN since it is a 5 prime PAM sequence

allowed.mismatch.PAM

Maximum number of mismatches allowed in the PAM sequence for offtarget search, default to 1 for NNG or NGN PAM pattern for offtarget finding.

gRNA.pattern

Regular expression or IUPAC Extended Genetic Alphabet to represent gRNA pattern, default is no restriction. To specify that the gRNA must start with GG for example, then set it to ^GG. Please see help(translatePattern) for a list of IUPAC Extended Genetic Alphabet.

baseEditing

Indicate whether to design gRNAs for base editing. Default to FALSE If TRUE, please set baseEditing = TRUE, targetBase and editingWidow accordingly.

targetBase

Applicable only when baseEditing is set to TRUE. It is used to indicate the target base for base editing systems, default to C for converting C to T in the CBE system. Please change it to A if you intend to use the ABE system.

editingWindow

Applicable only when baseEditing is set to TRUE. It is used to indicate the effective editing window, default to 4 to 8 which is for the original CBE system. Please change it accordingly if the system you use have a different editing window.

editingWindow.offtargets

Applicable only when baseEditing is set to TRUE. It is used to indicate the effective editing window to consider for the offtargets search only, default to 4 to 8 (1 means the most distal site from the 3' PAM, the most proximla site from the 5' PAM), which is for the original CBE system. Please change it accordingly if the system you use have a different editing window, or you would like to include offtargets with the target base in a larger editing window.

primeEditing

Indicate whether to design gRNAs for prime editing. Default to FALSE. If true, please set PBS.length, RT.template.length, RT.template.pattern, targeted.seq.length.change, bp.after.target.end, target.start, and target.end accordingly

PBS.length

Applicable only when primeEditing is set to TRUE. It is used to specify the number of bases to ouput for primer binding site.

RT.template.length

Applicable only when primeEditing is set to TRUE. It is used to specify the number of bases required for RT template, default to 8 to 18. Please increase the length if the edit is large insertion. Only gRNAs with calculated RT.template.length falling into the specified range will be in the output. It is calculated as the following. RT.template.length = target.start – cut.start + (target.end - target.start) + targeted.seq.length.change + bp.after.target.end

RT.template.pattern

Applicable only when primeEditing is set to TRUE. It is used to specify the RT template sequence pattern, default to not ending with C according to https://doi.org/10.1038/s41586-019-1711-4

corrected.seq

Applicable only when primeEditing is set to TRUE. It is used to specify the mutated or inserted sequences after successful editing.

targeted.seq.length.change

Applicable only when primeEditing is set to TRUE. It is used to specify the number of targeted sequence length change. Please set it to 0 for base changes, positive numbers for insersion, and negative number for deletion. For example, 10 means that the corrected sequence will have 10bp insertion, -10 means that the corrected sequence will have 10bp deletion, and 0 means only bases have been changed and the sequence length remains the same

bp.after.target.end

Applicable only when primeEditing is set to TRUE. It is used to specify the number of bases to add after the target change end site as part of RT template. Please refer to RT.template.length for how this parameter influences the RT.template.length calculation which is used as a filtering criteria in pregRNA selection.

target.start

Applicable only when primeEditing is set to TRUE. It is used to specify the start location in the input sequence to make changes, which will be used to obtain the RT template sequence. Please also refer to RT.template.length for how this parameter influences the RT.template.length calculation which is used as a filtering criteria in pregRNA selection.

target.end

Applicable only when primeEditing is set to TRUE. It is used to specify the end location in the input sequnence to make changes, which will be used to obtain the RT template sequence. Please also refer to RT.template.length for how this parameter influences the RT.template.length calculation which is used as a filtering criteria in pregRNA selection.

primeEditingPaired.output

Applicable only when primeEditing is set to TRUE. It is used to specify the file path to save pegRNA and the second gRNA with PBS, RT.template, gRNA sequences, default pairedgRNAsForPE.xls

min.score

minimum score of an off target to included in the final output, default 0

topN

top N off targets to be included in the final output, default 1000

topN.OfftargetTotalScore

top N off target used to calculate the total off target score, default 10

annotateExon

Choose whether or not to indicate whether the off target is inside an exon or not, default TRUE

txdb

TxDb object, for creating and using TxDb object, please refer to GenomicFeatures package. For a list of existing TxDb object, please search for annotation package starting with Txdb at http://www.bioconductor.org/packages/release/BiocViews.html#___AnnotationData, such as

  • TxDb.Rnorvegicus.UCSC.rn5.refGene - for rat

  • TxDb.Mmusculus.UCSC.mm10.knownGene - for mouse

  • TxDb.Hsapiens.UCSC.hg19.knownGene - for human

  • TxDb.Dmelanogaster.UCSC.dm3.ensGene - for Drosophila

  • TxDb.Celegans.UCSC.ce6.ensGene - for C.elegans

orgAnn

organism annotation mapping such as org.Hs.egSYMBOL in org.Hs.eg.db package for human

ignore.strand

default to TRUE when annotating to gene

outputDir

the directory where the off target analysis and reports will be written to

fetchSequence

Fetch flank sequence of off target or not, default TRUE

upstream

upstream offset from the off target start, default 200

downstream

downstream offset from the off target end, default 200

upstream.search

upstream offset from the bed input starts to search for gRNAs, default 0

downstream.search

downstream offset from the bed input ends to search for gRNAs, default 0

weights

Applicable only when scoring.method is set to Hsu-Zhang a numeric vector size of gRNA length, default c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583) which is used in Hsu et al., 2013 cited in the reference section

baseBeforegRNA

Number of bases before gRNA used for calculating gRNA efficiency, default 4 Please note, for PAM located on the 5 prime, need to specify the number of bases before the PAM sequence plus PAM size.

baseAfterPAM

Number of bases after PAM used for calculating gRNA efficiency, default 3 for spCas9 Please note, for PAM located on the 5 prime, need to include the length of the gRNA plus the extended sequence on the 3 prime

featureWeightMatrixFile

Feature weight matrix file used for calculating gRNA efficiency. By default DoenchNBT2014 weight matrix is used. To use alternative weight matrix file, please input a csv file with first column containing significant features and the second column containing the corresponding weights for the features. Please see Doench et al., 2014 for details.

useScore

Default TRUE, display in gray scale with the darkness indicating the gRNA efficacy. The taller bar shows the Cas9 cutting site. If set to False, efficacy will not show. Instead, gRNAs in plus strand will be colored red and gRNAs in negative strand will be colored green.

useEfficacyFromInputSeq

Default FALSE. If set to TRUE, summary file will contain gRNA efficacy calculated from input sequences instead of from off-target analysis. Set it to TRUE if the input sequence is from a different species than the one used for off-target analysis.

outputUniqueREs

Default TRUE. If set to TRUE, summary file will contain REs unique to the cleavage site within 100 or 200 bases surrounding the gRNA sequence.

foldgRNAs

Default FALSE. If set to TRUE, summary file will contain minimum free energy of the secondary structure of gRNA with gRNA backbone from GeneRfold package provided that GeneRfold package has been installed.

gRNA.backbone

gRNA backbone constant region sequence. Default to the sequence in Sp gRNA backbone.

temperature

temperature in celsius. Default to 37 celsius.

overwrite

overwrite the existing files in the output directory or not, default FALSE

scoring.method

Indicates which method to use for offtarget cleavage rate estimation, currently two methods are supported, Hsu-Zhang and CFDscore

subPAM.activity

Applicable only when scoring.method is set to CFDscore A hash to represent the cleavage rate for each alternative sub PAM sequence relative to preferred PAM sequence

subPAM.position

Applicable only when scoring.method is set to CFDscore The start and end positions of the sub PAM. Default to 22 and 23 for spCas9 with 20bp gRNA and NGG as preferred PAM. For Cpf1, it could be c(1,2).

PAM.location

PAM location relative to gRNA. For example, default to 3prime for spCas9 PAM. Please set to 5prime for cpf1 PAM since it's PAM is located on the 5 prime end

rule.set

Specify a rule set scoring system for calculating gRNA efficacy. Please note that Root_RuleSet2_2016 requires the following python packages with specified verion and python 2.7. 1. scikit-learn 0.16.1 2. pickle 3. pandas 4. numpy 5. scipy

chrom_acc

Optional binary variable indicating chromatin accessibility information with 1 indicating accessible and 0 not accessible.

mismatch.activity.file

Applicable only when scoring.method is set to CFDscore A comma separated (csv) file containing the cleavage rates for all possible types of single nucleotide mismatche at each position of the gRNA. By default, using the supplemental Table 19 from Doench et al., Nature Biotechnology 2016

useBSgenome

Specify whether BSgenome is avaialbe for searching for gRNA and offtargets, default to FALSE. If set it to TRUE, the results should be the same as when using offtargetAnalysis function.

genomeSeqFile

Specify the genome sequence file in fasta format. It is only applicable and required when useBSgenome is set to FALSE.

predIndelFreq

Default to FALSE. Set it to TRUE to output the predicted indels and their frequencies.

predictIndelFreq.onTargetOnly

Default to TRUE, indicating that indels and their frequencies will be predicted for ontargets only. Usually, researchers are only interested in predicting the editing outcome for the ontargets since any editing in the offtargets are unwanted. Set it to FALSE if you are interested in predicting indels and their frequencies for offtargets. It will take longer time to run if you set it to FALSE.

method.indelFreq

Currently only Lindel method has been implemented. Please let us know if you think additional methods should be made available. Lindel is compatible with both Python2.7 and Python3.5 or higher. Please type help(predictRelativeFreqIndels) to get more details.

baseBeforegRNA.indelFreq

Default to 13 for Lindel method.

baseAfterPAM.indelFreq

Default to 24 for Lindel method.

Value

Four tab delimited files are generated in the output directory:

OfftargetAnalysis.xls

- detailed information of off targets

Summary.xls

- summary of the gRNAs

REcutDetails.xls

- restriction enzyme cut sites of each gRNA

pairedgRNAs.xls

- potential paired gRNAs

Author(s)

Lihua Julie Zhu

References

Patrick D Hsu, David A Scott, Joshua A Weinstein, F Ann Ran, Silvana Konermann, Vineeta Agarwala, Yinqing Li, Eli J Fine, Xuebing Wu, Ophir Shalem, Thomas J Cradick, Luciano A Marraffini, Gang Bao & Feng Zhang (2013) DNA targeting specificity of rNA-guided Cas9 nucleases. Nature Biotechnology 31:827-834

Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ, Root DE. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol. 2014 Sep 3. doi: 10.1038 nbt.3026

Lihua Julie Zhu, Benjamin R. Holmes, Neil Aronin and Michael Brodsky. CRISPRseek: a Bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. Plos One Sept 23rd 2014

Moreno-Mateos, M., Vejnar, C., Beaudoin, J. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods 12, 982–988 (2015) doi:10.1038/nmeth.3543

Doench JG et al., Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology Jan 18th 2016

Anzalone et al., Search-and-replace genome editing without double-strand breaks or donor DNA. Nature October 2019 https://www.nature.com/articles/s41586-019-1711-4

Wei Chen, Aaron McKenna, Jacob Schreiber et al., Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair, Nucleic Acids Research, Volume 47, Issue 15, 05 September 2019, Pages 7989–8003, https://doi.org/10.1093/nar/gkz487

See Also

CRISPRseek

Examples


	library(CRISPRseek)
	inputFilePath <- system.file("extdata", "inputseqWithoutBSgenome.fa",
            package = "CRISPRseek")
########### genomeSeq.fasta contains the genomic sequence in fasta format for gRNA and offtarget search###############
        genomeSeqFile <- system.file("extdata", "genomeSeq.fasta",
             package = "CRISPRseek")
        library(hash)
        subPAM.activity <- hash(AA = 0, AC = 0, AG = 0.259259259,
           AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444,
           GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0,
           TG = 0.038961039, TT = 0)
        featureWeightMatrixFile <- system.file("extdata", "Morenos-Mateo.csv",
            package = "CRISPRseek")
	results <- offTargetAnalysisWithoutBSgenome(inputFilePath = inputFilePath, 
            format = "fasta",
            useBSgenome = FALSE,
            genomeSeqFile = genomeSeqFile,
            findgRNAs = TRUE,
            exportAllgRNAs = "fasta",
            fetchSequence = FALSE, 
            findgRNAsWithREcutOnly = FALSE, 
            findPairedgRNAOnly = FALSE, 
            annotatePaired = FALSE,
	    max.mismatch = 1, 
            annotateExon = FALSE,            
            scoring.method = "CFDscore",
            min.score = 0.01,
            PAM = "NGG",
            PAM.pattern <- "NNN",
            rule.set = "CRISPRscan",
            featureWeightMatrixFile = featureWeightMatrixFile,
            subPAM.activity  = subPAM.activity,
            outputDir = "gRNAoutput-CRISPRseek-CRISPRscan-CFDscore", overwrite = TRUE)

LihuaJulieZhu/CRISPRseek documentation built on Feb. 3, 2024, 2:44 p.m.