getTargetSites: Function to extract target sites for each sequence

View source: R/getTargetSites.R

getTargetSites R Documentation

Description

Function to extract target sites for each sequence.

sequence must be a character string containing a nucleotide sequence. pam and altpams must be character values consisting of IUPAC codes. getTargetSites loops through pam and all altpams to search for matches in the sequence. If pam or altpams contain redundantly defined patterns (e.g. TTTN and TTTV), the sequence will be matched twice (once for each pattern).

A target site is defined as a 34mer consisting of 4 bp + PAM (4 bp) + 23 bp protospacer + 3 bp, as described in Supp. Fig. 1 of Kim et al., Nat Biotech 2018. The entire 34mer must fit within the input sequence.

Returns a data frame with the following columns:

mer34 contains the original sequence of the target site 34mer.

mer27 contains the sequence to be used as input for bowtie, so that the results can subsequently be intersected with ENCODE DNase-seq narrow peaks, as described in Kim et al., Nat Biotech 2018.

convertmer34 contains the 34mer in which the PAM of any alternative-PAM target site has been converted to 'TTTC' (hard-coded). This is because DeepCpf1 was only trained on 'TTTV' PAMs. This strategy of modifying the input for DeepCpf1 is described by Sanson et al., bioRxiv 2019.
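
The 34mer layout and the IUPAC matching described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers pam_to_regex, sketch_target_sites and convert_pam are hypothetical names introduced here, the sketch searches only the strand given, and it does not produce the mer27 column.

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Hypothetical helper: turn an IUPAC pattern such as "TTTV" into a regex.
pam_to_regex <- function(pam) {
  codes <- strsplit(toupper(pam), "")[[1]]
  stopifnot(all(codes %in% names(iupac)))
  paste(iupac[codes], collapse = "")
}

# Hypothetical helper: collect every 34mer whose PAM (positions 5 to 8)
# matches one of the given patterns.
sketch_target_sites <- function(sequence, pams) {
  sequence <- toupper(sequence)
  n <- nchar(sequence)
  if (n < 34) return(character(0))     # the whole 34mer must fit
  starts <- seq_len(n - 33)
  mers <- substring(sequence, starts, starts + 33)
  hits <- character(0)
  for (p in pams) {                    # redundant patterns yield repeated hits
    rx <- paste0("^.{4}", pam_to_regex(p))
    hits <- c(hits, mers[grepl(rx, mers)])
  }
  hits
}

# Hypothetical helper: replace the 4 bp PAM of a 34mer with 'TTTC'.
convert_pam <- function(mer34) {
  paste0(substr(mer34, 1, 4), "TTTC", substr(mer34, 9, 34))
}

# 4 bp + PAM (TTTA) + 23 bp protospacer + 3 bp = 34 bp
example_seq <- paste0("ACGT", "TTTA", strrep("C", 23), "GGG")
sketch_target_sites(example_seq, c("TTTN", "TTTV"))  # matched once per pattern
convert_pam(example_seq)
```

In this sketch convert_pam rewrites whatever 34mer it is given; following the description above, it would only be applied to sites found through altpams.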

Usage

getTargetSites(sequence, pam, altpams)

Value

A data frame of target sites. If no target sites are found, the data frame contains a row of NA.
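
Because an empty result is reported as an NA row rather than as a zero-row data frame, downstream code may want to check for it before using the sites. The sketch below assumes that every column of the placeholder row is NA; has_no_sites is a hypothetical helper, and the data frames are mock values built for illustration, not output from getTargetSites.

```r
# Hypothetical helper: TRUE when a result holds only NA values
# (or no rows at all), i.e. no target sites were found.
has_no_sites <- function(sites) {
  nrow(sites) == 0 || all(is.na(sites))
}

# Mock results for illustration only.
empty_result <- data.frame(mer34 = NA_character_)
found_result <- data.frame(
  mer34 = paste0("ACGT", "TTTA", strrep("C", 23), "GGG")
)

has_no_sites(empty_result)
has_no_sites(found_result)
```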


chris-hsiung/bears01 documentation built on April 9, 2024, 2:01 a.m.