find_cuts: Find Restriction Site Positions in Genomic Sequence


Description

Finds the positions of a user-defined restriction site sequence in genomic sequences of interest. The function uses R's gregexpr function to find every match of a pattern (the restriction site sequence) in a target character vector (the genomic sequences) and returns the positions of those matches. Sequences with no matches are reported as missing values (NA).
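The gregexpr-based approach described above can be sketched as follows. This is an illustrative re-implementation, not the package source: the name find_cuts_sketch is hypothetical, and the use of fixed = TRUE (literal matching) and of an integer NA for sequences without a match are assumptions.

```r
# Hypothetical sketch for illustration; not the actual find_cuts source.
find_cuts_sketch <- function(genomic_seq, restriction_site_seq) {
  # gregexpr() returns one integer vector of match start positions
  # per element of the input character vector.
  hits <- gregexpr(restriction_site_seq, genomic_seq, fixed = TRUE)
  lapply(hits, function(h) {
    pos <- as.integer(h)  # strip the match.length and related attributes
    # gregexpr() reports "no match" as a single -1; convert it to NA.
    if (pos[1] == -1L) NA_integer_ else pos
  })
}

# Two toy sequences: the first contains the site twice, the second never.
seqs <- c("AAGAATTCAAGAATTC", "ACGTACGT")
cuts <- find_cuts_sketch(seqs, "GAATTC")
```

Note that gregexpr reports non-overlapping matches only, so a site that overlaps the previous match would not be counted in this sketch.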

Usage

find_cuts(genomic_seq, restriction_site_seq)

Arguments

genomic_seq

Character vector of genomic sequence strings, one per chromosome/scaffold. The argument is designed to be compatible with the output of the process_fasta function. (Passed to gregexpr as its text argument.)

restriction_site_seq

String containing the restriction site sequence of interest. (Passed to gregexpr as its pattern argument.) The sequence can be obtained from the renz dataset.

Value

Returns a list of vectors with one element per chromosome/scaffold. Each element is a vector of the restriction site positions found in that sequence, extracted from the first element of the gregexpr function output.

Example:

In chromosome 1, we have cutsites at positions 7, 19, 28 and 50. No cutsites were found in chromosome 2. Cutsites were found in positions 12, 30, 36 and 42 in chromosome 3.

[[1]]

[1] 7 19 28 50

[[2]]

[1] NA

[[3]]

[1] 12 30 36 42
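As a rough illustration of working with this return value, the sketch below rebuilds the example list by hand and summarizes it. The object names n_sites and gaps are hypothetical, and treating the difference between consecutive positions as the spacing between cut sites is an assumption made for the example, not something the package defines.

```r
# Example return value, typed in by hand from the output shown above.
cutPos <- list(c(7, 19, 28, 50), NA, c(12, 30, 36, 42))

# Number of cut sites per chromosome; an NA entry counts as zero sites.
n_sites <- vapply(cutPos, function(p) sum(!is.na(p)), integer(1))

# Spacing between consecutive cut sites; empty when no sites were found.
gaps <- lapply(cutPos, function(p) if (anyNA(p)) numeric(0) else diff(p))
```

Checking for NA before calling diff avoids propagating missing values into downstream summaries.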

Author(s)

Angel G. Rivera-Colon

See Also

process_fasta renz

Examples

# Create sequences object
myfasta <- system.file("extdata",
                       "test_geno.fa.gz",
                       package = "RADseqTools")
mySeqs <- process_fasta(myfasta, 10000)

# For this function

# Define restriction enzyme sequence
data(renz)
myEnz <- renz["sbfI"]

# Find position of cutsites in sequences
cutPos <- find_cuts(mySeqs, myEnz)

angelgr2/radseq_tools documentation built on May 15, 2019, 3:59 a.m.