I wrote the functions in an R package. Source files of the package are in crispR_0.0.0.9000.tar.gz. Installing the source will automatically check for and install any missing dependencies.
    # install.packages("devtools")
    devtools::install_local("path_to_local_package")
The function is called find_proto and is the first function in the protospacers.R file.
    library(crispR)
    find_proto(d_seq = "TGATCTACTAGAGACTACTAACGGGGATACATAG", l = 2, PAM = "NGG")
...or using the DNA of the Dopamine Transporter (DAT, internal data):
    library(crispR)
    print(DAT)

    library(crispR)
    prot <- find_proto(d_seq = DAT, l = 20, PAM = "NGG")
    head(prot, 10)
I am not explicitly using any loop, but the function still iterates over the sequence and looks at each nucleotide, because it relies on grep-style pattern matching (stringr and regular expressions).
Time complexity: O(n), where n is the length of the sequence.
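To illustrate the kind of regex scan described above, here is a minimal base R sketch. It is not the package's actual code: the name `find_proto_sketch` and the returned columns are assumptions made for illustration, and it uses `gregexpr` with a Perl lookahead instead of stringr so that overlapping hits are reported.

```r
# Hypothetical sketch, not the crispR implementation.
# Finds every stretch of `l` nucleotides that is immediately followed by a PAM.
find_proto_sketch <- function(d_seq, l = 20, PAM = "NGG") {
  l <- as.integer(l)
  # "N" in the PAM stands for any nucleotide
  pam_regex <- gsub("N", "[ACGT]", PAM, fixed = TRUE)
  # A zero-width lookahead lets the regex engine report overlapping hits
  pattern <- sprintf("(?=[ACGT]{%d}%s)", l, pam_regex)
  hits <- gregexpr(pattern, d_seq, perl = TRUE)[[1]]

  if (hits[1] == -1) {
    # No protospacer found: return an empty table with the same columns
    return(data.frame(start = integer(0), end = integer(0),
                      protospacer = character(0), pam = character(0),
                      stringsAsFactors = FALSE))
  }

  starts <- as.integer(hits)   # 1-indexed start of each protospacer
  ends <- starts + l - 1L      # closed interval: `end` is included
  data.frame(
    start = starts,
    end = ends,
    protospacer = substring(d_seq, starts, ends),
    pam = substring(d_seq, ends + 1L, ends + nchar(PAM)),
    stringsAsFactors = FALSE
  )
}

res <- find_proto_sketch("TGATCTACTAGAGACTACTAACGGGGATACATAG", l = 2, PAM = "NGG")
print(res)
```

The regex engine does the looping here: it tests the pattern at each position of the sequence once, which is where the linear behaviour for a fixed `l` comes from.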
The function is called find_FASTA and is the second function in the protospacers.R file.
I downloaded the Reference Genome Sequence GRCh38 from here.
A total of 54 protospacers were identified on strand (+). Please note the arguments "start", "end" and "l" are 1-indexed and intervals are fully closed.
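As a rough illustration of the 1-indexed, fully closed convention, the base R sketch below reads a toy single-record FASTA file, extracts a window, and maps a position inside the window back to sequence coordinates. The helper names `read_fasta_region` and `to_global` and the toy record are assumptions for this example, not part of the package. Reading the whole file into memory is only sensible for small inputs.

```r
# Hypothetical helpers, not the crispR implementation.
# Read a single-record FASTA file and return positions start..end (both included).
read_fasta_region <- function(path, start, end) {
  lines <- readLines(path)
  # Drop the ">" header line and join the sequence lines
  d_seq <- toupper(paste(lines[!startsWith(lines, ">")], collapse = ""))
  substr(d_seq, start, end)   # substr() is 1-indexed and inclusive
}

# Convert a 1-indexed position inside the window to sequence coordinates
to_global <- function(local_pos, start) start + local_pos - 1L

# Toy FASTA file written to a temporary location
fa <- tempfile(fileext = ".fa")
writeLines(c(">toy_record", "ACGTAC", "GTACGG"), fa)

window <- read_fasta_region(fa, start = 5L, end = 9L)
print(window)
print(to_global(2L, start = 5L))
```

With closed intervals the window length is `end - start + 1`, and the first position of the window corresponds to `start` itself.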
A tab-delimited file called solution.txt is in the current archive.
All the dependencies are listed in the file DESCRIPTION.
A quick and dirty version can probably be written in 1 hour or less. I polished the code and wrote the documentation too, so in total it took me a few hours... but I also spent quite some time thinking about the reverse-complementary strand!
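For the reverse-complementary strand, one possible approach (an assumption, not necessarily what the package does) is to build the reverse complement, scan it in the same way as the (+) strand, and convert the resulting positions back to (+) strand coordinates. The helper names `rev_comp` and `to_plus_coords` are hypothetical.

```r
# Hypothetical helpers, not the crispR implementation.
# Reverse complement of a DNA string; characters other than A/C/G/T (e.g. N) are kept.
rev_comp <- function(d_seq) {
  complemented <- chartr("ACGT", "TGCA", toupper(d_seq))
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Map a closed, 1-indexed interval found on the reverse complement
# back to (+) strand coordinates; n is the sequence length.
to_plus_coords <- function(rc_start, rc_end, n) {
  c(start = n - rc_end + 1L, end = n - rc_start + 1L)
}

d_seq <- "CCGTTT"
rc <- rev_comp(d_seq)
print(rc)
print(to_plus_coords(2L, 3L, nchar(d_seq)))
```

Note that start and end swap roles in the conversion, because the reverse complement is read in the opposite direction.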