Take-home exercises (v1[2]) for the Bioinformatics Software Engineer position at Vertex Pharmaceuticals.
I wrote the functions directly in an R package. This facilitates the installation of all the dependencies.
# install.packages("devtools")
devtools::install_github("c1au6i0/crispR")
You can access the code of the function find_proto here.
library(crispR)
find_proto(d_seq = "TGATCTACTAGAGACTACTAACGGGGATACATAG", l = 2, PAM = "NGG")
...or using the DNA sequence of the Dopamine Transporter (DAT, internal data):
library(crispR)
print(DAT)

library(crispR)
find_proto(d_seq = DAT, l = 20, PAM = "NGG")
I am not explicitly using any loop, but my function still iterates over the sequence and looks at each nucleotide, because it relies on grep-style pattern matching (stringr and regular expressions).
Time complexity: O(n), where n is the length of the sequence.
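To illustrate the idea of loop-free, regex-based scanning, here is a minimal base R sketch. It is not the code of find_proto: the name find_proto_sketch, the returned columns, and the use of a zero-width lookahead (so that overlapping candidates are all reported) are my assumptions for illustration only. For a fixed protospacer length l, the regex engine does a bounded amount of work at each position, which is consistent with the O(n) estimate above.

```r
# Hypothetical helper, NOT the package's find_proto(): base R only.
find_proto_sketch <- function(d_seq, l = 20, PAM = "NGG") {
  l <- as.integer(l)
  seq_up <- toupper(d_seq)

  # Translate the PAM into a regex: "N" stands for any nucleotide.
  pam_regex <- gsub("N", "[ACGT]", toupper(PAM), fixed = TRUE)

  # Zero-width lookahead: match every position that is followed by
  # l nucleotides and then the PAM, so overlapping hits are kept.
  pattern <- sprintf("(?=[ACGT]{%d}%s)", l, pam_regex)
  hits <- gregexpr(pattern, seq_up, perl = TRUE)[[1]]

  # gregexpr() returns -1 when nothing matches.
  if (hits[1] == -1) {
    return(data.frame(start = integer(0), end = integer(0),
                      protospacer = character(0), pam = character(0),
                      stringsAsFactors = FALSE))
  }

  starts <- as.integer(hits)   # 1-indexed start of each protospacer
  ends <- starts + l - 1L      # closed interval: end is included

  data.frame(
    start = starts,
    end = ends,
    protospacer = substring(seq_up, starts, ends),
    pam = substring(seq_up, ends + 1L, ends + nchar(PAM)),
    stringsAsFactors = FALSE
  )
}

res <- find_proto_sketch("TGATCTACTAGAGACTACTAACGGGGATACATAG", l = 2, PAM = "NGG")
print(res)
```

The sketch only scans the given strand; handling the reverse complementary strand would need an extra step that is not shown here.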
You can access the code of the function find_FASTA here.
I downloaded the Reference Genome Sequence GRCh38 from here.
A total of 54 protospacers were identified on strand (+). Please note the arguments "start", "end" and "l" are 1-indexed and intervals are fully closed.
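As a small illustration of what "1-indexed and fully closed" means in practice, the sketch below extracts a region from a toy sequence. The helper names extract_interval and to_zero_based_half_open are hypothetical and are not part of crispR; the conversion assumes the common 0-based, half-open convention used by BED-style files.

```r
# Hypothetical helpers illustrating 1-indexed, fully closed intervals.
extract_interval <- function(sequence, start, end) {
  stopifnot(start >= 1, end >= start, end <= nchar(sequence))
  # substr() in R is itself 1-indexed and inclusive on both ends.
  substr(sequence, start, end)
}

# Convert a 1-indexed closed interval to a 0-based, half-open one
# (assumption: BED-style coordinates are wanted downstream).
to_zero_based_half_open <- function(start, end) {
  c(start = start - 1, end = end)
}

region <- extract_interval("ACGTACGTAC", start = 3, end = 6)
region_length <- nchar(region)  # equals end - start + 1
bed_coords <- to_zero_based_half_open(3, 6)
```

With closed intervals both "start" and "end" belong to the region, so its length is end - start + 1 rather than end - start.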
A tab-delimited file can be downloaded here.
All the dependencies are listed in the DESCRIPTION file in my GitHub account here.
A quick and dirty version can probably be written in 1 hour or less. I polished the code, wrote the documentation too, and in total it took me a few hours... but I also spent quite some time thinking about the reverse complementary strand!