Description Usage Arguments Value Note Examples
This a function for aligning CRISPR barcode reads to library, or similar problems using edit distance to evaluate the distance between a read and a barcode for the error toleration.
1 2 3 4 | bcSeq_edit(sampleFile, libFile, outFile, misMatch = 2, tMat =
NULL, numThread = 4, count_only = TRUE, gap_left = 3,
ext_left = 1, gap_right = 3, ext_right = 1, pen_max =
6, userProb = NULL, detail_info = FALSE)
|
sampleFile |
(string) sample filename, needs to be a fastq file. |
libFile |
(string) library filename, needs to be a fasta or fastq file. |
outFile |
(string) output filename. |
misMatch |
(integer) the number of maximum mismatches or indels allowed in the alignment. |
tMat |
(two column dataframe) prior probability of a mismatch given a sequence. The first column is the prior sequence, the second column is the error rate. The default value for all prior sequences is 1/3. |
numThread |
(integer) the number of threads for parallel computing, default 4. |
count_only |
(bool) option for controlling function returns, default to be
|
gap_left |
(double) Penalty score for delete a base for the reads. |
ext_left |
(double) Penalty score for extending deletion of base for the reads. |
gap_right |
(double) Penalty score for delete a base for the barcodes. |
ext_right |
(double) Penalty score for extending deletion of base for the barcodes. |
pen_max |
(double) Max penalty allowed for a alignment. |
userProb |
(function) a function to compute the alignment probability
with 3 arguments |
detail_info |
(bool) option for controlling function returns, default to
be |
default |
No objects are returned to R, instead, a csv count table is created and written to files. The .csv file contains two columns, the first column is the sequences of the barcodes, and the second columns is the number of reads that aligned to the barcodes. |
count_only = FALSE |
If set to |
The user need to perform the removing of any adapter sequence before and after the barcode for the fastq file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 | #### Generate barcodes
lFName <- "./libFile.fasta"
bases <- c(rep('A', 4), rep('C',4), rep('G',4), rep('T',4))
numOfBars <- 20
Barcodes <- rep(NA, numOfBars*2)
for (i in 1:numOfBars){
Barcodes[2*i-1] <- paste0(">barcode_ID: ", i)
Barcodes[2*i] <- paste(sample(bases, length(bases)), collapse = '')
}
write(Barcodes, lFName)
#### Generate reads and phred score
rFName <- "./readFile.fastq"
numOfReads <- 800
Reads <- rep(NA, numOfReads*4)
for (i in 1:numOfReads){
Reads[4*i-3] <- paste0("@read_ID_",i)
Reads[4*i-2] <- Barcodes[2*sample(1:numOfBars,1,
replace=TRUE, prob=seq(1:numOfBars))]
Reads[4*i-1] <- "+"
Reads[4*i] <- paste(rawToChar(as.raw(
33+sample(20:30, length(bases),replace=TRUE))),
collapse='')
}
write(Reads, rFName)
#### perform alignment
outFile <- "./count_edit.csv"
#res <- bcSeq_edit(rFName, lFName, outFile, misMatch = 2,
# tMat = NULL, numThread = 4, count_only = TRUE, userProb = NULL,
# gap_left = 2, ext_left = 1, gap_right = 2, ext_right = 1,
# pen_max = 7)
#### The user defined probability model function is a modeling
#### of the alignment probability and the penalty encounted
#### due to the edit distance between a read and a candidata
#### barcode. The user define probability model function serves
#### as a combined evaluation of the alignment quality by
#### considering both the alignment probability and the edit
#### distance penalty.
#### the user defined function has three arguements
#### (1) val: a double vector indicates the alignment probabilities
#### between a read and its candidate barcodes based on
#### edit distance.
#### (2) pens: a double vector indicates the penalties of edit distance
#### between a read and its candidate barcodes for the alignment
#### based on edit distance.
#### (3) m: a double scalor indicating the weight for alignment
#### probabilities and edit distance penalty to determine
#### the final alignment quality that can be used as
#### as clasifier.
#### User can also constuct a more complexed model by only keeping the
#### function signature.
#### Example function in R
useP <-function(m, val, pens) { val * (1 - log(2) + log(1 + m / (m + pens) ) ) }
#### Example function in C++(can be ported to R using Rcpp packages)
#library(Rcpp)
#cppFunction(
#'NumericVector cpp_fun(int m, NumericVector val, NumericVector pens) {
# int n = val.size();
# NumericVector out(n);
# for(int i = 0; i < n; ++i) {
# out[i] = val[i] * (1 - log(2) +
# log(1 + m / (m + pens[i]) ) );
# }
# return out;
#}')
outFile <- "./count_edit_2.csv"
#res <- bcSeq_edit(rFName, lFName, outFile, misMatch = 2,
# tMat = NULL, numThread = 4, count_only = TRUE, userProb = useP,
# gap_left = 2, ext_left = 1, gap_right = 2, ext_right = 1,
# pen_max = 7)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.