View source: R/05.predictTestSet.R
predictTestSet | R Documentation |
Classify putative pA sites into true and false bins.
predictTestSet(
Ndata.NaiveBayes = NULL,
Pdata.NaiveBayes = NULL,
testSet.NaiveBayes,
classifier = NULL,
outputFile = "test-predNaiveBayes.tsv",
assignmentCutoff = 0.5,
return_sequences = FALSE
)
Ndata.NaiveBayes |
A data.frame, containing features for the negative
training data, which is built using the function
|
Pdata.NaiveBayes |
A data.frame, containing features for the positive
training data, which is built using the function
|
testSet.NaiveBayes |
An object of |
classifier |
An object of class PASclassifier. |
outputFile |
A character(1) vector, file name for outputting prediction results. The prediction output is written to the file, tab separated. |
assignmentCutoff |
A numeric(1) vector, specifying the cutoff for classifying a putative pA site into the true or false pA class. It should be a number between 0 and 1. For example, assignmentCutoff = 0.5 will assign a putative pA site with prob_true_pA > 0.5 to the True class (1), and any putative pA site with prob_true_pA <= 0.5 to the False class (0). |
return_sequences |
A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output. |
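The cutoff rule described for assignmentCutoff can be sketched in base R as follows. This is a minimal illustration of the comparison only, not the package's internal code, and the probability values are made up for the example.

```r
## Hypothetical probabilities of being a true pA site (invented values)
prob_true_pA <- c(0.10, 0.50, 0.51, 0.99)

assignmentCutoff <- 0.5

## Strictly greater than the cutoff -> True (1); otherwise False (0).
## A site sitting exactly on the cutoff (0.50 here) is assigned to False.
pred_class <- as.integer(prob_true_pA > assignmentCutoff)
pred_class

## A stricter cutoff keeps only the most confident calls
strict_class <- as.integer(prob_true_pA > 0.9)
strict_class
```

Raising the cutoff trades sensitivity for specificity: fewer sites are called true, but each call is backed by a higher probability.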
A data.frame with the columns described below. The upstream and downstream sequences used in assessing each putative pA site can be included by setting return_sequences = TRUE.
peak_name |
the name of the putative pA site (originally from the 4th field in the bed file). |
prob_fake_pA |
the probability that the putative pA site is false. |
prob_true_pA |
the probability that the putative pA site is true. |
pred_class |
the predicted class of the putative pA site, based on the assignment cutoff. 0 = False/oligo(dT) internally primed, 1 = True. |
upstream_seq |
the upstream sequence of the putative pA site used in the analysis. |
downstream_seq |
the downstream sequence of the putative pA site used in the analysis. |
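To show how the returned columns might be used downstream, the sketch below builds a small mock data.frame with the documented column names and keeps only the sites assigned to the True class. The object name mock_out, the peak names, and every value are invented for illustration; they are not output from predictTestSet.

```r
## Mock result using the documented column names; all values are invented.
mock_out <- data.frame(
    peak_name    = c("peak_1", "peak_2", "peak_3"),
    prob_fake_pA = c(0.92, 0.35, 0.20),
    prob_true_pA = c(0.08, 0.65, 0.80),
    pred_class   = c(0L, 1L, 1L),
    stringsAsFactors = FALSE
)

## Keep only putative pA sites predicted to be true (pred_class == 1)
true_sites <- mock_out[mock_out$pred_class == 1L, ]
true_sites$peak_name

## Count how many sites fall into each predicted class
class_counts <- table(mock_out$pred_class)
class_counts
```

The same subsetting should apply to a real result as long as it carries the pred_class column described above.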
Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu
Sheppard S, Lawson ND, Zhu LJ. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naive Bayes classifier. Bioinformatics. 2013;29(20):2564-2571.
library(cleanUpdTSeq)
library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
package = "cleanUpdTSeq")
## convert the test set to GRanges without upstream and downstream sequence
## information
peaks <- BED6WithSeq2GRangesSeq(file = testFile,
skip = 1L, withSeq = TRUE)
## build the feature vector for the test set without sequence information
testSet.NaiveBayes <- buildFeatureVector(peaks,
                                         genome = Drerio,
                                         upstream = 40L,
                                         downstream = 30L,
                                         wordSize = 6L,
                                         alphabet = c("ACGT"),
                                         sampleType = "unknown",
                                         replaceNAdistance = 30,
                                         method = "NaiveBayes",
                                         fetchSeq = TRUE,
                                         return_sequences = TRUE)
data(data.NaiveBayes)
## sample the test data for code testing, DO NOT do this for real data
samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp[-1]-1]
test_out <- predictTestSet(Ndata.NaiveBayes,
Pdata.NaiveBayes,
testSet.NaiveBayes,
outputFile = tempfile(),
assignmentCutoff = 0.5)