View source: R/05.predictTestSet.R
predictTestSet | R Documentation |
Classify putative pA sites into true and false bins.
predictTestSet(
  Ndata.NaiveBayes = NULL,
  Pdata.NaiveBayes = NULL,
  testSet.NaiveBayes,
  classifier = NULL,
  outputFile = "test-predNaiveBayes.tsv",
  assignmentCutoff = 0.5,
  return_sequences = FALSE
)
Ndata.NaiveBayes |
A data.frame, containing features for the negative
training data, which is built using the function
|
Pdata.NaiveBayes |
A data.frame, containing features for the positive
training data, which is built using the function
|
testSet.NaiveBayes |
An object of |
classifier |
An object of class PASclassifier. |
outputFile |
A character(1) vector, file name for outputting prediction results. The prediction output is written to the file, tab separated. |
assignmentCutoff |
A numeric(1) vector, specifying the cutoff for classifying a putative pA site into the true or false pA class. It should be a number between 0 and 1. For example, assignmentCutoff = 0.5 will assign a putative pA site with prob_true_pA > 0.5 to the True class (1), and any putative pA site with prob_true_pA <= 0.5 to the False class (0). |
return_sequences |
A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output |
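The assignmentCutoff rule described above can be illustrated with a few lines of base R. This is a minimal sketch of the thresholding step only, not the package's internal code; the probabilities and peak names are made up for illustration.

```r
# Hypothetical probabilities that each putative pA site is true
# (invented values, not produced by predictTestSet()).
prob_true_pA <- c(peak1 = 0.92, peak2 = 0.50, peak3 = 0.13)
assignmentCutoff <- 0.5

# Sites with prob_true_pA strictly above the cutoff go to the True class (1);
# all others, including ties at the cutoff, go to the False class (0).
pred_class <- as.integer(prob_true_pA > assignmentCutoff)
names(pred_class) <- names(prob_true_pA)
pred_class
```

Note that a site whose probability equals the cutoff exactly (peak2 here) falls into the False class, as stated in the description of assignmentCutoff.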
A data.frame with the columns described below. The upstream and downstream sequences used in assessing each putative pA site are included when return_sequences = TRUE.
peak_name |
the name of the putative pA site (originally from the 4th field in the bed file). |
prob_fake_pA |
the probability that the putative pA site is false |
prob_true_pA |
the probability that the putative pA site is true |
pred_class |
the predicted class of the putative pA site, based on the assignment cutoff. 0 = False/oligo(dT) internally primed, 1 = True |
upstream_seq |
the upstream sequence of the putative pA site used in the analysis |
downstream_seq |
the downstream sequence of the putative pA site used in the analysis. |
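As a rough illustration of how the returned columns might be used downstream, the sketch below builds a mock data.frame with the documented column names and keeps only the sites predicted to be true. The object name mock_out and all values are hypothetical; a real result would come from predictTestSet().

```r
# Mock result with the documented columns; every value here is invented.
mock_out <- data.frame(
  peak_name    = c("peakA", "peakB", "peakC"),
  prob_fake_pA = c(0.05, 0.70, 0.40),
  prob_true_pA = c(0.95, 0.30, 0.60),
  pred_class   = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the sites predicted to be true pA sites (pred_class == 1).
true_sites <- mock_out[mock_out$pred_class == 1, ]

# Count predictions per class (0 = False, 1 = True).
class_counts <- table(mock_out$pred_class)
```

The same subsetting would apply to a result that also carries upstream_seq and downstream_seq, since it selects rows and leaves the columns untouched.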
Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu
Sheppard S, Lawson ND, Zhu LJ. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naive Bayes classifier. Bioinformatics. 2013;29(20):2564-2571.
library(cleanUpdTSeq)
library(BSgenome.Drerio.UCSC.danRer7)

testFile <- system.file("extdata", "test.bed",
                        package = "cleanUpdTSeq")

## convert the test set to GRanges without upstream and downstream sequence
## information
peaks <- BED6WithSeq2GRangesSeq(file = testFile,
                                skip = 1L,
                                withSeq = TRUE)

## build the feature vector for the test set without sequence information
testSet.NaiveBayes <- buildFeatureVector(peaks,
                                         genome = Drerio,
                                         upstream = 40L,
                                         downstream = 30L,
                                         wordSize = 6L,
                                         alphabet = c("ACGT"),
                                         sampleType = "unknown",
                                         replaceNAdistance = 30,
                                         method = "NaiveBayes",
                                         fetchSeq = TRUE,
                                         return_sequences = TRUE)

data(data.NaiveBayes)

## sample the test data for code testing, DO NOT do this for real data
samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp[-1] - 1]

test_out <- predictTestSet(Ndata.NaiveBayes,
                           Pdata.NaiveBayes,
                           testSet.NaiveBayes,
                           outputFile = tempfile(),
                           assignmentCutoff = 0.5)