View source: R/05.predictTestSet.R
| predictTestSet | R Documentation | 
Classify putative pA sites into true and false bins.
predictTestSet(
  Ndata.NaiveBayes = NULL,
  Pdata.NaiveBayes = NULL,
  testSet.NaiveBayes,
  classifier = NULL,
  outputFile = "test-predNaiveBayes.tsv",
  assignmentCutoff = 0.5,
  return_sequences = FALSE
)
| Ndata.NaiveBayes | A data.frame, containing features for the negative training data, which is built using the function | 
| Pdata.NaiveBayes | A data.frame, containing features for the positive training data, which is built using the function | 
| testSet.NaiveBayes | An object of  | 
| classifier | An object of class PASclassifier. | 
| outputFile | A character(1) vector, file name for outputting prediction results. The prediction output is written to the file, tab separated. | 
| assignmentCutoff | A numeric(1) vector, specifying the cutoff for classifying a putative pA site into the true or false pA class. It must be a number between 0 and 1. For example, assignmentCutoff = 0.5 assigns a putative pA site with prob_true_pA > 0.5 to the True class (1), and any putative pA site with prob_true_pA <= 0.5 to the False class (0). | 
| return_sequences | A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output. | 
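The cutoff rule described for assignmentCutoff can be sketched in base R. This is only an illustration of the comparison stated above, not the package's internal code; the data frame `pred` and its values are hypothetical stand-ins for probabilities that predictTestSet would normally compute.

```r
## Hypothetical posterior probabilities (made-up values for illustration)
pred <- data.frame(
  peak_name    = c("peak1", "peak2", "peak3"),
  prob_true_pA = c(0.92, 0.50, 0.13),
  stringsAsFactors = FALSE
)

assignmentCutoff <- 0.5

## A site is called True (1) only when prob_true_pA is strictly greater
## than the cutoff; a value exactly equal to the cutoff is called False (0)
pred$pred_class <- as.integer(pred$prob_true_pA > assignmentCutoff)

print(pred)
```

Note that the comparison is strict, so peak2, which sits exactly at the cutoff, falls into the False class.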
A data.frame with the columns described below. The upstream and downstream sequences used in assessing each putative pA site are included when return_sequences = TRUE.
| peak_name | the name of the putative pA site (originally from the 4th field in the bed file). | 
| prob_fake_pA | the probability that the putative pA site is false | 
| prob_true_pA | the probability that the putative pA site is true | 
| pred_class | the predicted class of the putative pA site, based on the assignment cutoff. 0 = False/oligo(dT) internally primed, 1 = True. | 
| upstream_seq | the upstream sequence of the putative pA site used in the analysis. | 
| downstream_seq | the downstream sequence of the putative pA site used in the analysis. | 
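As a minimal sketch of working with a result of this shape, the code below writes a small tab-separated file and reads it back to keep only the sites predicted as true. The table `toy` and its values are hypothetical, and the sketch assumes the file written to outputFile carries a header row with the column names listed above; check a real output file before relying on that.

```r
## Hypothetical prediction table with the documented columns (made-up values)
toy <- data.frame(
  peak_name    = c("peakA", "peakB", "peakC"),
  prob_fake_pA = c(0.10, 0.70, 0.40),
  prob_true_pA = c(0.90, 0.30, 0.60),
  pred_class   = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

## Stand-in for the tab-separated file named by outputFile
## (assumption: the real file has a header row)
out_file <- tempfile(fileext = ".tsv")
write.table(toy, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

## Read the predictions back and keep the sites classified as true pA sites
res <- read.delim(out_file, stringsAsFactors = FALSE)
true_sites <- res[res$pred_class == 1, ]

## Count the sites in each predicted class
class_counts <- table(res$pred_class)
print(true_sites)
print(class_counts)
```

The same subsetting applies directly to the data.frame returned by predictTestSet, without going through the file.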
Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu
Sheppard S, Lawson ND, Zhu LJ. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naive Bayes classifier. Bioinformatics. 2013;29(20):2564-2571.
library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
                        package = "cleanUpdTSeq")
## convert the test set to GRanges; withSeq = TRUE indicates that the
## upstream and downstream sequences are included in the BED file
peaks <- BED6WithSeq2GRangesSeq(file = testFile, 
                               skip = 1L, withSeq = TRUE)
## build the feature vector for the test set without sequence information
testSet.NaiveBayes <- buildFeatureVector(peaks,
                                        genome = Drerio, 
                                        upstream = 40L,
                                        downstream = 30L, 
                                        wordSize = 6L, 
                                        alphabet = c("ACGT"),
                                        sampleType = "unknown",
                                        replaceNAdistance = 30,
                                        method = "NaiveBayes", 
                                        fetchSeq = TRUE,
                                        return_sequences = TRUE)
data(data.NaiveBayes)
## sample the test data for code testing, DO NOT do this for real data
samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp[-1]-1]
    
test_out <- predictTestSet(Ndata.NaiveBayes, 
                           Pdata.NaiveBayes,
                           testSet.NaiveBayes,
                           outputFile = tempfile(), 
                           assignmentCutoff = 0.5)