predictTestSet: predict authenticity of putative pA sites

View source: R/05.predictTestSet.R

Description

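Classifies putative pA sites as true pA sites or false ones (for example, sites arising from internal oligo(dT) priming) using a naive Bayes classifier.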
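As a hedged sketch, assuming predictTestSet follows the textbook naive Bayes model (features treated as conditionally independent given the class; the priors and per-feature likelihoods it actually uses are not documented on this page), the posterior reported as prob_true_pA for a site with feature vector x = (x_1, ..., x_n) would be:

```latex
P(\text{true} \mid \mathbf{x}) =
\frac{P(\text{true}) \prod_{i=1}^{n} P(x_i \mid \text{true})}
     {P(\text{true}) \prod_{i=1}^{n} P(x_i \mid \text{true})
      + P(\text{false}) \prod_{i=1}^{n} P(x_i \mid \text{false})},
\qquad
P(\text{false} \mid \mathbf{x}) = 1 - P(\text{true} \mid \mathbf{x})
```

Under this model, prob_fake_pA would be the complement of prob_true_pA, and assignmentCutoff is the threshold applied to P(true | x).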

Usage

predictTestSet(
  Ndata.NaiveBayes = NULL,
  Pdata.NaiveBayes = NULL,
  testSet.NaiveBayes,
  classifier = NULL,
  outputFile = "test-predNaiveBayes.tsv",
  assignmentCutoff = 0.5,
  return_sequences = FALSE
)

Arguments

Ndata.NaiveBayes

A data.frame containing features for the negative training data, built with the function buildFeatureVector. It is described further in data.NaiveBayes.

Pdata.NaiveBayes

A data.frame containing features for the positive training data, built with the function buildFeatureVector. It is described further in data.NaiveBayes.

testSet.NaiveBayes

An object of class featureVector containing features of the test data, built for naive Bayes analysis with the function buildFeatureVector.

classifier

An object of class PASclassifier.

outputFile

A character(1) vector giving the name of the file to which the prediction results are written as tab-separated values.

assignmentCutoff

A numeric(1) vector specifying the cutoff for classifying a putative pA site as a true or false pA site. It must be a number between 0 and 1. For example, assignmentCutoff = 0.5 assigns a putative pA site with prob_true_pA > 0.5 to the true class (1), and a putative pA site with prob_true_pA <= 0.5 to the false class (0).
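A minimal base-R sketch of the cutoff rule described above. The probabilities are hypothetical illustrative values, not output of predictTestSet(); the snippet only mirrors the documented rule that prob_true_pA strictly greater than the cutoff gives class 1.

```r
# Hypothetical posterior probabilities for three putative pA sites
# (illustrative values only, not produced by predictTestSet()).
prob_true_pA <- c(0.92, 0.50, 0.13)
assignmentCutoff <- 0.5

# Sites with prob_true_pA strictly greater than the cutoff get class 1;
# sites at or below the cutoff get class 0.
pred_class <- as.integer(prob_true_pA > assignmentCutoff)
print(pred_class)
## [1] 1 0 0
```

Note that a site whose probability equals the cutoff exactly (the second site here) falls into the false class.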

return_sequences

A logical(1) vector indicating whether the upstream and downstream sequences should be included in the output.

Value

A data.frame with the columns described below. The columns upstream_seq and downstream_seq, holding the sequences used to assess each putative pA site, are included when return_sequences = TRUE.

peak_name

the name of the putative pA site (taken from the 4th field of the BED file).

prob_fake_pA

the probability that the putative pA site is false.

prob_true_pA

the probability that the putative pA site is true.

pred_class

the predicted class of the putative pA site, based on the assignment cutoff: 0 = false (oligo(dT) internally primed), 1 = true.

upstream_seq

the upstream sequence of the putative pA site used in the analysis.

downstream_seq

the downstream sequence of the putative pA site used in the analysis.
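A common next step is to keep only the sites predicted to be true. The sketch below uses a hypothetical stand-in data.frame whose column names follow the Value section above; the values are made up and only illustrate the subsetting.

```r
# Hypothetical stand-in for the data.frame returned by predictTestSet();
# column names follow the documented output, the values are invented.
test_out <- data.frame(
  peak_name    = c("peak1", "peak2", "peak3"),
  prob_fake_pA = c(0.08, 0.70, 0.45),
  prob_true_pA = c(0.92, 0.30, 0.55),
  pred_class   = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the putative pA sites predicted to be true (class 1).
true_sites <- test_out[test_out$pred_class == 1, ]
print(true_sites$peak_name)
## [1] "peak1" "peak3"
```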

Author(s)

Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu

References

Sheppard S, Lawson ND, Zhu LJ. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naive Bayes classifier. Bioinformatics. 2013;29(20):2564-2571.

Examples

library(cleanUpdTSeq)
library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
                        package = "cleanUpdTSeq")
## read the test set from the BED file into a GRanges object
peaks <- BED6WithSeq2GRangesSeq(file = testFile,
                                skip = 1L, withSeq = TRUE)
## build the feature vector for the test set, fetching the upstream and
## downstream sequences from the genome
testSet.NaiveBayes <- buildFeatureVector(peaks,
                                         genome = Drerio,
                                         upstream = 40L,
                                         downstream = 30L,
                                         wordSize = 6L,
                                         alphabet = c("ACGT"),
                                         sampleType = "unknown",
                                         replaceNAdistance = 30,
                                         method = "NaiveBayes",
                                         fetchSeq = TRUE,
                                         return_sequences = TRUE)
data(data.NaiveBayes)
## subsample the feature columns to keep the example fast;
## DO NOT do this for real data
set.seed(1)
samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp[-1] - 1]

test_out <- predictTestSet(Ndata.NaiveBayes,
                           Pdata.NaiveBayes,
                           testSet.NaiveBayes,
                           outputFile = tempfile(),
                           assignmentCutoff = 0.5)
head(test_out)


jianhong/cleanUpdTSeq documentation built on Jan. 3, 2025, 10:31 p.m.