predictTestSet: predictTestSet


Description

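This function uses a naive Bayes classifier to predict, for each putative polyadenylation (pA) site in a test set, the probability that the site is true rather than false (an artifact of oligodT internal priming).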

Usage

predictTestSet(Ndata.NaiveBayes, Pdata.NaiveBayes, testSet.NaiveBayes,
    classifier = NULL, outputFile = "test-predNaiveBayes.tsv",
    assignmentCutoff = 0.5)

Arguments

Ndata.NaiveBayes

This is the negative training data, described further in data.NaiveBayes.

Pdata.NaiveBayes

This is the positive training data, described further in data.NaiveBayes.

testSet.NaiveBayes

This is the test data: a feature vector built for naive Bayes analysis with the function buildFeatureVector.

classifier

An object of class PASclassifier. Defaults to NULL.

outputFile

This is the name of the file the output will be written to.

assignmentCutoff

This is the cutoff used to decide whether a putative pA site is true or false. It can be any floating-point number between 0 and 1. For example, with assignmentCutoff = 0.5, a putative pA site with prob.1 > 0.5 is assigned to the True class (1), and a putative pA site with prob.1 <= 0.5 is assigned to the False class (0).
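The assignment rule above can be sketched in a few lines of base R. This is an illustration of the documented rule only, not the package's internal code; the helper name assign_class is hypothetical. Note that a probability exactly equal to the cutoff is assigned to the False class, because the comparison is strictly greater than.

```r
# Hypothetical helper illustrating the documented cutoff rule:
# prob.1 > cutoff -> 1 (True), prob.1 <= cutoff -> 0 (False)
assign_class <- function(prob1, cutoff = 0.5) {
  # the cutoff must be a single number between 0 and 1
  stopifnot(is.numeric(cutoff), length(cutoff) == 1, cutoff >= 0, cutoff <= 1)
  as.integer(prob1 > cutoff)
}

assign_class(c(0.2, 0.5, 0.51, 0.9))                # 0 0 1 1
assign_class(c(0.2, 0.5, 0.51, 0.9), cutoff = 0.8)  # 0 0 0 1
```

Raising the cutoff trades sensitivity for specificity: fewer sites are called True, but those that are called True have higher predicted probabilities.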

Value

The output is written to a tab-separated file with one row per putative pA site and the following fields: the peak name, the probability that the putative pA site is false (prob.0), the probability that it is true (prob.1), the predicted class (0/False or 1/True) given the assignment cutoff, and the upstream and downstream sequences used to assess the site.

PeakName

This is the name of the putative pA site, taken from the fourth (name) field of the input BED file.

prob False/oligodT internally primed

This is the probability that the putative pA site is false. Values range from 0 to 1; values close to 1 indicate that the site is likely False, that is, an artifact of oligodT internal priming.

prob True

This is the probability that the putative pA site is true. Values range from 0 to 1; values close to 1 indicate that the site is likely True.

pred.class

This is the predicted class of the putative pA site, based on the assignment cutoff: 0 = False/oligodT internally primed, 1 = True.

UpstreamSeq

This is the upstream sequence of the putative pA site used in the analysis.

DownstreamSeq

This is the downstream sequence of the putative pA site used in the analysis.
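For a two-class naive Bayes model, the false and true probabilities are normalized class posteriors, so they sum to 1. The sketch below shows the textbook calculation under that assumption: add the log prior to the summed per-feature log-likelihoods (the "naive" independence assumption), then normalize with the log-sum-exp trick. It is a generic illustration, not necessarily how cleanUpdTSeq computes its probabilities internally; the function nb_posterior and its arguments are hypothetical.

```r
# Hypothetical helper: class posteriors of a generic two-class naive Bayes model.
# logLik0, logLik1: per-feature log-likelihoods log P(x_j | class 0 or class 1)
# prior1: prior probability of class 1 (True)
nb_posterior <- function(logLik0, logLik1, prior1 = 0.5) {
  # independence assumption: the joint log-likelihood is the sum over features
  s0 <- log(1 - prior1) + sum(logLik0)
  s1 <- log(prior1) + sum(logLik1)
  # log-sum-exp: subtract the maximum before exponentiating to avoid underflow
  m <- max(s0, s1)
  p1 <- exp(s1 - m) / (exp(s0 - m) + exp(s1 - m))
  c(prob.0 = 1 - p1, prob.1 = p1)
}

# two toy features: joint likelihood 0.1 under class 0 and 0.3 under class 1
p <- nb_posterior(logLik0 = log(c(0.2, 0.5)), logLik1 = log(c(0.6, 0.5)))
p  # prob.0 = 0.25, prob.1 = 0.75
```

With equal priors, the posterior reduces to the ratio of the joint likelihoods, 0.3 / (0.1 + 0.3) = 0.75, which would be assigned to the True class at the default cutoff of 0.5.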

The function also invisibly returns a matrix containing all of the information described above.
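Because the result is returned invisibly, it is only available if you assign it, for example pred <- predictTestSet(...). The sketch below uses a small made-up matrix as a stand-in for that result to show how to keep only the sites predicted True, and how to apply a stricter cutoff without rerunning the prediction. The column names PeakName, prob.0, prob.1 and pred.class are assumptions based on the fields documented above; check colnames() on the real result. Since the matrix also holds text fields such as the peak name and sequences, its values are assumed to be stored as character, hence the as.numeric() conversions.

```r
# Toy stand-in for the invisibly returned matrix (values are made up;
# column names are assumed from the documented fields)
pred <- matrix(
  c("peak1", "0.08", "0.92", "1",
    "peak2", "0.69", "0.31", "0",
    "peak3", "0.33", "0.67", "1"),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("PeakName", "prob.0", "prob.1", "pred.class")))

# keep only the sites predicted True; drop = FALSE keeps a matrix even for one row
true_sites <- pred[as.numeric(pred[, "pred.class"]) == 1, , drop = FALSE]
true_sites[, "PeakName"]  # "peak1" "peak3"

# reclassify with a stricter cutoff of 0.8 using the same > rule
strict <- as.integer(as.numeric(pred[, "prob.1"]) > 0.8)
strict  # 1 0 0
```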

Author(s)

Sarah Sheppard, Jianhong Ou, Nathan Lawson, Lihua Julie Zhu

References

Sarah Sheppard, Nathan D. Lawson, and Lihua Julie Zhu. 2013. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naïve Bayes classifier. Bioinformatics. Under revision

Examples

    library(cleanUpdTSeq)

    testFile <- system.file("extdata", "test.bed", package = "cleanUpdTSeq")
    testSet <- read.table(testFile, sep = "\t", header = TRUE)

    # convert the test set to GRanges; the BED file carries no upstream or
    # downstream sequence information, hence withSeq = FALSE
    peaks <- BED2GRangesSeq(testSet, withSeq = FALSE)

    # build the feature vector for the test set; because the peaks carry no
    # sequence, fetchSeq = TRUE retrieves it from the zebrafish danRer7 genome
    library(BSgenome.Drerio.UCSC.danRer7)
    testSet.NaiveBayes <- buildFeatureVector(peaks, BSgenomeName = Drerio,
        upstream = 40, downstream = 30, wordSize = 6, alphabet = c("ACGT"),
        sampleType = "unknown", replaceNAdistance = 30,
        method = "NaiveBayes", ZeroBasedIndex = 1, fetchSeq = TRUE)

    data(data.NaiveBayes)

    ## subsample the feature columns to keep this example fast;
    ## DO NOT do this for real data
    ## START SAMPLING
    set.seed(1)  # make the random subsample reproducible
    # sample from 23:4118 so that columns 4119 and 4120 are not selected twice
    samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
    Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
    Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
    testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp - 1]
    ## END SAMPLING

    predictTestSet(Ndata.NaiveBayes,
                   Pdata.NaiveBayes,
                   testSet.NaiveBayes,
                   outputFile = "test-predNaiveBayes.tsv",
                   assignmentCutoff = 0.5)

cleanUpdTSeq documentation built on Nov. 8, 2020, 8:30 p.m.