predictTestSet: predict authenticity of putative pA sites

In haibol2016/cleanUpdTSeq: cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequencing data

View source: R/05.predictTestSet.R

predictTestSet

R Documentation

predict authenticity of putative pA sites

Description

Classifies putative pA sites as true or false polyadenylation sites.

Usage

predictTestSet(
  Ndata.NaiveBayes = NULL,
  Pdata.NaiveBayes = NULL,
  testSet.NaiveBayes,
  classifier = NULL,
  outputFile = "test-predNaiveBayes.tsv",
  assignmentCutoff = 0.5,
  return_sequences = FALSE
)

Arguments

`Ndata.NaiveBayes`	A data.frame containing features for the negative training data, built using the function `buildFeatureVector`. See `data.NaiveBayes` for details.
`Pdata.NaiveBayes`	A data.frame containing features for the positive training data, built using the function `buildFeatureVector`. See `data.NaiveBayes` for details.
`testSet.NaiveBayes`	An object of class `featureVector` containing features of the test data, built for naive Bayes analysis using the function `buildFeatureVector`.
`classifier`	An object of class `PASclassifier`.
`outputFile`	A character(1) vector giving the name of the file to which the prediction results are written as tab-separated values.
`assignmentCutoff`	A numeric(1) vector specifying the cutoff for classifying a putative pA site as a true or false pA site. It should be a number between 0 and 1. For example, assignmentCutoff = 0.5 assigns a putative pA site with prob_true_pA > 0.5 to the True class (1) and a putative pA site with prob_true_pA <= 0.5 to the False class (0).
`return_sequences`	A logical(1) vector indicating whether the upstream and downstream sequences should be included in the output.
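The cutoff rule described for `assignmentCutoff` can be sketched in base R as follows. The helper `assign_class` is hypothetical and is not part of cleanUpdTSeq; it only illustrates how prob_true_pA values would map to the 0/1 class for a given cutoff, and the probabilities are made up.

```r
# Hypothetical helper (NOT exported by cleanUpdTSeq): applies the documented
# rule prob_true_pA > cutoff -> 1 (True), otherwise 0 (False).
assign_class <- function(prob_true_pA, assignmentCutoff = 0.5) {
  # the cutoff is documented as a single number between 0 and 1
  stopifnot(is.numeric(assignmentCutoff), length(assignmentCutoff) == 1,
            assignmentCutoff >= 0, assignmentCutoff <= 1)
  as.integer(prob_true_pA > assignmentCutoff)
}

# Made-up probabilities for illustration only
probs <- c(0.10, 0.50, 0.51, 0.95)
assign_class(probs, 0.5)
#> [1] 0 0 1 1
```

Note that a site with prob_true_pA exactly equal to the cutoff (0.50 above) falls into the False class, matching the `<=` in the argument description.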

Value

A data.frame with the columns described below. The upstream and downstream sequences used to assess each putative pA site are included when return_sequences = TRUE.

`peak_name`	the name of the putative pA site (taken from the 4th field of the BED file).
`prob_fake_pA`	the probability that the putative pA site is false.
`prob_true_pA`	the probability that the putative pA site is true.
`pred_class`	the predicted class of the putative pA site, based on the assignment cutoff: 0 = False (oligo(dT) internally primed), 1 = True.
`upstream_seq`	the upstream sequence of the putative pA site used in the analysis.
`downstream_seq`	the downstream sequence of the putative pA site used in the analysis.
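Because the return value is an ordinary data.frame with the columns listed above, downstream filtering needs only base R. The sketch below builds a small mock result (invented values, not real predictTestSet() output) with the documented non-sequence columns, keeps the sites predicted as true, and round-trips the table through a tab-separated file as a stand-in for reading `outputFile`. That the written file has a header row with these column names is an assumption.

```r
# Mock result with the documented columns; all values are invented
# for illustration and are NOT real predictTestSet() output.
test_out <- data.frame(
  peak_name    = c("peak1", "peak2", "peak3"),
  prob_fake_pA = c(0.90, 0.20, 0.45),
  prob_true_pA = c(0.10, 0.80, 0.55),
  pred_class   = c(0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the putative pA sites predicted as true (pred_class == 1)
true_sites <- test_out[test_out$pred_class == 1, ]
true_sites$peak_name
#> [1] "peak2" "peak3"

# Write and re-read a tab-separated file; assumes a header row with
# these column names (an assumption about outputFile's layout)
tsv <- tempfile(fileext = ".tsv")
write.table(test_out, tsv, sep = "\t", quote = FALSE, row.names = FALSE)
reloaded <- read.delim(tsv, stringsAsFactors = FALSE)
all.equal(reloaded, test_out)
```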

Author(s)

Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu

References

Sheppard S, Lawson ND, Zhu LJ. Accurate identification of polyadenylation sites from 3' end deep sequencing using a naive Bayes classifier. Bioinformatics. 2013;29(20):2564-2571.

Examples

library(cleanUpdTSeq)
library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
                        package = "cleanUpdTSeq")
## convert the test set in the BED file to GRanges
peaks <- BED6WithSeq2GRangesSeq(file = testFile, 
                               skip = 1L, withSeq = TRUE)
## build the feature vector for the test set
testSet.NaiveBayes <- buildFeatureVector(peaks,
                                         genome = Drerio,
                                         upstream = 40L,
                                         downstream = 30L,
                                         wordSize = 6L,
                                         alphabet = c("ACGT"),
                                         sampleType = "unknown",
                                         replaceNAdistance = 30,
                                         method = "NaiveBayes",
                                         fetchSeq = TRUE,
                                         return_sequences = TRUE)
data(data.NaiveBayes)
## sample the test data for code testing, DO NOT do this for real data
set.seed(123)
samp <- c(1:22, sample(23:4118, 50), 4119, 4120)
Ndata.NaiveBayes <- data.NaiveBayes$Negative[, samp]
Pdata.NaiveBayes <- data.NaiveBayes$Positive[, samp]
testSet.NaiveBayes@data <- testSet.NaiveBayes@data[, samp[-1]-1]
    
test_out <- predictTestSet(Ndata.NaiveBayes,
                           Pdata.NaiveBayes,
                           testSet.NaiveBayes,
                           outputFile = tempfile(),
                           assignmentCutoff = 0.5)

haibol2016/cleanUpdTSeq documentation built on April 14, 2022, 9:56 p.m.
