View source: R/03.buildFeatureVector.R
buildFeatureVector | R Documentation |
This function creates a data frame. Fields include peak name, upstream sequence, downstream sequence, and features to be used in classifying the putative polyadenylation site.
buildFeatureVector( peaks, genome = Drerio, upstream = 40L, downstream = 30L, wordSize = 6L, alphabet = "ACGT", sampleType = c("TP", "TN", "unknown"), replaceNAdistance = 30L, method = c("NaiveBayes", "SVM"), fetchSeq = FALSE, return_sequences = FALSE )
peaks |
An object of GRanges that may contain the upstream and downstream sequence information. This item is created by the function BED6WithSeq2GRangesSeq. |
genome |
Name of the genome to get sequences from. To find out a list of available genomes, please type BSgenome::available.genomes() in R. |
upstream |
An integer(1) vector, length of upstream sequence to retrieve. |
downstream |
An integer(1) vector, length of downstream sequence to retrieve. |
wordSize |
An integer(1) vector, size of the kmer feature for the upstream sequence. wordSize = 6 should always be used. |
alphabet |
A character(1) vector, a string containing DNA bases. By default, "ACTG". |
sampleType |
A character(1) vector, indicating type of sequences for building feature vectors. Options are TP (true positive) and TN (true negative) for training data, or unknown for test data. |
replaceNAdistance |
An integer(1) vector, specifying an number for avg.distanceA2PeakEnd, the average distance of As to the putative pA site, when there is no A in the downstream sequence. |
method |
A character(1) vector, specifying a machine learning method to to use. Currently, only "NaiveBayes" is implemented. |
fetchSeq |
A logical (1), indicating whether upstream and downstream sequences should be retrieved from the BSgenome object at this step or not. |
return_sequences |
A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output |
An object of "featureVector
"
Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu
library(BSgenome.Drerio.UCSC.danRer7) testFile <- system.file("extdata", "test.bed", package = "cleanUpdTSeq") peaks <- BED6WithSeq2GRangesSeq(file = testFile, skip = 1L, withSeq = TRUE) ## build the feature vector for the test set with sequence information testSet.NaiveBayes = buildFeatureVector(peaks, genome = Drerio, upstream = 40L, downstream = 30L, wordSize = 6L, alphabet = "ACGT", sampleType = "unknown", replaceNAdistance = 30, method = "NaiveBayes", fetchSeq = FALSE, return_sequences = TRUE) ## convert the test set to GRanges without upstream and downstream ## sequence information peaks <- BED6WithSeq2GRangesSeq(file = testFile, skip = 1L, withSeq = FALSE) #build the feature vector for the test set without sequence information testSet.NaiveBayes = buildFeatureVector(peaks, genome = Drerio, upstream = 40L, downstream = 30L, wordSize = 6L, alphabet = "ACGT", sampleType = "unknown", replaceNAdistance = 30, method = "NaiveBayes", fetchSeq = TRUE, return_sequences = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.