View source: R/03.buildFeatureVector.R
buildFeatureVector | R Documentation |
This function creates a data frame. Fields include peak name, upstream sequence, downstream sequence, and features to be used in classifying the putative polyadenylation site.
buildFeatureVector(
peaks,
genome = Drerio,
upstream = 40L,
downstream = 30L,
wordSize = 6L,
alphabet = "ACGT",
sampleType = c("TP", "TN", "unknown"),
replaceNAdistance = 30L,
method = c("NaiveBayes", "SVM"),
fetchSeq = FALSE,
return_sequences = FALSE
)
peaks |
An object of GRanges that may contain the upstream and downstream sequence information. This item is created by the function BED6WithSeq2GRangesSeq. |
genome |
Name of the genome to get sequences from. To find out a list of available genomes, please type BSgenome::available.genomes() in R. |
upstream |
An integer(1) vector, length of upstream sequence to retrieve. |
downstream |
An integer(1) vector, length of downstream sequence to retrieve. |
wordSize |
An integer(1) vector, size of the kmer feature for the upstream sequence. wordSize = 6 should always be used. |
alphabet |
A character(1) vector, a string containing DNA bases. By default, "ACTG". |
sampleType |
A character(1) vector, indicating type of sequences for building feature vectors. Options are TP (true positive) and TN (true negative) for training data, or unknown for test data. |
replaceNAdistance |
An integer(1) vector, specifying an number for avg.distanceA2PeakEnd, the average distance of As to the putative pA site, when there is no A in the downstream sequence. |
method |
A character(1) vector, specifying a machine learning method to to use. Currently, only "NaiveBayes" is implemented. |
fetchSeq |
A logical (1), indicating whether upstream and downstream sequences should be retrieved from the BSgenome object at this step or not. |
return_sequences |
A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output |
An object of "featureVector
"
Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu
library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
package = "cleanUpdTSeq")
peaks <- BED6WithSeq2GRangesSeq(file = testFile,
skip = 1L, withSeq = TRUE)
## build the feature vector for the test set with sequence information
testSet.NaiveBayes = buildFeatureVector(peaks,
genome = Drerio,
upstream = 40L,
downstream = 30L,
wordSize = 6L,
alphabet = "ACGT",
sampleType = "unknown",
replaceNAdistance = 30,
method = "NaiveBayes",
fetchSeq = FALSE,
return_sequences = TRUE)
## convert the test set to GRanges without upstream and downstream
## sequence information
peaks <- BED6WithSeq2GRangesSeq(file = testFile,
skip = 1L, withSeq = FALSE)
#build the feature vector for the test set without sequence information
testSet.NaiveBayes = buildFeatureVector(peaks,
genome = Drerio,
upstream = 40L,
downstream = 30L,
wordSize = 6L,
alphabet = "ACGT",
sampleType = "unknown",
replaceNAdistance = 30,
method = "NaiveBayes",
fetchSeq = TRUE,
return_sequences = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.