Home

/

GitHub

/

jianhong/cleanUpdTSeq

/

buildFeatureVector: build Feature Vector_2

buildFeatureVector: build Feature Vector_2
In jianhong/cleanUpdTSeq: cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

View source: R/03.buildFeatureVector.R

buildFeatureVector

R Documentation

build Feature Vector_2

Description

This function creates a data frame. Fields include peak name, upstream sequence, downstream sequence, and features to be used in classifying the putative polyadenylation site.

Usage

buildFeatureVector(
  peaks,
  genome = Drerio,
  upstream = 40L,
  downstream = 30L,
  wordSize = 6L,
  alphabet = "ACGT",
  sampleType = c("TP", "TN", "unknown"),
  replaceNAdistance = 30L,
  method = c("NaiveBayes", "SVM"),
  fetchSeq = FALSE,
  return_sequences = FALSE
)

Arguments

`peaks`	An object of GRanges that may contain the upstream and downstream sequence information. This item is created by the function BED6WithSeq2GRangesSeq.
`genome`	Name of the genome to get sequences from. To find out a list of available genomes, please type BSgenome::available.genomes() in R.
`upstream`	An integer(1) vector, length of upstream sequence to retrieve.
`downstream`	An integer(1) vector, length of downstream sequence to retrieve.
`wordSize`	An integer(1) vector, size of the kmer feature for the upstream sequence. wordSize = 6 should always be used.
`alphabet`	A character(1) vector, a string containing DNA bases. By default, "ACTG".
`sampleType`	A character(1) vector, indicating type of sequences for building feature vectors. Options are TP (true positive) and TN (true negative) for training data, or unknown for test data.
`replaceNAdistance`	An integer(1) vector, specifying an number for avg.distanceA2PeakEnd, the average distance of As to the putative pA site, when there is no A in the downstream sequence.
`method`	A character(1) vector, specifying a machine learning method to to use. Currently, only "NaiveBayes" is implemented.
`fetchSeq`	A logical (1), indicating whether upstream and downstream sequences should be retrieved from the BSgenome object at this step or not.
`return_sequences`	A logical(1) vector, indicating whether upstream and downstream sequences should be included in the output

Value

An object of "featureVector"

Author(s)

Sarah Sheppard, Haibo Liu, Jianhong Ou, Nathan Lawson, Lihua J. Zhu

Examples

library(BSgenome.Drerio.UCSC.danRer7)
testFile <- system.file("extdata", "test.bed",
                        package = "cleanUpdTSeq")
peaks <- BED6WithSeq2GRangesSeq(file = testFile, 
                               skip = 1L, withSeq = TRUE)
## build the feature vector for the test set with sequence information 
testSet.NaiveBayes = buildFeatureVector(peaks,
                                        genome = Drerio, 
                                        upstream = 40L,
                                        downstream = 30L, 
                                        wordSize = 6L, 
                                        alphabet = "ACGT",
                                        sampleType = "unknown",
                                        replaceNAdistance = 30, 
                                        method = "NaiveBayes", 
                                        fetchSeq = FALSE,
                                        return_sequences = TRUE)

## convert the test set to GRanges without upstream and downstream 
## sequence information
peaks <- BED6WithSeq2GRangesSeq(file = testFile, 
                               skip = 1L, withSeq = FALSE)
#build the feature vector for the test set without sequence information
testSet.NaiveBayes = buildFeatureVector(peaks,
                                        genome = Drerio, 
                                        upstream = 40L,
                                        downstream = 30L, 
                                        wordSize = 6L,
                                        alphabet = "ACGT",
                                        sampleType = "unknown",
                                        replaceNAdistance = 30,
                                        method = "NaiveBayes", 
                                        fetchSeq = TRUE,
                                        return_sequences = TRUE)

jianhong/cleanUpdTSeq documentation built on June 8, 2025, 2:07 a.m.

jianhong/cleanUpdTSeq index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jianhong/cleanUpdTSeq
cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

buildFeatureVector: build Feature Vector_2
In jianhong/cleanUpdTSeq: cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

build Feature Vector_2

Description

Usage

Arguments

Value

Author(s)

Examples

Related to buildFeatureVector in jianhong/cleanUpdTSeq...

R Package Documentation

Browse R Packages

We want your feedback!

jianhong/cleanUpdTSeq cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

buildFeatureVector: build Feature Vector_2 In jianhong/cleanUpdTSeq: cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

build Feature Vector_2

Description

Usage

Arguments

Value

Author(s)

Examples

Related to buildFeatureVector in jianhong/cleanUpdTSeq...

R Package Documentation

Browse R Packages

We want your feedback!

jianhong/cleanUpdTSeq
cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data

buildFeatureVector: build Feature Vector_2
In jianhong/cleanUpdTSeq: cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequending data