buildFeatureVector: buildFeatureVector

Description Usage Arguments Value Author(s) Examples

View source: R/buildFeatureVector.R

Description

This function creates a data frame. Fields include peak name, upstream sequence, downstream sequence, and features to be used in classifying the putative polyadenylation site.

Usage

1
2
3
4
buildFeatureVector(peaks, BSgenomeName = Drerio, upstream = 50L, 
	downstream = 40L, wordSize = 6L, alphabet = c("ACGT"),
 sampleType = c("TP", "TN", "unknown"), replaceNAdistance = 40L,
 method = c("NaiveBayes", "SVM"), ZeroBasedIndex = 1L, fetchSeq = FALSE)

Arguments

peaks

An object of GRanges that may contain the upstream and downstream sequence information. This item is created by the function BED2GRangesSeq.

BSgenomeName

Name of the genome to use to get sequence. To find out a list of available genomes, please type available.genomes() in R.

upstream

This is the length of upstream sequence to use in the analysis.

downstream

This is the length of downstream sequence to use in the analysis.

wordSize

This is the size of the word to use as a feature for the upstream sequence. wordSize = 6 should always be used.

alphabet

These are the bases to use, for example DNA bases ACTG.

sampleType

This is the type of sample. For example TP (true positive) or TN (true negative) for training data or unknown for test data.

replaceNAdistance

If there is no A in the downstream sequence, then use this for the average distance of As to the putative polyadenylation site.

method

This is which machine learning method to specify. For this release, method should always be set to NaiveBayes.

ZeroBasedIndex

If the coordinates are set using Zero Based indexing, set this = 1.

fetchSeq

Boolean, for getting upstream and downstream sequence at this step or not.

Value

An object of "featureVector"

Author(s)

Sarah Sheppard, Jianhong Ou, Nathan Lawson, Lihua Julie Zhu

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
testFile = system.file("extdata", "test.bed", package="cleanUpdTSeq")
testSet = read.table(testFile, sep = "\t", header = TRUE)
	
#convert the test set to GRanges with upstream and downstream sequence information
peaks = BED2GRangesSeq(testSet[1:10, ], upstream.seq.ind = 7, downstream.seq.ind = 8, withSeq=TRUE)
#build the feature vector for the test set with sequence information 
library(BSgenome.Drerio.UCSC.danRer7)
testSet.NaiveBayes = buildFeatureVector(peaks,BSgenomeName = Drerio, upstream = 40,
	   downstream = 30, wordSize = 6, alphabet=c("ACGT"),
	   sampleType = "unknown",replaceNAdistance = 30, 
	   method = "NaiveBayes", ZeroBasedIndex = 1, fetchSeq = FALSE)
    
#convert the test set to GRanges without upstream and downstream sequence information
peaks = BED2GRangesSeq(testSet[1:10, ],withSeq=FALSE)
        
#build the feature vector for the test set without sequence information
testSet.NaiveBayes = buildFeatureVector(peaks,BSgenomeName = Drerio, upstream = 40,
        downstream = 30, wordSize = 6, alphabet=c("ACGT"),
        sampleType = "unknown",replaceNAdistance = 30,
        method = "NaiveBayes", ZeroBasedIndex = 1, fetchSeq = TRUE)

cleanUpdTSeq documentation built on Nov. 8, 2020, 8:30 p.m.