sequenceDerivedFeatures: Extract sequence predictive features from range-based data


View source: R/sequenceDerivedFeatures.R

Description

Extracts sequence-derived features for the genomic ranges in a GRanges object, using the genome sequence supplied as a BSgenome or XStringSet object.

Usage

sequenceDerivedFeatures(x, sequence, encoding = c("onehot", "iRNA"))

Arguments

x

A GRanges object containing the genomic ranges to be annotated. All ranges in x must have the same width.

sequence

A BSgenome or XStringSet object for the genome sequence.

encoding

Can be one of the following:

onehot

From the 5' end to the 3' end of x, each nucleotide position is coded by 4 indicator (dummy) variables, which indicate whether the base at that position is "A", "T", "C", or "G", respectively.

iRNA

Each nucleotide position is encoded by 4 variables. The first indicates whether the nucleotide is a purine (A or G), the second whether it has an amino group (A or C), and the third whether it forms weak hydrogen bonds (A or T). The fourth variable is the cumulative frequency of nucleotides, computed from the leftmost position up to that position.
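To make the two encodings concrete, here is a minimal base-R sketch for a single DNA string. The helpers `onehot_encode()` and `irna_encode()` are hypothetical illustrations, not functions exported by WhistleR, and the package's internals may differ. In particular, the fourth iRNA variable is assumed here to be the frequency of the current base among positions 1 to i (a "nucleotide density"); the description above does not pin down the exact formula.

```r
# Hypothetical helpers, NOT part of WhistleR: per-position encodings of one
# DNA string, returned as a flat numeric vector of length 4 * nchar(dna).

# One-hot: 4 indicators per position, in the order A, T, C, G.
onehot_encode <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  # Rows = positions, columns = A/T/C/G; TRUE where the base matches
  m <- outer(bases, c("A", "T", "C", "G"), `==`)
  # Transpose before flattening so each position's 4 values stay adjacent
  as.numeric(t(m))
}

# iRNA-style: purine, amino group, weak hydrogen bond, cumulative frequency.
irna_encode <- function(dna) {
  bases  <- strsplit(toupper(dna), "")[[1]]
  purine <- bases %in% c("A", "G")
  amino  <- bases %in% c("A", "C")
  weak   <- bases %in% c("A", "T")
  # ASSUMPTION: frequency of the base at position i within positions 1..i
  cum_freq <- vapply(seq_along(bases),
                     function(i) sum(bases[seq_len(i)] == bases[i]) / i,
                     numeric(1))
  as.numeric(t(cbind(purine, amino, weak, cum_freq)))
}

onehot_encode("ACGT")  # 1 0 0 0  0 0 1 0  0 0 0 1  0 1 0 0
irna_encode("ACGT")
```

Both helpers return 4 values per position, which is why the output described under Value has 4 times the width of x columns.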

Details

The function first extracts the sequences within the genomic ranges defined by x. The sequences are then encoded according to the selected encoding method.

Value

A data.frame with one row per range in x (length(x) rows) and 4 times the width of x columns. All columns are numeric.
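The shape of the returned value can be illustrated with a short base-R sketch. Here a plain character vector `seqs` stands in for the sequences extracted from x, and `encode_ranges()` is a hypothetical helper, not the package's implementation; the column names it produces (`V1`, `V2`, ...) are whatever `as.data.frame()` assigns and need not match WhistleR's output.

```r
# Hypothetical sketch, NOT WhistleR's code: stack per-sequence one-hot
# vectors into a data.frame with one row per sequence.
onehot_encode <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  as.numeric(t(outer(bases, c("A", "T", "C", "G"), `==`)))
}

encode_ranges <- function(seqs) {
  widths <- nchar(seqs)
  # Mirrors the documented requirement that all ranges share one width
  stopifnot(length(unique(widths)) == 1)
  # vapply gives a (4 * width) x length(seqs) matrix; transpose so that
  # each sequence becomes one row
  m <- t(vapply(seqs, onehot_encode, numeric(4 * widths[1]),
                USE.NAMES = FALSE))
  as.data.frame(m)
}

seqs <- c("ACGTA", "TTGCA", "GGATC")
features <- encode_ranges(seqs)
dim(features)  # 3 20
```

Requiring equal widths is what makes a rectangular table possible: every row must carry the same number of per-position columns.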

See Also

Examples

library(WhistleR)
library(BSgenome.Hsapiens.UCSC.hg19)

## Define the GRanges to be annotated: 30 ranges of width 5,
## 15 on chr1 (+ strand) and 15 on chr2 (- strand)
set.seed(1)

X <- GRanges(rep(c("chr1", "chr2"), c(15, 15)),
             IRanges(c(sample(11874:12127, 15), sample(38814:41527, 15)), width = 5),
             strand = Rle(c("+", "-"), c(15, 15)))

## Extract onehot-encoded sequence features (30 rows, 4 * 5 = 20 columns)
seq_onehot <- sequenceDerivedFeatures(X, Hsapiens, encoding = "onehot")
str(seq_onehot)

## Extract iRNA-encoded sequence features (30 rows, 4 * 5 = 20 columns)
seq_iRNA <- sequenceDerivedFeatures(X, Hsapiens, encoding = "iRNA")
str(seq_iRNA)

ZW-xjtlu/WhistleR documentation built on March 13, 2021, 10:50 a.m.