mapFeaturesToCRMs: R interface to the bed_to_matrix REST service on the server



Description

The mapFeaturesToCRMs function creates a training set matrix for building a predictive model. The training set is composed of positive regions (known to be involved in the pathway of interest) and negative regions (picked at random, or known not to be involved in the pathway of interest), each described (scored) by features. Three feature file formats are accepted: position-specific scoring matrices modeling motifs recognised by transcription factors, BED files containing region coordinates for any discrete feature (NGS peaks, conservation blocks), and wig/bigWig files containing signal data. This script has been tested with version 0.99 of the online server. The current version of the server is available at http://ifbprod.aitorgonzalezlab.org/map_features_to_crms.php
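As a rough illustration of how these formats map onto the arguments, the hypothetical helper below (not part of LedPred) sorts a list of feature files into the pssm and ngs arguments by file extension. Treating .bed, .wig, .bw and .bigWig files as NGS data and everything else as PSSM files is an assumption made only for this sketch.

```r
# Hypothetical helper (not part of LedPred): split feature files into the
# `pssm` and `ngs` arguments by extension. The extension list is an assumption.
route_feature_files <- function(paths) {
  ext <- tolower(tools::file_ext(paths))
  is_ngs <- ext %in% c("bed", "wig", "bw", "bigwig")
  list(pssm = paths[!is_ngs], ngs = paths[is_ngs])
}

files <- c("tf_matrices.txt", "chip_peaks.bed", "signal.bw", "phastcons.wig")
routed <- route_feature_files(files)
routed$pssm  # motif matrices
routed$ngs   # discrete (BED) and signal (wig/bigWig) features
```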

Usage

  mapFeaturesToCRMs(URL = "http://ifbprod.aitorgonzalezlab.org/map_features_to_crms.php",
  positive.bed = NULL, genome = NULL, negative.bed = NULL,
  shuffling = NULL, background.seqs = NULL, genome.info = NULL,
  pssm = NULL, background.freqs = NULL, ngs = NULL, bed.overlap = NULL,
  my.values = NULL, feature.ranking = NULL, feature.nb = NULL,
  crm.feature.file = NULL, stderr.log.file = NULL, stdout.log.file = NULL)

Arguments

URL

URL of the server's REST endpoint

positive.bed

Path to the positive BED file. Required.

genome

Genome code, e.g. dm3 for Drosophila melanogaster. Required.

negative.bed

Path to the negative BED file.

shuffling

Integer giving the number of times the background sequences (background.seqs) are shuffled. If negative.bed is NULL and shuffling is set to 0, the feature matrix contains no negative sequences, which is useful for producing a test set matrix.
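The way negative regions are chosen can be summarised with a small hypothetical function (not part of LedPred). It assumes that a supplied negative.bed takes precedence over shuffling and that a NULL shuffling value behaves like 0; neither assumption is confirmed by this page.

```r
# Hypothetical helper (not part of LedPred): report where the negative
# regions of the feature matrix would come from.
negative_source <- function(negative.bed = NULL, shuffling = NULL) {
  # Assumption: a supplied negative.bed is used as-is
  if (!is.null(negative.bed)) {
    return("negative.bed")
  }
  # Assumption: a NULL shuffling value behaves like 0
  n.shuffle <- if (is.null(shuffling)) 0 else shuffling
  if (n.shuffle > 0) {
    return("shuffled background.seqs")
  }
  # No negatives at all: the matrix can serve as a test set
  "none (test set)"
}

negative_source(negative.bed = "negative.bed")
negative_source(shuffling = 10)
negative_source(shuffling = 0)
```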

background.seqs

Background sequences used for shuffling. If shuffling = 0, set this parameter to 0.

genome.info

File required for shuffling the BED regions. If shuffling = 0, set this parameter to 0.

pssm

Position-specific scoring matrices

background.freqs

Background nucleotide frequencies in the genome
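For readers unfamiliar with these inputs, the sketch below computes a generic log-odds PSSM score against background nucleotide frequencies. It is textbook scoring with a pseudocount of 1, not necessarily what the server does, and the matrix layout (rows A, C, G, T; one column per motif position) is an assumption. The file formats the server expects for pssm and background.freqs are not described on this page.

```r
# Estimate background nucleotide frequencies (A, C, G, T) from a sequence.
bg_freqs <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Convert a 4 x L count matrix (rows A, C, G, T) into log2-odds scores.
pssm_log_odds <- function(counts, bg, pseudo = 1) {
  probs <- sweep(counts + pseudo, 2, colSums(counts + pseudo), "/")
  log2(probs / bg[rownames(counts)])  # bg is recycled down each column
}

# Score one site by summing the log-odds of its base at each position.
score_site <- function(lo, site) {
  bases <- strsplit(toupper(site), "")[[1]]
  sum(lo[cbind(match(bases, rownames(lo)), seq_along(bases))])
}

# Toy 3-position motif that prefers "ACG" (counts are made up).
counts <- matrix(c(8, 0, 1, 1,
                   0, 9, 0, 1,
                   1, 0, 9, 0), nrow = 4,
                 dimnames = list(c("A", "C", "G", "T"), NULL))
bg <- bg_freqs("ACGTACGTAA")
lo <- pssm_log_odds(counts, bg)
score_site(lo, "ACG")
score_site(lo, "TTT")
```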

ngs

NGS (bed and wig) files

bed.overlap

Minimum overlap between a query region and an NGS BED peak, expressed as a fraction of the query region length. Equivalent to the intersectBed -f argument. Default: 1 bp.
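The overlap rule can be sketched in base R for a single query/peak pair, assuming BED-style coordinates (0-based start, exclusive end). The helper names are hypothetical; the check mirrors the intersectBed -f semantics of requiring the overlap to cover at least a fraction f of the query region, with a 1 bp minimum when no fraction is given.

```r
# Number of overlapping bases between a query region and a peak
# (BED-style coordinates: 0-based start, exclusive end).
overlap_bp <- function(q_start, q_end, p_start, p_end) {
  max(0, min(q_end, p_end) - max(q_start, p_start))
}

# Hypothetical check: does the overlap cover at least fraction f of the query?
passes_overlap <- function(q_start, q_end, p_start, p_end, f = NULL) {
  ov <- overlap_bp(q_start, q_end, p_start, p_end)
  if (is.null(f)) {
    return(ov >= 1)  # default: at least 1 bp of overlap
  }
  ov >= f * (q_end - q_start)
}

overlap_bp(100, 200, 150, 400)               # 50 bp shared
passes_overlap(100, 200, 150, 400, f = 0.5)  # 50 >= 0.5 * 100
passes_overlap(100, 200, 150, 400, f = 0.6)  # 50 <  0.6 * 100
```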

my.values

BED file whose fourth column contains values to append to the SVM matrix
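A minimal sketch of a my.values-style file, assuming a plain four-column, tab-separated BED layout with no header. Keying the values by a hypothetical chrom:start-end identifier is only one illustrative way to line them up with the rows of a feature matrix; how the server actually appends them is not shown here.

```r
# Write a toy my.values-style BED file: chrom, start, end, value.
my_values <- data.frame(chrom = c("chr2L", "chr3R"),
                        start = c(1000L, 5000L),
                        end   = c(1500L, 5600L),
                        value = c(0.8, 2.1))
path <- tempfile(fileext = ".bed")
write.table(my_values, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back and key each value by region, so it could be matched to
# the rows of a region-by-feature matrix (the key format is hypothetical).
bed <- read.table(path, sep = "\t",
                  col.names = c("chrom", "start", "end", "value"))
region_id <- paste0(bed$chrom, ":", bed$start, "-", bed$end)
values_by_region <- setNames(bed$value, region_id)
values_by_region
```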

feature.ranking

File with ranked features (output of rankFeatures), used for scoring a query BED file

feature.nb

Integer giving the number of top-ranked features to use
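If feature.nb is read as the number of top-ranked features to keep, the selection amounts to something like the hypothetical helper below. The ranking table layout (a feature name and a numeric rank, 1 being best) and the toy matrix are made up for illustration; the real rankFeatures output may be structured differently.

```r
# Hypothetical helper: keep the response column plus the feature.nb
# best-ranked feature columns.
select_top_features <- function(feature.matrix, ranking, feature.nb) {
  ordered <- as.character(ranking$feature[order(ranking$rank)])
  top <- head(ordered, feature.nb)
  # Column 1 is the response vector; the selected features follow it
  feature.matrix[, c(names(feature.matrix)[1], top), drop = FALSE]
}

ranking <- data.frame(feature = c("ngs_peak_1", "tf_motif_A", "tf_motif_B"),
                      rank = c(2, 1, 3))
toy <- data.frame(response   = c(1, -1),
                  tf_motif_B = c(0.2, 0.1),
                  ngs_peak_1 = c(1, 0),
                  tf_motif_A = c(4.5, 1.3))
selected <- select_top_features(toy, ranking, feature.nb = 2)
names(selected)
```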

crm.feature.file

Path to the feature matrix file

stderr.log.file

Path to the standard error log file

stdout.log.file

Path to the standard output log file

Value

A list with the following components:

feature.matrix

A data frame in which each row is a region and each column is a feature; each cell holds a score. The first column is the response vector.
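Because the first column is the response vector, separating the response from the feature scores before model fitting is short. The data frame below is a made-up stand-in with the documented layout; the 1/-1 class coding and the column names are assumptions.

```r
# Made-up stand-in with the documented layout: rows are regions,
# column 1 is the response, the remaining columns are feature scores.
feature.matrix <- data.frame(
  response  = c(1, 1, -1),
  motif_1   = c(3.2, 2.8, 0.4),
  chip_peak = c(1, 1, 0),
  row.names = c("region_1", "region_2", "region_3")
)

y <- feature.matrix[[1]]                              # response vector
x <- as.matrix(feature.matrix[, -1, drop = FALSE])    # feature scores
dim(x)
```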

stdout.log

Standard output log of the mapFeaturesToCRMs script on the server

stderr.log

Standard error log of the mapFeaturesToCRMs script on the server

Examples

## Not run: 
 # Locate the example input files shipped with LedPred
 dirPath <- system.file("extdata", package = "LedPred")
 file.list <- list.files(dirPath, full.names = TRUE)
 background.freqs <- file.list[grep("freq", file.list)]
 positive.regions <- file.list[grep("positive", file.list)]
 negative.regions <- file.list[grep("negative", file.list)]
 TF.matrices <- file.list[grep("tf", file.list)]
 ngs.path <- system.file("extdata/ngs", package = "LedPred")
 ngs.files <- list.files(ngs.path, full.names = TRUE)
 # Build the feature matrix on the server and save it locally
 crm.features.list <- mapFeaturesToCRMs(positive.bed = positive.regions,
     negative.bed = negative.regions, background.freqs = background.freqs,
     pssm = TF.matrices, genome = "dm3", ngs = ngs.files,
     crm.feature.file = "crm.features.tab",
     stderr.log.file = "stderr.log", stdout.log.file = "stdout.log")
 # Inspect the returned list and the server logs
 names(crm.features.list)
 class(crm.features.list$crm.features)
 crm.features.list$stdout.log
 crm.features.list$stderr.log

## End(Not run)

LedPred documentation built on Nov. 8, 2020, 8 p.m.