search.pwm: function to predict transcription factor binding sites using...


Description

Predicts transcription factor binding sites using the matchPWM method from the Biostrings package.
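
As a rough illustration of what scanning with a position weight matrix and a percentage threshold involves, the following base R sketch slides a toy matrix along a sequence and keeps windows whose score reaches a given percentage of the highest possible score. The matrix values, the sequence and the helper name scan_pwm are hypothetical and exist only for this example; this is not the code used by search.pwm or matchPWM.

```r
# Toy weight matrix: rows are bases, columns are motif positions.
# The values are made up for illustration; the best-scoring word is "ACG".
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: score every window of a sequence against the matrix
# and return the windows that reach the cutoff.
scan_pwm <- function(pwm, sequence, min.score = "80%") {
  # highest possible score: sum of the column maxima
  max.score <- sum(apply(pwm, 2, max))
  if (is.character(min.score)) {
    # interpret "80%" as 80 percent of the highest possible score
    cutoff <- max.score * as.numeric(sub("%$", "", min.score)) / 100
  } else {
    cutoff <- min.score
  }
  chars <- strsplit(sequence, "")[[1]]
  w <- ncol(pwm)
  starts <- seq_len(max(length(chars) - w + 1, 0))
  scores <- vapply(starts, function(s) {
    window <- chars[s:(s + w - 1)]
    # pick one matrix entry per motif position and add them up
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(w))])
  }, numeric(1))
  keep <- scores >= cutoff
  data.frame(start = starts[keep], score = scores[keep])
}

hits <- scan_pwm(pwm, "TTACGTTACTT", min.score = "80%")
print(hits)
```

Lowering the threshold, for example to "70%", lets weaker windows through as well, which is the trade-off the min.score argument controls.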

Usage

## S4 method for signature 'cobindr'
search.pwm(x, min.score = "80%", append = FALSE, background_scan =
FALSE, n.cpu = NA)

Arguments

x

an object of the class "cobindr", which will hold all necessary information about the sequences and the hits.

min.score

minimum score used as the threshold for hits (default = "80%")

append

logical flag; if append=TRUE, the binding sites are appended to already existing results

background_scan

logical flag; if background_scan=TRUE, the background sequences are searched for transcription factor binding sites

n.cpu

number of CPUs to use for parallelization. The default is NA, in which case the number of available CPUs is detected and then used.
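
The NA default for n.cpu can be pictured with the sketch below, which falls back to the detected core count when no value is supplied. It assumes detection via detectCores from the parallel package; the helper name resolve_n_cpu is hypothetical, and how search.pwm actually determines the number of CPUs is not stated here.

```r
library(parallel)

# Hypothetical helper: turn an n.cpu argument into a usable worker count.
resolve_n_cpu <- function(n.cpu = NA) {
  if (is.na(n.cpu)) {
    # detectCores() may return NA on some platforms; fall back to 1
    n <- detectCores()
    if (is.na(n)) n <- 1L
    return(as.integer(n))
  }
  as.integer(n.cpu)
}

resolve_n_cpu()   # number of cores detected on this machine
resolve_n_cpu(2)  # an explicit value is used as given
```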

Value

x

an object of the class "cobindr" including the predicted transcription factor binding sites

Author(s)

Robert Lehmann <r.lehmann@biologie.hu-berlin.de>

References

uses matchPWM from package "Biostrings" (http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html)

See Also

rtfbs, search.gadem

Examples

############################################################
# use simulated sequences
library(Biostrings)
library(cobindR) # provides read.transfac.pfm, cobindRConfiguration, cobindr, search.pwm

n <- 400 # number of input sequences
l <- 500 # length of sequences
n.hits <- 250 # number of 'true' binding sites
bases <- c("A","C","G","T") # alphabet
# generate random input sequences with two groups with differing GC content
seqs <- sapply(1:(3*n/4), function(x) paste(sample(bases, l, replace=TRUE, 
		prob=c(.3,.22,.2,.28)), collapse=""))
seqs <- append(seqs, sapply(1:(n/4), function(x) paste(sample(bases, l, replace=TRUE, 
		prob=c(.25,.25,.25,.25)), collapse="")))
path <- system.file('extdata/pfms/myod.tfpfm',package='cobindR')
motif <- read.transfac.pfm(path)[[1]] # get PFM of binding site 
# add binding sites with distance specificity
for(position in c(110, 150)) {
	hits <- apply(apply(motif, 2, function(x) sample(x=bases, size=n.hits, prob=x, 
			replace=TRUE)), 1, paste, collapse='')
	pos.hits <- round(rnorm(n.hits, mean=position, sd=8))
	names(pos.hits) <- sample(1:n, n.hits)
	for(i in 1:n.hits) substr(seqs[as.integer(names(pos.hits)[i])], start=pos.hits[i], 
							stop=pos.hits[i]+ncol(motif)) <- hits[i] 
}
#save sample sequences in fasta file
tmp.file <- tempfile(pattern = "cobindr_sample_seq", tmpdir = tempdir(), fileext = ".fasta")
writeXStringSet(DNAStringSet(seqs), tmp.file)
#run cobindr
cfg <- cobindRConfiguration()
sequence_type(cfg) <- 'fasta'
sequence_source(cfg) <- tmp.file
sequence_origin(cfg) <- 'artificial sequences'
pfm_path(cfg) <- system.file('extdata/pfms',package='cobindR')
pairs(cfg) <- 'V$MYOD_01 V$MYOD_01' 
runObj <- cobindr(cfg, name='cobindr test using sampled sequences')
# perform tfbs prediction using matchPWM
runObj.bs <- search.pwm(runObj, min.score = '90%')
# show results
plot.positionprofile(runObj.bs)
# clean up
unlink(tmp.file)

cobindR documentation built on April 28, 2020, 6:40 p.m.