This will be an introductory vignette on how and why to use GenomicLayers.

Introduction

Rationale...

GenomicLayers simulates epi-genetic changes to DNA sequence or entire genomes. It does this by linking the DNA sequence, which cannot be changed, with one or more equally sized layers, which can be modified by binding factors. The binding factors may recognise specific DNA sequences or motifs or they may recognise existing modifications already applied to the layers. After binding, the binding factors may then make modifications to any or all of the layers. In this way, some regions of the sequence or genomes may change states (e.g. open vs closed) and permit or deny access to further binding factors.

Installation

You may have already done this but if not, you can install the GenomicLayers package direct from GitHub if you already have the devtools package installed and working. For devtools see here: ...

If you already have devtools, then the following commands should install GenomicLayers:-

    library(devtools)
    install_github(repo="davetgerrard/GenomicLayers",build_vignettes = TRUE)

The above may take several minutes and requires several dependencies. If it does not work, or you are in a hurry, leave out the 'build_vignettes'. Hint: make sure devtools is up to the most recent version.

    install_github(repo="davetgerrard/GenomicLayers")

To view the introduction vignette, type

    vignette("GenomicLayersIntroduction")

To view what vignettes are available, type

    vignette(package="GenomicLayers")

Set up the target sequence or genome

Load the Biostrings library.

library(Biostrings)    # (this command outputs a lot of information, which is hidden in this vignette)

Our sequence can be based on a single DNA sequence or on a entire genome. For individual sequences we convert them to DNAString objects. If the latter, it must be loaded as a BSgenome object. See ...

GenomicLayers is designed to work on whole eukaryotic genomes including human and mouse but as these are rather large, for the purpose of demonstration we will use a shorter sequence as an example. The Saccharomyces cerevisiae (Yeast) chromosome 1 is included in the Biostrings package, which should have been installed along with GenomicLayers

data("yeastSEQCHR1")   # from Biostrings package
class(yeastSEQCHR1)
nchar(yeastSEQCHR1)
targetSeq <- DNAString(yeastSEQCHR1)
targetSeq 

The chromosome is now stored as a DNAString object, which gives us access to a range of pattern searching functions provided by the Biostrings package.

To use larger sequences and genomes comprised of multiple chromosomes we expect users to make use of the Bioconductor BSgenome packages. For more on this, see the section Using larger genomes below.

Create some layers on our sequence.

This is where we use the sequence and build some layers on it.

library(GenomicLayers)
layers1 <- createLayerList.DNAstring(targetSeq, layerNames=c("recruiter","promoter"))
layers1

Create some binding factors

This is where we specify some binding factors that can bind to the sequence and, optionally, modify the layers (but not the sequence).

If you want to learn about pattern matching against DNA sequences, we recommend that you check out the vignettes in the Biostrings package. Perhaps also have a look at PWMenrich, MOTIV etc.

We first create a binding factor that can recognise a DNA motif for a TATA box, we'll call it tata-box and then test it against the layerSet object we've already created. It should create an IRanges or hits object with the locations for all matches to the sequence TATAA. We'll also specify that this binding factor should mark those positions on the layer recruiter.

bf1 <- createBindingFactor.DNA_consensus(name="tata-box", patternString = "TATAWAWA", profile.layers=NULL, mod.layers="recruiter",mod.marks=1) 
#print(bf1)
matchBindingFactor(layerSet=layers1$layerSet, bindingFactor=bf1)

Then create a second binding factor that does not recognise a particular sequence, but which can bind to marks on the recruiter layer . If we test this binding factor alone, it should produce a hits object with no hits because the recruiter layer starts with not marked regions.

bf2 <- createBindingFactor.layer_region(name="tata-box", profile.layers="recruiter",profile.marks=1,                                                                        mod.layers="promoter",mod.marks=1,  patternLength = 5) 
#print(bf2)
matchBindingFactor(layerSet=layers1$layerSet, bindingFactor=bf2)

To use multiple binding factors together in a sequence, we add them into a named list. IMPORTANT: currently, this list must have names, so typically we specify the names when building the list. The list names do not have to agree with the $name property of the individual binding factors (this allows the same binding factor to be re-used without re-specifying the model).

(Also, nicer formatting for binding factors when printing)

bf_list <- list(bf1=bf1, bf2=bf2)
#print.bfSet(list(bf1))  # not working in package. FIX this .

Run a sequence of layer binding

Run a simulation and collect the output

modLayers <- runLayerBinding(layerList=layers1, factorSet=bf_list, verbose=TRUE, iter=1000)

Hopefully, the layer called 'promoter' should now be marked in some places.

modLayers$layerSet[['promoter']]

Visualise the results

What happened in the simulation?

Are the marked regions anywhere near the real Yeast promoters? - No.

Using larger genomes

Get to know the BSgenome packages....



davetgerrard/GenomicLayers documentation built on April 23, 2024, 3:55 p.m.