damid.seq: DamID-Seq pipeline

Description

Full DamID-Seq pipeline; includes adapter removal, mapping and enrichment finding.

Usage

damid.seq(raw.file.name, input.format="fastq", multi.core=F,
          exp.name="damid-gatc-sites",
          bin.len=100000, adapt.seq="CGCGGCCGAG", errors=1, qc=T,
          species="BSgenome.Celegans.UCSC.ce10", chr.names = NULL,
          restr.seq="GATC", normali=T, log10t=T, m.hits=1)

Arguments

raw.file.name

A tab-separated file with 2 or 3 columns. See raw-file.txt for an example.

  • First column: path to the raw fastq (fastq.gz) or fasta file. If mapping is skipped with input.format = "g", the file must be a GRanges .RData object.

  • Second column: name of the sample.

  • Third column: a single character, ‘s’ for sample or ‘c’ for control. If this column is not set, the pipeline does not produce log2 fold change output. The number of samples ("s") must equal the number of controls ("c").
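
The column rules above can be checked before a long run is started. The following is a minimal sketch in base R; check_sample_sheet is a hypothetical helper and not part of the package, and it assumes the file has a header row as in the example under Details.

```r
# Hypothetical helper (not part of the package): check a sample sheet
# against the documented column rules before calling damid.seq().
check_sample_sheet <- function(path) {
  sheet <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
  if (!ncol(sheet) %in% c(2, 3)) {
    stop("expected 2 or 3 tab-separated columns, found ", ncol(sheet))
  }
  if (ncol(sheet) == 3) {
    group <- sheet[[3]]
    if (!all(group %in% c("s", "c"))) {
      stop("third column may only contain 's' or 'c'")
    }
    if (sum(group == "s") != sum(group == "c")) {
      stop("number of samples ('s') and controls ('c') differs")
    }
  }
  invisible(sheet)
}
```

Calling check_sample_sheet("raw-file.txt") stops with an error message if a rule is violated and otherwise returns the parsed table invisibly.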

input.format

if "fastq" or "fasta", the pipeline starts from the raw fastq or fasta files (optionally gzipped) produced by sequencing. If "g", the pipeline starts from the GRanges object made by the full pipeline, skipping adapter removal and rbowtie mapping. If "b", the pipeline starts from mapped .bam files.

multi.core

if TRUE, bowtie mapping runs on multiple cores

exp.name

a generic name for your experiment

bin.len

the bin length for the analysis
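
To illustrate what a bin length means for the read counts, here is a small base R sketch. The read positions, the chromosome length, and the convention that bin i covers positions (i - 1) * bin.len + 1 to i * bin.len are assumptions made for illustration; the package may define bin boundaries differently.

```r
# Toy read start positions on one chromosome (made-up values).
read.pos <- c(5, 99999, 100000, 100001, 250000, 250001)
bin.len  <- 100000
chr.len  <- 300000   # hypothetical chromosome length

# Assumed convention: bin i covers ((i - 1) * bin.len + 1) .. (i * bin.len).
bin.idx <- (read.pos - 1) %/% bin.len + 1
n.bins  <- ceiling(chr.len / bin.len)

# Count reads per bin; bins without reads get a count of 0.
reads.per.bin <- tabulate(bin.idx, nbins = n.bins)
reads.per.bin
```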

errors

number of errors allowed in the adapter sequence during adapter removal

adapt.seq

the adapter sequence to be removed before mapping

qc

if TRUE, a raw reads quality control is performed

species

the name of an object of class "BSgenome", used for rbowtie mapping and GATC site extraction

chr.names

a character vector of chromosome names, following the naming convention of BSgenome-utils, or NULL to use all chromosomes from species

restr.seq

the sequence of the restriction site
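
The idea of locating restriction sites can be sketched in base R on a toy string. The sequence and variable names below are made up for illustration; in the pipeline the sequence comes from the BSgenome object given in species, and the package may use a different matching routine.

```r
# Toy sequence standing in for a chromosome (made-up).
genome.seq <- "AAGATCTTGATCGATCCC"
restr.seq  <- "GATC"

# 1-based start positions of every exact match of the restriction site.
# gregexpr() returns -1 if the site does not occur at all.
hits <- gregexpr(restr.seq, genome.seq, fixed = TRUE)[[1]]
site.start <- as.integer(hits)
site.start
```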

normali

if TRUE, reads are normalized by dividing the reads in each bin by the total number of reads of the sample
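
As a worked illustration of this normalization, consider the following base R sketch with made-up counts. It assumes that the total is the sum of reads over all bins of the sample; the package may use a different total (for example all mapped reads).

```r
# Toy read counts per bin for one sample (made-up values).
reads.per.bin <- c(bin1 = 30, bin2 = 10, bin3 = 60)

# Assumption: "total reads" is taken here as the sum over all bins.
total.reads <- sum(reads.per.bin)

# Each bin becomes a fraction of the sample's total reads.
norm.reads <- reads.per.bin / total.reads
norm.reads
```

After this step the values of a sample sum to 1, which makes samples with different sequencing depth comparable.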

log10t

if TRUE, reads per bin are log transformed in qc plots

m.hits

number of mapping positions allowed per read for bowtie mapping (1 = only uniquely mappable reads are mapped)

...

not used.

Details

The tab-separated values in raw.file.name must be provided as shown in the example below to analyze a sample using controls. Column names can be chosen freely.

FilePath/Name sampleName groupIndex
rawfile/sample.fastq.gz test.sample "s"
rawfile/contro.fastq.gz test.control "c"

The group index can be omitted if no sample-control comparison is wanted. In that case the log2 fold change between sample and control is not plotted.

FilePath/Name sampleName
rawfile/sample.fastq.gz test.sample
rawfile/contro.fastq.gz test.control
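
Both layouts shown above can be written from R with write.table. This is a minimal sketch: the column names are free to choose, the files are written to a temporary directory only for illustration, and writing the group index without quotes is an assumption (the example above shows it quoted).

```r
# Sample sheet with the three documented columns.
sheet <- data.frame(
  FilePath   = c("rawfile/sample.fastq.gz", "rawfile/contro.fastq.gz"),
  sampleName = c("test.sample", "test.control"),
  groupIndex = c("s", "c"),
  stringsAsFactors = FALSE
)

# Tab-separated, with a header row and without row names.
sheet.file <- file.path(tempdir(), "raw-file.txt")
write.table(sheet, file = sheet.file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Variant without sample-control comparison: drop the third column.
nogroup.file <- file.path(tempdir(), "raw-file-nogroup.txt")
write.table(sheet[, 1:2], file = nogroup.file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```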

Several files are saved in different folders while the pipeline runs.

folder name   saved files
cutadapter    cut reads (.fastq/a)
cutadapter    cut read length (.txt)
cutadapter    cut read information (.txt)
bowtie        mapped reads (.bam)
bowtie        bam information (.bam.bai)
bowtie        bam information (.txt)
granges       reads per input (sample) file (.Rdata)
granges       reads per GATC site (.Rdata)
granges       reads per GATC site plus strand (.Rdata)
granges       reads per GATC site minus strand (.Rdata)
granges       reads per GATC fragment (.Rdata)
granges       GATC sites in genome
qc-reports    fastqc report if qc = T (.txt)
qc-reports    correlation between samples: GATC and bin (.txt)
qc-reports    number of reads lost in each step (.txt)
results       correlation and read distribution: GATC and bin (.pdf)
results       read plots: GATC and bin (.pdf)
results       log2 fold change sample / control (.pdf)
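
The last row of the table refers to a log2 fold change between sample and control. How such a value is formed per bin can be sketched in base R as follows. The numbers are made up, and the small pseudocount is an assumption added here to avoid division by zero; the package may treat empty bins differently.

```r
# Toy normalized reads per bin (made-up values).
sample.reads  <- c(0.40, 0.10, 0.50)
control.reads <- c(0.20, 0.20, 0.50)

# Assumption: a small pseudocount avoids division by zero and log2(0).
pseudo <- 1e-6

# Positive values: more reads in the sample; negative: more in the control.
log2.fc <- log2((sample.reads + pseudo) / (control.reads + pseudo))
round(log2.fc, 2)
```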

Value

A list with 5 GRanges-class objects.

1 $GATC sequencing reads per GATC site
2 $GATC.plus sequencing reads per GATC site on the plus strand
3 $GATC.minus sequencing reads per GATC site on the minus strand
4 $GATC.frag sequencing reads per GATC fragment
5 $bins sequencing reads per chosen bin

Author(s)

Dominic Ritler

Examples

# Run the full pipeline on the files listed in raw-file.txt.
damid.seq(raw.file.name = "raw-file.txt", input.format = "fastq",
          multi.core = TRUE, exp.name = "test", bin.len = 100000,
          adapt.seq = "CGCGGCCGAG", errors = 1, qc = FALSE,
          species = "BSgenome.Celegans.UCSC.ce10",
          chr.names = NULL, restr.seq = "GATC", normali = TRUE,
          log10t = FALSE, m.hits = 1)

damidseq/RDamIDSeq documentation built on May 14, 2019, 3:33 p.m.