dataPDR1: ChIP-seq results (IP and control samples) obtained with the...

Description Usage Format Details Source References Examples

Description

ChIP-seq experiments were performed in order to identify the genomic regions that interact with the transcription factor Pdr1, in yeast Saccharomyces cerevisiae. Two samples (IP and control) were sequenced simultaneously using the Illumina technology (ENS transcriptome platform). Only the data for chrIV are available here, but complete datasets can be found online: http://bpeaks.gene-networks.net

Usage

1

Format

dataPDR1$IPdata: IPdata and controlData are dataframes with three columns. The first column comprises chromosome names, the second column comprises base positions and the third column comprises the numbers of sequences mapped at the considered position. dataPDR1$controlData: IPdata and controlData are dataframes with three columns. The first column comprises chromosome names, the second column comprises base positions and the third column comprises the numbers of sequences mapped at the considered position. dataPDR1$cdsPositions: A table with annotated positions of genes in yeast S. cerevisiae. The first column indicates chromosome names, the second and third columns indicate respectively "start" and "end" positions of genes, and the fourth column indicates the gene annotation (according to the Saccharomyces Genome Database (SGD http://www.yeastgenome.org/)).

Details

Complete procedure to analyze sequencing data (intial FASTQ files) can be found online:

http://bpeaks.gene-networks.net. Initial read length was 50 bases. After quality controls and filtering of low quality bases, around 30.000.000 of sequences (IP sample) and around 88.000.000 of sequences (control sample) were independantly mapped on the genome using the bowtie algorithm [1]. Output files (SAM format) were converted into BAM files and indexed using the SAMTOOLS suite [2]. Numbers of sequences mapped on each nucleotide in the reference genome were finally calculated using the "genomeCoverageBed" tool available from the BEDTOOLS suite [3].

Source

Sample sequencing was performed at the "transcriptome platform", ENS institute in Paris (France), http://www.transcriptome.ens.fr.

References

More information concerning this dataset can found online : http://bpeaks.gene-networks.net.

[1] Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.

[2] Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078 9. [PMID: 19505943]

[3] Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841 842.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# get library
library(bPeaks)

# get data
data(dataPDR1)

summary(dataPDR1)
#             Length Class      Mode
# IPdata       3      data.frame list
# controlData  3      data.frame list
# cdsPositions 5      data.frame list

# run peak calling, comparing IP and control samples
# (only 10 kb of chrIV are analyzed here, as an illustration)
bPeaksAnalysis(IPdata = dataPDR1$IPdata[40000:50000,], 
               controlData = dataPDR1$controlData[40000:50000,], 
               windowSize = 150, windowOverlap = 50, 
               IPcoeff = 4, controlCoeff = 2, 
	       log2FC = 1, averageQuantiles = 0.5,
               resultName = "bPeaks_example", 
               peakDrawing = TRUE, promSize = 800)

## Not run: 
# -> bPeaks analysis, all chromosome IV and default parameters (optimized for yeasts)

# STEP 1: get PDR1 data (ChIP-seq experiments, IP and control samples, 
# 	related to the transcription factor Pdr1 in yeast Saccharomyces 
# 	cerevisiae) 
data(dataPDR1)

# STEP 2: bPeaks analysis
bPeaksAnalysis(IPdata = dataPDR1$IPdata, 
               controlData  = dataPDR1$controlData,
	       cdsPositions = dataPDR1$cdsPositions, 
               windowSize = 150, windowOverlap = 50, 
               IPcoeff = 2, controlCoeff = 2, 
               log2FC = 2, averageQuantiles = 0.9,
               resultName = "bPeaks_PDR1", 
               peakDrawing = TRUE)

# STEP 3 : procedure to locate peaks according to 
# 	   gene positions
peakLocation(bedFile = "bPeaks_PDR1_bPeaks_allGenome.bed", 
            cdsPositions = yeastCDS$Saccharomyces.cerevisiae,
            outputName = "bPeakLocation_finalPDR1", promSize = 800)

# -> Note that cds (genes) positions are stored in bPeaks package for several yeast
# species
data(yeastCDS)

summary(yeastCDS)
#                         Length Class      Mode     
# Debaryomyces.hansenii    31370  -none-     character
# Eremothecium.gossypii    23615  -none-     character
# Kluyveromyces.lactis     25380  -none-     character
# Pichia.sorbitophila      55875  -none-     character
# Saccharomyces.kluyveri   27790  -none-     character
# Yarrowia.lipolytica      32235  -none-     character
# Zygosaccharomyces.rouxii 24955  -none-     character
# Saccharomyces.cerevisiae     5  data.frame list     
# Candida.albicans             5  data.frame list     
# Candida.glabrata             5  data.frame list

# number of detected peaks
print(resLocation$numPeaks)

# number of peaks "upstream" annotated genes (W strand)
print(resLocation$beforeFeatures)

# number of peaks "in" annotated genes
print(resLocation$inFeatures)

# number of peaks "upstream" annotated genes (C strand)
print(resLocation$afterFeatures)

## End(Not run)

bPeaks documentation built on May 1, 2019, 10:14 p.m.