makeCountSet: make differential binding sites data frame


View source: R/makeCountSet.R

Description

This is a utility function that creates a data frame. The data frame contains the binding sites obtained by merging peaks from two conditions, the ChIP read counts and smoothed control counts for each candidate region, and an indicator of which peaks are common to both conditions.

Usage

	makeCountSet(conf,design,filetype,species,peak.center=FALSE,peak.ext=0,binsize=50,mva.span=c(1000,5000,10000))

Arguments

conf

A data frame describing the ChIP experiments. It contains 6 columns: sampleID, condition, factor, ipReads, ctReads, and peaks. condition is the treatment condition or cell line; factor is the transcription factor or histone modification; ipReads is the ChIP sequence data in bam or bed format; ctReads is the control sequence data in bam or bed format; peaks is the set of peaks called by existing peak-calling software.

design

A two-column design matrix. The number of rows equals the number of ChIP samples from the two conditions. The first column is all 1s and represents the intercept in the regression model. The second column is 1 for samples from one condition and 0 for samples from the other.
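As an illustration of the layout described above, a design matrix of this shape can be built with base R's model.matrix. This is a minimal sketch; the sample labels are hypothetical and only serve to show the resulting intercept and 0/1 columns.

```r
# Hypothetical sample annotation: two replicates per condition.
condition <- factor(c("Helas3", "Helas3", "K562", "K562"))

# model.matrix() adds an intercept column of 1s and a 0/1 indicator
# for the second factor level (here "K562").
design <- as.data.frame(model.matrix(~ condition))
print(design)
```

The row order of the design must match the row order of the samples in conf.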

filetype

The format of the sequence files. Two types are supported: bed or bam.

species

Two species are supported directly (hg19 or mm9). For any other species, specify other.

peak.center

This argument is coupled with peak.ext. Default is FALSE. Use it when the central regions of the peaks are of greater interest than the full peaks.

peak.ext

This argument is coupled with peak.center. Default is 0.
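The description does not spell out how peak.center and peak.ext interact. One plausible reading, shown here purely as an assumption and not as the package's confirmed behaviour, is that each peak is replaced by a window extending peak.ext bp on either side of its midpoint. The peaks below are hypothetical.

```r
# Hypothetical peaks (not taken from the package data).
peaks <- data.frame(chr = c("chr1", "chr1"),
                    start = c(1000, 5000),
                    end = c(1400, 5900))
peak.ext <- 100  # assumed meaning: bp kept on each side of the midpoint

# Assumed behaviour for peak.center=TRUE: keep only a window of
# peak.ext bp on either side of each peak's midpoint.
mid <- floor((peaks$start + peaks$end) / 2)
centered <- data.frame(chr = peaks$chr,
                       start = mid - peak.ext,
                       end = mid + peak.ext)
print(centered)
```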

binsize

Bin size in bp used to calculate the smoothed local lambda of the Poisson distribution. The default is 50 bp.

mva.span

Window sizes in bp, centered at the peak location in the control sample, used for smoothing. The default, c(1000,5000,10000), corresponds to 1 kb, 5 kb and 10 kb windows.
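To make the roles of binsize and mva.span more concrete, the sketch below averages binned control counts over windows of each span centered at a peak. The simulated counts, the peak position, and the final step of taking the largest window average are all assumptions for illustration; the package's actual smoothing may differ.

```r
set.seed(1)
binsize <- 50
mva.span <- c(1000, 5000, 10000)

# Hypothetical control read counts in consecutive 50 bp bins over 20 kb.
n.bins <- 20000 / binsize
ct.bins <- rpois(n.bins, lambda = 2)

# Index of the bin containing the (hypothetical) peak centre.
peak.bin <- 200

# Mean control count per bin within each window centred at the peak.
window.mean <- sapply(mva.span, function(span) {
  half <- (span / binsize) %/% 2
  idx <- max(1, peak.bin - half):min(n.bins, peak.bin + half)
  mean(ct.bins[idx])
})
names(window.mean) <- paste0(mva.span / 1000, "kb")

# One possible summary (an assumption): the largest window average.
local.lambda <- max(window.mean)
print(window.mean)
```

Larger windows give a more stable but less local estimate of the background rate, which is why several spans are considered together.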

Value

A ChIPComp object. Columns chr, start and end give the genomic coordinates of each binding site; column ip_c(#condition)_r(#replicate) holds the ChIP counts for replicate #replicate of condition #condition; column ct_c(#condition)_r(#replicate) holds the smoothed control counts for replicate #replicate of condition #condition; column commonPeak indicates the common binding sites.
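The naming pattern described above can be spelled out for a design with two conditions and two replicates each. This sketch only generates the expected column names; the column order is an assumption.

```r
n.cond <- 2  # number of conditions
n.rep <- 2   # replicates per condition (assumed equal for simplicity)

# All condition/replicate combinations, replicate varying fastest.
ids <- expand.grid(r = seq_len(n.rep), c = seq_len(n.cond))

ip.cols <- paste0("ip_c", ids$c, "_r", ids$r)  # ChIP count columns
ct.cols <- paste0("ct_c", ids$c, "_r", ids$r)  # smoothed control count columns

col.names <- c("chr", "start", "end", ip.cols, ct.cols, "commonPeak")
print(col.names)
```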

Examples

	# Describe the four samples: two replicates each for Helas3 and K562.
	# The example files ship with the ChIPComp package.
	conf=data.frame(
		SampleID=1:4,
		condition=c("Helas3","Helas3","K562","K562"),
		factor=c("H3k27ac","H3k27ac","H3k27ac","H3k27ac"),
		ipReads=system.file("extdata",c("Helas3.ip1.bed","Helas3.ip2.bed","K562.ip1.bed","K562.ip2.bed"),package="ChIPComp"),
		ctReads=system.file("extdata",c("Helas3.ct.bed","Helas3.ct.bed","K562.ct.bed","K562.ct.bed"),package="ChIPComp"),
		peaks=system.file("extdata",c("Helas3.peak.bed","Helas3.peak.bed","K562.peak.bed","K562.peak.bed"),package="ChIPComp")
	)
	conf$condition=factor(conf$condition)
	conf$factor=factor(conf$factor)
	# Recode the factors as 0-based integers, then build the
	# two-column design matrix (intercept plus condition indicator).
	design=as.data.frame(lapply(conf[,c("condition","factor")],as.numeric))-1
	design=as.data.frame(model.matrix(~condition,design))
	# Merge peaks and count reads in each candidate region.
	countSet=makeCountSet(conf,design,filetype="bed",species="hg19",binsize=1000)

ChIPComp documentation built on Nov. 8, 2020, 5:24 p.m.