chipseq_hooks: Read ChIP-Seq dataset


View source: R/read_data_hooks.R

Description

Read ChIP-Seq dataset

Usage

chipseq_hooks(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE)

Arguments

...

Please ignore; see the 'Details' section.

RESET

remove all hooks

READ.ONLY

please ignore

LOCAL

please ignore

Details

Unlike methylation data, which is always stored as a matrix, a ChIP-Seq dataset is stored as a list of peak regions in which each element corresponds to the peaks in one sample. In many cases there are ChIP-Seq datasets for multiple histone marks, and a given mark may not cover all the samples that were sequenced in, e.g., whole genome bisulfite sequencing or RNA-Seq. To import such a flexible data format, users need to define the following hook functions:

sample_id

This user-defined function returns a list of sample IDs given the name of a histone mark.

peak

This function should return a GRanges object containing the peaks for a given histone mark in a given sample. The GRanges object should preferably have a meta column named "density", which holds the density of the histone modification signal. (**Note: if you want to use the histone modification signals for quantitative analysis, please make sure they are properly normalized between samples.**)

chromHMM

This hook is optional. If chromatin segmentation by chromHMM is available, this hook can be defined as a function which accepts a sample ID as its argument and returns a GRanges object. The GRanges object should have a meta column named "states", which holds the chromatin states inferred by chromHMM.

chipseq_hooks$peak() must have two arguments, mark and sid, which are the name of the histone mark and the sample ID. It can also take more arguments, such as chromosomes.

As an example, assume the peak files are stored as path/$sample_id/$mark.bed. The hook functions can then be defined as:

  # here `qq` is from GetoptLong package which allows simple variable interpolation
  chipseq_hooks$sample_id = function(mark) {
      peak_files = scan(pipe(qq("ls path/*/@{mark}.bed")), what = "character")
      # "\\1" is the first captured group, i.e. the sample directory name
      sample_id = gsub("^path/(.*?)/.*$", "\\1", peak_files)
      return(sample_id)
  }

  # the ... is important because the epik package may pass additional arguments to this function
  chipseq_hooks$peak = function(mark, sid, ...) {
      peak_file = qq("path/@{sid}/@{mark}.bed")
      df = read.table(peak_file, sep = "\t", stringsAsFactors = FALSE)
      GRanges(seqnames = df[[1]], ranges = IRanges(df[[2]], df[[3]]), density = df[[5]])
  }  
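
The sample_id hook above shells out to `ls`, which is not available on every platform. The following is a minimal sketch of a base-R alternative, assuming the same path/$sample_id/$mark.bed layout. The helper name `list_sample_ids` and its `base_dir` argument are hypothetical and not part of epik:

```r
# Hypothetical helper (not part of epik): find the sample IDs for a mark
# without calling external commands.
list_sample_ids = function(mark, base_dir = "path") {
    # match base_dir/<sample_id>/<mark>.bed
    peak_files = Sys.glob(file.path(base_dir, "*", paste0(mark, ".bed")))
    # the sample ID is the name of the directory that holds the file
    basename(dirname(peak_files))
}

# Small demonstration on a temporary directory tree
demo_dir = file.path(tempdir(), "chipseq_demo")
for (sid in c("sample1", "sample2")) {
    dir.create(file.path(demo_dir, sid), recursive = TRUE, showWarnings = FALSE)
    file.create(file.path(demo_dir, sid, "H3K4me3.bed"))
}
file.create(file.path(demo_dir, "sample1", "H3K27ac.bed"))

ids_k4 = list_sample_ids("H3K4me3", base_dir = demo_dir)
ids_k27 = list_sample_ids("H3K27ac", base_dir = demo_dir)
```

With epik loaded, such a helper could be registered as `chipseq_hooks$sample_id = function(mark) list_sample_ids(mark)`.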

Normally chipseq_hooks$peak() is not used directly; it is usually called by get_peak_list to read the peaks from all samples as a list. You can also add more arguments when defining chipseq_hooks$peak(), and these arguments can be passed through from get_peak_list as well. For example, you can add the chromosome name as a third argument so that you do not need to read the full dataset at once:

  # to make it simple, let's assume it only allows one single chromosome
  chipseq_hooks$peak = function(mark, sid, chr) {
      peak_file = qq("path/@{sid}/@{mark}.bed")
      df = read.table(pipe(qq("awk '$1==\"@{chr}\"' @{peak_file}")), sep = "\t", stringsAsFactors = FALSE)
      GRanges(seqnames = df[[1]], ranges = IRanges(df[[2]], df[[3]]), density = df[[5]])
  }  

then you can call get_peak_list as:

  get_peak_list(mark, chr = "chr1")  
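
To see why an extra argument such as chr reaches the hook, the following minimal sketch mimics the forwarding with base R only. It is not epik's actual implementation of get_peak_list: `sample_id_hook`, `peak_hook` and `collect_peaks` are hypothetical stand-ins, and the hook returns a data frame rather than a GRanges object to keep the example free of dependencies.

```r
# Hypothetical stand-ins for chipseq_hooks$sample_id and chipseq_hooks$peak
sample_id_hook = function(mark) c("s1", "s2")

peak_hook = function(mark, sid, chr = NULL, ...) {
    # toy peaks; a real hook would read them from a file
    df = data.frame(chr = c("chr1", "chr1", "chr2"),
                    start = c(100, 500, 200),
                    end = c(200, 600, 300),
                    density = c(1.5, 2.0, 0.7),
                    stringsAsFactors = FALSE)
    # keep only the requested chromosome, if one was given
    if (!is.null(chr)) df = df[df$chr == chr, , drop = FALSE]
    df
}

# Conceptual "read all samples" function: every argument in ... is
# passed unchanged to the peak hook for each sample.
collect_peaks = function(mark, ...) {
    sids = sample_id_hook(mark)
    peaks = lapply(sids, function(sid) peak_hook(mark, sid, ...))
    names(peaks) = sids
    peaks
}

peaks_chr1 = collect_peaks("H3K4me3", chr = "chr1")
peaks_all = collect_peaks("H3K4me3")
```

Because the caller only forwards `...`, the hook alone decides which extra arguments it understands.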

chipseq_hooks$chromHMM() must have one argument, sid, which is the sample ID. It can also take more arguments, such as chromosomes. The additional arguments are used in the same way as for chipseq_hooks$peak().
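
A chromHMM hook could look like the following minimal sketch. The file layout path/$sample_id/chromHMM.bed, the column positions (state in the fourth column), the `base_dir` argument and the function name `read_chromHMM` are assumptions for illustration only; adapt them to how your segmentations are actually stored.

```r
library(GenomicRanges)

# Hypothetical reader (not part of epik): assumes a tab-separated BED-like
# file with chromosome, start, end and state in columns 1 to 4.
read_chromHMM = function(sid, chr = NULL, base_dir = "path", ...) {
    f = file.path(base_dir, sid, "chromHMM.bed")
    df = read.table(f, sep = "\t", stringsAsFactors = FALSE)
    # optionally restrict to the requested chromosome(s)
    if (!is.null(chr)) df = df[df[[1]] %in% chr, , drop = FALSE]
    # BED starts are 0-based while GRanges starts are 1-based;
    # drop the "+ 1" if your coordinates are already 1-based
    GRanges(seqnames = df[[1]],
            ranges = IRanges(df[[2]] + 1, df[[3]]),
            states = df[[4]])
}

# With epik loaded, the hook would be registered as:
# chipseq_hooks$chromHMM = read_chromHMM
```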

Value

Hook functions

Author(s)

Zuguang Gu <z.gu@dkfz.de>

See Also

get_peak_list, get_chromHMM_list

Examples

# There is no example
NULL

jokergoo/epik documentation built on Sept. 28, 2019, 9:20 a.m.