process_haib_caltech_wrap: Wrapper method for processing ENCODE HAIB and Caltech HTS...


Description

process_haib_caltech_wrap is a wrapper method for processing high-throughput sequencing (HTS) data. It returns the methylation promoter regions together with the gene expression data for the corresponding genes. The BS-Seq data must be in the ENCODE HAIB bed format and the RNA-Seq data in the ENCODE Caltech bed format.

Usage

process_haib_caltech_wrap(bs_files, rna_files, chrom_size_file = NULL,
  chr_discarded = NULL, upstream = -7000, downstream = 7000,
  min_bs_cov = 4, max_bs_cov = 1000, cpg_density = 10, sd_thresh = 0.1,
  ignore_strand = TRUE, gene_log2_transf = TRUE, gene_outl_thresh = TRUE,
  gex_outlier = 300, fmin = -1, fmax = 1)

Arguments

bs_files

Filename (or vector of filenames if there are replicates) of the BS-Seq '.bed' formatted data to read values from.

rna_files

Filename of the RNA-Seq '.bed' formatted data to read values from. Currently, this version does not support pooling RNA-Seq replicates.

chrom_size_file

Optional filename containing genome chromosome sizes.

chr_discarded

A vector with chromosome names to be discarded.

upstream

Integer defining the number of bp upstream of the TSS used to create the promoter region.

downstream

Integer defining the number of bp downstream of the TSS used to create the promoter region.
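To make the role of upstream and downstream concrete, here is a minimal base R sketch of how a promoter window could be derived from a TSS position. The data frame, its column names, and the strand handling are assumptions made for illustration; they are not taken from the BPRMeth source.

```r
# Hypothetical TSS table; column names are illustrative, not BPRMeth's.
tss <- data.frame(
  gene   = c("geneA", "geneB"),
  chr    = c("chr1", "chr1"),
  tss    = c(50000L, 120000L),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

upstream   <- -7000L  # default from Usage (a negative offset)
downstream <-  7000L  # default from Usage

# Assumption: on the "+" strand upstream means smaller coordinates,
# and the window is mirrored for genes on the "-" strand.
is_plus   <- tss$strand == "+"
tss$start <- ifelse(is_plus, tss$tss + upstream,   tss$tss - downstream)
tss$end   <- ifelse(is_plus, tss$tss + downstream, tss$tss - upstream)

print(tss)
```

With the symmetric defaults both strands give the same window; the mirroring only matters when the two offsets differ in magnitude.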

min_bs_cov

The minimum number of reads mapping to each CpG site. CpGs with fewer reads are considered noise and are discarded.

max_bs_cov

The maximum number of reads mapping to each CpG site. CpGs with more reads are considered noise and are discarded.
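The coverage filter described by min_bs_cov and max_bs_cov can be sketched in base R as follows. The toy data frame and its column names are hypothetical, and treating both bounds as inclusive is an assumption rather than documented behaviour.

```r
# Hypothetical per-CpG read counts; column names are illustrative only.
cpgs <- data.frame(
  chr         = "chr1",
  pos         = c(100L, 150L, 220L, 300L),
  total_reads = c(2L, 10L, 1500L, 40L),
  meth_reads  = c(1L, 7L, 900L, 10L)
)

min_bs_cov <- 4     # default from Usage
max_bs_cov <- 1000  # default from Usage

# Keep CpGs whose coverage lies within [min_bs_cov, max_bs_cov]
# (inclusive bounds are an assumption).
keep <- cpgs$total_reads >= min_bs_cov & cpgs$total_reads <= max_bs_cov
cpgs_filtered <- cpgs[keep, ]

print(cpgs_filtered)
```

Here the sites with 2 and 1500 reads are dropped, leaving the CpGs at positions 150 and 300.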

cpg_density

Optional integer defining the minimum number of CpGs that a methylated region must contain. Regions with fewer than cpg_density CpGs are discarded.

sd_thresh

Optional numeric defining the minimum standard deviation of the methylation change in a region. This is used to filter out regions with no methylation change.
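As an illustration of how cpg_density and sd_thresh could act together, the sketch below filters a list of toy regions by CpG count and by the standard deviation of their methylation levels. The list structure and the use of non-strict comparisons are assumptions; the package may store regions and apply the thresholds differently.

```r
# Hypothetical regions: each element holds the methylation levels of its CpGs.
regions <- list(
  geneA = c(0.1, 0.8, 0.5, 0.9, 0.2, 0.7, 0.3, 0.6, 0.4, 0.95, 0.05),
  geneB = c(0.2, 0.9, 0.5),  # too few CpGs
  geneC = rep(0.5, 12)       # enough CpGs but no variation
)

cpg_density <- 10   # default from Usage
sd_thresh   <- 0.1  # default from Usage

n_cpgs  <- vapply(regions, length, integer(1))
meth_sd <- vapply(regions, sd, numeric(1))

# Assumption: both thresholds are inclusive lower bounds.
keep <- n_cpgs >= cpg_density & meth_sd >= sd_thresh
regions_filtered <- regions[keep]

print(names(regions_filtered))
```

Only geneA survives: geneB has too few CpGs and geneC has zero standard deviation.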

ignore_strand

Logical, whether or not to ignore strand information.

gene_log2_transf

Logical, whether or not to log2 transform the gene expression data.

gene_outl_thresh

Logical, whether or not to remove outlier gene expression data.

gex_outlier

Numeric, denoting the threshold above which gene expression values (before the log2 transformation) are considered noise.
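The three expression-related arguments (gene_log2_transf, gene_outl_thresh and gex_outlier) can be pictured with the following base R sketch. The named vector is made-up data, and the pseudocount of 0.1 added before the log2 transform is an assumption used here to avoid log2(0); it is not stated in this documentation.

```r
# Hypothetical expression values on the original (untransformed) scale.
expr <- c(geneA = 12.5, geneB = 0, geneC = 450, geneD = 87.3)

gene_outl_thresh <- TRUE  # default from Usage
gex_outlier      <- 300   # default from Usage
gene_log2_transf <- TRUE  # default from Usage

# Outliers are removed on the raw scale, before any transformation.
if (gene_outl_thresh) {
  expr <- expr[expr <= gex_outlier]
}

# Assumption: a small pseudocount keeps zero expression finite after log2.
if (gene_log2_transf) {
  expr <- log2(expr + 0.1)
}

print(expr)
```

Applying the threshold first matters: 300 refers to untransformed values, so filtering after the log2 step would remove nothing.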

fmin

Optional minimum range value for region location scaling. In this version, this parameter should be left at its default value.

fmax

Optional maximum range value for region location scaling. In this version, this parameter should be left at its default value.
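One plausible reading of "region location scaling" is a linear (min-max) rescaling of genomic positions inside a region onto the interval [fmin, fmax]. The helper below, scale_to_range, is a hypothetical function written for this illustration and is not part of the documented API.

```r
# Hypothetical helper: linearly map x from [xmin, xmax] onto [fmin, fmax].
scale_to_range <- function(x, xmin, xmax, fmin = -1, fmax = 1) {
  (fmax - fmin) * (x - xmin) / (xmax - xmin) + fmin
}

# A promoter window of +/- 7000 bp around a TSS at position 50000.
positions <- c(43000, 50000, 57000)
scaled <- scale_to_range(positions, xmin = 43000, xmax = 57000)

print(scaled)  # -1  0  1
```

Under this reading the TSS sits at 0 and the window edges at -1 and 1, so regions of different genomic lengths share one coordinate system.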

Value

A processHTS object that contains the following information:

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk

Examples

# Obtain the path to the files
rrbs_file <- system.file("extdata", "rrbs.bed", package = "BPRMeth")
rnaseq_file <- system.file("extdata", "rnaseq.bed", package = "BPRMeth")
proc_data <- process_haib_caltech_wrap(rrbs_file, rnaseq_file)

andreaskapou/BPRMeth-devel documentation built on May 12, 2019, 3:32 a.m.