View source: R/deprecated_process_functions.R
(DEPRECATED) process_haib_caltech_wrap is a wrapper method for processing HTS data. It returns the methylation data for promoter regions together with the corresponding gene expression data for those regions. Note that the BS-Seq data should be in the ENCODE HAIB bed format and the RNA-Seq data in the ENCODE Caltech bed format.
process_haib_caltech_wrap(
bs_files,
rna_files,
chrom_size_file = NULL,
chr_discarded = NULL,
upstream = -7000,
downstream = 7000,
min_bs_cov = 4,
max_bs_cov = 1000,
cpg_density = 10,
sd_thresh = 0.1,
ignore_strand = TRUE,
gene_log2_transf = TRUE,
gene_outl_thresh = TRUE,
gex_outlier = 300,
fmin = -1,
fmax = 1
)
bs_files | Filename (or vector of filenames if there are replicates) of the BS-Seq '.bed' formatted data to read values from.
rna_files | Filename of the RNA-Seq '.bed' formatted data to read values from. Currently, this version does not support pooling RNA-Seq replicates.
chrom_size_file | Optional filename containing genome chromosome sizes.
chr_discarded | A vector with chromosome names to be discarded.
upstream | Integer defining the length of bp upstream of TSS for creating the promoter region.
downstream | Integer defining the length of bp downstream of TSS for creating the promoter region.
min_bs_cov | The minimum number of reads mapping to each CpG site. CpGs with fewer reads will be considered as noise and will be discarded.
max_bs_cov | The maximum number of reads mapping to each CpG site. CpGs with more reads will be considered as noise and will be discarded.
cpg_density | Optional integer defining the minimum number of CpGs that have to be in a methylated region. Regions with fewer than n CpGs are discarded.
sd_thresh | Optional numeric defining the minimum standard deviation of the methylation change in a region. This is used to filter regions with no methylation change.
ignore_strand | Logical, whether or not to ignore strand information.
gene_log2_transf | Logical, whether or not to log2 transform the gene expression data.
gene_outl_thresh | Logical, whether or not to remove outlier gene expression data.
gex_outlier | Numeric, denoting the threshold above which the gene expression data (before the log2 transformation) are considered as noise.
fmin | Optional minimum range value for region location scaling. Under this version, this parameter should be left at its default value.
fmax | Optional maximum range value for region location scaling. Under this version, this parameter should be left at its default value.
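The filtering arguments above can be illustrated with a small base-R sketch. This is not the BPRMeth implementation: the toy data, the helper keep_region, the use of the per-CpG methylation level (methylated / total) for the standard deviation, the strict comparison against sd_thresh, and the 0.1 pseudo-count before the log2 transform are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the BPRMeth implementation.
min_bs_cov <- 4
max_bs_cov <- 1000

# Hypothetical toy data: one row per CpG site in a single region.
cpgs <- data.frame(
  pos        = c(100, 150, 220, 300, 410),
  total      = c(2, 10, 1500, 40, 8),
  methylated = c(1, 7, 900, 4, 8)
)

# Coverage filter: keep CpGs with min_bs_cov <= reads <= max_bs_cov.
keep <- cpgs$total >= min_bs_cov & cpgs$total <= max_bs_cov
filtered <- cpgs[keep, ]

# Region filter (hypothetical helper): keep a region only if it has at
# least cpg_density CpGs and its methylation levels vary by more than
# sd_thresh. Using methylated / total as the level is an assumption.
keep_region <- function(region, cpg_density = 10, sd_thresh = 0.1) {
  meth_level <- region$methylated / region$total
  isTRUE(nrow(region) >= cpg_density && sd(meth_level) > sd_thresh)
}

region_default <- keep_region(filtered)                  # too few CpGs
region_relaxed <- keep_region(filtered, cpg_density = 3) # passes both checks

# Expression filter: drop values above gex_outlier, then log2 transform.
# The 0.1 pseudo-count is an assumption, added only to avoid log2(0).
gex_outlier <- 300
gex <- c(0.5, 12, 350, 80)
gex_kept <- gex[gex <= gex_outlier]
gex_log2 <- log2(gex_kept + 0.1)
```

With the default cpg_density of 10, the three-CpG toy region is rejected, which shows why small test inputs may need a lower value.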
A processHTS object which contains the following information:
methyl_region: A list containing methylation data, where each entry in the list is an L_{i} X 3 dimensional matrix, where L_{i} denotes the number of CpGs found in region i. The columns contain the following information:
1st column: Contains the locations of CpGs relative to TSS. Note that the actual locations are scaled to the (fmin, fmax) region.
2nd column: Contains the total reads of each CpG in the corresponding location.
3rd column: Contains the methylated reads of each CpG in the corresponding location.
gex: A vector containing the corresponding gene expression levels for each entry of the methyl_region list.
prom_region: A GRanges object containing corresponding annotated promoter regions for each entry of the methyl_region list. The GRanges object has one additional metadata column named tss, which stores the TSS of each promoter.
rna_data: A GRanges object containing the corresponding RNA-Seq data for each entry of the methyl_region list. The GRanges object has three additional metadata columns which are explained in read_rna_encode_caltech.
upstream: Integer defining the length of bp upstream of TSS.
downstream: Integer defining the length of bp downstream of TSS.
cpg_density: Integer defining the minimum number of CpGs that have to be in a methylated region. Regions with fewer than n CpGs are discarded.
sd_thresh: Numeric defining the minimum standard deviation of the methylation change in a region. This is used to filter regions with no methylation change.
fmin: Minimum range value for region location scaling.
fmax: Maximum range value for region location scaling.
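To make the shape of a methyl_region entry concrete, the following base-R sketch builds one L_{i} X 3 matrix by hand. It assumes that locations are rescaled with a plain min-max transform that maps the promoter bounds (upstream, downstream) onto (fmin, fmax). The helper scale_to_range, the column names, and the read counts are hypothetical, and the package's actual scaling may differ.

```r
# Illustrative sketch only; NOT the BPRMeth implementation.
# Hypothetical helper: min-max scaling of x from [xmin, xmax] to [fmin, fmax].
scale_to_range <- function(x, xmin, xmax, fmin = -1, fmax = 1) {
  (x - xmin) / (xmax - xmin) * (fmax - fmin) + fmin
}

upstream   <- -7000  # defaults taken from the usage above
downstream <- 7000

# Hypothetical CpG locations relative to the TSS (in bp).
rel_pos <- c(-7000, -3500, 0, 3500, 7000)
scaled  <- scale_to_range(rel_pos, upstream, downstream)

# One entry of methyl_region: an L_i x 3 matrix with the scaled location,
# the total reads, and the methylated reads of each CpG (toy counts).
region_i <- cbind(
  location    = scaled,
  total_reads = c(10, 40, 8, 25, 12),
  meth_reads  = c(7, 4, 8, 5, 6)
)
```

Under this assumption the TSS itself maps to 0 and the promoter boundaries map to fmin and fmax.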
C.A.Kapourani C.A.Kapourani@ed.ac.uk
# Obtain the path to the files
rrbs_file <- system.file("extdata", "rrbs.bed", package = "BPRMeth")
rnaseq_file <- system.file("extdata", "rnaseq.bed", package = "BPRMeth")
proc_data <- process_haib_caltech_wrap(rrbs_file, rnaseq_file)