GeneToPeakDist: Find the nearest ChIP-seq binding site to each gene in a GTF...


View source: R/GeneToPeakDist.R

Description

Find the nearest ChIP-seq binding site to each gene in a GTF file, with optional chromatin weighting from TAD boundaries and promoter looping data.

Usage

GeneToPeakDist(ChIP, GTF, Genes = NULL, TAD = NULL, TAD_Penalty = 100,
  PCHC = NULL, PCHC_Bonus = 100, numCores = 4)

Arguments

ChIP

A data frame containing ChIP-seq peak information in BED format, with at least three columns giving the chromosome, start position, and stop position of each binding site.
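The ChIP argument only needs chromosome, start, and stop in its first three columns. A minimal sketch of building such a data frame by hand and sanity-checking it, assuming the columns are read by position; the column names and the `check_bed_like()` helper are hypothetical (not part of the package), and the coordinates are made up.

```r
# Hypothetical helper: basic sanity checks for a BED-like peak table.
# Assumes columns 1-3 are chromosome, start, and stop, read by position.
check_bed_like <- function(peaks) {
  stopifnot(is.data.frame(peaks), ncol(peaks) >= 3)
  starts <- peaks[[2]]
  stops  <- peaks[[3]]
  if (!is.numeric(starts) || !is.numeric(stops)) {
    stop("columns 2 and 3 must be numeric start/stop positions")
  }
  if (any(stops < starts)) {
    stop("each stop position must be >= its start position")
  }
  invisible(TRUE)
}

# Three illustrative peaks (made-up coordinates).
ChIP <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(10000L, 52000L, 300500L),
  end   = c(10400L, 52350L, 300900L),
  stringsAsFactors = FALSE
)
check_bed_like(ChIP)
```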

GTF

A data frame containing gene location information in Gene Transfer Format (GTF). The function expects nine columns, with the chromosome in column 1, gene start and stop positions in columns 4 and 5, strand information in column 6, and gene ID information in column 9. Ensure that only rows corresponding to unique genes are included and that the chromosome column is formatted identically to the ChIP data frame.
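Raw Ensembl GTF files contain many rows per gene (transcripts, exons, and so on) and name chromosomes without a `chr` prefix (`1` rather than `chr1`), whereas many BED peak files use the prefix. A minimal sketch of trimming a GTF table to gene-level rows and harmonizing chromosome names, assuming the table keeps the standard nine-column order and that your peaks use `chr`-prefixed names; `prepare_gtf()` is a hypothetical helper and the toy rows are illustrative only.

```r
# Hypothetical helper: keep one row per gene and add a "chr" prefix to
# chromosome names that lack one. Note: Ensembl "MT" becomes "chrMT" here,
# not UCSC-style "chrM"; map that separately if your peaks need it.
prepare_gtf <- function(gtf, add_chr_prefix = TRUE) {
  stopifnot(is.data.frame(gtf), ncol(gtf) >= 9)
  genes <- gtf[gtf[[3]] == "gene", , drop = FALSE]        # gene-level rows only
  genes <- genes[!duplicated(genes[[9]]), , drop = FALSE]  # drop repeated genes
  chr <- as.character(genes[[1]])
  if (add_chr_prefix) {
    needs_prefix <- !startsWith(chr, "chr")
    chr[needs_prefix] <- paste0("chr", chr[needs_prefix])
  }
  genes[[1]] <- chr
  rownames(genes) <- NULL
  genes
}

# Toy GTF rows: one transcript row and one duplicated gene row to be removed.
raw <- data.frame(
  V1 = c("1", "1", "1", "2"), V2 = "ensembl",
  V3 = c("gene", "transcript", "gene", "gene"),
  V4 = c(11000L, 11000L, 11000L, 5000L), V5 = c(14000L, 14000L, 14000L, 9000L),
  V6 = ".", V7 = c("+", "+", "+", "-"), V8 = ".",
  V9 = c('gene_id "GENE_A";', 'gene_id "GENE_A"; transcript_id "TX_A1";',
         'gene_id "GENE_A";', 'gene_id "GENE_B";'),
  stringsAsFactors = FALSE
)
genes <- prepare_gtf(raw)
genes
```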

Genes

A vector of gene IDs for which the function calculates the distance to the nearest ChIP-seq binding site. If NULL, all genes present in the provided GTF data are used. Defaults to NULL.
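Because gene IDs sit inside the free-text attribute column (column 9), it can help to check which requested IDs actually occur in the GTF before passing Genes. A minimal sketch, assuming the standard `gene_id "ID";` attribute syntax; `extract_gene_id()` is a hypothetical helper and the attribute strings are toy values.

```r
# Hypothetical helper: pull the gene_id value out of GTF attribute strings.
# Returns NA for strings that contain no gene_id field.
extract_gene_id <- function(attributes) {
  ids <- sub('.*gene_id "([^"]+)".*', "\\1", attributes)
  ids[!grepl('gene_id "', attributes, fixed = TRUE)] <- NA_character_
  ids
}

attrs <- c('gene_id "ENSG00000186092"; gene_source "ensembl";',
           'gene_id "ENSG00000237683"; gene_source "ensembl";',
           'gene_source "ensembl";')
ids <- extract_gene_id(attrs)

# Keep only requested IDs that appear in this (toy) annotation.
Genes   <- c("ENSG00000186092", "ENSG00000235249")
present <- Genes[Genes %in% ids]
present
```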

TAD

A data frame containing topologically associating domain (TAD) boundaries, with at least three columns giving the chromosome, start position, and stop position of each TAD. Defaults to NULL.

TAD_Penalty

A numeric value giving the distance penalty applied to binding sites that fall outside a gene's TAD. Only used if TAD is provided. Defaults to 100.
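This page does not state exactly how TAD_Penalty enters the distance calculation. One plausible scheme, shown purely as an assumption and not necessarily what GeneToPeakDist does, multiplies the raw distance by the penalty whenever the peak and the gene fall in different TADs (or outside any TAD). `tad_index()` and `penalized_distance()` are hypothetical helpers, and the gene is reduced to a single position for simplicity.

```r
# Hypothetical helper: row of `tads` whose interval contains `pos` on `chr`,
# or NA if none does. Columns 1-3 are read as chromosome, start, stop.
tad_index <- function(chr, pos, tads) {
  hit <- which(tads[[1]] == chr & tads[[2]] <= pos & tads[[3]] >= pos)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Assumed weighting: multiply the distance by `penalty` unless the gene
# position and the peak position share a TAD.
penalized_distance <- function(chr, gene_pos, peak_pos, tads, penalty = 100) {
  raw <- abs(peak_pos - gene_pos)
  gene_tad <- tad_index(chr, gene_pos, tads)
  peak_tad <- tad_index(chr, peak_pos, tads)
  same_tad <- !is.na(gene_tad) && !is.na(peak_tad) && gene_tad == peak_tad
  if (same_tad) raw else raw * penalty
}

tads <- data.frame(chr = "chr1", start = c(1L, 100001L), end = c(100000L, 250000L),
                   stringsAsFactors = FALSE)
penalized_distance("chr1", 90000, 95000, tads)   # same TAD: 5000
penalized_distance("chr1", 90000, 105000, tads)  # crosses a boundary: 15000 * 100
```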

PCHC

A data frame containing promoter looping information. The function expects a five-column data frame with the bait gene ENSEMBL ID, the capture chromosome, the capture start position, the capture stop position, and an interaction frequency metric. Ensure the capture chromosome column is formatted identically to the ChIP data frame. Defaults to NULL.

PCHC_Bonus

A numeric value giving the reward applied to ChIP peaks that fall in regions looping into a gene's promoter. Only used if PCHC is provided. Defaults to 100.
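As with the TAD penalty, the exact use of PCHC_Bonus is not spelled out here. A minimal sketch of one possible reward, stated as an assumption rather than the package's formula: if a peak overlaps a capture region whose bait is the gene in question, its distance is divided by the bonus. It follows the five-column PCHC layout described above; `loops_to_gene()` and `rewarded_distance()` are hypothetical helpers and the looping data is a toy example.

```r
# Hypothetical helper: does the peak overlap any capture region that loops to
# this gene's promoter? PCHC columns: 1 bait gene ID, 2 capture chromosome,
# 3 capture start, 4 capture stop, 5 interaction frequency (unused here).
loops_to_gene <- function(gene_id, peak_chr, peak_start, peak_end, pchc) {
  any(pchc[[1]] == gene_id &
      pchc[[2]] == peak_chr &
      pchc[[3]] <= peak_end &
      pchc[[4]] >= peak_start)
}

# Assumed reward: divide the raw distance by `bonus` for looped peaks.
rewarded_distance <- function(raw_distance, looped, bonus = 100) {
  if (looped) raw_distance / bonus else raw_distance
}

pchc <- data.frame(bait = "GENE_A", chr = "chr1", start = 200000L,
                   end = 205000L, freq = 7.5, stringsAsFactors = FALSE)
looped <- loops_to_gene("GENE_A", "chr1", 201000L, 201400L, pchc)
rewarded_distance(150000, looped)
```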

numCores

A numeric indicating the number of cores the function can use via R's parallel package. Defaults to 4.
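The specific parallel call GeneToPeakDist makes is not documented on this page. This sketch shows one common pattern from R's parallel package, `mclapply()` with `mc.cores`, capped at the number of detected cores and falling back to one core on Windows, where `mclapply()` does not support `mc.cores > 1`. `pick_cores()` and the toy positions are hypothetical.

```r
library(parallel)

# Hypothetical helper: honor a requested core count without exceeding the
# machine, and use a single core on Windows (mclapply relies on forking).
pick_cores <- function(numCores = 4) {
  if (.Platform$OS.type == "windows") return(1L)
  available <- detectCores()
  if (is.na(available)) available <- 1L
  as.integer(max(1L, min(numCores, available)))
}

# Toy per-gene task: distance from each gene position to the nearest peak midpoint.
gene_pos  <- c(GENE_A = 1000, GENE_B = 50000)
peak_mids <- c(1200, 48000, 90000)
dists <- mclapply(names(gene_pos),
                  function(g) min(abs(peak_mids - gene_pos[[g]])),
                  mc.cores = pick_cores(4))
unlist(dists)
```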

Value

Returns a data frame with two columns: the ENSEMBL gene ID and the distance to the nearest ChIP-seq binding site.
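For orientation, a minimal unweighted sketch of producing a two-column result of this shape. It assumes distance is measured between the gene body (columns 4 and 5) and the nearest peak edge on the same chromosome, with 0 for overlap and Inf when the chromosome has no peaks; the package may measure from a different anchor and also applies the TAD and PCHC weights described above. `nearest_peak_distance()` and the output column names `gene` and `distance` are hypothetical.

```r
# Hypothetical, unweighted baseline: gap between each gene body and the
# closest peak on the same chromosome.
nearest_peak_distance <- function(genes, gene_ids, peaks) {
  d <- vapply(seq_len(nrow(genes)), function(i) {
    on_chr <- peaks[[1]] == genes[[1]][i]
    if (!any(on_chr)) return(Inf)
    p_start <- peaks[[2]][on_chr]
    p_end   <- peaks[[3]][on_chr]
    # Positive only when the peak lies entirely right or left of the gene.
    gap <- pmax(0, p_start - genes[[5]][i], genes[[4]][i] - p_end)
    min(gap)
  }, numeric(1))
  data.frame(gene = gene_ids, distance = d, stringsAsFactors = FALSE)
}

# Toy GTF-style gene table (columns 1-5 only) and toy peaks.
genes <- data.frame(V1 = c("chr1", "chr2"), V2 = ".", V3 = "gene",
                    V4 = c(10000L, 5000L), V5 = c(20000L, 8000L),
                    stringsAsFactors = FALSE)
peaks <- data.frame(chr = c("chr1", "chr1"), start = c(25000L, 2000L),
                    end = c(25500L, 2500L), stringsAsFactors = FALSE)
res <- nearest_peak_distance(genes, c("GENE_A", "GENE_B"), peaks)
res
```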


Examples

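data("GM12878_BATF_ChIP")               # example ChIP-seq peaks
data("Homo_sapiens.GRCh37.82.chr.gtf")  # example gene annotation
# GTF has no default and must be supplied; assumes each data() call creates an object named after its dataset
DistanceFrame <- GeneToPeakDist(ChIP = GM12878_BATF_ChIP, GTF = Homo_sapiens.GRCh37.82.chr.gtf,
                                Genes = c("ENSG00000186092", "ENSG00000237683", "ENSG00000235249"))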

rramaker/GRNF documentation built on May 20, 2019, 2:23 p.m.