supported_locusdefs: Display supported locus definitions


View source: R/supported.R

Description

The locus definitions are defined as below. For advice on selecting a locus definition, see the 'Selecting A Locus Definition' section below.

nearest_tss:

The locus is the region spanning the midpoints between the TSSs of adjacent genes.
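As an illustration of the midpoint rule, here is a minimal base R sketch. It is not the package's implementation: the gene names, TSS positions, and chromosome length are hypothetical, each gene is given a single TSS, and the outermost loci are assumed to extend to the chromosome ends.

```r
# Toy TSS positions on one chromosome (hypothetical values, one TSS per gene).
tss <- data.frame(
  gene = c("A", "B", "C"),
  pos  = c(1000, 5000, 12000),
  stringsAsFactors = FALSE
)
tss <- tss[order(tss$pos), ]

chrom_start <- 1
chrom_end   <- 20000  # assumed chromosome length

# Midpoints between consecutive TSSs serve as the locus boundaries.
mids <- (head(tss$pos, -1) + tail(tss$pos, -1)) / 2

# Assumption: the first and last loci run out to the chromosome ends.
loci <- data.frame(
  gene  = tss$gene,
  start = c(chrom_start, mids),
  end   = c(mids, chrom_end),
  stringsAsFactors = FALSE
)
print(loci)
```

In this sketch adjacent loci share a boundary coordinate; a real definition would offset one side by a base so the intervals are disjoint.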

nearest_gene:

The locus is the region spanning the midpoints between the boundaries of each gene, where a gene is defined as the region between the furthest upstream TSS and furthest downstream TES for that gene. If two gene loci overlap each other, we take the midpoint of the overlap as the boundary between the two loci. When a gene locus is completely nested within another, we create a disjoint set of 3 intervals, where the outermost gene is separated into 2 intervals broken apart at the endpoints of the nested gene.

1kb:

The locus is the region within 1kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 2kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.
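The windowing rule with midpoint clipping can be sketched in base R as follows. This is a simplified illustration, not the package's code: the positions are hypothetical and each gene is assumed to have exactly one TSS, whereas the definition above covers all TSSs of a gene.

```r
# Hypothetical sorted TSS positions, one per gene.
genes <- c("A", "B", "C")
pos   <- c(2000, 3500, 12000)
flank <- 1000  # 1kb on either side of the TSS

# Midpoints between adjacent TSSs cap how far a window may extend.
mids        <- (head(pos, -1) + tail(pos, -1)) / 2
left_bound  <- c(-Inf, mids)
right_bound <- c(mids, Inf)

loci <- data.frame(
  gene  = genes,
  start = pmax(pos - flank, left_bound),   # clip at the left midpoint
  end   = pmin(pos + flank, right_bound),  # clip at the right midpoint
  stringsAsFactors = FALSE
)
print(loci)
```

Genes A and B sit 1.5kb apart, so their windows meet at the midpoint (2750) instead of overlapping, while gene C keeps its full 2kb window.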

1kb_outside_upstream:

The locus is the region more than 1kb upstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.

1kb_outside:

The locus is the region more than 1kb upstream or downstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.
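One way to read this definition is as the part of each gene's midpoint-bounded region that lies beyond the 1kb window. The base R sketch below follows that reading. It is an interpretation with hypothetical positions, one TSS per gene, and an assumed chromosome length, not the package's implementation.

```r
# Hypothetical sorted TSS positions, one per gene.
genes <- c("A", "B", "C")
pos   <- c(2000, 3500, 12000)
flank <- 1000
chrom_start <- 1
chrom_end   <- 20000  # assumed chromosome length

mids        <- (head(pos, -1) + tail(pos, -1)) / 2
left_bound  <- c(chrom_start, mids)
right_bound <- c(mids, chrom_end)

# Candidate pieces on each side of the TSS, beyond the 1kb flank.
left_piece  <- data.frame(gene = genes, start = left_bound, end = pos - flank,
                          stringsAsFactors = FALSE)
right_piece <- data.frame(gene = genes, start = pos + flank, end = right_bound,
                          stringsAsFactors = FALSE)

outside <- rbind(left_piece, right_piece)
# Drop pieces that vanish because the neighbouring TSS is too close.
outside <- outside[outside$start < outside$end, ]
outside <- outside[order(outside$start), ]
rownames(outside) <- NULL
print(outside)
```

Genes whose neighbours lie within 2kb lose the piece on that side entirely, which is why gene A has no right-hand interval and gene B has no left-hand interval here.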

5kb:

The locus is the region within 5kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 10kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.

5kb_outside_upstream:

The locus is the region more than 5kb upstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.

5kb_outside:

The locus is the region more than 5kb upstream or downstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.

10kb:

The locus is the region within 10kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 20kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.

10kb_outside_upstream:

The locus is the region more than 10kb upstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.

10kb_outside:

The locus is the region more than 10kb upstream or downstream of a TSS, extending to the midpoint between that TSS and the adjacent TSS.

exon:

Each gene has multiple loci corresponding to its exons. Overlaps between different genes are allowed.

intron:

Each gene has multiple loci corresponding to its introns. Overlaps between different genes are allowed.

Usage

supported_locusdefs()

Value

A data.frame with columns genome, locusdef.
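Because the result is an ordinary data.frame, it can be filtered with standard subsetting. The sketch below lists the locus definitions for one genome. The fallback table is a toy stand-in with hypothetical rows, used only so the snippet runs when chipenrich is not installed.

```r
# Use the real table if chipenrich is available; otherwise fall back to a
# toy stand-in with the same two columns (rows are illustrative only).
defs <- if (requireNamespace("chipenrich", quietly = TRUE)) {
  chipenrich::supported_locusdefs()
} else {
  data.frame(
    genome   = c("hg19", "hg19", "mm10"),
    locusdef = c("nearest_tss", "1kb", "nearest_tss"),
    stringsAsFactors = FALSE
  )
}

# Locus definitions listed for a given genome build.
hg19_defs <- unique(as.character(defs$locusdef[defs$genome == "hg19"]))
print(hg19_defs)
```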

Selecting A Locus Definition

For a transcription factor ChIP-seq experiment, selecting a particular locus definition for use in enrichment testing can have implications relating to how the TF regulates genes. For example, selecting the '1kb' locus definition will imply that the biological processes found enriched are a result of TF regulation near the promoter. In contrast, selecting the '5kb_outside' locus definition will imply that the biological processes found enriched are a result of TF regulation distal from the promoter.

Selecting a locus definition can also help reduce the noise in the enrichment tests. For example, if a TF is known to primarily regulate genes by binding around the promoter, then selecting the '1kb' locus definition can help to reduce the noise from TSS-distal peaks in the enrichment testing.

The plot_dist_to_tss QC plot displays where genomic regions fall relative to TSSs genome-wide, and can help inform the choice of locus definition. For example, if many peaks fall far from the TSS, the 'nearest_tss' locus definition may be a good choice because it will capture all input genomic regions, whereas the '1kb' locus definition may not capture many of the input genomic regions and adversely affect the enrichment testing.
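The reasoning in this paragraph can be approximated numerically by checking what fraction of peaks lies within a given distance of the nearest TSS. The base R sketch below is a simplified stand-in for that idea, not a reimplementation of plot_dist_to_tss. All positions are hypothetical, and the helper frac_within is introduced here for illustration only.

```r
# Hypothetical TSS positions and peak midpoints on one chromosome.
tss_pos  <- c(2000, 3500, 12000)
peak_mid <- c(1900, 2600, 6000, 9000, 12500, 18000)

# Distance from each peak to its nearest TSS.
dist_to_tss <- vapply(
  peak_mid,
  function(p) min(abs(p - tss_pos)),
  numeric(1)
)

# Hypothetical helper: share of peaks within `cutoff` bases of a TSS.
frac_within <- function(d, cutoff) mean(d <= cutoff)

frac_1kb <- frac_within(dist_to_tss, 1000)
frac_5kb <- frac_within(dist_to_tss, 5000)
print(c(within_1kb = frac_1kb, within_5kb = frac_5kb))
```

If only a small share of peaks falls within 1kb, a definition such as '1kb' would leave most regions unassigned, which points toward 'nearest_tss' or one of the wider definitions.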

Examples

supported_locusdefs()

chipenrich documentation built on Nov. 8, 2020, 8:11 p.m.