annotate_cds_by_site: Add feature data columns to fData

View source: R/utils.R

annotate_cds_by_site    R Documentation

Add feature data columns to fData

Description

Annotate the sites of your CDS with feature data based on coordinate overlap.

Usage

annotate_cds_by_site(
  cds,
  feature_data,
  verbose = FALSE,
  maxgap = 0,
  all = FALSE,
  header = FALSE
)

Arguments

cds

A CDS object.

feature_data

A data frame, or a character path to a file of feature data. If a path, the file should be tab separated. By default the file is assumed to have no header; if your file has a header, set header = TRUE. In either case, the data should be in BED-like format, with the first 3 columns containing chromosome, start and stop respectively. The remaining columns will be added to the fData table as feature data.

verbose

Logical, should progress messages be printed?

maxgap

The maximum number of base pairs allowed between the peak and the feature for the feature and peak to be considered overlapping. Default = 0 (overlapping). Details in findOverlaps-methods. If maxgap is set to "nearest" then the nearest feature will be assigned regardless of distance.

all

Logical, should all overlapping intervals be reported? If all is FALSE, the largest overlap is reported.

header

Logical, if reading a file, is there a header?

Details

annotate_cds_by_site adds columns to the fData table of a CDS object based on the overlap of its peaks with features supplied in a data frame or file. An "overlap" column is added, along with every column beyond the three required columns of the feature data. The "overlap" column gives the number of base pairs of the feature that overlap the fData site. When maxgap is used, the true overlap is still calculated, so overlap will be 0 if the feature and the site overlap only because of maxgap. NA means that there was no overlapping feature. If a peak overlaps multiple feature intervals and all is FALSE, the interval with the largest overlap is chosen (in a tie, the first entry is taken); otherwise all overlapping intervals are reported, with their annotations collapsed using a comma as a separator.
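The selection rule described above can be illustrated with a small base R sketch. This is not cicero's implementation: annotate_site is a hypothetical helper, the start, end and type column names are made up for the illustration, coordinates are assumed to be closed (inclusive) intervals, and the sketch also collapses the overlap values with commas when all is TRUE, which is an assumption about the output format.

```r
# Hypothetical helper illustrating the rule; NOT part of cicero.
annotate_site <- function(site_start, site_end, features, all = FALSE) {
  # Overlap in base pairs, assuming inclusive coordinates (an assumption);
  # non-overlapping features are clamped to 0.
  ov <- pmax(0, pmin(site_end, features$end) -
                pmax(site_start, features$start) + 1)
  hits <- which(ov > 0)

  # No overlapping feature: report NA
  if (length(hits) == 0) {
    return(list(overlap = NA_character_, type = NA_character_))
  }

  if (!all) {
    # which.max() returns the first maximum, so ties go to the first entry
    hits <- hits[which.max(ov[hits])]
  }

  # With all = TRUE, every hit is kept and collapsed with commas
  list(overlap = paste(ov[hits], collapse = ","),
       type    = paste(features$type[hits], collapse = ","))
}

features <- data.frame(start = c(100, 150, 400),
                       end   = c(199, 249, 500),
                       type  = c("A", "B", "C"),
                       stringsAsFactors = FALSE)

best  <- annotate_site(120, 229, features)              # tie between A and B
every <- annotate_site(120, 229, features, all = TRUE)  # both reported
none  <- annotate_site(300, 350, features)              # no overlap
```

In this toy data the site at 120 to 229 overlaps the first two features equally, so the default call keeps only the first one, while all = TRUE reports both.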

Value

A CDS object with updated fData table.

Examples

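  # Load the example data shipped with cicero and build a CDS
  data("cicero_data")
  input_cds <- make_atac_cds(cicero_data, binarize = TRUE)

  # Bed-like feature data: chromosome, start, stop, then annotation columns
  feat <- data.frame(chr = c("chr18", "chr18", "chr18", "chr18"),
                     bp1 = c(10000, 10800, 50000, 100000),
                     bp2 = c(10700, 11000, 60000, 110000),
                     type = c("Acetylated", "Methylated", "Acetylated",
                              "Methylated"))

  # Adds "overlap" and "type" columns to the fData table
  input_cds <- annotate_cds_by_site(input_cds, feat)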
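
A further sketch, using the same toy features, shows the file-path form of feature_data together with header, maxgap and all. It assumes cicero is installed, that fData() is available once cicero is attached, and that a temporary tab-separated file is an acceptable stand-in for a real feature file; the maxgap value of 100 is arbitrary.

```r
library(cicero)

data("cicero_data")
input_cds <- make_atac_cds(cicero_data, binarize = TRUE)

feat <- data.frame(chr = c("chr18", "chr18", "chr18", "chr18"),
                   bp1 = c(10000, 10800, 50000, 100000),
                   bp2 = c(10700, 11000, 60000, 110000),
                   type = c("Acetylated", "Methylated", "Acetylated",
                            "Methylated"))

# Write the features to a tab-separated file that includes a header row
feat_file <- tempfile(fileext = ".bed")
write.table(feat, feat_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# The file has a header, so header = TRUE; features within 100 bp of a
# peak count as overlapping, and all overlapping features are reported
annotated <- annotate_cds_by_site(input_cds, feat_file,
                                  header = TRUE,
                                  maxgap = 100,
                                  all = TRUE)

# Inspect the added columns
head(fData(annotated))
```

Peaks that match a feature only because of maxgap are expected to show an overlap of 0, and peaks with no matching feature show NA, as described in Details.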


cole-trapnell-lab/cicero-release documentation built on Sept. 4, 2024, 1:49 p.m.