get_conservation_scores: Get conservation scores

View source: R/conservation.R

get_conservation_scores	R Documentation

Get conservation scores

Description

Given a set of peaks, this function divides a window around each peak into bins and obtains the mean conservation score in each bin.

Usage

get_conservation_scores(
  peak_file,
  id_col = 1,
  cons_score = phastCons7way.UCSC.hg38::phastCons7way.UCSC.hg38,
  bin_width = 20,
  window_width = 2000,
  n_bins = NULL,
  n_bins_exp = 2,
  merge_fun = "mean",
  random_control = TRUE,
  genome = "hg38",
  mask = NULL,
  per.chromosome = FALSE,
  summarise = TRUE
)

Arguments

peak_file

Path to the peak file, or a GRanges object.

id_col

Number or name of the metadata column (mcols) that contains a unique identifier for each peak. Default: 1.

cons_score

Annotation (GScores object) from which to obtain the conservation scores. Default: phastCons7way.UCSC.hg38. See gscores for details.

bin_width

Width, in base pairs, of the bins into which each peak window is divided to obtain the conservation scores. Ignored if the Fixed Number of Bins mode is activated (see Details). Default: 20.

window_width

Width, in base pairs, of the window centered on the peak center in which to perform the conservation analysis. Ignored if the Fixed Number of Bins mode is activated (see Details). Default: 2000.

n_bins

Integer with the number of bins into which each region should be divided. Setting a value activates the Fixed Number of Bins mode (see Details). Default: NULL.

n_bins_exp

If n_bins is set, factor by which to resize the input peaks. Default: 2 (output region width = peak width * 2).

merge_fun

Function used to summarize the scores that fall within the same region (bin). Default: "mean".

random_control

Logical indicating whether to perform the same analysis on a randomized set of peaks as a control. Default: TRUE.

genome

Character indicating the name of the genome in which to randomize the regions. Default: "hg38". See randomizeRegions for details.

mask

Regions in which the randomized peaks cannot be placed. Default: NULL. See randomizeRegions for details.

per.chromosome

Logical indicating whether the randomization should place each randomized region on the same chromosome as the original peak. Default: FALSE. See randomizeRegions for details.

summarise

Logical indicating whether to summarise the results by computing the mean across all peaks at each position. Default: TRUE.

Details

This function has two modes to calculate conservation scores in a set of regions: fixed window sizes (default) or fixed number of bins.

  • Fixed window sizes. Each region is resized to window_width bp around its center and then divided into bins of bin_width bp.

  • Fixed number of bins. Each region is resized by a factor of n_bins_exp and then divided into n_bins bins, independently of its width.
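The two modes can be sketched in base R for a single peak. The helpers bin_fixed_window() and bin_fixed_number() below are hypothetical and not part of meowmics; they assume 1-based, inclusive coordinates, that window_width is a multiple of bin_width, and (for the second mode) that the peak is widened symmetrically around its center. They only illustrate how bin boundaries could be computed, not how the package itself does it.

```r
# Hypothetical helpers (not part of meowmics) illustrating the two binning
# modes for a single peak. Coordinates are 1-based and inclusive.

# Fixed window sizes: take a window_width bp window centered on the peak
# center and split it into consecutive bins of bin_width bp.
# Assumes window_width is a multiple of bin_width.
bin_fixed_window <- function(start, end, window_width = 2000, bin_width = 20) {
  center <- (start + end) %/% 2
  win_start <- center - window_width %/% 2
  n <- window_width %/% bin_width
  bin_start <- win_start + (seq_len(n) - 1) * bin_width
  data.frame(bin = seq_len(n),
             start = bin_start,
             end = bin_start + bin_width - 1,
             offset = bin_start - center)  # bin start relative to peak center
}

# Fixed number of bins: widen the peak by n_bins_exp (symmetrically, an
# assumption) and split it into n_bins bins whose width depends on the
# peak width.
bin_fixed_number <- function(start, end, n_bins, n_bins_exp = 2) {
  width <- end - start + 1
  new_width <- round(width * n_bins_exp)
  new_start <- start - (new_width - width) %/% 2
  new_end <- new_start + new_width - 1
  # n_bins + 1 breakpoints; rounding spreads any remainder across the bins
  breaks <- round(seq(new_start, new_end + 1, length.out = n_bins + 1))
  data.frame(bin = seq_len(n_bins),
             start = breaks[-length(breaks)],
             end = breaks[-1] - 1)
}

# Example: a 200 bp peak spanning positions 1001-1200
head(bin_fixed_window(1001, 1200), 3)
bin_fixed_number(1001, 1200, n_bins = 10)
```

In the first mode every peak yields the same number of equally sized bins (window_width / bin_width, that is 100 with the defaults); in the second, every peak yields n_bins bins whose width scales with the width of the peak.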

Value

A data.frame containing the summarized conservation scores (phastCons by default) for the input peaks and, if random_control = TRUE, for the randomized control, at each position relative to the peak center. Scores are summarized by computing the mean across all peaks at each position; the standard deviation is also returned.
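The summarization described above (mean and standard deviation across peaks at each position) can be sketched in base R. The helper summarise_positions() and its output column names are hypothetical and need not match what get_conservation_scores returns; the sketch assumes the per-bin scores have already been collected into a matrix with one row per peak and one column per position, and the scores below are made up for illustration.

```r
# Hypothetical helper (not part of meowmics): summarise a peaks x positions
# matrix of per-bin scores into one row per position.
summarise_positions <- function(scores, position, type) {
  data.frame(type = type,
             position = position,
             mean = colMeans(scores, na.rm = TRUE),          # mean across peaks
             sd = apply(scores, 2, sd, na.rm = TRUE),        # spread across peaks
             row.names = NULL)
}

# Made-up scores for 2 peaks and 2 randomized control regions at 3 positions
position <- c(-20, 0, 20)
peaks <- matrix(c(0.9, 0.8, 0.7,
                  0.5, 0.6, 0.3), nrow = 2, byrow = TRUE)
control <- matrix(c(0.1, 0.2, 0.3,
                    0.3, 0.2, 0.1), nrow = 2, byrow = TRUE)

# Stack peaks and control so they can be compared position by position
res <- rbind(summarise_positions(peaks, position, "peaks"),
             summarise_positions(control, position, "random"))
res
```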


mireia-bioinfo/meowmics documentation built on July 29, 2023, 10 p.m.