generate_cicero_models: Generate cicero models


View source: R/runCicero.R

Description

Function to generate graphical lasso models on all sites in a CDS object within overlapping genomic windows.

Usage

generate_cicero_models(
  cds,
  distance_parameter,
  s = 0.75,
  window = 5e+05,
  max_elements = 200,
  genomic_coords = cicero::human.hg19.genome
)

Arguments

cds

A cicero CDS object generated using make_cicero_cds.

distance_parameter

Distance-based penalty parameter value. Generally set to the mean of the distance_parameter values calculated by estimate_distance_parameter.

s

Power law value. See details.

window

Size of the genomic window to query, in base pairs.

max_elements

Maximum number of elements per window allowed. Prevents very large models from slowing performance.

genomic_coords

Either a data frame or a path (character) to a file with chromosome lengths. The data should have two columns: the first is the chromosome name (e.g. "chr1") and the second is the chromosome length in base pairs. See data(human.hg19.genome) for an example. If a file is given, it should be tab-separated and have no header.
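As a minimal sketch of the two accepted forms, the following base R code builds a small chromosome-length table and writes it out as a tab-separated file without a header. The chromosome lengths are made-up values for illustration, and the column names V1 and V2 are assumed from the way the Examples section subsets human.hg19.genome.

```r
# Illustrative chromosome lengths (made-up values, not a real assembly).
genome_df <- data.frame(
  V1 = c("chr1", "chr2"),     # chromosome name
  V2 = c(1234567, 765432),    # chromosome length in base pairs
  stringsAsFactors = FALSE
)

# Write the same table as a tab-separated file with no header row.
genome_file <- tempfile(fileext = ".txt")
write.table(genome_df, genome_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout: two columns, name then length.
read_back <- read.table(genome_file, sep = "\t", header = FALSE,
                        stringsAsFactors = FALSE)
```

Either genome_df or genome_file could then be supplied as genomic_coords.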

Details

The purpose of this function is to compute the raw covariances between each pair of sites within overlapping windows of the genome. Within each window, the function then estimates a regularized correlation matrix using the graphical LASSO (Friedman et al., 2008), penalizing pairs of distant sites more than proximal sites. The scaling parameter, distance_parameter, in combination with the power law value s determines the distance-based penalty.
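To make the idea of overlapping windows concrete, here is a small base R sketch that tiles a single chromosome. The helper make_windows is hypothetical, and the half-window step between consecutive windows is an assumption made for illustration; this page does not state how far adjacent windows overlap.

```r
# Hypothetical helper: tile one chromosome with overlapping windows.
# The step of window / 2 is an assumption, not taken from this page.
make_windows <- function(chr_length, window = 5e5) {
  starts <- seq(from = 1, to = chr_length, by = window / 2)
  data.frame(start = starts, end = starts + window - 1)
}

# Tile an illustrative 2 Mb chromosome with the default 500 kb window.
wins <- make_windows(chr_length = 2e6)
head(wins, 3)
```

Under this assumption every site falls inside more than one window, which is why the per-window results need to be reconciled afterwards.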

The parameter s is a constant that captures the power-law distribution of contact frequencies between different locations in the genome as a function of their linear distance. For a complete discussion of the various polymer models of DNA packed into the nucleus and of justifiable values for s, we refer readers to Dekker et al. (2013). We use a value of 0.75 by default in Cicero, which corresponds to the “tension globule” polymer model of DNA (Sanborn et al., 2015). The value of s must match the s parameter passed to estimate_distance_parameter.
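The interaction between distance_parameter and s can be illustrated with a short base R sketch. The function distance_penalty is hypothetical, and both its functional form and the 1 kb floor (xmin) are assumptions chosen to show a penalty that grows with distance following a power law; consult the package source for the exact expression Cicero uses.

```r
# Hypothetical penalty: increases with distance, scaled by distance_parameter.
# The functional form and the xmin floor are assumptions for illustration.
distance_penalty <- function(dist_bp, distance_parameter, s = 0.75, xmin = 1000) {
  rho <- (1 - (xmin / dist_bp)^s) * distance_parameter
  rho[!is.finite(rho) | rho < 0] <- 0  # no penalty at or below the floor
  rho
}

# Distances between pairs of sites, in base pairs.
d <- c(500, 1000, 10000, 100000, 500000)
penalty <- distance_penalty(d, distance_parameter = 0.3)
```

In this sketch the penalty is zero for very close sites, rises with distance, and never exceeds distance_parameter, so distant pairs are regularized more strongly than proximal ones.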

Further details are available in the publication that accompanies this package. Run citation("cicero") for publication details.

Value

A list with one result per window. Each element is either a glasso object or a character string describing why the window was skipped. The list can be passed directly to assemble_connections to create a reconciled list of Cicero co-accessibility scores.
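Because the returned list mixes fitted models and skip messages, it can be useful to separate the two before inspecting them. The sketch below uses a mock list in place of real output; the element names and the skip message are invented for illustration, and the w and wi components only stand in for a glasso fit.

```r
# Mock result list standing in for the output of generate_cicero_models().
# Element names and the skip message are invented for illustration.
mock_output <- list(
  chr18_1_500000 = list(w = diag(2), wi = diag(2)),  # stands in for a glasso fit
  chr18_250001_750000 = "window skipped (illustrative message)"
)

# Character elements mark skipped windows; everything else is a fitted model.
skipped <- vapply(mock_output, is.character, logical(1))
fitted_models <- mock_output[!skipped]
skip_reasons <- unlist(mock_output[skipped])
```

This split is only for inspection; the full list, including skipped windows, is what gets passed to assemble_connections.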

References

See Also

estimate_distance_parameter

Examples

  data("cicero_data")
  data("human.hg19.genome")
  sample_genome <- subset(human.hg19.genome, V1 == "chr18")
  sample_genome$V2[1] <- 100000
  input_cds <- make_atac_cds(cicero_data, binarize = TRUE)
  input_cds <- reduceDimension(input_cds, max_components = 2, num_dim=6,
                               reduction_method = 'tSNE',
                               norm_method = "none")
  tsne_coords <- t(reducedDimA(input_cds))
  row.names(tsne_coords) <- row.names(pData(input_cds))
  cicero_cds <- make_cicero_cds(input_cds, reduced_coordinates = tsne_coords)
  model_output <- generate_cicero_models(cicero_cds,
                                         distance_parameter = 0.3,
                                         genomic_coords = sample_genome)

cicero documentation built on Dec. 10, 2020, 2 a.m.