estimate_distance_parameter: Calculate distance penalty parameter
In cole-trapnell-lab/cicero-release: Predict cis-co-accessibility from single-cell chromatin accessibility data

estimate_distance_parameter

R Documentation

Calculate distance penalty parameter

Description

Function to calculate distance penalty parameter (distance_parameter) for random genomic windows. Used to choose distance_parameter to pass to generate_cicero_models.

Usage

estimate_distance_parameter(
  cds,
  window = 5e+05,
  maxit = 100,
  s = 0.75,
  sample_num = 100,
  distance_constraint = 250000,
  distance_parameter_convergence = 1e-22,
  max_elements = 200,
  genomic_coords = cicero::human.hg19.genome,
  max_sample_windows = 500
)

Arguments

`cds`	A cicero CDS object generated using `make_cicero_cds`.
`window`	Size of the genomic window to query, in base pairs.
`maxit`	Maximum number of iterations for distance_parameter estimation.
`s`	Power law value. See details for more information.
`sample_num`	Number of random windows to calculate `distance_parameter` for.
`distance_constraint`	Maximum distance of expected connections. Must be smaller than `window`.
`distance_parameter_convergence`	Convergence step size for `distance_parameter` calculation.
`max_elements`	Maximum number of elements per window allowed. Prevents very large models from slowing performance.
`genomic_coords`	Either a data frame or a path (character) to a file with chromosome lengths. The file should have two columns: the first is the chromosome name (e.g. "chr1") and the second is the chromosome length in base pairs. See `data(human.hg19.genome)` for an example. If a file, it should be tab-separated and have no header.
`max_sample_windows`	Maximum number of random windows to screen to find sample_num windows for distance calculation. Default 500.
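
As an illustration of the two accepted forms of genomic_coords, the following sketch builds a small chromosome-length table as a data frame and writes the same table to a tab-separated file without a header. The column names V1 and V2 mirror those used in the Examples section; the object names and the temporary file path are chosen only for illustration.

```r
# Data frame form: chromosome name, then chromosome length in base pairs
# (hg19 lengths for chr1 and chr18).
genome_df <- data.frame(
  V1 = c("chr1", "chr18"),
  V2 = c(249250621, 78077248),
  stringsAsFactors = FALSE
)

# File form: the same two columns, tab-separated, with no header row.
genome_file <- tempfile(fileext = ".txt")
write.table(genome_df, file = genome_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Reading the file back recovers the same two-column table.
genome_from_file <- read.table(genome_file, sep = "\t", header = FALSE,
                               stringsAsFactors = FALSE)
```

Either genome_df or genome_file could then be supplied as the genomic_coords argument.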

Details

This function calculates the distance scaling parameter that adjusts the distance-based penalty function in Cicero's model calculation. The scaling parameter, in combination with the power law value s, determines the distance-based penalty.

This function chooses random windows of the genome and calculates a distance_parameter. The function returns a vector of values calculated on these random windows. We recommend using the mean value of this vector moving forward with Cicero analysis.
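
A minimal sketch of the recommended averaging step is shown below. The numeric values are hypothetical placeholders standing in for real output of estimate_distance_parameter, and the commented call assumes the cicero_cds and sample_genome objects from the Examples section.

```r
# Hypothetical values standing in for the output of
# estimate_distance_parameter(); real values depend on the data.
distance_parameters <- list(0.31, 0.28, 0.35, 0.30, 0.26)

# unlist() flattens the list form described under Value; it leaves a
# plain numeric vector unchanged, so this works for either shape.
mean_distance_parameter <- mean(unlist(distance_parameters))

# The mean would then be passed on to generate_cicero_models(), e.g.
# generate_cicero_models(cicero_cds,
#                        distance_parameter = mean_distance_parameter,
#                        genomic_coords = sample_genome)
```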

The function works by finding the minimum distance scaling parameter such that no more than 5% of pairs of sites at a distance greater than distance_constraint have non-zero entries after graphical lasso regularization and such that fewer than 80% of all output entries are nonzero.
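
The acceptance test described above can be sketched in base R as follows. This is an illustration only, not Cicero's implementation: it reads the two thresholds as 5% (of site pairs farther apart than distance_constraint) and 80% (of all entries), and the function name, its arguments, and the toy matrices are hypothetical.

```r
# Hypothetical helper: checks the two sparsity conditions on a regularized
# matrix `est`, given pairwise site distances `dist_mat` in base pairs.
meets_sparsity_criteria <- function(est, dist_mat, distance_constraint,
                                    max_long_frac = 0.05,
                                    max_nonzero_frac = 0.80) {
  long_range <- dist_mat > distance_constraint
  if (!any(long_range)) return(NA)  # no long-range pairs to evaluate
  long_frac <- mean(est[long_range] != 0)  # share of long-range entries kept
  nonzero_frac <- mean(est != 0)           # share of all entries kept
  long_frac <= max_long_frac && nonzero_frac < max_nonzero_frac
}

# Toy example: four sites, with one short-range connection retained.
pos <- c(0, 100000, 200000, 400000)
dist_mat <- as.matrix(dist(pos))
est <- diag(4)
est[1, 2] <- est[2, 1] <- 0.2
sparse_ok <- meets_sparsity_criteria(est, dist_mat,
                                     distance_constraint = 250000)

# Retaining a long-range connection (sites 1 and 4) violates the first rule.
est_long <- est
est_long[1, 4] <- est_long[4, 1] <- 0.1
long_ok <- meets_sparsity_criteria(est_long, dist_mat,
                                   distance_constraint = 250000)
```

In this toy case sparse_ok is TRUE and long_ok is FALSE, since half of the long-range entries in est_long are non-zero.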

If the chosen random window has fewer than 2 or more than max_elements sites, the window is skipped. In addition, the random window will be skipped if there are insufficient long-range comparisons (see below) to be made. The max_elements parameter exists to prevent very dense windows from slowing the calculation. If you expect that your data may regularly have this many sites in a window, you will need to raise this parameter.

Calculating the distance_parameter in a sample window requires peaks in that window that are at a distance greater than the distance_constraint parameter. If not enough examples at high distance are found, the function will return the warning "Warning: could not calculate sample_num distance_parameters - see documentation details".

When looking for sample_num example windows, the function will search up to max_sample_windows windows. By default this is set to 500, which should be well beyond the default sample_num of 100 windows that need to be found. However, in very sparse datasets, increasing max_sample_windows may help avoid the above warning, although it may also slow performance. If you are still not able to get enough example windows, even with a large max_sample_windows parameter, this may mean your window parameter needs to be larger or your distance_constraint parameter needs to be smaller. A less likely possibility is that your max_elements parameter needs to be larger. This would occur if your data is particularly dense.
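
To see why a smaller distance_constraint (or a larger window) yields more usable windows, the sketch below counts the pairs of sites in one window that lie farther apart than distance_constraint. The helper name and the peak positions are hypothetical, and the text does not state how many such pairs count as sufficient, so the sketch only counts them.

```r
# Hypothetical helper: counts site pairs farther apart than
# distance_constraint (all positions in base pairs).
count_long_range_pairs <- function(site_positions, distance_constraint) {
  d <- as.numeric(dist(site_positions))  # one distance per pair of sites
  sum(d > distance_constraint)
}

# Toy peak midpoints inside a single 500 kb window.
sites <- c(10000, 120000, 240000, 310000, 480000)

n_default <- count_long_range_pairs(sites, distance_constraint = 250000)
n_strict  <- count_long_range_pairs(sites, distance_constraint = 450000)
n_none    <- count_long_range_pairs(sites, distance_constraint = 500000)
```

Here the count falls as distance_constraint approaches the window size, and no pair can qualify once distance_constraint reaches 500 kb, which is consistent with the requirement that distance_constraint be smaller than window.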

The parameter s is a constant that captures the power-law distribution of contact frequencies between different locations in the genome as a function of their linear distance. For a complete discussion of the various polymer models of DNA packed into the nucleus and of justifiable values for s, we refer readers to (Dekker et al., 2013). We use a value of 0.75 by default in Cicero, which corresponds to the “tension globule” polymer model of DNA (Sanborn et al., 2015). This parameter must be the same as the s parameter for generate_cicero_models.
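
To make the role of s concrete, the sketch below computes a power-law distance penalty of the form distance_parameter * (1 - (x_min / d)^s), where d is the distance between two sites. This functional form, the 1000 bp value of x_min, and the function name are assumptions made for illustration; the accompanying publication gives the exact penalty Cicero uses.

```r
# Hypothetical penalty function; the functional form and x_min are
# assumptions for illustration, not taken from the Cicero source.
distance_penalty <- function(d, distance_parameter, s = 0.75, x_min = 1000) {
  penalty <- distance_parameter * (1 - (x_min / d)^s)
  # Pairs at or below x_min (including d = 0) receive no penalty.
  penalty[!is.finite(penalty) | penalty < 0] <- 0
  penalty
}

# Penalty at increasing distances (bp) for an illustrative scaling value.
d <- c(1000, 10000, 100000, 250000)
penalties <- distance_penalty(d, distance_parameter = 0.3)
```

Under these assumptions the penalty is zero at x_min, grows with distance, and approaches distance_parameter for very distant pairs; a larger s makes it approach that ceiling more quickly.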

Further details are available in the publication that accompanies this package. Run citation("cicero") for publication details.

Value

A list of results of length sample_num. List members are numeric distance_parameter values.

References

Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403.
Sanborn, A.L., Rao, S.S.P., Huang, S.-C., Durand, N.C., Huntley, M.H., Jewett, A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. U. S. A. 112, E6456–E6465.

Examples

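  library(cicero)

  # Load the example data set and the hg19 chromosome lengths
  data("cicero_data")
  data("human.hg19.genome")

  # Restrict to chr18 and truncate its length to 100 kb to keep the example small
  sample_genome <- subset(human.hg19.genome, V1 == "chr18")
  sample_genome$V2[1] <- 100000

  # Build the input CDS and compute reduced-dimension (tSNE) coordinates
  input_cds <- make_atac_cds(cicero_data, binarize = TRUE)
  input_cds <- reduceDimension(input_cds, max_components = 2, num_dim = 6,
                               reduction_method = 'tSNE',
                               norm_method = "none")
  tsne_coords <- t(reducedDimA(input_cds))
  row.names(tsne_coords) <- row.names(pData(input_cds))

  # Create the cicero CDS and estimate distance_parameter on 5 random windows
  cicero_cds <- make_cicero_cds(input_cds, reduced_coordinates = tsne_coords)
  distance_parameters <- estimate_distance_parameter(cicero_cds,
                                                     sample_num = 5,
                                                     genomic_coords = sample_genome)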

cole-trapnell-lab/cicero-release documentation built on Sept. 4, 2024, 1:49 p.m.
