View source: R/similaRpeakMethods.R

similarity R Documentation

Calculate metrics which estimate the level of similarity between two ChIP-Seq profiles

Description

Returns a list containing information about both ChIP-Seq profiles and a list of all similarity metrics: the ratio of the maximum values, the ratio of the areas, the ratio between the intersection area and the total area (for normalized and non-normalized profiles), the difference between the positions of the maximal peaks of the two profiles, and Spearman's rho statistic.

Usage

similarity(profile1, profile2, ratioAreaThreshold = 1,
  ratioMaxMaxThreshold = 1, ratioIntersectThreshold = 1,
  ratioNormalizedIntersectThreshold = 1, diffPosMaxThresholdMinValue = 1,
  diffPosMaxThresholdMaxDiff = 100, diffPosMaxTolerance = 0.01,
  spearmanCorrSDThreashold = 1e-08)

Arguments

profile1

Vector containing the RPM values of the first ChIP-Seq profile for each position of the selected region.

profile2

Vector containing the RPM values of the second ChIP-Seq profile for each position of the selected region.

ratioAreaThreshold

The minimum denominator accepted to calculate the ratio of the area between both profiles. The value has to be positive. Default = 1.

ratioMaxMaxThreshold

The minimum denominator accepted to calculate the ratio of the maximal peak values between both profiles. The value has to be positive. Default = 1.

ratioIntersectThreshold

The minimum denominator accepted to calculate the ratio of the intersection area of both profiles over the total area. The value has to be positive. Default = 1.

ratioNormalizedIntersectThreshold

The minimum denominator accepted to calculate the ratio of the intersection area of both normalized profiles over the total area. The value has to be positive. Default = 1.

diffPosMaxThresholdMinValue

The minimum peak value accepted to calculate the metric. The value has to be positive. Default = 1.

diffPosMaxThresholdMaxDiff

The maximum distance accepted between two peak positions in one profile to calculate the metric. The value has to be positive. Default = 100.

diffPosMaxTolerance

The maximum variation from the maximum value accepted for a position to be considered a peak position. The value must be between 0 and 1. Default = 0.01.

spearmanCorrSDThreashold

The minimum standard deviation accepted on both profiles to calculate the metric. Default = 1e-8.
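The three diffPosMax* arguments interact, so a small base R sketch may help. It is not the package's implementation: the function name diffPosMaxSketch and its argument names are hypothetical, and it assumes that the tolerance is relative to the maximum value and that distances are measured in vector positions.

```r
# Hypothetical sketch (not similaRpeak code) of how the diffPosMax*
# thresholds could combine. Assumptions: the tolerance is relative to the
# maximum value, and distance is counted in vector positions.
diffPosMaxSketch <- function(profile1, profile2,
                             minValue = 1, maxDiff = 100, tolerance = 0.01) {
    peakPosition <- function(profile) {
        maxValue <- max(profile, na.rm = TRUE)
        # Peak too low: the metric is not calculated
        if (maxValue < minValue) {
            return(NA_real_)
        }
        # All positions within the tolerance of the maximum value
        positions <- which(profile >= maxValue * (1 - tolerance))
        # Candidate positions too far apart: more than one peak is assumed
        if (max(positions) - min(positions) > maxDiff) {
            return(NA_real_)
        }
        # Several candidate positions: use the median position
        median(positions)
    }
    # First profile minus second profile; NA propagates
    peakPosition(profile1) - peakPosition(profile2)
}

profile1 <- c(3, 59, 6, 24, 65, 34, 15, 4, 53, 22, 21, 12, 11)
profile2 <- c(15, 9, 46, 44, 9, 39, 27, 34, 34, 4, 3, 4, 2)
diffPosMaxSketch(profile1, profile2)
```

With these two vectors, the maxima sit at positions 5 and 3, so the sketch returns 2.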

Details

similarity uses the two vectors passed as arguments to calculate the metrics. When a metric is a ratio, the function always verifies that the threshold for the denominator is respected. If the threshold is not respected, the metric is assigned the NA value.
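The denominator check described above can be illustrated with a short base R sketch. The helper guardedRatio is hypothetical and not part of similaRpeak; whether the package compares with "<" or "<=" is an assumption here.

```r
# Hypothetical helper (not part of similaRpeak): divide only when the
# denominator reaches the threshold, otherwise return NA.
# Assumption: a denominator equal to the threshold is accepted.
guardedRatio <- function(numerator, denominator, threshold = 1) {
    stopifnot(is.numeric(threshold), threshold > 0)
    if (is.na(denominator) || denominator < threshold) {
        return(NA_real_)
    }
    numerator / denominator
}

guardedRatio(329, 270)                   # threshold respected: ratio returned
guardedRatio(329, 270, threshold = 300)  # denominator below threshold: NA
```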

Value

A list containing:

  • nbrPosition The number of positions included in each profile.

  • areaProfile1 The area of the first profile.

  • areaProfile2 The area of the second profile.

  • maxProfile1 The maximum value in the first profile.

  • maxProfile2 The maximum value in the second profile.

  • maxPositionProfile1 The list of positions of the maximum value in the first profile.

  • maxPositionProfile2 The list of positions of the maximum value in the second profile.

  • metrics A list with the following items:

    • RATIO_AREA The ratio between the areas. The larger value is always divided by the smaller value. NA if minimal threshold is not respected.

    • DIFF_POS_MAX The difference between the positions of the maximal peaks. The difference is always the first profile value minus the second profile value. NA is returned if the minimal peak value is not respected. A profile can have more than one position with the maximum value; in that case, the median position is used. A tolerance argument can be set to consider all positions whose values are within a certain range of the maximum value. A threshold argument can also be set to ensure that the distance between two maximum-value positions is not too wide. When this distance is exceeded, it is assumed that more than one peak is present in the profile and NA is returned.

    • RATIO_MAX_MAX The ratio between the maximal peak values. The first profile is always divided by the second profile. NA if minimal threshold is not respected.

    • RATIO_INTERSECT The ratio between the intersection area and the total area. NA if minimal threshold is not respected.

    • RATIO_NORMALIZED_INTERSECT The ratio between the intersection area and the total area of normalized profiles. NA if minimal threshold is not respected.

    • SPEARMAN_CORRELATION The Spearman's rho statistic between profiles. NA if minimal threshold is not respected or when no complete element pair is present between both profiles.
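To make the metric definitions above more concrete, here is a rough base R sketch that computes several of them by hand, without the threshold checks. It is an illustration under stated assumptions, not the package's code: the area is taken as the plain sum of the RPM values, and the total area is taken as the union of both profiles (area1 + area2 - intersection). The normalized variant is left out because the normalization method is not described on this page.

```r
profile1 <- c(3, 59, 6, 24, 65, 34, 15, 4, 53, 22, 21, 12, 11)
profile2 <- c(15, 9, 46, 44, 9, 39, 27, 34, 34, 4, 3, 4, 2)

# Assumption: the area of a profile is the sum of its RPM values
area1 <- sum(profile1)
area2 <- sum(profile2)

# RATIO_AREA: the larger area divided by the smaller area
ratioArea <- max(area1, area2) / min(area1, area2)

# RATIO_MAX_MAX: maximum of the first profile over maximum of the second
ratioMaxMax <- max(profile1) / max(profile2)

# RATIO_INTERSECT: area shared by both profiles over the total area
# Assumption: total area is the union, so the overlap is counted once
intersectArea <- sum(pmin(profile1, profile2))
totalArea <- area1 + area2 - intersectArea
ratioIntersect <- intersectArea / totalArea

# SPEARMAN_CORRELATION: Spearman's rho on complete pairs only
spearmanRho <- cor(profile1, profile2, method = "spearman",
                   use = "pairwise.complete.obs")

round(c(RATIO_AREA = ratioArea, RATIO_MAX_MAX = ratioMaxMax,
        RATIO_INTERSECT = ratioIntersect,
        SPEARMAN_CORRELATION = spearmanRho), 3)
```

Values returned by similarity itself may differ wherever the package's definitions depart from the assumptions stated here.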

Author(s)

Astrid Deschenes, Elsa Bernatchez

See Also

  • MetricFactory for using an interface to calculate all available metrics separately or together.

  • demoProfiles for more information about the ChIP-Seq profiles present in the demoProfiles data.

Examples


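## Load the package (needed when running the examples outside the help system)
library(similaRpeak)

## Defining two ChIP-Seq profiles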
profile1<-c(3,59,6,24,65,34,15,4,53,22,21,12,11)
profile2<-c(15,9,46,44,9,39,27,34,34,4,3,4,2)

## Example using default thresholds
similarity(profile1, profile2)

## Example using customised thresholds
similarity(profile1, profile2, 
    ratioAreaThreshold=5, 
    ratioMaxMaxThreshold=5, 
    ratioIntersectThreshold=12,
    ratioNormalizedIntersectThreshold=2.2,
    diffPosMaxThresholdMinValue=2, 
    diffPosMaxThresholdMaxDiff=130, 
    diffPosMaxTolerance=0.03,
    spearmanCorrSDThreashold=1e-3)
    
## Example using ChIP-Seq profiles of H3K27ac (DCC accession: ENCFF000ASG) 
## and H3K4me1 (DCC accession: ENCFF000ARY) from the Encyclopedia of DNA  
## Elements (ENCODE) for the region 
data(demoProfiles)

## Visualize ChIP-Seq profiles 
plot(demoProfiles$chr2.70360770.70361098$H3K27ac,
    type="l", col="blue", xlab="", ylab="", ylim=c(0, 25),
    main="chr2:70360770-70361098")
par(new=TRUE)
plot(demoProfiles$chr2.70360770.70361098$H3K4me1,
    type="l", col="darkgreen", xlab="Position", 
    ylab="Coverage in reads per million (RPM)",  ylim=c(0, 25))
legend("topright", c("H3K27ac","H3K4me1"), cex=1.2, 
    col=c("blue","darkgreen"), lty=1)
    
## Calculate metrics
similarity(demoProfiles$chr2.70360770.70361098$H3K4me1, 
    demoProfiles$chr2.70360770.70361098$H3K27ac, 
    ratioAreaThreshold=15, 
    ratioMaxMaxThreshold=5, 
    ratioIntersectThreshold=12,
    ratioNormalizedIntersectThreshold=2.2,
    diffPosMaxThresholdMinValue=2, 
    diffPosMaxThresholdMaxDiff=130, 
    diffPosMaxTolerance=0.03,
    spearmanCorrSDThreashold=0.1)
    
## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.


adeschen/similaRpeak documentation built on March 23, 2022, 11:10 a.m.