GetDistFromFeaturePos: GetDistFromFeaturePos Function (ModAnnot)
In AlexisHardy/DNAModAnnot: Toolbox for DNA Modifications filtering and annotation

GetDistFromFeaturePos

R Documentation

GetDistFromFeaturePos Function (ModAnnot)

Description

Return, in GRanges objects via a GRangesList, the distance between the "Positions" provided with "grangesData" argument (e.g. position of some target sites) and the feature positions provided with "grangesAnnotations". If the ranges in "grangesAnnotations" are not 1-bp positions, the positions of the boundaries will be used as the feature positions: in this case, 2 GRanges ("Position" vs featureStart; "Position" vs featureEnd) will be exported in the output instead of 1 Granges ("Position" vs featureStart). This function can also directly compute counts or proportion of the provided "Positions" at each nucleotide position around provided features (see "lGetGRangesInsteadOfListCounts" argument).

Usage

GetDistFromFeaturePos(
  grangesAnnotations,
  cSelectFeature = NULL,
  grangesData,
  lGetGRangesInsteadOfListCounts = FALSE,
  lGetPropInsteadOfCounts = TRUE,
  nWindowSizeAroundFeaturePos,
  cWhichStrandVsFeaturePos = "both",
  lAddCorrectedDistFrom5pTo3p = TRUE,
  cFeaturePosNames = c("Start", "End")
)

Arguments

`grangesAnnotations`	A GRanges object containing the annotation for the genome assembly corresponding to the grangesModPos and gposModTargetBasePos provided. The Genomic features categories must be in a column named "type".
`cSelectFeature`	The name of the feature from the annotation to be analysed. Defaults to NULL (all ranges from grangesAnnotations will be kept).
`grangesData`	A GRanges object containing "Positions" to be counted around features positions.
`lGetGRangesInsteadOfListCounts`	If FALSE, return, in dataframes via a list, the counts (or proportion if "lGetPropInsteadOfCounts" is TRUE) of "Positions" at each base position around feature positions. Defaults to FALSE.
`lGetPropInsteadOfCounts`	If "lGetGRangesInsteadOfListCounts" and "lGetPropInsteadOfCounts" are TRUE, return the proportion of "Positions" near feature positions: counts / sum of counts. If the ranges in "grangesAnnotations" are not 1-bp positions, the proportion of "Positions" is calculated near both feature positions (Start and End): counts / (sum of counts near feature1 + sum of counts near feature2). Defaults to TRUE.
`nWindowSizeAroundFeaturePos`	Size, in base pairs, of the viewing window around the feature positions.
`cWhichStrandVsFeaturePos`	A character value describing if distance comparison must be made between "Mod" (or "Base") and the feature positions... "same": ...if these positions are on the same strand only. "opposite":...if these positions are on opposite strands only. "both": ...for all of these positions: same and opposite strands.
`lAddCorrectedDistFrom5pTo3p`	If TRUE, the distance will be corrected to reflect 5' to 3' direction and will be stored in a new column (dist_5to3). Defaults to TRUE.
`cFeaturePosNames`	A character vector returning the names of the feature positions provided. Defaults to c("Start","End"). If the width of ranges is equal to 1, the name of the feature will be the first element of the vector. If the width of ranges is above 1, the names of the feature borders will be the first element then the second element.

Value

A GRangesList with 1 or 2 GRanges objects containing ranges of "Positions" with their distance to feature positions:

If the width of annotation ranges is equal to 1, 1 GRanges is provided ("Position" vs featureStart).
If the width of annotation ranges is above 1, 2 GRanges are provided ("Position" vs featureStart; "Position" vs featureEnd).

If "lGetGRangesInsteadOfListCounts" is FALSE, retrieve instead a list with 1 or 2 dataframe(s) containing "Positions" counts by distance to feature positions:

If the width of annotation ranges is equal to 1, 1 dataframe is provided ("Position" vs featureStart).
If the width of annotation ranges is above 1, 2 dataframes are provided ("Position" vs featureStart; "Position" vs featureEnd).

If a "Position" is within "nWindowSizeAroundFeaturePos" base pairs of x different feature positions: this "Position" will then reported x times with the distance to each feature position around this "Position".

Examples

# loading genome
myGenome <- Biostrings::readDNAStringSet(system.file(
  package = "DNAModAnnot", "extdata",
  "ptetraurelia_mac_51_sca171819.fa"
))

# loading annotation
library(rtracklayer)
myAnnotations <- readGFFAsGRanges(system.file(
  package = "DNAModAnnot", "extdata",
  "ptetraurelia_mac_51_annotation_v2.0_sca171819.gff3"
))

# Preparing a grangesPacBioGFF and a grangesPacBioCSV datasets
myGrangesPacBioGFF <-
  ImportPacBioGFF(
    cPacBioGFFPath = system.file(
      package = "DNAModAnnot", "extdata",
      "ptetraurelia.modifications.sca171819.gff"
    ),
    cNameModToExtract = "m6A",
    cModNameInOutput = "6mA",
    cContigToBeAnalyzed = names(myGenome)
  )
myGposPacBioCSV <-
  ImportPacBioCSV(
    cPacBioCSVPath = system.file(
      package = "DNAModAnnot", "extdata",
      "ptetraurelia.bases.sca171819.csv"
    ),
    cSelectColumnsToExtract = c(
      "refName", "tpl", "strand", "base",
      "score", "ipdRatio", "coverage"
    ),
    lKeepExtraColumnsInGPos = TRUE, lSortGPos = TRUE,
    cContigToBeAnalyzed = names(myGenome)
  )
myGposPacBioCSV <- myGposPacBioCSV[myGposPacBioCSV$base == "A"]

# Retrieve, in a list, dataframes of Mod counts per Distance values from feature positions
myModDistCountsList <- GetDistFromFeaturePos(
  grangesAnnotations = myAnnotations,
  cSelectFeature = "gene",
  grangesData = myGrangesPacBioGFF,
  lGetGRangesInsteadOfListCounts = FALSE,
  lGetPropInsteadOfCounts = TRUE,
  cWhichStrandVsFeaturePos = "both", nWindowSizeAroundFeaturePos = 600,
  lAddCorrectedDistFrom5pTo3p = TRUE,
  cFeaturePosNames = c("TSS", "TTS")
)
myModDistCountsList

AlexisHardy/DNAModAnnot documentation built on Feb. 27, 2023, 12:03 a.m.