GetSeqPctByContig: GetSeqPctByContig Function (SeQual)

View source: R/SeQual.R

GetSeqPctByContig    R Documentation

Description

Returns a list giving the percentage of sequencing by strand for every scaffold (contig) of the provided genome assembly. This function is not suitable for data from DeepSignal.

Usage

GetSeqPctByContig(gposPacBioCSV, grangesGenome)

Arguments

gposPacBioCSV

An UnStitched GPos object containing PacBio CSV data to be analysed.

grangesGenome

A GRanges object containing the width of each contig.

Value

A list of 3 data frames: one data frame per strand and one data frame covering both strands. Each data frame contains the following columns:

  • refName: The name of each contig.

  • strand: The strand of each contig.

  • width: The width of each contig.

  • nb_sequenced: The number of bases sequenced by strand for each contig.

  • seqPct: The percentage of bases sequenced for each strand for each contig (percentage of sequencing).
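
The relationship between these columns can be illustrated with a minimal base R sketch. It assumes that seqPct is nb_sequenced divided by width, multiplied by 100; this is an interpretation of the column descriptions above, not the package's confirmed implementation. The toy data and the helper seq_pct_by_strand() are hypothetical and not part of DNAModAnnot.

```r
# Hypothetical toy data (not produced by DNAModAnnot): two contigs and their widths.
contigs <- data.frame(
  refName = c("contig_1", "contig_2"),
  width = c(1000, 500),
  stringsAsFactors = FALSE
)

# One row per sequenced base: the contig it belongs to and its strand.
positions <- data.frame(
  refName = c(rep("contig_1", 800), rep("contig_1", 600), rep("contig_2", 500)),
  strand = c(rep("+", 800), rep("-", 600), rep("+", 500)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds a per-strand table with the columns described above.
# Assumption: seqPct = nb_sequenced / width * 100.
seq_pct_by_strand <- function(positions, contigs, strand) {
  sel <- positions[positions$strand == strand, , drop = FALSE]
  # Count sequenced bases per contig; contigs without data get a count of 0.
  counts <- table(factor(sel$refName, levels = contigs$refName))
  data.frame(
    refName = contigs$refName,
    strand = strand,
    width = contigs$width,
    nb_sequenced = as.integer(counts),
    seqPct = as.integer(counts) / contigs$width * 100,
    stringsAsFactors = FALSE
  )
}

plus <- seq_pct_by_strand(positions, contigs, "+")
minus <- seq_pct_by_strand(positions, contigs, "-")
```

Under these assumptions, a contig with no sequenced base on a given strand still appears in that strand's table, with nb_sequenced and seqPct equal to 0.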

Examples

# Loading the genome assembly and retrieving the width of each contig
myGenome <- Biostrings::readDNAStringSet(system.file(
  package = "DNAModAnnot", "extdata",
  "ptetraurelia_mac_51_sca171819.fa"
))
myGrangesGenome <- GetGenomeGRanges(myGenome)

# Preparing a gposPacBioCSV dataset
myGposPacBioCSV <-
  ImportPacBioCSV(
    cPacBioCSVPath = system.file(
      package = "DNAModAnnot", "extdata",
      "ptetraurelia.bases.sca171819.csv"
    ),
    cSelectColumnsToExtract = c(
      "refName", "tpl", "strand", "base",
      "score", "ipdRatio", "coverage"
    ),
    lKeepExtraColumnsInGPos = TRUE, lSortGPos = TRUE,
    cContigToBeAnalyzed = names(myGenome)
  )

# Computing the percentage of sequencing by strand for each contig
myPct_seq_csv <- GetSeqPctByContig(myGposPacBioCSV, grangesGenome = myGrangesGenome)
myPct_seq_csv

AlexisHardy/DNAModAnnot documentation built on Feb. 27, 2023, 12:03 a.m.