knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
DNAModAnnot is a R package providing a comprehensive toolkit for the analysis and annotation of DNA modifications (e.g. 6-methyladenine (6mA)). Its modular architecture allows the analysis of modification detection performed using Pacific Biosciences (PacBio) kineticsTools or Oxford Nanopore Technologies via DeepSignal software.
This documentation provides a guided use of this package using PacBio input data. By following this tutorial, you will learn to:
This package is available via GitHub ( https://github.com/AlexisHardy/DNAModAnnot ) and can be installed using the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install(c('Biostrings', 'BSgenome', 'Gviz', 'seqLogo'))
Then, you can install DNAModAnnot using the tar.gz file from GitHub repository:
setwd("path/to/package/file/") install.packages("DNAModAnnot_0.0.0.9018.tar.gz", repos = NULL, type = 'source')
Or you can directly install from GitHub using devtools package:
install.packages("devtools") library(devtools) install_github("AlexisHardy/DNAModAnnot")
You should then be able to load the package into your R session with:
library(DNAModAnnot)
DNAModAnnot can load data generated by the PacBio platform after using the Pacific Biosciences (PacBio) kineticsTools on PacBio RSI/RSII/Sequel/SequelII data. The following output files will be used by DNAModAnnot to perform a stringent analysis of detected modification patterns:
It is also possible for DNAModAnnot to load data generated by the Nanopore platform after using the DeepSignal software. Here, the output of the call_modification_frequency.py script (containing modification detection parameters for all target sites sequenced) will then be required (cf Deepsignal documentation).
Before analyzing modifications distribution, the package DNAModAnnot proposes various functions to assess the quality of sequencing and filter out bias (e.g. lack of coverage) coming from some scaffolds/bases.
You can load a genome (fasta file) using readDNAStringSet
from Biostrings package.
The GetGenomeGRanges
function will return a GRanges object representing the scaffolds which will be used by various functions later.
# Loading genome assembly (fasta file or DNAStringSet) ptetraurelia_genome_fa <- system.file( package = "DNAModAnnot", "extdata", "ptetraurelia_mac_51_sca171819.fa" ) ptetraurelia_genome <- Biostrings::readDNAStringSet(ptetraurelia_genome_fa) ptetraurelia_genome_range <- GetGenomeGRanges(ptetraurelia_genome)
The GetAssemblyReport
function returns a classic report (similar to reports from QUAST software) in a data.frame object.
# Statistics summary of genome report_assembly <- GetAssemblyReport( dnastringsetGenome = ptetraurelia_genome, cOrgAssemblyName = "ptetraurelia_mac_51" )
The GetContigCumulLength
function returns a data.frame with scaffolds size and cumulated size. This data.frame can be used by DrawContigCumulLength
function to plot cumulated length by contig.
# Cumulative length of genome assembly contig_cumulative_length <- GetContigCumulLength(ptetraurelia_genome) DrawContigCumulLength( nContigCumsumLength = contig_cumulative_length$cumsum_Mbp_length, cOrgAssemblyName = "ptetraurelia_mac_51", lGridInBackground = FALSE )
The modifications.csv (PacBioCSV) and modifications.gff (PacBioGFF) files can be loaded using the ImportPacBioCSV
and ImportPacBioGFF
functions respectively.
ImportPacBioCSV
will import PacBioCSV file as an Unstitched GPos object.
Note that
SortGPos=TRUE
will take a longer time for the importation but will allow to greatly reduce the size of the GPos output.
ImportPacBioGFF
will import PacBioGFF file as a GRanges object.
Note that
ImportPacBioGFF
will import one modification type at a time that you must precise usingcNameModToExtract
option.
# loading SMRT-seq data (PacBio gff and csv files) PacBioGFF_path <- system.file(package = "DNAModAnnot", "extdata", "ptetraurelia.modifications.sca171819.gff") PacBioGFF_granges <- ImportPacBioGFF( cPacBioGFFPath = PacBioGFF_path, cNameModToExtract = "m6A", cModNameInOutput = "6mA", cContigToBeAnalyzed = names(ptetraurelia_genome) ) PacBioCSV_path <- system.file(package = "DNAModAnnot", "extdata", "ptetraurelia.bases.sca171819.csv") PacBioCSV_gpos <- ImportPacBioCSV( cPacBioCSVPath = PacBioCSV_path, cSelectColumnsToExtract = c( "refName", "tpl", "strand", "base", "score", "ipdRatio", "coverage" ), lKeepExtraColumnsInGPos = TRUE, lSortGPos = TRUE, cContigToBeAnalyzed = names(ptetraurelia_genome) )
You can use the GetSeqPctByContig
function with the PacBioCSV GPos object to retrieve, into a list, the percentage of sequenced bases by contig. This list can be used for filtering with the FiltPacBio
function.
You can also draw a barplot by strand using the DrawBarplotBothStrands
function with the content of this list.
# Percentage of sequencing by scaffold and by strand contig_percentage_sequencing <- GetSeqPctByContig(PacBioCSV_gpos, grangesGenome = ptetraurelia_genome_range ) DrawBarplotBothStrands( nParamByContigForward = contig_percentage_sequencing$f_strand$seqPct, nParamByContigReverse = contig_percentage_sequencing$r_strand$seqPct, cContigNames = contig_percentage_sequencing$f_strand$refName, cGraphName = "Percentage of sequencing per contig" )
The same approach can be used for any numeric parameter available in the PacBioCSV GPos and PacBioGFF GRanges objects.
You can calculate the mean by scaffold for a numeric parameter using the GetMeanParamByContig
function. Then, you can use the output list to represent barplots by strand using the DrawBarplotBothStrands
function again.
# Mean coverage by scaffold and by strand contig_mean_coverage <- GetMeanParamByContig( grangesData = PacBioCSV_gpos, dnastringsetGenome = ptetraurelia_genome, cParamName = "coverage" ) DrawBarplotBothStrands( nParamByContigForward = contig_mean_coverage$f_strand$mean_coverage, nParamByContigReverse = contig_mean_coverage$r_strand$mean_coverage, cContigNames = contig_mean_coverage$f_strand$refName, cGraphName = "Mean Coverage per contig" )
To represent the distribution of a numeric parameter, the DrawDistriHistBox
function allows you to plot an histogram along a boxplot showing the global range of this parameter.
# coverage distribution of all bases sequenced DrawDistriHistBox(PacBioCSV_gpos$coverage, cGraphName = "Coverage distribution of all bases sequenced", cParamName = "Coverage", lTrimOutliers = FALSE )
You can use the FiltPacBio
function to filter the PacBioCSV GPos and PacBioGFF GRanges objects.
This function will return a list with the new PacBioCSV GPos as the first element of the list, and the new PacBioGFF GRanges as the second element of the list.
Some options of this function require PacBioCSV GPos object or the sequence of the genome (dnastringsetGenome
object). (see FiltPacBio
documentation for more information)
# First filter: contigs PacBio_filtered_data <- FiltPacBio( gposPacBioCSV = PacBioCSV_gpos, grangesPacBioGFF = PacBioGFF_granges, cContigToBeRemoved = NULL, dnastringsetGenome = ptetraurelia_genome, nContigMinSize = 1000, listPctSeqByContig = contig_percentage_sequencing, nContigMinPctOfSeq = 1, listMeanCovByContig = contig_mean_coverage, nContigMinCoverage = 20 ) PacBioCSV_gpos_filt1 <- PacBio_filtered_data$csv PacBioGFF_granges_filt1 <- PacBio_filtered_data$gff
This part will describe some tools which can be used before and/or after filtering based on False Discovery (FDR) estimations.
You can use the GetModReport
function to obtain a data.frame giving global informations about modification distribution (Modification counts, ratio, motifs associated, mean of parameters provided...).
# report basic Mod for remaining scaffolds report_modifications <- GetModReportPacBio( grangesGenome = ptetraurelia_genome_range, grangesPacBioGFF = PacBioGFF_granges_filt1, gposPacBioCSV = PacBioCSV_gpos_filt1, cOrgAssemblyName = "ptetraurelia_mac_51", dnastringsetGenome = ptetraurelia_genome, cBaseLetterForMod = "A", cModNameInOutput = "6mA" )
You can use the GetModRatioByContig
function with the PacBioGFF GRanges object to retrieve, in a list, the modification ratio (ModRatio) by scaffold.
Again, you can draw a barplot by strand using DrawBarplotBothStrands
function with the content of this list.
# Modif ratio by scaffold and by strand contig_modification_ratio <- GetModRatioByContig(PacBioGFF_granges_filt1, PacBioCSV_gpos_filt1[PacBioCSV_gpos_filt1$base == "A"], dnastringsetGenome = ptetraurelia_genome, cBaseLetterForMod = "A" ) DrawBarplotBothStrands( nParamByContigForward = contig_modification_ratio$f_strand$Mod_ratio, nParamByContigReverse = contig_modification_ratio$r_strand$Mod_ratio, cContigNames = contig_modification_ratio$f_strand$refName, cGraphName = "Modif/Base ratio per contig (Sequenced sites only)" )
You can reveal enrichment (or enrichment + depletion) around modified bases by plotting a logo with the DrawModLogo
function based on the Logolas package.
Here, you need to provide the sequence around each modified base. You can get those sequences by using the PacBioGFF GRanges object with the GetGRangesWindowSeqandParam
function and retrieving the sequence
column.
You can decide the length of the sequence to be compared around the modifications using the nUpstreamBpToAdd
and nDownstreamBpToAdd
options.
# Sequence logo associated to Modif detected PacBioGFF_granges_with_sequence <- GetGRangesWindowSeqandParam(PacBioGFF_granges_filt1, ptetraurelia_genome_range, dnastringsetGenome = ptetraurelia_genome, nUpstreamBpToAdd = 5, nDownstreamBpToAdd = 5 ) DrawModLogo( dnastringsetSeqAroundMod = as(PacBioGFF_granges_with_sequence$sequence, "DNAStringSet"), nGenomicBgACGT = c(0.35, 0.15, 0.15, 0.35), cYunit = "ic_hide_bg", nPositionsToAnnotate = c(6), cAnnotationText = c("6mA"), nTagTextFontSize = 12 )
The FiltPacBio
function can be used to filter PacBioGFF GRanges only depending on the other arguments provided. (see FiltPacBio
documentation for more information)
# Second filter: fraction PacBioGFF_granges_filt2 <- FiltPacBio( grangesPacBioGFF = PacBioGFF_granges_filt1, cParamNameForFilter = "frac", nFiltParamLoBoundaries = 0.05, nFiltParamUpBoundaries = 1, cFiltParamBoundariesToInclude = "upperOnly" )$gff
DNAModAnnot provides tools to estimate False Discovery Rate (FDR) by threshold on parameters associated to modification detection.
These FDR estimations can help deciding filters to be used for filtering DNA modifications to be kept.
First, you can use the ExtractQuantifByModMotif
function to retrieve a list containing the following elements:
nModMotifMinProp
option.The PacBioGFF GRangesList object will be used to keep only data from motifs over-represented.
# Extract Mod Data by motif over-represented (at least % of motifs) motif_pct_and_PacBioGFF_grangeslist <- ExtractListModPosByModMotif( grangesModPos = PacBioGFF_granges_filt2, grangesGenome = ptetraurelia_genome_range, dnastringsetGenome = ptetraurelia_genome, nUpstreamBpToAdd = 0, nDownstreamBpToAdd = 1, nModMotifMinProp = 0.05, cBaseLetterForMod = "A", cModNameInOutput = "6mA" )
Before estimating FDR, you must convert the PacBioCSV GPos object into a GRanges object. This will allows you to extract the sequence with the PacBioCSV data (using the GetGRangesWindowSeqandParam
function) which is required for later steps.
Then, GetFdrEstListByThresh
function can be used to retrieve a list (by motif over-represented) of data.frames. Each data.frame contains the FDR estimated by threshold (for the provided parameter) (and the adjusted FDR ~= cumulative minimum FDR).
This FDR estimation list can be used by the GetFdrBasedThreshLimit
function to retrieve, in a list, the threshold associated to a chosen FDR.
Note that this chosen FDR is defined by
nFdrPropForFilt
option. The threshold will be defined by the closest value below this level of FDR for each motif (except for some motifs iflUseBestThrIfNoFdrThr=TRUE
is used).
This FDR estimation list can also be used by the DrawFdrEstList
function to represent for each motif the distribution of FDR estimation by threshold.
# Motif FDR estimation for each motif over-represented (at least % of motifs) BaseCSV_granges_filt1 <- as(PacBioCSV_gpos_filt1[PacBioCSV_gpos_filt1$base == "A"], "GRanges") BaseCSV_granges_with_sequence <- GetGRangesWindowSeqandParam( grangesData = BaseCSV_granges_filt1, grangesGenome = ptetraurelia_genome_range, dnastringsetGenome = ptetraurelia_genome, nUpstreamBpToAdd = 0, nDownstreamBpToAdd = 1 ) score_fdr_by_motif_list <- GetFdrEstListByThresh( grangesDataWithSeq = BaseCSV_granges_with_sequence, grangesDataWithSeqControl = NULL, cNameParamToTest = "score", nRoundDigits = 1, cModMotifsAsForeground = motif_pct_and_PacBioGFF_grangeslist$motifs_to_analyse ) score_fdr_by_motif_limit <- GetFdrBasedThreshLimit(score_fdr_by_motif_list, nFdrPropForFilt = 0.05, lUseBestThrIfNoFdrThr = TRUE ) DrawFdrEstList( listFdrEstByThr = score_fdr_by_motif_list, cNameParamToTest = "score", nFdrPropForFilt = 0.05 )
Finally, the output of the GetFdrBasedThreshLimit
function can be used for filtering by motif using the FiltPacBio
function.
Note that
grangesModPos
must be a GRangesList object in this case.
PacBioGFF_grangeslist_filt <- FiltPacBio( grangesPacBioGFF = motif_pct_and_PacBioGFF_grangeslist$GRangesbyMotif, listFdrEstByThrIpdRatio = NULL, listFdrEstByThrScore = score_fdr_by_motif_limit )$gff
This part will describe the tools that can be used after filtering based on False Discovery (FDR) estimations and can be launched for each motif independently.
You can use the following commands to select data for one motif:
# Select one motif for the analysis by motif motifs_base <- motif_pct_and_PacBioGFF_grangeslist$motifs_to_analyse[motif_pct_and_PacBioGFF_grangeslist$motifs_to_analyse == "AT"] motifs_modifications <- motif_pct_and_PacBioGFF_grangeslist$mod_motif[motifs_base == motif_pct_and_PacBioGFF_grangeslist$motifs_to_analyse] PacBioGFF_granges_filtAT <- PacBioGFF_grangeslist_filt[[motifs_base]] BaseCSV_granges_filtAT <- BaseCSV_granges_with_sequence[BaseCSV_granges_with_sequence$sequence == motifs_base, ]
To analyse the distribution of modifications in the genome, you must provide an annotation file as a GRanges object.
The package rtracklayer contains various functions for file importation as GRanges objects depending on the format of your annotation file.
For some functions, the feature "intergenic" will be required for comparison between genes and intergenic regions. Here, you can use the PredictMissingAnnotation
function to complete the annotation with "intergenic" features (and/or introns/exons if your annotation provides exons or introns positions).
# Loading GFF annotation file (Minimum annotation required: gene (seqnames, start, end, strand)) annotations_path <- system.file(package = "DNAModAnnot", "extdata", "ptetraurelia_mac_51_annotation_v2.0_sca171819.gff3") annotations_range <- rtracklayer::readGFFAsGRanges(annotations_path) annotations_range <- PredictMissingAnnotation( grangesAnnotations = annotations_range, grangesGenome = ptetraurelia_genome_range, cFeaturesColName = "type", cGeneCategories = c("gene"), lAddIntronRangesUsingExon = TRUE )
The GetModBaseCountsByFeature
function returns the counts of the DNA modification (and its unmodified base) for each feature provided.
The DrawModBasePropByFeature
function can then calculate (and draw as a barplot) the proportion of modified (or unmodified) base between features to be compared.
# Mod annotation by feature annotations_range_ModBase_counts <- GetModBaseCountsByFeature( grangesAnnotations = annotations_range, grangesModPos = PacBioGFF_granges_filtAT, gposModTargetBasePos = BaseCSV_granges_filtAT, lIgnoreStrand = FALSE ) DrawModBasePropByFeature( grangesAnnotationsWithCounts = annotations_range_ModBase_counts, cFeaturesToCompare = c("gene", "intergenic"), lUseCountsPerkbp = TRUE, cBaseMotif = motifs_base, cModMotif = motifs_modifications )
You can also compare a quantitative parameter versus Modifications counts in features.
Here you can load an example with RNA-seq data for the annotation provided:
# Loading additional Data: RNA-seq dataset expression_file_path <- system.file(package = "DNAModAnnot", "extdata", "ptetraurelia.gene_expression.sca171819.tsv") expression_dataframe <- read.table( file = expression_file_path, header = TRUE, sep = "\t", dec = ",", quote = "\"", fill = FALSE ) expression_dataframe <- expression_dataframe[, c("ID", "T30")]
You need to load the quantitative parameter to be compared with modification counts as a new column within the annotation (+counts) GRanges object.
# merge RNA-seq data with Annotation GRanges object genes_range_ModBase_counts_param <- annotations_range_ModBase_counts[annotations_range_ModBase_counts$type == "gene"] GenomicRanges::mcols(genes_range_ModBase_counts_param) <- merge( x = GenomicRanges::mcols(genes_range_ModBase_counts_param), by.x = "Name", y = expression_dataframe, by.y = "ID" )
Then, you can use the DrawParamPerModBaseCategories
function. This will plot the distribution of the quantitative parameter provided by category of modified (and unmodified) base counts.
# Comparison of quantitative parameter with Mod annotation DrawParamPerModBaseCategories( grangesAnnotationsWithCounts = genes_range_ModBase_counts_param, cParamColname = "T30", cParamFullName = "Gene expression at T30", cParamYLabel = "Normalized RNA-seq read counts (T30)", cSelectFeature = "gene", lUseCountsPerkbp = TRUE, cBaseMotif = motifs_base, cModMotif = motifs_modifications, lBoxPropToCount = FALSE )
The GetModBaseCountsWithinFeature
function returns the counts of modified base (and its unmodified target base) within each feature provided.
For this, each feature provided is cut into a specific number of windows (defined by the nWindowsNb
option) and counts are returned for each window of each feature.
The DrawModBaseCountsWithinFeature
function can then represent the distribution within features through a barplot.
# Mod annotation within feature gene_annotation_range <- annotations_range[annotations_range$type == "gene", ] gene_annotation_range <- GetModBaseCountsWithinFeature( grangesAnnotations = gene_annotation_range, grangesModPos = PacBioGFF_granges_filtAT, gposModTargetBasePos = BaseCSV_granges_filtAT, lIgnoreStrand = FALSE, nWindowsNb = 20 ) DrawModBaseCountsWithinFeature( grangesAnnotationsWithCountsByWindow = gene_annotation_range, cFeatureName = "gene", cBaseMotif = motifs_base, cModMotif = motifs_modifications )
The GetDistFromFeaturePos
function returns, for each feature provided, its distance, in bp, from a modification (using a window of specific size around each feature).
If the feature provided is larger than 1bp, each feature boundaries will be used instead for computing the distance.
If the lGetGRangesInsteadOfListCounts
option is set to FALSE
, the GetDistFromFeaturePos
function will return instead a list of dataframes giving the counts (or proportion) of modified (or unmodified) bases by distance toward the feature.
This list can be represented on a plot using DrawModBasePropDistFromFeature
function.
You can also add 1 or 2 additional axis on the plot to compare modification signal with another parameters by using the AddToModBasePropDistFromFeaturePlot
function.
Here is an example using MNase-seq data to represent nucleosomes:
BamFile
function from RSamtools package and the readGAlignments
function from GenomicAlignments package. GetGposCenterFromGRanges
function. GetDistFromFeaturePos
function before using the AddToModBasePropDistFromFeaturePlot
function. # ModBase distance from feature/feature limit Mod_distance_feature_countslist <- GetDistFromFeaturePos( grangesAnnotations = annotations_range, cSelectFeature = "gene", grangesData = PacBioGFF_granges_filtAT, lGetGRangesInsteadOfListCounts = FALSE, lGetPropInsteadOfCounts = TRUE, cWhichStrandVsFeaturePos = "both", nWindowSizeAroundFeaturePos = 600, lAddCorrectedDistFrom5pTo3p = TRUE, cFeaturePosNames = c("TSS", "TTS") ) Base_distance_feature_countslist <- GetDistFromFeaturePos( grangesAnnotations = annotations_range, cSelectFeature = "gene", grangesData = BaseCSV_granges_filtAT, lGetGRangesInsteadOfListCounts = FALSE, lGetPropInsteadOfCounts = TRUE, cWhichStrandVsFeaturePos = "both", nWindowSizeAroundFeaturePos = 600, lAddCorrectedDistFrom5pTo3p = TRUE, cFeaturePosNames = c("TSS", "TTS") ) DrawModBasePropDistFromFeature( listModCountsDistDataframe = Mod_distance_feature_countslist, listBaseCountsDistDataframe = Base_distance_feature_countslist, cFeaturePosNames = c("TSS", "TTS"), cBaseMotif = motifs_base, cModMotif = motifs_modifications ) # ModBase and Reads center (from Bam file) distance from feature/feature limit bamfile_path <- system.file(package = "DNAModAnnot", "extdata", "PTET_MonoNuc_3-2new.pe.sca171819.sorted.bam") bamfile_object <- Rsamtools::BamFile(file = bamfile_path) bamfile_ranges <- as(GenomicAlignments::readGAlignments(bamfile_object), "GRanges") bamfile_ranges <- GetGposCenterFromGRanges(grangesData = bamfile_ranges) bamfile_distance_feature_countslist <- GetDistFromFeaturePos( grangesAnnotations = annotations_range, cSelectFeature = "gene", grangesData = bamfile_ranges, lGetGRangesInsteadOfListCounts = FALSE, lGetPropInsteadOfCounts = FALSE, cWhichStrandVsFeaturePos = "both", nWindowSizeAroundFeaturePos = 600, lAddCorrectedDistFrom5pTo3p = TRUE, cFeaturePosNames = c("TSS", "TTS") ) AddToModBasePropDistFromFeaturePlot( dPosCountsDistFeatureStart = bamfile_distance_feature_countslist[[1]], dPosCountsDistFeatureEnd = bamfile_distance_feature_countslist[[2]], cSubtitleContent = "Along with nucleosome center distance (MonoNuc_3-2newreplicate)", cParamYLabel = "Nucleosome center count (MonoNuc_3-2newreplicate)", cParamColor = "cyan3", lAddAxisOnLeftSide = TRUE )
You can see locally the distribution of modifications (for any modification or for some motifs) by using the Gviz package.
After preparing a set of tracks, you can display them using the plotTracks
function from Gviz package. (see Gviz package for more information)
DNAModAnnot provides several functions which can be used alongside the Gviz package for local visualisation.
The ExportFilesForGViz
function allows the user to export files which can be used for streaming (except for the gff3 format) with the plotTracks
function from Gviz package. Here, this makes possible to use streaming also for genomic annotations using the bam format (and using the ImportBamExtendedAnnotationTrack
function when making tracks). (see ExportFilesForGViz
function documentation for more information)
ipdRatio6mABigwig <- system.file(package = "DNAModAnnot", "extdata", "ipdRatio6mATsites.bw") ptet51GenesBam <- system.file(package = "DNAModAnnot", "extdata", "ptet51Genes.bam")
# #Gviz file making for vignette (not run) ExportFilesForGViz( cFileNames = c( ipdRatio6mABigwig, ptet51GenesBam ), listObjects = list( PacBioGFF_granges_filtAT, annotations_range[annotations_range$type == "gene"] ), cFileFormats = c("bw", "bam"), lBigwigParametersByStrand = c(TRUE, NA), cBigwigParameters = c("ipdRatio", NA), cBamXaParameters = c(NA, "Name"), dnastringsetGenome = ptetraurelia_genome )
To plot an Ideogram, you can use the IdeogramTrack
function from Gviz package. But if you do not have any information on cytobands, you can use the AdaptedIdeogramTrackWithoutBandsData
function to plot an empty representation of chromosome.
If the chromosome names that you are using are present in the UCSC database, you can use the command
options(ucscChromosomeNames=TRUE)
before using the Gviz package. Otherwise, you must useoptions(ucscChromosomeNames=FALSE)
command instead.
cContigToViz <- unique(GenomicRanges::seqnames(ptetraurelia_genome_range)) # Generating Ideogram TRACK-------- options(ucscChromosomeNames = FALSE) trackIdeogram <- AdaptedIdeogramTrackWithoutBandsData( grangesGenome = ptetraurelia_genome_range, cContigToViz = cContigToViz, cOrgAssemblyName = "ptetraurelia_mac_51" )
When making tracks using files generated with the ExportFilesForGViz
function, you must add stream = TRUE
to allow streaming when displaying tracks.
# Generating classic Gviz tracks # GenomeAxis TRACK------ trackGenomeAxis <- Gviz::GenomeAxisTrack(cex = 1) # Sequence TRACK-------- trackSequence <- Gviz::SequenceTrack(ptetraurelia_genome_fa, chromosome = cContigToViz, add53 = TRUE, complement = FALSE, cex = 0.8, stream = TRUE ) # DATATRACK-------- trackData6mATipdRatio <- Gviz::DataTrack(ipdRatio6mABigwig, stream = TRUE, name = "6mAT\nipdRatio", type = "histogram", col.histogram = c("red"), fill = "red", background.title = "darkred", col = NULL ) trackDataNuclCoverage <- Gviz::DataTrack(bamfile_path, stream = TRUE, name = "Nucleosome\ncoverage\n(MNase-seq)", type = "histogram", col.histogram = c("blue"), fill = "blue", background.title = "darkblue", col = NULL )
To display genomic annotations (in adapted bam file; see ExportFilesForGViz
function documentation) using streaming, you must use the ImportBamExtendedAnnotationTrack
function as the import function (via stream
and importFunction
options while making the annotation track).
# Generating Annotation TRACK with streaming-------- trackAnnotation <- Gviz::AnnotationTrack(ptet51GenesBam, name = "Gene", stacking = "squish", stream = TRUE, importFunction = ImportBamExtendedAnnotationTrack, group = "tag", groupAnnotation = "tag", just.group = "below", fontcolor.group = "black", fontsize.group = 18, fill = "lightblue", shape = "fixedArrow", arrowHeadWidth = 50, lwd = 3, background.title = "darkblue" )
In order to allow the names of the genomic features to be displayed, the "mapping" sub-list of the generated annotation track must be completed with the new "id" and "group" values.
trackAnnotation@mapping <- list(id = "tag", group = "tag")
Once the tracks are all generated, you can use the plotTracks
function from Gviz package to display them on a chosen location (defined by the chromosome
,from
and to
options).
# Plotting Tracks-------- Gviz::plotTracks( trackList = list( trackIdeogram, trackGenomeAxis, trackData6mATipdRatio, trackSequence, trackDataNuclCoverage, trackAnnotation ), chromosome = "scaffold51_17", from = 361000, to = 365000 )
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.