README.md

BWASPR

R functions and scripts to process output from the BWASP workflow

This repository contains R functions and scripts we use to analyze the *.mcalls output files from the BWASP workflow.

Installation

Please find detailed installation instructions and options in the INSTALL document.

Reference

Claire Morandin and Volker P. Brendel (2021) Tools and applications for integrative analysis of DNA methylation in social insects. Molecular Ecology Resources, 00, 1-19. https://doi.org/10.1111/1755-0998.13566.

Original preprint: available at bioRxiv.

Contact

Please direct all comments and suggestions to Volker Brendel at Indiana University.

What BWASPR does

The required input to the BWASPR workflow consists of the *.mcalls files (tab-delimited data with the named columns)

SeqID.Pos  SequenceID  Position  Strand  Coverage  Prcnt_Meth  Prcnt_Unmeth

and two files specifying, respectively, the data labels with the *.mcalls file locations, and certain parameters. Let's look at the example files in inst/extdata:

AmHE.dat
================================================================================
# Samples from Herb et al. (2012) Nature Neuroscience:
#
Am      HE      forager 0       CpGhsm  ../inst/extdata/Amel-forager.CpGhsm.mcalls
Am      HE      forager 0       CpGscd  ../inst/extdata/Amel-forager.CpGscd.mcalls
Am      HE      nurse   0       CpGhsm  ../inst/extdata/Amel-nurse.CpGhsm.mcalls
Am      HE      nurse   0       CpGscd  ../inst/extdata/Amel-nurse.CpGscd.mcalls

AmHE.par
================================================================================
SPECIESNAME     Apis mellifera
ASSEMBLYVERSION Amel_4.5
GENOMESIZE      250270657
TOTALNBRPMSITES 20307353
SPECIESGFF3DIR  ../inst/extdata/AmGFF3DIR
GENELISTGFF3    Amel.gene.gff3
EXONLISTGFF3    Amel.exon.gff3
PCGEXNLISTGFF3  Amel.pcg-exon.gff3
PROMOTRLISTGFF3 Amel.promoter.gff3
CDSLISTGFF3     Amel.pcg-CDS.gff3
UTRFLAGSET      1
5UTRLISTGFF3    Amel.pcg-5pUTR.gff3
3UTRLISTGFF3    Amel.pcg-3pUTR.gff3

The first file has six columns: species (here Am); study (here HE); sample (here forager and nurse); replicate number (here 0, indicating single samples or, as in this study, aggregates over replicates); *.mcalls file type (here CpGhsm and CpGscd); and file location. Note that the file locations in this example are relative paths, which assume that you will run the example discussed in the demo directory. The second file specifies the species name, genome assembly version, genome size (in base pairs), total number of potential methylation sites (CpGs), the directory holding the GFF3 annotation files (SPECIESGFF3DIR), and the names of the GFF3 files annotating various genomic features (genes, exons, promoters, CDSs, UTRs, and so on). UTRFLAGSET is set to 1 to use the UTR annotation in the GFF3 files.
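To make the layout of the two specification files concrete, here is a minimal base-R sketch that parses them. It is not the BWASPR reader: the helper names read_dat_sketch() and read_par_sketch() are hypothetical, and the sketch assumes that the *.dat file is whitespace-delimited with #-prefixed comment lines, and that each line of the *.par file holds a key followed by its value (which may contain spaces, as in SPECIESNAME).

```r
# Hypothetical helpers for illustration only; BWASPR ships its own readers.

# Parse a *.dat sample specification: six whitespace-delimited columns,
# with lines starting with '#' treated as comments.
read_dat_sketch <- function(lines) {
  dat <- read.table(text = lines, comment.char = "#",
                    stringsAsFactors = FALSE,
                    col.names = c("species", "study", "sample",
                                  "replicate", "type", "file"))
  # Build species_study_sample_replicate labels, e.g. "Am_HE_forager_0".
  dat$label <- paste(dat$species, dat$study, dat$sample, dat$replicate,
                     sep = "_")
  dat
}

# Parse a *.par file into a named character vector: the first token on
# each line is the key, and the remainder of the line is the value.
read_par_sketch <- function(lines) {
  lines  <- trimws(lines)
  lines  <- lines[nzchar(lines) & !startsWith(lines, "#")]
  keys   <- sub("^(\\S+)\\s+.*$", "\\1", lines, perl = TRUE)
  values <- sub("^\\S+\\s+", "", lines, perl = TRUE)
  setNames(values, keys)
}

# Usage with lines copied from the AmHE example files above.
dat <- read_dat_sketch(c(
  "# Samples from Herb et al. (2012) Nature Neuroscience:",
  "#",
  "Am      HE      forager 0       CpGhsm  ../inst/extdata/Amel-forager.CpGhsm.mcalls",
  "Am      HE      nurse   0       CpGscd  ../inst/extdata/Amel-nurse.CpGscd.mcalls"
))
params <- read_par_sketch(c(
  "SPECIESNAME     Apis mellifera",
  "GENOMESIZE      250270657",
  "TOTALNBRPMSITES 20307353",
  "UTRFLAGSET      1"
))
print(dat[, c("label", "type", "file")])

# All parameter values are kept as strings; convert numeric ones explicitly.
genome_size <- as.numeric(params[["GENOMESIZE"]])
```

Keeping every *.par value as a string avoids guessing types per key; numeric parameters such as GENOMESIZE are converted only where they are used.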

A typical BWASPR workflow reads the specified *.mcalls files and generates various output tables and plots, labeled with species_study_sample_replicate labels in various combinations. The demo/Rscript.BWASPR file shows a template workflow. Initial customization is done at the top of that file, mostly through inclusion of a configuration file such as demo/sample.conf. The table in the next section summarizes the successive workflow steps; you may want to open demo/Rscript.BWASPR and demo/sample.conf in separate windows for reference while reading it. Details on running the workflow with the demo data are given in demo/README.
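The reading-and-labeling step can be illustrated with another hedged base-R sketch. It assumes that each *.mcalls file is tab-delimited and starts with a header line naming the seven columns listed earlier. The helper names, the coverage cutoff min_cov = 10, and the three toy rows are all made up for illustration; they do not come from BWASPR or the demo data.

```r
# Hypothetical helpers for illustration only; not part of BWASPR.
MCALLS_COLS <- c("SeqID.Pos", "SequenceID", "Position", "Strand",
                 "Coverage", "Prcnt_Meth", "Prcnt_Unmeth")

# Read a tab-delimited *.mcalls table (assumed to start with a header line).
# Pass either a file path or text = <character vector of lines>.
read_mcalls_sketch <- function(...) {
  read.table(..., header = TRUE, sep = "\t", col.names = MCALLS_COLS,
             # Keep Strand as character so a column of "F" values is not
             # converted to logical FALSE.
             colClasses = c("character", "character", "integer",
                            "character", "integer", "numeric", "numeric"),
             stringsAsFactors = FALSE)
}

# Count all sites, count sites with Coverage >= min_cov, and average
# Prcnt_Meth over the covered sites (min_cov is an arbitrary example value).
summarize_mcalls_sketch <- function(mc, min_cov = 10) {
  covered   <- mc[mc$Coverage >= min_cov, , drop = FALSE]
  mean_meth <- if (nrow(covered) > 0) mean(covered$Prcnt_Meth) else NA_real_
  c(sites = nrow(mc), covered_sites = nrow(covered),
    mean_prcnt_meth = mean_meth)
}

# Made-up toy rows, not real data.
toy_lines <- c(
  paste(MCALLS_COLS, collapse = "\t"),
  "scf1.101\tscf1\t101\tF\t12\t91.67\t8.33",
  "scf1.150\tscf1\t150\tR\t4\t0.00\t100.00",
  "scf1.205\tscf1\t205\tF\t20\t5.00\t95.00"
)

# Store each sample's result under its species_study_sample_replicate label.
label <- paste("Am", "HE", "forager", 0, sep = "_")  # "Am_HE_forager_0"
summaries <- list()
summaries[[label]] <- summarize_mcalls_sketch(read_mcalls_sketch(text = toy_lines))
print(summaries)
```

With real data, read_mcalls_sketch() would be called on each path from the *.dat file instead of on toy_lines, filling one list entry per label.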

RUNflag to expected output correspondence

| RUNflag | input | (select) parameters | function | theme | output files |
|------------|--------------------|------------------------------------|----------------------------|-------------------------------------------------|--------------------------------------------------------------------------------------------|
| RUNcms | studymk | covlist, locount, hicount | cmStats() | sample coverage and methylation statistics | cms-*.txt, cms-*.pdf |
| RUNpwc | studymk, studymc | - | cmpSites() | pairwise sample comparisons | pwc-*.vs.*.txt |
| RUNcrl | studymk | destrand | cmpSamples() | correlations between aggregate samples | crl-*.txt, crl-*.pdf |
| | | | | | |
| RUNrepcms | replicate *.mcalls | repcovlist, replocount, rephicount | cmStats() | replicate coverage and methylation statistics | repcms-*.txt, repcms-*.pdf |
| RUNrepcrl | replicate *.mcalls | destrand | cmpSamples() | correlations between replicates | repcrl-*.txt, repcrl-*.pdf |
| | | | | | |
| RUNmmp | studymk | - | map_methylome() | methylation to annotation maps | mmp-*.txt |
| RUNacs | studymk | destrand | annotate_methylome() | annotation of common sites | acs-*.txt |
| RUNrnk | studymk | genome_ann$region | rank_rbm() | ranked genes and promoters | ranked-*.txt, sites-in-*.txt, rnk-sig-*.pdf, sip-*.txt, rnk-sip-*.txt, rnk-sip-*.pdf |
| RUNmrpr | studymk | ddset, nr2d, doplots | det_mrpr() | methylation-rich and -poor regions | dst-*.txt, *ds-*.pdf, mdr-*.tab, mdr-*.bed, mpr-*.txt, mrr-*.txt, rmp-*.txt, gwr-*.txt |
| | | | | | |
| RUNdmt | studymc | wsize, stepsize | det_dmt() | differentially methylated tiles and genes | dmt-*.txt, dmg-*.txt |
| RUNdmsg | sample *.mcalls | highcoverage, destrand | det_dmsg() | differentially methylated sites and genes | dms-*.txt, dmg-*.txt |
| RUNdmgdtls | studyhc | destrand | show_dmsg() | details for differentially methylated genes | dmg-*.vs.*_details.txt, dmg-*.vs._heatmaps.pdf |
| RUNogl | studyhc | - | explore_dmsg(), rank_dmg() | ranked lists of differentially methylated genes | ogl-*.txt, rnk-dmg-*.vs.*.txt, rnk-dmg-*.vs.*.pdf, wrt-.txt |
| | | | | | |
| RUNsave | workflow output | - | save.image() | save image of workflow output | *.RData |
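The RUNflag table can also be read as a lookup from flags to functions. The hypothetical sketch below encodes that mapping as a data frame and returns, in table order, the steps whose flags are set to TRUE. Only the flag and function names come from the table; the helper planned_steps() and the rule that absent or NA flags count as FALSE are assumptions, and this is not how demo/Rscript.BWASPR or demo/sample.conf implement the switches.

```r
# Hypothetical sketch: the flag-to-function mapping is copied from the table
# above, but the selection logic is illustrative, not BWASPR code.
workflow_steps <- data.frame(
  flag = c("RUNcms", "RUNpwc", "RUNcrl", "RUNrepcms", "RUNrepcrl",
           "RUNmmp", "RUNacs", "RUNrnk", "RUNmrpr",
           "RUNdmt", "RUNdmsg", "RUNdmgdtls", "RUNogl", "RUNsave"),
  fun  = c("cmStats()", "cmpSites()", "cmpSamples()", "cmStats()",
           "cmpSamples()", "map_methylome()", "annotate_methylome()",
           "rank_rbm()", "det_mrpr()", "det_dmt()", "det_dmsg()",
           "show_dmsg()", "explore_dmsg(), rank_dmg()", "save.image()"),
  stringsAsFactors = FALSE
)

# Return the steps whose flags are TRUE, in workflow (table) order.
# Flags that are absent or NA are treated as FALSE (an assumption).
planned_steps <- function(steps, run_flags) {
  enabled <- names(run_flags)[run_flags %in% TRUE]
  steps[steps$flag %in% enabled, , drop = FALSE]
}

# Example: request coverage statistics and differential methylation only.
plan <- planned_steps(workflow_steps,
                      c(RUNdmsg = TRUE, RUNcms = TRUE, RUNsave = FALSE))
print(plan, row.names = FALSE)
```

Because the result is subset from workflow_steps rather than from run_flags, the steps always come back in the order of the table, regardless of the order in which the flags are given.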



BrendelGroup/BWASPR documentation built on Feb. 6, 2022, 9:09 a.m.