
BlissR

An R package for the analysis of Bliss data from the Rad Lab.

Functions overview

load_annotated_peaks() -> load a series of annotated peak files into the R environment

load_bams() -> load a series of alignment files into the R environment

add_cancer_gene_census() -> add Cancer Gene Census information to the annotation table

add_cpg_islands() -> add CpG island annotation to the annotation table

add_repeat_masker_table() -> add repeat masker table information to the annotation table

annotation_pie() -> create a pie chart of the annotation regions

coverage_plot() -> create a bar plot of the percentage of total reads in the top n peaks

DSBs_per_gene_length() -> create a bar plot of the number of peaks per length window

plot_selection_frequencies() -> select a column of the annotation table and plot the frequency of its elements

venn_generator() -> create a Venn diagram based on overlapping peaks or on gene hits, using either all peaks (or genes) or only those with a specified characteristic

plot_density() -> plot the density of the reads in the BAM files along the chromosomes

load_annotated_peaks(organism, blacklist_folder)

arguments:

organism -> "human", "hg", "homo sapiens" or "mouse", "mm", "mus musculus", default is "human"

blacklist_folder -> the folder in which the blacklist files for human and mouse are stored

This function uses the sample.csv table in the working directory to choose, based on the organism, which files to load. These files should be stored in the ./data folder.

The output of the function is a data table with the coordinates of all the peaks and their annotation.
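The organism aliases and the sample.csv lookup described above can be pictured with a small base R sketch. It is not the package's implementation: the helper `normalise_organism()`, the column names `file` and `organism`, and the case-insensitive matching are all assumptions made for illustration.

```r
# Hypothetical helper: map the accepted organism aliases to one canonical label.
normalise_organism <- function(organism = "human") {
  aliases <- c(
    "human" = "human", "hg" = "human", "homo sapiens" = "human",
    "mouse" = "mouse", "mm" = "mouse", "mus musculus" = "mouse"
  )
  key <- tolower(organism)  # case-insensitive matching is an assumption
  if (!key %in% names(aliases)) {
    stop("unknown organism: ", organism)
  }
  unname(aliases[key])
}

# Toy sample sheet; the columns of the real sample.csv may differ.
sample_sheet <- data.frame(
  file = c("s1_peaks.txt", "s2_peaks.txt", "s3_peaks.txt"),
  organism = c("human", "mouse", "human"),
  stringsAsFactors = FALSE
)

# Keep only the files of the requested organism, looked up in ./data.
wanted <- normalise_organism("homo sapiens")
files_to_load <- file.path("data", sample_sheet$file[sample_sheet$organism == wanted])
print(files_to_load)
```

In this toy sheet only the two human files are selected.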

load_bams(organism, blacklist_folder)

arguments:

organism -> "human", "hg", "homo sapiens" or "mouse", "mm", "mus musculus", default is "human"

blacklist_folder -> the folder in which the blacklist files for human and mouse are stored

This function uses the sample.csv table in the working directory to choose, based on the organism, which files to load. These files should be stored in the ./data folder.

The output of the function is a data table with the coordinates of all the mapped reads.

add_cancer_gene_census(samples, cgc_folder)

arguments:

samples -> a list of annotated peaks

cgc_folder -> the folder where the cancer gene census table is stored, default is ./utils

This function adds to the annotation table information on whether or not the hit gene is a cancer gene listed in the Cancer Gene Census and, if so, the role of the gene in cancer and the type of tumor in which it is usually involved. You can download the Cancer Gene Census table from the website: https://cancer.sanger.ac.uk/census#cl_search
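As a rough picture of this kind of lookup, the base R sketch below matches gene names against a census table. The gene names, the column names (`gene`, `Tier`, `cancer_gene`) and the "yes"/"no" flag are toy assumptions, not the package's actual output; only the Tier values "1", "2" and "none" are taken from this README.

```r
# Toy annotation table; "gene" is an assumed column name.
peaks <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the Cancer Gene Census table (the real table has more columns).
cgc <- data.frame(
  gene = c("GENE_A", "GENE_C"),
  Tier = c("1", "2"),
  stringsAsFactors = FALSE
)

# match() returns NA for genes that are not in the census.
idx <- match(peaks$gene, cgc$gene)
peaks$cancer_gene <- ifelse(is.na(idx), "no", "yes")
peaks$Tier <- ifelse(is.na(idx), "none", cgc$Tier[idx])
print(peaks)
```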

add_cpg_islands(samples, genome, cpg_annots)

arguments:

samples -> a list of annotated peaks

genome -> saved when you load the annotated peak files

cpg_annots -> also saved when you load the annotated peak files; both genome and cpg_annots depend on the organism

This function uses the Bioconductor library "annotatr" (https://www.bioconductor.org/packages/release/bioc/html/annotatr.html) to add a column called "cpg_island", whose value is "yes" if the peak coordinates fall inside a CpG island.
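To make the "falls inside a CpG island" test concrete, here is a base R sketch of an interval check. It deliberately does not use annotatr, the coordinates are toy data, and both the overlap rule (any overlap on the same chromosome) and the "no" value for non-overlapping peaks are assumptions; the package may define them differently.

```r
# Toy peaks and CpG islands; coordinates are made up for illustration.
peaks <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100, 5000, 300),
  end = c(200, 5100, 400),
  stringsAsFactors = FALSE
)
islands <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(150, 1000),
  end = c(600, 2000),
  stringsAsFactors = FALSE
)

# A peak is flagged if it overlaps at least one island on the same chromosome.
overlaps_island <- vapply(seq_len(nrow(peaks)), function(i) {
  any(islands$chr == peaks$chr[i] &
        islands$start <= peaks$end[i] &
        islands$end >= peaks$start[i])
}, logical(1))

peaks$cpg_island <- ifelse(overlaps_island, "yes", "no")
print(peaks)
```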

add_repeat_masker_table(samples, table_folder)

arguments:

samples -> a list of annotated peaks

table_folder -> the folder where the repeat masker table is stored, default is ./utils

This function adds three more columns to the annotation table that say whether or not the sequence falls in a repeated region, such as a transposable element, and give information about the classification of that repetitive element. The repeat masker table can be downloaded from the UCSC Table Browser at this link: http://www.genome.ucsc.edu/cgi-bin/hgTables

plot_selection_frequencies(samples, column_name)

arguments:

samples -> a list of annotated peaks

column_name -> the name of the annotation column that you want to select

This function plots the percentage of each element found in that column.

example: plot_selection_frequencies(samples, "Tier") plots the frequency of the three values of the Tier column, which are "1", "2" or "none"
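The percentages behind such a plot can be sketched in base R as follows. The vector `tier` is toy data standing in for the Tier column, and the use of `table()` and `prop.table()` is an illustration, not necessarily what the package does internally.

```r
# Toy Tier column with the three values named in this README.
tier <- c("1", "none", "2", "none", "1", "none", "none", "2")

# Share of each value, expressed as a percentage of all entries.
pct <- 100 * prop.table(table(tier))
print(round(pct, 1))
```

Passing `pct` to `barplot()` would draw these percentages as bars.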


coverage_plot(samples, n_top, plot_dir)

arguments:

samples -> a list of annotated peaks

n_top -> the number of top peaks that you want to display in the plot, default is 100

plot_dir -> the folder where the plot will be saved, default is ./plots

This function draws a bar plot of the percentage of reads that fall in each of the n_top peaks.

example: coverage_plot(samples, 200, "plots") plots the percentage of reads in the top 200 peaks
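The quantity shown in the plot can be sketched in base R. The read counts are toy data, and taking the percentage relative to the reads summed over all peaks is an assumption about the denominator.

```r
# Toy read counts per peak.
read_counts <- c(peak_a = 50, peak_b = 300, peak_c = 100, peak_d = 25, peak_e = 25)
n_top <- 3

# Take the n_top peaks with the most reads and express each as a
# percentage of the reads summed over all peaks (assumed denominator).
top <- head(sort(read_counts, decreasing = TRUE), n_top)
pct_reads <- 100 * top / sum(read_counts)
print(pct_reads)
```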


DSBs_per_gene_length(samples, plot_dir)

arguments:

samples -> a list of annotated peaks

plot_dir -> the folder where the plot will be stored, default is ./plots

This function draws a bar plot of the number of peaks per length window, where the length windows are the following:

1) 0-100 bp

2) 100-1'000 bp

3) 1'000-10'000 bp

4) 10'000-100'000 bp

5) 100'000-200'000 bp

6) more than 200'000 bp

example: DSBs_per_gene_length(samples, "plots")
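The binning into length windows can be sketched with base R's `cut()`. The lengths are toy values, and the treatment of boundary values (each window includes its upper bound) is an assumption, since the README does not say which window a length of exactly 100 bp belongs to.

```r
# Toy lengths in bp.
lengths_bp <- c(80, 950, 5000, 42000, 150000, 750000, 2300)

# Window boundaries taken from the list above.
breaks <- c(0, 100, 1000, 10000, 100000, 200000, Inf)
labels <- c("0-100", "100-1'000", "1'000-10'000",
            "10'000-100'000", "100'000-200'000", ">200'000")

# right = TRUE puts a boundary value in the lower window (assumption).
window <- cut(lengths_bp, breaks = breaks, labels = labels,
              right = TRUE, include.lowest = TRUE)
counts <- table(window)
print(counts)
```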


plot_density(samples, plot_dir)

arguments:

samples -> a list of annotated peaks

plot_dir -> the folder where the plot will be stored, default is ./plots

This function plots the density of the mapped reads loaded with load_bams() along the chromosomes. By default all chromosomes are displayed; to display only specific chromosomes you currently have to change the code (I will implement an argument that allows you to do this).

example: plot_density(samples, plot_dir)


annotation_pie(samples, plot_dir)

arguments:

samples -> a list of annotated peaks

plot_dir -> the folder where the plot will be stored, default is ./plots

This function draws a pie chart of the annotation regions (UTR, Promoters, Exons, etc.)


venn_generator(samples, by = "peaks", selection)

arguments:

samples -> a list of annotated peaks

by -> either "peaks" or "genes". With "peaks" the Venn diagram is built from overlapping peaks; with "genes" it is built from gene hits.

selection -> restricts the Venn diagram to the subset of entries that share a chosen annotation characteristic. For example, in "genes" mode you could build the diagram using only genes that are involved in cancer. Default is "all"

example: venn_generator(samples, by = "genes", Tier==1)

NOTE: the syntax of the selection must follow the rule Column_name==Value
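A selection written as Column_name==Value is an unquoted R expression. One common way for a function to accept it is non-standard evaluation, as base R's `subset()` does. The sketch below shows that technique on toy data; `filter_by()` is a hypothetical helper, and whether venn_generator() works this way internally is an assumption.

```r
# Hypothetical helper: evaluate an unquoted condition inside a data frame.
filter_by <- function(df, selection = "all") {
  expr <- substitute(selection)          # capture the expression unevaluated
  if (identical(expr, "all")) {
    return(df)                           # no filtering requested
  }
  keep <- eval(expr, envir = df, enclos = parent.frame())
  df[!is.na(keep) & keep, , drop = FALSE]
}

# Toy annotation table.
annot <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  Tier = c("1", "none", "1"),
  stringsAsFactors = FALSE
)

selected <- filter_by(annot, Tier == 1)  # column name resolved inside annot
everything <- filter_by(annot, "all")
print(selected$gene)
```

Because `Tier` is looked up inside the data frame, the caller does not need to quote the column name or prefix it with the table name.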




riccardo-trozzo/BlissR documentation built on Aug. 1, 2020, 12:23 a.m.