All counts are computed from sorted, indexed BAM files using the make_counts()
function. This function requires two files:
1. A GFF [1] file of bait regions on the genome 2. A csv file showing the sample -> treatment -> bam file mappings for the experiment.
The mapping file has the following structure:
"sample_name", | "bam_file_path", | "treatment" ---------------|------------------|------------ "control_001", | "data/control1/aligned_merged_sorted.bam" | "control" "control_002", | "data/control2/aligned_merged_sorted.bam" | "control" "control_003", | "data/control3/aligned_merged_sorted.bam" | "control" "treatment_001", | "data/treatment1/aligned_merged_sorted.bam" | "treatment" "treatment_002", | "data/treatment2/aligned_merged_sorted.bam" | "treatment" "treatment_003", | "data/treatment3/aligned_merged_sorted.bam" | "treatment"
The BAM indices (.bai
files) are presumed to be with the BAM files.
atacr
.As far as atacr
is concerned, ATACseq data is counted into equal sized windows within the bait windows - so that you end up with many more regions with counts, than you have baits. This behaviour means you can find regions of smaller than bait size that are differentially accessible. Conversely, RNAseq data is counted into one window per region declared in the GFF file, so you get just one expression estimate per gene/transcript.
ATACseq is the default data type expected in atacr
. The make_counts()
call is the simplest in this case.
counts <- make_counts("bait_regions.gff", "sample_treatment_mapping.csv")
The width of the genomic windows in which to compute counts across the defined bait regions is set to 50 nt, to change this use the width
parameter to the size of the windows you want to use, e.g 100 nt.
counts <- make_counts("bait_regions.gff", "sample_treatment_mapping.csv", width = 100)
When loading RNAseq data it is neccesary to set the is_rnaseq
option in make_counts()
counts <- make_counts("bait_regions.gff", "sample_treatment_mapping.csv", is_rnaseq = TRUE)
atacr
allows you to set values determining which reads will be included in counts. By default a simple filter object can be passed from the make_params()
function to the filter_params
argument of make_counts()
.
my_params = make_params( paired_map = TRUE, minq = 30, dedup = TRUE ) counts <- make_counts("bait_regions.gff", "sample_treatment_mapping.csv", is_rnaseq = TRUE, filter_params = my_params )
The paired_map
option sets whether reads must be mapped as pairs to be counted, TRUE
is the default. The dedup
option removes reads that seem like PCR duplicates to the aligner TRUE
is the default. minq
sets the minimum PHRED mapping quality score for a read to be counted, 30
is the default
If you require greater control over mapping filters for read counts from RNAseq, you can use an Rsamtools::ScanBamParam()
object instead. See https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/ScanBamParam for details
For greater control over mapping filters for read counts when using ATACseq data, use a csaw::readParam()
object. See http://bioconductor.org/packages/release/bioc/manuals/csaw/man/csaw.pdf for details.
Region names are loaded from the GFF file. As GFF is a bit of a fluid format different files may encode this information differently. By default, make_counts()
will look into the attribute (final) column in the GFF and use the attribute called ID
. To use a different attribute set gene_id_col
counts <- make_counts("bait_regions.gff", "sample_treatment_mapping.csv", gene_id_col = "GENE_NAME")
atacr
objectThe result of make_counts()
is an atacr
object of counts, basically an R list
with slots for counts from bait windows, non-bait windows, the sample and BAM information. The count information is held in 'SummarizedExperiment' objects from Bioconductor. See http://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html for details.
Computing the atacr
count object can take a while, especially when you are analysing many BAM files. It can be useful to save the object after computation. This can be done with base R's saveRDS()
function.
saveRDS(counts, file="my_output_file.rds") reloaded_counts <- readRDS("my_output_file.rds")
[1] http://gmod.org/wiki/GFF3#GFF3
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.