knitr::opts_chunk$set(echo = TRUE)
This is the instruction vignette for the package BC32BarSeq. It is designed to help analyze BarSeq sequencing runs, specifically with the BC32 system. The set of functions provided in this package depends on the following R packages, so make sure they are installed:
Load the BC32BarSeq package:
library(BC32BarSeq)
Load all other packages at once with the function:
initBC()
Define your variables:
pat <- 'CTANNCAGNNCTTNNCGANNCTANNCTTNNGGANNCTANNCAGNNCTTNNCGANNCTANNCTTNNGGANNCTANNCAGNN' # matching pattern (BC32) restriction <- 'CTCGAG' # The restriction sequence flanking the barcode pattern at the 3'end idx_mis <- 1 # number of mismatch allowed in index base_q <- 20 # minimum average base quality score in first 90 nucleotides bb_mis <- 1 # number of mismatch allowed in barcode backbone sequence indels <- 1 # total edit distance resulting from indels that are tolerated for barcode matching (try to keep this low) dl <- 12 # threshold for Damerau-Levenshtein distance in step pooling similar sequences min.count <- 10 # in the final merged table, replace barcodes with occurence below min.count (considered as noise) with 0
Next, you need to prepare a data frame containing minimal information about the data to be analyzed. In this Vignette an example is provided through a textfile named 'sampname.txt' in which each row of the file contains 3 tab-separated columns indicating the sample names, paths of the corresponding sequencing fastq file and the sequences of the corresponding multiplexing index.
sampname <- read.delim('sampname.txt', # sampname provides a list of sample names, matching sequencing files and multiplexing index header = F, stringsAsFactor = F) colnames(sampname) <- c('sample', 'file', 'index')
bc_data <- generateSummaries(pat = pat, restriction = restriction, sampname = sampname, base_q = base_q, idx_mis = idx_mis, bb_mis = bb_mis, indels = indels) save(bc_data, 'bc_data.Rdata') # Keep this object, it is required for downstream processing!
This function call will generate (in the working directory) a table of raw counts for each sample, and a series of additional graphs and tables for quality control. The top ten sequences failing the index or barcode matching are listed in the textfiles containing the suffixes "_no_index" or "_no_bc_match" respectively. These files also show the alignment of the sequences with the index (respectively barcode pattern) to easily identify why it didn't match.
After careful QC review, you can provide a list of sequences that you think deserve to be included in the analysis (identified in the "_no_index" and "_no_bc_match" files). invited_idx contains sequences that failed the index matching but you believe are worthy to be included in the analysis (let's say if the index did not match because of one substitution), this dataframe contains 2 columns, with the sequence in the first column and the sample name in the second column. invided_bc contains sequences that failed the barcode matching (for example because of a big deletion), in the form of a vector (or in a data frame with one column). Be careful, this step directly modifies the raw table of counts, so make sure you only run it once!
invited_idx <- read.delim('invited_idx.txt', header = F, stringsAsFactor = F) # format: column 1 -> sequences, column 2 -> sample name invited_bc <- read.delim('invited_bc.txt', header = F, stringsAsFactor = F) # format: column 1 -> sequences addInvitedSeq(pat = pat, bc_data = bc_data, dir = getwd(), invited_idx = invited_idx, invited_bc = invited_bc)
This function will generate a new table of counts, where similar sequences are pooled together under one unique barcode sequence.
poolBC(sampname = sampname, dir = getwd(), dl = dl) poolingStatsPlot(dir = getwd()) # export plots
In this last step, all samples are merged together into one table to facilitate the analysis.
counts <- mergeSummaries(sampname = sampname, dir = getwd(), min.count = min.count, cleancol = F) freq <- countsToFreq(counts = counts) cpm <- countsToCPM(counts = counts) write.table(counts, 'merged_summary_pooled_count.txt', sep='\t', row.names = F) write.table(freq, 'merged_summary_pooled_freq.txt', sep='\t', row.names = F) write.table(cpm, 'merged_summary_pooled_cpm.txt', sep='\t', row.names = F) # export session info for reproducibility writeLines(capture.output(sessionInfo()), 'sessionInfo.txt')
This function starts a shiny app to explore the table of counts and frequencies generated in the previous step.
exploreBC(getwd())
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.