startQuest("Pseudo-alignment and read quantification with Kallisto")
h1("Pseudo-alignment and read quantification with Kallisto")
Kallisto is a read quantification tool that makes heavy use of _k_-mer hash tables and de Bruijn graphs to "pseudo-align" reads [@Bray2016]. This means that Kallisto does not map a read to a given position in a genome/transcriptome, but rather to an abstract representation of a transcript. Kallisto then uses likelihood estimation with an expectation-maximization algorithm to calculate a count per transcript and, on request, bootstraps a confidence value for that count.
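To make the idea of pseudo-alignment more concrete, here is a toy sketch in bash (version 4 or later, for associative arrays). It is not how Kallisto is implemented: the transcript names `tA` and `tB`, their sequences, the helper functions `kmers`, `build_index` and `pseudoalign`, and the choice of _k_ = 5 are all invented for illustration, and _k_-mers missing from the index are simply skipped. Each read is assigned the set of transcripts shared by all of its indexed _k_-mers, without a position ever being computed.

```bash
# Toy pseudo-alignment. Hypothetical transcripts, NOT Kallisto's real code.
declare -A TRANSCRIPTS=(
  [tA]="ACGTACGGAT"
  [tB]="TTGTACGGAC"
)
declare -A INDEX=()   # k-mer -> space-separated list of transcripts containing it

# Print every k-mer of a sequence, one per line.
kmers() {
  local seq=$1 k=$2 i
  for (( i = 0; i + k <= ${#seq}; i++ )); do
    echo "${seq:i:k}"
  done
}

# Fill INDEX with the k-mers of every transcript.
build_index() {
  local k=$1 name kmer
  for name in $(printf '%s\n' "${!TRANSCRIPTS[@]}" | sort); do
    while read -r kmer; do
      # Record each transcript only once per k-mer.
      if [[ " ${INDEX[$kmer]:-} " != *" $name "* ]]; then
        INDEX[$kmer]="${INDEX[$kmer]:+${INDEX[$kmer]} }$name"
      fi
    done < <(kmers "${TRANSCRIPTS[$name]}" "$k")
  done
}

# Intersect the transcript sets of all indexed k-mers of a read.
pseudoalign() {
  local read_seq=$1 k=$2 kmer hits name compat="" kept seen=0
  while read -r kmer; do
    hits=${INDEX[$kmer]:-}
    if [[ -z $hits ]]; then
      continue                      # k-mer not in the index: skipped in this sketch
    fi
    if (( seen == 0 )); then
      compat=$hits
      seen=1
    else
      kept=""
      for name in $compat; do
        if [[ " $hits " == *" $name "* ]]; then
          kept+="${kept:+ }$name"
        fi
      done
      compat=$kept
    fi
  done < <(kmers "$read_seq" "$k")
  echo "${compat:-none}"
}

build_index 5
pseudoalign "CGTACGG" 5   # prints: tA     (k-mer CGTAC occurs only in tA)
pseudoalign "GTACGGA" 5   # prints: tA tB  (every k-mer is shared)
pseudoalign "GGGGGGG" 5   # prints: none   (no k-mer is in the index)
```

A read that is compatible with several transcripts, like the second one, is the kind of ambiguity that the expectation-maximization step has to resolve.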
Step 1 is to create the Kallisto index from a transcriptome:
```{bash, eval=FALSE}
cd ~/results
kallisto index -i Potra01-mRNA.idx \
  ~/share/Day1/reference/fasta/Potra01-mRNA.fa.gz
```
We could modify the _k_-mer size using the `-k` option. This is, however, often a very sensitive parameter in _k_-mer/hash-based applications. Lower values of _k_ provide higher sensitivity but lower specificity, and vice versa for higher values. The optimal _k_ would be the longest length at which reads still contain no mismatch due to natural polymorphisms or technical artefacts. In practice, this is very difficult to estimate unless you have access to a large pool of population genomics data.

Next we can quantify the input data using this newly created index:

```{bash, eval=FALSE}
mkdir -p ~/results/kallisto
cd ~/results/kallisto
find ../trimmomatic -name "*trimmomatic_[12].fq.gz" | sort | head -n 4 | while read -r FW_READ
do
  # The files are sorted, so the line after each forward read is its mate
  read -r RV_READ
  FILEBASE=$(basename "${FW_READ%_1.fq.gz}")
  # 2>&1 sends stderr to the log as well as stdout
  kallisto quant -i ~/results/Potra01-mRNA.idx -b 100 \
    -o . -t 16 "$FW_READ" "$RV_READ" 2>&1 | tee "$FILEBASE.log"
  # Kallisto doesn't let us specify an output filename so we rename all
  # output files
  mv "abundance.tsv" "${FILEBASE}_abundance.tsv"
  mv "abundance.h5" "${FILEBASE}_abundance.h5"
  mv "run_info.json" "${FILEBASE}_run_info.json"
done
```
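The quantification loop depends on `sort` placing every `_1` file directly before its `_2` mate, and on the second `read` consuming that mate. The following dry run is a sketch for checking that logic without running Kallisto: it works on empty placeholder files in a temporary directory, the sample names `sampleA` and `sampleB` and the function name `pair_reads` are made up, and it only prints the pairs it would process.

```bash
# Dry run of the read-pairing logic; sample names are hypothetical placeholders.
WORKDIR=$(mktemp -d)
touch "$WORKDIR/sampleA_trimmomatic_1.fq.gz" "$WORKDIR/sampleA_trimmomatic_2.fq.gz" \
      "$WORKDIR/sampleB_trimmomatic_1.fq.gz" "$WORKDIR/sampleB_trimmomatic_2.fq.gz"

pair_reads() {
  find "$1" -name "*trimmomatic_[12].fq.gz" | sort | while read -r FW_READ
  do
    read -r RV_READ                              # next sorted line is the mate
    FILEBASE=$(basename "${FW_READ%_1.fq.gz}")   # strip the "_1.fq.gz" suffix
    echo "$FILEBASE: $(basename "$FW_READ") + $(basename "$RV_READ")"
  done
}

pair_reads "$WORKDIR"
# prints:
# sampleA_trimmomatic: sampleA_trimmomatic_1.fq.gz + sampleA_trimmomatic_2.fq.gz
# sampleB_trimmomatic: sampleB_trimmomatic_1.fq.gz + sampleB_trimmomatic_2.fq.gz
rm -r "$WORKDIR"
```

If a mate file were missing, the pairs would silently shift by one line, so a quick check like this is worthwhile before committing compute time.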
The option `-b 100` performs a bootstrap confidence estimation using 100 iterations.
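Kallisto outputs several files:

* `abundance.tsv`: a plain-text table of the abundance estimates
* `abundance.h5`: an HDF5 file with the abundance estimates and the bootstrap results
* `run_info.json`: a JSON file with information about the run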
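The plain-text table is tab-separated with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`, so it can be inspected with standard tools. As a minimal sketch, the snippet below summarises a mock table; the transcript IDs and all numbers in it are invented, and `summarise_abundance` is a hypothetical helper name. Since TPM values sum to one million by definition, the total is a handy sanity check.

```bash
# Mock abundance table; transcript IDs and numbers are invented for illustration.
TSV=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  "$TSV"
printf 'tx1\t1200\t1000\t100\t250000\n'                   >> "$TSV"
printf 'tx2\t2200\t2000\t0\t0\n'                          >> "$TSV"
printf 'tx3\t1200\t1000\t300\t750000\n'                   >> "$TSV"

# Count all transcripts, those with tpm > 0, and sum the tpm column.
summarise_abundance() {
  awk -F'\t' 'NR > 1 { n++; if ($5 > 0) e++; total += $5 }
              END { printf "transcripts=%d expressed=%d total_tpm=%.0f\n", n, e, total }' "$1"
}

summarise_abundance "$TSV"   # prints: transcripts=3 expressed=2 total_tpm=1000000
rm -f "$TSV"
```

On real data the same function can be pointed at any of the renamed `*_abundance.tsv` files.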
Now let us run MultiQC a final couple of times. `multiqc` can read the Kallisto log file:
```{bash, eval=FALSE}
cd ~/results/kallisto
multiqc .
```
Finally, let's create a full, final directory with all logs to have a complete MultiQC report:

```{bash, eval=FALSE}
mkdir ~/results/qc_report
multiqc -o ~/results/qc_report ~/results
```