knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rkal)

Installation

rkal can be installed as follows:

# install.packages('remotes')  # if remotes is not yet installed
remotes::install_github('alexvpickering/rkal')

Overview of rkal

kallisto is a fast program for accurately quantifying transcript abundance from bulk RNA-seq data. rkal provides an R interface to the kallisto commands index (for building a transcriptome index) and quant (for running pseudoalignment). rkal automates argument specification for fastq files downloaded with GEOfastq and provides a convenient GUI for personal fastq files. After pseudoalignment, counts can be imported into R as a data structure that is compatible with crossmeta (for differential expression and meta-analyses) and dseqr (a GUI for bulk and single-cell exploratory analyses as well as connectivity mapping). Alternatively, counts can be imported for GEO samples pre-aligned as part of the ARCHS4 project.

Getting Started using rkal

Prior to pseudoalignment, an index of the transcriptome must first be built:

# This builds the human Ensembl 94 index for kallisto in the working directory.
# It only needs to be run once.
indices_dir <- getwd()
build_kallisto_index(indices_dir = indices_dir,
                     species = 'homo_sapiens', release = '94')
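
Because rkal drives kallisto's index and quant commands, it can help to confirm that kallisto is reachable from R before building the index. The following is a minimal sketch using only base R. `kallisto_available()` is a hypothetical helper, not part of rkal, and it assumes the executable is named `kallisto` and is on the `PATH`.

```r
# Hypothetical helper (not part of rkal): check that an executable
# can be found on the PATH.
kallisto_available <- function(cmd = "kallisto") {
  # Sys.which() returns "" for commands it cannot find
  nzchar(unname(Sys.which(cmd)))
}

if (!kallisto_available()) {
  message("kallisto was not found on the PATH; install it before building the index.")
}
```

If the check fails, install kallisto or add its location to the `PATH` before calling build_kallisto_index.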

Next, we download an example fastq file with GEOfastq:

library(GEOfastq)
data_dir <- tempdir()

# first get metadata needed and download example fastq file
srp_meta <- crawl_gsms('GSM4875733')
res <- get_fastqs(srp_meta, data_dir)

Next, we collect the fastq file metadata needed to run pseudoalignment: are the fastq files paired or single-end, and are any samples split across multiple files?

# we can get the necessary info automatically for fastqs from GEOfastq
quant_meta <- get_quant_meta(srp_meta, data_dir)

# for personal fastq files, a GUI will request this info in the next step
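
To illustrate the kind of information involved (which files belong to which sample, and whether a sample is paired), here is a base R sketch that guesses pairing from file names. It assumes a `_1`/`_2` mate suffix before the `.fastq.gz` extension. `guess_pairs()` is hypothetical and is not how rkal or GEOfastq determine pairing. The heuristic will misread single-end samples whose names happen to end in `_1` or `_2`.

```r
# Hypothetical helper (not part of rkal or GEOfastq): infer sample names and
# pairing from fastq file names that follow a <sample>_1 / <sample>_2 convention.
guess_pairs <- function(fastq_files) {
  base <- basename(fastq_files)
  # strip the extension, leaving e.g. "SRR1_1" or "SRR2"
  stem <- sub("\\.(fastq|fq)(\\.gz)?$", "", base)
  # mate is "1" or "2" when the suffix is present, otherwise NA
  mate <- ifelse(grepl("_[12]$", stem), sub("^.*_([12])$", "\\1", stem), NA)
  sample <- sub("_[12]$", "", stem)
  # a sample is treated as paired when both mate 1 and mate 2 are present
  paired <- vapply(
    sample,
    function(s) all(c("1", "2") %in% mate[sample == s]),
    logical(1)
  )
  data.frame(
    file = base, sample = sample, mate = mate, paired = unname(paired),
    stringsAsFactors = FALSE
  )
}

guess_pairs(c("SRR1_1.fastq.gz", "SRR1_2.fastq.gz", "SRR2.fastq.gz"))
```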

We are now ready to run pseudoalignment:

# exclude quant_meta for personal fastqs (will invoke GUI)
res <- run_kallisto_bulk(indices_dir, data_dir, quant_meta)

If you plan to use crossmeta or dseqr, you can easily generate a suitably annotated ExpressionSet:

eset <- load_seq(data_dir)

# if you downloaded the ARCHS4 pre-aligned GEO data
archs4_file <- '/path/to/human_matrix_v*.h5'
# eset <- load_archs4_seq(archs4_file, 'GSM4875733')

Alternatively, see the tximport vignette if you prefer to run differential expression analyses with limma or DESeq2.
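
As a rough sketch of the tximport route: the directory layout below is an assumption (one kallisto output directory per sample, each containing abundance.tsv), and `quant_dirs` is a placeholder to be pointed at wherever the quantification output was actually written. The import only runs if tximport is installed and the files exist.

```r
# Assumption: each sample's kallisto output sits in its own directory
# containing abundance.tsv; `quant_dirs` is a placeholder path.
quant_dirs <- c(sample1 = "/path/to/kallisto/sample1")
files <- file.path(quant_dirs, "abundance.tsv")
names(files) <- names(quant_dirs)

if (requireNamespace("tximport", quietly = TRUE) && all(file.exists(files))) {
  # txOut = TRUE keeps transcript-level estimates; supply a tx2gene
  # data frame instead to summarize to gene level
  txi <- tximport::tximport(files, type = "kallisto", txOut = TRUE)
  head(txi$counts)
}
```

The resulting list can then be passed on to DESeq2 or limma as described in the tximport vignette.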

Session info

The following packages and versions were used to produce this vignette.

sessionInfo()


alexvpickering/rkal documentation built on Nov. 27, 2022, 8:38 p.m.