extractor: Extract reads and output from Kraken
In rsahmi: Single-Cell Analysis of Host-Microbiome Interactions

extractor

R Documentation

Extract reads and output from Kraken

Description

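Helpers for working with Kraken results: `extract_taxids()` collects the taxon identifiers of the requested taxa from a Kraken report, `extract_kraken_output()` saves the Kraken output for the specified `taxids`, and `extract_kraken_reads()` extracts the corresponding reads from the original FASTQ files.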

Usage

extract_taxids(
  kraken_report,
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses")
)

extract_kraken_output(
  kraken_out,
  taxids,
  odir,
  ofile = "kraken_microbiome_output.txt",
  ...
)

extract_kraken_reads(
  kraken_out,
  reads,
  ofile = NULL,
  odir = getwd(),
  threads = NULL,
  ...,
  envpath = NULL,
  seqkit = NULL
)

Arguments

`kraken_report`	The path to the Kraken report file.
`taxon`	A character vector specifying the names of the taxa to extract. Each name should follow the Kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
`kraken_out`	The path to the Kraken output file.
`taxids`	A character vector of NCBI taxonomy identifiers to extract.
`odir`	A string giving the directory in which to save `ofile`.
`ofile`	A string giving the name of the file in which to save the Kraken output for the specified `taxids`.
`...`	`extract_kraken_output`: Additional arguments passed to `sink_csv()`. `extract_kraken_reads`: Additional arguments passed to `cmd_run()` method.
`reads`	The original FASTQ files (the same files given as input to `kraken2`). Two paired-end files can be passed directly.
`threads`	Number of threads to use, see `blit::cmd_help(blit::seqkit("grep"))`.
`envpath`	A string giving a path to add to the `PATH` environment variable.
`seqkit`	A string giving the path to the `seqkit` command.
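
The naming convention expected by `taxon` (rank code, two underscores, scientific name) can be illustrated with a minimal base-R sketch. The helper `kraken_taxon()` is hypothetical and is not part of rsahmi; it only shows how such names are put together.

```r
# Hypothetical helper (not part of rsahmi): build Kraken-style taxon names
# by joining a rank code and a scientific name with two underscores.
kraken_taxon <- function(rank_code, name) {
    paste0(rank_code, "__", name)
}

taxon <- kraken_taxon("d", c("Bacteria", "Viruses"))
print(taxon)
# [1] "d__Bacteria" "d__Viruses"
```

The resulting vector could then be supplied as the `taxon` argument of `extract_taxids()`.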

Value

extract_taxids: An atomic character vector of taxon identifiers.

extract_kraken_output: A polars DataFrame.

extract_kraken_reads: The exit status, returned invisibly.

Examples

## Not run: 
# For 10x Genomics data, `fq1` contains only the barcode and UMI, and the
# official documentation gives no guidance on handling this. I therefore
# prefer to use `umi-tools` to move the UMI into `fq2` and then run `rsahmi`
# with only `fq2`.
blit::kraken2(
    fq1 = fq1,
    fq2 = fq2,
    classified_out = "classified.fq",
    # Number of threads to use
    blit::arg("--threads", 10L, format = "%d"),
    # the kraken database
    blit::arg("--db", kraken_db),
    "--use-names", "--report-minimizer-data",
) |> blit::cmd_run()

# `kraken_report` should be the output of `blit::kraken2()`
taxids <- extract_taxids(kraken_report = "kraken_report.txt")

# 1. `kraken_out` should be the output of `blit::kraken2()`
# 2. `taxids` should be the output of `extract_taxids()`
# 3. `odir`: the output directory
extract_kraken_output(
    kraken_out = "kraken_output.txt",
    taxids = taxids,
    odir = getwd() # replace with your output directory
)

# 1. `kraken_out` should be the file written by `extract_kraken_output()`
# 2. `fq1` and `fq2` should be the same files passed to `blit::kraken2()`
extract_kraken_reads(
    kraken_out = "kraken_microbiome_output.txt",
    reads = c(fq1, fq2),
    threads = 10L, # Number of threads to use
    # Set `seqkit` to the path of your seqkit executable. If `NULL`, it is
    # looked up in your `PATH` environment variable
    seqkit = NULL
)

## End(Not run)

rsahmi documentation built on April 4, 2025, 1:46 a.m.