validate_velocity_input: Validate input to get_velocity_files
In BUStools/BUSpaRse: kallisto | bustools R utilities

Description

Validates the arguments passed to `get_velocity_files` and throws an error if validation fails.

Usage

validate_velocity_input(
  L,
  Genome,
  Transcriptome,
  out_path,
  compress_fa,
  width,
  exon_option
)

Arguments

`L`	Length of the biological read. For instance, 10xv1: 98 nt, 10xv2: 98 nt, 10xv3: 91 nt, Drop-seq: 50 nt. If in doubt, check the read length in a fastq file of biological reads with `bash` commands: if the fastq file is gzipped, run `zcat your_file.fastq.gz | head` on Linux or `zcat < your_file.fastq.gz | head` on Mac. The output includes lines of nucleotide bases. Copy one of those lines and determine its length with `str_length` in R or `echo -n <the sequence> | wc -c` in `bash`. Which file contains the biological reads depends on the particular technology.
`Genome`	Either a `BSgenome` or an `XStringSet` object of genomic sequences, from which the intronic sequences will be extracted. Use `genomeStyles` to check which chromosome naming styles are supported for your organism of interest; supported styles can be interconverted. If the style in your genome or annotation is not supported, then the chromosome names in the genome and annotation must be set manually so that they are consistent.
`Transcriptome`	An `XStringSet`, a path to a fasta file (can be gzipped) of the transcriptome which contains sequences of spliced transcripts, or `NULL`. The transcriptome here will be concatenated with the intronic sequences to give one fasta file. When `NULL`, the transcriptome sequences will be extracted from the genome given the gene annotation, which guarantees that transcript IDs in the transcriptome and in the annotation match. Otherwise, the type of transcript ID in the transcriptome must match that in the gene annotation supplied via argument `X`.
`out_path`	Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory.
`compress_fa`	Logical, whether to compress the output fasta file. If `TRUE`, then the fasta file will be gzipped.
`width`	Maximum number of letters per line of sequence in the output fasta file. Must be an integer.
`exon_option`	Character, indicating how exonic sequences should be included in the kallisto index. Must be one of the following: `full`: the full cDNA sequences, which include the full exonic sequences, will be used (this is the default); `junction`: only the exon-exon junctions, with `L-1` bases on each side of the junctions, will be used.
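The read-length check described for `L` can also be done without leaving R. The following is a minimal sketch in base R, not part of BUSpaRse: the helper name `fastq_read_length` and the demo file are hypothetical, and the code assumes the standard four-line FASTQ record layout.

```r
# Hypothetical helper (not a BUSpaRse function): lengths of the first few reads.
# Assumes the standard 4-line FASTQ record: header, sequence, "+", qualities.
fastq_read_length <- function(path, n_reads = 4L) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = 4L * n_reads)
  if (length(lines) < 2L) return(integer(0))
  seqs <- lines[seq(2L, length(lines), by = 4L)]  # line 2 of each record
  nchar(seqs)
}

# Demo on a tiny made-up gzipped FASTQ written to a temporary file
demo_fastq <- tempfile(fileext = ".fastq.gz")
con <- gzfile(demo_fastq, open = "wt")
writeLines(c("@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
             "@read2", "TTGCA", "+", "IIIII"), con)
close(con)

read_lengths <- fastq_read_length(demo_fastq)
```

If the reads have been trimmed to different lengths, the returned vector will vary, and the appropriate value of `L` then depends on the data.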
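To make the `junction` option of `exon_option` concrete: a read of length `L` that spans an exon-exon junction contains at least one base from each exon, so at most `L-1` bases from either side, which is presumably why `L-1` flanking bases are kept. The sketch below only illustrates that arithmetic on made-up sequences; `junction_flank` is a hypothetical helper and is not how BUSpaRse extracts junctions.

```r
# Illustration only: not BUSpaRse's implementation.
# Hypothetical helper: keep up to (L - 1) bases on each side of a junction.
junction_flank <- function(exon_5p, exon_3p, L) {
  stopifnot(L >= 2L)
  k <- L - 1L
  n5 <- nchar(exon_5p)
  left <- substr(exon_5p, max(1L, n5 - k + 1L), n5)  # last k bases of upstream exon
  right <- substr(exon_3p, 1L, k)                      # first k bases of downstream exon
  paste0(left, right)
}

# Made-up exons and a toy read length of 4 nt
flank <- junction_flank("AAAACCCC", "GGGGTTTT", L = 4L)
flank_length <- nchar(flank)  # 2 * (L - 1) when both exons are long enough
```

Exons shorter than `L-1` bases simply contribute fewer bases in this sketch; how the package treats such exons is not stated in this page.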

Value

Throws an error if validation fails. Otherwise, returns a named list whose first element is the normalized path to the output directory and whose second element is the normalized path to the transcriptome file, if one was specified.
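The contract described above (error on invalid input, otherwise a list of normalized paths) can be sketched in base R as follows. This is a simplified stand-in, not the BUSpaRse source: the function name `validate_sketch`, the element names `out_path` and `tx_path`, and the default values are all assumptions made for illustration, and the checks on `L` and `Genome` are omitted.

```r
# Hypothetical, simplified validator: NOT the BUSpaRse source.
# Element names "out_path" and "tx_path" and all defaults are assumptions.
validate_sketch <- function(out_path = ".", tx_path = NULL, compress_fa = FALSE,
                            width = 80L, exon_option = c("full", "junction")) {
  exon_option <- match.arg(exon_option)  # errors unless "full" or "junction"
  stopifnot(is.logical(compress_fa), length(compress_fa) == 1L, !is.na(compress_fa))
  if (!is.numeric(width) || length(width) != 1L || is.na(width) ||
      width != round(width) || width < 1) {
    stop("width must be a single positive integer.")
  }
  # Create the output directory if it does not exist, as documented for out_path
  if (!dir.exists(out_path)) dir.create(out_path, recursive = TRUE)
  out <- list(out_path = normalizePath(out_path, mustWork = TRUE))
  if (is.character(tx_path)) {
    out$tx_path <- normalizePath(tx_path, mustWork = TRUE)  # errors if file is missing
  }
  out
}

demo_dir <- file.path(tempdir(), "velocity_out")
res <- validate_sketch(out_path = demo_dir, width = 80L)

# An unsupported exon_option is rejected with an error
bad <- tryCatch(validate_sketch(out_path = demo_dir, exon_option = "intron"),
                error = function(e) "error")
```

Returning normalized paths means downstream code can rely on absolute paths regardless of the working directory at call time.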

BUStools/BUSpaRse documentation built on Aug. 2, 2024, 5:07 a.m.