validate_velocity_input: Validate input to get_velocity_files

View source: R/velocity.R

validate_velocity_input    R Documentation

Validate input to get_velocity_files

Description

Validate input to get_velocity_files

Usage

validate_velocity_input(
  L,
  Genome,
  Transcriptome,
  out_path,
  compress_fa,
  width,
  exon_option
)

Arguments

L

Length of the biological read, in nucleotides. For instance, 10xv1: 98 nt, 10xv2: 98 nt, 10xv3: 91 nt, Drop-seq: 50 nt. If in doubt, check the read length in a fastq file of biological reads with bash commands. If the fastq file is gzipped, run zcat your_file.fastq.gz | head on Linux, or zcat < your_file.fastq.gz | head on Mac. The output will include lines of nucleotide bases. Copy one of those lines and determine its length with str_length in R or echo -n <the sequence> | wc -c in bash. Which file contains the biological reads depends on the particular technology.
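The read-length check described above can also be done entirely in R. The following is a minimal sketch using base R only; first_read_length is a hypothetical helper, not a BUSpaRse function, and it assumes a standard four-line fastq record whose second line is the sequence. The toy file is made up for illustration.

```r
# Hypothetical helper (not part of BUSpaRse): length of the first read
# in a fastq file. Assumes standard 4-line fastq records.
first_read_length <- function(fastq_path) {
  # gzfile() reads gzipped files and also plain-text files transparently
  con <- gzfile(fastq_path, open = "rt")
  on.exit(close(con))
  # The sequence is the second line of the first record
  record <- readLines(con, n = 4)
  nchar(record[2])
}

# Toy gzipped fastq file with a single 10 nt read (made-up data)
toy_fastq <- tempfile(fileext = ".fastq.gz")
out <- gzfile(toy_fastq, open = "wt")
writeLines(c("@read1", "ACGTACGTAC", "+", "IIIIIIIIII"), out)
close(out)

read_len <- first_read_length(toy_fastq)
```

Checking only the first record is a shortcut; if reads were trimmed to varying lengths, inspecting more records would be safer.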

Genome

Either a BSgenome or an XStringSet object of genomic sequences, from which the intronic sequences will be extracted. Use genomeStyles to check which styles are supported for your organism of interest; supported styles can be interconverted. If the style in your genome or annotation is not supported, then the style of chromosome names in the genome and annotation must be set manually so that they are consistent.

Transcriptome

An XStringSet, a path to a fasta file (can be gzipped) of the transcriptome containing sequences of spliced transcripts, or NULL. The transcriptome will be concatenated with the intronic sequences to give one fasta file. When NULL, the transcriptome sequences will be extracted from the genome given the gene annotation, which guarantees that transcript IDs in the transcriptome and in the annotation match. Otherwise, the type of transcript ID in the transcriptome must match that in the gene annotation supplied via argument X of get_velocity_files.

out_path

Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory.

compress_fa

Logical, whether to compress the output fasta file. If TRUE, then the fasta file will be gzipped.

width

Maximum number of letters per line of sequence in the output fasta file. Must be an integer.
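To illustrate what width controls, here is a minimal base R sketch that wraps one sequence into lines of at most width letters. wrap_sequence is a hypothetical helper written for this example; it is not how BUSpaRse writes fasta files, and it assumes a non-empty sequence.

```r
# Hypothetical helper: split a sequence into chunks of at most `width` letters,
# as they would appear on consecutive lines of a fasta file.
# Assumes `sequence_chr` is a single non-empty string.
wrap_sequence <- function(sequence_chr, width) {
  n <- nchar(sequence_chr)
  starts <- seq(1L, n, by = width)
  substring(sequence_chr, starts, pmin(starts + width - 1L, n))
}

wrapped <- wrap_sequence("ACGTACGTACGT", width = 5L)
print(wrapped)
```

With a 12-letter sequence and width 5, the last line is shorter than the others because only two letters remain.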

exon_option

Character, indicating how exonic sequences should be included in the kallisto index. Must be one of the following:

full

The full cDNA sequences, which include the full exonic sequences, will be used. This is the default.

junction

Only the exon-exon junctions, with L-1 bases on each side of the junctions, will be used.
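The junction option can be pictured with a small base R sketch: for two adjacent exons, keep the last L-1 bases of the upstream exon and the first L-1 bases of the downstream exon. junction_flank is a hypothetical helper for illustration only, not the package's implementation; taking whatever is available when an exon is shorter than L-1 bases is an assumption of this sketch.

```r
# Hypothetical helper: sequence around one exon-exon junction,
# with L - 1 bases on each side (fewer if an exon is shorter; assumption).
junction_flank <- function(exon_up, exon_down, L) {
  flank <- L - 1L
  n_up <- nchar(exon_up)
  # Last `flank` bases of the upstream exon
  up <- substr(exon_up, max(1L, n_up - flank + 1L), n_up)
  # First `flank` bases of the downstream exon
  down <- substr(exon_down, 1L, flank)
  paste0(up, down)
}

# Toy exons and a toy read length of 4, so each flank is 3 bases
junction_seq <- junction_flank("AAAACCCC", "GGGGTTTT", L = 4L)
```

With L-1 bases on each side, a read of length L that overlaps the resulting sequence must cross the junction itself.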

Value

Throws an error if validation fails. Otherwise returns a named list whose first element is the normalized path to the output directory and whose second element is the normalized path to the transcriptome file, if one was specified.
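The behavior described above (stop on invalid input, otherwise return normalized paths) can be sketched in base R as follows. check_velocity_args, its transcriptome_path argument, and the list names out_path and tx_path are all hypothetical; the actual checks and element names in BUSpaRse may differ, and the Genome and Transcriptome object checks are omitted because they require Bioconductor classes.

```r
# Hypothetical sketch of argument validation; not the BUSpaRse source.
check_velocity_args <- function(L, out_path, compress_fa, width, exon_option,
                                transcriptome_path = NULL) {
  is_whole <- function(x) is.numeric(x) && length(x) == 1L && x >= 1 && x == round(x)
  if (!is_whole(L)) stop("L must be a single positive whole number.")
  if (!is_whole(width)) stop("width must be a single positive whole number.")
  if (!is.logical(compress_fa) || length(compress_fa) != 1L || is.na(compress_fa)) {
    stop("compress_fa must be TRUE or FALSE.")
  }
  # Errors unless exon_option matches one of the two allowed values
  exon_option <- match.arg(exon_option, c("full", "junction"))
  # Create the output directory if it does not exist yet
  if (!dir.exists(out_path)) dir.create(out_path, recursive = TRUE)
  res <- list(out_path = normalizePath(out_path, mustWork = TRUE))
  if (!is.null(transcriptome_path)) {
    res$tx_path <- normalizePath(transcriptome_path, mustWork = TRUE)
  }
  res
}

checked <- check_velocity_args(
  L = 91, out_path = file.path(tempdir(), "velocity_out"),
  compress_fa = FALSE, width = 80, exon_option = "junction"
)
```

In this sketch the second list element is only present when a transcriptome path is given, mirroring the "if specified" wording above.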


BUStools/BUSpaRse documentation built on Aug. 2, 2024, 5:07 a.m.