load_hittable: Load a table of scored matches

View source: R/io.R

load_hittable    R Documentation

Load a table of scored matches

Description

The input must be a TAB-delimited file with a header row and the columns listed under Details.

Usage

load_hittable(filename, na_str = "N/A")

Arguments

filename

Filename or URL. The file is read with readr::read_tsv, which accepts URLs and transparently decompresses compressed files (for example, .gz).

na_str

The string that marks missing data. NCBI BLAST writes 'N/A' for missing values, so that is the default.
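As a rough illustration of how the two arguments fit together, the sketch below reads a hittable with base R's read.delim() in place of readr::read_tsv. The helper name read_hittable_sketch is hypothetical and not part of phylostratr, and keeping staxid as character (so that multi-taxon entries are not mangled) is an assumption of this sketch.

```r
# Hypothetical helper: a base-R stand-in for the readr::read_tsv call that
# load_hittable() makes. This is NOT phylostratr's implementation.
read_hittable_sketch <- function(filename, na_str = "N/A") {
  utils::read.delim(
    filename,                              # header row + TAB separators
    na.strings = na_str,                   # BLAST's "N/A" becomes NA
    stringsAsFactors = FALSE,
    colClasses = c(staxid = "character")   # assumption: keep IDs as text
  )
}
```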

Details

The input file must contain the following columns:

  • qseqid - an identifier for the query sequence

  • evalue - the e-value reported by BLAST

  • score - the raw score (not the bitscore)

  • staxid - the NCBI taxon ID(s) of the subject sequence (see the staxid section)
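A minimal sketch of checking that a data.frame has these four columns. check_hittable_columns is a hypothetical helper for illustration, not a function exported by phylostratr, and load_hittable may validate its input differently.

```r
# Hypothetical validator; the column names come from the list above.
required_hittable_cols <- c("qseqid", "evalue", "score", "staxid")

check_hittable_columns <- function(df) {
  missing <- setdiff(required_hittable_cols, names(df))
  if (length(missing) > 0) {
    stop("missing required column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```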

Value

A data.frame

staxid

The staxid column must contain NCBI taxon IDs. A single hit may be associated with several taxon IDs. In that case, all other fields in the row are assumed to apply equally to each taxon, and the row is expanded into one row per taxon. This occurs, for example, in RefSeq databases, where identical sequences shared by multiple taxa are merged into a single entry.
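The row expansion described above could be sketched in base R as follows. It assumes that multiple IDs in one staxid field are separated by ';' (the format NCBI BLAST uses for its staxids output field); both that separator and the helper name expand_staxids are assumptions, not taken from phylostratr's source.

```r
# Hypothetical sketch: duplicate each row once per taxon ID in its staxid
# field, copying all other fields unchanged.
expand_staxids <- function(df, sep = ";") {
  ids <- strsplit(as.character(df$staxid), sep, fixed = TRUE)
  # A missing staxid yields a single NA element, so that row is kept once.
  n <- lengths(ids)
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  out$staxid <- as.integer(unlist(ids))
  rownames(out) <- NULL
  out
}
```

If tidyr is available, tidyr::separate_rows() performs the same kind of one-row-per-value expansion in a single call.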

If any entries are missing taxon IDs, a warning is raised. A few missing IDs may be harmless; for example, some RefSeq entries have no associated taxon ID. If most or all of the IDs are missing, however, you probably need to rebuild your BLAST database with a taxon table.
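A hypothetical version of this check might count NA values in staxid and raise a warning whose wording escalates when most IDs are missing. The helper name, the message text, and the 50% threshold are illustrative assumptions; phylostratr's actual warning may differ.

```r
# Hypothetical sketch of the missing-taxon-ID warning described above.
warn_missing_staxids <- function(df, high_fraction = 0.5) {
  n_missing <- sum(is.na(df$staxid))
  if (n_missing > 0) {
    frac <- n_missing / nrow(df)
    msg <- sprintf("%d of %d hits (%.1f%%) have no taxon ID",
                   n_missing, nrow(df), 100 * frac)
    # Assumed threshold: above it, hint that the database lacks taxonomy.
    if (frac > high_fraction) {
      msg <- paste0(msg, "; the BLAST database may lack taxon information")
    }
    warning(msg, call. = FALSE)
  }
  invisible(n_missing)
}
```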


arendsee/phylostratr documentation built on Dec. 31, 2022, 10:22 a.m.