nifH_reference_taxonomy_v2: Taxonomic and supplementary information for nifH reference...


Description

The dataset of nifH sequences used for (1) constructing the reference alignment and tree, (2) evaluating ppit accuracy, and (3) taxonomic inference. Sequences were curated from GenBank using ARBitrator (Heller et al., 2014).

Usage

nifH_reference_taxonomy_v2

Format

Data frame containing 8876 rows and 26 columns.

Domain

Domain of source organism

Phylum

Phylum of source organism

Class

Class of source organism

Order

Order of source organism

Family

Family of source organism

Genus

Genus of source organism

Species

Species of source organism

Strain

Strain of source organism

Reference_strain

Type/reference strains marked with "X"

pid_ref

Sequences used in percent identity calculation marked with "X"

Used_in_reference_checking

Sequences used during taxonomic inference marked with "X"

Version_added

Database version in which sequence was added

Source_organism

Source organism of sequence

Tip_label

Tip label on nifH_reference_tree_v2

Nucleotide_accession

Nucleotide accession of source scaffold, genome, etc.

Nucleotide_accession_sequence_length..bp.

Length of source scaffold, genome, etc. (bp)

Creation_date

Date when sequence was deposited into GenBank

Nucleotide_sequence

Nucleotide sequence of reference nifH

CDS_start

Coding start position in nucleotide accession

CDS_stop

Coding stop position in nucleotide accession

Nucleotide_sequence_length

Length of nifH reference sequence (bp)

Gene_location

Location of nifH (i.e., chromosome, plasmid, undetermined)

Protein_accession

Protein accession number

ARBitrator_search_set

Sequences used for initial ARBitrator search marked with "X"

Alignment_seed_set

Sequences used in the MAFFT-DASH seed alignment

Suspected_NifH_homolog

Suspected NifH homologs marked with "X"

...
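The "X"-marked flag columns and the CDS coordinate columns lend themselves to simple programmatic checks. The following is a minimal Python sketch, not part of ppit. It assumes the data frame has been exported from R to CSV (for example with write.csv), so that each row becomes a dict keyed by the column names above, and that unmarked flag cells appear as empty strings, "NA", or missing values. All helper names (is_flagged, select_flagged, expected_cds_length, cds_length_mismatches, flag_counts, load_rows) are hypothetical. The length check additionally assumes that CDS_start and CDS_stop are 1-based, inclusive GenBank-style coordinates and that Nucleotide_sequence_length covers the full span between them. Alignment_seed_set is left out of the flag list because its marker value is not documented here.

```python
import csv
from typing import Dict, Iterable, List, Optional

# Hypothetical row type: one CSV-exported row of nifH_reference_taxonomy_v2.
Row = Dict[str, Optional[str]]

# Columns documented above as marked with "X".
FLAG_COLUMNS = (
    "Reference_strain",
    "pid_ref",
    "Used_in_reference_checking",
    "ARBitrator_search_set",
    "Suspected_NifH_homolog",
)


def is_flagged(value: Optional[str]) -> bool:
    """True if a flag cell holds "X" (case-insensitive, surrounding spaces ignored)."""
    # Assumption: unmarked cells are empty, "NA", or None after export.
    return value is not None and str(value).strip().upper() == "X"


def select_flagged(rows: Iterable[Row], column: str) -> List[Row]:
    """Keep only the rows whose `column` is marked with "X"."""
    return [row for row in rows if is_flagged(row.get(column))]


def flag_counts(rows: Iterable[Row]) -> Dict[str, int]:
    """Count "X"-marked rows for each documented flag column."""
    rows = list(rows)  # allow any iterable, including a one-shot generator
    return {col: sum(is_flagged(r.get(col)) for r in rows) for col in FLAG_COLUMNS}


def expected_cds_length(start: int, stop: int) -> int:
    """Span length in bp, assuming 1-based inclusive coordinates.

    abs() tolerates entries recorded with start > stop (e.g. reverse strand).
    """
    return abs(stop - start) + 1


def cds_length_mismatches(rows: Iterable[Row]) -> List[str]:
    """Tip_label of each row whose CDS span disagrees with Nucleotide_sequence_length."""
    mismatches = []
    for row in rows:
        try:
            # int(float(...)) accepts both "894" and "894.0" style exports.
            start = int(float(row["CDS_start"]))
            stop = int(float(row["CDS_stop"]))
            length = int(float(row["Nucleotide_sequence_length"]))
        except (KeyError, TypeError, ValueError, OverflowError):
            # Skip rows with missing, "NA", or non-numeric coordinates.
            continue
        if expected_cds_length(start, stop) != length:
            mismatches.append(row.get("Tip_label") or "")
    return mismatches


def load_rows(path: str) -> List[Row]:
    """Read a CSV export of the data frame into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

A typical session might call load_rows on a hypothetical export such as "nifH_reference_taxonomy_v2.csv", pass the result to select_flagged(rows, "Reference_strain") to pull out type/reference strains, and run cds_length_mismatches(rows) to list tip labels worth inspecting by hand. A non-empty mismatch list may simply mean that the coordinate assumptions above do not hold for some entries, not that the data are wrong.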

Source

BJ Kapili and AE Dekas. PPIT: an R package for inferring microbial taxonomy from nifH sequences. In prep.


BKapili/ppit documentation built on July 23, 2020, 1:52 a.m.