fill_blast_result: Complement BLAST result

fill_blast_result R Documentation

Complement BLAST result

Description

Provides subjects' GenInfo Identifiers (GI) if the BLAST alignment result does not contain them.

Usage

fill_blast_results(
  blast.result,
  AcNum.column.name = "Racc",
  GI.column.name = "Rgi",
  delete.version = FALSE,
  version.sep = ".",
  add.gi = "DB",
  add.gi.df,
  temp.db = NULL,
  delete.temp = FALSE,
  add.gi.db = NULL,
  add.gi.table = NULL,
  add.gi.ac.column.name = "AC",
  add.gi.gi.column.name = "GI",
  mc.cores = 1,
  verbose = TRUE
)

delete_AcNum_version(ac.num.var, version.sep = ".", mc.cores = 1)

Arguments

blast.result

data frame; BLAST alignment result

AcNum.column.name, GI.column.name

character; names of the columns in the BLAST result data frame that hold subject accession numbers and GenInfo Identifier (GI) numbers, respectively

delete.version

logical; remove version suffix from subject accession number

version.sep

character; accession number and version suffix separator (a dot for NCBI accession numbers)

add.gi

character; whether the table linking accession and GI numbers is taken from an SQLite database ("DB") or from a data frame ("DF")

add.gi.df

data frame with linked accession and GI numbers (used if add.gi = "DF")

temp.db

character; name and path of the temporary SQLite database

delete.temp

logical; delete the created temporary SQLite database

add.gi.db, add.gi.table, add.gi.ac.column.name, add.gi.gi.column.name

SQLite database name and path, table name and name of columns with accession and GI numbers (used if add.gi = "DB")

mc.cores

integer; number of processors for parallel computation (not supported on Windows)

verbose

logical; show messages

ac.num.var

vector of accession numbers

Details

A BLAST alignment performed against a local database may not contain subject GI information. Subject accession numbers may also carry a version suffix. Both can make further analysis of the results difficult. This function adds subject GI numbers and, optionally, removes the version suffix from subject accession numbers.

To add GenInfo Identifiers, a table linking them to accession numbers must be provided, either as a data frame or as an SQLite database table. add.gi.df must be a data frame in which column one holds accession numbers and column two holds GenInfo Identifier numbers. If add.gi = "DF", a temporary SQLite database is created.
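As a rough illustration of the expected layout of add.gi.df, the sketch below builds a two-column lookup table and attaches GI numbers with base R match(). It is not the package's implementation (which goes through SQLite); the accession and GI values are made-up toy data, and only the column names "Racc", "Rgi", "AC" and "GI" come from the defaults shown in Usage.

```r
# Toy BLAST result with subject accession numbers only (values are invented)
blast <- data.frame(Racc = c("AB000001", "AB000002", "AB000003"),
                    stringsAsFactors = FALSE)

# Lookup table: column one holds accession numbers, column two holds GI numbers
lookup <- data.frame(AC = c("AB000002", "AB000001"),
                     GI = c(222L, 111L),
                     stringsAsFactors = FALSE)

# Attach GI numbers by position; accessions missing from the lookup get NA
blast$Rgi <- lookup[[2]][match(blast$Racc, lookup[[1]])]
print(blast)
```

Accessions that are absent from the lookup table end up with NA in the GI column in this sketch; how fill_blast_results treats such rows is not stated here.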

The SQLite database table with accession and GI numbers should not contain duplicated rows. It is also highly recommended to index the accession number column in the database.
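One possible way to prepare such a table is sketched below, assuming the DBI and RSQLite packages are installed. The table name "ac_gi", the index name "idx_ac" and the data values are hypothetical and chosen only for illustration; the database part is skipped if the packages are not available.

```r
# Toy lookup table with one duplicated row (values are invented)
lookup <- data.frame(AC = c("AB000001", "AB000002", "AB000001"),
                     GI = c(111L, 222L, 111L),
                     stringsAsFactors = FALSE)

# Drop duplicated rows before writing the table
lookup <- unique(lookup)

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory database for the sketch; a file path would be used in practice
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "ac_gi", lookup)
  # Index the accession number column to speed up lookups
  DBI::dbExecute(con, "CREATE INDEX idx_ac ON ac_gi (AC)")
  DBI::dbDisconnect(con)
}
```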

delete.version is executed first, so if you use this option, accession numbers in the add.gi table must not contain a version suffix.
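To show what removing a version suffix means in practice, here is a minimal base R sketch. strip_version is a hypothetical helper written for this illustration, not the package's delete_AcNum_version(), and it assumes the suffix starts at the first occurrence of the separator.

```r
# Hypothetical helper: cut each accession number at the first separator
strip_version <- function(ac, sep = ".") {
  # Position of the first literal separator; -1 where there is none
  pos <- as.integer(regexpr(sep, ac, fixed = TRUE))
  has_sep <- !is.na(pos) & pos > 0
  out <- ac
  out[has_sep] <- substr(ac[has_sep], 1, pos[has_sep] - 1)
  out
}

strip_version(c("NC_000001.11", "MN908947.3", "AB123456"))
# returns "NC_000001" "MN908947" "AB123456"
```

Accession numbers without a separator are returned unchanged, so the same call can be applied to the lookup table to keep both sides in the same format.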

AcNum.column.name, GI.column.name, add.gi.ac.column.name and add.gi.gi.column.name must match the column names exactly as they appear in the corresponding data frame or database table.

Value

blast.result data frame with GI numbers added and, if delete.version = TRUE, accession version suffixes removed.

Functions

  • fill_blast_results: Provides subjects' GenInfo Identifiers if BLAST alignment result does not contain them

  • delete_AcNum_version: Remove accession version suffix

Author(s)

Elena N. Filatova

Examples

# session temporary directory (already exists, so no dir.create() is needed)
path <- tempdir()
# load raw BLAST results
data(blast.raw)
# load meta.target with targets' sequences GI and accession numbers
data(meta.target)
# add GI numbers from a data frame and strip accession version suffixes
blast.fill <- fill_blast_results(blast.result = blast.raw, delete.version = TRUE,
                                 add.gi = "DF", add.gi.df = meta.target[, c("GB_AcNum", "gi")],
                                 temp.db = file.path(path, "temp.db"), delete.temp = TRUE)


disprose documentation built on March 19, 2022, 2:15 a.m.