summarize_blast_result: Summarize BLAST result
In disprose: Discriminating Probes Selection

Description

Summarize aligned, not aligned, and undesirably aligned (nonspecific) subject sequences from BLAST results

Usage

summarize_blast_result(
  sum.aligned = "sp",
  blast.probe.id.var,
  blast.res.id.var,
  blast.res.title.var,
  reference.id.var,
  reference.title.var,
  titles = FALSE,
  add.blast.info = FALSE,
  data.blast.info,
  check.blast.for.source = FALSE,
  source = NULL,
  switch.ids = FALSE,
  switch.table,
  mc.cores = 1,
  digits = 2,
  sep = ";",
  temp.db = NULL,
  delete.temp.db = TRUE,
  return = "summary",
  write.alignment = "DB",
  alignment.db = NULL,
  alignment.table.sp.aligned = NULL,
  alignment.table.sp.not.aligned = NULL,
  alignment.table.nonsp = NULL,
  change.colnames.dots = TRUE,
  file.sp.aligned = NULL,
  file.sp.not.aligned = NULL,
  file.nonsp = NULL,
  verbose = TRUE
)

Arguments

`sum.aligned`	character; summarize specific or nonspecific alignments; possible values are `"sp"` (aligned and not aligned specific subjects) and `"nonsp"` (aligned nonspecific subjects)
`blast.probe.id.var`	vector of query identification numbers from BLAST result data
`blast.res.id.var, blast.res.title.var`	vectors of subject identification numbers and titles from BLAST result data
`reference.id.var, reference.title.var`	vectors of identification numbers and titles of specific sequences that must be aligned (`sum.aligned = "sp"`) or may be aligned (`sum.aligned = "nonsp"`)
`titles`	logical; include titles in alignment reports
`add.blast.info`	logical; add other BLAST result columns to the alignment reports
`data.blast.info`	data frame; additional columns from the BLAST result data (e.g. query cover)
`check.blast.for.source`	logical; delete queries that are not aligned with the obligatory sequence given in `source`
`source`	identification number of the obligatory sequence for alignment
`switch.ids`	logical; use different identification numbers for BLAST result's subjects
`switch.table`	data frame; table of old and new identification numbers (and new titles) linked by row
`mc.cores`	integer; number of processors for parallel computation (not supported on Windows)
`digits`	integer; number of decimal places to round the result
`sep`	character; the field separator character
`temp.db`	character; temporary SQLite database name and path
`delete.temp.db`	logical; delete the temporary SQLite database afterwards
`return`	character; returned object; possible values are `"list"` (list of data frames with alignment summary and report for each probe) and `"summary"` (data frame with summary for all probes is returned and alignment reports are written into files or SQLite database tables)
`write.alignment`	character; write alignment reports into files (`"file"`) or SQLite database tables (`"DB"`); used if `return = "summary"`
`alignment.db, alignment.table.sp.aligned, alignment.table.sp.not.aligned, alignment.table.nonsp`	character; SQLite database name and path, table names (used if `write.alignment = "DB"`)
`change.colnames.dots`	logical; change dots to underscore in data frame column names (used if `write.alignment = "DB"`)
`file.sp.aligned, file.sp.not.aligned, file.nonsp`	character; file names and path (used if `write.alignment = "file"`)
`verbose`	logical; show messages

Details

This function works with the data frame created by the blast_local function. It takes BLAST results and divides aligned subjects into specific (those that should be aligned) and nonspecific (those that should not be aligned) according to the reference values. The function summarizes the number of aligned and not aligned specific subjects and the number of aligned nonspecific subjects.
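The division into specific and nonspecific alignments can be illustrated with a minimal base R sketch. It is not the disprose implementation: the data frame `hits` and the vector `reference` are hypothetical toy data, and the sketch counts alignment rows per probe rather than reproducing the function's own summary columns.

```r
# Toy BLAST hits (hypothetical data): one row per query-subject alignment
hits <- data.frame(
  probe   = c("p1", "p1", "p1", "p2", "p2"),
  subject = c("A",  "B",  "X",  "A",  "Y"),
  stringsAsFactors = FALSE
)
# Hypothetical reference ids: subjects that count as specific
reference <- c("A", "B", "C")

# Label each alignment as specific or nonspecific by reference membership
hits$class <- ifelse(hits$subject %in% reference, "specific", "nonspecific")

# Count alignments of each class per probe (rows = probes, columns = classes)
counts <- table(hits$probe, hits$class)
counts
```

Here probe p1 has two specific alignments (A, B) and one nonspecific alignment (X).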

When sum.aligned = "sp", aligned and not aligned specific subjects are summarized, and reference.id.var and reference.title.var should contain the sequences that the probes must align with. When sum.aligned = "nonsp", aligned nonspecific subjects are summarized, and reference.id.var should contain the sequences that may be aligned (those that are not considered nonspecific); titles are not needed.
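The two modes use the reference in opposite ways: in "sp" mode it is a list of targets that must be hit, in "nonsp" mode an allow-list of subjects that may be hit. The base R sketch below expresses this with set operations for a single probe. The helpers `summarize_sp()` and `summarize_nonsp()` are hypothetical illustrations, not functions of disprose.

```r
# Hypothetical helper for "sp" mode: which targets does the probe hit or miss?
summarize_sp <- function(subjects, targets) {
  subjects <- unique(subjects)
  list(aligned     = intersect(targets, subjects),  # targets hit by the probe
       not.aligned = setdiff(targets, subjects))    # targets the probe misses
}
# Hypothetical helper for "nonsp" mode: anything outside the allowed set
summarize_nonsp <- function(subjects, allowed) {
  setdiff(unique(subjects), allowed)
}

subjects <- c("A", "B", "X", "B")  # subjects aligned with one probe
sp <- summarize_sp(subjects, targets = c("A", "B", "C"))
nonsp <- summarize_nonsp(subjects, allowed = c("A", "B", "C", "D"))
```

As in the Examples section, the allowed set for "nonsp" is usually broader than the target set for "sp" (all sequences of the organism versus the selected targets).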

When return = "summary", the function returns a summary (the numbers of aligned and not aligned subjects) and writes the sorted alignments (alignment report) to files (write.alignment = "file") or to an SQLite database (write.alignment = "DB"). By default the report contains only subject ids and, optionally, titles, but any number of additional BLAST result columns can be added with the add.blast.info and data.blast.info parameters. If additional BLAST results are added, all alignments are kept in the alignment report; otherwise duplicated subjects are removed.
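Why duplicates are dropped only without extra columns can be seen in a small base R sketch. It assumes, for illustration only, that a "duplicate" means a repeated probe-subject pair and that the report is sorted by probe and subject; the data frame `report` is hypothetical.

```r
# Toy alignment report (hypothetical): probe p1 hits subject B twice
report <- data.frame(
  probe   = c("p1", "p1", "p1", "p2"),
  subject = c("B",  "A",  "B",  "A"),
  Qcover  = c(100,  95,   60,   100),
  stringsAsFactors = FALSE
)
# Ids only: a repeated probe-subject pair adds nothing, so keep one row per pair
ids.only <- report[!duplicated(report[, c("probe", "subject")]),
                   c("probe", "subject")]
ids.only <- ids.only[order(ids.only$probe, ids.only$subject), ]

# With an extra column such as Qcover, each alignment differs, so all are kept
with.info <- report[order(report$probe, report$subject), ]
```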

By default, the result tables in the database (if write.alignment = "DB") are "sp_aligned", "sp_not_aligned" and "nonsp". Results are written by appending, so if the files or tables already exist, new data will be added to them.
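Because results are appended, rerunning the function on the same output accumulates rows. The base R sketch below shows this behaviour for a file; it is a generic illustration with `write.table(append = ...)`, not the package's own writer, and the file name and helper `append_report()` are hypothetical. The semicolon separator is an arbitrary choice for the demo.

```r
# Hypothetical report file in the session's temporary directory
f <- file.path(tempdir(), "sp_aligned_demo.txt")
unlink(f)  # start this demo from a clean file

append_report <- function(df, file) {
  present <- file.exists(file)
  # write the header only when the file is created; append rows afterwards
  write.table(df, file, sep = ";", row.names = FALSE,
              col.names = !present, append = present)
}

append_report(data.frame(probe = c("p1", "p2"), n = c(2, 1)), f)
append_report(data.frame(probe = "p3", n = 3), f)  # a second run adds rows
combined <- read.table(f, header = TRUE, sep = ";")
```

To start a fresh analysis, delete old report files (or database tables) first; otherwise the new results are mixed with the old ones.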

If subject identification numbers in the BLAST result data differ from those in reference.id.var, you may use switch.ids = TRUE to replace the BLAST ids with new ones according to switch.table. switch.table must be a data frame with old ids in column one, new ids in column two and (optionally) new titles in column three. Do not use dots in column names.
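A row-wise lookup with `match()` captures the idea of a switch table: find each BLAST id in column one and take the new id (and title) from the same row. This is a hypothetical base R sketch, not the disprose code; in particular, keeping unmatched ids unchanged is an assumption made here, since the behaviour for ids missing from switch.table is not documented above.

```r
# Hypothetical switch table: old ids, new ids, new titles (no dots in names)
switch.table <- data.frame(
  old_id    = c("wgs_contig_1", "wgs_contig_2", "genome_2"),
  new_id    = c("wgs_1", "wgs_1", "genome_2"),
  new_title = c("strain 1, WGS", "strain 1, WGS", "strain 2, complete genome"),
  stringsAsFactors = FALSE
)
blast.ids <- c("wgs_contig_2", "genome_2", "wgs_contig_1", "other_9")

# Row of switch.table whose first column matches each BLAST id (NA if none)
idx <- match(blast.ids, switch.table[[1]])
# Assumption: ids without a match are kept as they are
new.ids <- ifelse(is.na(idx), blast.ids, switch.table[[2]][idx])
new.titles <- switch.table[[3]][idx]
```

Both WGS contigs map to the single id "wgs_1", which is how several WGS records can be counted as one subject.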

When check.blast.for.source = TRUE, probes that are not aligned with one designated subject (usually the sequence the probes were cut from) are deleted. The source check is not performed if sum.aligned = "nonsp". It is performed after the optional id switch (switch.ids), so source must be an identification number of the same type as those in reference.id.var.
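The source check amounts to keeping only probes with at least one alignment to the obligatory sequence. A minimal base R sketch with hypothetical data follows; the variable is called `source.id` rather than `source` only to avoid masking base R's `source()` function in the sketch.

```r
# Toy BLAST hits (hypothetical); "src" is the sequence the probes were cut from
hits <- data.frame(
  probe   = c("p1", "p1", "p2", "p3", "p3"),
  subject = c("src", "A", "A", "src", "B"),
  stringsAsFactors = FALSE
)
source.id <- "src"  # hypothetical id of the obligatory sequence

# Probes that have at least one alignment with the source sequence
keep <- unique(hits$probe[hits$subject == source.id])
# Drop all rows of probes that never aligned with the source
checked <- hits[hits$probe %in% keep, ]
```

Probe p2 is removed because it never aligns with "src"; if ids were switched, `source.id` would have to be given in the new id format.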

The probe identification variable (blast.probe.id.var) must be a character vector.

If the alignment report is written to a database, the probe identification variable is indexed in all tables. It is also highly recommended to set change.colnames.dots = TRUE, so that dots in the result data frame's column names are replaced with underscores, which avoids problems when the tables are queried later.
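The renaming described for change.colnames.dots can be sketched in base R as a literal dot-to-underscore replacement. This is an assumption about what the option does, not its actual code; the column names are hypothetical. In SQL a dot separates a table name from a column name, so names such as `e.value` would otherwise have to be quoted in every query.

```r
# Hypothetical report columns containing dots
report <- data.frame(Qcover = 100, e.value = 1e-20, bit.score = 80)

# Replace every literal "." with "_" (fixed = TRUE: no regular expression)
names(report) <- gsub(".", "_", names(report), fixed = TRUE)
names(report)
```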

While running, the function saves intermediate data in a temporary SQLite database (temp.db). The function stops if a database with the same name already exists, so deleting the temporary database afterwards (delete.temp.db = TRUE) is highly recommended.
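If an earlier run was interrupted, a leftover temporary database can block the next call. A small base R precaution, run before calling the function, is sketched below; the path is a hypothetical example and would be passed as temp.db.

```r
# Hypothetical path for the temporary database passed as temp.db
temp.db <- file.path(tempdir(), "temp.db")

# Remove a leftover database from an interrupted run, because the
# function stops if the database already exists
if (file.exists(temp.db)) unlink(temp.db)
```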

Value

If return = "list", a list of data frames with the alignment summary and report for each probe. If return = "summary", a data frame with the summary for all probes (alignment reports are written to files or SQLite database tables).

Author(s)

Elena N. Filatova

Examples

library(disprose)
path <- tempdir()
dir.create(path, showWarnings = FALSE)  # tempdir() normally exists already
# load blast results with subject accession numbers
data(blast.fill)
# load metadata of all Chlamydia pneumoniae sequences - they are subjects that
# do not count as nonspecific and may be aligned
data(meta.all)
# load metadata with target Chlamydia pneumoniae sequences - they are specific subjects
# that must be aligned
data(meta.target)
# make new accession numbers to count all WGS sequences as one (see unite_NCBI_ac.nums ())
meta.target.new.ids <- unite_NCBI_ac.nums (data = meta.target,
                                          ac.num.var = meta.target$GB_AcNum,
                                          title.var = meta.target$title,
                                          db.var = meta.target$source_db,
                                          type = "shotgun", order = TRUE,
                                          new.titles = TRUE)
# summarize blast results, count aligned specific subjects with "switch ids" option
# (WGS sequences are counted as one). Add query cover information.
blast.sum.sp <- summarize_blast_result (sum.aligned = "sp",
                                       blast.probe.id.var = blast.fill$Qid,
                                       blast.res.id.var = blast.fill$Racc,
                                       blast.res.title.var = blast.fill$Rtitle,
                                       reference.id.var = meta.target.new.ids$new.id,
                                       reference.title.var = meta.target.new.ids$new.title,
                                       titles = TRUE,
                                       add.blast.info = TRUE,
                                       data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                       switch.ids = TRUE, switch.table = meta.target.new.ids,
                                       temp.db = paste0 (path, "/temp.db"), delete.temp.db = TRUE,
                                       return = "summary", write.alignment = "DB",
                                       alignment.db = paste0 (path, "/alig.db"))
# summarize nonspecific alignments (that are not in meta.all dataframe)
blast.sum.nonsp <- summarize_blast_result (sum.aligned = "nonsp",
                                          blast.probe.id.var = blast.fill$Qid,
                                          blast.res.id.var = blast.fill$Racc,
                                          blast.res.title.var = blast.fill$Rtitle,
                                          reference.id.var = meta.all$GB_AcNum,
                                          reference.title.var = meta.all$title,
                                          titles = TRUE, switch.ids = FALSE,
                                          add.blast.info = TRUE,
                                          data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                          temp.db = paste0 (path, "/temp.db"),
                                          delete.temp.db = TRUE,
                                          return = "summary", write.alignment = "DB",
                                          alignment.db = paste0 (path, "/alig.db"))
# all specific targets are aligned
sp.aligned <- read_from_DB(database = paste0 (path, "/alig.db"), table = "sp_aligned")
# no targets that are not aligned
sp.not.aligned <- read_from_DB(database = paste0 (path, "/alig.db"), table = "sp_not_aligned")
# No nonspecific alignments
nonsp <- read_from_DB(database = paste0 (path, "/alig.db"), table = "nonsp")
file.remove (paste0 (path, "/alig.db"))

disprose documentation built on March 19, 2022, 2:15 a.m.