summarize_blast_result | R Documentation |
Summarize aligned, not aligned and undesirably aligned sequences
summarize_blast_result(
  sum.aligned = "sp",
  blast.probe.id.var,
  blast.res.id.var,
  blast.res.title.var,
  reference.id.var,
  reference.title.var,
  titles = FALSE,
  add.blast.info = FALSE,
  data.blast.info,
  check.blast.for.source = FALSE,
  source = NULL,
  switch.ids = FALSE,
  switch.table,
  mc.cores = 1,
  digits = 2,
  sep = ";",
  temp.db = NULL,
  delete.temp.db = TRUE,
  return = "summary",
  write.alignment = "DB",
  alignment.db = NULL,
  alignment.table.sp.aligned = NULL,
  alignment.table.sp.not.aligned = NULL,
  alignment.table.nonsp = NULL,
  change.colnames.dots = TRUE,
  file.sp.aligned = NULL,
  file.sp.not.aligned = NULL,
  file.nonsp = NULL,
  verbose = TRUE
)
sum.aligned |
character; summarize specific or non-specific alignments; possible values are "sp" (specific) and "nonsp" (non-specific) |
blast.probe.id.var |
vector of query identification numbers from BLAST result data |
blast.res.id.var, blast.res.title.var |
vector of subject identification numbers and titles from BLAST result data |
reference.id.var, reference.title.var |
vector of identification numbers and titles of specific sequences that should be or might be aligned |
titles |
logical; include titles in alignment reports |
add.blast.info |
logical; add other BLAST results |
data.blast.info |
data frame; additional columns from the BLAST result data to include in the alignment report |
check.blast.for.source |
logical; delete queries that are not aligned with one obligatory sequence |
source |
identification number of obligatory sequence for alignment |
switch.ids |
logical; use different identification numbers for BLAST result's subjects |
switch.table |
data frame; table of old and new identification numbers (and new titles) linked by row |
mc.cores |
integer; number of processors for parallel computation (not supported on Windows) |
digits |
integer; number of decimal places to round the result |
sep |
character; the field separator character |
temp.db |
character; temporary SQLite database name and path |
delete.temp.db |
logical; delete the temporary SQLite database afterwards |
return |
character; returned object; possible values include "summary" (see Details) |
write.alignment |
character; write alignment reports into files ("file") or an SQLite database ("DB") |
alignment.db, alignment.table.sp.aligned, alignment.table.sp.not.aligned, alignment.table.nonsp |
character; SQLite database name and path, table names (used if write.alignment = "DB") |
change.colnames.dots |
logical; change dots to underscores in data frame column names (used if write.alignment = "DB") |
file.sp.aligned, file.sp.not.aligned, file.nonsp |
character; file names and path (used if write.alignment = "file") |
verbose |
logical; show messages |
This function works with a data frame created by the blast_local function.
It takes BLAST results and divides aligned subjects into specific (that should be aligned)
and non-specific (that should not be aligned) according to the reference values.
The function summarizes the number of aligned and not aligned specific subjects and the number of aligned non-specific subjects.
When sum.aligned = "sp", aligned and not aligned specific subjects are summarized, and reference.id.var and reference.title.var should contain the sequences that must be aligned.
When sum.aligned = "nonsp", aligned non-specific subjects are summarized, and reference.id.var should contain the sequences that may be aligned (that are not considered non-specific); no titles are needed.
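The difference between the two modes can be pictured with plain set operations in base R. This is only a sketch of the idea, not the function's actual implementation; the vectors `subject.ids` and `reference.ids` and their values are hypothetical.

```r
# Hypothetical subject ids returned by BLAST for one probe
subject.ids <- c("A1", "A2", "X9")
# Hypothetical reference ids
reference.ids <- c("A1", "A2", "A3")

# Idea behind sum.aligned = "sp": the reference lists sequences that must be aligned
sp.aligned <- intersect(reference.ids, subject.ids)    # specific and aligned
sp.not.aligned <- setdiff(reference.ids, subject.ids)  # specific but not aligned

# Idea behind sum.aligned = "nonsp": the reference lists sequences that may be aligned,
# so every aligned subject outside the reference counts as non-specific
nonsp <- setdiff(subject.ids, reference.ids)
```

Here "A3" is a specific sequence that the probe missed, while "X9" is an alignment that should not have happened.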
When return = "summary", the function returns a summary (the number of aligned and not aligned subjects) and writes sorted alignments (the alignment report) to a file (write.alignment = "file") or an SQLite database (write.alignment = "DB").
Usually only subjects' ids and (optionally) titles are returned, but you may add as many BLAST result columns as you like with the add.blast.info and data.blast.info parameters.
If you add BLAST results, all alignments are present in the alignment report; otherwise duplicated subjects are deleted.
By default, the result tables in the database (if write.alignment = "DB") are "sp_aligned", "sp_not_aligned" and "nonsp".
Results are appended, so if the files or tables already exist, data will be added to them.
If subject identification numbers in the BLAST result data differ from those in reference.id.var, you may set switch.ids = TRUE to replace the BLAST ids with new ones according to switch.table.
switch.table must be a data frame with old ids in column one, new ids in column two and (optionally) new titles in column three. Do not use dots in column names.
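A switch table of the described shape, and the kind of lookup it enables, might look like the base R sketch below. All identifiers and values are hypothetical, and keeping unmatched ids unchanged is an assumption of this sketch rather than documented behaviour of the function.

```r
# Hypothetical switch table: column 1 = old ids, column 2 = new ids,
# column 3 (optional) = new titles; no dots in the column names
switch.table <- data.frame(
  old_id = c("WGS0001", "WGS0002", "NC_1"),
  new_id = c("WGS", "WGS", "NC_1"),
  new_title = c("shotgun assembly", "shotgun assembly", "complete genome"),
  stringsAsFactors = FALSE
)

# Hypothetical BLAST subject ids to be translated
blast.ids <- c("WGS0002", "NC_1", "ZZ_7")

# Look up each BLAST id in column one of the switch table
idx <- match(blast.ids, switch.table[[1]])
# Assumption of this sketch: ids without a match keep their old value
new.ids <- ifelse(is.na(idx), blast.ids, switch.table[[2]][idx])
```

Mapping several old ids to one new id, as with the two "WGS" rows, is how many records can be counted as a single subject.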
When check.blast.for.source = TRUE, probes that are not aligned with one special subject (usually the sequence from which the probes were cut) are deleted.
No check for source is performed if sum.aligned = "nonsp".
The check for source is performed after the possible id switch, so source should be an identification number of the same type as the reference ids.
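The source check can be thought of as a filter on probes. The base R sketch below illustrates that idea on a made-up table; the data frame `blast`, its column names and the id "SRC" are hypothetical, and the code is not taken from the package.

```r
# Hypothetical BLAST result: probe ids and the subjects they aligned with
blast <- data.frame(
  probe = c("p1", "p1", "p2", "p3"),
  subject = c("SRC", "A2", "A2", "SRC"),
  stringsAsFactors = FALSE
)
source.id <- "SRC"  # hypothetical obligatory sequence

# Probes that have at least one alignment with the source sequence
probes.with.source <- unique(blast$probe[blast$subject == source.id])

# Keep only the rows that belong to those probes; probe "p2" is dropped
blast.checked <- blast[blast$probe %in% probes.with.source, ]
```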
The probe identification number must be a character variable.
If the alignment report is written into a database, the probe identification variable is indexed in all tables.
It is also highly recommended to set change.colnames.dots = TRUE to change possible dots to underscores within the result data frame's column names and avoid further mistakes.
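Dots in column names are awkward in SQL, where a dot separates a table name from a column name. A minimal base R sketch of the renaming, using a hypothetical data frame `report`, could be:

```r
# Hypothetical alignment report with dots in the column names
report <- data.frame(probe.id = "p1", subject.id = "A1", Qcover = 100)

# Replace every literal dot with an underscore;
# fixed = TRUE treats "." as a plain character, not a regular expression
names(report) <- gsub(".", "_", names(report), fixed = TRUE)
```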
While working, the function saves data in a temporary SQLite database. The function will stop if the same database already exists, so deleting the temporary database is highly recommended.
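If an earlier run was interrupted, a leftover database file may still be present. One way to guard against that before calling the function is sketched below; the file name "temp.db" is hypothetical.

```r
# Hypothetical path for the temporary database
temp.db <- file.path(tempdir(), "temp.db")

# Remove a leftover database from a previous interrupted run, if any
if (file.exists(temp.db)) {
  file.remove(temp.db)
}
```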
A list of data frames with the alignment summary and report for each probe, or a data frame with the summary for all probes (alignment reports are written into files or SQLite database tables).
Elena N. Filatova
path <- tempdir()
dir.create(path, showWarnings = FALSE)

# load blast results with subject accession numbers
data(blast.fill)

# load metadata of all Chlamydia pneumoniae sequences - they are subjects that
# do not count as nonspecific and may be aligned
data(meta.all)

# load metadata with target Chlamydia pneumoniae sequences - they are specific subjects
# that must be aligned
data(meta.target)

# make new accession numbers to count all WGS sequences as one (see unite_NCBI_ac.nums())
meta.target.new.ids <- unite_NCBI_ac.nums(data = meta.target,
                                          ac.num.var = meta.target$GB_AcNum,
                                          title.var = meta.target$title,
                                          db.var = meta.target$source_db,
                                          type = "shotgun", order = TRUE,
                                          new.titles = TRUE)

# summarize blast results, count aligned specific subjects with "switch ids" option
# (WGS sequences are counted as one). Add query cover information.
blast.sum.sp <- summarize_blast_result(sum.aligned = "sp",
                                       blast.probe.id.var = blast.fill$Qid,
                                       blast.res.id.var = blast.fill$Racc,
                                       blast.res.title.var = blast.fill$Rtitle,
                                       reference.id.var = meta.target.new.ids$new.id,
                                       reference.title.var = meta.target.new.ids$new.title,
                                       titles = TRUE,
                                       add.blast.info = TRUE,
                                       data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                       switch.ids = TRUE,
                                       switch.table = meta.target.new.ids,
                                       temp.db = paste0(path, "/temp.db"),
                                       delete.temp.db = TRUE,
                                       return = "summary",
                                       write.alignment = "DB",
                                       alignment.db = paste0(path, "/alig.db"))

# summarize nonspecific alignments (that are not in meta.all dataframe)
blast.sum.nonsp <- summarize_blast_result(sum.aligned = "nonsp",
                                          blast.probe.id.var = blast.fill$Qid,
                                          blast.res.id.var = blast.fill$Racc,
                                          blast.res.title.var = blast.fill$Rtitle,
                                          reference.id.var = meta.all$GB_AcNum,
                                          reference.title.var = meta.all$title,
                                          titles = TRUE,
                                          switch.ids = FALSE,
                                          add.blast.info = TRUE,
                                          data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                          temp.db = paste0(path, "/temp.db"),
                                          delete.temp.db = TRUE,
                                          return = "summary",
                                          write.alignment = "DB",
                                          alignment.db = paste0(path, "/alig.db"))

# all specific targets are aligned
sp.aligned <- read_from_DB(database = paste0(path, "/alig.db"), table = "sp_aligned")

# no targets that are not aligned
sp.not.aligned <- read_from_DB(database = paste0(path, "/alig.db"), table = "sp_not_aligned")

# no nonspecific alignments
nonsp <- read_from_DB(database = paste0(path, "/alig.db"), table = "nonsp")

file.remove(paste0(path, "/alig.db"))