View source: R/subset_methods.R
subsetORFs | R Documentation |
Create a SQM object containing only the requested ORFs, and the contigs and bins that contain them. Internally, all the other subset functions in this package end up calling subsetORFs to do the work for them.
subsetORFs(
SQM,
orfs,
tax_source = "orfs",
trusted_functions_only = FALSE,
ignore_unclassified_functions = FALSE,
rescale_tpm = FALSE,
rescale_copy_number = FALSE,
recalculate_bin_stats = TRUE,
contigs_override = NULL,
allow_empty = FALSE
)
SQM |
SQM object to be subsetted. |
orfs |
character. Vector of ORFs to be selected. |
tax_source |
character. Features used for calculating aggregated abundances at the different taxonomic ranks. Either |
trusted_functions_only |
logical. If |
ignore_unclassified_functions |
logical. If |
rescale_tpm |
logical. If |
rescale_copy_number |
logical. If |
recalculate_bin_stats |
logical. If |
contigs_override |
character. Optional vector of contigs to be included in the subsetted object. |
allow_empty |
(internal use only). |
SQM object containing the requested ORFs.
While this function selects the contigs and bins that contain the desired ORFs, it DOES NOT recalculate contig abundances and statistics based on the selected ORFs only. This means that the abundances presented in tables such as SQM$contigs$abund will still refer to the complete contigs, regardless of whether only a fraction of their ORFs are actually present in the returned SQM object. This is also true for the statistics presented in SQM$contigs$table. Bin statistics may be recalculated if recalculate_bin_stats is set to TRUE, but recalculation will be based on contigs, not ORFs.
data(Hadza)
# Select the 100 most abundant ORFs in our dataset.
mostAbundantORFnames = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
mostAbundantORFs = subsetORFs(Hadza, mostAbundantORFnames)
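
The note on contig abundances can be checked directly. The following is a minimal sketch, assuming that the SQMtools package is installed and that the contig abundance matrix is stored in SQM$contigs$abund with contig names as row names. The variable names (topORFnames, topORFs, keptContigs) are illustrative only.

```r
library(SQMtools)
data(Hadza)

# Same selection as in the example above: the 100 most abundant ORFs.
topORFnames = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
topORFs = subsetORFs(Hadza, topORFnames)

# Contigs that were carried over because they contain a selected ORF
# (assumption: row names of the abundance matrix are contig names).
keptContigs = rownames(topORFs$contigs$abund)

# Compare the contig abundances in the subset with those of the same
# contigs in the full object. If abundances are not recalculated,
# as described above, the two matrices should match.
all.equal(topORFs$contigs$abund, Hadza$contigs$abund[keptContigs, , drop=FALSE])
```

If the comparison holds, the contig-level tables of the subset describe whole contigs and should not be read as abundances of the selected ORFs alone.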