View source: R/outlier_process.R
outlier_aggregate | R Documentation |
The functions prefixed with "outlier_" are used to detect outliers.
They are designed to be run after you have extracted your junction-
and coverage-based features, in the order outlier_detect,
outlier_aggregate. Alternatively, the wrapper function outlier_process
can be used to run both functions in one go. For more details on the
individual functions, see "Details".
outlier_aggregate(
junctions,
samp_id_col = "samp_id",
bp_param = BiocParallel::SerialParam()
)
outlier_detect(
junctions,
feature_names = c("score", "coverage_score"),
bp_param = BiocParallel::SerialParam(),
...
)
outlier_process(
junctions,
feature_names = c("score", "coverage_score"),
samp_id_col = "samp_id",
bp_param = BiocParallel::SerialParam(),
...
)
junctions |
junction data as a RangedSummarizedExperiment-class object. |
samp_id_col |
name of the column in the SummarizedExperiment that details the sample ids. |
bp_param |
a BiocParallelParam-class instance denoting whether to parallelise the calculation of outlier scores across samples. |
feature_names |
names of assays in junctions to be used as input into the outlier detection model. |
... |
additional arguments passed to the outlier detection model (isolation forest) for setting parameters. |
outlier_process wraps all "outlier_" prefixed functions in dasper. It is
designed to simplify the detection of outlier junctions for those who are
familiar with, or uninterested in, the intermediates.
outlier_detect uses the features in the assays named by feature_names as
input into an unsupervised outlier detection algorithm (an isolation forest)
to score each junction by how much of an outlier it looks relative to the
other junctions in the same patient. The features expected by default, score
and coverage_score, can be calculated using junction_process and
coverage_process respectively.
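To give a feel for how an isolation forest scores observations, here is a toy version in base R. It is a sketch under stated assumptions and not dasper's implementation: the function names (avg_path, grow_tree, path_length, iso_scores), the simulated feature values and the score convention (higher means more anomalous, as in the original isolation forest formulation) are all chosen for illustration, and the convention may differ from the scores dasper returns.

```r
# Toy isolation forest in base R (illustrative only, not dasper code).

# Average path length of an unsuccessful binary search tree lookup among
# n points; used to normalise depths and to credit unsplit leaves.
avg_path <- function(n) {
    if (n <= 1) return(0)
    if (n == 2) return(1)
    2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
}

# Grow one isolation tree by recursively splitting on a random feature
# at a random value between that feature's minimum and maximum.
grow_tree <- function(data, depth = 0, max_depth = 8) {
    n <- nrow(data)
    if (n <= 1 || depth >= max_depth) return(list(size = n))
    j <- sample.int(ncol(data), 1)
    rng <- range(data[, j])
    if (rng[1] == rng[2]) return(list(size = n))
    split <- runif(1, rng[1], rng[2])
    left <- data[, j] < split
    list(
        feature = j,
        split = split,
        left = grow_tree(data[left, , drop = FALSE], depth + 1, max_depth),
        right = grow_tree(data[!left, , drop = FALSE], depth + 1, max_depth)
    )
}

# Depth at which observation x ends up in a leaf of the tree.
path_length <- function(x, tree, depth = 0) {
    if (!is.null(tree$size)) return(depth + avg_path(tree$size))
    child <- if (x[tree$feature] < tree$split) tree$left else tree$right
    path_length(x, child, depth + 1)
}

# Score every row of data: short average paths give scores close to 1.
iso_scores <- function(data, n_trees = 100, sample_size = 64) {
    data <- as.matrix(data)
    psi <- min(sample_size, nrow(data))
    depths <- replicate(n_trees, {
        idx <- sample(nrow(data), psi)
        tree <- grow_tree(data[idx, , drop = FALSE],
            max_depth = ceiling(log2(psi))
        )
        apply(data, 1, path_length, tree = tree)
    })
    2^(-rowMeans(depths) / avg_path(psi))
}

# Simulated features: 50 typical observations plus one extreme observation.
set.seed(1)
features <- rbind(cbind(rnorm(50), rnorm(50)), c(10, 10))
scores <- iso_scores(features)
most_anomalous <- which.max(scores)
```

Observations that can be separated from the rest with only a few random splits end up with short paths and therefore stand out, which is the property that makes the method usable without labelled outliers.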
outlier_aggregate aggregates the junction-level outlier scores to the
cluster level. It then ranks each cluster by this aggregated score and
annotates each cluster with its associated gene and transcript.
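The aggregate-then-rank idea can be sketched in base R as follows. The data, the column names (junction_id, cluster_id, outlier_score), the use of the minimum as the aggregation rule and the convention that lower scores are more outlier-like are all assumptions for illustration; dasper's actual aggregation rule is not described here and may differ.

```r
# Hypothetical per-junction outlier scores for a single sample.
# Assumed convention for this sketch: lower score = more outlier-like.
junction_scores <- data.frame(
    junction_id = 1:6,
    cluster_id = c("A", "A", "B", "B", "B", "C"),
    outlier_score = c(-0.30, 0.10, 0.05, 0.12, 0.08, -0.10)
)

# Collapse to one row per cluster. Taking the minimum score per cluster
# is an assumed rule, chosen only to keep the example simple.
cluster_scores <- aggregate(
    outlier_score ~ cluster_id,
    data = junction_scores,
    FUN = min
)

# Rank clusters so that rank 1 is the most outlier-like cluster,
# then sort the table by rank.
cluster_scores$rank <- rank(cluster_scores$outlier_score, ties.method = "first")
cluster_scores <- cluster_scores[order(cluster_scores$rank), ]
```

With these made-up scores, cluster A is driven to the top of the ranking by a single strongly outlying junction, which shows why the choice of aggregation rule matters when clusters contain different numbers of junctions.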
Returns a DataFrame with one row per cluster, detailing each cluster's
associated junctions, outlier scores, ranks and genes.
outlier_aggregate: Aggregate outlier scores from per-junction to cluster level
outlier_detect: Detect outlier junctions
For more details on the isolation forest model used, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
##### Set up txdb #####
# use GenomicState to load txdb (GENCODE v31)
ref <- GenomicState::GenomicStateHub(
version = "31",
genome = "hg38",
filetype = "TxDb"
)[[1]]
##### Set up BigWig #####
# obtain path to example bw on recount2
bw_path <- recount::download_study(
project = "SRP012682",
type = "samples",
download = FALSE
)[[1]]
# cache the bw for speed in later
# examples/testing during R CMD Check
bw_path <- dasper:::.file_cache(bw_path)
##### junction_process #####
junctions_processed <- junction_process(
junctions_example,
ref,
types = c("ambig_gene", "unannotated")
)
##### coverage_process #####
junctions_w_coverage <- coverage_process(
junctions_processed,
ref,
coverage_paths_case = rep(bw_path, 2),
coverage_paths_control = rep(bw_path, 3)
)
##### outlier_detect #####
junctions_w_outliers <- outlier_detect(junctions_w_coverage)
##### outlier_aggregate #####
outlier_scores <- outlier_aggregate(junctions_w_outliers)
##### outlier_process #####
# this wrapper will obtain outlier scores identical to those
# obtained through running the individual wrapped functions shown above
outlier_processed <- outlier_process(junctions_w_coverage)
# the two objects are equivalent
all.equal(outlier_processed, outlier_scores, check.attributes = FALSE)