Description Usage Arguments Fields Methods Examples
EDAMatrix is a helper class for wrapping data matrices, with optional support for row and column metadata. Methods are provided for common exploratory data analysis summary statistics, transformations, and visualizations.
dat
: An m x n dataset.
row_mdata
: A matrix or data frame with rows corresponding to the row
names of dat
col_mdata
: A matrix or data frame with rows corresponding to the
column names of dat
row_names
: Column name or number containing row identifiers. If set to
rownames
(default), row names will be used as identifiers.
col_names
: Column name or number containing column identifiers. If set to
colnames
(default), column names will be used as identifiers.
row_mdata_rownames
: Column name or number containing row metadata row
identifiers. If set to rownames
(default), row names will be used
as identifiers.
col_mdata_rownames
: Column name or number containing col metadata row
identifiers. If set to rownames
(default), row names will be used
as identifiers.
row_color
: Row metadata field to use for coloring rowwise plot elements.
row_shape
: Row metadata field to use to determine rowwise plot
element shape.
row_label
: Row metadata field to use when labeling plot points or
other elements.
col_color
: Column metadata field to use for coloring columnwise plot elements.
col_shape
: Column metadata field to use to determine columnwise plot
element shape.
col_label
: Column metadata field to use when labeling plot points or
other elements.
color_pal
: Color palette to use for relevant plotting methods
(default: Set1
).
title
: Text to use as a title or subtitle for plots.
ggplot_theme
: Default theme to use for ggplot2 plots
(default: theme_bw
).
dat
: Underlying data matrix
row_mdata
: Dataframe containing row metadata
col_mdata
: Dataframe containing column metadata
clear_cache()
: Clears EDAMatrix cache.
clone()
: Creates a copy of the EDAMatrix instance.
cluster_tsne(k=10, ...)
: Clusters rows in dataset using a combination
of t-SNE and k-means clustering.
detect_col_outliers(num_sd=2, ctend='median', meas='pearson')
:
Measures average pairwise similarities between all columns in the dataset.
Outliers are considered to be those columns whose mean similarity to
all other columns is greater than num_sd
standard deviations from the
average of averages.
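The outlier rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `detect_col_outliers_sketch` is a hypothetical helper, `ctend` is taken here as a function rather than a string, and "from the average of averages" is read as an absolute deviation in either direction.

```r
# Hypothetical helper, not part of EDAMatrix: flags columns whose typical
# similarity to the other columns is far from the overall average.
detect_col_outliers_sketch <- function(dat, num_sd = 2, ctend = median,
                                       meas = "pearson") {
  # Pairwise column-by-column similarity matrix
  sim <- cor(dat, method = meas, use = "pairwise.complete.obs")
  diag(sim) <- NA  # ignore each column's similarity to itself

  # Central tendency of each column's similarity to all other columns
  avg_sim <- apply(sim, 2, ctend, na.rm = TRUE)

  # Assumption: flag deviations in either direction from the mean of averages
  cutoff <- num_sd * sd(avg_sim)
  avg_sim[abs(avg_sim - mean(avg_sim)) > cutoff]
}

set.seed(1)
base <- rnorm(100)
# Nine columns that share a common signal, plus one unrelated column
dat <- sapply(1:9, function(i) base + rnorm(100, sd = 0.2))
dat <- cbind(dat, rnorm(100))
colnames(dat) <- paste0("s", 1:10)

outliers <- detect_col_outliers_sketch(dat)
names(outliers)
```

The row-wise variant would follow the same steps on `t(dat)`.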
detect_row_outliers(num_sd=2, ctend='median', meas='pearson')
:
Measures average pairwise similarities between all rows in the dataset.
Outliers are considered to be those rows whose mean similarity to
all other rows is greater than num_sd
standard deviations from the
average of averages.
feature_cor()
: Detects dependencies between column metadata entries
(features) and dataset rows.
filter_col_outliers(num_sd=2, ctend='median', meas='pearson')
:
Removes column outliers from the dataset. See detect_col_outliers()
for details of outlier detection approach.
filter_row_outliers(num_sd=2, ctend='median', meas='pearson')
:
Removes row outliers from the dataset. See detect_row_outliers()
for details of outlier detection approach.
filter_cols(mask)
: Accepts a logical vector of length ncol(obj$dat)
and returns a new EDAMatrix instance with only the columns associated
with TRUE
values in the mask.
filter_rows(mask)
: Accepts a logical vector of length nrow(obj$dat)
and returns a new EDAMatrix instance with only the rows associated
with TRUE
values in the mask.
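As a rough illustration of what a mask looks like, the following base R sketch applies logical vectors of the required lengths to a plain matrix standing in for `obj$dat`. It does not call the EDAMatrix methods themselves, which presumably also subset the matching metadata; that part is an assumption and is not shown.

```r
# Toy matrix standing in for obj$dat
dat <- matrix(1:12, nrow = 3,
              dimnames = list(paste0("r", 1:3), paste0("c", 1:4)))

# Column mask: one logical value per column, so length ncol(dat)
col_mask <- colSums(dat) > 10
# Row mask: one logical value per row, so length nrow(dat)
row_mask <- c(TRUE, FALSE, TRUE)

# drop = FALSE keeps the result a matrix even if a single row or column remains
kept_cols <- dat[, col_mask, drop = FALSE]
kept_rows <- dat[row_mask, , drop = FALSE]
```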
impute(method='knn')
: Imputes missing values in the dataset and stores
the result in-place. Currently only k-Nearest Neighbors (kNN)
imputation is supported.
log(base=exp(1), offset=0)
: Log-transforms data.
log1p()
: Log(x + 1)-transforms data.
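The two transformations can be approximated in base R as follows. The sketch assumes that `offset` is added to the data before taking the logarithm, which the description does not state explicitly; `log_transform` is a hypothetical stand-in, not the method itself.

```r
dat <- matrix(c(0, 1, 9, 99), nrow = 2)

# Hypothetical equivalent of log(base, offset): shift, then take the logarithm
log_transform <- function(x, base = exp(1), offset = 0) {
  log(x + offset, base = base)
}

log10_shifted <- log_transform(dat, base = 10, offset = 1)

# log1p() computes log(x + 1) and stays accurate for values near zero
natural_plus1 <- log1p(dat)
```

An offset (or `log1p()`) is useful when the data contain zeros, since `log(0)` is `-Inf`.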
pca(...)
: Performs principal component analysis (PCA) on the dataset
and returns a new EDAMatrix instance of the projected data points.
Any additional arguments specified are passed to the prcomp()
function.
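Since the method is described as a wrapper around `prcomp()`, the projection it returns can be previewed by calling `prcomp()` directly. This is a minimal sketch on a toy matrix; how EDAMatrix stores the result and which defaults it passes are not covered by the description above and are not assumed here.

```r
set.seed(42)
dat <- matrix(rnorm(60), nrow = 20,
              dimnames = list(paste0("row", 1:20), c("a", "b", "c")))

# center and scale. are examples of arguments that could be forwarded via ...
pc <- prcomp(dat, center = TRUE, scale. = TRUE)

# Rows of the dataset projected onto the principal components
projected <- pc$x

# Proportion of variance captured by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
```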
pca_feature_cor(meas='pearson', ...)
: Measures correlation between
dataset features (column metadata fields) and dataset principal
components.
plot_cor_heatmap(meas='pearson', interactive=TRUE, ...)
: Plots a
correlation heatmap of the dataset.
plot_densities(color=NULL, title="", ...)
: Plots densities for each
column in the dataset.
plot_feature_cor(meas='pearson', color_scale=c('green', 'red'))
:
Creates a tile plot of projected data / feature correlations. See
feature_cor()
function.
plot_heatmap(interactive=TRUE, ...)
: Generates a heatmap plot of the
dataset.
plot_pairwise_column_cors(color=NULL, title="", meas='pearson', mar=c(12,6,4,6))
:
Plot median pairwise column correlations for each variable (column)
in the dataset.
plot_pca(pcx=1, pcy=2, scale=FALSE, color=NULL, shape=NULL, title=NULL, text_labels=FALSE, ...)
:
Generates a two-dimensional PCA plot from the dataset.
plot_tsne(color=NULL, shape=NULL, title=NULL, text_labels=FALSE, ...)
:
Generates a two-dimensional t-SNE plot from the dataset.
print()
: Prints an overview of the object instance.
subsample(row_n=NULL, col_n=NULL, row_ratio=NULL, col_ratio=NULL)
:
Subsamples dataset rows and/or columns.
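One plausible reading of the four arguments is that `row_n` and `col_n` give absolute counts while `row_ratio` and `col_ratio` give fractions of the original dimensions. The base R sketch below follows that assumption; `subsample_sketch` is a hypothetical helper, and the precedence given to ratios over counts is an arbitrary choice made for illustration.

```r
# Hypothetical helper: keep a random subset of rows and/or columns, given
# either as a count (row_n, col_n) or as a fraction (row_ratio, col_ratio)
subsample_sketch <- function(dat, row_n = NULL, col_n = NULL,
                             row_ratio = NULL, col_ratio = NULL) {
  # Assumption: a ratio, when supplied, overrides the corresponding count
  if (!is.null(row_ratio)) row_n <- round(nrow(dat) * row_ratio)
  if (!is.null(col_ratio)) col_n <- round(ncol(dat) * col_ratio)

  # Sample indices without replacement; sort to preserve the original order
  row_idx <- if (is.null(row_n)) seq_len(nrow(dat)) else sort(sample(nrow(dat), row_n))
  col_idx <- if (is.null(col_n)) seq_len(ncol(dat)) else sort(sample(ncol(dat), col_n))

  dat[row_idx, col_idx, drop = FALSE]
}

set.seed(7)
dat <- matrix(rnorm(200), nrow = 20)   # 20 rows, 10 columns
sub <- subsample_sketch(dat, row_n = 5, col_ratio = 0.5)
dim(sub)
```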
summary(markdown=FALSE, num_digits=2)
: Summarizes overall
characteristics of a dataset.
t()
: Transposes dataset rows and columns.
tsne(...)
: Performs t-distributed stochastic neighbor embedding (t-SNE)
on the dataset and returns a new EDAMatrix instance of the projected
data points. Any additional arguments specified are passed to the
Rtsne()
function.
tsne_feature_cor(meas='pearson', ...)
: Measures correlation between
dataset features (column metadata fields) and dataset t-SNE projected
axes.