knn_pairplots: DML pairplots with k-NN.

Description

This function draws multiple 2D scatter plots for different pairs of attributes of the same dataset, together with the regions defined by a k-NN classifier and a distance. The distance can be given as a positive semidefinite (PSD) metric matrix, as the matrix of a linear transformation, or as a distance metric learning (DML) algorithm, which can either be fitted during plotting or have been fitted beforehand.

Usage

knn_pairplots(X, y, k = 1, attrs = NULL, xattrs = NULL,
  yattrs = NULL, diag = "hist", sections = "mean", metric = NULL,
  transformer = NULL, dml = NULL, dml_fitted = FALSE, title = NULL,
  grid_split = c(400, 400), grid_step = c(0.1, 0.1),
  label_legend = TRUE, legend_loc = "center right", cmap = NULL,
  label_colors = NULL, plot_points = TRUE, plot_regions = TRUE,
  region_intensity = 0.4, legend_plot_points = TRUE,
  legend_plot_regions = TRUE, legend_on_axis = FALSE, ...)

Arguments

X

array-like of size (N x d), where N is the number of samples, and d is the number of features.

y

array-like of size N, where N is the number of samples.

k

The number of neighbors for the k-NN classifier. Integer.

attrs

A list specifying the dataset attributes to show in the scatter plot. The items can be the keys, if X is a pandas dataset, or integer indexes with the attribute position. If NULL, and xattrs and yattrs are also NULL, all the attributes are used.

xattrs

A list specifying the dataset attributes to show on the X axis. The items can be the keys, if X is a pandas dataset, or integer indexes with the attribute position. Ignored if attrs is specified.

yattrs

A list specifying the dataset attributes to show on the Y axis. The items can be the keys, if X is a pandas dataset, or integer indexes with the attribute position. Ignored if attrs is specified.

diag

What to plot on the diagonal subplots. Allowed options are:

- "hist" : a histogram of the data is plotted for the attribute.

sections

Specifies how to take sections in the feature space when the dataset has more than two features. The classifier is plotted with the non-plotted attributes fixed at the values given by this section. Allowed values are:

- 'mean' : takes the mean of the remaining attributes to plot the classifier region.
- 'zeros' : takes the remaining attributes as zeros to plot the classifier region.
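The two section modes can be illustrated with a short NumPy sketch. It is not the package's code: the helper name `section_points` and its arguments are hypothetical, and it only shows the idea of fixing the non-plotted attributes while two attributes vary over a grid.

```python
import numpy as np

def section_points(X, i, j, grid_x, grid_y, sections="mean"):
    """Build full d-dimensional points for a 2D grid over attributes i and j.

    Hypothetical helper: every other attribute is fixed at its mean
    ('mean') or at zero ('zeros').
    """
    X = np.asarray(X, dtype=float)
    if sections == "mean":
        base = X.mean(axis=0)
    elif sections == "zeros":
        base = np.zeros(X.shape[1])
    else:
        raise ValueError("sections must be 'mean' or 'zeros'")
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Start from the fixed section, then overwrite the two plotted attributes.
    points = np.tile(base, (gx.size, 1))
    points[:, i] = gx.ravel()
    points[:, j] = gy.ravel()
    return points

# Toy dataset with three features; attributes 0 and 1 are plotted.
X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])
pts_mean = section_points(X, 0, 1, [0.0, 1.0], [10.0, 20.0], sections="mean")
pts_zeros = section_points(X, 0, 1, [0.0, 1.0], [10.0, 20.0], sections="zeros")
```

In this toy case the third attribute is held at 4.0 (its mean) in `pts_mean` and at 0.0 in `pts_zeros`. A classifier would then be evaluated on these full points to colour the 2D region.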

metric

A positive semidefinite matrix of size (d x d), where d is the number of features. Ignored if dml or transformer is specified.

transformer

A matrix of size (d' x d), where d is the number of features and d' is the desired dimension. Ignored if dml is specified.
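As an illustration of how metric and transformer relate, the sketch below assumes the usual Mahalanobis convention: a PSD matrix M induces the distance sqrt((x - y)^T M (x - y)), and a linear map L induces the Euclidean distance between L x and L y, so that both agree when M = L^T L. The function names are hypothetical and the code does not come from the package.

```python
import numpy as np

def metric_distance(x, y, M):
    """Distance induced by a PSD matrix M: sqrt((x - y)^T M (x - y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

def transformer_distance(x, y, L):
    """Euclidean distance after applying the linear map L (d' x d)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(L @ diff))

# A transformer that stretches the first feature by a factor of 2.
L = np.array([[2.0, 0.0],
              [0.0, 1.0]])
M = L.T @ L  # the metric matrix induced by L

x = np.array([1.0, 1.0])
y = np.array([0.0, 0.0])
d_metric = metric_distance(x, y, M)
d_transformer = transformer_distance(x, y, L)
```

Both calls give the same value here, which is why either form can describe the distance used by the k-NN classifier.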

dml

A distance metric learning algorithm constructed from a function in 'dml'. If metric, transformer and dml are all NULL, no distances are used in the plot.

dml_fitted

Specifies whether the DML algorithm is already fitted. If TRUE, the algorithm's fit method will not be called. Boolean.

title

An optional title for the plot.

grid_split

A list with two items, specifying the number of partitions to make along the X and Y axes when painting the classifier region. Each split defines a point where the predict method of the classifier is evaluated. It can be NULL, in which case the 'grid_step' parameter is used.

grid_step

A list with two items, specifying the distance between consecutive grid points, along the X and Y axes, in the grid that defines the classifier plot. Each point created in this way is a point where the predict method of the classifier is evaluated. It is ignored if the parameter 'grid_split' is not NULL.
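The difference between the two grid parameters can be sketched with NumPy. This is an assumption about how such a grid might be built, not the package's implementation: the helper `grid_axis` is hypothetical, and whether the real grid includes the upper endpoint is not stated in this page.

```python
import numpy as np

def grid_axis(vmin, vmax, split=None, step=None):
    """Return the grid coordinates along one axis.

    Hypothetical helper: `split` fixes the number of points, `step` fixes
    the spacing. As with grid_split and grid_step, `split` wins if given.
    """
    if split is not None:
        return np.linspace(vmin, vmax, split)
    # Assumption: the upper bound is excluded, as np.arange does.
    return np.arange(vmin, vmax, step)

by_split = grid_axis(0.0, 1.0, split=5)               # 5 evenly spaced points
by_step = grid_axis(0.0, 1.0, step=0.25)              # points every 0.25
both_given = grid_axis(0.0, 1.0, split=3, step=0.25)  # step is ignored

# With a 400 x 400 split, the classifier is evaluated at this many points
# in each subplot that shows regions.
n_evaluations = 400 * 400
```

With a fixed split the cost per subplot does not depend on the data range, whereas with a fixed step the number of evaluated points grows with the range of the plotted attributes.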

label_legend

If TRUE, a legend with the labels and their colors will be plotted.

legend_loc

Specifies the legend position. Ignored if the legend is not plotted. Allowed values are: 'best' (0), 'upper right' (1), 'upper left' (2), 'lower left' (3), 'lower right' (4), 'right' (5), 'center left' (6), 'center right' (7), 'lower center' (8), 'upper center' (9), 'center' (10). Alternatively, it can be a 2-tuple giving the x, y coordinates of the lower-left corner of the legend in axes coordinates.

cmap

A string naming a Python matplotlib colormap.

label_colors

A list of size C with matplotlib colors, or strings specifying a color, where C is the number of classes in y. Each class will be plotted with the corresponding color. If cmap is NULL and label_colors is NULL, a default colormap is used.

plot_points

If TRUE, points will be plotted.

plot_regions

If TRUE, the classifier regions will be plotted.

region_intensity

A float between 0 and 1, indicating the transparency of the colors in the classifier regions relative to the point colors.

legend_plot_points

If TRUE, points are plotted in the legend.

legend_plot_regions

If TRUE, classifier regions are plotted in the legend.

legend_on_axis

If TRUE, the legend is plotted inside the scatter plot. Otherwise, it is plotted outside the scatter plot.

...

Additional arguments for Python's 'matplotlib.pyplot.subplots' function.

transform

If TRUE, projects the data with the learned transformer and plots the transformed data. Otherwise, the classifier region is plotted with the original data, but the regions change according to the learned distance.

xrange

A list with two items, specifying the minimum and maximum of the range to plot on the X axis. If NULL, it is calculated from the minimum and maximum of the X feature.

yrange

A list with two items, specifying the minimum and maximum of the range to plot on the Y axis. If NULL, it is calculated from the minimum and maximum of the Y feature.

xlabel

An optional title for the X axis.

ylabel

An optional title for the Y axis.

Value

A Python 'matplotlib.figure.Figure' object with the plot.


jlsuarezdiaz/rDML documentation built on May 24, 2019, 12:35 a.m.