imageClusterPipeline: Generate and plot a color distance matrix from a set of...

View source: R/06_pipeline.R

imageClusterPipeline    R Documentation

Generate and plot a color distance matrix from a set of images


Takes a set of images, computes color clusters for each image, and calculates a distance matrix and dendrogram from those clusters.


imageClusterPipeline(
  images,
  cluster.method = "hist",
  distance.method = "emd",
  lower = c(0, 140/255, 0),
  upper = c(60/255, 1, 60/255),
  hist.bins = 3,
  kmeans.bins = 27,
  bin.avg = TRUE,
  norm.pix = FALSE,
  plot.bins = FALSE,
  pausing = TRUE,
  color.space = "rgb",
  ref.white,
  from = "sRGB",
  bounds = c(0, 1),
  sample.size = 20000,
  iter.max = 50,
  nstart = 5,
  img.type = FALSE,
  ordering = "default",
  size.weight = 0.5,
  color.weight = 0.5,
  plot.heatmap = TRUE,
  return.distance.matrix = TRUE,
  save.tree = FALSE,
  save.distance.matrix = FALSE,
  a.bounds = c(-127, 128),
  b.bounds = c(-127, 128)
)



images

Character vector of directories, image paths, or both.

cluster.method

Which method should be used to get color clusters from each image? Must be either "hist" (predetermined bins generated by dividing each channel with equidistant bounds; calls getHistList) or "kmeans" (determine clusters using kmeans fitting on pixels; calls getKMeansList).

distance.method

One of four possible comparison methods for calculating the color distances: "emd" (uses EMDistance, recommended), "chisq" (uses chisqDistance), "color.dist" (uses colorDistance; not appropriate if bin.avg=F), or "weighted.pairs" (uses weightedPairsDistance).


lower

RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]).

upper

RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:

  • Black: lower=c(0, 0, 0); upper=c(0.1, 0.1, 0.1)

  • White: lower=c(0.8, 0.8, 0.8); upper=c(1, 1, 1)

  • Green: lower=c(0, 0.55, 0); upper=c(0.24, 1, 0.24)

  • Blue: lower=c(0, 0, 0.55); upper=c(0.24, 0.24, 1)

If no background filtering is needed, set bounds to some non-numeric value (NULL, FALSE, "off", etc.); any non-numeric value is interpreted as NULL.
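
As a rough illustration of how lower and upper act together, the base R sketch below flags a pixel as background only when all three channels fall inside the bounds. The pixel values are made up, and treating the bounds as inclusive per-channel limits is an assumption about the filtering, not the package's actual code.

```r
# Hypothetical pixel table: one row per pixel, columns are R, G, B on a 0-1 scale.
pixels <- rbind(
  c(0.05, 0.95, 0.10),  # bright green background
  c(0.80, 0.20, 0.10),  # object pixel (red channel far above the upper bound)
  c(0.10, 0.60, 0.20),  # darker green, still inside the default bounds
  c(0.30, 0.90, 0.10)   # red channel too high to count as background
)
colnames(pixels) <- c("r", "g", "b")

# Default bounds from the usage section (bright green background).
lower <- c(0, 140/255, 0)
upper <- c(60/255, 1, 60/255)

# Assumption: a pixel is background only if every channel lies within
# [lower, upper], with both limits inclusive.
is.background <- apply(pixels, 1, function(p) all(p >= lower & p <= upper))

# Pixels that would be kept for clustering.
object.pixels <- pixels[!is.background, , drop = FALSE]
```

Tightening upper (for example, lowering the red and blue limits) excludes fewer pixels, which is the kind of trial and error the argument description refers to.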


hist.bins

Only applicable if cluster.method="hist". Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins=3 will result in 3^3 = 27 bins; bins=c(2, 2, 3) will result in 2*2*3=12 bins (2 red, 2 green, 3 blue), etc. Passed to getHistList.
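
The bin arithmetic can be checked in a few lines of base R. This is only a sketch: total.bins and the sample pixel are hypothetical names and values introduced for illustration, and the equidistant breaks assume channels on a 0-1 scale.

```r
# Hypothetical helper: total number of bins implied by a hist.bins value.
total.bins <- function(hist.bins) {
  per.channel <- if (length(hist.bins) == 1) rep(hist.bins, 3) else hist.bins
  prod(per.channel)
}

total.bins(3)           # 3 * 3 * 3
total.bins(c(2, 2, 3))  # 2 red * 2 green * 3 blue

# Equidistant breaks for one channel split into 3 bins on a 0-1 scale.
breaks <- seq(0, 1, length.out = 3 + 1)

# Per-channel bin index for one made-up pixel.
pixel <- c(r = 0.1, g = 0.5, b = 0.9)
bin.index <- findInterval(pixel, breaks, rightmost.closed = TRUE, all.inside = TRUE)
```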


kmeans.bins

Only applicable if cluster.method="kmeans". Number of KMeans clusters to fit. Unlike getImageHist, this represents the actual final number of bins, rather than the number of breaks in each channel.

bin.avg

Logical. Should the color clusters used for the distance matrix be the average of the pixels in that bin (bin.avg=TRUE) or the center of the bin (FALSE)? If a bin is empty, the center of the bin is returned as the cluster color regardless. Only applicable if cluster.method="hist", since kmeans clusters are at the center of their assigned pixel clouds by definition.
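
The difference between the two bin.avg settings can be sketched for a single bin. The function cluster.color and the pixel values below are hypothetical, written only to mirror the rule described above (mean of the pixels if bin.avg=TRUE, bin center otherwise or when the bin is empty).

```r
# Made-up pixels that all fall in the bin spanning [0, 1/3] in each channel.
bin.pixels <- rbind(c(0.10, 0.20, 0.30),
                    c(0.20, 0.30, 0.10),
                    c(0.30, 0.10, 0.20))
bin.lower <- c(0, 0, 0)
bin.upper <- c(1/3, 1/3, 1/3)

# Hypothetical helper: the color reported for one bin.
cluster.color <- function(pix, lower, upper, bin.avg = TRUE) {
  if (bin.avg && nrow(pix) > 0) {
    colMeans(pix)        # average of the pixels assigned to the bin
  } else {
    (lower + upper) / 2  # geometric center of the bin
  }
}

avg.color    <- cluster.color(bin.pixels, bin.lower, bin.upper, bin.avg = TRUE)
center.color <- cluster.color(bin.pixels, bin.lower, bin.upper, bin.avg = FALSE)

# An empty bin falls back to the bin center even when bin.avg = TRUE.
empty.color <- cluster.color(matrix(numeric(0), ncol = 3), bin.lower, bin.upper)
```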


norm.pix

Logical. Should RGB or HSV cluster values be normalized using normalizeRGB?

plot.bins

Logical. Should the bins for each image be plotted as they are calculated?

pausing

Logical. If plot.bins=TRUE, pause and wait for user keystroke before plotting bins for next image?

color.space

The color space ("rgb", "hsv", or "lab") in which to plot pixels.

ref.white

The reference white passed to convertColorSpace; must be specified if using color.space = "lab".

from

Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer.

bounds

Upper and lower limits for the channels; R reads in images with intensities on a 0-1 scale, but 0-255 is common.


sample.size

Only applicable if cluster.method="kmeans". Number of pixels to be randomly sampled from filtered pixel array for performing fit. If set to FALSE, all pixels are fit, but this can be time-consuming, especially for large images. Passed to getKMeansList.

iter.max

Only applicable if cluster.method="kmeans". Inherited from kmeans. The maximum number of iterations allowed during kmeans fitting. Passed to getKMeansList.

nstart

Only applicable if cluster.method="kmeans". Inherited from kmeans. How many random sets should be chosen? Passed to getKMeansList.
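
To show how sample.size, iter.max, and nstart interact, here is a minimal sketch using stats::kmeans directly on simulated pixels. It is not the package's getKMeansList code; the pixel data, the reduced sample.size of 1000, and the choice of 3 clusters are all assumptions made for the example.

```r
set.seed(1)

# Simulated filtered pixel array: 5000 pixels scattered around three colors.
true.colors <- rbind(c(0.9, 0.1, 0.1), c(0.1, 0.1, 0.9), c(0.9, 0.9, 0.1))
pixels <- true.colors[sample(3, 5000, replace = TRUE), ] +
  matrix(rnorm(5000 * 3, sd = 0.02), ncol = 3)

# Randomly subsample before fitting, unless sampling is switched off.
sample.size <- 1000
if (is.numeric(sample.size) && nrow(pixels) > sample.size) {
  pixels <- pixels[sample(nrow(pixels), sample.size), ]
}

# iter.max caps the iterations per run; nstart is the number of random starts.
fit <- kmeans(pixels, centers = 3, iter.max = 50, nstart = 5)

cluster.colors <- fit$centers              # one row per cluster
cluster.sizes  <- fit$size / sum(fit$size) # proportion of sampled pixels
```

Raising nstart makes the fit less sensitive to a poor random start at the cost of run time, which is why subsampling matters for large images.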


img.type

Logical. Should file extensions be retained with labels?

ordering

Logical if not left as "default". Should the color clusters in the list be reordered to minimize the distances between the pairs? If left as default, ordering depends on distance method: "emd" and "chisq" do not order clusters ("emd" orders on a case-by-case basis in the EMDistance function itself, and reordering by size similarity would make chi-squared meaningless); "color.dist" and "weighted.pairs" use ordering. To override defaults, set to either T (for ordering) or F (for no ordering).

size.weight

Weight of size similarity in determining overall score and ordering (if ordering=T).

color.weight

Weight of color similarity in determining overall score and ordering (if ordering=T). Color and size weights do not necessarily have to sum to 1.

plot.heatmap

Logical. Should a heatmap of the distance matrix be plotted?

return.distance.matrix

Logical. Should the distance matrix be returned to the R environment or just plotted?

save.tree

Either logical or a filepath for saving the tree; default if set to TRUE is to save in current working directory as "ColorTree.newick".

save.distance.matrix

Either logical or a filepath for saving the distance matrix; default if set to TRUE is to save in current working directory as "ColorDistanceMatrix.csv".

a.bounds, b.bounds

Passed to getLabHistList. Numeric ranges for the a (green-red) and b (blue-yellow) channels of Lab color space. Technically, a and b have infinite range, but in practice nearly all values fall between roughly -128 and 127; the default range is c(-127, 128). Many images will have an even narrower range than this, depending on the lighting conditions and conversion; setting narrower ranges will result in finer-scale binning, without generating empty bins at the edges of the channels.
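
The effect of narrowing a.bounds or b.bounds on bin resolution is simple arithmetic. In the sketch below, bin.width is a hypothetical helper and the narrower range c(-20, 40) is an invented example, not a recommended setting.

```r
# Hypothetical helper: width of each equidistant bin for one channel.
bin.width <- function(bounds, bins) diff(bounds) / bins

# Default range split into 3 bins: each bin spans 85 units of the a channel.
default.width <- bin.width(c(-127, 128), bins = 3)

# A narrower, image-specific range gives much finer bins for the same bin count.
narrow.width <- bin.width(c(-20, 40), bins = 3)
```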


A color distance matrix and heatmap, plus saved distance matrix and tree files if the corresponding save options are set.


This is the fastest way to get a distance matrix for color similarity starting from a folder of images. Essentially, it just calls a series of other package functions in order: input images -> getImagePaths -> getHistList or getKMeansList followed by extractClusters -> getColorDistanceMatrix -> plotting -> return/save distance matrix. Sort of railroads you, but good for testing different combinations of clustering methods and distance metrics.


## Not run: 
colordistance::imageClusterPipeline(dir(system.file("extdata", "Heliconius/",
package="colordistance"), full.names=TRUE), color.space="hsv", lower=rep(0.8,
3), upper=rep(1, 3), cluster.method="hist", distance.method="emd",
hist.bins=3, plot.bins=TRUE, save.tree="example_tree.newick")

## End(Not run)

hiweller/colordistance documentation built on Feb. 1, 2024, 7:49 p.m.