clustSimFunc: clustSimFunc


View source: R/clustSimFunc.R

Description

clustSimFunc calculates silhouette values for every number of clusters n from 2 to nClust (maximum 10) for the data in the data.frame object dfData, using the Gower dissimilarity computed by the function daisy from package cluster (cluster::daisy). It takes as input a data.frame object or the name of a data.frame object (dfData), a number (nClust), a numeric vector (ylimPlot) of two numbers indicating the lower and upper limits of the y-axis of the plot, a character string indicating the name of the output subdirectory (subDir), a character string (main) indicating the title of the plot, and three positive numbers indicating the width (width), height (height) and resolution (res) of the output plot.

The silhouette values for the numbers of clusters considered (2 to nClust, at most 10) are saved in a .txt file in the subdirectory subDir inside the "output" directory within the current working directory. A plot showing the average silhouette width against the number of clusters (2 to nClust) is saved as a .png file in the subdirectory subDir inside the "plot" directory within the current working directory.

The "output" and/or "plot" directories are created in the current working directory if they are not already present. Similarly, if subDir is specified, a subdirectory with that name is created within both output/ and plot/ if not already present, and the outputs are saved there. If subDir is not specified (i.e. it is missing), the .txt file is saved directly in output/ and the plot directly in plot/.
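The calculation described above can be sketched roughly as follows. This is an illustration, not the package's source: it assumes partitioning around medoids (cluster::pam) as the clustering step, which this page does not confirm, and the helper name avgSilWidth and the toy data are hypothetical.

```r
library(cluster)

# Hypothetical helper: average silhouette width for k = 2..nClust,
# using Gower dissimilarities and PAM (the use of PAM is an assumption).
avgSilWidth <- function(dfData, nClust) {
  nClust <- min(nClust, 10)                     # the page states a maximum of 10
  gowerDist <- daisy(dfData, metric = "gower")  # pairwise Gower dissimilarities
  ks <- 2:nClust
  widths <- vapply(ks, function(k) {
    pam(gowerDist, k = k, diss = TRUE)$silinfo$avg.width
  }, numeric(1))
  data.frame(nClust = ks, avgSilWidth = widths)
}

# Toy mixed-type data, made up for illustration
set.seed(1)
toy <- data.frame(
  x = c(rnorm(10, 0), rnorm(10, 5)),
  g = factor(rep(c("a", "b"), each = 10))
)
silTab <- avgSilWidth(toy, 4)
print(silTab)
```

The number of clusters with the largest average silhouette width is usually read as the best-supported choice among those tested.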

Usage

clustSimFunc(
  dfData,
  nClust,
  envir = .GlobalEnv,
  ylimPlot = NULL,
  subDir,
  main = NULL,
  width = 1200,
  height = 600,
  res = 125,
  ...
)

Arguments

dfData

a data.frame object or a character string indicating the name of the data.frame object.

nClust

a number indicating the largest number of clusters to test; clustering is tested for every number of clusters from 2 up to nClust (maximum 10).

envir

an environment (default: .GlobalEnv) in which the output data.frame object is saved.

ylimPlot

a numeric vector (default: NULL) containing two values indicating the lower and upper limits of the y-axis of the plot.

subDir

a character string indicating the name of the subdirectory within the "output" and "plot" directories in which to save the output data.frame object (as a .txt file) and the plot (as a .png file), respectively. If no subdirectory with the given name exists within output/ and/or plot/, it is created. If subDir is not specified, the outputs are saved directly in output/ and plot/.

main

a character string (default: NULL) indicating an overall title for the plot.

width

a number (default: 1200) indicating the width of the output plot.

height

a number (default: 600) indicating the height of the output plot.

res

a number (default: 125) indicating the resolution of the output plot.
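How subDir, ylimPlot, main, width, height and res might fit together can be sketched as below. The helper outDir, the file name and the plotted values are hypothetical, and passing the sizes to grDevices::png is an assumption; the actual file names used by clustSimFunc are not given on this page. The sketch writes under a temporary directory rather than the working directory.

```r
# Hypothetical helper: build <base>/<topDir>[/<subDir>] and create it if needed.
outDir <- function(base, topDir, subDir) {
  path <- if (missing(subDir)) {
    file.path(base, topDir)
  } else {
    file.path(base, topDir, subDir)
  }
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

base <- tempdir()
plotDir <- outDir(base, "plot", "run1")          # with a subdirectory
plotTop <- outDir(base, "plot")                  # subDir missing: plot/ itself
pngFile <- file.path(plotDir, "silhouette.png")  # hypothetical file name

ks <- 2:4
widths <- c(0.41, 0.35, 0.30)                    # made-up values for illustration

# With png(), width and height are in pixels by default and res is in ppi.
png(pngFile, width = 1200, height = 600, res = 125)
plot(ks, widths, type = "b", ylim = c(0, 0.5),
     main = "Average silhouette width",
     xlab = "Number of clusters", ylab = "Average silhouette width")
invisible(dev.off())
```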

Value

clustSimFunc calculates the silhouette values for every number of clusters n from 2 to nClust (maximum 10), both inclusive, for the data in the data.frame object dfData, using the Gower dissimilarity computed by the function daisy from package cluster (cluster::daisy). It saves the silhouette values for the numbers of clusters considered in a .txt file in the "output" directory within the current working directory. It also creates a plot showing the average silhouette width against the number of clusters (2 to nClust) and saves it as a .png file in the "plot" directory within the current working directory.

It creates the "output" and/or "plot" directories in the current working directory if they are not already present. Similarly, if subDir is specified, it creates a subdirectory with that name within both output/ and plot/ if not already present, and saves the outputs in the respective subdirectories. If subDir is not specified (i.e. it is missing), it saves the .txt file directly in output/ and the plot directly in plot/. It also saves the output data.frame object in the environment given by envir (default: .GlobalEnv).
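As a rough sketch of the two kinds of output described above (a .txt file and an object stored in an environment), base R offers write.table and assign. The file name, object name, tab separator and table values below are all assumptions made for illustration; this page does not state what clustSimFunc uses.

```r
# Made-up result table standing in for the silhouette output
silTab <- data.frame(nClust = 2:4, avgSilWidth = c(0.41, 0.35, 0.30))

# Write under a temporary directory instead of the working directory
outPath <- file.path(tempdir(), "output")
dir.create(outPath, recursive = TRUE, showWarnings = FALSE)
txtFile <- file.path(outPath, "silhouette.txt")   # hypothetical file name

# Tab-separated text file (the separator is an assumption)
write.table(silTab, txtFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Store the table in a chosen environment, as the envir argument suggests;
# a fresh environment is used here to avoid touching .GlobalEnv
targetEnv <- new.env()
assign("silTab", silTab, envir = targetEnv)       # object name is hypothetical

# Read the file back to check the round trip
readBack <- read.table(txtFile, header = TRUE, sep = "\t")
```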

Examples

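# Read the sample data (requires the xlsx package)
tab1 <- xlsx::read.xlsx("./sample-data.xlsx", sheetName = "data")
# Column names passed to selectExclude()
tab1Vars <- c("i..id", "age", "area", "paddArea", "paddyFld", "date")
tab1Var <- selectExclude(tab1, tab1Vars)
# Test 2 to 4 clusters with the default plot settings
clustSimFunc(tab1Var, 4)
# Same, with the y-axis limited to the range 0 to 0.5
clustSimFunc(tab1Var, 4, ylimPlot = c(0, 0.5))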

lwTools/agriTrf documentation built on March 26, 2020, 12:09 a.m.