scoreHVT | R Documentation |
This function scores each data point in the test dataset based on a trained hierarchical Voronoi tessellations model.
scoreHVT(
dataset,
hvt.results.model,
child.level = 1,
mad.threshold = 0.2,
line.width = 0.6,
color.vec = c("navyblue", "slateblue", "lavender"),
normalize = TRUE,
distance_metric = "L1_Norm",
error_metric = "max",
yVar = NULL,
analysis.plots = FALSE,
names.column = NULL
)
dataset |
Data frame. The data frame to be scored. It may contain categorical columns if 'analysis.plots' is required. |
hvt.results.model |
List. A list obtained from the trainHVT function |
child.level |
Numeric. A number indicating the depth for which the heat map is to be plotted. |
mad.threshold |
Numeric. A numeric value indicating the permissible Mean Absolute Deviation. |
line.width |
Vector. A vector indicating the line widths of the tessellation boundaries for each layer. |
color.vec |
Vector. A vector indicating the colors of the tessellation boundaries at each layer. |
normalize |
Logical. A logical value indicating whether the dataset should be normalized. When TRUE, the test dataset is standardized using the mean and standard deviation of the training dataset, taken from the trainHVT model. When FALSE, the data is used as is. |
distance_metric |
Character. The distance metric, either L1_Norm (Manhattan) or L2_Norm (Euclidean). L1_Norm is the default. The metric is used to calculate the distance between an n-dimensional point and a centroid. It may differ from the metric used during training. |
error_metric |
Character. The error metric, either mean or max. max is the default. Given m values, each the distance between a point and the centroid of its cell, max returns the largest of the m values and mean returns their average. |
yVar |
Character. A character string or vector giving the name(s) of the dependent variable(s). |
analysis.plots |
Logical. A logical value indicating whether the scored plot should be drawn. If TRUE, the name of the identifier (character) column must be supplied in the 'names.column' argument. The output is a 2D plotly heatmap showing the cell ID and the observations in each cell. |
names.column |
Character. A character string or vector giving the name of the identifier (character) column. |
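The 'normalize', 'distance_metric' and 'error_metric' arguments described above can be illustrated with a minimal base R sketch. It does not call the package and is not its implementation: the toy data, the centroid, and the textbook definitions of the Manhattan and Euclidean distances used here are assumptions made for illustration only.

```r
# Toy training and test data (hypothetical values, two numeric features).
train <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
test  <- data.frame(x = c(2.5, 4),     y = c(25, 40))

# normalize = TRUE: standardize the test data with the mean and sd
# of the TRAINING data, not with the test data's own statistics.
train_mean  <- colMeans(train)
train_sd    <- apply(train, 2, sd)
test_scaled <- scale(test, center = train_mean, scale = train_sd)

# A hypothetical cell centroid in the standardized space.
centroid <- c(x = 0.5, y = -0.5)

# Differences between each test point and the centroid (one row per point).
diffs <- sweep(test_scaled, 2, centroid)

# Textbook distance definitions (assumed, not taken from the package source).
l1_dist <- rowSums(abs(diffs))     # "L1_Norm": Manhattan distance
l2_dist <- sqrt(rowSums(diffs^2))  # "L2_Norm": Euclidean distance

# error_metric: summarize the m point-to-centroid distances of a cell.
cell_error_max  <- max(l1_dist)    # error_metric = "max"
cell_error_mean <- mean(l1_dist)   # error_metric = "mean"

print(data.frame(l1_dist, l2_dist))
print(c(max = cell_error_max, mean = cell_error_mean))
```

Because the test data is scaled with the training statistics, a test point that equals the training mean maps to the origin of the standardized space, whatever the other test rows contain.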
Dataframe containing scored data, plots and summary
Shubhra Prakash <shubhra.prakash@mu-sigma.com>, Sangeet Moy Das <sangeet.das@mu-sigma.com>, Vishwavani <vishwavani@mu-sigma.com>
trainHVT
plotHVT
library(HVT)

# Load the data and build a data frame with a numeric date column
data("EuStockMarkets")
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)),
                      DAX = EuStockMarkets[, "DAX"],
                      SMI = EuStockMarkets[, "SMI"],
                      CAC = EuStockMarkets[, "CAC"],
                      FTSE = EuStockMarkets[, "FTSE"])
# Use the numeric dates as row names
rownames(EuStockMarkets) <- dataset$date

# Split into train and test sets
train <- EuStockMarkets[1:1302, ]
test <- EuStockMarkets[1303:1860, ]

# Train the model
hvt.results <- trainHVT(train, n_cells = 60, depth = 1, quant.err = 0.1,
                        distance_metric = "L1_Norm", error_metric = "max",
                        normalize = TRUE, quant_method = "kmeans")

# Score the test set with the trained model and extract the scored data
scoring <- scoreHVT(test, hvt.results)
data_scored <- scoring$scoredPredictedData