factor_sim: Compute Similarity Matrix for Factors in a Data Frame

View source: R/knn_weights.R

Description

Compute pairwise similarities between the rows of a data frame of factors, using one of several similarity coefficients for binary data.

Usage

factor_sim(des, method = c("Jaccard", "Rogers", "simple matching", "Dice"))

Arguments

des

A data frame whose columns are factors. Similarities are computed between its rows (observations).

method

A character string naming the similarity coefficient to use. The available methods are:

  • "Jaccard" - Jaccard similarity coefficient

  • "Rogers" - Rogers-Tanimoto similarity coefficient

  • "simple matching" - Simple matching coefficient

  • "Dice" - Dice similarity coefficient
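For two rows coded as 0/1 indicator vectors, let n11 be the number of columns that are 1 in both rows, n10 and n01 the numbers of columns that are 1 in only the first or only the second row, and n00 the number of columns that are 0 in both. The usual textbook definitions of the four coefficients are given below; the exact formulas proxy applies can be checked in its measure registry, proxy::pr_DB.

```latex
\begin{aligned}
S_{\text{Jaccard}}         &= \frac{n_{11}}{n_{11} + n_{10} + n_{01}} \\[4pt]
S_{\text{Rogers}}          &= \frac{n_{11} + n_{00}}{n_{11} + 2\,(n_{10} + n_{01}) + n_{00}} \\[4pt]
S_{\text{simple matching}} &= \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}} \\[4pt]
S_{\text{Dice}}            &= \frac{2\,n_{11}}{2\,n_{11} + n_{10} + n_{01}}
\end{aligned}
```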

Details

factor_sim first converts des into a model matrix, in which each factor is expanded into 0/1 indicator (dummy) columns for its levels. It then passes this matrix to proxy::simil from the proxy package, which computes the chosen coefficient between every pair of rows.
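The pipeline can be sketched in base R without proxy. This is a minimal illustration under stated assumptions, not the package's code: the helper names one_hot and factor_sim_sketch are hypothetical, and the sketch assumes full one-hot coding (one indicator column per level of every factor). The package's own encoding may differ; for example, model.matrix(~ . - 1, data = des) keeps every level of the first factor but drops the reference level of later factors, which changes the counts and therefore the similarities. The sketch also returns an ordinary square matrix, whereas the object returned by proxy::simil may be stored differently.

```r
# Hypothetical base-R sketch of factor_sim; not the package's implementation.
# Assumption: full one-hot coding, one 0/1 column per level of every factor.
one_hot <- function(des) {
  cols <- lapply(names(des), function(v) {
    f <- as.factor(des[[v]])
    # Row i gets a 1 in the column of its own level and 0 elsewhere.
    m <- outer(as.integer(f), seq_along(levels(f)), "==") * 1
    colnames(m) <- paste0(v, levels(f))
    m
  })
  do.call(cbind, cols)
}

factor_sim_sketch <- function(des,
                              method = c("Jaccard", "Rogers",
                                         "simple matching", "Dice")) {
  method <- match.arg(method)
  X <- one_hot(des)
  Y <- 1 - X
  # Counts over indicator columns for every pair of rows (i, j).
  n11 <- tcrossprod(X)     # 1 in both rows
  n10 <- tcrossprod(X, Y)  # 1 in row i, 0 in row j
  n01 <- tcrossprod(Y, X)  # 0 in row i, 1 in row j
  n00 <- tcrossprod(Y)     # 0 in both rows
  S <- switch(method,
    "Jaccard"         = n11 / (n11 + n10 + n01),
    "Rogers"          = (n11 + n00) / (n11 + 2 * (n10 + n01) + n00),
    "simple matching" = (n11 + n00) / (n11 + n10 + n01 + n00),
    "Dice"            = 2 * n11 / (2 * n11 + n10 + n01)
  )
  dimnames(S) <- list(rownames(des), rownames(des))
  S
}

# Usage with the example data from this page
des <- data.frame(
  var1 = factor(c("a", "b", "a", "b", "a")),
  var2 = factor(c("c", "c", "d", "d", "d"))
)
S_jaccard <- factor_sim_sketch(des, method = "Jaccard")
print(round(S_jaccard, 2))
```

Computing all four count matrices with tcrossprod keeps the sketch vectorized: each coefficient is then a single element-wise expression over the same counts.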

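The four methods differ in how they treat joint absences, i.e. indicator columns that are 0 in both rows. Jaccard and Dice ignore them, so two observations are similar only through the levels they share; Dice counts shared levels twice. Simple matching and Rogers count joint absences as agreement, and Rogers (Rogers-Tanimoto) weights mismatches twice as heavily as agreements. Because every factor level adds an indicator column that is 0 in most rows, simple matching and Rogers tend to return higher similarities than Jaccard and Dice when the factors have many levels.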
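As a worked illustration, take rows 1 (var1 = "a", var2 = "c") and 2 (var1 = "b", var2 = "c") of the example data, assuming full one-hot coding over the columns var1a, var1b, var2c, var2d (an assumption; the package's model matrix may encode levels differently). Row 1 is (1, 0, 1, 0) and row 2 is (0, 1, 1, 0), so every count n11, n10, n01, n00 equals 1, and the methods already disagree:

```latex
\begin{aligned}
n_{11} = n_{10} = n_{01} = n_{00} &= 1 \\[4pt]
S_{\text{Jaccard}}         &= \frac{1}{1 + 1 + 1} = \frac{1}{3} \\[4pt]
S_{\text{Rogers}}          &= \frac{1 + 1}{1 + 2\,(1 + 1) + 1} = \frac{2}{6} = \frac{1}{3} \\[4pt]
S_{\text{simple matching}} &= \frac{1 + 1}{1 + 1 + 1 + 1} = \frac{1}{2} \\[4pt]
S_{\text{Dice}}            &= \frac{2 \cdot 1}{2 \cdot 1 + 1 + 1} = \frac{1}{2}
\end{aligned}
```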

Value

The pairwise similarities between the rows (observations) of des under the chosen method, as returned by proxy::simil.

Examples

library(neighborweights)

# Sample data: five observations described by two factors
des <- data.frame(
  var1 = factor(c("a", "b", "a", "b", "a")),
  var2 = factor(c("c", "c", "d", "d", "d"))
)

# Compute similarity matrix using Jaccard method
sim_jaccard <- factor_sim(des, method = "Jaccard")

# Compute similarity matrix using Dice method
sim_dice <- factor_sim(des, method = "Dice")

bbuchsbaum/neighborweights documentation built on April 29, 2023, 5:34 p.m.