knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(cli)
library(UAHDataScienceSC)
UAHDataScienceSC provides an educational framework for learning supervised classification through hands-on implementation and visualization. The package combines algorithm implementations with interactive learning features, visualization tools, and carefully curated test datasets to facilitate understanding of machine learning concepts.
Install UAHDataScienceSC from CRAN:
install.packages("UAHDataScienceSC")
Then, load it into your R session:
library(UAHDataScienceSC)
The package includes several datasets designed to demonstrate different aspects of machine learning algorithms. Each dataset serves specific educational purposes and highlights particular challenges in data analysis.
The flower classification dataset (db_flowers) contains measurements of flower characteristics including petal length, petal width, sepal length, and sepal width. These measurements are used to classify flowers into three distinct species (setosa, versicolor, virginica), with additional unknown samples provided for testing purposes. The dataset maintains a balanced distribution of classes, making it particularly suitable for initial classification exercises.
data("db_flowers") head(db_flowers)
The logic gate datasets simulate binary classification problems with varying complexity. These variations help illustrate the capabilities and limitations of different classification algorithms.
The AND gate dataset (db_per_and) demonstrates basic binary classification with three input variables and a single output that follows logical AND rules. This dataset proves especially useful for understanding perceptron training on linearly separable patterns.
data("db_per_and.rda") head(db_per_and)
The OR gate dataset (db_per_or) extends the binary classification concept with OR logic.
data("db_per_or.rda") head(db_per_or)
The XOR gate dataset (db_per_xor) presents a more challenging non-linearly separable problem.
data("db_per_xor.rda") head(db_per_xor)
The vehicle classification dataset (db2) presents a real-world application scenario combining categorical and numerical features. The dataset uses license types, wheel counts, and passenger capacity to classify vehicles into categories such as cars, motorcycles, bicycles, and trucks. This mixed-type data structure provides practical experience with handling diverse input features.
data(db2)
head(db2)
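Because db2 mixes categorical and numerical columns, inspecting its structure with base R shows the feature types an algorithm must handle:

# str() lists each column with its type (character/factor vs. numeric)
str(db2)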
The extended vehicle dataset (db3) builds upon db2 by introducing additional complexity through new vehicle types and relationships, making it particularly suitable for exploring decision tree depth impacts and algorithm scalability.
data(db3)
head(db3)
The regression test dataset (db1rl) incorporates various mathematical relationships including linear, exponential, logarithmic, and sinusoidal patterns. This diversity allows users to compare the effectiveness of different regression approaches and understand their appropriateness for various data patterns.
data("db1rl") head(db1rl)
The KNN implementation supports various distance calculation methods to accommodate different data types and relationship patterns. The algorithm can employ Euclidean distance for standard numerical data, Manhattan distance for grid-like patterns, cosine similarity for angular relationships, and specialized metrics like Hamming distance for categorical data. The choice of distance method significantly impacts classification results and should be selected based on data characteristics and problem requirements.
result <- knn(
  data = db_flowers,
  ClassLabel = "ClassLabel",
  p1 = c(4.7, 1.2, 5.3, 2.1),
  d_method = "euclidean",
  k = 3
)
print(result)
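To see how the choice of metric affects the result, the same query can be rerun with a different d_method. The sketch below assumes the Manhattan metric is selected with the string "manhattan":

# Same query point and k, but city-block distance instead of Euclidean;
# the predicted class can change when neighbors sit near a boundary
result_manhattan <- knn(
  data = db_flowers,
  ClassLabel = "ClassLabel",
  p1 = c(4.7, 1.2, 5.3, 2.1),
  d_method = "manhattan",
  k = 3
)
print(result_manhattan)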
The interactive learning mode provides step-by-step visualization of the classification process:
result <- knn(
  data = db_flowers,
  ClassLabel = "ClassLabel",
  p1 = c(4.7, 1.2, 5.3, 2.1),
  d_method = "euclidean",
  k = 3,
  learn = TRUE,
  waiting = FALSE
)
The decision tree implementation offers multiple impurity measures for node splitting decisions. The entropy method bases decisions on information theory principles, while the Gini method considers misclassification probability. The error rate method provides a direct measure of classification accuracy. Each method may produce different tree structures, offering insights into various approaches to data partitioning.
tree <- decision_tree(
  data = db2,
  classy = "VehicleType",
  m = 4,
  method = "gini",
  learn = TRUE
)
print(tree)
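To build intuition for what these measures compute, the base R snippet below evaluates entropy and Gini impurity by hand for a toy class distribution; it is standalone arithmetic, independent of the package API:

# Class proportions at a hypothetical node
p <- c(0.5, 0.3, 0.2)

# Entropy: -sum(p * log2(p)); 0 for a pure node, larger when classes mix
-sum(p * log2(p))

# Gini impurity: 1 - sum(p^2); also 0 for a pure node
1 - sum(p^2)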
The perceptron implementation includes several activation functions to model different decision boundaries. The step function provides basic binary thresholding, while continuous functions like sine, tangent, and ReLU offer smoother transitions. Advanced functions such as GELU and Swish incorporate modern neural network concepts.
weights <- perceptron(
  training_data = db_per_and,
  to_clasify = c(0, 0, 1),
  activation_method = "swish",
  max_iter = 1000,
  learning_rate = 0.1,
  learn = TRUE
)
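For comparison, the same call with the basic step activation performs hard 0/1 thresholding instead of Swish's smooth transition:

# Identical training setup, but with hard-threshold activation
weights_step <- perceptron(
  training_data = db_per_and,
  to_clasify = c(0, 0, 1),
  activation_method = "step",
  max_iter = 1000,
  learning_rate = 0.1,
  learn = TRUE
)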
Linear and polynomial regression implementations support modeling relationships of varying complexity. Linear regression handles straightforward proportional relationships, while polynomial regression captures more complex patterns through higher-degree terms.
# Linear regression
linear_model <- multivariate_linear_regression(
  data = db1rl,
  learn = TRUE
)

# Polynomial regression
poly_model <- polynomial_regression(
  data = db1rl,
  degree = 4,
  learn = TRUE
)
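As an optional cross-check against base R, comparable models can be fit with lm(). The column names x and y below are hypothetical placeholders; substitute the actual predictor and response columns of db1rl:

# Base R equivalents (x and y are hypothetical column names)
base_linear <- lm(y ~ x, data = db1rl)
base_poly <- lm(y ~ poly(x, 4), data = db1rl)
summary(base_linear)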