preprocess_describe: Descriptive Statistics
In mlpack: 'Rcpp' Integration for the 'mlpack' Library

preprocess_describe

R Documentation

Descriptive Statistics

Description

A utility for printing descriptive statistics about a dataset in a tabular format.

Usage

preprocess_describe(
  input,
  dimension = NA,
  population = FALSE,
  precision = NA,
  row_major = FALSE,
  verbose = getOption("mlpack.verbose", FALSE),
  width = NA
)

Arguments

`input`	Matrix containing data (numeric matrix).
`dimension`	Dimension of the data. Use this to select a single dimension to analyze. Default value "0" (integer).
`population`	If specified, the program will calculate statistics assuming the dataset is the population. By default, the program treats the dataset as a sample. Default value "FALSE" (logical).
`precision`	Precision of the output statistics. Default value "4" (integer).
`row_major`	If specified, the program will calculate statistics across rows, not across columns. (Remember that in mlpack, a column represents a point, so this option is generally not necessary.) Default value "FALSE" (logical).
`verbose`	Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).
`width`	Width of the output table. Default value "8" (integer).

Details

This utility takes a dataset and prints out the descriptive statistics of the data. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself. The program does not modify the original file, but instead prints out the statistics to the console. The printed result will look like a table.

Optionally, the width and precision of the output can be adjusted with the "width" and "precision" parameters. If the dataset has too many dimensions to read comfortably, a single dimension can be selected with the "dimension" parameter. The "population" parameter can be specified when the dataset should be considered as a population. Otherwise, the dataset will be considered as a sample.
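The sample versus population distinction usually comes down to the divisor used for spread statistics such as variance and standard deviation. The sketch below uses base R only and does not call mlpack; the vector "x" is made-up illustration data, and the assumption that the "population" parameter changes the divisor from n - 1 to n reflects the conventional definitions rather than anything stated on this page.

```r
# Base R only; this does not call mlpack.
# Illustration data (hypothetical, chosen so the arithmetic is easy to check).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance divides the sum of squared deviations by n - 1.
# This is what stats::var() computes.
sample_var <- var(x)

# Population variance divides the same sum by n.
population_var <- sum((x - mean(x))^2) / n

sample_sd <- sqrt(sample_var)
population_sd <- sqrt(population_var)

print(c(sample_sd = sample_sd, population_sd = population_sd))
```

The sample standard deviation is always the larger of the two, and the gap shrinks as the number of points grows.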

Author(s)

mlpack developers

Examples

# As a simple example, to print out statistical facts about the dataset "X"
# using the default settings, we could run

## Not run: 
preprocess_describe(input=X, verbose=TRUE)

## End(Not run)

# If we want to customize the width to 10 and precision to 5 and consider the
# dataset as a population, we could run

## Not run: 
preprocess_describe(input=X, width=10, precision=5, population=TRUE,
  verbose=TRUE)

## End(Not run)

mlpack documentation built on June 8, 2025, 10:47 a.m.