data-normalization: Functions to normalize, transform, measure distance between...

Description

dc_cosine applies the cosine transformation. dc_logistic applies the logistic transformation. dc_zscore applies the z-score transformation. dc_dist_canberra computes the Canberra distance between two numeric vectors. dc_dist_cosine computes the cosine angle distance between two numeric vectors. dc_dist_euclidean computes the Euclidean distance between two numeric vectors. dc_dist_pearson computes the Pearson correlation distance between two numeric vectors.
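
For orientation, the four distances named above can be sketched in base R from their textbook definitions. This is only an illustration: the function names below are hypothetical, the cosine and Pearson distances are taken as one minus the similarity (one common convention), and dacol's own implementations may differ in scaling or in how they handle zeros and missing values.

```r
# Textbook definitions in base R. All names here are hypothetical helpers,
# not part of dacol.

# Canberra: sum of |x - y| / (|x| + |y|); terms that are 0/0 are dropped
# (an assumed convention).
canberra_dist <- function(x, y) {
  d <- abs(x - y) / (abs(x) + abs(y))
  sum(d[is.finite(d)])
}

# Cosine distance, assumed here to be 1 - cosine similarity.
cosine_dist <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Euclidean: square root of the summed squared differences.
euclidean_dist <- function(x, y) sqrt(sum((x - y)^2))

# Pearson correlation distance, assumed here to be 1 - correlation.
pearson_dist <- function(x, y) 1 - cor(x, y)

a <- c(1, 2, 3)
b <- c(2, 4, 6)

euclidean_dist(a, b)  # sqrt(14)
canberra_dist(a, b)   # 1/3 + 2/6 + 3/9 = 1
cosine_dist(a, b)     # 0 up to floating point, since b is a multiple of a
pearson_dist(a, b)    # 0 up to floating point, perfectly correlated
```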

Usage

dc_cosine(x, max = 100)

dc_logistic(x, max = 100)

dc_zscore(x)

dc_dist_canberra(x, y)

dc_dist_cosine(x, y)

dc_dist_euclidean(x, y)

dc_dist_pearson(x, y)

dc_trim_outlier(x, fraction = 0.01)

dc_normalize_ptile(x, fraction = 0.01)

get_confidence_interval(x, level = 0.95)

dc_decile_band(x, n = NA)

dc_decile_ptile(x, band_ptile = c(seq(0, 0.95, 0.05)))

dc_rank_ptile(x, level_rank = c(1, 2, 3, 4, seq(5, 100, 5)))

dc_mode(x, na.rm = FALSE)

dc_ceiling(x, digits = 0, na.rm = FALSE)

Arguments

x

A numeric vector

max

A numeric value

y

A numeric vector

fraction

The percentile value (0 to 0.5) to trim out

level

The confidence level (0.5 to 1.0) of the interval to be computed.

band_ptile

The percentile band (0.0 to 1.0)

level_rank

The rank level (0.0 to 1.0) for calculating percentile

na.rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

digits

An integer giving the number of decimal places to use, as with the digits argument of base::round(). Negative values are allowed.

Details

dc_ceiling is similar to base::ceiling(), with support for rounding up at a given number of decimal places.

dc_mode computes the statistical mode.
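
As a rough illustration of these two ideas, the sketch below rounds up at a chosen decimal place and finds the most frequent value using base R only. The names ceiling_digits and stat_mode are hypothetical, and the tie-breaking rule (first value seen wins) is an assumption; dacol's functions may behave differently at the edges.

```r
# Hypothetical helpers, not dacol's own code.

# Round up at `digits` decimal places; negative digits round up to tens,
# hundreds, and so on.
ceiling_digits <- function(x, digits = 0) {
  scale <- 10^digits
  ceiling(x * scale) / scale
}

# Most frequent value; on ties the value that appears first wins (assumed).
stat_mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

ceiling_digits(123, -1)         # 130
ceiling_digits(1.231, 2)        # 1.24
stat_mode(c(3, 1, 3, 2, 1, 3))  # 3
```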

dc_rank_ptile adds columns with ranked percentiles.

dc_decile_band adds columns with decile bands.

dc_decile_ptile adds columns with decile percentiles.
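
The general idea of decile banding can be sketched with base R's quantile() and cut(). The helper name decile_band and its integer output (1 to 10) are assumptions made for illustration; the source does not specify what labels or columns the dacol functions produce.

```r
# Hypothetical helper, not dacol's own code: assign each value to one of ten
# bands bounded by the sample deciles.
decile_band <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum value inside the first band
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- 1:100
bands <- decile_band(x)
table(bands)  # ten bands with 10 values each
```

Heavily tied data can yield duplicate decile breaks, in which case cut() stops with an error; a real implementation would need to handle that case.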

Value

Returns a numeric vector after normalization, or the distance between two vectors.
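
To illustrate the normalization case, the standard z-score returns a vector of the same length as its input, centered at zero with unit standard deviation. The helper zscore below is a hypothetical base-R sketch of that standard definition, not dacol's dc_zscore, which may treat missing values or constant input differently.

```r
# Hypothetical helper: standard z-score using the sample standard deviation.
zscore <- function(x) (x - mean(x)) / sd(x)

z <- zscore(c(2, 4, 6, 8))
length(z)  # 4, same length as the input
mean(z)    # 0 up to floating point
sd(z)      # 1
```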

Examples

library(dacol)
library(dplyr)

max = 30
dta1 = tibble(x1 = seq(-1.2*max, 1.2*max, length.out = 200),
              x2 = seq(1, max, length.out = 200),
              x3 = sample(200))

dta1 = mutate(dta1,

              # Transformation
              y_cosine   = dc_cosine(x1, max),
              y_logistic = dc_logistic(x2, max),
              y_zcore    = dc_zscore(x2),

              # Distance between two vector columns
              y_dist_canb = dc_dist_canberra(x2, x3),
              y_dist_cos  = dc_dist_cosine(x2, y_zcore),
              y_dist_euc  = dc_dist_euclidean(x2, y_zcore),
              y_dist_pear = dc_dist_pearson(x2, y_zcore),

              # Manage outliers
              y_trim = dc_trim_outlier(x3, 0.01),
              y_norm = dc_normalize_ptile(x3, 0.01),

              # Stats measures
              y_mode = dc_mode(x3),
              y_ceil = dc_ceiling(x3, -1),

              # Band segmentation
              y_dec_band1 = dc_decile_band(x3),
              y_dec_band2 = dc_decile_band(x3, c(seq(0, 0.9, 0.1))),
              y_dec_ptile1 = dc_decile_ptile(x3),
              y_dec_ptile2 = dc_decile_ptile(x3, c(seq(0, 0.9, 0.1)))
              )

ldanai/dacol documentation built on May 15, 2020, 5:05 p.m.