knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(pdqr)

set.seed(101)
Package 'pdqr' supports two types of distributions: "discrete" and "continuous".
Note that all distributions assume finite support (output values are bounded from below and above) and finite values of density function (density function in case of "continuous" type can't go to infinity).
All new_*() functions create a pdqr-function of a certain type ("discrete" or "continuous") based on a sample or a data frame of appropriate structure.

For sample input of "continuous" type, density is estimated with the density() function if the input has at least 2 elements. For 1 element a special "dirac-like" pdqr-function is created: an approximation of a single number with a triangular distribution of very narrow support (1e-8 of magnitude). Basically, sample input is converted into a data frame of appropriate structure that defines the distribution.

We will use the following data frame inputs in examples:
# For type "discrete"
dis_df <- data.frame(x = 1:4, prob = 4:1 / 10)

# For type "continuous"
con_df <- data.frame(x = 1:4, y = c(0, 1, 1, 1))
This vignette is organized as follows:

Section "density() arguments" describes how to use density() arguments to tweak smoothing during creation of "continuous" pdqr-functions.

P-function (analogue of p*() functions in base R) represents a cumulative distribution function of the distribution.
# Treating input as discrete
p_mpg_dis <- new_p(mtcars$mpg, type = "discrete")
p_mpg_dis

# Treating input as continuous
p_mpg_con <- new_p(mtcars$mpg, type = "continuous")
p_mpg_con

# Outputs are actually vectorized functions
p_mpg_dis(15:20)
p_mpg_con(15:20)

# You can plot them directly using base `plot()` and `lines()`
plot(p_mpg_con, main = "P-functions from sample")
lines(p_mpg_dis, col = "blue")
p_df_dis <- new_p(dis_df, type = "discrete")
p_df_dis

p_df_con <- new_p(con_df, type = "continuous")
p_df_con

plot(p_df_con, main = "P-functions from data frame")
lines(p_df_dis, col = "blue")
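To make the idea of a cumulative distribution function concrete, here is a minimal base R sketch for the "discrete" data frame input. It is a hand-rolled illustration and not pdqr's implementation; the helper name `cdf_dis()` is hypothetical.

```r
# Same "discrete" input as used in this vignette
dis_df <- data.frame(x = 1:4, prob = 4:1 / 10)

# Hypothetical helper: P(X <= q) is the sum of probabilities of all
# values not greater than q
cdf_dis <- function(q) {
  vapply(
    q,
    function(q_i) sum(dis_df$prob[dis_df$x <= q_i]),
    numeric(1)
  )
}

# Vectorized over its input, like base R p*() functions
cdf_dis(c(0.5, 1, 2.5, 4))
```

The result is a step function: it is 0 to the left of the smallest value, jumps at every value of "x", and reaches 1 at the largest one.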
D-function (analogue of d*() functions in base R) represents a probability mass function for "discrete" type and density function for "continuous":
# Treating input as discrete
d_mpg_dis <- new_d(mtcars$mpg, type = "discrete")
d_mpg_dis

# Treating input as continuous
d_mpg_con <- new_d(mtcars$mpg, type = "continuous")
d_mpg_con

# Outputs are actually vectorized functions
d_mpg_dis(15:20)
d_mpg_con(15:20)

# You can plot them directly using base `plot()` and `lines()`
op <- par(mfrow = c(1, 2))
plot(d_mpg_con, main = '"continuous" d-function\nfrom sample')
plot(d_mpg_dis, main = '"discrete" d-function\nfrom sample', col = "blue")
par(op)
d_df_dis <- new_d(dis_df, type = "discrete")
d_df_dis

d_df_con <- new_d(con_df, type = "continuous")
d_df_con

op <- par(mfrow = c(1, 2))
plot(d_df_con, main = '"continuous" d-function\nfrom data frame')
plot(d_df_dis, main = '"discrete" d-function\nfrom data frame', col = "blue")
par(op)
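The "continuous" data frame input describes a piecewise-linear density through its "x" and "y" columns. The base R sketch below shows what such a density looks like once "y" is rescaled so that the total area is 1, as any density must satisfy. It assumes this rescaling is what happens to data frame input; the helper names `trapez_area()` and `dens()` are hypothetical and this is not pdqr's own code.

```r
# Same "continuous" input as used in this vignette
con_df <- data.frame(x = 1:4, y = c(0, 1, 1, 1))

# Trapezoid rule is exact for a piecewise-linear function
trapez_area <- function(x, y) {
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

total_area <- trapez_area(con_df$x, con_df$y)

# Rescale "y" so that the area under the curve equals 1
y_norm <- con_df$y / total_area

# Linear interpolation inside the support, zero outside of it
dens <- approxfun(con_df$x, y_norm, yleft = 0, yright = 0)

dens(c(0.5, 1.5, 3, 5))
```

Note that the resulting density is bounded and has finite support, in line with the assumptions stated at the beginning of this vignette.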
Q-function (analogue of q*() functions in base R) represents a quantile function, an inverse of the corresponding p-function:
# Treating input as discrete
q_mpg_dis <- new_q(mtcars$mpg, type = "discrete")
q_mpg_dis

# Treating input as continuous
q_mpg_con <- new_q(mtcars$mpg, type = "continuous")
q_mpg_con

# Outputs are actually vectorized functions
q_mpg_dis(c(0.1, 0.3, 0.7, 1.5))
q_mpg_con(c(0.1, 0.3, 0.7, 1.5))

# You can plot them directly using base `plot()` and `lines()`
plot(q_mpg_con, main = "Q-functions from sample")
lines(q_mpg_dis, col = "blue")
q_df_dis <- new_q(dis_df, type = "discrete")
q_df_dis

q_df_con <- new_q(con_df, type = "continuous")
q_df_con

plot(q_df_con, main = "Q-functions from data frame")
lines(q_df_dis, col = "blue")
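The "inverse of p-function" idea can be sketched in base R for the "discrete" data frame input: a quantile is the smallest value whose cumulative probability reaches the requested level. This is an illustration only, not pdqr's implementation. The helper name `quan_dis()` and the small numerical tolerance are assumptions made for this sketch; returning `NaN` for probabilities outside of [0, 1] mirrors what base R q*() functions do.

```r
dis_df <- data.frame(x = 1:4, prob = 4:1 / 10)
cum_prob <- cumsum(dis_df$prob)

# Hypothetical helper: smallest "x" with cumulative probability >= p
quan_dis <- function(p) {
  vapply(
    p,
    function(p_i) {
      # Probabilities outside of [0, 1] have no quantile
      if (is.na(p_i) || p_i < 0 || p_i > 1) {
        return(NaN)
      }
      # Tolerance guards against floating point error in `cumsum()`
      as.numeric(dis_df$x[which(cum_prob >= p_i - 1e-12)[1]])
    },
    numeric(1)
  )
}

quan_dis(c(0.1, 0.5, 0.8, 1, 1.5))
```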
R-function (analogue of r*() functions in base R) represents a random generation function. For "discrete" type it will generate only values present in input. For "continuous" type it will generate values from the distribution corresponding to the one estimated with density().
# Treating input as discrete
r_mpg_dis <- new_r(mtcars$mpg, type = "discrete")
r_mpg_dis

# Treating input as continuous
r_mpg_con <- new_r(mtcars$mpg, type = "continuous")
r_mpg_con

# Outputs are actually functions
r_mpg_dis(5)
r_mpg_con(5)

# You can plot them directly using base `plot()` and `lines()`
op <- par(mfrow = c(1, 2))
plot(r_mpg_con, main = '"continuous" r-function\nfrom sample')
plot(r_mpg_dis, main = '"discrete" r-function\nfrom sample', col = "blue")
par(op)
r_df_dis <- new_r(dis_df, type = "discrete")
r_df_dis

r_df_con <- new_r(con_df, type = "continuous")
r_df_con

op <- par(mfrow = c(1, 2))
plot(r_df_con, main = '"continuous" r-function\nfrom data frame')
plot(r_df_dis, main = '"discrete" r-function\nfrom data frame', col = "blue")
par(op)
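For the "discrete" case, random generation that returns only values present in input can be imitated in base R with `sample()`. The sketch below is an analogy rather than a description of how pdqr generates values; the helper name `rand_dis()` is hypothetical.

```r
set.seed(101)

dis_df <- data.frame(x = 1:4, prob = 4:1 / 10)

# Hypothetical helper: draw `n` values of "x" with probabilities "prob"
rand_dis <- function(n) {
  sample(dis_df$x, size = n, replace = TRUE, prob = dis_df$prob)
}

smpl <- rand_dis(1000)

# Only values from "x" column can appear in output
all(smpl %in% dis_df$x)

# Empirical frequencies should be close to "prob" column
table(smpl) / length(smpl)
```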
When creating a "continuous" pdqr-function with new_*() from a single number, a special "dirac-like" pdqr-function is created. It is an approximation of a single number with a triangular distribution of very narrow support (1e-8 of magnitude):
r_dirac <- new_r(3.14, type = "continuous")
r_dirac
r_dirac(4)

# Outputs aren't exactly but approximately equal
dput(r_dirac(4))
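A narrow triangular density of this kind can be written down by hand. The base R sketch below assumes a symmetric triangle centered at the input number with half-width 1e-8; the exact shape pdqr uses may differ, and the names `center`, `half_width`, and `dirac_df` are introduced only for illustration.

```r
center <- 3.14
half_width <- 1e-8

# Triangle with base `2 * half_width` and height `1 / half_width`
dirac_df <- data.frame(
  x = center + c(-1, 0, 1) * half_width,
  y = c(0, 1 / half_width, 0)
)

# Area under the triangle (trapezoid rule); equals 1 up to rounding
area <- sum(diff(dirac_df$x) * (dirac_df$y[-1] + dirac_df$y[-3]) / 2)
area
```

The density is very tall but still finite, which is why such a function fits the "finite values of density function" assumption while behaving almost like a single number.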
Boolean pdqr-function is a special case of "discrete" function whose values are exactly 0 and 1. Such functions are usually created after transformations involving logical operators (see vignette on transformation for more details). It is assumed that 0 represents some expression being false, and 1 represents it being true. Corresponding probabilities describe the distribution of the expression's logical values. The only difference from other "discrete" pdqr-functions is more detailed printing.
new_d(data.frame(x = c(0, 1), prob = c(0.25, 0.75)), type = "discrete")
density() arguments

When creating a pdqr-function of "continuous" type, density() is used to estimate density. To tweak its performance, supply its extra arguments directly to new_*() functions. Here are some examples:
plot(
  new_d(mtcars$mpg, "continuous"), lwd = 3,
  main = "Examples of `density()` options"
)

# Argument `adjust` of `density()` helps to define smoothing bandwidth
lines(new_d(mtcars$mpg, "continuous", adjust = 0.3), col = "blue")

# Argument `n` defines number of points to be used in piecewise-linear
# approximation
lines(new_d(mtcars$mpg, "continuous", n = 5), col = "green")

# Argument `cut` defines the "extending" property of density estimation.
# Using `cut = 0` assumes that density can't go outside of input's range
lines(new_d(mtcars$mpg, "continuous", cut = 0), col = "magenta")
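The effect of these three arguments can also be inspected on the output of base R `density()` itself, without involving pdqr. The sketch below uses only `stats::density()`; the variable names are chosen for illustration.

```r
x <- mtcars$mpg

d_default <- density(x)
d_adjust <- density(x, adjust = 0.3)
d_n <- density(x, n = 5)
d_cut <- density(x, cut = 0)

# `adjust` multiplies the default bandwidth
c(default = d_default$bw, adjusted = d_adjust$bw)

# `n` sets the number of points of the output grid
length(d_n$x)

# `cut = 0` keeps the output grid inside the range of input
range(d_cut$x)
range(x)
```

A smaller `adjust` gives a wigglier estimate, a small `n` gives a coarse piecewise-linear curve, and `cut = 0` stops the estimate from extending beyond observed values.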
Every pdqr-function has metadata: information which describes the underlying distribution and the pdqr-function itself. A family of meta_*() functions is implemented to extract that information:
meta_x_tbl() returns "x_tbl" metadata, which completely defines the distribution. It is a data frame with structure depending on the type of pdqr-function.

meta_class() returns the class of pdqr-function. This can be one of "p", "d", "q", "r". Represents how pdqr-function describes the underlying distribution.

meta_type() returns the type of pdqr-function. This can be one of "discrete" or "continuous". Represents the type of the underlying distribution.

meta_support() returns the support of the distribution. This is a range of "x" column from "x_tbl" metadata.

# Type "discrete"
d_dis <- new_d(1:4, type = "discrete")
meta_x_tbl(d_dis)
meta_class(d_dis)
meta_type(d_dis)
meta_support(d_dis)

# Type "continuous"
p_con <- new_p(1:4, type = "continuous")
head(meta_x_tbl(p_con))
meta_class(p_con)
meta_type(p_con)
meta_support(p_con)

# Dirac-like "continuous" function
r_dirac <- new_r(1, type = "continuous")
dput(meta_x_tbl(r_dirac))
dput(meta_support(r_dirac))

# `meta_all()` returns all metadata in a single list
meta_all(d_dis)
For more details go to the help page of meta_all().