# DescTools-package: Tools for Descriptive Statistics and Exploratory Data...

## Description

DescTools is an extensive collection of miscellaneous basic statistics functions and comfort wrappers, not available in the R base system, for the efficient description of data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel.
A considerable part of the included functions can be found scattered across other packages and sources, partly written by Titans of R. The reason for collecting them here was primarily to consolidate them in ONE package instead of dozens (which themselves might depend on other packages that are not needed at all), and to provide a common and consistent interface as far as function and argument naming, `NA` handling, recycling rules etc. are concerned. Google's style guides were used as naming rules (in the absence of convincing alternatives), and CamelStyle was consistently applied to functions borrowed from contributed R packages as well.

Feedback, feature requests, bug reports and other suggestions are welcome! Please report problems on Stack Overflow, mentioning DescTools, or directly to the maintainer.

## Details

A grouped list of the functions:

Operators, calculus, transformations:

- `%()%`: Between operators, determine if a value lies within a range [a, b]
- `%)(%`: Outside operators: `%)(%`, `%](%`, `%)[%`, `%][%`
- `%nin%`: "not in" operator
- `%overlaps%`: Do two collections have common elements?
- `%like%`, `%like any%`: Simple operator to search for a specified pattern
- `%^%`: Calculate powers of matrices
- `Interval`: Calculate the number of days of the overlapping part of two date periods
- `AUC`: Calculate the area under the curve
- `Primes`: Find all primes less than n
- `Factorize`: Prime factorization of integers
- `GCD`: Calculate the greatest common divisor
- `LCM`: Calculate the least common multiple
- `Permn`: Determine all possible permutations of a set
- `Fibonacci`: Generate single Fibonacci numbers or a Fibonacci sequence
- `DigitSum`: Digit sum of a number
- `Frac`: Return the fractional part of a numeric value
- `Ndec`: Count the decimal places of a number
- `MaxDigits`: Maximum used digits for a vector of numbers
- `Prec`: Precision of a number
- `BoxCox`, `BoxCoxInv`: Box-Cox transformation and its inverse
- `BoxCoxLambda`: Return the optimal lambda for a Box-Cox transformation
- `LogSt`, `LogStInv`: Calculate the started logarithmic transformation and its inverse
- `Logit`, `LogitInv`: Generalized logit and inverse logit function
- `LinScale`: Simple linear scaling of a vector x
- `Winsorize`: Data cleaning by winsorization
- `Trim`: Trim data by omitting outlying observations
- `CutQ`: Cut a numeric variable into quartiles or other quantiles
- `Recode`: Recode a factor with altered levels
- `Rename`: Change name(s) of a named object
- `Sort`: Sort extension for matrices and data.frames
- `SortMixed`, `OrderMixed`: Mixed sort order
- `DenseRank`: Calculate ranks in consecutive order (no ties)
- `PercentRank`: Calculate the percent rank
- `RoundTo`: Round to a multiple
- `Large`, `Small`: Return the kth largest, resp. smallest, values
- `HighLow`: Combines `Large` and `Small`
- `Rev`: Reverse the order of rows and/or columns of a matrix or a data.frame
- `Untable`: Recreate the original list based on an n-dimensional frequency table
- `CollapseTable`: Collapse some rows/columns in a table
- `Dummy`: Generate dummy codes for a factor
- `FisherZ`, `FisherZInv`: Fisher's z-transformation and its inverse
- `Midx`: Calculate sequentially the midpoints of the elements of a vector
- `UnitConv`: Return the most common unit conversions
- `Unwhich`: Inverse function to `which`; create a logical vector/matrix from indices
- `Vigenere`: Implements a Vigenere cipher, both encryption and decryption
- `BinTree`, `PlotBinTree`: Create and plot a binary tree structure with a given length

Information and manipulation functions:

- `AllDuplicated`: Find all values involved in ties
- `Closest`: Return the value in a vector closest to a given one
- `Coalesce`: Return the first value in a vector that is not `NA`
- `ZeroIfNA`, `NAIfZero`: Replace NAs by 0, resp. vice versa
- `Impute`: Replace NAs by the median or another value
- `LOCF`: Imputation of data points following the "last observation carried forward" rule
- `CombN`: Return the number of subsets out of a list of elements
- `CombSet`: Generate all possible subsets out of a list of elements
- `CombPairs`: Generate all pairs out of one or two sets of elements
- `SampleTwins`: Create a sample using stratifying groups
- `RndPairs`: Create pairs of correlated random numbers
- `RndWord`: Produce random combinations of characters
- `IsNumeric`: Check a vector for being numeric, zero or a whole number
- `IsWhole`: Is x a whole number?
- `IsDichotomous`: Check if x contains exactly 2 values
- `IsOdd`: Is x even or odd?
- `IsPrime`: Is x a prime number?
- `IsZero`: Is x numerically zero, i.e. x < machine epsilon?
- `IsEuclid`: Check if a distance matrix is Euclidean
- `Label`, `Unit`: Get or set the `label`, resp. `unit`, attribute of an object
- `Abind`: Bind matrices to n-dimensional arrays
- `Append`: Append elements to several classes of objects
- `VecRot`, `VecShift`: Shift the elements of a vector in a circular mode to the right or to the left by n positions
- `Clockwise`: Transform angles from counterclockwise into clockwise mode
- `split.formula`: A formula interface for the base function `split`
- `reorder.factor`: Reorder the levels of a factor
- `Lookup`: Simple lookup if `merge` seems cumbersome
- `ToLong`, `ToWide`: Simple reshaping of a vector
- `SetNames`: Set the names, rownames or colnames of an object and return it
- `Some`: Return some randomly chosen elements of an object
- `SplitAt`: Split a vector into several pieces at given positions
- `SplitPath`: Split a path string into drive, path and filename
- `Str`: Compactly display the structure of any R object
- `TextToTable`: Convert a string to a table

String functions:

- `StrCountW`: Count the words in a string
- `StrTrim`: Delete white space from a string
- `StrTrunc`: Truncate a string to a given length and add ellipses if it really was truncated
- `StrLeft`, `StrRight`: Return the left/right part of a string
- `StrAlign`: Align strings to the left/right/center or to a given character
- `StrAbbr`: Abbreviate a string
- `StrCap`: Capitalize the first letter of a string
- `StrPad`: Fill a string with defined characters to fit a given length
- `StrRev`: Reverse a string
- `StrChop`: Split a string by a fixed number of characters
- `StrExtract`: Extract a part of a string, defined as a regular expression
- `StrVal`: Extract numeric values from a string
- `StrIsNumeric`: Check whether a string contains only numeric data
- `StrPos`: Find the position of the first occurrence of a string in another one
- `StrDist`: Compute the Levenshtein or Hamming distance between strings
- `FixToTable`: Create a table out of running text, using columns of spaces as delimiters

Conversion functions:

- `AscToChar`, `CharToAsc`: Convert ASCII codes to characters and vice versa
- `DecToBin`, `BinToDec`: Convert numbers from binmode to decimal and vice versa
- `DecToHex`, `HexToDec`: Convert numbers from hexmode to decimal and vice versa
- `DecToOct`, `OctToDec`: Convert numbers from octmode to decimal and vice versa
- `DegToRad`, `RadToDeg`: Convert degrees to radians and vice versa
- `CartToPol`, `PolToCart`: Transform Cartesian to polar coordinates and vice versa
- `CartToSph`, `SphToCart`: Transform Cartesian to spherical coordinates and vice versa
- `RomanToInt`: Convert Roman numerals to integers
- `RgbToLong`, `LongToRgb`: Convert an RGB color to a long number and vice versa
- `ColToGray`, `ColToGrey`: Convert colors to grey/grayscale

Colors:

- `SetAlpha`: Add transparency (alpha channel) to a color
- `ColorDlg`: Display the system's color dialog to choose a color
- `ColPicker`: Display R colors in a dialog including a locator for selection
- `ColorLegend`: Add a color legend to a plot
- `ColToGray`, `ColToGrey`: Convert colors to grey/grayscale
- `ColToHex`, `HexToCol`: Convert a color into a hex string
- `HexToRgb`: Convert a hex number to an RGB color
- `ColToHsv`: R color to HSV conversion
- `ColToRgb`, `RgbToCol`: Color to RGB conversion and back
- `FindColor`: Get a color on a defined color range
- `MixColor`: Get the mix of two colors
- `TextContrastColor`: Choose a text color depending on the background color
- `Pal`: Some custom color palettes

Plots:

- `Canvas`: Canvas for geometric plotting
- `Mar`: Set margins more comfortably
- `Asp`: Return the aspect ratio of the current plot
- `LineToUser`: Convert line coordinates to user coordinates
- `lines.loess`: Add a loess smoother and its CIs to an existing plot
- `lines.lm`: Add the prediction of a linear model and its CIs to a plot
- `lines.smooth.spline`: Add the prediction of a smooth.spline and its CIs to a plot
- `BubbleLegend`: Add a legend for bubbles to a bubble plot
- `TitleRect`: Add a main title to a plot surrounded by a rectangular box
- `BarText`: Add the value labels to a barplot
- `ErrBars`: Add horizontal or vertical error bars to an existing plot
- `DrawArc`, `DrawRegPolygon`: Draw elliptic or circular arc(s) or regular polygon(s)
- `DrawCircle`, `DrawEllipse`: Draw a circle, a circle annulus, a sector or an annulus
- `DrawBezier`: Draw a Bezier curve
- `DrawBand`: Draw a confidence band
- `BoxedText`: Add text surrounded by a box to a plot
- `Rotate`: Rotate a geometric structure
- `SpreadOut`: Spread out a vector of numbers so that there is a minimum interval between any two elements. This can be used to place text labels in a plot so that they do not overlap.
- `IdentifyA`: Help identify all the points in a specific area
- `identify.formula`: Formula interface for `identify`
- `PtInPoly`: Identify all the points within a polygon
- `ConnLines`: Calculate and insert connecting lines in a barplot
- `AxisBreak`: Place a break mark on an axis
- `PlotACF`, `PlotGACF`: Create a combined plot of a time series including its autocorrelation and partial autocorrelation
- `PlotMonth`: Plot seasonal effects of a univariate time series
- `PlotArea`: Create an area plot
- `PlotBag`: Create a two-dimensional boxplot
- `PlotBagPairs`: Produce pairwise two-dimensional boxplots (bagplots)
- `PlotBubble`: Draw a bubble plot
- `PlotCandlestick`: Plot a candlestick chart
- `PlotCirc`: Create a circular plot
- `PlotCorr`: Plot a correlation matrix
- `PlotDot`: Plot a dotchart with confidence intervals
- `PlotFaces`: Produce a plot of Chernoff faces
- `PlotFdist`: Frequency distribution plot, a combination of histogram, boxplot and ecdf.plot
- `PlotMarDens`: Scatterplot with marginal densities
- `PlotMultiDens`: Plot multiple density curves
- `PlotPolar`: Plot values on a circular grid
- `PlotFun`: Plot a mathematical expression or a function
- `PolarGrid`: Plot a grid in polar coordinates
- `PlotPyramid`: Pyramid plot (back-to-back histogram)
- `PlotTreemap`: Plot a treemap
- `PlotVenn`: Plot a Venn diagram
- `PlotViolin`: Plot violins instead of boxplots
- `PlotQQ`: QQ-plot for any specified distribution
- `PlotWeb`: Create a web plot
- `PlotTernary`: Create a triangle or ternary plot
- `PlotMiss`: Plot missing values
- `PlotDev`: Simple convenience wrapper for producing TIF files
- `PlotECDF`: Plot the empirical cumulative distribution function
- `PlotLinesA`: Plot the columns of one matrix against the columns of another
- `PlotLog`: Create a plot with logarithmic axis and log grid
- `PlotMosaic`: Plot a mosaic describing a contingency table in array form
- `Shade`: Produce a shaded curve
- `Stamp`: Stamp the current plot with date/time/directory or any other expression

Distributions:

- `_Benf`: Benford distribution, including `qBenf`, `dBenf`, `rBenf`
- `_ExtrVal`: Extreme value distribution (`dExtrVal`)
- `_Frechet`: Frechet distribution (`dFrechet`)
- `_GenExtrVal`: Generalized extreme value distribution (`dGenExtrVal`)
- `_GenPareto`: Generalized Pareto distribution (`dGenPareto`)
- `_Gompertz`: Gompertz distribution (`dGompertz`)
- `_Gumbel`: Gumbel distribution (`dGumbel`)
- `_NegWeibull`: Negative Weibull distribution (`dNegWeibull`)
- `_Order`: Distributions of order statistics (`dOrder`)
- `_RevGumbel`: Reverse Gumbel distribution (`dRevGumbel`)
- `_RevGumbelExp`: Exponential reverse Gumbel distribution (quantile only)
- `_RevWeibull`: Reverse Weibull distribution (`dRevWeibull`)

Statistics:

- `Freq`: Univariate frequency table
- `PercTable`: Bivariate percentage table
- `Margins`: (Extended) margin tables of a table
- `ExpFreq`: Expected frequencies of an n-dimensional table
- `Mode`: Mode, the most frequent value
- `Gmean`, `Gsd`: Geometric mean and geometric standard deviation
- `Hmean`: Harmonic mean
- `Median`: Extended median function supporting weights and ordered factors
- `HuberM`, `TukeyBiweight`: Huber M-estimator of location and Tukey's biweight robust mean
- `HodgesLehmann`: The Hodges-Lehmann estimator
- `HoeffD`: Hoeffding's D statistic
- `MeanSE`: Standard error of the mean
- `MeanCI`, `MedianCI`: Confidence interval for the mean and median
- `MeanDiffCI`: Confidence interval for the difference of two means
- `MoveAvg`: Moving average
- `MeanAD`: Mean absolute deviation
- `VarCI`: Confidence interval for the variance
- `CoefVar`: Coefficient of variation and its confidence interval
- `RobScale`: Robust data standardization
- `Range`: (Robust) range
- `BinomCI`, `MultinomCI`: Confidence intervals for binomial and multinomial proportions
- `BinomDiffCI`: Calculate a confidence interval for a risk difference
- `BinomRatioCI`: Calculate a confidence interval for the ratio of binomial proportions
- `PoissonCI`: Confidence interval for a Poisson lambda
- `Skew`, `Kurt`: Skewness and kurtosis
- `YuleQ`, `YuleY`: Yule's Q and Yule's Y
- `TschuprowT`: Tschuprow's T
- `Phi`, `ContCoef`, `CramerV`: Phi, Pearson's contingency coefficient and Cramer's V
- `GoodmanKruskalGamma`: Goodman Kruskal's gamma
- `KendallTauA`: Kendall's tau-a
- `KendallTauB`: Kendall's tau-b
- `StuartTauC`: Stuart's tau-c
- `SomersDelta`: Somers' delta
- `Lambda`: Goodman Kruskal's lambda
- `GoodmanKruskalTau`: Goodman Kruskal's tau
- `UncertCoef`: Uncertainty coefficient
- `Entropy`, `MutInf`: Shannon's entropy, mutual information
- `DivCoef`, `DivCoefMax`: Rao's diversity coefficient ("quadratic entropy")
- `TheilU`: Theil's U1 and U2 coefficients
- `Assocs`: Combines the association measures above
- `OddsRatio`, `RelRisk`: Odds ratio and relative risk
- `ORToRelRisk`: Transform an odds ratio to a relative risk
- `CohenKappa`, `KappaM`: Cohen's kappa, weighted kappa and kappa for more than 2 raters
- `CronbachAlpha`: Cronbach's alpha
- `ICC`: Intraclass correlations
- `KrippAlpha`: Return Krippendorff's alpha coefficient
- `KendallW`: Compute Kendall's coefficient of concordance
- `Lc`: Calculate and plot the Lorenz curve
- `Gini`, `Atkinson`: Gini and Atkinson coefficients
- `Herfindahl`, `Rosenbluth`: Herfindahl and Rosenbluth coefficients
- `GiniSimpson`: Compute the Gini-Simpson coefficient
- `CorCI`: Confidence interval for Pearson's correlation coefficient
- `CorPart`: Find the correlations for a set x of variables with set y removed
- `CorPolychor`: Polychoric correlation coefficient
- `SpearmanRho`: Spearman rank correlation and its confidence interval
- `ConDisPairs`: Return concordant and discordant pairs of two vectors
- `FindCorr`: Determine highly correlated variables
- `CohenD`: Cohen's effect size
- `EtaSq`: Effect size calculations for ANOVAs
- `Contrasts`: Generate pairwise contrasts for use in a post-hoc test
- `Strata`: Stratified sampling with equal/unequal probabilities
- `Outlier`: Outliers following Tukey's boxplot definition
- `LOF`: Local outlier factor
- `BrierScore`: Brier score, assessing the quality of predictions of binary events
- `Cstat`: C statistic, equivalent to the area under the ROC curve
- `CCC`: Lin's concordance correlation coefficient for agreement on a continuous measure
- `MAE`, `MAPE`: Mean absolute error and mean absolute percentage error
- `MSE`, `RMSE`: Mean squared error and root mean squared error
- `NMAE`, `NMSE`: Normalized mean absolute and mean squared error
- `Conf`: Confusion matrix, a cross-tabulation of observed and predicted classes with associated statistics
- `Sens`, `Spec`: Sensitivity and specificity
- `PseudoR2`: Variants of pseudo R squared statistics: McFadden, Aldrich-Nelson, Nagelkerke, CoxSnell, Effron, McKelvey-Zavoina, Tjur
- `Mean`, `SD`, `Var`, `Quantile`, `MAD`, `Cor`: Variants of base statistics that allow weights to be defined: mean, standard deviation, variance, quantile, MAD, correlation
- `VIF`, `StdCoef`: Variance inflation factors and standardized coefficients for linear models

Tests:

- `SignTest`: Sign test to test whether two groups are equally sized
- `ZTest`: Z-test for known population variance
- `JonckheereTerpstraTest`: Jonckheere-Terpstra trend test for medians
- `PageTest`: Page test for ordered alternatives
- `CochranQTest`: Cochran's Q-test to find differences in matched sets of three or more frequencies or proportions
- `SiegelTukeyTest`: Siegel-Tukey test for equality in variability
- `SiegelTukeyRank`: Calculate Siegel-Tukey's ranks (auxiliary function)
- `LeveneTest`: Levene's test for homogeneity of variance
- `MosesTest`: Moses test of extreme reactions
- `RunsTest`: Runs test for detecting non-randomness
- `DurbinWatsonTest`: Durbin-Watson test for autocorrelation
- `BartelsRankTest`: Bartels rank test for randomness
- `JarqueBeraTest`: Jarque-Bera test for normality
- `AndersonDarlingTest`: Anderson-Darling test for normality
- `CramerVonMisesTest`: Cramer-von Mises test for normality
- `LillieTest`: Lilliefors (Kolmogorov-Smirnov) test for normality
- `PearsonTest`: Pearson chi-square test for normality
- `ShapiroFranciaTest`: Shapiro-Francia test for normality
- `MHChisqTest`: Mantel-Haenszel chi-square test
- `StuartMaxwellTest`: Stuart-Maxwell marginal homogeneity test
- `LehmacherTest`: Lehmacher marginal homogeneity test
- `CochranArmitageTest`: Cochran-Armitage test for trend in binomial proportions
- `BreslowDayTest`, `WoolfTest`: Test for homogeneity on 2x2xk tables over strata
- `PostHocTest`: Post-hoc tests by Scheffe, LSD, Tukey for an aov object
- `ScheffeTest`: Multiple comparisons Scheffe test
- `DunnTest`: Dunn's test of multiple comparisons
- `DunnettTest`: Dunnett's test of multiple comparisons
- `HotellingsT2Test`: Hotelling's T2 test for the one and two sample case
- `YuenTTest`: Yuen's robust t-test with trimmed means and winsorized variances
- `BarnardTest`: Barnard's test for 2x2 tables
- `BreuschGodfreyTest`: Breusch-Godfrey test for higher-order serial correlation
- `ConoverTest`: Conover's test of multiple comparisons (following a Kruskal-Wallis test)
- `GTest`: Chi-squared contingency table test and goodness-of-fit test
- `HosmerLemeshowTest`: Hosmer-Lemeshow goodness-of-fit tests
- `NemenyiTest`: Nemenyi's test of multiple comparisons
- `TTestA`: Student's t-test based on sample statistics
- `VarTest`: Chi-square test for one variance and F test for two variances
- `VonNeumannTest`: Von Neumann's successive difference test

Date functions:

- `day.name`, `day.abb`: Defined names of the days
- `AddMonths`, `AddMonthsYM`: Add a number of months to a given date
- `IsDate`: Check whether x is a date object
- `IsWeekend`: Check whether x falls on a weekend
- `IsLeapYear`: Check whether x is a leap year
- `LastDayOfMonth`: Return the last day of the month of the date x
- `DiffDays360`: Calculate the difference of two dates using the 360-days system
- `Date`: Create a date from the numeric representation of year, month, day
- `Day`, `Month`, `Year`: Extract part of a date
- `Hour`, `Minute`, `Second`: Extract part of a time
- `Week`, `Weekday`: Return the ISO week and weekday of a date
- `Quarter`: Quarter of a date
- `Timezone`: Timezone of a POSIXct/POSIXlt date
- `YearDay`, `YearMonth`: The day in the year of a date
- `Now`, `Today`: Get the current date or date-time
- `HmsToSec`, `SecToHms`: Convert h:m:s times to seconds and vice versa
- `Overlap`: Determine if and how extensively two date ranges overlap
- `Zodiac`: The zodiac sign of a date :-)

Finance functions:

- `OPR`: One period returns (simple and log returns)
- `NPV`: Net present value
- `NPVFixBond`: Net present value for fixed bonds
- `IRR`: Internal rate of return
- `YTM`: Return the yield to maturity for a bond

GUI helpers:

- `FileOpenCmd`: Get the path of a data file to be opened
- `ImportFileDlg`: Dialog for importing SPSS, Stata, SAS, Minitab or Systat files
- `SaveAsDlg`: Save a data object by dialog
- `ModelDlg`: Help to compose a model formula in a dialog
- `SelectVarDlg`: Select elements of a set by click
- `PasswordDlg`: Display a dialog containing an edit field, showing only `***`
- `PlotPar`: Display the R plot parameters in a dialog
- `PlotPch`: Plot point characters for information
- `Xplore`: A breeze of interactive plotting

Reporting, InOut:

- `CatTable`: Print a table with the option to have controlled line breaks
- `Format`, `Fmt`: Easy format for numbers and dates
- `Desc`: Produce a rich description of an object
- `Abstract`: Display a compact overview of the structure of a data frame
- `TMod`: Create a comparison table for (general) linear models
- `TOne`: Create "Table One" describing baseline characteristics
- `GetNewWrd`, `GetNewXL`, `GetNewPP`: Create a new Word, Excel or PowerPoint instance
- `GetCurrWrd`, `GetCurrXL`, `GetCurrPP`: Get a handle to a running Word, Excel or PowerPoint instance
- `WrdKill`, `XLKill`: End a (possibly hidden) Word/Excel process
- `IsValidWrd`: Check if the handle to a Word instance is valid or outdated
- `WrdCaption`: Insert a title in Word
- `WrdFont`: Get and set the font for the current selection in Word
- `WrdParagraphFormat`: Get and set the paragraph format
- `WrdTable`: Create a table in Word
- `WrdCellRange`: Select a cell range of a table in Word
- `WrdMergeCells`: Merge cells of a table in Word
- `WrdFormatCells`: Format selected cells of a table in Word
- `WrdTableBorders`: Set or edit the border style of a table in Word
- `ToWrd`: More flexible wrapper to send diverse objects to Word
- `WrdPlot`: Insert the active plot into Word
- `WrdInsertBookmark`: Insert a new bookmark in a Word document
- `WrdGoto`: Place the cursor at a specific bookmark or another text position
- `WrdUpdateBookmark`: Update the text of a bookmark's range
- `WrdStyle`: Get and set the style of a paragraph in Word
- `XLDateToPOSIXct`: Convert the Excel date format to POSIXct format
- `XLGetRange`: Get the values of one or several cell range(s) in Excel
- `XLGetWorkbook`: Get the values of all sheets of an Excel workbook
- `XLView`: Use Excel as a viewer for a data.frame
- `PpPlot`: Insert the active plot into PowerPoint
- `PpAddSlide`: Add a slide to a PowerPoint presentation
- `PpText`: Add a textbox with text to a PowerPoint presentation
- `ParseSASDatalines`: Parse a SAS "datalines" statement to read data

Tools:

- `PairApply`: Helper for calculating functions pairwise
- `LsFct`, `LsObj`: List the functions (or the data, all objects) of a package
- `FctArgs`: Retrieve the arguments of a function
- `InDots`: Check if an argument is contained in the `...` argument and return its value
- `ParseFormula`: Parse a formula and return its split parts
- `Recycle`: Recycle a list of elements to the maximal found dimension
- `Keywords`: Get the keywords of a man page
- `SysInfo`: Get some more information about the system and environment
- `DescToolsOptions`: Get the DescTools-specific options
- `PDFManual`: Get the PDF manual of any package on CRAN and open it

Data:

- `d.pizza`: Synthetic dataset created for testing the description
- `d.whisky`: Classification of Scotch single malts
- `d.units`, `d.prefix`: Unit conversion
- `d.periodic`: Periodic table of elements
- `d.countries`: ISO 3166-1 country codes
- `roulette`, `cards`, `tarot`: Datasets for probabilistic simulation

## Warning

This package is still under development. Although the code now seems quite stable, you should be aware that until the release of version 1.0 (which is expected in, ... ok, I think we're ready: fall 2018?) everything in the package may be subject to change. Backward compatibility is not yet guaranteed: functions may be deleted or renamed, and new syntax may be inconsistent with earlier versions. With the release of version 1.0, a "deprecated-defunct" process will be put in place.

## MS-Office

To use the MS-Office features, you must have some variant of Office installed. All `Wrd*`, `XL*` and `Pp*` functions also require the package RDCOMClient to be installed. Hence, these functions can only be used on Windows systems. RDCOMClient is available at:
http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.2/ and can be installed with
`install.packages("RDCOMClient", repos="http://www.stats.ox.ac.uk/pub/RWin")`
or, if it is not available there, from
`install.packages("RDCOMClient", repos="http://www.omegahat.net/R")`
RDCOMClient does not exist for Mac or Linux, sorry.
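Because the Office helpers only work on Windows with Office and RDCOMClient present, scripts meant to run on several platforms can guard them. The following is a minimal sketch of such a guard, assuming the `wrd` argument of `ToWrd` and `Desc` accepts the handle returned by `GetNewWrd()`; the variable name `office_ready` is purely illustrative.

```r
# Only attempt Word output where it can work: Windows plus RDCOMClient.
# && short-circuits, so requireNamespace() is not even tried elsewhere.
office_ready <- .Platform$OS.type == "windows" &&
  requireNamespace("RDCOMClient", quietly = TRUE)

if (office_ready) {
  library(DescTools)
  wrd <- GetNewWrd()                                    # start a new Word instance
  ToWrd("Temperature of delivered pizzas", wrd = wrd)   # plain text paragraph
  Desc(d.pizza$temperature, wrd = wrd)                  # send the description to Word
} else {
  message("Skipping Word output: it needs Windows, MS Office and RDCOMClient.")
}
```

On Mac or Linux the script simply skips the Word part instead of failing.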

## Author(s)

Andri Signorell
Helsana Versicherungen AG, Health Sciences, Zurich

Includes R source code and/or documentation previously published by (in alphabetical order):
Ken Aho, Nanina Anderegg, Tomas Aragon, Antti Arppe, Adrian Baddeley, Kamil Barton, Ben Bolker, Frederico Caeiro, Stephane Champely, Daniel Chessel, Leanne Chhay, Clint Cummins, Michael Dewey, Harold C. Doran, Stephane Dray, Charles Dupont, Dirk Eddelbuettel, Jeff Enos, Claus Ekstrom, Martin Elff, Kamil Erguler, Richard W. Farebrother, John Fox, Romain Francois, Michael Friendly, Tal Galili, Matthias Gamer, Joseph L. Gastwirth, Yulia R. Gel, Juergen Gross, Gabor Grothendieck, Frank E. Harrell Jr, Richard Heiberger, Michael Hoehle, Christian W. Hoffmann, Torsten Hothorn, Markus Huerzeler, Wallace W. Hui, Pete Hurd, Rob J. Hyndman, Pablo J. Villacorta Iglesias, Christopher Jackson, Matthias Kohl, Mikko Korpela, Max Kuhn, Detlew Labes, Friederich Leisch, Jim Lemon, Dong Li, Martin Maechler, Arni Magnusson, Daniel Malter, George Marsaglia, John Marsaglia, Alina Matei, David Meyer, Weiwen Miao, Giovanni Millo, Yongyi Min, David Mitchell, Markus Naepflin, Daniel Navarro, Henric Nilsson, Klaus Nordhausen, Derek Ogle, Hong Ooi, Nick Parsons, Sandrine Pavoine, Tony Plate, Roland Rapold, William Revelle, Tyler Rinker, Brian D. Ripley, Caroline Rodriguez, Nathan Russell, Venkatraman E. Seshan, Greg Snow, Michael Smithson, Werner A. Stahel, Alec Stephenson, Mark Stevenson, Terry Therneau, Yves Tille, Adrian Trapletti, Kevin Ushey, Jeremy VanDerWal, Bill Venables, John Verzani, Gregory R. Warnes, Stefan Wellek, Hadley Wickham, Rand R. Wilcox, Peter Wolf, Daniel Wollschlaeger, Thomas Yee, Achim Zeileis

Special thanks go to Beat Bruengger, Mathias Frueh and Daniel Wollschlaeger for their valuable contributions and testing.

The good things come from all these guys; any problems are likely due to my tweaking. Thank you all!

Maintainer: Andri Signorell <[email protected]>

## Examples

```r
# ******************************************************
# There are no examples defined here. But see the demos:
#
# demo(describe)
# demo(plots)
#
# ******************************************************
```
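For a first impression without the demos, a typical descriptive pass on the bundled `d.pizza` data might look like the sketch below. It assumes DescTools is installed and that `d.pizza` contains the columns `driver` (categorical) and `temperature` (numeric, with missing values); the Wilson interval for 3 successes out of 10 trials is an arbitrary illustration.

```r
library(DescTools)

# Frequency table of a categorical variable
# (columns: level, freq, perc, cumfreq, cumperc)
drv <- Freq(d.pizza$driver)
print(drv)

# Mean with its 95% confidence interval; NAs are dropped explicitly
ci <- MeanCI(d.pizza$temperature, na.rm = TRUE)
print(ci)   # named vector: mean, lwr.ci, upr.ci

# Binomial proportion with a Wilson interval: 3 successes in 10 trials
b <- BinomCI(3, 10, method = "wilson")
print(b)    # matrix with columns est, lwr.ci, upr.ci

# Rich description of a single variable, text only
Desc(d.pizza$temperature, plotit = FALSE)
```

`Desc` is the general entry point; the specialised functions are useful when only one statistic is needed, for example inside a report or a loop.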


DescTools documentation built on Aug. 14, 2018, 5:05 p.m.