getenumCI: Summarizes veris enumerations from verisr objects


View source: R/getenumCI.R

Description

This is the primary analysis function for veris. It conducts binomial hypothesis tests on veris data to enumerate the frequency of a given enumeration or set of enumerations within a feature (for example, 'Malware', 'Hacking', etc., within 'action').

The 'by' parameter allows enumerating one feature by another (for example, to count the frequency of each action by year).
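As a rough illustration of what enumerating one feature by another means, the base R sketch below counts a made-up 'action' column within each value of a made-up 'year' column. The data frame and its column names are assumptions for illustration only; this is not how getenumCI() is implemented, and real veris incidents can carry several actions at once, so their within-year frequencies need not sum to 1.

```r
# Toy incident data; column names and values are invented for illustration.
incidents <- data.frame(
  year   = c(2016, 2016, 2017, 2017, 2017),
  action = c("Hacking", "Malware", "Hacking", "Hacking", "Social"),
  stringsAsFactors = FALSE
)

# Count each action within each year (rows: year, columns: action).
counts_by_year <- table(incidents$year, incidents$action)

# Convert the counts to within-year frequencies.
# In this toy data each incident has exactly one action, so each row sums to 1.
freq_by_year <- prop.table(counts_by_year, margin = 1)

print(counts_by_year)
print(round(freq_by_year, 3))
```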

Usage

getenumCI(veris, enum, by = NULL, na.rm = NULL, unk = FALSE,
  short.names = TRUE, ci.method = c(), ci.level = 0.95,
  round.freq = 5, na = NULL, ...)

Arguments

veris

A verisr object

enum

A veris feature or enumeration to summarize

by

A veris feature or enumeration to group by

na.rm

A boolean indicating whether to include 'not applicable' (NA) in the sample set. This is REQUIRED if enum has a potential value of NA, as there is no 'default' method for handling NAs. Instead, it depends on the hypothesis being tested.

unk

A boolean indicating whether to include 'unknown' in the sample. The default is 'FALSE' and should rarely be overridden.

short.names

A boolean identifying whether to use the full enumeration name or just the last section (e.g. action.hacking.variety.SQLi vs. just SQLi).

ci.method

A confidence interval method to use. Currently supported methods are any from binom.confint() or "multinomial". If unsure which to use, use "wilson".

ci.level

A number from 0 to 1 giving the confidence level of the interval. (default = 0.95)

round.freq

An integer indicating how many places to round the frequency value to. (default = 5)

na

DEPRECATED! Use the 'na.rm' parameter.

...

A catch-all for functions using arguments from previous versions of getenum.

Details

Unknowns are generally excluded as 'not tested'. If 'NA' is an enumeration in the feature being enumerated, the 'na.rm' parameter must be specified, because whether NA should be included is highly dependent on the hypothesis being tested.
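The base R sketch below shows why the NA decision matters: the same count gives a different frequency depending on whether NA rows stay in the denominator. The vector and its values are invented, and the sketch deliberately does not claim which 'na.rm' setting corresponds to which denominator inside getenumCI().

```r
# Toy single-feature column; values are made up for illustration.
variety <- c("Phishing", "Phishing", "Pretexting", "Unknown", NA, NA)

# Drop 'Unknown' as 'not tested', but keep NA rows for now.
known <- variety[is.na(variety) | variety != "Unknown"]

# Count of the enumeration of interest.
x <- sum(known == "Phishing", na.rm = TRUE)

# Two candidate sample sizes: NA rows counted, or NA rows dropped.
n_with_na    <- length(known)
n_without_na <- sum(!is.na(known))

print(c(freq_with_na = x / n_with_na, freq_without_na = x / n_without_na))
```

Which of the two frequencies is appropriate depends on the hypothesis, which is why the function does not pick a default.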

This function accurately enumerates single logical columns, character feature columns, and features spanning multiple logical columns (such as action.*). It cannot enumerate free-form text columns. It accurately calculates the sample size 'n' as the number of rows (independent of the number of enumerations present in the feature).
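To make the sample-size point concrete, the sketch below summarizes a feature spread over several logical columns using base R only. The column names, the toy values, and the output layout (enum, x, n, freq) are assumptions for illustration and are not taken from the getenumCI() source.

```r
# Toy feature spanning three logical columns; one row per incident.
incidents <- data.frame(
  action.hacking = c(TRUE,  TRUE,  FALSE, TRUE),
  action.malware = c(TRUE,  FALSE, TRUE,  FALSE),
  action.social  = c(FALSE, FALSE, TRUE,  FALSE)
)

# x: how many rows have each enumeration set.
x <- colSums(incidents)

# n: the number of rows, not the number of TRUE cells.
n <- nrow(incidents)

# Frequencies rounded to 5 places, mirroring the documented round.freq default.
summary_df <- data.frame(
  enum = names(x),
  x    = as.integer(x),
  n    = n,
  freq = round(as.numeric(x) / n, 5)
)

print(summary_df)
```

Because an incident can have more than one enumeration set, the frequencies in such a summary can add up to more than 1.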

getenumCI() can also provide binomial confidence intervals for the enumerations tested within the features. See the parameters for details.
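For readers unfamiliar with the recommended "wilson" method, here is a minimal base R sketch of a Wilson score interval for a single count x out of n. The helper name wilson_ci and the example numbers are hypothetical; getenumCI() itself is documented as delegating to binom.confint(), so treat this as an illustration of the formula rather than the package's code.

```r
# Hypothetical helper: Wilson score interval for x successes in n trials.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z      <- qnorm(1 - (1 - conf.level) / 2)   # two-sided normal quantile
  p      <- x / n                             # observed proportion
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Example: an enumeration seen in 30 of 100 incidents (made-up numbers).
ci <- wilson_ci(x = 30, n = 100)
print(round(ci, 4))

# Cross-check: prop.test() without continuity correction inverts the same score test.
ref <- as.numeric(prop.test(30, 100, correct = FALSE)$conf.int)
print(round(ref, 4))
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves reasonably for small counts, which is likely why it is suggested as the default choice.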

While getenumCI() may work on other types of dataframes, it was designed for verisr dataframes and data.tables. It is neither tested on nor recommended for any other type.

Value

A data frame summarizing the enumeration

Examples

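# Download the VCDB verisr data to a temporary file and load it;
# the calls below use the `vcdb` object that load() makes available.
tmp <- tempfile(fileext = ".dat")
download.file("https://github.com/vz-risk/VCDB/raw/master/data/verisr/vcdb.dat", tmp, quiet=TRUE)
load(tmp, verbose=TRUE)

# Summarize a single feature.
chunk <- getenumCI(vcdb, "action.hacking.variety")
chunk

# Summarize the same feature grouped by incident year.
chunk <- getenumCI(vcdb, "action.hacking.variety", by="timeline.incident.year")
chunk

# Reshape the grouped summary into a year-by-enumeration matrix.
chunk <- getenumCI(vcdb, 
                   "action.hacking.variety", 
                   by="timeline.incident.year") 
reshape2::acast(chunk, by~enum, fill=0)

# Other features, with and without confidence intervals.
getenumCI(vcdb, "action")
getenumCI(vcdb, "asset.variety")
getenumCI(vcdb, "asset.assets.variety")
getenumCI(vcdb, "asset.assets.variety", ci.method="wilson")
getenumCI(vcdb, "asset.cloud")
getenumCI(vcdb, "action.social.variety.Phishing")

# Note: 'na' is the deprecated argument; see 'na.rm' under Arguments.
getenumCI(vcdb, "actor.*.motive", ci.method="wilson", na=FALSE)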

vz-risk/verisr documentation built on Dec. 11, 2018, 1:33 a.m.