verisr

This package supports data analysis within the VERIS framework (http://veriscommunity.net). It is intended to work directly with raw JSON and can be used against the VERIS Community Database (VCDB), available at http://veriscommunity.net/doku.php?id=public and https://github.com/vz-risk/VCDB.

This package has two purposes. The first is to convert one or more directories of VERIS (JSON) files into a usable object (currently a data.table, though I hope to move to a dplyr object). The second is to offer a set of convenience functions for basic information retrieval from that object.

Install it straight from GitHub:

savetime <- proc.time()
# install devtools from https://github.com/hadley/devtools
devtools::install_github("jayjacobs/verisr")

To begin, load the package and point it at a directory of JSON files storing VERIS data.

library(verisr)
vcdb.dir <- "../VCDB/data/json/"
# may optionally load a custom json schema file.
if (interactive()) { # show progress bar if the session is interactive
  vcdb <- json2veris(vcdb.dir, progressbar=TRUE)
} else {
  vcdb <- json2veris(vcdb.dir)  
}

You can also use a vector of directory names to load files from multiple sources:

library(verisr)
data_dirs <- c("../VCDB/data/json", "private_data")
veris <- json2veris(data_dirs)
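Before pointing json2veris() at several directories, it can help to confirm that each one exists and actually holds JSON files. The following is a minimal base-R sketch; count_json() is a hypothetical helper written for this example and is not part of verisr, and the demo directory is a throwaway temp folder standing in for real VERIS data.

```r
# Hypothetical helper (not part of verisr): count the .json files in each
# directory, returning NA for directories that do not exist.
count_json <- function(dirs) {
  vapply(dirs, function(d) {
    if (!dir.exists(d)) return(NA_integer_)
    length(list.files(d, pattern = "\\.json$", ignore.case = TRUE))
  }, integer(1))
}

# Demo on a temporary directory with two placeholder JSON files.
demo_dir <- tempfile("veris_demo_")
dir.create(demo_dir)
writeLines("{}", file.path(demo_dir, "incident1.json"))
writeLines("{}", file.path(demo_dir, "incident2.json"))

counts <- count_json(c(demo_dir, file.path(demo_dir, "missing")))
print(unname(counts))
```

A directory that reports NA or 0 is worth fixing before loading, since it would contribute nothing to the combined object.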

json2veris() returns a plain data.table object, which lets you (the developer) work directly with the data.

class(vcdb)
dim(vcdb)

There are several convenience functions to get a feel for what's in the current verisr object.

summary(vcdb)
library(ggplot2)
plot(vcdb)

Let's look at a specific variable by aggregating the data on a VERIS enumeration, in this case the variety of external actor.

ext.variety <- getenum(vcdb, "actor.external.variety")
print(ext.variety)

You can see this returns the enumeration (enum), the count of that enumeration (x), the sample size (n) of the enumeration class (external actor in this case) and the frequency (freq = x/n). From that, you could create a barplot with ggplot:

gg <- ggplot(ext.variety, aes(x=enum, y=x))
gg <- gg + geom_bar(stat="identity", fill="steelblue")
gg <- gg + coord_flip() + theme_bw()
print(gg)

Or use a built-in function to do the same thing (but a little prettier).

print(simplebar(ext.variety, "Variety of External Actors"))
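To make the enum, x, n and freq columns concrete, here is a rough base-R sketch that rebuilds them by hand on a toy table. The data is made up, a data.frame stands in for the data.table, and the sketch assumes that n counts records with at least one value set in the enumeration class; the real getenum() may treat some cases (such as unknowns) differently.

```r
# Toy stand-in for a verisr object: one logical column per enumeration value.
# Column names mimic the pattern used in this README; the values are invented.
toy <- data.frame(
  "actor.external.variety.Activist"        = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  "actor.external.variety.Organized crime" = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  check.names = FALSE
)

prefix <- "actor.external.variety."
cols <- grep(prefix, names(toy), fixed = TRUE, value = TRUE)

# x: how many records have each enumeration value set
x <- vapply(toy[cols], sum, integer(1))

# n: how many records have any value set in this class (assumed definition)
n <- sum(Reduce(`|`, toy[cols]))

result <- data.frame(
  enum = sub(prefix, "", cols, fixed = TRUE),
  x    = unname(x),
  n    = n,
  freq = unname(x) / n
)
print(result)
```

Because one record can carry several values, the freq column does not have to sum to 1.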

Filters have changed

Filters are handled differently now. The old getfilter() function has been removed; it simply returned a logical vector, the same length as the verisr object, indicating which records to use. Since you have the data (the verisr object is just a data.table) and all the enumerations are stored as logical values, creating a filter is straightforward. For example, to filter on all incidents with confirmed data loss, and then further filter on a hacking vector of web application:

# see the docs on data.table for getting columns like this
ddfilter <- vcdb[["attribute.confidentiality.data_disclosure.Yes"]]
webfilter <- vcdb[["action.hacking.vector.Web application"]]
# now we can combine with | or & ("or" and "and" respectively)
# to filter incidents with confirmed data loss and web vector:
ddweb <- ddfilter & webfilter

Since these are just logical vectors, we can use sum() to count the matching records.

cat("Confirmed data loss events:", sum(ddfilter), "\n")
cat("Hacking vector of web apps:", sum(webfilter), "\n")
cat("Both data loss and web app:", sum(ddweb), "\n")
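A logical filter is also what you would use to subset the records themselves. The sketch below shows the idea on a made-up toy table, with a base data.frame standing in for the data.table; the column names copy the ones used above, and the values are invented for illustration.

```r
# Toy stand-in for a verisr object with two logical enumeration columns.
toy <- data.frame(
  "attribute.confidentiality.data_disclosure.Yes" = c(TRUE, TRUE, FALSE, TRUE),
  "action.hacking.vector.Web application"         = c(TRUE, FALSE, TRUE, TRUE),
  check.names = FALSE
)

ddfilter  <- toy[["attribute.confidentiality.data_disclosure.Yes"]]
webfilter <- toy[["action.hacking.vector.Web application"]]

# Combine with & (and), | (or) and ! (not).
ddweb     <- ddfilter & webfilter    # data loss via web application
dd_no_web <- ddfilter & !webfilter   # data loss through some other vector

# Keep only the matching rows.
matched <- toy[ddweb, , drop = FALSE]
cat("Rows kept:", nrow(matched), "\n")
cat("Data loss without web vector:", sum(dd_no_web), "\n")
```

The subset keeps the same columns as the original, so with a real verisr object it should be possible to pass the filtered result to the same convenience functions used on the full data.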

Special names added to verisr object

Most of the names to query are obvious from the schema; "actor.external.motive", for example, is relatively intuitive. But when the verisr object is created, several more fields are derived from the data to make queries easier. Those are:

If you come across any more that you'd like added, please reach out.

Querying Multiple Enumerations

One rather fun feature of the latest version is the ability to query for an enumeration as it relates to one or more other enumerations. For example, if you wanted to create an A2 grid, which compares the action categories to the asset categories, it's a single query:

a2 <- getenum(vcdb, c("action", "asset.variety"))
head(a2)
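Conceptually, each row of such a result counts the records where both enumeration values are set. The following base-R sketch reproduces that idea on a made-up toy table; the column names and values are assumptions for illustration, and which of enum and enum1 holds the action versus the asset in the real getenum() output is not confirmed here.

```r
# Toy stand-in: logical columns for two action and two asset categories.
toy <- data.frame(
  "action.Hacking"         = c(TRUE, TRUE, FALSE, FALSE),
  "action.Malware"         = c(FALSE, TRUE, TRUE, FALSE),
  "asset.variety.Server"   = c(TRUE, TRUE, FALSE, FALSE),
  "asset.variety.User Dev" = c(FALSE, FALSE, TRUE, TRUE),
  check.names = FALSE
)

actions <- c("action.Hacking", "action.Malware")
assets  <- c("asset.variety.Server", "asset.variety.User Dev")

# Every action/asset pairing, one row each.
grid <- expand.grid(enum = actions, enum1 = assets, stringsAsFactors = FALSE)

# x: number of records where both columns are TRUE.
grid$x <- mapply(function(a, b) sum(toy[[a]] & toy[[b]]),
                 grid$enum, grid$enum1, USE.NAMES = FALSE)
print(grid)
```

Pairings that never occur together get x = 0, which is why it can be useful to separate zero and non-zero cells when plotting.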

In previous versions there were separate getenum and getenumby functions for one enumeration or multiple enumerations, respectively. However, as of version 2.1, getenumby is an alias for getenum and both calls have the same functionality.

And we can now visualize that with ggplot as a grid:

slim.aa <- a2[which(a2$x>0), ]
gg <- ggplot(a2, aes(x = enum, y = enum1, fill = x, label = x))
gg <- gg + geom_tile(fill="white", color="gray80")
gg <- gg + geom_tile(data=slim.aa, color="gray80") + geom_text(data=slim.aa)
gg <- gg + scale_fill_gradient(low = "#D8EEFE", high = "#4682B4")
gg <- gg + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
gg <- gg + ylab("") + xlab("")
gg <- gg + ggtitle("A2 Grid: Assets and Actions")
gg <- gg + theme_bw()
gg <- gg + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5),
                 axis.text = element_text(size=14),
                 axis.ticks = element_blank(),
                 legend.position = "none",
                 plot.background = element_blank(),
                 panel.grid.major = element_blank(),
                 panel.grid.minor = element_blank(),
                 panel.border = element_rect(colour = "gray30"),
                 panel.background = element_blank())
print(gg)
print(proc.time() - savetime)


jayjacobs/verisr documentation built on May 18, 2019, 5:57 p.m.