
dfgraph

Why visualize R code, you ask? Well now, we data scientists don't exactly write beautiful production-quality code. Let's be honest, it's more likely to be bewildering spaghetti code. And if you're like me, dear reader, you've experienced the exquisite torture of deciphering an existing "workflow", perhaps one that you yourself created! Why not profit from my past suffering with a tool to help navigate this quagmire of confounding computerspeak?

Disclaimer

This package is in an experimental, pre-alpha state. There are certain (probably fundamental) limitations to parsing dependencies from an R script. Also, packages that visualize within-script dependencies have been built before (e.g., CodeDepends), and it's not clear that they've caught on. However, I have plans to implement an interactive, exploratory interface, which may make for a more compelling feature set.

Installation

install.packages("remotes")
remotes::install_github("dkary/dfgraph")

Usage

Run dfgraph::graph("path_to_R_or_Rmd_file"), which leverages the DiagrammeR package (with the DOT format) under the hood.

dfgraph::graph(
    "testdat/svy-weight.R",
    # exclude diagnostic checks from the plot
    prune_labels = c("count", "summary", "sapply", "glimpse", "all.equal")
)
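
To give a feel for what "parsing dependencies" and "a DOT format" can mean in practice, here is a minimal sketch in base R. It is not dfgraph's implementation, just an illustration under simple assumptions: only top-level `name <- value` assignments count as nodes, and a node depends on any earlier-assigned name that appears on its right-hand side. The script contents and object names are made up.

```r
# Hypothetical script; file and object names are invented for illustration.
script <- '
svy <- read.csv("svy.csv")
pop <- read.csv("pop.csv")
svy_clean <- subset(svy, !is.na(age))
wts <- merge(svy_clean, pop, by = "stratum")
'

exprs <- parse(text = script, keep.source = FALSE)

# Keep only top-level assignments of the form `name <- value`
is_assign <- vapply(
    exprs,
    function(e) is.call(e) && identical(e[[1]], as.name("<-")) && is.name(e[[2]]),
    logical(1)
)
exprs <- exprs[is_assign]
targets <- vapply(exprs, function(e) as.character(e[[2]]), character(1))

# An edge runs from every earlier-assigned name used on the right-hand side
edges <- do.call(rbind, lapply(seq_along(exprs), function(i) {
    deps <- intersect(all.vars(exprs[[i]][[3]]), targets[seq_len(i - 1)])
    if (length(deps) == 0) return(NULL)
    data.frame(from = deps, to = targets[i], stringsAsFactors = FALSE)
}))

# Write the edges out as a DOT digraph
dot <- paste0(
    "digraph {\n",
    paste0("  ", edges$from, " -> ", edges$to, ";", collapse = "\n"),
    "\n}"
)
cat(dot, "\n")

# If DiagrammeR is installed, this string can be rendered with:
# DiagrammeR::grViz(dot)
```

Note that `age` and `stratum` never show up as nodes, because they are never assigned at the top level. That is the kind of simplification a static approach has to make.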

Customize

Prune Interim Nodes

Some nodes have only one dependency (referred to as "mutates"), and we can collapse these into their parent nodes with prune_types = "mutate":

dfgraph::graph(
    "testdat/svy-weight.R", prune_types = c("function", "mutate"),
    prune_labels = c("count", "summary", "sapply", "glimpse", "all.equal")
)
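
The collapsing idea can be sketched on a plain edge list. This is my guess at the general technique, not the package's actual code: it assumes a "mutate" is any node with exactly one incoming edge, that the graph has no cycles, and that edges are not duplicated. The node names are hypothetical.

```r
# Hypothetical dependency graph: a -> b -> c -> e, plus d -> c
edges <- data.frame(
    from = c("a", "b", "d", "c"),
    to   = c("b", "c", "c", "e"),
    stringsAsFactors = FALSE
)

# A node with exactly one incoming edge has a single dependency
in_degree <- table(edges$to)
single <- names(in_degree)[as.vector(in_degree) == 1]

# Map each single-dependency node to its one parent
parent <- setNames(edges$from[match(single, edges$to)], single)

# Follow parent links until reaching a node that is kept
resolve <- function(node) {
    while (node %in% names(parent)) node <- parent[[node]]
    node
}

# Rewire every edge to the surviving nodes, then drop self-loops and repeats
pruned <- data.frame(
    from = vapply(edges$from, resolve, character(1), USE.NAMES = FALSE),
    to   = vapply(edges$to, resolve, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
)
pruned <- unique(pruned[pruned$from != pruned$to, ])
print(pruned)
```

Here `b` folds into `a` and `e` folds into `c`, so only the edges `a -> c` and `d -> c` remain.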

Focus on one Node

Focus on the network of a specified node using focus_node (you can reveal a node's number interactively by hovering over it):

dfgraph::graph(
    "testdat/svy-weight.R", prune_types = c("function", "mutate"), focus_node = 20
)
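
One plausible reading of "the network of a node" is everything upstream and downstream of it. The sketch below works on that assumption and is not taken from the package. It uses hypothetical node names rather than the numeric IDs that focus_node expects, and `reach()` is a helper invented for this example.

```r
# Hypothetical graph with an unrelated branch (x -> y)
edges <- data.frame(
    from = c("a", "b", "d", "c", "x"),
    to   = c("b", "c", "c", "e", "y"),
    stringsAsFactors = FALSE
)

# Walk the edge list in one direction, collecting every reachable node
reach <- function(start, from, to) {
    seen <- character(0)
    frontier <- start
    while (length(frontier) > 0) {
        nxt <- unique(to[from %in% frontier])
        frontier <- setdiff(nxt, seen)   # visit each node only once
        seen <- c(seen, frontier)
    }
    seen
}

focus <- "c"
downstream <- reach(focus, edges$from, edges$to)   # nodes that depend on focus
upstream   <- reach(focus, edges$to, edges$from)   # nodes that focus depends on
keep <- unique(c(focus, upstream, downstream))

# Keep only the edges whose endpoints both survive
focused <- edges[edges$from %in% keep & edges$to %in% keep, ]
print(focused)
```

Swapping the `from` and `to` columns is all it takes to walk the graph backwards, which is why a single helper covers both directions.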

Display more Info

We can also display both the assignment and the primary function for each node using label_option = "both":

dfgraph::graph(
    "testdat/svy-weight.R", prune_types = c("function", "mutate"), focus_node = 20, 
    label_option = "both"
)

Limitations

The most obvious limitation is that code is inherently flexible, and I won't be able to capture all the ways people might program. For example:

However, I suspect that I can capture enough of the common data science coding patterns for the package to nonetheless be useful (more details in Proof of Concept).

Dry Feature List (Yawn)



dkary/dfgraph documentation built on Dec. 20, 2021, 12:07 a.m.