
aprof (0.4.1) Release notes.


Amdahl's profiler, directed optimization. Assists the evaluation of whether and where to focus code optimization, using Amdahl's law and visual aids based on line profiling. Amdahl's profiler organises profiling output files, including memory profiling, in a visually appealing way. It is meant to help to balance development vs. execution time by helping to identify the most promising sections of code to optimize and projecting potential gains. The package is an addition to R's standard profiling tools and is not a wrapper for them.

Quick start and tutorials

A "10 minute" quickstart guide is supplied on the aprof github-pages site. Detailed tutorials on aprof usage are supplied in the PLOS Computational Biology paper on efficient programming in R; Text S1 of that paper supplies a step-by-step guide for R users of any level.

Installation

There is a release on CRAN. To install a more recent development version from github, download the most recent version as a zip archive or tarball, decompress it, and run R CMD INSTALL on its contents. Alternatively, use the devtools package to install the current development version from within R.

## devtools is required
require(devtools)
install_github("MarcoDVisser/aprof")

Dependencies

aprof is meant to be light and has no dependencies other than the base R installation.

Examples

The code below defines a simple function "foo" that illustrates the inefficiency of growing data. We then run R's profiler to time the execution of the function and its various components. aprof is then used to analyse the profiling data (see below under "Examples of output").

     require(aprof)

     # Create a function to profile
     foo <- function(N){
         preallocate <- numeric(N)
         grow <- NULL
         for(i in 1:N){
             preallocate[i] <- N/(i+1)
             grow <- c(grow, N/(i+1))
         }
     }

     # Save the function to a source file and reload it
     dump("foo", file="foo.R")
     source("foo.R")

     # Create a file to save the profiler output
     tmp <- tempfile()

     # Profile the function
     Rprof(tmp, line.profiling=TRUE)
     foo(5e4)
     Rprof(NULL)  # stop profiling

     # Create an aprof object
     fooaprof <- aprof("foo.R", tmp)
     plot(fooaprof)

Examples of output

The standard aprof plot. It shows the execution density for each line in a source code file.

    # From above example:
    # Create an aprof object
    fooaprof<-aprof("foo.R",tmp)
    profileplot(fooaprof)

The profile plot can be used for large source files. It uses the profiler samples to attempt to reconstruct the progression through the program lines. The left panel shows the progression through time, with the largest bottleneck indicated in red. The right panel gives the density of the line calls.

Printing any aprof object will return basic information:

fooaprof
Source file:
foo.R (9 lines).

 Call Density and Execution time per line number:

      Line  Call Density  Time Density (s)
[1,]  7     282           5.64            
[2,]  6     4             0.08            

 Totals:
 Calls       287 
 Time (s)    5.76   (interval =      0.02 (s))
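
The line numbers in this output refer to lines of the source file, so it can help to look up the corresponding code. The sketch below does this with base R's readLines. The helper "source_lines" is hypothetical and not part of aprof, and the stand-in file assumes that dump() writes "foo <-" on its own first line, which would give the 9-line layout reported above.

```r
# Hypothetical helper (not part of aprof): return selected lines of a source file
source_lines <- function(file, lines) {
  src <- readLines(file)
  stopifnot(all(lines >= 1), all(lines <= length(src)))
  data.frame(line = lines, code = trimws(src[lines]), stringsAsFactors = FALSE)
}

# Stand-in for foo.R, written to a temporary file so the sketch is self-contained.
# Assumption: this mirrors the 9-line file that dump("foo", file="foo.R") produces.
demo_file <- tempfile(fileext = ".R")
writeLines(c(
  "foo <-",
  "function(N){",
  "    preallocate<-numeric(N)",
  "    grow<-NULL",
  "    for(i in 1:N){",
  "        preallocate[i]<-N/(i+1)",
  "        grow<-c(grow,N/(i+1))",
  "    }",
  "}"
), demo_file)

# Look up the two lines reported in the print output
hot <- source_lines(demo_file, c(7, 6))
print(hot)
```

With the real file, the same call would be source_lines("foo.R", c(7, 6)).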

Using "summary" gives projections of potential code optimization gains. A table is returned with the theoretical maximal improvement in execution time for the entire profiled program when a given line of code is sped-up. See ?summary.aprof for more details.

summary(fooaprof)

Largest attainable speed-up factor for the entire program

        when 1 line is sped-up with factor (S): 

     Speed up factor (S) of a line 
            1      2      4      8      16     S -> Inf**
Line*: 7 :   1.00   1.96   3.76   6.98  12.19  48.00     
Line*: 6 :   1.00   1.01   1.01   1.01   1.01   1.01     

Lowest attainable execution time for the entire program when

             lines are sped-up with factor (S):

     Speed up factor (S) of a line  
            1      2      4      8      16   
All lines   5.760  2.880  1.440  0.720  0.360
Line*: 7 :  5.760  2.940  1.530  0.825  0.472
Line*: 6 :  5.760  5.720  5.700  5.690  5.685

    Total sampling time:  5.76  seconds
 *  Expected improvement at current scaling
 ** Asymtotic max. improvement at current scaling
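
The numbers in these tables are consistent with the standard form of Amdahl's law, in which a fraction p of the run time is made S times faster. The sketch below recomputes the row for line 7 from the totals printed earlier. It is a hand calculation for illustration, not aprof's internal code, and the name "amdahl_speedup" is made up for this example.

```r
# Amdahl's law: overall speed-up when a fraction p of the run time
# is made S times faster.
amdahl_speedup <- function(p, S) 1 / ((1 - p) + p / S)

total_time <- 5.76   # total sampling time (s), from the output above
line_time  <- 5.64   # time attributed to line 7 (s)
p <- line_time / total_time

S <- c(1, 2, 4, 8, 16, Inf)
speedup <- amdahl_speedup(p, S)

round(speedup, 2)
# 1.00 1.96 3.76 6.98 12.19 48.00  (the "Line*: 7" row of the first table)

# Projected run time of the whole program (s) for each S
total_time / speedup
```

The limit for S -> Inf is 1 / (1 - p): even if line 7 took no time at all, the remaining 0.12 seconds would still have to run, which caps the overall speed-up at 48.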

One useful feature is the "targetedSummary" function, which gives a detailed summary of the time taken by each function called in a given line. In the example, a call to "c" (the "combine" function) in line 7 takes the most time. When the option "findParent" is set to "TRUE", aprof will attempt to detect any parent functions (the functions from within which a call is made) and report the parent and child functions. In this case the function c is only nested within the code in L7 (line 7), and has no further parent calls.

targetedSummary(fooaprof,target=7,findParent=TRUE)
Function Parent Calls Time
c          L7   168 3.36
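
Since the call to c on line 7 dominates, the growing vector is the natural target for optimization. The sketch below splits the two strategies from foo into separate functions and times them with base R's system.time. The function names are made up for this example, N is reduced to keep the run short, and timings will differ between machines.

```r
# Variant 1: grow the result with c(), as on line 7 of foo.R
grow_version <- function(N) {
  grow <- NULL
  for (i in seq_len(N)) {
    grow <- c(grow, N / (i + 1))  # allocates a new, longer vector each iteration
  }
  grow
}

# Variant 2: allocate the full vector once, then fill it in place
prealloc_version <- function(N) {
  out <- numeric(N)
  for (i in seq_len(N)) {
    out[i] <- N / (i + 1)
  }
  out
}

N <- 2e4
t_grow <- system.time(res_grow <- grow_version(N))["elapsed"]
t_pre  <- system.time(res_pre  <- prealloc_version(N))["elapsed"]

# Both variants must return the same values
stopifnot(identical(res_grow, res_pre))

# Elapsed time in seconds for each variant
c(grow = unname(t_grow), preallocate = unname(t_pre))
```

The identical() check confirms that the rewrite changes only the run time, not the result.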

Memory statistics

Using the previous function foo (made above), we can set "memory.profiling=TRUE" and obtain some basic memory profiling statistics. Statistics are summarized in megabytes and include all operations (allocations and releases).

     # Profile the function, now with memory profiling
     Rprof(tmp, line.profiling=TRUE, memory.profiling=TRUE)
     foo(5e4)
     Rprof(NULL)  # stop profiling

     # Create an aprof object
     fooaprof <- aprof("foo.R", tmp)
     plot(fooaprof)

Printing an aprof object will now include basic memory usage:

fooaprof
Source file:
foo.R (9 lines).

 Call Density and Execution time per line number:

      Line  Call Density  Time Density (s)
[1,]  7     282           5.64            
[2,]  6     4             0.08            

 Totals:
 Calls       287 
 Time (s)    5.76   (interval =      0.02 (s))

 Memory statistics time per line number:

      Line   MB   
[1,]  6      0.259
[2,]  7      2.951

 Total MBs (allocated and released).

To do list

- Fix memprofiler bug
- Make a pretty line extraction tool as suggested by "readLines("foursail.R")[38]"

Thanks

Special thanks to Tyler Rinker, Dason Kurkiewicz, Caspar Hallmann, Angel Rubio, Mark Miller and Diego Mayer-Cantu for comments, bug reports, commits and additions to this package. Thanks also to Sean M. McMahon, Cory Merow, Philip Dixon, Sydne Record and Eelke Jongejans for all the suggestions, comments and testing while I was developing this package.



MarcoDVisser/aprof documentation built on Jan. 18, 2020, 9:15 p.m.