archivedb: CRAN archive (CRAN-archive.html + archivedb)


The following functions deal with the packages archived on CRAN. The html file downloaded from CRAN lists the regular packages that have been updated at least once and the packages that CRAN administrators have removed from CRAN. It does not list the packages that were uploaded to CRAN and never updated, since these exist only in their first version. Both groups, the never-updated packages and the packages removed from the CRAN index, can be identified by comparing archivedb with crandb.

archivedb_down downloads the html file of the archived packages from CRAN, saves it on the disk under the name filename, extracts a data.frame named archivedb from it and loads this data.frame in .GlobalEnv.

archivedb_load reads the html file filename saved on the disk, extracts a data.frame named archivedb from it and loads this data.frame in .GlobalEnv.
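The extraction step can be pictured with a minimal base-R sketch. It is an illustration only, not the code used by archivedb_down or archivedb_load: the html lines below are toy data imitating a directory listing, and the column name Package is an assumption (Archived is the date column used in the examples).

```r
# Toy lines imitating a directory listing (assumed layout, not real CRAN data).
html <- c(
  '<tr><td><a href="A3/">A3/</a></td><td align="right">2015-08-16 21:05  </td></tr>',
  '<tr><td><a href="abc/">abc/</a></td><td align="right">2022-05-19 17:38  </td></tr>',
  '<tr><th colspan="5"><hr></th></tr>'
)

# Capture the directory name and the date that follows it on the same line.
pattern <- '<a href="([^"/]+)/">.*([0-9]{4}-[0-9]{2}-[0-9]{2})'
hits <- regmatches(html, regexec(pattern, html))
hits <- hits[lengths(hits) == 3L]   # drop the lines that did not match

# Assemble a small data.frame (hypothetical column names).
toy_archivedb <- data.frame(
  Package  = vapply(hits, `[`, character(1), 2L),
  Archived = as.Date(vapply(hits, `[`, character(1), 3L)),
  stringsAsFactors = FALSE
)
print(toy_archivedb)
```

Lines that carry no link and no date, such as the rule in the third toy line, are simply skipped.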

archivedb_npkgs returns the number of packages listed in each category: packages in crandb, packages in archivedb, packages at their first version, packages at a subsequent version, and packages removed from crandb (CRAN index).

archivedb_pkgs returns the packages listed in CRAN archive (= archivedb).

archivedb_rempkgs returns the packages removed from CRAN but available in CRAN archive. The result can be combined with p_check to display the last CRAN check performed (if available). See the example.

archivedb_list compares the data.frames archivedb and crandb and returns a list with the following items:

  • pkgs_crandb: the packages listed in crandb.

  • pkgs_archivedb: the packages listed in archivedb.

  • pkgs_first: the packages still at their first version in crandb.

  • pkgs_updated: the packages with more than one version in crandb.

  • pkgs_removed: the archived packages removed from CRAN regular index, i.e. not listed in crandb.

  • dfr_crandb: data.frame pkgs_crandb + Published date.

  • dfr_archivedb: data.frame pkgs_archivedb + Archived date.

  • dfr_first: data.frame pkgs_first + Published date.

  • dfr_updated: data.frame pkgs_updated + Published date.

  • dfr_removed: data.frame pkgs_removed + Archived date.

  • npkgs: the number of packages in each category.
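The categories above amount to set operations on the package names. The following sketch shows one plausible way to obtain them with base R. It is not the implementation of archivedb_list: the two data.frames are toy data, and the column name Package is an assumption.

```r
# Toy stand-ins for crandb and archivedb (hypothetical packages and dates).
toy_crandb <- data.frame(
  Package   = c("alpha", "beta", "gamma"),
  Published = as.Date(c("2023-01-10", "2022-06-01", "2021-03-15")),
  stringsAsFactors = FALSE
)
toy_archivedb <- data.frame(
  Package  = c("beta", "gamma", "delta"),
  Archived = as.Date(c("2022-05-30", "2021-03-01", "2019-11-20")),
  stringsAsFactors = FALSE
)

pkgs_crandb    <- toy_crandb$Package
pkgs_archivedb <- toy_archivedb$Package
pkgs_first     <- setdiff(pkgs_crandb, pkgs_archivedb)    # on CRAN, never archived
pkgs_updated   <- intersect(pkgs_crandb, pkgs_archivedb)  # on CRAN and in the archive
pkgs_removed   <- setdiff(pkgs_archivedb, pkgs_crandb)    # in the archive only

# Data.frame of removed packages with their Archived date.
dfr_removed <- toy_archivedb[toy_archivedb$Package %in% pkgs_removed, ]

# Number of packages in each category.
npkgs <- c(crandb = length(pkgs_crandb), archivedb = length(pkgs_archivedb),
           first = length(pkgs_first), updated = length(pkgs_updated),
           removed = length(pkgs_removed))
print(npkgs)
```

In this toy data, a package present in both tables counts as updated, and a package present only in the archive counts as removed.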

Use p_archive_lst to list the package versions stored in CRAN archive.

Use p_downarch to download packages from CRAN archive, either the latest version or a specific version number.
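For orientation, the address of an archived source file follows a regular pattern on CRAN. The sketch below builds such an address with base R. The helper archive_url, the package name "mypkg" and the version "1.0.2" are hypothetical and serve only as an illustration; this is not how p_downarch is implemented.

```r
# Hypothetical helper: build the address of an archived tar.gz file.
# The base address is assumed to be the CRAN archive directory.
archive_url <- function(pkg, version,
                        base = "https://cran.r-project.org/src/contrib/Archive") {
  sprintf("%s/%s/%s_%s.tar.gz", base, pkg, pkg, version)
}

u <- archive_url("mypkg", "1.0.2")   # placeholder package and version
print(u)

# For a package and version that really exist in the archive, the file
# could then be fetched with:
# utils::download.file(u, destfile = basename(u))
```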


archivedb_down(filename = "CRAN-archive.html", dir = ".",
  url = "")

archivedb_load(filename = "CRAN-archive.html")

archivedb_npkgs(archivedb = get("archivedb", envir = .GlobalEnv),
  crandb = get("crandb", envir = .GlobalEnv))

archivedb_pkgs(archivedb = get("archivedb", envir = .GlobalEnv))

archivedb_rempkgs(archivedb = get("archivedb", envir = .GlobalEnv),
  crandb = get("crandb", envir = .GlobalEnv))

archivedb_list(archivedb = get("archivedb", envir = .GlobalEnv),
  crandb = get("crandb", envir = .GlobalEnv))



filename: character. The path to the file "CRAN-archive.html" (or equivalent).


dir: character. The directory where filename or tar.gz files are saved. The default value "." is the current directory.


url: character. The url address of the CRAN archive html file.


archivedb: data.frame. The archivedb data.frame loaded in memory by archivedb_down or archivedb_load.


crandb: data.frame. The data.frame of CRAN packages.


### DOWNLOAD archivedb AND COMPARE IT WITH crandb.
## In real life, download archivedb and crandb from CRAN
## with the functions archivedb_down() and crandb_down().
## In this example, we load two small files.

crandb_load(system.file("data", "zcrandb.rda", package = "RWsearch"))
archivedb_load(system.file("aabb", "zCRAN-archive.html", package = "RWsearch"))

lst <- archivedb_list()
lapply(lst, head)
lapply(lst, tail)

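xlim <- as.Date(range(lst$dfr_archivedb$Archived)) ; xlim
op <- par(mfrow = c(2,1))
## Assumed columns: Published dates of the updated packages,
## Archived dates of the removed packages.
hist(as.Date(lst$dfr_updated$Published),
     breaks = 12, freq = TRUE, las = 1, xlim = xlim)
hist(as.Date(lst$dfr_removed$Archived),
     breaks = 12, freq = TRUE, las = 1, xlim = xlim)
par(op)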

