While you can install ipumsPMA on any machine (including your home computer), most of its functions require server access. When working remotely, you can set this up by establishing a VPN connection and mounting the ipums drive: this works well in some cases, but larger jobs may run slowly.

Consider using RStudio on the GP1 server. Its advantages are:

Disadvantages include:

Initial set-up: create an .Rprofile

After login, RStudio will open to your home directory (mine is /home/gunth031), under which you'll find pkg/ipums/pma. Before installing ipumsPMA or any other packages, you'll want to create a file called .Rprofile in your home directory: this file will tell R to use a specified path to your package library.

(Note: if you're new to R, a package library holds all of the files for every package you add to R. It's normally fine to use a default location on your computer, but in this case we want to specify a location on the server where you have strong user permissions. Package libraries are version-specific, so you'll see a new folder associated with the major and minor version numbers for each version of R that is installed; when R is updated, you may have to rebuild your library.)

The following command opens a blank .Rprofile document in the RStudio file editor:

usethis::edit_r_profile()

Paste the following into the document but don't save and close just yet:

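if (Sys.info()["nodename"] == "f1ffcada775c") {
  # Build a version-specific library path: ~/R/<platform>-library/<major>.<minor>
  lib_location <- file.path(
    "~/R",
    paste0(R.version$platform, "-library"),
    paste0(R.version$major, ".", strsplit(R.version$minor, "\\.")[[1]][1])
  )
  # Tell R to use that path as the user package library
  Sys.setenv(R_LIBS_USER = lib_location)
  rm(lib_location)
  # Download CRAN packages from the cloud mirror
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}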

In the first line, replace f1ffcada775c with your own node name. To find it, run the following and copy the value shown under "nodename":

Sys.info()["nodename"]
c(nodename = "f1ffcada775c")

Now, save the .Rprofile to your home directory. If you click the "Home" link in the RStudio "Files" tab, you should see a file called .Rprofile. Restart R to ensure that your changes take effect.
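One detail worth checking at this point: R only keeps directories that already exist on its library search path, so a new library folder may need to be created before it shows up in .libPaths(). The sketch below shows one way to do that. ensure_lib_dir is a hypothetical helper name, and the example runs against a temporary directory rather than your real home directory.

```r
# Hypothetical helper: create a library directory if it is missing,
# then report whether it exists.
ensure_lib_dir <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  dir.exists(path)
}

# Demonstrated on a temporary directory so nothing in your home directory is touched
demo_dir <- file.path(tempdir(), "R", "demo-library", "3.6")
created <- ensure_lib_dir(demo_dir)
print(created)
```

On the server, assuming the .Rprofile above has set R_LIBS_USER, the equivalent call would be ensure_lib_dir(Sys.getenv("R_LIBS_USER")), followed by another restart of R.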

Check your library path:

.libPaths()
c("/home/gunth031/R/x86_64-pc-linux-gnu-library/3.6",
  "/usr/local/lib/R/site-library",
  "/usr/local/lib/R/library")

The first entry should look like mine, but with your own user name and your current version of R at the end. If so, you're ready to start installing packages! (If not, contact Derek Burk or the ISRDI r-users channel for help with this setup.)
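If the first entry does not match, it can help to see exactly which path the .Rprofile snippet builds. The sketch below wraps the same expression in a function and evaluates it on fixed inputs, so it behaves the same way on any machine. build_lib_location is a hypothetical helper name and is not part of ipumsPMA.

```r
# Hypothetical helper: the same path expression used in the .Rprofile,
# but taking the platform and version strings as arguments.
build_lib_location <- function(platform, major, minor, root = "~/R") {
  # Keep only the first component of the minor version ("6.3" becomes "6")
  minor_short <- strsplit(minor, "\\.")[[1]][1]
  file.path(root, paste0(platform, "-library"), paste0(major, ".", minor_short))
}

# Values as reported by R 3.6.3 on a 64-bit Linux server
example_path <- build_lib_location("x86_64-pc-linux-gnu", "3", "6.3")
print(example_path)

# The path for the R session you are running right now
current_path <- build_lib_location(
  R.version$platform, R.version$major, R.version$minor
)
print(current_path)
```

Because only the first component of the minor version is kept, patch releases of R (3.6.1, 3.6.3, and so on) share one library folder, while a move to a new minor version points at a new, empty one.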

Installing ipumsPMA

While many R packages can be downloaded from CRAN with the function install.packages(), ipumsPMA cannot! Instead, you have to install it from its home on GitHub using the function devtools::install_github() from the devtools package.

1) Install devtools

Install devtools from CRAN (the easy way!):

install.packages("devtools")

If RStudio asks you to restart R, click "yes".

2) Get a personal access token from GitHub

Generate a personal access token associated with your GitHub account. This should reveal a character string that you can assign to an object called auth in R, for example:

auth <- "123456789abcdefghijklmnopqrstuvwxyz"
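Typing a token directly into a script makes it easy to share by accident. One common alternative, sketched here, is to keep the token in an environment variable and read it with Sys.getenv(). The helper name read_token and the variable name GITHUB_UMN_PAT are illustrative assumptions; neither is defined by ipumsPMA or devtools.

```r
# Hypothetical helper: read a token from an environment variable and
# fail with a clear message if the variable is empty or unset.
read_token <- function(var = "GITHUB_UMN_PAT") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is empty or not set", call. = FALSE)
  }
  token
}

# Demonstration with a placeholder value; a real token would normally be
# stored in a line of your .Renviron file instead of being set in code.
Sys.setenv(GITHUB_UMN_PAT = "123456789abcdefghijklmnopqrstuvwxyz")
auth <- read_token()
print(nzchar(auth))
```

The resulting auth object can be passed to the auth_token argument in the next step in the same way as a string typed by hand.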

3) Use your GitHub access token to install ipumsPMA

Notice that I've set the following function to also install dependencies with dependencies = TRUE. This ensures that, along with ipumsPMA, R will install any packages you need but don't already have. If you've just created a new (empty) package library following the instructions in this guide, you'll be installing a lot of new packages, so be prepared for this to take some time.

devtools::install_github(
  repo = "gunth031/ipumsPMA", 
  host = "github.umn.edu/api/v3",
  auth_token = auth,
  dependencies = TRUE
)
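Once the call returns, a quick check can confirm that the package actually landed in your library. The sketch below uses only base R functions; package_status is a hypothetical helper name. It is run on ipumsPMA and, for comparison, on stats, which ships with every R installation.

```r
# Hypothetical helper: report whether a package is installed and, if so,
# which version was found.
package_status <- function(pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  version <- if (installed) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
  list(package = pkg, installed = installed, version = version)
}

str(package_status("ipumsPMA"))
str(package_status("stats"))
```

If ipumsPMA is reported as not installed, the FAQ below covers the most common causes.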

Troubleshooting package installation: FAQ

I get a message including a numbered list of packages. It asks whether I want to install some or all of them from CRAN or from source.

Find the response option that installs all of them from CRAN, if possible.

I get a message that says: There are binary versions available but the source versions are later. Do you want to install from sources the package which needs compilation?

This usually happens when a package has been updated on CRAN but the new binary isn't yet available for your platform. Respond "no": you won't get the latest and greatest version, but you won't have to do anything special to install from the package source.

I get a fatal error that looks like: Error: (converted from warning) package 'ipumsr' was built under R version 3.6.3 followed by Error: Failed to install 'ipumsPMA' from GitHub

Try this: Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true"), and then attempt to install ipumsPMA again.

I get a pop-up message telling me that a package is already loaded, and that a restart is required (OR) I get an error suggesting that a prior version of a package cannot be removed.

Restart R and try installing that package (e.g. rlang) with install.packages("rlang"). Then, attempt to install ipumsPMA again.

4) Install the ipums-metadata python library

If you've successfully installed ipumsPMA and all of its package dependencies, you should now have a package called reticulate, which lets R call Python functions. We'll use it to install the ipums-metadata Python library as follows:

reticulate::conda_install(
  packages = "ipums-metadata",
  channel = "~/ipums/programming/conda/mpc/"
)

If the installation was successful, you should now be able to load ipumsPMA and access ipums-metadata functions through the object py:

library(ipumsPMA)
tt <- py$TranslationTable("aborev", "pma")
tt$samples()

Troubleshooting ipums-metadata installation: FAQ

When I try to call an ipums-metadata function, I get an error saying the module is not available / does not exist.

It's not uncommon for users to have multiple versions of Python installed. If so, we may need to choose one for reticulate to use when installing / reading packages. To see what versions of Python you have installed, run:

reticulate::py_config()

At the top of the output, the field python: shows the version of Python that reticulate is now using. The bottom section python versions found shows the versions that are installed. Look for a version that includes the environment name "r-reticulate".

In order to permanently change the version used by reticulate, open the R file .Renviron in the RStudio file editor:

usethis::edit_r_environ()

In that file, insert a new line setting the reticulate Python path to match the version you want it to use. For example, mine would be: RETICULATE_PYTHON="/home/gunth031/miniconda/envs/r-reticulate/bin/python". Save and close this file, then restart R for changes to take effect. Run reticulate::py_config() again to see if the new path has been adopted.
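After restarting R, it can be useful to confirm that the variable was picked up and points at a real file before calling reticulate at all. The sketch below uses only base R; check_reticulate_python is a hypothetical helper name.

```r
# Hypothetical helper: report the interpreter path stored in
# RETICULATE_PYTHON and whether a file exists at that path.
check_reticulate_python <- function() {
  path <- Sys.getenv("RETICULATE_PYTHON", unset = "")
  list(
    is_set = nzchar(path),
    path = path,
    exists = nzchar(path) && file.exists(path.expand(path))
  )
}

str(check_reticulate_python())
```

If is_set is FALSE, the .Renviron edit was probably not saved or R was not restarted. If is_set is TRUE but exists is FALSE, the path likely contains a typo.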

When I run reticulate::py_config(), I don't see a Python version in an environment called "r-reticulate".

Create the environment manually:

reticulate::conda_create("r-reticulate")

Now follow the steps above to ensure that reticulate uses the Python in this environment in every session.
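To confirm that the environment is visible, one option is to look for its name in the data frame returned by reticulate::conda_list(). The sketch below wraps that lookup so that it returns FALSE, rather than an error, when reticulate or conda is missing. has_conda_env is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if a conda environment with this name is
# visible to reticulate, FALSE otherwise (including when reticulate or
# conda itself cannot be found).
has_conda_env <- function(env_name = "r-reticulate") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  envs <- tryCatch(reticulate::conda_list(), error = function(e) NULL)
  !is.null(envs) && env_name %in% envs$name
}

print(has_conda_env())
```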

I get a message asking if I want to install Miniconda (OR) I get a warning saying Miniconda is not installed.

You can install Miniconda with the following function, and then try installing ipums-metadata again:

reticulate::install_miniconda()

I get a warning saying that another Python package (e.g. Pandas) is not installed / could not be found.

You can install Python packages with reticulate, if needed:

reticulate::py_install("pandas")
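To check whether the installation worked, reticulate::py_module_available() reports whether a module can be imported from the Python that reticulate is using. The sketch below wraps it defensively; module_ready is a hypothetical helper name. Note that calling it starts Python for the session, so it is best run after the interpreter path has been settled.

```r
# Hypothetical helper: TRUE if the named Python module can be imported
# through reticulate, FALSE otherwise (including when reticulate is not
# installed or Python cannot be started).
module_ready <- function(module) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  isTRUE(tryCatch(
    reticulate::py_module_available(module),
    error = function(e) FALSE
  ))
}

print(module_ready("pandas"))
```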


mgunther87/ipumsPMA documentation built on Aug. 1, 2020, 12:22 a.m.