R Virtual environments

Resources

Renv https://FrauBSD.org/Renv
vcr https://FrauBSD.org/vcr
pkgcenter-newR https://FrauBSD.org/pkgcenter-newR


About

R was the 27th computer programming language I learned. When I learned Python, virtual environments did not exist. Pip (a recursive acronym for “pip installs python”) likewise did not exist. So it can be said that I learned R before I learned virtual environments, but not that I learned R before Python. With that said, here is the story of how I developed a custom solution for R virtual environments, built upon my experience with 26 other programming languages over a couple dozen years of programming.


Requirements

From my current understanding, a virtual environment is akin to a project directory.

This means a collection of:

  • Various scripts* collected into a single top-level directory (TLD) defining a project
  • All dependencies required by said scripts and utilities** needed by the project

* The term scripts is used here because we are referring to an interpreted language. For a compiled language, I will use the term binary.
** The term utilities is used here synonymously with executables to refer to executable scripts or binaries (but not libraries, for example).

The immediate question that comes to mind if you are going to call something into existence worth using is, “what are the different ways one can bootstrap a virtual environment given the above definition?”

To answer this question, I asked another question: “what features do we want/need to support at a high level?”

Basic requirements (Introducing Renv):

  • Support truly executable scripts that do not have to be invoked as an argument to a wrapper/interpreter
  • Move scripts without modification to another project directory for testing
  • Have the project directory live on NFS/DFS so it can be accessed cluster-wide with ease via SGE

Library selection:

  • Provide seamless migration of project dependencies to be provided locally instead of NFS
  • Allow virtual environment dependencies to supersede localized CI/CD pipelined dependencies

Interpreter selection:

  • Be able to use different versions of the desired interpreter (R-3.1.1 versus R-3.3.1 versus R-3.5.2 for example)

Architecture migration:

  • Make it easy to convert library dependencies compiled on one architecture to another
  • Enable cross-platform testing
  • Produce the means to allow heterogeneous cluster nodes between architectures, allowing smooth/rolling migrations

Pipelining:

  • Prevent build errors, simple mistakes, and ad hoc pathname dependencies in scripts
  • Not require devtools to build the virtual environment (support R versions older than 3.0.2)

That is a tall order and I had to think long and hard about how to achieve all this. Supporting R documentation, talking with co-workers, and individual research led me to draw some conclusions about how one should not approach the problem.
From R development standards and well-known practices:

IMPORTANT
Using source() instead of library() creates pathname dependencies in scripts that cannot be managed by manipulating the R session.

A virtual environment modifies where library() pulls libraries from the filesystem (the so-called R session’s library path).

However, requiring an external script to generate the library path means we cannot support truly executable scripts.

At the same time, you would want to avoid the following for the above-stated reasons:

.libPaths(c("library", .libPaths()))

While that will pick up the “library” directory in the current working directory (CWD) when invoked directly, it relies on the script being invoked as “./scriptname.R”. If you run it via full path or from another directory (for example, when executing via cron), you may not pick up the library directory or, worse, you may pick up the wrong library directory (for whatever value of “wrong” you may wish to define).

This behavior puts the same pathname dependency in your executable script as using source() and should be avoided to properly implement flexible virtual environments (at least, if we want to have a chance at supporting that lofty list of wants/needs above).
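To make the pitfall concrete, here is a minimal shell sketch contrasting a path resolved against the current working directory with one resolved against the script's own directory. The helper names are made up for illustration and are not part of Renv or R.

```sh
# Illustration only: these helpers are hypothetical and not part of Renv.

# Resolve a relative directory against the current working directory.
# This mirrors what a bare "library" entry in .libPaths() depends on.
resolve_from_cwd() {
    printf '%s/%s\n' "$(pwd)" "$1"
}

# Resolve a relative directory against the directory that holds the script,
# which gives the same answer no matter where the script is launched from.
resolve_from_script() (
    script=$1
    dir=$2
    cd "$(dirname "$script")" && printf '%s/%s\n' "$(pwd)" "$dir"
)
```

Launched from some unrelated working directory, resolve_from_cwd library names a “library” directory under that working directory, while resolve_from_script /path/to/project/run.R library still names the project's own library directory.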
The solution that I arrived at was a new R interpreter. One that will:

  • Automatically bootstrap your virtual environment before invoking your script
  • Not rely on environment variables
  • Support interpreter indirection
  • Be easily overridden by invoking the interpreter directly or using a wrapper
  • Be invoked interactively for testing in the virtual environment
  • Support all of the previously stated requirements
  • Eliminate path dependencies from scripts, as required by common R development standards and practices

Then there is the little matter of a name for this new interpreter.


Introducing Renv

If you are familiar with /usr/bin/env as a wrapper for accessing your desired interpreter (be it sh, bash, R, python, awk, perl or other) you should know how to use Renv.
Instead of using an invocation line like below (suggested by standard R development practices and tutorials):

#!/usr/bin/env Rscript

Use the following for a script that wants to utilize virtual environment features:

#!/usr/bin/Renv Rscript
INFORMATION
Not all systems have “Rscript”. A quick way to check for what you can use is to type “Rscript” (without quotes) and then TAB twice to invoke command-line auto-completion to suggest all the possible completions, such as Rscript-3.1.1, Rscript-3.3.1, etc.
CAUTION
This is the foot of the rabbit hole. It’s a lot deeper than you think. Don’t let the simplistic addition of a single character here (changing /usr/bin/env to /usr/bin/Renv) fool you. Things are about to get much deeper from here onward.

This tells Renv to:

  1. Look for a “library” directory in the same directory the script lives in (irrespective of CWD)
  2. Launch Rscript with a modified .libPaths() if the “library” directory exists

But that’s not all.

From the above requirements, we now have:

  • Support truly executable scripts that do not have to be invoked as an argument to a wrapper/interpreter
    • The invocation line can bootstrap the execution environment
    • The invocation line is treated as a comment when the script is passed as an argument to a wrapper/interpreter
  • Move scripts without modification to another project directory for testing
    • Moving the script to another project directory will have it load the library contents from the new project
  • Have the project directory live on NFS/DFS so it can be accessed cluster-wide with ease via SGE or other scheduling software
    • Where the project directory lives (e.g., where the NFS mount is attached) is immaterial


Library Selection

  • Provide seamless migration of project dependencies to be provided locally instead of NFS
  • Allow virtual environment dependencies to supersede localized CI/CD pipelined dependencies

A project directory living on NFS/DFS is doomed to sacrifice performance when the I/O pipeline becomes saturated at either the sending or receiving end. One cannot rely on caching to save you from slow access times for executable code because cache elements are commonly evicted to make room for user-land memory pressure. While it makes sense to use NFS to satisfy project dependencies at small scale, at larger scale you end up reloading dependencies in high-memory conditions because you can’t hold a filesystem cache. We call this “living on the wire” and it puts tremendous load on the sync-store (your NAS, SAN, REST, or whatever) backend. We commonly see loads of over 500 (10x higher than desired) for system load averages on the NAS providing access to user home directories.

The problem therein becomes how to package virtual environment contents for distribution to the local disk. However, once you solve that, you still need a way of seamlessly accessing the newly deployed localized binaries.

Renv can check for more than just a “library” directory and it checks in more than one place.

Previously we saw how Renv (without any command-line options) will look for a “library” directory in the same directory as the script. In that scenario, if a script were to examine .libPaths(), one would get something similar to:

[1] "/data/homes/raidb/home/dteske/library"
[2] "/opt/R/3.3.1/lib64/R/library"

When we pipeline the virtual environment library into RPMs (for efficiency to live on the local disk), we plan to have the RPM install to:

/opt/R/<version>/lib64/R/<library>

But this creates a conflict with:

  • /opt/R/<version>/lib64/R/library

which provides the base libraries for R outside the virtual environment.

For this, I provide command-line options to Renv:

#!/usr/bin/Renv -l library.project7 Rscript

This will instruct Renv to instead look for a directory named “library.project7” in the same directory as the script.

It will also check for a “library.project7” directory in /opt/R/<version>/lib64/R/.

Examining .libPaths() in this scenario would yield:

[1] "/data/homes/raidb/home/dteske/library.project7"
[2] "/opt/R/3.3.1/lib64/R/library.project7"
[3] "/opt/R/3.3.1/lib64/R/library"

Directories above:

  1. Inside the project directory (aka virtual environment)
  2. Inside the R distribution
  3. Base R library

When you say:

library(name)

It will look first in the library.project7 directory in the script’s directory, then in the library.project7 directory that is distributed via the Continuous Integration (CI) and Continuous Deployment (CD) pipeline, and lastly in the base R library collection.

Moving the project library directory (library.project7 in this scenario) aside or using a full pathname to the library directory (example below) will cause only the localized (non-NFS, non-DFS) libraries to be loaded:

#!/usr/bin/Renv -l /opt/R/3.3.1/lib64/R/library.project7 Rscript

Resulting in the following .libPaths():

[1] "/opt/R/3.3.1/lib64/R/library.project7"
[2] "/opt/R/3.3.1/lib64/R/library"

However, using a full path precludes you from potentially augmenting or overriding the pipelined binaries, should the need arise.
Renv intelligently adjusts .libPaths() based on the existence of multiple “-l dir” option-arguments which can be relative or absolute.

For large projects using virtual environments that want to separate scripts into separate directories, such as “bin”, you can use paths relative to the directory the script lives in without fear that the current working directory will affect the loading of the proper library collections.

#!/usr/bin/Renv -l ../library.project7 Rscript

Inspecting .libPaths() inside a script launched in this scenario (if all the below directories exist) would be:

[1] "/data/homes/raidb/home/dteske/library.project7"
[2] "/opt/R/3.3.1/lib64/R/library.project7"
[3] "/opt/R/3.3.1/lib64/R/library"
INFORMATION
Despite the “../” prefix on the relative library reference, Renv knew to add a “library.project7” directory to “.libPaths()” if it existed in the same directory as the base R library.
If you choose to name your library directory “library” then pipelining the executables into RPM will require putting them into the base R library collection – which means stricter guidelines to prevent regressions against scripts running in the global context outside of virtual environments.

Renv orders the directories that it finds specifically to enable the ability to slip RPM-distributed binaries into the search path, but allow forward progression through divergence at the upper layer. By allowing the specified library at the project-level to override everything, a project maintainer can make changes to the library, test code, and then – when it comes time – feed a snapshot of the virtual environment into the RPM pipeline for automated integration and deployment, bringing the RPM layer in-sync with the project layer.
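The ordering described in this section can be sketched in shell as follows. This is not Renv's actual code: the function name and its three arguments (script path, the value given to “-l”, and the R home directory such as /opt/R/3.3.1/lib64/R) are assumptions made for illustration.

```sh
# Hypothetical sketch of the search order; not taken from Renv itself.
# Prints one existing directory per line, highest priority first:
#   1. the named library relative to the script (or as given, if absolute)
#   2. a directory of the same name next to the base R library
#   3. the base R library
lib_search_path() (
    script=$1
    lib=$2
    r_home=$3
    script_dir=$(cd "$(dirname "$script")" && pwd) || exit 1

    case "$lib" in
    /*) project_lib=$lib ;;               # absolute: use as given
    *)  project_lib=$script_dir/$lib ;;   # relative: anchor at the script
    esac

    last=
    for dir in "$project_lib" "$r_home/$(basename "$lib")" "$r_home/library"; do
        [ -d "$dir" ] || continue         # skip layers that do not exist
        dir=$(cd "$dir" && pwd)           # normalize away any "../"
        [ "$dir" = "$last" ] && continue  # avoid listing a directory twice
        printf '%s\n' "$dir"
        last=$dir
    done
)
```

Because the project-level directory is listed first, removing it (or passing an absolute path) leaves only the locally installed layers, which matches the .libPaths() listings shown earlier.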


Interpreter Selection

  • Be able to use different versions of the desired interpreter (R-3.1.1 versus R-3.3.1 versus R-3.5.2 for example)

Just as /usr/bin/env allows you to select the interpreter based on the $PATH environment variable, Renv allows this and more for your R scripts.

A few key features:

  • You can use R or Rscript (including R-<version> and Rscript-<version>) interchangeably
  • You can create symbolic links to Renv to make referencing specific versions of R more convenient

When you install any of the R*-fraubsd RPMs, such as R331-fraubsd, a symbolic link is created:

  • /usr/bin/Renv-3.3.1 -> Renv

When invoked with the name Renv-3.3.1, Renv will default to using R-3.3.1 in the scope of $PATH to interpret your script. This allows you to use as an invocation line:

#!/usr/bin/Renv-3.3.1

or

#!/usr/bin/Renv-3.3.1 -l library.project7
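Choosing a default interpreter from the name a program was invoked under is a common symlink technique. A minimal sketch, assuming (this is not confirmed by the text) that a plain “Renv” falls back to “R”; the function name is hypothetical and this is not Renv's own code:

```sh
# Hypothetical sketch; not Renv's actual implementation.
# Given the name a program was invoked as (normally "$0"), print the
# default interpreter: "Renv-3.3.1" gives "R-3.3.1".
# Assumption: plain "Renv" falls back to "R".
default_interpreter() (
    name=${1##*/}                   # strip any leading directory
    case "$name" in
    Renv-*) printf 'R-%s\n' "${name#Renv-}" ;;
    *)      printf 'R\n' ;;
    esac
)
```

With this approach a new R version needs only a new symlink, not a new copy of the wrapper.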

On the command-line, you can invoke Renv interactively.

Renv-3.3.1

Without the aid of the symlink:

Renv R-3.3.1

This method also works with direct references to NFS based R distributions.

Renv /share/bin/R-3.3.1
IMPORTANT
This should be avoided unless there is a specific reason to avoid $PATH expansion.

If changes are required to the way virtual environments are utilized, Renv is a great tool for centralizing that logic as an intermediary between the script and interpreter.


Architecture Migration

  • Make it easy to convert library dependencies compiled on one architecture to another
  • Enable cross-platform testing
  • Produce the means to allow heterogeneous cluster nodes between architectures, allowing smooth/rolling migrations

Renv cannot help with these items. For that, we must look to the Virtual Comprehensive R (VCR) utility.

Taking an existing library directory that was compiled on one platform (say, CentOS 6) and trying to use it on another (say, CentOS 7) is usually difficult, but vcr aims to make it as painless, fast, and easy as possible.

Given a particular virtual environment library, the following command will record a “playlist”.

vcr list

If there is a “library” dir in the current working directory, this will produce a list of libraries required to build that environment.

If you need to list a different directory:

vcr list path/to/library

You will want to capture the output into a text file. While not strictly necessary, if problems occur, it is good to be able to modify the output with a text editor before trying again (for example, if software has changed location).

vcr list > reqs.txt
IMPORTANT
Use a more descriptive name than “reqs.txt”

You can then play this list on the target architecture (say, CentOS 7).

vcr play -d library.newarch reqs.txt
IMPORTANT
Suggest using something like “el6” or “el7” instead of “newarch”

If your library only contains items from CRAN, it will go smoothly. In such cases, you could even tie the process into a single command:

vcr list | vcr play -d library.new /dev/stdin
IMPORTANT
If your library collection contains libraries that come from GitHub or other non-CRAN source and you did not build the library directory using vcr, you will have to modify the output from “vcr list”. Commands such as “vcr install” which takes a URL to a CRAN-looking package record the URL provided so that a later “vcr list” will accurately describe where the software comes from.

A script that is using Renv to access a “library” directory might have an invocation line of:

#!/usr/bin/Renv-3.3.1

To make this script load the “library.new” directory containing CentOS 7 binaries instead of the “library” directory containing CentOS 6 binaries (for example):

Renv-3.3.1 -l library.new script [args]

This enables efficient cross-platform testing, but is not designed to be the end of the road. Allowing heterogeneous operation wherein your scripts can function properly on both CentOS 6 and 7 requires the ability to access a library that is architecture specific without the need to reference two different directory names.

That is why Renv checks for a named directory in the base R directory. In the absence of a “library.xxx” directory from such an invocation line in a script:

#!/usr/bin/Renv-3.3.1 -l library.xxx

When your script runs, the library directory it picks up will be either:

/opt/R/3.3.1/lib64/R/library.xxx

or

/software/R-3.3.1/lib64/R/library.xxx

Depending on what “R-3.3.1” evaluates to (be it a /usr/bin/R-3.3.1 symlink pointing to /opt/R/3.3.1/bin/R or perhaps instead /software/bin/R-3.3.1 symlinked to /software/R-3.3.1/bin/R).
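How the interpreter name leads to one directory or the other can be sketched as below. This is an illustration rather than Renv's code: the function name is hypothetical, it assumes the interpreter resolves to <prefix>/bin/R with libraries under <prefix>/lib64/R/ (as in both layouts above), and it relies on a readlink that supports -f (GNU coreutils, for example).

```sh
# Hypothetical sketch; not Renv's code.
# Arguments: interpreter name (looked up in $PATH) and library directory name.
# Prints the matching directory inside that interpreter's R distribution.
dist_library() (
    interp=$1
    lib=$2
    path=$(command -v "$interp") || exit 1   # find the interpreter in $PATH
    real=$(readlink -f "$path") || exit 1    # follow symlinks to the real binary
    prefix=$(dirname "$(dirname "$real")")   # <prefix>/bin/R gives <prefix>
    printf '%s/lib64/R/%s\n' "$prefix" "$lib"
)
```

On a host where /usr/bin/R-3.3.1 points to /opt/R/3.3.1/bin/R, calling dist_library R-3.3.1 library.xxx would name the first directory shown above.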

IMPORTANT
The ability to pass an absolute path for the library location to Renv allows you to ignore a local directory in the virtual environment that may be the incorrect architecture due to NFS wherein, for example, a script may be running on CentOS 7 but the library directory was compiled on CentOS 6.

In this scenario, a script hosted on NFS and run on various platforms will pick up the right binaries loaded onto the local disk. This not only solves the issue of targeting platform-specific binaries but also improves performance by reducing reliance on NFS.


Pipelining

  • Prevent build errors, simple mistakes, and ad hoc pathname dependencies in scripts
  • Not require devtools to build the virtual environment (support R versions older than 3.0.2)

Pipelining a virtual environment so that you get CentOS 6 and CentOS 7 specific binaries in /opt/R/<version>/lib64/R/library.<name> is not hard.
Run:

vcr play -d library.preflight project_reqs.txt

Where “project_reqs.txt” is the output of “vcr list” on the library you want pipelined and “library.preflight” is just a temporary directory name that we will later delete after testing.

IMPORTANT
Make sure library.preflight doesn’t already exist at the start of your preflight test.
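One way to enforce that precondition in a script is a small guard that refuses to continue when the directory is already there. The helper below is hypothetical and not part of vcr; the vcr command in the usage comment is the one shown above.

```sh
# Hypothetical guard; not part of vcr.
# Succeeds only if nothing exists at the given path (including dangling symlinks).
require_absent() {
    if [ -e "$1" ] || [ -L "$1" ]; then
        echo "refusing to continue: $1 already exists" >&2
        return 1
    fi
}

# Usage:
#   require_absent library.preflight &&
#       vcr play -d library.preflight project_reqs.txt
```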

When you can successfully complete a preflight run of your project requirements, you can proceed to pipelining and automation.

The “project_reqs.txt” file should be stored in git so that Jenkins can automate integration efforts whenever the file is modified. You should also create an automation user (for example, named “rbot“) for all R-related pipelining tasks.

You should check the requirements file into a git repository where a “git tag” operation can version the reqs file against the code that relies on it. For example, if you are pipelining a virtual environment for code that lives in the foobar git repository, the requirements file should live there.

As in all good things, the most important thing may be the naming of your library collection.

Your reqs file, for the purpose of pipelining, should be named:

*_<r-version>.lock
WARNING
The “.lock” suffix is required by the pipeline. The “<r-version>” value should be “3.3.1” for example.

The prefix (what comes before “_<r-version>.lock”) dictates neither the name of the library collection in “/opt/R/<r-version>/lib64/R/” nor the prefix of RPMs produced by the pipeline.
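A quick way to check a file name against this convention before handing it to the pipeline is sketched below. The function is hypothetical; the pipeline's own parsing is not shown in this article and may differ.

```sh
# Hypothetical helper; the pipeline's own parsing may differ.
# Prints the R version embedded in a "<prefix>_<r-version>.lock" file name,
# or fails if the name does not follow that convention.
lock_r_version() (
    name=${1##*/}                     # strip any leading directory
    case "$name" in
    ?*_[0-9]*.lock) ;;
    *) echo "not a <prefix>_<r-version>.lock name: $name" >&2; exit 1 ;;
    esac
    version=${name##*_}               # keep what follows the last underscore
    version=${version%.lock}          # drop the required suffix
    case "$version" in
    *[!0-9.]*) echo "unexpected R version: $version" >&2; exit 1 ;;
    esac
    printf '%s\n' "$version"
)
```

For example, lock_r_version project7_3.3.1.lock prints 3.3.1, while a name such as reqs.txt is rejected.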

The naming of the RPMs is governed in one of a couple different ways.

However, before we talk about naming, I need to talk about the pipeline framework.

The “pkgcenter-newR” git repository is a skeleton repo that exists as a template for each individual pipeline. Each virtual environment deployment pipeline should be a copy of this git repository at the onset.
https://FrauBSD.org/pkgcenter-newR

IMPORTANT
Don’t fork the repo unless you intend to upstream changes to the framework that affect future deployments from the pkgcenter-newR skeleton repo. Instead, use the below process for copying the framework into a new repo.

First create the new repo. Here is where the naming comes into play discussed earlier.

By default (this behavior can be overridden), the name of the repository minus the “.git” suffix and “pkgcenter-” prefix becomes the name of the RPMs produced. That is to say, the git repository named “pkgcenter-newR” will generate RPMs named “newR-*.rpm” that install R libraries to “/opt/R/<version>/lib64/R/library.newR”.

Create new repository: https://github.com/new

Let’s say we want to package a virtual environment for a project named “project7” which will have RPMs named project7R* (e.g., project7R311-*.rpm, project7R331-*.rpm, etc.)

IMPORTANT
It is highly recommended that, aside from making a new repo with a name beginning with “pkgcenter-”, the name should end with “R” because the R version will be appended to the repo name. A git repo named “pkgcenter-project7R” will produce RPMs named “project7R331-*” for R-3.3.1 compiles; “project7R311-*” for R-3.1.1 compiles; etc.

You can override the default behavior of taking the git repository name as the prefix for RPMs. When we copy the framework into the new git repo, we will configure the build.conf file, which has an option for hard-coding the prefix of RPM names.
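The default naming rule can be approximated in a few lines of shell. This is a sketch under stated assumptions, not the framework's code: the function name is hypothetical, and the only setting borrowed from the framework is RPMPREFIX, the build.conf override for the prefix.

```sh
# Hypothetical sketch of the default naming rule; the framework's logic may differ.
# Arguments: repository name or URL, R version (for example 3.3.1).
# If RPMPREFIX is set, it replaces the name derived from the repository.
rpm_name_prefix() (
    repo=${1##*/}                          # drop any leading URL or directory
    repo=${repo%.git}                      # drop the ".git" suffix
    repo=${repo#pkgcenter-}                # drop the "pkgcenter-" prefix
    digits=$(printf '%s' "$2" | tr -d .)   # 3.3.1 becomes 331
    printf '%s%s\n' "${RPMPREFIX:-$repo}" "$digits"
)
```

For example, rpm_name_prefix pkgcenter-project7R 3.3.1 prints project7R331, the stem of the RPM names used in the rest of this walkthrough.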

Steps to copy the pkgcenter-newR build framework into a new git repository named pkgcenter-project7R:

git clone https://github.com/FrauBSD/pkgcenter-newR.git
git clone https://github.com/FrauBSD/pkgcenter-project7R.git
rsync -av --exclude=.git{,/*} pkgcenter-newR/ pkgcenter-project7R/
IMPORTANT
Always put a trailing slash (/) after each pathname. In rsync parlance, the trailing slash should be pronounced “contents-of” to remind you that if you forget it, you may end up with a copy of the src inside the dest argument (and if you are using the --delete flag with rsync, this could be fatal or at least potentially hazardous to your health).

Import the changes to your new repo.

cd pkgcenter-project7R
.git-hooks/install.sh
# Enter proper username/email address if prompted
git add .
git commit -m 'Import pkgcenter-newR framework'
git push origin master

Inside the ./depend/jenkins directory is a file named build.conf which contains all the details for building RPMs in an automated fashion from the output of “vcr list” (which is a pip-like “.lock” file).

For our mythical “project7” secret project, we need to put together some libraries.

Our secret project depends on the following:

  • bender version 0.1.0
  • blender (latest version)
  • catboost (latest version)

The first two are available from CRAN and the last is available from GitHub.

Using the previously discussed techniques for managing a virtual environment library, let’s get ready for a preflight test. Before reaching preflight you need a runway which means installing the software for an initial test.

vcr-3.3.1 install bender==0.1.0
vcr-3.3.1 install permute vegan==2.5-4 blender
vcr-3.3.1 install https://github.com/catboost/catboost/releases/download/v0.15.2/catboost-R-Linux-0.15.2.tgz
INFORMATION
blender requires vegan which requires permute. The latest version of vegan requires a newer version of R, so we specify the last version that works with R-3.3.1 (just one minor revision older than the latest version). The vcr utility does not automatically solve dependencies for you. You must install dependencies first. Above, you see that we have provided permute and vegan on the same line before blender.
IMPORTANT
While vcr can take any URL, the name of the tarball file should be in CRAN format (<name>_<version>.tar.gz). This is important for pipelining into binaries so that the CI/CD pipeline can determine the version of the package from the filename during the build process.

Taking the URL directly from GitHub, such as:
https://github.com/catboost/catboost/releases/download/v0.15.2/catboost-R-Linux-0.15.2.tgz

That filename will not work with the CI/CD pipeline, which expects CRAN-like names.
Download the tarball, rename it to the format “<name>_<version>.tar.gz”, and upload it to Artifactory or a similar HTTP-based yum repository.
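The naming requirement can be sketched with two small helpers: one builds a CRAN-style file name and one extracts the version from such a name. Both function names are hypothetical, and the version extraction only illustrates why the GitHub file name fails; it is not the pipeline's actual parser.

```sh
# Hypothetical helpers; not part of vcr or the pipeline.

# Build a CRAN-style tarball name from a package name and version.
cran_tarball_name() {
    printf '%s_%s.tar.gz\n' "$1" "$2"
}

# Extract the version from a CRAN-style tarball name or URL.
# Fails for names that do not look like <name>_<version>.tar.gz.
cran_tarball_version() (
    file=${1##*/}                     # strip any leading URL or directory
    case "$file" in
    ?*_?*.tar.gz) ;;
    *) exit 1 ;;
    esac
    version=${file##*_}
    printf '%s\n' "${version%.tar.gz}"
)
```

Here cran_tarball_name catboost 0.15.2 gives catboost_0.15.2.tar.gz, whereas cran_tarball_version fails on catboost-R-Linux-0.15.2.tgz because that name has no underscore-separated version and the wrong suffix.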

Upon completion, we should be able to generate the requirements file and then toss the test library.

vcr list | tee project7_3.3.1.lock
rm -Rf library

It is recommended you then stash the secret project7_3.3.1.lock file over in ~/src/configs/proj/r_reqs/ (potentially part of a git repo):

mkdir -p ~/src/configs/proj/r_reqs
mv project7_3.3.1.lock ~/src/configs/proj/r_reqs/

Configure depend/jenkins/build.conf with details such as (actual excerpt; all defaults):

#
# RPM name prefix
#
# The default behavior is to take the repo name and remove `pkgcenter-'.
# A value of `R' for example will produce:
#       R311-{vcran,gsl,etc.}
#       R331-{vcran,catboost,etc.}
#       etc.
#
# You can override the prefix and hard-code a value by setting RPMPREFIX.
#
#?RPMPREFIX=R

#
# Location of checked-out config repo
# NB: If unset or NULL, download.sh cannot `git pull' to update the repo
# NB: Ignored/Unused if download.sh is given `-n' to prevent `git pull'
#
CONFIG_REPO=~/src/configs

#
# Git remote settings for cloning config repo (if required)
# NB: If $CONFIG_REPO dir does not exist, download.sh will clone $CONFIG_REMOTE
# NB: If unset or NULL, download.sh will error out unless either `-n' given
#     (to prevent `git pull' in $CONFIG_REPO) or $CONFIG_REPO exists
#
CONFIG_REMOTE=git@github.com:SomeProj/configs.git

#
# Location of lock files within config repo
# NB: Only used in this file
#
R_REQS=proj/r_reqs

Note the potential override to prevent the git repository name from influencing RPM names, should that be desired.

For each target distribution of R, you will need a section like (also in depend/jenkins/build.conf; actual excerpt; all defaults; changed in following steps):

############################################################ R-3.1.1

BUILD1_NAME=reqs_3.1.1.lock
BUILD1_PATH="$CONFIG_REPO/$R_REQS/$BUILD1_NAME"

# CentOS 6
BUILD1_RHEL6_VCRAN=vcran-3.1.1-rhel6-x86_64.conf
BUILD1_RHEL6_VCRAN_XFORM="$VCRAN_XFORM"

# CentOS 7
BUILD1_RHEL7_VCRAN=vcran-3.1.1-rhel7-x86_64.conf
BUILD1_RHEL7_VCRAN_XFORM="$VCRAN_XFORM
        # NAME          BAD_VERSION     GOOD_VERSION
        gsl             1.9-10          1.9-10.1
" # END-QUOTE

The default build.conf contains a section for R-3.1.1 (shown above) and R-3.3.1.

IMPORTANT
We are only going to automate a single build (for R-3.3.1) in this demo, and the builds are sequentially numbered (BUILD1, BUILD2, etc.), so only the first section should be used and others following should be deleted.
IMPORTANT
You must make “BUILD<n>_NAME” and “BUILD<n>_PATH” end in “_<r-version>.lock” where “<r-version>” matches the R version (for example, “3.3.1” or “3.1.1” or other).

Above is the default state of the file, but for our super secret skunk-works project, “project7”, we will be changing BUILD1_NAME to “project7_3.3.1.lock” for the R-3.3.1 build.

I am not going to actually commit this file to any repo, but I faked it. Placing the file where it should be but not actually committing it, the framework can do everything you expect it to without prematurely polluting the config repo.

Configured project build.conf:

############################################################ R-3.3.1

BUILD1_NAME=project7_3.3.1.lock
BUILD1_PATH="$CONFIG_REPO/$R_REQS/$BUILD1_NAME"

# CentOS 6
BUILD1_RHEL6_VCRAN=vcran-3.3.1-rhel6-x86_64.conf
BUILD1_RHEL6_VCRAN_XFORM="$VCRAN_XFORM"

# CentOS 7
BUILD1_RHEL7_VCRAN=vcran-3.3.1-rhel7-x86_64.conf
BUILD1_RHEL7_VCRAN_XFORM="$VCRAN_XFORM"

Earlier (above), we moved (but did not yet “git add”) the “project7_3.3.1.lock” file to ~/src/configs/proj/r_reqs/. If all goes well we can add/commit/push the file as a sign of reaching the first pipelined version for the virtual environment.
All there is left to do to turn your “.lock” file into RPMs is (in the ./depend/jenkins/ directory of your “pkgcenter-project7R” git repository):

vi ~/.netrc
# Configure credentials for artifactory as-documented in build.conf
./build_fraubsd.sh -a

If your Artifactory credentials are correct, you will then be able to turn around and say:

sudo yum clean expire-cache
sudo yum install project7R331-vcran
sudo yum install project7R331-catboost

Notice how the URL elements of the “.lock” file were split out into their own RPMs. This is because GitHub projects and CRAN elements are released in separate cycles independently from each other and as-such are pipelined separately to help maintain that relationship.

If for some reason your Artifactory credentials are not working, you can expect to find local copies of the RPMs in your home directory under ~/jenkins/.

$ ls -ltr ~/jenkins/
total 8
drwxrwxr-x 2 dteske users 4096 Jul 12 15:50 rhel7-x86_64/
drwxrwxr-x 3 dteske users 4096 Jul 14 10:47 rhel6-x86_64/
$ ls -ltr ~/jenkins/*/project7*
-rw-rw-r-- 1 dteske users  4491288 Jul 14 10:32 /home/dteske/jenkins/rhel6-x86_64/project7R331-vcran-1.0-2.x86_64.rpm
-rw-rw-r-- 1 dteske users 38307456 Jul 14 10:46 /home/dteske/jenkins/rhel6-x86_64/project7R331-catboost-0.15.1-1.x86_64.rpm

Notice how our pipeline only produced rhel6-x86_64 RPMs. To get the rhel7-x86_64 RPMs you have to compile on CentOS 7.

There is a one-time task at the start of each project, which is to tie the vcran package and ancillary external packages together using a meta package. That is covered in the next section.


Combining

The above process creates the RPMs, but you can optionally, for convenience, create a “meta-RPM” which installs them for you.

Following the above examples, for our secret “project7” virtual environment that we pipelined into RPMs, we got 2 RPMs:

  • project7R331-vcran-1.0-1.x86_64.rpm
    • Contains all the CRAN libraries (bender, permute, vegan, blender)
  • project7R331-catboost-0.15.1-1.x86_64.rpm
    • This comes from GitHub and we rename the tarball and put it in Artifactory

We want to create a “project7R331-fraubsd” RPM that installs the above two RPMs.
In the pkgcenter-project7R repository, starting from the TLD of the git repo:

cd redhat/rhel6-x86_64
../create.sh Applications/Engineering/project7R331-fraubsd
cd Applications/Engineering/project7R331-fraubsd

Edit the SPECFILE

Change this line (fix “First Last” and “flast@“) toward the top in the “HEADER” section:

Packager: First Last <flast@fraubsd.org>

Still in the “HEADER” section, in the blank space after:

BuildRoot: %{_tmppath}/src

but before the next section (“CONFIGURATION”), add:

Requires: R331-fraubsd
Requires: project7R331-vcran
Requires: project7R331-catboost
IMPORTANT
Don’t forget to add “Requires: R331-fraubsd” (for R-3.3.1). It provides “/opt/R/3.3.1“, all the CRAN libraries at the specific versions required, “/usr/bin/R-3.3.1“, and “/usr/bin/Rscript-3.3.1“. It also brings in Renv for you.

ASIDE: You can use “vcr diff /share/R-3.3.1 /opt/R/3.3.1” to confirm that /opt/R/3.3.1 is a properly localized version of a shared or other version of R at, for example, /share/R-3.3.1/.

Change the line (fix “First Last” and “flast@“) at the bottom after “%changelog“:

* Sun Jul 14 2019 First Last <flast@fraubsd.org> 1.0-1

Save the SPECFILE and exit your editor.

Remove the “src” directory because this is a meta package and we don’t need it.

rmdir src

Make sure the “src” directory stays deleted by editing pkgcenter.conf and adding it to the list of directories which are cleaned when you, or a script, says “make distclean“:

#
# Directories to create before (and clean up after) creating the package.
# NOTE: Be careful to list sub-directories in depth-first order.
#
DIRS="
# Directory
$SRCDIR
"

Do a test compile of our meta RPM before you upload, commit, tag, and push your changes.

make

You should now have a file named “project7R331-fraubsd-1.0-1.noarch.rpm” in the current working directory. This is usually enough to tell me that I did not make any mistakes and that I can proceed to uploading everything.

mv project7R331-fraubsd-1.0-1.noarch.rpm ~/jenkins/rhel6-x86_64/
make distclean
ls # Pedantic
make autoimport
git push origin master

Upload it to Artifactory:

afput ~/jenkins/rhel6-x86_64/project7R331-fraubsd-1.0-1.noarch.rpm # option 1
afput ~/jenkins/rhel6-x86_64/project7R331-fraubsd-1.0-1.noarch.rpm # option 2

For CentOS 6 we run it twice and upload to two different Artifactory repositories. For scripts, you can do:

# Alternate approach to above (for scripts)
afput -r yum-fraubsd/centos/6/x86_64/x86_64/ ~/jenkins/rhel6-x86_64/project7R331-fraubsd-1.0-1.noarch.rpm
afput -r yum-fraubsd-el6-x86_64/ ~/jenkins/rhel6-x86_64/project7R331-fraubsd-1.0-1.noarch.rpm

After anywhere between 1 and 30 seconds, the following should suffice to install all the required library elements:

sudo yum clean expire-cache # else you may have to wait up to a day
sudo yum install project7R331-fraubsd

This will have populated local files on disk for your virtual environment to make use of (see next section).

rpm -qal | grep project7 | sort

The project7R331-fraubsd RPM we created does not have versioned “Requires” statements in the SPECFILE, so as changes are made to the “.lock” file (“vcr list” output for your virtual environment), “yum upgrade” will continue to upgrade the various components that are already installed. Only adding a new external (non-CRAN) library (one that uses “-u URL” in the “.lock” file) will require modifying the “*-fraubsd” RPM SPECFILE to add the new dependency.


Conclusion

With our “project7” RPMs installed and providing the directory “/opt/R/3.3.1/lib64/R/library.project7R” we can now use Renv with the “-l library.project7R” command-line argument.

In scripts:

#!/usr/bin/Renv-3.3.1 -l library.project7R

or

#!/usr/bin/Renv -l library.project7R R-3.3.1

or

#!/usr/bin/Renv -l library.project7R Rscript-3.3.1

Running scripts from the command-line

Renv-3.3.1 -l library.project7R path/to/script

or

Renv -l library.project7R R-3.3.1 path/to/script

or

Renv -l library.project7R Rscript-3.3.1 path/to/script

IMPORTANT
Running a script as an argument on the command-line instead of executing it directly will cause the invocation line at the top of it to be ignored. This may be exactly what you desire in some cases and not others. Caveat emptor.
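The same rule holds for any interpreter that treats “#” as a comment character. The sketch below uses sh as a stand-in for R, and the interpreter path on the demo script's first line is deliberately one that does not exist, to show that the line is never consulted when the file is passed as an argument.

```sh
# Illustration using sh as a stand-in for R. The "#!" line names an
# interpreter path that is assumed not to exist on this machine.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/nonexistent/Renv -l library.project7R
echo "body executed"
EOF

# Passing the file as an argument: sh reads the first line as a comment,
# so the missing interpreter never matters and only the body runs.
result=$(sh "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

Executing the same file directly would instead ask the kernel to launch the interpreter named on the first line, which is where options such as “-l library.project7R” take effect.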

Launching an interactive R session in the virtual environment from the command-line:

Renv-3.3.1 -l library.project7R

or

Renv -l library.project7R R-3.3.1

As discussed in earlier sections, if you then wish to override the “library.project7R” directory that was populated via RPMs and picked up via Renv, you need only create a “library.project7R” directory in the same directory as the scripts requiring it. Renv will still pick up the RPM-based one but allow you to override it locally.

If changes are needed, they can be tested in a local override and then merged back to the “library.project7R” directory by way of the above pipeline.


BONUS: .Renv profile

Renv will automatically source a file named “.Renv” if it exists in the same directory as a script.

The “-p file” argument to Renv can override this behavior to source a specific file instead.

For example, on the invocation line of a script:

#!/usr/bin/Renv-3.3.1 -p .Renv-project7

or on the command-line:

Renv-3.3.1 -p .Renv-project7 path/to/script [args ...]

This allows you to centralize project-specific R commands to be utilized by virtual environments.


EXTRA: Variables

Renv creates a few variables for you:

  • Renv.file.path: the full path to your script (such as “/home/dteske/foo.Rscript”)
  • Renv.file.name: the basename of your script (such as “foo.Rscript”)
  • Renv.profile.path: if found, the full path to the Renv profile script (usually “.Renv” unless given “-p file”)
  • Renv.profile.name: if found, the name of the Renv profile script (such as “.Renv”)