R was the 27th computer programming language I learned. When I learned Python, virtual environments did not exist. Pip (the recursive acronym for the “pip installs python” utility) likewise did not exist. So it can be said that I learned R before I learned virtual environments, but not that I learned R before Python. That being said, let’s dive into my story of how I developed a custom solution for R virtual environments, built upon my experience with 26 other programming languages over a couple dozen years of programming.
Requirements
From my current understanding, a virtual environment is akin to a project directory.
This means a collection of:
Various scripts* collected into a single top-level directory (TLD) defining a project
All dependencies required by said scripts and utilities** needed by the project
* The term scripts is used here because we are referring to interpreted languages. For compiled languages, I will use the term binary.
** The term utilities is used here synonymously with executables to refer to executable scripts or binaries (but not libraries, for example).
If you are going to bring something into existence that is worth using, the immediate question is: “what are the different ways one can bootstrap a virtual environment, given the above definition?”
To answer this question, I asked another question: “what features do we want/need to support at a high level?”
Basic requirements (Introducing Renv):
Support truly executable scripts that do not have to be invoked as an argument to a wrapper/interpreter
Move scripts without modification to another project directory for testing
Have the project directory live on NFS/DFS so it can be accessed cluster-wide with ease via SGE
Library selection:
Provide seamless migration of project dependencies to be provided locally instead of NFS
Allow virtual environment dependencies to supersede localized CI/CD pipelined dependencies
Interpreter selection:
Be able to use different versions of the desired interpreter (R-3.1.1 versus R-3.3.1 versus R-3.5.2 for example)
Architecture migration:
Make it easy to convert library dependencies compiled on one architecture to another
Enable cross-platform testing
Produce the means to allow heterogeneous cluster nodes spanning architectures, allowing smooth/rolling migrations
Pipelining:
Prevent build errors, simple mistakes, and ad hoc pathname dependencies in scripts
Do not require devtools to build the virtual environment (support R versions older than 3.0.2)
That is a tall order, and I had to think long and hard about how to achieve it all. Reading R documentation, talking with co-workers, and doing independent research led me to some conclusions about how one should not approach the problem.
From R development standards and well-known practices:
IMPORTANT
Using source() instead of library() creates pathname dependencies in scripts that cannot be managed by manipulating the R session.
A virtual environment modifies where library() pulls libraries from the filesystem (the so-called R session’s library path).
However, requiring an external script to generate the library path means we cannot support truly executable scripts.
At the same time, you would want to avoid the following for the above-stated reasons:
.libPaths(c("library", .libPaths()))
While that will pick up the “library” directory in the current working directory (CWD), it relies on the script being invoked as “./scriptname.R” from within its own directory. If you run it via full path or from another directory (for example, when executing via cron), you may not pick up the library directory or, worse, pick up the wrong library directory (for whatever value of “wrong” you may wish to define).
This behavior puts the same pathname dependency in your executable script as using source() and should be avoided to properly implement flexible virtual environments (at least, if we want to have a chance at supporting that lofty list of wants/needs above).
The solution I arrived at was a new R interpreter, one that will:
Automatically bootstrap your virtual environment before invoking your script
Not rely on environment variables
Support interpreter indirection
Be easily overridden by invoking the interpreter directly or using a wrapper
Be invoked interactively for testing in the virtual environment
Support all of the previously stated requirements
Eliminate path dependencies from scripts, as required by common R development standards and practices
Then there is the little matter of a name for this new interpreter.
Introducing Renv
If you are familiar with /usr/bin/env as a wrapper for accessing your desired interpreter (be it sh, bash, R, python, awk, perl, or other), you already know how to use Renv.
Instead of an invocation line like the one below (as suggested by standard R development practices and tutorials):
#!/usr/bin/env Rscript
Use the following for a script that wants to utilize virtual environment features:
#!/usr/bin/Renv Rscript
INFORMATION
Not all systems have “Rscript”. A quick way to check what you can use is to type “Rscript” (without quotes) and then press TAB twice; command-line auto-completion will suggest all the possible completions, such as Rscript-3.1.1, Rscript-3.3.1, etc.
CAUTION
This is the top of the rabbit hole, and it goes a lot deeper than you think. Don’t let the simple addition of a single character here (changing /usr/bin/env to /usr/bin/Renv) fool you. Things get much deeper from here onward.
This tells Renv to:
Look for a “library” directory in the same directory the script lives in (irrespective of CWD)
Launch Rscript with a modified .libPaths() if the “library” directory exists
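The phrase “irrespective of CWD” is the crux. As a rough illustration (not Renv’s actual source), here is how a POSIX sh wrapper could resolve a script’s own directory from the path it receives via the invocation line, and then look for a sibling “library” directory. The function names script_dir and project_library are hypothetical.

```shell
# Hypothetical helpers illustrating CWD-independent library discovery.
# These are NOT Renv's actual source, only a sketch of the idea.

# Print the absolute, symlink-free directory containing script "$1".
script_dir()
{
    case "$1" in
    */*) _dir="${1%/*}" ;;     # strip the filename component
    *)   _dir=. ;;             # bare name: script is in the current directory
    esac
    [ -n "$_dir" ] || _dir=/   # "/script.R" leaves an empty directory part
    # Subshell, so the caller's working directory is left untouched
    ( cd "$_dir" && pwd -P )
}

# Print the "library" directory beside script "$1"; fail if there is none.
project_library()
{
    _sd=$( script_dir "$1" ) || return 1
    [ -d "$_sd/library" ] || return 1
    echo "$_sd/library"
}
```

The interpreter receives the script path as it was executed (./scriptname.R, a full path, or whatever cron used). Resolving the directory from that path gives the same answer no matter which working directory the script was launched from.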
But that’s not all.
From the above requirements, we now have:
Support truly executable scripts that do not have to be invoked as an argument to a wrapper/interpreter
The invocation line can bootstrap the execution environment
The invocation line is treated as a comment when the script is passed as an argument to a wrapper/interpreter
Move scripts without modification to another project directory for testing
Moving the script to another project directory will have it load the library contents from the new project
Have the project directory live on NFS/DFS so it can be accessed cluster-wide with ease via SGE or other scheduling software
Where the project directory lives (e.g., where the NFS mount is attached) is immaterial
Library Selection
Provide seamless migration of project dependencies to be provided locally instead of NFS
Allow virtual environment dependencies to supersede localized CI/CD pipelined dependencies
A project directory living on NFS/DFS is doomed to sacrifice performance when the I/O pipeline becomes saturated at either the sending or receiving end. You cannot rely on caching to save you from slow access times for executable code, because cached pages are commonly evicted to make room under user-land memory pressure. While it makes sense to use NFS to satisfy project dependencies at small scale, at larger scale you end up reloading dependencies under high-memory conditions because you can’t hold on to a filesystem cache. We call this “living on the wire”, and it puts tremendous load on the sync-store backend (your NAS, SAN, REST, or whatever). We commonly see system load averages of over 500 (10x higher than desired) on the NAS providing access to user home directories.
The problem then becomes how to package virtual environment contents for distribution to local disk. And once you solve that, you still need a way to seamlessly access the newly deployed localized binaries.
Renv can check for more than just a “library” directory and it checks in more than one place.
Previously we saw how Renv (without any command-line options) will look for a “library” directory in the same directory as the script. In that scenario, if a script were to examine .libPaths(), one would get something similar to:
Inside the project directory (aka virtual environment)
Inside the R distribution
Base R library
When you say:
library(name)
It will look first in the library.project7 directory in the script’s directory, then in the library.project7 directory that is distributed via the Continuous Integration (CI) and Continuous Deployment (CD) pipeline, and lastly in the base R library collection.
Moving the project library directory (library.project7 in this scenario) aside, or using a full pathname to the library directory, will cause only the localized (non-NFS, non-DFS) libraries to be loaded.
However, using a full path precludes you from potentially augmenting or overriding the pipelined binaries, should the need arise. Renv intelligently adjusts .libPaths() based on the existence of multiple “-l dir” option-arguments which can be relative or absolute.
For large projects that want to separate scripts into their own directories, such as “bin”, you can use paths relative to the directory the script lives in, without fear that the current working directory will affect the loading of the proper library collections.
#!/usr/bin/Renv -l ../library.project7 Rscript
Inspecting .libPaths() inside a script launched in this scenario (if all the below directories exist) would be:
INFORMATION
Despite the “../” prefix on the relative library reference, Renv knew to add a “library.project7” directory to “.libPaths()” if it existed in the same directory as the base R library.
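The search order described above can be sketched in sh as a function that prints candidate library directories, one per line, highest priority first. This is a hypothetical renv_libpaths, not Renv’s code. It assumes the following behavior:

- Each relative -l argument is resolved against the script’s directory.
- A same-named directory beside the base R library (for example /opt/R/3.3.1/lib64/R/library.project7) comes next.
- Absolute arguments are used as-is, with no sibling lookup.
- The base R library is always last.

How Renv interleaves several -l arguments is also an assumption here.

```shell
# Hypothetical sketch of the search order described above; NOT Renv's source.
# Usage: renv_libpaths SCRIPT_DIR BASE_R_LIBRARY [LIBDIR ...]
# Prints existing directories, one per line, highest priority first.
renv_libpaths()
{
    _sd="$1"
    _rlib="$2"
    shift 2
    [ $# -gt 0 ] || set -- library   # no -l given: default to "library"

    # 1. Project level: relative dirs are resolved against the script's
    #    directory (never the CWD); absolute dirs are used as-is.
    for _l in "$@"; do
        case "$_l" in
        /*) if [ -d "$_l" ]; then echo "$_l"; fi ;;
        *)  if [ -d "$_sd/$_l" ]; then echo "$_sd/$_l"; fi ;;
        esac
    done

    # 2. Pipelined (RPM) level: a same-named directory beside the base R
    #    library, e.g. .../lib64/R/library.project7. Skipped for absolute
    #    dirs, and skipped when it would just be the base library again.
    for _l in "$@"; do
        case "$_l" in /*) continue ;; esac
        _sib="${_rlib%/*}/${_l##*/}"
        if [ "$_sib" != "$_rlib" ] && [ -d "$_sib" ]; then
            echo "$_sib"
        fi
    done

    # 3. The base R library always comes last
    echo "$_rlib"
}
```

For the “bin” example above, renv_libpaths /path/to/project/bin /opt/R/3.3.1/lib64/R/library ../library.project7 would print the project’s library.project7 first, then /opt/R/3.3.1/lib64/R/library.project7, then /opt/R/3.3.1/lib64/R/library. Any directory that does not exist is skipped.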
If you choose to name your library directory “library”, then pipelining its compiled libraries into RPMs will require putting them into the base R library collection, which means stricter guidelines to prevent regressions against scripts running in the global context outside of virtual environments.
Renv orders the directories it finds specifically so that RPM-distributed binaries can be slipped into the search path while still allowing forward progress through divergence at the upper (project) layer. Because the library specified at the project level overrides everything, a project maintainer can make changes to the library, test code, and then, when the time comes, feed a snapshot of the virtual environment into the RPM pipeline for automated integration and deployment, bringing the RPM layer in sync with the project layer.
Interpreter Selection
Be able to use different versions of the desired interpreter (R-3.1.1 versus R-3.3.1 versus R-3.5.2 for example)
Just as /usr/bin/env allows you to select the interpreter based on the $PATH environment variable, Renv allows this and more for your R scripts.
A few key features:
You can use R or Rscript (including R-<version> and Rscript-<version>) interchangeably
You can create symbolic links to Renv to make referencing specific versions of R more convenient
When you install any of the R*-fraubsd RPMs, such as R331-fraubsd, a symbolic link is created:
/usr/bin/Renv-3.3.1 -> Renv
When invoked with the name Renv-3.3.1, Renv defaults to using the R-3.3.1 found in $PATH to interpret your script. This allows you to use as an invocation line:
#!/usr/bin/Renv-3.3.1
or
#!/usr/bin/Renv-3.3.1 -l library.project7
On the command line, you can also invoke Renv interactively:
Renv-3.3.1
Without the aid of the symlink:
Renv R-3.3.1
This method also works with direct references to NFS-based R distributions:
Renv /share/bin/R-3.3.1
IMPORTANT
Avoid this unless there is a specific reason to bypass $PATH lookup.
If changes are required to the way virtual environments are utilized, Renv is a great tool for centralizing that logic as an intermediary between the script and interpreter.
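Interpreter selection comes down to two rules:

- An explicit interpreter argument wins. This can be R, Rscript, a versioned name, or a full path.
- Otherwise, the version is taken from the name Renv was invoked as.

Below is a hypothetical sh sketch of those rules. The names is_interpreter and pick_interpreter are illustrative, and treating a bare “Renv” with no interpreter as an error is an assumption.

```shell
# Hypothetical sketch of interpreter selection; NOT Renv's source.

# Succeed if "$1" names an R interpreter: R, Rscript, R-<ver>, Rscript-<ver>,
# or a path to one of those (e.g. /share/bin/R-3.3.1).
is_interpreter()
{
    case "${1##*/}" in
    R|Rscript|R-[0-9]*|Rscript-[0-9]*) return 0 ;;
    esac
    return 1
}

# Usage: pick_interpreter "$0" "$1"
# An explicit interpreter argument wins; otherwise use the version encoded
# in the invocation name (Renv-3.3.1 -> R-3.3.1, later looked up in $PATH).
pick_interpreter()
{
    if is_interpreter "$2"; then
        echo "$2"
        return 0
    fi
    _name="${1##*/}"
    case "$_name" in
    Renv-*) echo "R-${_name#Renv-}" ;;
    *)      return 1 ;;  # assumed: plain "Renv" needs an explicit interpreter
    esac
}
```

This is why the “/usr/bin/Renv-3.3.1 -> Renv” symlink is enough: the program is identical, and only the name it was called by changes the default.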
Architecture Migration
Make it easy to convert library dependencies compiled on one architecture to another
Enable cross-platform testing
Produce the means to allow heterogenous cluster nodes between architectures allowing smooth/rolling migrations
Renv cannot help with these items. For that, we must look to the Virtual Comprehensive R (VCR) utility.
Taking an existing library directory that was compiled on one platform (say, CentOS 6) and trying to use it on another (say, CentOS 7) is usually difficult, but vcr aims to make it as painless, fast, and easy as possible.
Given a particular virtual environment library, the following command will record a “playlist”.
vcr list
If there is a “library” dir in the current working directory, this will produce a list of libraries required to build that environment.
If you need to list a different directory:
vcr list path/to/library
You will want to capture the output into a text file. While not strictly necessary, if problems occur, it is good to be able to modify the output with a text editor before trying again (for example, if software has changed location).
vcr list > reqs.txt
IMPORTANT
Use a more descriptive name than “reqs.txt”
You can then play this list on the target architecture (say, CentOS 7).
vcr play -d library.newarch reqs.txt
IMPORTANT
Suggest using something like “el6” or “el7” instead of “newarch”
If your library only contains items from CRAN, it will go smoothly. In such cases, you could even tie the process into a single command:
vcr list | vcr play -d library.new /dev/stdin
IMPORTANT
If your library collection contains libraries that come from GitHub or another non-CRAN source, and you did not build the library directory using vcr, you will have to modify the output from “vcr list”. Commands such as “vcr install”, when given a URL to a CRAN-style package, record that URL so that a later “vcr list” accurately describes where the software comes from.
A script that is using Renv to access a “library” directory might have an invocation line of:
#!/usr/bin/Renv-3.3.1
To make this script load the “library.new” directory containing CentOS 7 binaries instead of the “library” directory containing CentOS 6 binaries (for example):
Renv-3.3.1 -l library.new script [args]
This enables efficient cross-platform testing, but it is not designed to be the end of the road. Heterogeneous operation, wherein your scripts function properly on both CentOS 6 and 7, requires the ability to access an architecture-specific library without referencing two different directory names.
That is why Renv also checks for a named directory in the base R directory. Suppose there is no “library.xxx” directory next to a script that has an invocation line such as:
#!/usr/bin/Renv-3.3.1 -l library.xxx
When your script runs, the directory it picks up will be either:
/opt/R/3.3.1/lib64/R/library.xxx
or
/software/R-3.3.1/lib64/R/library.xxx
Depending on what “R-3.3.1” resolves to (be it a /usr/bin/R-3.3.1 symlink pointing to /opt/R/3.3.1/bin/R or, instead, a /software/bin/R-3.3.1 symlink pointing to /software/R-3.3.1/bin/R).
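Which of those two directories gets used falls out of $PATH and symlink resolution. Below is a hypothetical sh sketch (r_lib_root is an illustrative name) that follows the same chain. It assumes the <prefix>/bin/R and <prefix>/lib64/R layout shown in both examples, and a readlink that supports -f (GNU coreutils, as on CentOS).

```shell
# Hypothetical sketch; NOT Renv's source. ASSUMES the layout from the two
# examples above: <prefix>/bin/R with libraries under <prefix>/lib64/R/.
r_lib_root()
{
    _bin=$( command -v "$1" ) || return 1      # $PATH lookup, as env(1) does
    _real=$( readlink -f "$_bin" ) || return 1 # follow every symlink
    _prefix="${_real%/bin/*}"                  # /opt/R/3.3.1/bin/R -> /opt/R/3.3.1
    [ "$_prefix" != "$_real" ] || return 1     # not a <prefix>/bin/<name> layout
    echo "$_prefix/lib64/R"
}
```

A named collection would then be "$(r_lib_root R-3.3.1)/library.xxx". Asking R directly (for example, R-3.3.1 --slave --no-restore -e 'cat(R.home())') avoids the layout assumption, at the cost of starting R.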
IMPORTANT
The ability to pass an absolute path for the library location to Renv allows you to ignore a local directory in the virtual environment that may be built for the wrong architecture because it is shared over NFS. For example, a script may be running on CentOS 7 while the library directory was compiled on CentOS 6.
In this scenario, a script hosted on NFS and run on various platforms will pick up the right binaries loaded onto the local disk. This not only solves the problem of targeting platform-specific binaries but also improves performance by reducing reliance on NFS.
Pipelining
Prevent build errors, simple mistakes, and adhoc pathname dependencies in scripts
Does not require devtools to build the virtual environment (support R versions older than 3.0.2)
Pipelining a virtual environment so that you get CentOS 6- and CentOS 7-specific binaries (respectively) in /opt/R/<version>/lib64/R/library.<name> is not hard.
Run:
vcr play -d library.preflight project_reqs.txt
Where “project_reqs.txt” is the output of “vcr list” on the library you want pipelined and “library.preflight” is just a temporary directory name that we will later delete after testing.
IMPORTANT
Make sure library.preflight doesn’t already exist at the start of your preflight test.
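The “make sure it doesn’t already exist” rule is easy to enforce mechanically. Below is a hypothetical sh wrapper around the vcr play command shown above. The function name preflight and the VCR variable (used to select a versioned build such as vcr-3.3.1) are illustrative, not vcr features.

```shell
# Hypothetical preflight helper (not part of vcr). Set VCR to pick a specific
# build, e.g. VCR=vcr-3.3.1; it defaults to "vcr".
preflight()
{
    _reqs="$1"
    _dir="${2:-library.preflight}"
    # Refuse to reuse a stale directory: leftovers could mask missing deps
    if [ -e "$_dir" ]; then
        echo "preflight: $_dir already exists; remove it first" >&2
        return 1
    fi
    # Play the requirements into the scratch library
    if ! ${VCR:-vcr} play -d "$_dir" "$_reqs"; then
        echo "preflight: FAILED for $_reqs (inspect $_dir)" >&2
        return 1
    fi
    echo "preflight: OK for $_reqs; test against $_dir, then delete it"
}
```

Deletion is deliberately left to you, since the scratch library is still needed for testing after vcr finishes.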
When you can successfully complete a preflight run of your project requirements, you can proceed to pipelining and automation.
The “project_reqs.txt” file should be stored in git so that Jenkins can automate integration efforts whenever the file is modified. You should also create an automation user (for example, named “rbot”) for all R-related pipelining tasks.
You should check the requirements file into a git repository where a “git tag” operation can version the reqs file against the code that relies on it. For example, if you are pipelining a virtual environment for code that lives in the foobar git repository, the requirements file should live there.
As with all good things, the most important part may be the naming of your library collection.
Your reqs file, for the purpose of pipelining, should be named:
*_<r-version>.lock
WARNING
The “.lock” suffix is required by the pipeline. The “<r-version>” value should be “3.3.1” for example.
The prefix (what comes before “_<r-version>.lock”) dictates neither the name of the library collection in “/opt/R/<r-version>/lib64/R/” nor the prefix of RPMs produced by the pipeline.
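Because the suffix carries meaning, a quick sanity check before committing can save a failed pipeline run. Below is a hypothetical sh sketch of the naming rule; lock_r_version is an illustrative name, not the pipeline’s code. It prints the R version encoded in a lock file name, or fails if the name does not follow the rule.

```shell
# Hypothetical sketch of the naming rule above; NOT the pipeline's code.
# Prints the <r-version> from "<prefix>_<r-version>.lock", else fails.
lock_r_version()
{
    _b="${1##*/}"                 # ignore any directory part
    case "$_b" in
    *_*.lock) ;;                  # must look like <prefix>_<something>.lock
    *) return 1 ;;
    esac
    _v="${_b%.lock}"              # drop the required ".lock" suffix
    _v="${_v##*_}"                # keep what follows the last underscore
    case "$_v" in
    *[!0-9.]*|.*|*.|*..*) return 1 ;;  # digits and single dots only
    *.*.*) echo "$_v" ;;               # e.g. 3.3.1
    *) return 1 ;;
    esac
}
```

The prefix may itself contain underscores (for example, my_proj_3.1.1.lock yields 3.1.1), since only the text after the last underscore is taken as the version.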
The naming of the RPMs is governed in one of a couple different ways.
However, before we talk about naming, I need to talk about the pipeline framework.
The “pkgcenter-newR” git repository is a skeleton repo that exists as a template for each individual pipeline. Each virtual environment deployment pipeline should be a copy of this git repository at the onset. https://FrauBSD.org/pkgcenter-newR
IMPORTANT
Don’t fork the repo unless you intend to upstream changes to the framework that affect future deployments from the pkgcenter-newR skeleton repo. Instead, use the below process for copying the framework into a new repo.
First, create the new repo. This is where the naming discussed earlier comes into play.
By default (this behavior can be overridden), the name of the repository minus the “.git” suffix and “pkgcenter-” prefix becomes the prefix of the RPMs produced. That is to say, the git repository named “pkgcenter-newR” will generate RPMs named “newR<version>-*.rpm” (for example, “newR331-*.rpm” for R-3.3.1) that install R libraries to “/opt/R/<version>/lib64/R/library.newR”.
Let’s say we want to package a virtual environment for a project named “project7” which will have RPMs named project7R* (e.g., project7R311-*.rpm, project7R331-*.rpm, etc.)
IMPORTANT
It is highly recommended that, in addition to beginning with “pkgcenter-”, the new repo name end with “R”, because the R version will be appended to the repo name. A git repo named “pkgcenter-project7R” will produce RPMs named “project7R331-*” for R-3.3.1 compiles; “project7R311-*” for R-3.1.1 compiles; etc.
You can override the default behavior of taking the git repository name as the prefix for RPMs. When we copy the framework into the new git repo, we will configure the build.conf file, which has an option for hard-coding the prefix of RPM names.
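The default naming rule and its override can be summarized in a hypothetical sh function. rpm_prefix is not part of the pkgcenter framework; it only restates the rules above. It strips any path, the “.git” suffix, and the “pkgcenter-” prefix, then appends the R version with its dots removed. An optional third argument plays the role of a hard-coded prefix.

```shell
# Hypothetical sketch of the RPM naming rules described above.
# Usage: rpm_prefix <repo-name-or-url> <r-version> [hard-coded-prefix]
rpm_prefix()
{
    _p="${3:-}"
    if [ -z "$_p" ]; then
        _p="${1##*/}"           # basename of a path or URL
        _p="${_p%.git}"         # drop the ".git" suffix
        _p="${_p#pkgcenter-}"   # drop the "pkgcenter-" prefix
    fi
    # The R version, minus its dots, is appended: 3.3.1 -> 331
    echo "$_p$( echo "$2" | tr -d . )"
}
```

For example, rpm_prefix pkgcenter-project7R 3.3.1 prints project7R331, and passing R as the third argument (mirroring a hard-coded prefix of R) prints R331.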
Steps to copy the pkgcenter-newR build framework into a new git repository named pkgcenter-project7R:
IMPORTANT
Always put a trailing slash (/) after each pathname. In rsync parlance, the trailing slash should be pronounced “contents-of” to remind you that if you forget it, you may end up with a copy of the src inside the dest argument (and if you are using the --delete flag with rsync, this could be fatal, or at least hazardous to your health).
Inside the ./depend/jenkins directory is a file named build.conf which contains all the details for building RPMs in an automated fashion from the output of “vcr list” (which is a pip-like “.lock” file).
For our mythical “project7” secret project, we need to put together some libraries.
Our secret project depends on the following:
bender version 0.1.0
blender (latest version)
catboost (latest version)
The first two are available from CRAN and the last is available from GitHub.
Using the previously discussed techniques for managing a virtual environment library, let’s get ready for a preflight test. Before reaching preflight, you need a runway, which means installing the software for an initial test.
INFORMATION
blender requires vegan, which requires permute. The latest version of vegan requires a newer version of R, so we specify the last version that works with R-3.3.1 (just one revision older than the latest). The vcr utility does not automatically resolve dependencies for you; you must install dependencies first. Below, you will see that we provide permute and vegan on the same command line before blender.
IMPORTANT
While vcr can take any URL, the name of the tarball file should be in CRAN format (<name>_<version>.tar.gz). This is important for pipelining into binaries so that the CI/CD pipeline can determine the version of the package from the filename during the build process.
For example, the filename of the catboost release tarball from GitHub (catboost-R-Linux-0.15.2.tgz) will not work with the CI/CD pipeline, which expects CRAN-like names.
Download the tarball, change the name to the format “<name>_<version>.tar.gz”, and upload it to Artifactory or a similar HTTP-based yum repository.
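Below is a sketch of the kind of check the pipeline needs (not its actual code). The hypothetical sh function accepts a path or URL and prints the package name and version only if the filename follows <name>_<version>.tar.gz. The name and version patterns approximate CRAN’s conventions and are an assumption.

```shell
# Hypothetical sketch; the patterns approximate CRAN naming conventions.
# Prints "name version" for a CRAN-style <name>_<version>.tar.gz, else fails.
cran_tarball()
{
    _f="${1##*/}"   # ignore any leading URL or directory
    # grep . turns "sed printed nothing" into a non-zero exit status
    echo "$_f" |
        sed -nE 's/^([A-Za-z][A-Za-z0-9.]*[A-Za-z0-9])_([0-9]+([.-][0-9]+)+)\.tar\.gz$/\1 \2/p' |
        grep .
}
```

With this check, catboost-R-Linux-0.15.2.tgz is rejected. Once renamed to catboost_0.15.2.tar.gz, it yields “catboost 0.15.2”.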
$ vcr-3.3.1 install bender==0.1.0
==> Download
curl -Lo /home/dteske/vcran/bender_0.1.0.tar.gz https://cran.r-project.org/web/packages/bender/../../../src/contrib/bender_0.1.0.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 992 0 992 0 0 426 0 --:--:-- 0:00:02 --:--:-- 426
curl -Lo /home/dteske/vcran/bender_0.1.0.tar.gz https://cran.r-project.org/web/packages/bender/../../../src/contrib/Archive/bender/bender_0.1.0.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3751 100 3751 0 0 4600 0 --:--:-- --:--:-- --:--:-- 4596
==> Check dependencies
command R-3.3.1 --slave --no-restore -e 'cat(.libPaths(.Library))'
tar zxfO /home/dteske/vcran/bender_0.1.0.tar.gz bender/DESCRIPTION
All good
==> bender [1/1]
R-3.3.1 --slave --no-restore --args nextArg--no-test-loadnextArg-lnextArglibrarynextArg/home/dteske/vcran/bender_0.1.0.tar.gz
* installing *source* package ‘bender’ ...
** package ‘bender’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
* DONE (bender)
==> SUCCESS
$ vcr-3.3.1 install permute vegan==2.5-4 blender
==> Download
curl -sLo- https://cran.r-project.org/web/packages/permute/index.html
curl -Lo /home/dteske/vcran/permute_0.9-5.tar.gz https://cran.r-project.org/web/packages/permute/../../../src/contrib/permute_0.9-5.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 390k 100 390k 0 0 245k 0 0:00:01 0:00:01 --:--:-- 245k
curl -Lo /home/dteske/vcran/vegan_2.5-4.tar.gz https://cran.r-project.org/web/packages/vegan/../../../src/contrib/vegan_2.5-4.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 992 0 992 0 0 1228 0 --:--:-- --:--:-- --:--:-- 1227
curl -Lo /home/dteske/vcran/vegan_2.5-4.tar.gz https://cran.r-project.org/web/packages/vegan/../../../src/contrib/Archive/vegan/vegan_2.5-4.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1989k 100 1989k 0 0 1022k 0 0:00:01 0:00:01 --:--:-- 1022k
curl -sLo- https://cran.r-project.org/web/packages/blender/index.html
curl -Lo /home/dteske/vcran/blender_0.1.2.tar.gz https://cran.r-project.org/web/packages/blender/../../../src/contrib/blender_0.1.2.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1300k 100 1300k 0 0 700k 0 0:00:01 0:00:01 --:--:-- 700k
==> Check dependencies
command R-3.3.1 --slave --no-restore -e 'cat(.libPaths(.Library))'
tar zxfO /home/dteske/vcran/permute_0.9-5.tar.gz permute/DESCRIPTION
tar zxfO /home/dteske/vcran/vegan_2.5-4.tar.gz vegan/DESCRIPTION
tar zxfO /home/dteske/vcran/blender_0.1.2.tar.gz blender/DESCRIPTION
All good
==> permute [1/3]
R-3.3.1 --slave --no-restore --args nextArg--no-test-loadnextArg-lnextArglibrarynextArg/home/dteske/vcran/permute_0.9-5.tar.gz
* installing *source* package ‘permute’ ...
** package ‘permute’ successfully unpacked and MD5 sums checked
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
* DONE (permute)
==> vegan [2/3]
R-3.3.1 --slave --no-restore --args nextArg--no-test-loadnextArg-lnextArglibrarynextArg/home/dteske/vcran/vegan_2.5-4.tar.gz
* installing *source* package ‘vegan’ ...
** package ‘vegan’ successfully unpacked and MD5 sums checked
** libs
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c data2hill.c -o data2hill.o
gfortran -fpic -g -O2 -c decorana.f -o decorana.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c getF.c -o getF.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c goffactor.c -o goffactor.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c init.c -o init.o
gfortran -fpic -g -O2 -c monoMDS.f -o monoMDS.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c nestedness.c -o nestedness.o
gfortran -fpic -g -O2 -c ordering.f -o ordering.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c pnpoly.c -o pnpoly.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c stepacross.c -o stepacross.o
gcc -std=gnu99 -I/opt/R/3.3.1/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c vegdist.c -o vegdist.o
gcc -std=gnu99 -shared -L/opt/R/3.3.1/lib64/R/lib -L/usr/local/lib64 -o vegan.so data2hill.o decorana.o getF.o goffactor.o init.o monoMDS.o nestedness.o ordering.o pnpoly.o stepacross.o vegdist.o -L/opt/R/3.3.1/lib64/R/lib -lRlapack -L/opt/R/3.3.1/lib64/R/lib -lRblas -lgfortran -lm -lquadmath -lgfortran -lm -lquadmath -L/opt/R/3.3.1/lib64/R/lib -lR
installing to /data/homes/raidb/home/dteske/src/github/fraubsd/pkgcenter-project7R/library/vegan/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
* DONE (vegan)
==> blender [3/3]
R-3.3.1 --slave --no-restore --args nextArg--no-test-loadnextArg-lnextArglibrarynextArg/home/dteske/vcran/blender_0.1.2.tar.gz
* installing *source* package ‘blender’ ...
** package ‘blender’ successfully unpacked and MD5 sums checked
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
* DONE (blender)
==> SUCCESS
$ vcr-3.3.1 install https://github.com/catboost/catboost/releases/download/v0.15.2/catboost-R-Linux-0.15.2.tgz
==> Download
curl -Lo /home/dteske/vcran/catboost-R-Linux-0.15.2.tgz https://github.com/catboost/catboost/releases/download/v0.15.2/catboost-R-Linux-0.15.2.tgz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 617 0 617 0 0 1485 0 --:--:-- --:--:-- --:--:-- 1483
100 57.5M 100 57.5M 0 0 5384k 0 0:00:10 0:00:10 --:--:-- 6028k
tar tzf /home/dteske/vcran/catboost-R-Linux-0.15.2.tgz
==> Check dependencies
command R-3.3.1 --slave --no-restore -e 'cat(.libPaths(.Library))'
tar zxfO /home/dteske/vcran/catboost-R-Linux-0.15.2.tgz catboost/DESCRIPTION
All good
==> catboost [1/1]
R-3.3.1 --slave --no-restore --args nextArg--no-test-loadnextArg-lnextArglibrarynextArg/home/dteske/vcran/catboost-R-Linux-0.15.2.tgz
* installing *source* package ‘catboost’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
* DONE (catboost)
==> SUCCESS
Upon completion, we should be able to generate the requirements file and then toss the test library.
vcr list | tee project7_3.3.1.lock
rm -Rf library
$ vcr list | tee project7_3.3.1.lock
bender==0.1.0
permute==0.9-5
vegan==2.5-4
blender==0.1.2
-u https://github.com/catboost/catboost/releases/download/v0.15.2/catboost-R-Linux-0.15.2.tgz
$ rm -Rf library
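The lock format above is line-oriented: name==version for CRAN packages and -u URL for everything else. Below is a hypothetical sh reader (split_lock is an illustrative name, and the format is inferred from this one example) that separates the two kinds of entries. This mirrors how the pipeline packages URL entries into their own RPMs.

```shell
# Hypothetical sketch; the lock format is inferred from the example above.
# Prints "cran <name> <version>" or "url <url>" for each entry.
split_lock()
{
    while IFS= read -r _line || [ -n "$_line" ]; do
        case "$_line" in
        '')     ;;                                  # ignore blank lines
        '-u '*) echo "url ${_line#-u }" ;;          # non-CRAN source
        *==*)   echo "cran ${_line%%==*} ${_line#*==}" ;;
        *)      echo "split_lock: unrecognized: $_line" >&2; return 1 ;;
        esac
    done < "$1"
}
```

Piping split_lock project7_3.3.1.lock through grep '^url ' would list just the catboost entry.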
It is recommended that you then stash the secret project7_3.3.1.lock file in ~/src/configs/proj/r_reqs/ (potentially part of a git repo):
mv project7_3.3.1.lock ~/src/configs/proj/r_reqs/
Configure depend/jenkins/build.conf with details such as (actual excerpt; all defaults):
#
# RPM name prefix
#
# The default behavior is to take the repo name and remove `pkgcenter-'.
# A value of `R' for example will produce:
# R311-{vcran,gsl,etc.}
# R331-{vcran,catboost,etc.}
# etc.
#
# You can override the prefix and hard-code a value by setting RPMPREFIX.
#
#?RPMPREFIX=R
#
# Location of checked-out config repo
# NB: If unset or NULL, download.sh cannot `git pull' to update the repo
# NB: Ignored/Unused if download.sh is given `-n' to prevent `git pull'
#
CONFIG_REPO=~/src/configs
#
# Git remote settings for cloning config repo (if required)
# NB: If $CONFIG_REPO dir does not exist, download.sh will clone $CONFIG_REMOTE
# NB: If unset or NULL, download.sh will error out unless either `-n' given
# (to prevent `git pull' in $CONFIG_REPO) or $CONFIG_REPO exists
#
CONFIG_REMOTE=git@github.com:SomeProj/configs.git
#
# Location of lock files within config repo
# NB: Only used in this file
#
R_REQS=proj/r_reqs
Note the optional RPMPREFIX override, which prevents the git repository name from influencing RPM names, should that be desired.
For each target distribution of R, you will need a section like (also in depend/jenkins/build.conf; actual excerpt; all defaults; changed in following steps):
The default build.conf contains a section for R-3.1.1 (shown above) and R-3.3.1.
IMPORTANT
We are only going to automate a single build (for R-3.3.1) in this demo, and the builds are sequentially numbered (BUILD1, BUILD2, etc.), so keep only the first section and delete the sections that follow it.
IMPORTANT
You must make “BUILD<n>_NAME” and “BUILD<n>_PATH” end in “_<r-version>.lock” where “<r-version>” matches the R version (for example, “3.3.1” or “3.1.1” or other).
Above is the default state of the file, but for our super secret skunk-works project, “project7”, we will be changing BUILD1_NAME to “project7_3.3.1.lock” for the R-3.3.1 build.
I am not actually going to commit this file to any repo; instead, I faked it. With the file placed where it belongs but not committed, the framework can do everything you expect without prematurely polluting the config repo.
Earlier, we moved (but did not yet “git add”) the “project7_3.3.1.lock” file to ~/src/configs/proj/r_reqs/. If all goes well, we can add/commit/push the file as a sign of reaching the first pipelined version of the virtual environment.
All that is left to do to turn your “.lock” file into RPMs is the following (in the ./depend/jenkins/ directory of your “pkgcenter-project7R” git repository):
vi ~/.netrc
# Configure credentials for artifactory as-documented in build.conf
./build_fraubsd.sh -a
If your Artifactory credentials are correct, you can then turn around and say:
Notice how the URL elements of the “.lock” file were split out into their own RPMs. This is because GitHub projects and CRAN packages are released on independent cycles, and as such they are pipelined separately to preserve that relationship.
If for some reason your Artifactory credentials are not working, you can expect to find local copies of the RPMs in your home directory under ~/jenkins/.
Notice how our pipeline only produced rhel6-x86_64 RPMs. To get the rhel7-x86_64 RPMs you have to compile on CentOS 7.
There is a one-time task at the start of each project, which is to tie the vcran package and ancillary external packages together using a meta package. That is covered in the next section.
Combining
The above process creates the RPMs, but for convenience you can optionally create a “meta-RPM” that installs them for you.
Following the above examples, for our secret “project7” virtual environment that we pipelined into RPMs, we got 2 RPMs:
project7R331-vcran-1.0-1.x86_64.rpm
Contains all the CRAN libraries (bender, permute, vegan, blender)
project7R331-catboost-0.15.1-1.x86_64.rpm
This comes from GitHub and we rename the tarball and put it in Artifactory
We want to create a “project7R331-fraubsd” RPM that installs the above two RPMs.
In the pkgcenter-project7R repository, starting from the TLD of the git repo:
cd redhat/rhel6-x86_64
../create.sh Applications/Engineering/project7R331-fraubsd
cd Applications/Engineering/project7R331-fraubsd
$ cd redhat/rhel6-x86_64
$ ../create.sh Applications/Engineering/project7R331-fraubsd
===> Creating `Applications/Engineering/project7R331-fraubsd'
Package NAME: project7R331-fraubsd
Package GROUP: Applications/Engineering
Creating package repository directory: Applications/Engineering/project7R331-fraubsd
Creating package `src' directory...
Copying `skel' structure into package repository...
./
./Makefile
./pkgcenter.conf
Generating `Applications/Engineering/project7R331-fraubsd/SPECFILE' from `../Mk/template.spec'...
Done.
$ cd !$
$ cd Applications/Engineering/project7R331-fraubsd
Edit the SPECFILE
Change this line near the top, in the “HEADER” section (replace “First Last” and “flast@” with your own name and address):
Packager: First Last <flast@fraubsd.org>
Still in the “HEADER” section, in the blank space after:
BuildRoot: %{_tmppath}/src
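but before the next section (“CONFIGURATION”), add an unversioned “Requires:” line for each RPM the meta package should install (“Requires: project7R331-vcran” and “Requires: project7R331-catboost”).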
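Each addition is an unversioned “Requires:” line that names one of the RPMs built above. You can derive those names from the RPM file names instead of typing them by hand. The helper “rpm_requires_lines” below is a hypothetical sketch, not part of pkgcenter. It assumes the usual “<name>-<version>-<release>.<arch>.rpm” file naming. “R331-fraubsd” still needs its own line, as the note below explains.

```shell
# Sketch: print an unversioned "Requires:" line for each RPM file name given,
# assuming <name>-<version>-<release>.<arch>.rpm naming (the name may itself
# contain "-", so only the last two "-" separated fields are stripped).
rpm_requires_lines() {
	for rpm in "$@"; do
		# Strip "-<version>-<release>.<arch>.rpm" from the base name.
		name=$(basename "$rpm" | sed -E 's/-[^-]+-[^-]+\.[^.]+\.rpm$//')
		printf 'Requires: %s\n' "$name"
	done
}

rpm_requires_lines project7R331-vcran-1.0-1.x86_64.rpm \
    project7R331-catboost-0.15.1-1.x86_64.rpm
# Requires: project7R331-vcran
# Requires: project7R331-catboost
```

If you have the RPM files themselves, “rpm -qp --queryformat '%{NAME}\n' file.rpm” is the more robust choice. It reads the name from the package header, so it does not depend on how the file is named.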
IMPORTANT
Don’t forget to add “Requires: R331-fraubsd” (for R-3.3.1). It provides “/opt/R/3.3.1”, all the CRAN libraries at the specific versions required, “/usr/bin/R-3.3.1”, and “/usr/bin/Rscript-3.3.1”. It also brings in Renv for you.
ASIDE: You can use “vcr diff /share/R-3.3.1 /opt/R/3.3.1” to confirm that /opt/R/3.3.1 is a properly localized copy of a shared (or other) R installation, for example /share/R-3.3.1/.
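How “vcr diff” works internally is not covered here. For a rough, independent cross-check, note that every installed R package directory contains a DESCRIPTION file with “Package:” and “Version:” fields. Two library trees can therefore be compared on those fields alone. The sketch below uses hypothetical function names. Pass it the library directories themselves.

```shell
# Sketch: list "<package> <version>" for every installed package in an R
# library directory by reading each package's DESCRIPTION file.
r_lib_versions() {
	for desc in "$1"/*/DESCRIPTION; do
		[ -f "$desc" ] || continue   # skip non-packages and an unmatched glob
		# DESCRIPTION is DCF; Package: and Version: are single-line fields.
		awk -F': *' '
			$1 == "Package" { pkg = $2 }
			$1 == "Version" { ver = $2 }
			END { if (pkg != "") print pkg, ver }
		' "$desc"
	done | sort
}

# Compare two libraries; prints nothing and returns 0 when they match.
r_lib_diff() {
	tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
	r_lib_versions "$1" > "$tmp_a"
	r_lib_versions "$2" > "$tmp_b"
	diff "$tmp_a" "$tmp_b"
	status=$?
	rm -f "$tmp_a" "$tmp_b"
	return $status
}
```

A possible invocation is “r_lib_diff /share/R-3.3.1/lib64/R/library /opt/R/3.3.1/lib64/R/library”. Both library paths are assumptions; adjust them to wherever each installation keeps its library.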
Change this line at the bottom, after “%changelog” (again replacing “First Last” and “flast@”):
* Sun Jul 14 2019 First Last <flast@fraubsd.org> 1.0-1
Save the SPECFILE and exit your editor.
Remove the “src” directory because this is a meta package and we don’t need it.
rmdir src
Make sure the “src” directory stays deleted by editing pkgcenter.conf and adding it (as $SRCDIR) to the list of directories that are cleaned when you, or a script, run “make distclean”:
#
# Directories to create before (and clean up after) creating the package.
# NOTE: Be careful to list sub-directories in depth-first order.
#
DIRS="
# Directory
$SRCDIR
"
Do a test compile of our meta RPM before you upload, commit, tag, and push your changes.
You should now have a file named “project7R331-fraubsd-1.0-1.noarch.rpm” in the current working directory. That is usually enough to tell me that I made no mistakes and can proceed to upload everything.
mv project7R331-fraubsd-1.0-1.noarch.rpm ~/jenkins/rhel6-x86_64/
make distclean
ls # Pedantic
make autoimport
git push origin master
$ mv project7R331-fraubsd-1.0-1.noarch.rpm ~/jenkins/rhel6-x86_64/
$ make distclean
rm -f .rpmmacros
rm -Rf tmp
Reading SYMLINKS from ./pkgcenter.conf...
Reading DIRS from ./pkgcenter.conf...
rmdir ./src
rm -f .dirs_created
rm -f *.rpm
$ ls
Makefile SPECFILE pkgcenter.conf
$ make autoimport
Copying dependencies...
mkdir -p ./src
Reading DEPEND from ./pkgcenter.conf...
make[1]: Entering directory `/home/dteske/src/github/fraubsd/pkgcenter-project7R/redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd'
rm -f .rpmmacros
rm -Rf tmp
Reading SYMLINKS from ./pkgcenter.conf...
Reading DIRS from ./pkgcenter.conf...
rmdir ./src
rm -f .dirs_created
make[1]: Leaving directory `/home/dteske/src/github/fraubsd/pkgcenter-project7R/redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd'
make[1]: Entering directory `/home/dteske/src/github/fraubsd/pkgcenter-project7R/redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd'
rm -f .rpmmacros
rm -Rf tmp
Reading SYMLINKS from ./pkgcenter.conf...
Reading DIRS from ./pkgcenter.conf...
rm -f .dirs_created
rm -f *.rpm
make[1]: Leaving directory `/home/dteske/src/github/fraubsd/pkgcenter-project7R/redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd'
git add -v .
Commit message:
Autoimport by dteske on r-dev.lan
git commit -m "$MESSAGE"
[master 12f10a3] Autoimport by dteske on r-dev.lan
Committer: Devin Teske <dteske@r-dev.lan>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:
git config --global user.name "Your Name"
git config --global user.email you@example.com
After doing this, you may fix the identity used for this commit with:
git commit --amend --reset-author
3 files changed, 343 insertions(+)
create mode 100644 redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd/Makefile
create mode 100644 redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd/SPECFILE
create mode 100644 redhat/rhel6-x86_64/Applications/Engineering/project7R331-fraubsd/pkgcenter.conf
$ git push origin master
Counting objects: 14, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (9/9), 2.15 KiB | 0 bytes/s, done.
Total 9 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To git@github.com:FrauBSD/pkgcenter-project7R.git
1e4f530..563f421 master -> master
IMPORTANT
If you get the warning “Your name and email address were configured automatically”, it means you have not run the “.git-hooks/install.sh” script at the top of the git repo. That script was copied in when you rsync’d the pkgcenter-newR skeleton repo into your new repo.
The project7R331-fraubsd RPM we created has no versioned “Requires” statements in its SPECFILE. As the “.lock” file (the “vcr list” output for your virtual environment) changes, “yum upgrade” will therefore keep upgrading the components that are already installed. The only change that requires modifying the “*-fraubsd” RPM SPECFILE is adding a new external (non-CRAN) library, meaning one that uses “-u URL” in the “.lock” file. In that case, add the new dependency to the SPECFILE.
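You can check whether a “.lock” update needs a SPECFILE edit by listing the external lines that appear in the new file but not in the old one. This relies on the same assumptions as before, which remain unconfirmed because the format is not shown here: one dependency per line, with external entries being the lines that contain a “-u URL” option. The function name is hypothetical. A changed URL for a package that is already required, such as a version bump, also shows up in the output, so review it by eye.

```shell
# Sketch: print external (-u URL) lock entries present in NEW but not in OLD.
# Any output is a candidate new external dependency for the *-fraubsd SPECFILE;
# an updated URL for an already-required package is listed too.
new_external_entries() {
	old=$1 new=$2
	tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
	grep -E '(^|[[:space:]])-u[[:space:]]' "$old" | sort -u > "$tmp_old"
	grep -E '(^|[[:space:]])-u[[:space:]]' "$new" | sort -u > "$tmp_new"
	comm -13 "$tmp_old" "$tmp_new"   # lines only in the new lock file
	rm -f "$tmp_old" "$tmp_new"
}
```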
Conclusion
With our “project7” RPMs installed, providing the directory “/opt/R/3.3.1/lib64/R/library.project7R”, we can now use Renv with the “-l library.project7R” command-line argument.
IMPORTANT
Passing a script as an argument on the command line, instead of executing it directly, causes the invocation line at the top of the script to be ignored. That may be exactly what you want in some cases but not in others. Caveat emptor.
Launching an interactive R session in the virtual environment from the command line:
Renv-3.3.1 -l library.project7R
$ Renv-3.3.1 -l library.project7R
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(blender)
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.5-4
>
or
Renv -l library.project7R R-3.3.1
$ Renv -l library.project7R R-3.3.1
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(blender)
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.5-4
>
As discussed in earlier sections, to override the RPM-populated “library.project7R” directory that Renv picks up, just create a “library.project7R” directory in the same directory as the scripts that require it. Renv will still pick up the RPM-based one, but the local directory lets you override it.
If changes are needed, they can be tested in a local override and then merged back into the “library.project7R” directory by way of the above pipeline.
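One way to start such a local override is to copy a single package out of the RPM-provided library into a “library.project7R” directory next to the script. You can then modify or replace that copy. The sketch below is hypothetical: the helper name is made up, and the default source path is the one named in the Conclusion.

```shell
# Hypothetical helper: create a local library.project7R next to a script and
# copy one package into it from the RPM-provided library for testing.
seed_override() {
	script=$1 pkg=$2
	src=${3:-/opt/R/3.3.1/lib64/R/library.project7R}
	dest=$(dirname "$script")/library.project7R
	[ -d "$src/$pkg" ] || { echo "$src/$pkg: no such package" >&2; return 1; }
	mkdir -p "$dest" || return 1
	cp -Rp "$src/$pkg" "$dest/"   # preserve modes and timestamps
}
```

To test a different version instead, you can install it straight into the override with “R-3.3.1 CMD INSTALL -l library.project7R <tarball>”. Once it works, feed the change back through the “.lock” file and the pipeline.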
BONUS: .Renv profile
Renv will automatically source a file named “.Renv” if it exists in the same directory as a script.
The “-p file” argument to Renv can override this behavior to source a specific file instead.