Rstats/3.3 and Rstats/3.4 updates: dealing with OpenMPI and Infiniband library concerns.

Dear CRMDA cluster users,

During the past 2 months, some of us have seen this MPI warning from parallel R programs:

An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

We have wrestled with this, and today I've decided what to do. The CRMDA modules for Rstats/3.3 and Rstats/3.4 will prevent the OpenMPI (parallel computing) framework from trying to access the Infiniband network devices. That makes the warning go away. Ethernet is slower than Infiniband, so this is not a decision taken lightly.

The CRMDA R module stanza should "just work" with either

module purge 
module load legacy 
module load emacs 
module use /panfs/pfs.local/work/crmda/tools/modules 
module load Rstats/3.3

or

module purge 
module load legacy 
module load emacs 
module use /panfs/pfs.local/work/crmda/tools/modules 
module load Rstats/3.4

How is this done?

I've rebuilt openmpi-1.10.7, which is now also in our module collection, so I am able to insert the special configuration described below.

R Packages

The system-wide package list, which is kept up to date, is the same for Rstats/3.3 and Rstats/3.4. A full list is included at the end of this announcement.

If you find that updates cause your applications to break, you are allowed to install older versions of R packages in your own library, ~/R.
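Here is a rough shell sketch of installing an older release from the CRAN archive. It assumes the package is on CRAN, which keeps old source tarballs under https://cran.r-project.org/src/contrib/Archive/. The helper name cran_archive_url is a placeholder for illustration and is not part of the CRMDA setup, and so are the package name somepkg and the version 1.0-1.

```shell
#!/bin/sh
# Sketch: build the CRAN archive URL for an older source release of a package.
# Assumes the package is on CRAN; cran_archive_url is a hypothetical helper.
cran_archive_url() {
    pkg="$1"      # package name, e.g. the placeholder "somepkg"
    version="$2"  # version string, e.g. the placeholder "1.0-1"
    printf 'https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz\n' \
        "$pkg" "$pkg" "$version"
}

# Typical use (not run here; needs network access and R on the PATH):
#   mkdir -p ~/R
#   wget "$(cran_archive_url somepkg 1.0-1)"
#   R CMD INSTALL -l ~/R somepkg_1.0-1.tar.gz
# Then, inside R, load that copy explicitly:
#   library(somepkg, lib.loc = "~/R")
```

Loading with lib.loc picks the copy in ~/R explicitly. That way you don't have to rely on the order of the library search path the module sets up.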

Details about the OpenMPI/openib warning message.

Embarrassingly, while googling for help on this message, I discovered that in 2010 I was in exactly the same situation, while setting up the CRMDA cluster that used to be in the Structural Biology Center. I had completely forgotten about it. With the new cluster in 2017 and fresh installs of OpenMPI, we hit the problem again.

Here is what I've learned about OpenMPI and Rmpi during the past 2 weeks.

I don't know enough computer science to fully understand the danger of forks and data corruption when OpenMPI uses Infiniband. Perhaps one of you can explain it to me.

  1. Rmpi will compile with OpenMPI >= 2.0, but it is not fully compatible. The Rmpi author has written to me directly that he is working on revisions that will make them compatible. One symptom we see is that stopCluster() does not work: it hangs the session. The only way to shut down the cluster is mpi.quit(), which terminates the R session entirely.

  2. Rmpi will compile/run with OpenMPI < 2.0.

However, on systems that have Infiniband network devices and openib libraries, there will be warnings about threads and forks, as well as a danger of data corruption. The OpenMPI warning is triggered by such innocuous R functions as sessionInfo().
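The dividing line is OpenMPI 2.0, so it can help to check which version a module gives you before starting Rmpi. The sketch below reads "ompi_info" output on standard input and looks for the header line of the form "Open MPI: 1.10.7". The function names are hypothetical. The exact layout of that line is an assumption, so check it against your own ompi_info output.

```shell
#!/bin/sh
# Sketch: warn if the loaded OpenMPI is 2.0 or newer, where Rmpi's
# stopCluster() has been seen to hang. Function names are hypothetical.

# Read `ompi_info` output on stdin; print the major version from the
# "Open MPI: X.Y.Z" line (layout assumed, check your own output).
ompi_major_version() {
    sed -n 's/^[[:space:]]*Open MPI:[[:space:]]*\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Exit status: 0 = below 2.0, 1 = 2.0 or newer, 2 = version not found.
check_rmpi_compat() {
    major=$(ompi_major_version)
    if [ -z "$major" ]; then
        echo "could not find the Open MPI version" >&2
        return 2
    elif [ "$major" -ge 2 ]; then
        echo "Open MPI $major.x: Rmpi may hang in stopCluster()"
        return 1
    fi
    echo "Open MPI $major.x: should be usable with Rmpi"
    return 0
}

# Usage, after loading Rstats/3.3 or Rstats/3.4:
#   ompi_info | check_rmpi_compat
```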

Here is a session that shows the warning, using R-3.4 in the cluster.

$ R

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Microsoft R Open 3.4.0
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2017 Microsoft Corporation

Using the Intel MKL for parallel mathematical computing(using 1 cores).

Default CRAN mirror snapshot taken on 2017-05-01.
See: https://mran.microsoft.com/.

[Previously saved workspace restored]

> library(Rmpi)
> sessionInfo()
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          n410 (PID 34456)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)

Matrix products: default
BLAS: /panfs/pfs.local/software/install/MRO/3.4.0/microsoft-r/3.4/lib64/R/lib/libRblas.so
LAPACK: /panfs/pfs.local/software/install/MRO/3.4.0/microsoft-r/3.4/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rmpi_0.6-6           RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] compiler_3.4.0   RevoUtils_10.0.4 parallel_3.4.0

I do not know how dangerous forks might be. According to this message, though, they can cause data corruption, and this has been known since 2010:

https://www.mail-archive.com/devel@lists.open-mpi.org/msg08785.html

It is beyond my understanding to say whether garden-variety R users will cause these problems. I do know that the R parallel documentation warns against system calls and forks, possibly for the same reason. R functions that launch external programs, such as system(), create child processes and so fall into the dangerous fork category. Functions that only touch the disk, such as dir.create() and list.files(), also make system calls. Whether they fall into that category is less clear to me; this is a little above my pay grade.

Conservative approach

My "better safe than sorry" instinct leads to this conclusion: TURN OFF INFINIBAND SUPPORT IN OpenMPI. This is the policy we adopted in 2010, and it was in place on the KU community cluster. On the new cluster it was not in place, which resulted in the warning message. I had forgotten about this for a long time, and with a newly installed OpenMPI I ran into the same old problem.

This can be done per user, by creating ~/.openmpi/mca-params.conf with this line. It can also be done system-wide, in etc/openmpi-mca-params.conf in the OpenMPI install folder:

btl = ^openib

That prevents OpenMPI from using the Infiniband (openib) transport layer. I am doing this in the CRMDA OpenMPI module configuration.
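Here is a minimal shell sketch that adds that line to a personal configuration file without clobbering an existing btl setting. The function name disable_openib is hypothetical. It assumes the per-user file location described above, and it takes an optional directory argument so it can be tried on a scratch folder first.

```shell
#!/bin/sh
# Sketch: add "btl = ^openib" to a per-user OpenMPI config file so OpenMPI
# skips the Infiniband (openib) transport. disable_openib is hypothetical.
# Optional argument: config directory (defaults to ~/.openmpi).
disable_openib() {
    conf_dir="${1:-$HOME/.openmpi}"
    conf="$conf_dir/mca-params.conf"
    mkdir -p "$conf_dir"
    # Leave any existing btl setting alone rather than adding a conflicting one.
    if [ -f "$conf" ] && grep -q '^[[:space:]]*btl[[:space:]]*=' "$conf"; then
        echo "$conf already sets btl; not changing it" >&2
        return 0
    fi
    echo 'btl = ^openib' >> "$conf"
}

# Usage:
#   disable_openib
#   ompi_info | grep 'MCA btl'    # openib should no longer be listed
```

OpenMPI also reads MCA parameters from environment variables named OMPI_MCA_<param>. For a single session or job, export OMPI_MCA_btl='^openib' or mpirun --mca btl '^openib' ... should therefore have the same effect without editing any file.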

You can check whether OpenMPI detects an Infiniband device with the shell program "ompi_info", provided by OpenMPI. Load the module Rstats/3.3 or Rstats/3.4, run "ompi_info", and look for the btl stanza. If you have Infiniband, the output looks like this:

   MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)

And like this after changing either ~/.openmpi/mca-params.conf or etc/openmpi-mca-params.conf to include btl = ^openib:

   MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
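To check this from a script rather than by eye, a small filter over the ompi_info output is enough. This sketch pulls the component names out of the "MCA btl:" lines and reports whether openib is among them. Its function names are hypothetical, and it assumes the line layout shown in the listings above.

```shell
#!/bin/sh
# Sketch: read `ompi_info` output on stdin and list the BTL component names
# from lines like "MCA btl: openib (MCA v2.0, ...)". Names are hypothetical.
btl_components() {
    sed -n 's/^[[:space:]]*MCA btl:[[:space:]]*\([A-Za-z0-9_]*\).*/\1/p'
}

# Succeed (exit 0) if openib is among the listed BTL components.
openib_listed() {
    btl_components | grep -qx openib
}

# Usage, after loading Rstats/3.3 or Rstats/3.4:
#   if ompi_info | openib_listed; then
#       echo "openib is still enabled"
#   else
#       echo "openib is disabled"
#   fi
```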

It is also worth mentioning that some of your compute nodes may have Infiniband while others do not. In that case, OpenMPI jobs will crash if they try to combine nodes connected with ethernet and nodes connected with Infiniband. That is another reason to tell OpenMPI not to try to use Infiniband at all.
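If you are not sure which nodes have Infiniband, one rough check on Linux is whether the kernel lists any devices under /sys/class/infiniband. That path is an assumption about how the drivers expose the hardware on our nodes. node_has_infiniband is a hypothetical helper; you would run it on each node, for example from inside a job script.

```shell
#!/bin/sh
# Sketch: succeed if this node appears to have Infiniband hardware, judged by
# whether the directory (default /sys/class/infiniband, an assumed Linux sysfs
# location) exists and is non-empty. node_has_infiniband is hypothetical.
node_has_infiniband() {
    dir="${1:-/sys/class/infiniband}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage, e.g. in a job script run on each node:
#   if node_has_infiniband; then
#       echo "$(hostname): Infiniband present"
#   else
#       echo "$(hostname): ethernet only"
#   fi
```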

Users who do want to use Infiniband within OpenMPI can do so by editing their personal configuration file, ~/.openmpi/mca-params.conf.
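Here is a sketch of what that edit might look like. A per-user file takes precedence over the system-wide etc/openmpi-mca-params.conf. Replacing the btl line there with an explicit list that includes openib should therefore re-enable it. The helper name and the component list self,sm,openib,tcp are assumptions for illustration. If a module sets OMPI_MCA_btl in the environment instead, that variable overrides both files and would need to be unset.

```shell
#!/bin/sh
# Sketch: re-enable openib for one user by replacing any btl line in the
# per-user config with an explicit component list that includes openib.
# enable_openib and the list self,sm,openib,tcp are assumptions.
# Optional argument: config directory (defaults to ~/.openmpi).
enable_openib() {
    conf_dir="${1:-$HOME/.openmpi}"
    conf="$conf_dir/mca-params.conf"
    scratch="$conf.new.$$"
    mkdir -p "$conf_dir"
    if [ -f "$conf" ]; then
        # Keep every other setting; drop existing btl lines.
        grep -v '^[[:space:]]*btl[[:space:]]*=' "$conf" > "$scratch" || true
    else
        : > "$scratch"
    fi
    echo 'btl = self,sm,openib,tcp' >> "$scratch"
    mv "$scratch" "$conf"
}

# Usage:
#   enable_openib
# Or for a single run only, without editing any file:
#   mpirun --mca btl self,sm,openib,tcp ...
```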

Alphabetical R package list.

As of 2017-07-05, these are the packages we install in the directory "/panfs/pfs.local/work/crmda/tools/mro/3.3" (or 3.4)

c("ADGofTest", "AER", "Amelia", "BH", "BMA", "BradleyTerry2", 
"Cairo", "Cubist", "DBI", "DCluster", "DEoptimR", "Devore7", 
"DiagrammeR", "ENmisc", "Ecdat", "Ecfun", "Formula", "GPArotation", 
"HistData", "Hmisc", "HyperbolicDist", "ISwR", "Iso", "JGR", 
"JM", "JMdesign", "JavaGD", "Kendall", "LearnBayes", "MCMCpack", 
"MCPAN", "MEMSS", "MNP", "MPV", "MatchIt", "Matching", "MatrixModels", 
"MplusAutomation", "NMF", "PASWR", "PolynomF", "R2HTML", "R2OpenBUGS", 
"RColorBrewer", "RCurl", "RGtk2", "RSvgDevice", "RUnit", "RandomFields", 
"Rcmdr", "RcmdrMisc", "Rcpp", "RcppArmadillo", "RcppEigen", "Rd2roxygen", 
"Rmpi", "SASmixed", "SemiPar", "SoDA", "SparseM", "StanHeaders", 
"StatDataML", "SweaveListingUtils", "TH.data", "TeachingDemos", 
"UsingR", "VGAM", "VIM", "XML", "Zelig", "abind", "acepack", 
"actuar", "ada", "ade4", "adehabitat", "akima", "alr3", "amap", 
"aod", "ape", "aplpack", "arm", "arules", "assertthat", "backports", 
"base64enc", "bayesm", "bcp", "bdsmatrix", "bestglm", "betareg", 
"biglm", "bit", "bit64", "bitops", "bnlearn", "brew", "brglm", 
"caTools", "cairoDevice", "car", "caret", "cellranger", "censReg", 
"chron", "clue", "clv", "cocorresp", "coda", "coin", "colorspace", 
"combinat", "copula", "corpcor", "crayon", "cubature", "data.table", 
"deldir", "descr", "dichromat", "digest", "diptest", "distr", 
"dlm", "doBy", "doMC", "doMPI", "doParallel", "doSNOW", "dotCall64", 
"dse", "e1071", "earth", "ecodist", "effects", "eha", "eiPack", 
"emplik", "evaluate", "expm", "faraway", "fastICA", "fastmatch", 
"fda", "ffmanova", "fields", "flexmix", "foreach", "formatR", 
"forward", "gam", "gamlss", "gamlss.data", "gamlss.dist", "gamm4", 
"gbm", "gclus", "gdata", "gee", "geepack", "geoR", "geoRglm", 
"ggm", "ggplot2", "glmc", "glmmBUGS", "glmmML", "glmnet", "glmpath", 
"gmodels", "gmp", "gpclib", "gridBase", "gridExtra", "gsl", "gsubfn", 
"gtable", "gtools", "hexbin", "highr", "htmltools", "htmlwidgets", 
"igraph", "ineq", "influence.ME", "inline", "iplots", "irlba", 
"iterators", "itertools", "jpeg", "jsonlite", "kernlab", "knitr", 
"kutils", "labeling", "laeken", "languageR", "lars", "latticeExtra", 
"lava", "lavaan", "lavaan.survey", "lazyeval", "leaps", "lme4", 
"lmeSplines", "lmec", "lmm", "lmtest", "locfit", "logspline", 
"longitudinal", "longitudinalData", "lpSolve", "ltm", "magrittr", 
"manipulate", "maps", "maptools", "markdown", "matrixcalc", "maxLik", 
"mboost", "mcgibbsit", "mclust", "mcmc", "mda", "memisc", "memoise", 
"mi", "micEcon", "mice", "microbenchmark", "mime", "minqa", "misc3d", 
"miscTools", "mitools", "mix", "mixtools", "mlbench", "mnormt", 
"modeltools", "msm", "multcomp", "munsell", "mvProbit", "mvbutils", 
"mvtnorm", "network", "nloptr", "nnls", "nor1mix", "norm", "nortest", 
"np", "numDeriv", "nws", "openxlsx", "ordinal", "orthopolynom", 
"pan", "partDSA", "party", "pbivnorm", "pbkrtest", "pcaPP", "permute", 
"pixmap", "pkgKitten", "pkgmaker", "plm", "plotmo", "plotrix", 
"pls", "plyr", "pmml", "pmmlTransformations", "png", "polspline", 
"polycor", "polynom", "portableParallelSeeds", "ppcor", "profileModel", 
"proto", "proxy", "pscl", "psidR", "pspline", "psych", "quadprog", 
"quantreg", "randomForest", "randomForestSRC", "rattle", "rbenchmark", 
"rbugs", "rda", "readxl", "registry", "relimp", "rematch", "reshape", 
"reshape2", "rgenoud", "rgl", "rlang", "rlecuyer", "rmarkdown", 
"rms", "rngtools", "robustbase", "rockchalk", "roxygen2", "rpart.plot", 
"rpf", "rprojroot", "rrcov", "rstan", "rstudioapi", "sandwich", 
"scales", "scatterplot3d", "segmented", "sem", "semTools", "setRNG", 
"sets", "sfsmisc", "shapefiles", "simsem", "sm", "smoothSurv", 
"sna", "snow", "snowFT", "sp", "spam", "spatialCovariance", "spdep", 
"splancs", "stabledist", "stabs", "startupmsg", "statmod", "statnet.common", 
"stepwise", "stringi", "stringr", "strucchange", "subselect", 
"survey", "systemfit", "tables", "tcltk2", "tensorA", "testthat", 
"texreg", "tfplot", "tframe", "tibble", "tidyverse", "timeDate", 
"tis", "tree", "triangle", "trimcluster", "trust", "ucminf", 
"urca", "vcd", "vegan", "visNetwork", "waveslim", "wnominate", 
"xtable", "xts", "yaml", "zipfR", "zoo", "KernSmooth", "MASS", 
"Matrix", "MicrosoftR", "R6", "RUnit", "RevoIOQ", "RevoMods", 
"RevoUtils", "RevoUtilsMath", "base", "boot", "checkpoint", "class", 
"cluster", "codetools", "compiler", "curl", "datasets", "deployrRserve", 
"doParallel", "foreach", "foreign", "grDevices", "graphics", 
"grid", "iterators", "jsonlite", "lattice", "methods", "mgcv", 
"nlme", "nnet", "parallel", "png", "rpart", "spatial", "splines", 
"stats", "stats4", "survival", "tcltk", "tools", "utils")
This entry was posted in Data Analysis, R.