The link for students to apply is:
https://employment.ku.edu/student/8685BR
The last day students can apply is May 23, 2017, and committee members can review candidates by logging into the BrassRing system on or after May 24, 2017.
This is the 20170425 update, which includes an updated module set and reports of success with Java and Tcl/Tk-based R packages. In other words, an almost complete victory has been achieved. Special thanks to Wes Mason of ITTC.
To use R, here is the set of commands I run to set up the environment. This is necessary every time I want to use R with Emacs. Let's call this the magic 5-line stanza, for the sake of discussion.
module purge
module load legacy
module load emacs
module use /panfs/pfs.local/work/crmda/tools/modules
module load Rstats/3.3
I agree if you say "it is a pain in the rump to have to remember to do that every time I log in." In the old cluster, I was in a position to place those startup commands into all of the CRMDA user environments. That is no longer the case.
I'm checking on ways you can automate this within your own account. Details are posted at the end of this article.
When you want to work with R on the CRC cluster, please consider using the R packages we install within the $WORK folder for CRMDA group members. These packages have some special features, and if you try to install them in your user folder (under $HOME/R, as R invites you to do if you run "install.packages()" in a session), they may not compile correctly.
Recently, we have had runtime errors because the R we are recommending, as described below, is not compatible with packages that users build and install with other versions of R (or with the same version of R in a different build environment). In particular, if you have previously installed packages with another R setup, then you should delete the packages you have under $HOME/R. I think it is best if you let us try to install what you need, but if you do install R packages in your own home folder, please do so only AFTER loading the modules listed below. Please DO NOT load the CRC-provided module "R/3.3". It does not provide the services we need.
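To see whether you have a personal library that might conflict with the CRMDA builds, a quick check like the following can help. This is only a sketch: $HOME/R is the conventional default location that R offers, but if you have set R_LIBS_USER the library may live elsewhere.

```shell
# Check for a personal R library that may conflict with the CRMDA builds.
# $HOME/R is the default location R offers via install.packages();
# adjust if your R_LIBS_USER points elsewhere.
RLIB="$HOME/R"
if [ -d "$RLIB" ]; then
    echo "A personal library exists at: $RLIB"
    ls "$RLIB"
    # After confirming you no longer need these builds, remove them with:
    #   rm -rf "$RLIB"
else
    echo "No personal library at $RLIB -- nothing to clean."
fi
```

The removal command is left commented out on purpose; look at what is there before deleting anything.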
The module Rstats/3.3 was built by Wes Mason of ITTC, and it is installed into the $WORK folder for CRMDA (hence the "module use" command above). We work together to make sure the OpenMPI layer is compiled correctly, so it is possible to use Rmpi and the R package parallel. The compiler used is GCC 6.3, which is quite a bit newer than the standard GCC provided with the cluster node operating system. This is the principal reason why the CRC-provided "R/3.3" is not acceptable: it does not keep the OpenMPI and GCC components in lock-step with R itself. Observe, if we start with an empty session and run
module purge
module use /panfs/pfs.local/work/crmda/tools/modules
module load Rstats/3.3
we find that we actually load several modules:
$ module list
Currently Loaded Modules:
1) compiler/gcc/6.3
2) openmpi/2.0
3) java/1.8.0_131
4) xz/5.2.3
5) icu/59.1
6) tcltk/8.6.6
7) Rstats/3.3
The openmpi version must be kept in lock-step with R and the packages we have installed in the past. gcc-6.3 is the compiler version we use for all of the packages; that newer version is necessary because of demands by packages like Rstan and OpenMx. The java and tcltk modules are needed by various R packages, such as rJava and tkrplot. The xz module is a decompression suite, needed to unpack package source code. The Rstats module itself is, for the most part, a "holding company" that keeps all of this together. It simply loads the requirements (gcc, openmpi, java, xz, icu, and tcltk) and then accesses the R provided by the CRC system maintainers from /panfs/pfs.local/software/install/MRO/3.3. The R packages we provide on top of that base install are found in the directory /panfs/pfs.local/work/crmda/tools/mro/3.3/site-library.
The R packages in our collection are, in most cases, updates and replacements of those base packages, because we build with a different compiler. Users who load Rstats/3.3 should notice that our directory comes earlier in the library search path than the system-wide folder. Inside R, we see:
> .libPaths()
[1] "/panfs/pfs.local/work/crmda/tools/mro/3.3/site-library"
[2] "/panfs/pfs.local/software/install/MRO/3.3/microsoft-r/3.3/lib64/R/library"
In my basic 5-line session starter sequence, I also have modules named legacy and emacs. In my opinion, that is a little dangerous, because I'm jumbling together modules from the old and new clusters. It is necessary because I need an IDE with which to interact with R. Emacs was configured with ESS by Wes Mason, and it is in the legacy module set. If you prefer, legacy also provides RStudio version 9.98.978. That is, unfortunately, outdated and unmaintained. I've filed a request with CRC to get a newer version of RStudio.
After the magic 5-line stanza above, within your R session, you have access to these packages. (Run "library()" to see all folders on your path, and all packages within.)
Packages in library '/panfs/pfs.local/work/crmda/tools/mro/3.3/site-library':
ADGofTest Anderson-Darling GoF test
AER Applied Econometrics with R
Amelia A Program for Missing Data
BH Boost C++ Header Files
BMA Bayesian Model Averaging
BradleyTerry2 Bradley-Terry Models
Cairo R graphics device using cairo graphics library
for creating high-quality bitmap (PNG, JPEG,
TIFF), vector (PDF, SVG, PostScript) and
display (X11 and Win32) output
Cubist Rule- And Instance-Based Regression Modeling
DBI R Database Interface
DCluster Functions for the Detection of Spatial Clusters
of Diseases
DEoptimR Differential Evolution Optimization in Pure R
Devore7 Data sets from Devore's "Prob and Stat for Eng
(7th ed)"
DiagrammeR Create Graph Diagrams and Flowcharts Using R
ENmisc Neuwirth miscellaneous
Ecdat Data Sets for Econometrics
Ecfun Functions for Ecdat
Formula Extended Model Formulas
GPArotation GPA Factor Rotation
HistData Data Sets from the History of Statistics and
Data Visualization
Hmisc Harrell Miscellaneous
HyperbolicDist The hyperbolic distribution
ISwR Introductory Statistics with R
Iso Functions to Perform Isotonic Regression
JGR JGR - Java GUI for R
JM Joint Modeling of Longitudinal and Survival
Data
JMdesign Joint Modeling of Longitudinal and Survival
Data - Power Calculation
JavaGD Java Graphics Device
Kendall Kendall rank correlation and Mann-Kendall trend
test
LearnBayes Functions for Learning Bayesian Inference
MCMCglmm MCMC Generalised Linear Mixed Models
MCMCpack Markov Chain Monte Carlo (MCMC) Package
MCPAN Multiple Comparisons Using Normal Approximation
MEMSS Data sets from Mixed-effects Models in S
MNP R Package for Fitting the Multinomial Probit
Model
MPV Data Sets from Montgomery, Peck and Vining's
Book
MatchIt Nonparametric Preprocessing for Parametric
Causal Inference
Matching Multivariate and Propensity Score Matching with
Balance Optimization
MatrixModels Modelling with Sparse And Dense Matrices
ModelMetrics Rapid Calculation of Model Metrics
MplusAutomation Automating Mplus Model Estimation and
Interpretation
NMF Algorithms and Framework for Nonnegative Matrix
Factorization (NMF)
OpenMx Extended Structural Equation Modelling
PASWR PROBABILITY and STATISTICS WITH R
PBSmapping Mapping Fisheries Data and Spatial Analysis
Tools
PolynomF Polynomials in R
R2HTML HTML Exportation for R Objects
R2OpenBUGS Running OpenBUGS from R
R6 Classes with Reference Semantics
RColorBrewer ColorBrewer Palettes
RCurl General Network (HTTP/FTP/...) Client Interface
for R
RGtk2 R bindings for Gtk 2.8.0 and above
RSvgDevice An R SVG graphics device.
RandomFields Simulation and Analysis of Random Fields
RandomFieldsUtils Utilities for the Simulation and Analysis of
Random Fields
Rcmdr R Commander
RcmdrMisc R Commander Miscellaneous Functions
Rcpp Seamless R and C++ Integration
RcppArmadillo 'Rcpp' Integration for the 'Armadillo'
Templated Linear Algebra Library
RcppEigen 'Rcpp' Integration for the 'Eigen' Templated
Linear Algebra Library
Rd2roxygen Convert Rd to 'Roxygen' Documentation
Rmpi Interface (Wrapper) to MPI (Message-Passing
Interface)
Rook Rook - a web server interface for R
SAScii Import ASCII files directly into R using only a
SAS input script
SASmixed Data sets from "SAS System for Mixed Models"
SemiPar Semiparametic Regression
SoDA Functions and Examples for "Software for Data
Analysis"
SparseM Sparse Linear Algebra
StanHeaders C++ Header Files for Stan
StatDataML Read and Write StatDataML Files
SweaveListingUtils Utilities for Sweave Together with TeX
'listings' Package
TH.data TH's Data Archive
TeachingDemos Demonstrations for Teaching and Learning
UsingR Data Sets, Etc. for the Text "Using R for
Introductory Statistics", Second Edition
VGAM Vector Generalized Linear and Additive Models
VIM Visualization and Imputation of Missing Values
XML Tools for Parsing and Generating XML Within R
and S-Plus
Zelig Everyone's Statistical Software
abind Combine Multidimensional Arrays
acepack ACE and AVAS for Selecting Multiple Regression
Transformations
actuar Actuarial Functions and Heavy Tailed
Distributions
ada The R Package Ada for Stochastic Boosting
ade4 Analysis of Ecological Data : Exploratory and
Euclidean Methods in Environmental Sciences
adehabitat Analysis of Habitat Selection by Animals
akima Interpolation of Irregularly and Regularly
Spaced Data
alr3 Data to accompany Applied Linear Regression 3rd
edition
amap Another Multidimensional Analysis Package
aod Analysis of Overdispersed Data
ape Analyses of Phylogenetics and Evolution
aplpack Another Plot PACKage: stem.leaf, bagplot,
faces, spin3R, plotsummary, plothulls, and some
slider functions
arm Data Analysis Using Regression and
Multilevel/Hierarchical Models
arules Mining Association Rules and Frequent Itemsets
assertthat Easy Pre and Post Assertions
backports Reimplementations of Functions Introduced Since
R-3.0.0
base64enc Tools for base64 encoding
bayesm Bayesian Inference for
Marketing/Micro-Econometrics
bcp Bayesian Analysis of Change Point Problems
bdsmatrix Routines for Block Diagonal Symmetric matrices
bestglm Best Subset GLM
betareg Beta Regression
biglm bounded memory linear and generalized linear
models
bit A class for vectors of 1-bit booleans
bit64 A S3 Class for Vectors of 64bit Integers
bitops Bitwise Operations
bnlearn Bayesian Network Structure Learning, Parameter
Learning and Inference
brew Templating Framework for Report Generation
brglm Bias reduction in binomial-response generalized
linear models.
broom Convert Statistical Analysis Objects into Tidy
Data Frames
caTools Tools: moving window statistics, GIF, Base64,
ROC AUC, etc.
cairoDevice Embeddable Cairo Graphics Device Driver
car Companion to Applied Regression
caret Classification and Regression Training
cellranger Translate Spreadsheet Cell Ranges to Rows and
Columns
censReg Censored Regression (Tobit) Models
checkmate Fast and Versatile Argument Checks
chron Chronological Objects which can Handle Dates
and Times
clue Cluster Ensembles
clv Cluster Validation Techniques
cocorresp Co-Correspondence Analysis Methods
coda Output Analysis and Diagnostics for MCMC
coin Conditional Inference Procedures in a
Permutation Test Framework
colorspace Color Space Manipulation
combinat combinatorics utilities
commonmark High Performance CommonMark and Github Markdown
Rendering in R
copula Multivariate Dependence with Copulas
corpcor Efficient Estimation of Covariance and
(Partial) Correlation
crayon Colored Terminal Output
cslogistic Conditionally Specified Logistic Regression
cubature Adaptive Multivariate Integration over
Hypercubes
data.table Extension of `data.frame`
deldir Delaunay Triangulation and Dirichlet (Voronoi)
Tessellation
desc Manipulate DESCRIPTION Files
descr Descriptive Statistics
dichromat Color Schemes for Dichromats
digest Create Compact Hash Digests of R Objects
diptest Hartigan's Dip Test Statistic for Unimodality -
Corrected
distr Object Oriented Implementation of Distributions
dlm Bayesian and Likelihood Analysis of Dynamic
Linear Models
doBy Groupwise Statistics, LSmeans, Linear
Contrasts, Utilities
doMC Foreach Parallel Adaptor for 'parallel'
doMPI Foreach parallel adaptor for the Rmpi package
doSNOW Foreach Parallel Adaptor for the 'snow' Package
dplyr A Grammar of Data Manipulation
dse Dynamic Systems Estimation (Time Series
Package)
e1071 Misc Functions of the Department of Statistics,
Probability Theory Group (Formerly: E1071), TU
Wien
earth Multivariate Adaptive Regression Splines
ecodist Dissimilarity-based functions for ecological
analysis
effects Effect Displays for Linear, Generalized Linear,
and Other Models
eha Event History Analysis
eiPack eiPack: Ecological Inference and
Higher-Dimension Data Management
emplik Empirical Likelihood Ratio for
Censored/Truncated Data
evaluate Parsing and Evaluation Tools that Provide More
Details than the Default
expint Exponential Integral and Incomplete Gamma
Function
expm Matrix Exponential, Log, 'etc'
faraway Functions and Datasets for Books by Julian
Faraway
fastICA FastICA Algorithms to perform ICA and
Projection Pursuit
fastmatch Fast match() function
fda Functional Data Analysis
ffmanova Fifty-fifty MANOVA
fields Tools for Spatial Data
flexmix Flexible Mixture Modeling
forcats Tools for Working with Categorical Variables
(Factors)
formatR Format R Code Automatically
forward Forward search
gam Generalized Additive Models
gamlss Generalised Additive Models for Location Scale
and Shape
gamlss.data GAMLSS Data
gamlss.dist Distributions to be Used for GAMLSS Modelling
gamm4 Generalized Additive Mixed Models using 'mgcv'
and 'lme4'
gbm Generalized Boosted Regression Models
gclus Clustering Graphics
gdata Various R Programming Tools for Data
Manipulation
gee Generalized Estimation Equation Solver
geepack Generalized Estimating Equation Package
geoR Analysis of Geostatistical Data
geoRglm A Package for Generalised Linear Spatial Models
ggm Functions for graphical Markov models
ggplot2 Create Elegant Data Visualisations Using the
Grammar of Graphics
glmc Fitting Generalized Linear Models Subject to
Constraints
glmmBUGS Generalised Linear Mixed Models with BUGS and
JAGS
glmmML Generalized Linear Models with Clustering
glmnet Lasso and Elastic-Net Regularized Generalized
Linear Models
glmpath L1 Regularization Path for Generalized Linear
Models and Cox Proportional Hazards Model
gmodels Various R Programming Tools for Model Fitting
gmp Multiple Precision Arithmetic
gpclib General Polygon Clipping Library for R
gridBase Integration of base and grid graphics
gridExtra Miscellaneous Functions for "Grid" Graphics
grpreg Regularization Paths for Regression Models with
Grouped Covariates
gsl Wrapper for the Gnu Scientific Library
gsubfn Utilities for strings and function arguments.
gtable Arrange 'Grobs' in Tables
gtools Various R Programming Tools
haven Import and Export 'SPSS', 'Stata' and 'SAS'
Files
hexbin Hexagonal Binning Routines
highr Syntax Highlighting for R Source Code
hms Pretty Time of Day
htmlTable Advanced Tables for Markdown/HTML
htmltools Tools for HTML
htmlwidgets HTML Widgets for R
httpuv HTTP and WebSocket Server Library
httr Tools for Working with URLs and HTTP
igraph Network Analysis and Visualization
ineq Measuring Inequality, Concentration, and
Poverty
influence.ME Tools for Detecting Influential Data in Mixed
Effects Models
influenceR Software Tools to Quantify Structural
Importance of Nodes in a Network
inline Functions to Inline C, C++, Fortran Function
Calls from R
iplots iPlots - interactive graphics for R
irlba Fast Truncated SVD, PCA and Symmetric
Eigendecomposition for Large Dense and Sparse
Matrices
itertools Iterator Tools
jpeg Read and write JPEG images
kernlab Kernel-Based Machine Learning Lab
knitr A General-Purpose Package for Dynamic Report
Generation in R
kutils Project Management Tools
labeling Axis Labeling
laeken Estimation of indicators on social exclusion
and poverty
languageR Data sets and functions with "Analyzing
Linguistic Data: A practical introduction to
statistics".
lars Least Angle Regression, Lasso and Forward
Stagewise
latticeExtra Extra Graphical Utilities Based on Lattice
lava Latent Variable Models
lavaan Latent Variable Analysis
lavaan.survey Complex Survey Structural Equation Modeling
(SEM)
lazyeval Lazy (Non-Standard) Evaluation
leaps Regression Subset Selection
lme4 Linear Mixed-Effects Models using 'Eigen' and
S4
lmeSplines Add smoothing spline modelling capability to
nlme.
lmec Linear Mixed-Effects Models with Censored
Responses
lmerTest Tests in Linear Mixed Effects Models
lmm Linear Mixed Models
lmtest Testing Linear Regression Models
locfit Local Regression, Likelihood and Density
Estimation.
logspline Logspline Density Estimation Routines
longitudinal Analysis of Multiple Time Course Data
longitudinalData Longitudinal Data
lpSolve Interface to 'Lp_solve' v. 5.5 to Solve
Linear/Integer Programs
ltm Latent Trait Models under IRT
lubridate Make Dealing with Dates a Little Easier
magic create and investigate magic squares
magrittr A Forward-Pipe Operator for R
manipulate Interactive Plots for RStudio
maps Draw Geographical Maps
maptools Tools for Reading and Handling Spatial Objects
markdown 'Markdown' Rendering for R
matrixcalc Collection of functions for matrix calculations
maxLik Maximum Likelihood Estimation and Related Tools
mboost Model-Based Boosting
mcgibbsit Warnes and Raftery's MCGibbsit MCMC diagnostic
mclust Gaussian Mixture Modelling for Model-Based
Clustering, Classification, and Density
Estimation
mcmc Markov Chain Monte Carlo
mda Mixture and Flexible Discriminant Analysis
mediation Causal Mediation Analysis
memisc Tools for Management of Survey Data and the
Presentation of Analysis Results
memoise Memoisation of Functions
mi Missing Data Imputation and Model Checking
micEcon Microeconomic Analysis and Modelling
mice Multivariate Imputation by Chained Equations
microbenchmark Accurate Timing Functions
mime Map Filenames to MIME Types
minqa Derivative-free optimization algorithms by
quadratic approximation
misc3d Miscellaneous 3D Plots
miscTools Miscellaneous Tools and Utilities
mitools Tools for multiple imputation of missing data
mix Estimation/Multiple Imputation for Mixed
Categorical and Continuous Data
mixtools Tools for Analyzing Finite Mixture Models
mlbench Machine Learning Benchmark Problems
mnormt The Multivariate Normal and t Distributions
modelr Modelling Functions that Work with the Pipe
modeltools Tools and Classes for Statistical Models
msm Multi-State Markov and Hidden Markov Models in
Continuous Time
multcomp Simultaneous Inference in General Parametric
Models
munsell Utilities for Using Munsell Colours
mvProbit Multivariate Probit Models
mvbutils Workspace organization, code and documentation
editing, package prep and editing, etc.
mvtnorm Multivariate Normal and t Distributions
neighbr Classification, Regression, Clustering with K
Nearest Neighbors
network Classes for Relational Data
nloptr R interface to NLopt
nnls The Lawson-Hanson algorithm for non-negative
least squares (NNLS)
nor1mix Normal (1-d) Mixture Models (S3 Classes and
Methods)
norm Analysis of multivariate normal datasets with
missing values
nortest Tests for Normality
np Nonparametric kernel smoothing methods for
mixed data types
numDeriv Accurate Numerical Derivatives
nws R functions for NetWorkSpaces and Sleigh
openssl Toolkit for Encryption, Signatures and
Certificates Based on OpenSSL
openxlsx Read, Write and Edit XLSX Files
ordinal Regression Models for Ordinal Data
orthopolynom Collection of functions for orthogonal and
orthonormal polynomials
pan Multiple Imputation for Multivariate Panel or
Clustered Data
pander An R Pandoc Writer
partDSA Partitioning Using Deletion, Substitution, and
Addition Moves
party A Laboratory for Recursive Partytioning
pbivnorm Vectorized Bivariate Normal CDF
pbkrtest Parametric Bootstrap and Kenward Roger Based
Methods for Mixed Model Comparison
pcaPP Robust PCA by Projection Pursuit
permute Functions for Generating Restricted
Permutations of Data
pixmap Bitmap Images (``Pixel Maps'')
pkgKitten Create Simple Packages Which Do not Upset R
Package Checks
pkgmaker Package development utilities
plm Linear Models for Panel Data
plotmo Plot a Model's Response and Residuals
plotrix Various Plotting Functions
pls Partial Least Squares and Principal Component
Regression
plyr Tools for Splitting, Applying and Combining
Data
pmml Generate PMML for Various Models
pmmlTransformations Transforms Input Data from a PMML Perspective
polspline Polynomial Spline Routines
polycor Polychoric and Polyserial Correlations
polynom A Collection of Functions to Implement a Class
for Univariate Polynomial Manipulations
portableParallelSeeds Allow Replication of Simulations on Parallel
and Serial Computers
ppcor Partial and Semi-Partial (Part) Correlation
praise Praise Users
profileModel Tools for profiling inference functions for
various model classes
proto Prototype Object-Based Programming
proxy Distance and Similarity Measures
pscl Political Science Computational Laboratory,
Stanford University
psidR Build Panel Data Sets from PSID Raw Data
pspline Penalized Smoothing Splines
psych Procedures for Psychological, Psychometric, and
Personality Research
purrr Functional Programming Tools
quadprog Functions to solve Quadratic Programming
Problems.
quantreg Quantile Regression
rJava Low-Level R to Java Interface
randomForest Breiman and Cutler's Random Forests for
Classification and Regression
randomForestSRC Random Forests for Survival, Regression and
Classification (RF-SRC)
rattle Graphical User Interface for Data Mining in R
rbenchmark Benchmarking routine for R
rbugs Fusing R and OpenBugs and Beyond
rda Shrunken Centroids Regularized Discriminant
Analysis
readr Read Rectangular Text Data
readxl Read Excel Files
registry Infrastructure for R Package Registries
relimp Relative Contribution of Effects in a
Regression Model
rematch Match Regular Expressions with a Nicer 'API'
reshape Flexibly Reshape Data
reshape2 Flexibly Reshape Data: A Reboot of the Reshape
Package
rgenoud R Version of GENetic Optimization Using
Derivatives
rgexf Build, Import and Export GEXF Graph Files
rgl 3D Visualization Using OpenGL
rlecuyer R Interface to RNG with Multiple Streams
rmarkdown Dynamic Documents for R
rms Regression Modeling Strategies
rngtools Utility functions for working with Random
Number Generators
robustbase Basic Robust Statistics
rockchalk Regression Estimation and Presentation
roxygen2 In-Line Documentation for R
rpart.plot Plot 'rpart' Models: An Enhanced Version of
'plot.rpart'
rpf Response Probability Functions
rprojroot Finding Files in Project Subdirectories
rrcov Scalable Robust Estimators with High Breakdown
Point
rstan R Interface to Stan
rstudio Tools and Utilities for RStudio
rstudioapi Safely Access the RStudio API
rvest Easily Harvest (Scrape) Web Pages
sandwich Robust Covariance Matrix Estimators
scales Scale Functions for Visualization
scatterplot3d 3D Scatter Plot
segmented Regression Models with Breakpoints/Changepoints
Estimation
selectr Translate CSS Selectors to XPath Expressions
sem Structural Equation Models
semTools Useful Tools for Structural Equation Modeling
setRNG Set (Normal) Random Number Generator and Seed
sets Sets, Generalized Sets, Customizable Sets and
Intervals
sfsmisc Utilities from "Seminar fuer Statistik" ETH
Zurich
shapefiles Read and Write ESRI Shapefiles
shiny Web Application Framework for R
simsem SIMulated Structural Equation Modeling
sm Smoothing methods for nonparametric regression
and density estimation
smoothSurv Survival Regression with Smoothed Error
Distribution
sna Tools for Social Network Analysis
snow Simple Network of Workstations
snowFT Fault Tolerant Simple Network of Workstations
sourcetools Tools for Reading, Tokenizing and Parsing R
Code
sp Classes and Methods for Spatial Data
spam SPArse Matrix
spatialCovariance Computation of Spatial Covariance Matrices for
Data on Rectangles
spatialkernel Nonparameteric estimation of spatial
segregation in a multivariate point process
spdep Spatial Dependence: Weighting Schemes,
Statistics and Models
splancs Spatial and Space-Time Point Pattern Analysis
stabledist Stable Distribution Functions
stabs Stability Selection with Error Control
startupmsg Utilities for Start-Up Messages
statmod Statistical Modeling
statnet.common Common R Scripts and Utilities Used by the
Statnet Project Software
stepwise Stepwise detection of recombination breakpoints
stringi Character String Processing Facilities
stringr Simple, Consistent Wrappers for Common String
Operations
strucchange Testing, Monitoring, and Dating Structural
Changes
subselect Selecting Variable Subsets
survey Analysis of Complex Survey Samples
survival Survival Analysis
systemfit Estimating Systems of Simultaneous Equations
tables Formula-Driven Table Generation
tcltk2 Tcl/Tk Additions
tensorA Advanced tensors arithmetic with named indices
testthat Unit Testing for R
texreg Conversion of R Regression Output to LaTeX or
HTML Tables
tfplot Time Frame User Utilities
tframe Time Frame Coding Kernel
tibble Simple Data Frames
tidyr Easily Tidy Data with 'spread()' and 'gather()'
Functions
tidyverse Easily Install and Load 'Tidyverse' Packages
timeDate Rmetrics - Chronological and Calendar Objects
tis Time Indexes and Time Indexed Series
tkrplot TK Rplot
tree Classification and Regression Trees
triangle Provides the Standard Distribution Functions
for the Triangle Distribution
trimcluster Cluster analysis with trimming
trust Trust Region Optimization
ucminf General-Purpose Unconstrained Non-Linear
Optimization
urca Unit Root and Cointegration Tests for Time
Series Data
vcd Visualizing Categorical Data
vegan Community Ecology Package
viridis Default Color Maps from 'matplotlib'
viridisLite Default Color Maps from 'matplotlib' (Lite
Version)
visNetwork Network Visualization using 'vis.js' Library
waveslim Basic wavelet routines for one-, two- and
three-dimensional signal processing
wnominate Roll Call Analysis Software
xgboost Extreme Gradient Boosting
xml2 Parse XML
xtable Export Tables to LaTeX or HTML
xts eXtensible Time Series
yaml Methods to Convert R Data to YAML and Back
zipfR Statistical models for word frequency
distributions
zoo S3 Infrastructure for Regular and Irregular
Time Series (Z's Ordered Observations)
Packages in library '/panfs/pfs.local/software/install/MRO/3.3/microsoft-r/3.3/lib64/R/library':
KernSmooth Functions for Kernel Smoothing Supporting Wand
& Jones (1995)
MASS Support Functions and Datasets for Venables and
Ripley's MASS
Matrix Sparse and Dense Matrix Classes and Methods
MicrosoftR Microsoft R umbrella package
R6 Classes with Reference Semantics
RUnit R Unit test framework
RevoIOQ Microsoft R Services Test Suite
RevoMods R Functions Modified For Revolution R
RevoUtils Microsoft R Utility Package
RevoUtilsMath Microsoft R Services Math Utilities Package
base The R Base Package
boot Bootstrap Functions (Originally by Angelo Canty
for S)
checkpoint Install Packages from Snapshots on the
Checkpoint Server for Reproducibility
class Functions for Classification
cluster "Finding Groups in Data": Cluster Analysis
Extended Rousseeuw et al.
codetools Code Analysis Tools for R
compiler The R Compiler Package
curl A Modern and Flexible Web Client for R
datasets The R Datasets Package
deployrRserve Binary R server
doParallel Foreach Parallel Adaptor for the 'parallel'
Package
foreach Provides Foreach Looping Construct for R
foreign Read Data Stored by Minitab, S, SAS, SPSS,
Stata, Systat, Weka, dBase, ...
grDevices The R Graphics Devices and Support for Colours
and Fonts
graphics The R Graphics Package
grid The Grid Graphics Package
iterators Provides Iterator Construct for R
jsonlite A Robust, High Performance JSON Parser and
Generator for R
lattice Trellis Graphics for R
methods Formal Methods and Classes
mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML
Smoothness Estimation
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-Forward Neural Networks and Multinomial
Log-Linear Models
parallel Support for Parallel computation in R
png Read and write PNG images
rpart Recursive Partitioning and Regression Trees
spatial Functions for Kriging and Point Pattern
Analysis
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival Analysis
tcltk Tcl/Tk Interface
tools Tools for Package Development
utils The R Utils Package
As usual, if these don't work right, it's something I got wrong and will fix. Email me.
As of 2017-04-25, we have solved the problems of compiling Java and tk-based R packages. In other words, we find ourselves roughly back in the place where we were in October, 2016, or perhaps a little bit ahead of that. Now that the gcc issues have been addressed, we are able to stay up to date with changes in the cutting edge packages like Rcpp, Rstan and OpenMx.
If you need other packages, I'll install them if you email me.
If you launch R and you don't find packages (in the output of library(), for example), it probably means you forgot the module magic.
If you are having trouble with Rstan, the likely sources of trouble are 1) errors in your ~/.R/Makevars file, or 2) old packages in your home folder ~/R/ that do not cooperate with the new R and the other packages we make available.
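For the first problem, a non-destructive check is to look for the file and, if it exists, move it aside rather than delete it. A sketch follows; the path is R's standard per-user Makevars location.

```shell
# Look for a per-user Makevars file, which can override compiler settings
# and break Rstan builds when it points at an old toolchain.
MAKEVARS="$HOME/.R/Makevars"
if [ -f "$MAKEVARS" ]; then
    echo "Makevars found at $MAKEVARS -- consider moving it aside:"
    echo "  mv $MAKEVARS $MAKEVARS.bak"
else
    echo "No Makevars file found -- the trouble is likely elsewhere."
fi
```

Moving the file aside (rather than deleting it) lets you restore your old settings later if they turn out to matter.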
One more lesson. Instead of re-typing that stanza whenever it is needed, put those lines in a file. I just tested this: I put the module stanza in a file named rstats.sh, saved it in $HOME/bin, and made it executable ("chmod +x rstats.sh"). After that, it is sufficient to run
source rstats.sh
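If you want this to happen automatically at every login, one option is to source the script from your shell startup file. This sketch assumes you use bash; try it in a fresh terminal before relying on it, since a broken startup file can leave you with a confusing environment.

```shell
# In ~/.bashrc: load the CRMDA R environment at login, if the script exists.
if [ -f "$HOME/bin/rstats.sh" ]; then
    source "$HOME/bin/rstats.sh"
fi
```

The existence check means the line is harmless on machines where you have not created the script.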
If we don't build packages for you, you'll have to build your own. Here is a lesson from the school of hard knocks. In the new CRC cluster, the memory limits on your sessions are strictly enforced, and the compiler will often use more than 2GB of memory. As a result, when you try to build a package inside R with "install.packages", you may get a vague message of failure. To protect yourself against that, it is wise to ask for an interactive session with more memory. I do this, for example:
$ msub -I -X -l nodes=1:ppn=1,pmem=6144m
That is sufficient to compile Rstan, which is the most intensive package I have tried to build.
The cluster runs Red Hat Enterprise Linux 6 (RHEL 6), which is too old to support the new versions of R. The principal weakness is the older gcc compiler that ships with RHEL 6.
In the cluster, however, we have access to the much newer Intel compiler and MKL math libraries, so the R program, and the things on which it relies, can be built with the Intel compiler. It appears as though we can stay up to date with the troublesome R modules like Rstan, Rcpp, and RcppArmadillo.
Wes Mason of ITTC worked this out for us. The scheme we are testing now can be accessed as follows.
For people in the crmda user group, try this interactively
$ module purge
$ module use /panfs/pfs.local/work/crmda/tools/modules
$ module load Rstats/3.3
After that, observe
$ R
> library("rstan")
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.14.2, packaged: 2017-03-19 00:42:29 UTC, GitRev:
5fa1e80eb817)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
We are still in a testing phase with this setup; surely there will be problems. I do not yet understand what is necessary to compile new R packages within it. We want to avoid packages built with gcc if we can, because there is always a danger of incompatibility when shared libraries are built with different compilers.
But the key message is still encouraging. Even though the OS does not have the needed parts, there is a workaround.
Why is this "Revolution R"? The company Revolution Analytics, which was later purchased by Microsoft, popularized the use of the Intel MKL on Ubuntu Linux. A version of R built with Intel's compiler was used, with permission, on Ubuntu in 2012. The version of R we are using now goes by the moniker "MRO". Can you guess what the M and the R stand for?
KU thesis rules require that all fonts used in the submitted PDF document must be embedded in the document itself. This is required to eliminate the problem that special symbols are not legible in the document on the receiver's computer.
Making sure all fonts are embedded appears to be not so easy across platforms. When I compile the KU thesis document, I notice the Wingdings and Symbol fonts are not embedded.
However, this is not a flaw in pdflatex as it currently exists. It was a pdflatex flaw in the past. So far as I can tell, all fonts needed in the pdflatex run are embedded if you use a LaTeX distribution that is reasonably modern.
The major problem arises when a document includes other PDF documents, using \includegraphics{} for example. If those included documents are lacking in embedded fonts, then pdflatex does not fix that.
In my example document, before 20160503, the fonts were missing because they were not embedded in the R plots that are included in the example chapters. I had to go back and re-run the R code to make sure the fonts are embedded in the PDF files for the graphs. After that, the pdflatex output of the thesis template is fine.
You can check for yourself. Run
$ pdffonts thesis-ku.pdf
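To pull out just the problem fonts, the pdffonts listing can be filtered with awk. This is my own sketch, assuming the poppler-utils column layout, in which the "emb" column sits fifth from the end of each row:

```shell
# Print the (first word of the) names of fonts that are NOT
# embedded, i.e. have "no" in the "emb" column. Counting fields
# from the end tolerates spaces in the font name and type.
nonembedded() {
    awk 'NR > 2 && $(NF-4) == "no" { print $1 }'
}

# Usage (requires poppler-utils):
#   pdffonts thesis-ku.pdf | nonembedded
```

Any font this prints needs to be embedded before the document is submitted.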
If we don't fix the R output files before compiling the thesis itself, we are in a somewhat dangerous situation. People suggest using various magic wands to add fonts, but all of them seem to have major flaws: they either corrupt the quality of the output or destroy its internal structure.
I found a way to embed fonts using Ghostscript. This converts the document to PostScript and then back to PDF.
$ pdf2ps thesis-ku.pdf test.ps
$ ps2pdf14 -dPDFSettings=/prepress -dEmbedAllFonts=true test.ps test.pdf
The bad news: (1) it destroys internal hyperlinks, and (2) it DOES NOT embed fonts needed for material in included graphics (things inserted by \includegraphics, such as PDF files produced by R).
See:
http://askubuntu.com/questions/50274/fonts-are-not-embedded-into-a-pdf
In my opinion, this is a bad outcome and it should not happen. But it does.
As a result, it seems necessary to fix the individual PDF graphics files before compiling the larger thesis document.
This reminds me that at one point I had a post-processing script written for R Sweave sessions that would embed fonts in all pdf output files.
The shell script would cycle through all of the R output and embed fonts. Enjoy!
for i in *.pdf; do
    base=$(basename "$i" .pdf)
    basenew="${base}-newtemp.pdf"
    ## echo "$i base: $base new: $basenew"
    /usr/bin/gs -o "$basenew" -dNOPAUSE -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite "$i"
    mv -f "$basenew" "$i"
done
The same can be achieved inside R. Each time a PDF is created, embed the fonts with the embedFonts() function. See ?embedFonts.
User home folders are limited to 100GB and no customization is allowed. For our users who were previously limited to 20GB, that's great news. For the others who had 600GB allocations, it's a disaster. Oh, well. Just one among many.
When you log in on hpc.crc.ku.edu, a system status message appears. One report is the disk usage. Here's what I see today:
Primary group: hpc_crmda
Default Queue: crmda
$HOME = /home/pauljohn
<GB> <soft> <hard> : <files> <soft> <hard> : <path to volume> <pan_identity(name)>
65.04 85.00 100.00 : 136150 85000 100000 : /home/pauljohn uid:xxxxxx(pauljohn)
$WORK = /panfs/pfs.local/work/crmda/pauljohn
Filesystem Size Used Avail Use% Mounted on
panfs://pfs.local/work
14T 1.6T 13T 12% /panfs/pfs.local/work/crmda/pauljohn
$SCRATCH = /panfs/pfs.local/scratch/crmda/pauljohn
Filesystem Size Used Avail Use% Mounted on
panfs://pfs.local/scratch
55T 37T 19T 67% /panfs/pfs.local/scratch/crmda/pauljohn
In case you want to see the same output, the new cluster has a command called "mystats" which will display it again. In the terminal, run
mystats
In the output about my home folder, there is a "hard limit" at 100GB, as you can see. That is not adjustable in the current regime.
The main concern today is that I'm over the limit on the number of files. The limit is now 100,000 files, but I have 136,150. Being over the limit means I am not allowed to create new files, and if I remain over it, the system can prevent me from doing my job.
Wait a minute. 136,150 files? WTH? Last time I checked, there were only 135,998 files and I'm sure I did not add any. Did some make babies? Do you suppose some R files found some C++ files and made an Rcpp project? (That's programmer humor. It knocks them out at conferences.)
I probably have files I don't need any more. I'm pretty sure that, for example, when I compile R, it uses tens of thousands of files. Maybe I can move that work somewhere else.
I wondered how I could find out where I have all those files. We asked and the best suggestion so far is to run the following, which sifts through all directories and counts the files.
for i in $(find . -maxdepth 1 -type d); do echo "$i"; find "$i" -type f | wc -l; done
The return shows directory names and file counts, like this:
./tmp
17365
./work
46
./.emacs.d
0
./src
25519
./texmf
1794
./packages
5041
./SVN
4321
./Software
12014
./.ccache
995
./TMPRlib-3.3
19316
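As a variant of the counting command, here is a small sketch of my own that sorts the directories so the biggest offenders appear last:

```shell
# Count regular files under each immediate subdirectory of the
# given directory, printing "count path" sorted numerically so
# the largest counts come last.
countfiles() {
    for d in "$1"/*/ "$1"/.[!.]*/; do
        [ -d "$d" ] || continue
        printf '%s %s\n' "$(find "$d" -type f | wc -l)" "$d"
    done | sort -n
}

# Usage:
#   countfiles "$HOME"
```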
I'll have to sift through that. Clearly, there are some files I can live without. I've got about 20K files in TMPRlib-3.3, which is a building spot for R packages before I put them in the generally accessible part of the system. .ccache is the compiler cache; those files are only saved to speed up repeated C compiler jobs and would be regenerated if deleted, so I have a choice to make there.
So far, I've obliterated the temporary build information, but I remain over the quota. I'll show the output from "mystats" so that you can see the difference:
$ mystats
Primary group: hpc_crmda
Default Queue: crmda
$HOME = /home/pauljohn
<GB> <soft> <hard> : <files> <soft> <hard> : <path to volume> <pan_identity(name)>
63.26 85.00 100.00 : 113510 85000 100000 : /home/pauljohn uid:xxxxx(pauljohn)
$WORK = /panfs/pfs.local/work/crmda/pauljohn
Filesystem Size Used Avail Use% Mounted on
panfs://pfs.local/work
14T 1.6T 13T 12% /panfs/pfs.local/work/crmda/pauljohn
$SCRATCH = /panfs/pfs.local/scratch/crmda/pauljohn
Filesystem Size Used Avail Use% Mounted on
panfs://pfs.local/scratch
55T 37T 19T 67% /panfs/pfs.local/scratch/crmda/pauljohn
Oh, well, I'll have to cut/move more things.
The CRC put in place a hard, unchangeable 100GB limit on user home directories.
There is a limit of 100,000 on the number of files that can be stored within that. Users will need to cut files to be under the limit.
One can use the find command in the shell to find out where the files are.
How to avoid the accidental buildup of files? The main issue is that compiling software (R packages) creates intermediate object files that are not needed once the work is done. It is difficult to police these files (at least it is for me).
I don't have time to write all this down now, but here is a hint. The question is where to store "temporary" files that are needed to compile software or run a program, but are not needed afterward. In many programming chores, one can link the "build" folder to a faster, temporary storage device that is not on the network file system. In the past, I've usually used "/tmp/a_folder_i_create" because that is on the disk "in" the compute node; access to the local disk is much faster than to the network file system. Lately, I'm told it is even faster to put temporary material in "/dev/shm", but I do not have much experience with that. With a little clever planning, one can write the temporary files to a much faster memory disk where they are easily disposed of and, so far as I can see today, do not count against the file quota. This is not to be taken lightly. I've compared the time required to compile R using the network file storage against the local temporary storage: 45 minutes versus 15 minutes.
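As a sketch of that idea, a few lines near the top of a build session can redirect temporary files to node-local storage. The paths here are assumptions; /dev/shm is memory-backed on most Linux systems, and the snippet falls back to mktemp when it is unavailable:

```shell
# Sketch: point temporary build output at fast, node-local
# storage. /dev/shm is a memory-backed filesystem on most Linux
# systems; fall back to a mktemp directory if it is unusable.
BUILD_TMP=/dev/shm/${USER:-$(id -un)}/build-tmp
mkdir -p "$BUILD_TMP" 2>/dev/null || BUILD_TMP=$(mktemp -d)
export TMPDIR=$BUILD_TMP

# R, make, and most compilers respect TMPDIR for scratch files.
```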
Danger: new smaller memory default!
At the user meeting on April 12, we found out that requesting 1 core will automatically provide only 500MB of memory. This is a BIG change, because in the old cluster we received 2GB per core, which was generally sufficient. That is to say, we almost never needed to specify memory.
The default interactive session is not likely to be sufficient, so it will be necessary to specify memory.
As a result, the command to ask for 1 node with 1 processor (core) on that node would be
msub -X -I -l nodes=1:ppn=1,pmem=2048m
This asks for graphics X11 forwarding (-X). The memory can also be specified as "2gb".
If you only want 1 core on 1 node, the simpler notation would be to use the flag "procs".
msub -X -I -l procs=1,pmem=2048m
To ask for several cores on 1 node (test multicore project), run
msub -X -I -l nodes=1:ppn=5,pmem=2048m
** Specify a queue **
Interactive jobs can be run on any queue. By default, they go to the user's nodes.
The default queue is displayed with 'mystats'. If you wish to run on a node that is not in your owner group, like a GPGPU node, you will then need to specify the sixhour queue and the node name. You will only have a maximum of 6 hours on this node. There is no time limit to your default queue.
msub -X -I -l nodes=1:ppn=5,pmem=2048m -q sixhour
One can specify a particular node, "g0001", with a request like:
msub -X -I -lnodes=g0001:ppn=1 -q sixhour
CRC made a page regarding queues and has relocated it; see http://crc.ku.edu/using-hpc#Submitting and http://crc.ku.edu/queues
Update 20170413
We requested a simpler way to launch the usual type of interactive session--one node, one core--as we had in the old cluster. The administrators created a script "qxlogin" which the user can run from the login node.
$ qxlogin
qsub: waiting for job 40565091.sched to start
qsub: job 40565091.sched ready
We suggest caution with this, since the new memory default limit is 500MB and CRMDA users have regularly reported frustration with unanticipated job failures.
In case you want to write your own login script, you can take an example from the new qxlogin, which I found is installed in /usr/local/bin on the new cluster.
$ cat /usr/local/bin/qxlogin
#!/bin/sh
ARGS=$@
/opt/moab/bin/msub -X -I -lnodes=1:ppn=1 $ARGS
If you want more interactive nodes, or more ppn, just change the 1's. To test that, suppose you save it as "qxlogin2", then run
$ sh qxlogin2
If you enjoy the result, save that file in your $HOME/bin directory, make it executable, and then it will be more generally available within your sessions. After that, there is no need to run "sh" before "qxlogin2". Try it out, let me know if there is trouble.
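For concreteness, here is one way such a qxlogin2 might look; the 4-core, 2GB-per-core request is just an example.

```
#!/bin/sh
## qxlogin2: hypothetical variant of the CRC qxlogin script.
## Asks for 4 cores with 2GB per core; any extra arguments
## are passed through to msub.
ARGS=$@
/opt/moab/bin/msub -X -I -lnodes=1:ppn=4,pmem=2048m $ARGS
```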
We will have a cluster update meeting on Friday at 10AM in Watson Room 440D (within the suite of the Digital Humanities group).
Today the Center for Research Computing announced the re-opening of the compute cluster. A number of features we have come to depend on were removed. All of the CRMDA documentation (http://crmda.ku.edu/computing) will need to be revised, which will take some time. These changes were not well publicized during the six-month runup to the cluster administration changeover, so we are playing catch-up.
They have declined to support NoMachine GUI connections, and the cluster storage is not externally accessible via Windows Server or Network File System protocols. We will have to find ways to navigate around those changes.
The top priority right now is updating the hpc example collection,
https://gitlab.crmda.ku.edu/crmda/hpcexample
Most of that work has been kindly attended to by Wes Mason at KU ITTC.
Here is a copy of the announcement.
Over the course of the last few weeks we have been working to transition the administration of the KU Community Cluster to the Center for Research Computing (CRC). We have completed testing with a subset of users and we are now restoring access for all users who are part of an owner group. If you know someone in your group that did not get this announcement, please email crchelp@ku.edu.
We have kept the underlying legacy software environment the same to make this transition simpler, but have made some improvements and updates that you will need to be aware of to use the cluster. We will be building upon these initial improvements over the coming months to standardize, implement best practices, update and integrate the software stack, provide transparency of resources utilization, integrate with KU, and help you optimize your use of the cluster.
We have integrated with KU's identity management system so you will use your KU username and password to access the cluster. We have 2 login nodes that you will randomly be assigned to when you login to the address:
> KU_USERNAME@hpc.crc.ku.edu
'env-selector' was removed and only 'module' is available to load different software packages.
When issuing the command:
> module avail
you will see the new software we have compiled that is optimized for the latest version of the CPUs in the cluster.
To see the software installed before this transition, you must enter:
> module load legacy
and then you can see all legacy software by entering the command:
> module avail
You must place these commands in your job submit scripts as well if you choose to use the legacy software.
'qsub' has been replaced with 'msub'. All your submit scripts will still work with 'msub'. The #PBS directives in your job submit scripts are also compatible with 'msub', but we suggest you use the #MSUB directives when you create new job submit scripts.
Your home directory now has a 100GB quota. We have integrated the cluster with KU's identity management system so your home directory also matches the KU home directory path (e.g., /home/a123b456).
All data from /research, /projects, /data, and any personal root directory you had (for example, /compbio) has been placed in
/panfs/pfs.local/work/<owner group>/<user>
If your owner group has used all their storage allocation or if your group does not have a storage allocation, some of your data had to be moved to $SCRATCH:
/panfs/pfs.local/scratch/<owner group>/<user>
We organized the data to better keep track of usage for owner groups. Scratch has been set up in the same manner. Some groups were previously allocated more storage than they purchased and you will see your quota for your $HOME, $WORK, and $SCRATCH directories when you log on. If you see any directory at 100%, then you must remove files before writing to it.
To see your quota, group, and queue stats at anytime, run:
> mystats
on the submit nodes.
NO data was deleted. If you see that you are missing something, please check all paths first, then contact crchelp@ku.edu.
Your default queue will be displayed when you log in. This is the queue you will run in if you do not specify a queue name. If you wish to run across the whole cluster, you must specify:
#MSUB -q sixhour
in your job script or from command line:
> msub -q sixhour
You may only run a maximum of 6 hours on the 'sixhour' queue, but your jobs go across all nodes.
Most users will only have access to their owner group queue and the 'sixhour' queue. Others will be part of multiple groups and have access to other queues as well.
All of this information will be displayed when you login to the cluster for at least the first few months after coming back online.
We are continuing to write documentation and help pages about the new setup of the cluster. These pages can be found at https://crc.ku.edu under the HPC tab and more will be added as time goes on so check back often. We will also have an introduction to the cluster next Wednesday, March 8, at 10:30am during our regular monthly HPC meeting (location TBD).
We understand that change can sometimes be a little jarring, so if you have any questions feel free to contact us at crchelp@ku.edu and we will get back to you as soon as we can.
Thank you, Center for Research Computing Team
In the high performance computing example archive, we've just inserted Example 05, a long-running multi-core Mplus exercise.
https://gitlab.crmda.ku.edu/crmda/hpcexample/tree/master/Ex05-Mplus-1
This one demonstrates how I suggest we ought to keep the data, code files, and output files in separate folders, even if we are using Mplus!
Special thanks to Chong Xing, of the KU Dept. of Communications, for the example and the real-life data set that goes with it. It explores mediation in a structural equation model with the Children of Immigrants data set.
We are having a little practice session. Here are quick notes about working through the examples in Matthew L. Jockers' fine book, Text Analysis with R for Students of Literature.
Browse here:
http://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature
and download the zip file that it points at here:
http://www.matthewjockers.net/wp-content/uploads/2014/05/TextAnalysisWithR.zip
Save that zip file INSIDE a project folder. My folder is called R-text.
Unzip that package! (Don't just let the file manager peer inside it. You need to extract it.) It creates a directory called:
TextAnalysisWithR
Inside there, there is a directory structure, including text files and R code. Use the file manager to change into the directory "start.up.code" and then a chapter, such as "chapter.3".
If you open the R file in an R-aware editor (e.g., RStudio), the code won't run as it is. But it is easy to fix. Change the name of the data file by inserting "../../" at the beginning, like so:
text.v <- scan("../../data/plainText/melville.txt", what="character", sep="\n")
After doing that, you can step through the example code line by line.
It appears to me the starter code has all of the basic data manipulation work. It does not include the code to manufacture graphs.
We can explore that when we meet together...
We've been reworking the high performance computing examples so that they line up with the latest and greatest advice about how to organize submissions on the ACF cluster computing system. Please update your copy of the hpcexample archive (see http://crmda.ku.edu/parallel-programs).
In the process, we notice that some updates are possible in our "portableParallelSeeds" package for R. Because this package alters the random number generator structure in the R environment, we are not releasing it to the CRAN system. It can be installed from our KU server, however. We suggest you try the following:
CRAN <- "http://rweb.crmda.ku.edu/cran"
KRAN <- "http://rweb.crmda.ku.edu/kran"
options(repos = c(KRAN, CRAN))
install.packages("portableParallelSeeds")
Remember this: if you want to get updates by running "update.packages()" inside R, it is necessary to run the first 3 lines here to set your system to look for packages in KRAN.
The portableParallelSeeds package is delivered with two vignettes (essays) named "PRNG-basics" and "pps". To see all about it, run
help(package = "portableParallelSeeds")
Along with the package, a prototype design for Monte Carlo simulations is included. It is in the install folder of the package. There is a directory named "examples" and the prototype is "paramSweep-1.R".