kutils Package for R: New Updates Available

The KRAN server, an R repository hosted by the Center for Research Methods and Data Analysis (CRMDA) at the University of Kansas, offers packages that we are preparing for researchers who use R. A suite of research project management tools, dubbed "kutils", is under rapid development. It includes functions that initialize projects, scan input data, create quick overviews of the information in the data, and guide the recoding and data refactoring process. The package also includes the Variable Key System, a framework for importing and revising data in a team- and project-oriented manner. An essay describing it, The Variable Key Data Management Framework, is available and is also included with the package.

If you would like to try it out, the KRAN server is available. KRAN must be paired with the general-purpose CRAN server (and with any other repositories you currently use, such as OmegaHat or Bioconductor). We suggest this R code:

CRAN <- "https://rweb.crmda.ku.edu/cran"
KRAN <- "https://rweb.crmda.ku.edu/kran"
options(repos = c(KRAN, CRAN))
## We suggest installing updates, but this next step is not required
update.packages(ask = FALSE, checkBuilt = TRUE)
## Then install our new package
install.packages("kutils", dependencies = TRUE)

If you also use Bioconductor, for example, this is how we integrate it into the update process:

CRAN <- "https://rweb.crmda.ku.edu/cran"
KRAN <- "https://rweb.crmda.ku.edu/kran"
BIOC <- "https://www.bioconductor.org/packages/3.3/bioc"
options(repos = c(KRAN, CRAN, BIOC))
## We suggest installing updates, but this next step is not required
update.packages(ask = FALSE, checkBuilt = TRUE)
## Then install our new package
install.packages("kutils", dependencies = TRUE)

After that, using kutils is as easy as:

library(kutils)

The functions we suggest you check first are peek and initProject. This package also includes the first fully functional version of the Variable Key, a framework developed within CRMDA. The Variable Key offers an enhanced project management framework along with an easy-to-use, tabular notation that makes it easier for non-technicians to guide and supervise research exercises. A vignette about the Variable Key is provided with the package. After library(kutils), just run

vignette("variablekey")

and a PDF should be displayed. If R cannot find your system's PDF viewer, you'll get an error message that points you in the right direction.

The most recent changes in the package concern the Variable Key. We have streamlined the "round trip" research process. A researcher should approach data import and recoding in these steps:

  1. Run keyTemplate. This scans the data and creates a table that can be used for recoding.
  2. Edit the key template document. This can be done in a spreadsheet program such as MS Excel or in a text editor such as Emacs, Notepad++, Sublime Text, or TextMate (basically any programmer's file editor, NOT Microsoft Word).
  3. Run keyImport. This imports the revised template and reviews the requested data revisions.
  4. Run keyApply. The requested changes in the new key are applied to the data frame being considered.

In our original design, we thought this four-step process was the end of the story. However, we have run into a few cases in which the Variable Key system exposes problems in the original data frame that cause the data owners to revise it. Our original thought was that the teams revising the data would repeat the four-step process: create a new key template, revise it, import it, and apply it. However, when the key includes hundreds of variables, this implies a lot of repeated work. The most recent version of kutils includes a new function that adds a fifth step to address this situation.

  5. Run keyUpdate. This scans the revised data, checks for new variables and new values, and incorporates them into the previously prepared variable key.
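To make the round trip more concrete, here is a toy illustration in base R of what applying and updating a key accomplish conceptually. It is not the kutils implementation: the four-column key layout is a simplification assumed only for this example (a real kutils key carries more information), and applyToyKey() and updateToyKey() are hypothetical helpers. Each key row maps one old value of one old variable to a new variable name and a new value.

```r
## Toy illustration of the Variable Key idea; NOT the kutils code.
## The simplified key layout and both helper functions are hypothetical.
key <- data.frame(
  name_old  = c("sex", "sex", "q1", "q1", "q1"),
  name_new  = c("gender", "gender", "trust", "trust", "trust"),
  value_old = c("1", "2", "1", "2", "9"),
  value_new = c("male", "female", "low", "high", NA),
  stringsAsFactors = FALSE
)

## Rename and recode every variable listed in the key. Variables absent
## from the key are dropped; values the key does not mention become NA.
applyToyKey <- function(dframe, key) {
  out <- data.frame(row.names = seq_len(nrow(dframe)))
  for (v in unique(key$name_old)) {
    rows <- key[key$name_old == v, ]
    idx <- match(as.character(dframe[[v]]), rows$value_old)
    out[[rows$name_new[1]]] <- rows$value_new[idx]
  }
  out
}

## Append key rows for variables and values not yet in the key. New
## values pass through unchanged so the researcher can edit them later.
updateToyKey <- function(key, dframe) {
  for (v in names(dframe)) {
    vals <- unique(as.character(dframe[[v]]))
    newvals <- setdiff(vals, key$value_old[key$name_old == v])
    if (length(newvals) == 0) next
    ## known variables keep their new name; new variables keep their own
    nm <- if (v %in% key$name_old) key$name_new[key$name_old == v][1] else v
    key <- rbind(key, data.frame(name_old = v, name_new = nm,
                                 value_old = newvals, value_new = newvals,
                                 stringsAsFactors = FALSE))
  }
  key
}

dat <- data.frame(sex = c(1, 2, 2), q1 = c(1, 9, 2))
applyToyKey(dat, key)

## Later, the data owners add a value (sex = 3) and a variable (age)
dat2 <- data.frame(sex = c(1, 3), q1 = c(2, 2), age = c(30, 41))
key2 <- updateToyKey(key, dat2)
applyToyKey(dat2, key2)
```

The point of the update step in this sketch is that the five rows already written for sex and q1 survive untouched; only the three genuinely new rows are appended, which mirrors the motivation for avoiding a full restart when a key covers hundreds of variables.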

While keyUpdate is the newest function in a development package, we encourage researchers to try it and let us know how it works. For troubleshooting purposes, the sessionInfo output from the system we used is shown below.

Update 2016-10-28: A zip file, variablekey-anes-20161028, is available with an example we used to test the Variable Key setup. The test revealed a problem with "fancy quotes" that we still need to solve: if a person edits the key and inserts slanted (curly) quotation marks, the re-import fails because those characters are not recognized.
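Until that is fixed, one possible stopgap for a key saved as a CSV or other plain-text file is to replace curly quotes with straight ASCII quotes before re-importing. The helper below, straightenQuotes(), is our own hypothetical sketch in base R, not a kutils function, and it does not help with keys saved in Excel's .xlsx format.

```r
## Hypothetical helper (not part of kutils): replace curly "fancy" quotes
## in a text/CSV key file with straight ASCII quotes. Overwrites the
## input file unless a different outfile is given.
straightenQuotes <- function(infile, outfile = infile) {
  txt <- readLines(infile, encoding = "UTF-8", warn = FALSE)
  ## U+2018, U+2019: single curly quotes; U+201C, U+201D: double curly quotes
  from <- c("\u2018", "\u2019", "\u201C", "\u201D")
  to   <- c("'", "'", "\"", "\"")
  for (i in seq_along(from)) {
    txt <- gsub(from[i], to[i], txt, fixed = TRUE)
  }
  ## useBytes = TRUE writes any remaining UTF-8 text without re-encoding it
  writeLines(txt, outfile, useBytes = TRUE)
  invisible(outfile)
}

## Usage (hypothetical file name): clean the edited key, then import it
## straightenQuotes("key-edited.csv")
## key <- keyImport("key-edited.csv")
```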

> sessionInfo()
    R version 3.3.1 (2016-06-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.1 LTS
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base
    
    other attached packages:
    [1] kutils_0.31
    
    loaded via a namespace (and not attached):
    [1] plyr_1.8.4     tools_3.3.1    Rcpp_0.12.7    xtable_1.8-2   openxlsx_3.0.0