Digesting Data 2019.03

Cool Stuff

Python

If you use numpy or pandas, check out numba. Numba provides decorators that work with your existing numpy code and compile it to machine code, with performance that can match C or FORTRAN. And since pandas uses numpy under the hood, you can get similar speedups in pandas computations by running compiled functions on the underlying arrays.
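As a rough illustration of the decorator approach, here is a minimal sketch using numba's njit decorator on a plain Python loop over a numpy array. The function name rolling_mean and the sample data are made up for this example, and the no-op fallback decorator is only there so the snippet still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (an assumption for this sketch): run uncompiled without numba.
    def njit(func):
        return func


@njit
def rolling_mean(values, window):
    """Mean of each trailing window, written as an explicit loop."""
    out = np.empty(values.size - window + 1)
    total = 0.0
    for i in range(values.size):
        total += values[i]
        if i >= window:
            # Drop the value that just fell out of the window.
            total -= values[i - window]
        if i >= window - 1:
            out[i - window + 1] = total / window
    return out


data = np.arange(10, dtype=np.float64)
result = rolling_mean(data, 3)
print(result)
```

The first call includes compilation time, so any speedup only shows up on later calls or on larger arrays. For pandas data, the same function could be applied to a column's underlying array, for example via to_numpy().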

R

The package repository manager drat is a fantastic way to host R packages in a CRAN-like repository, both locally and on remote web servers (including GitHub). It has also been a great tool for versioning and testing as I develop packages that must work together: I can easily install an older version of package ‘A’ or ‘B’ and ensure they are compatible.

Databases

Getting a database up and running is no small task. I love PostgreSQL, but configuring it correctly can be a nightmare. The good people at 2ndQuadrant have created a postgres installer that makes things much easier.


New Releases and Developments

R

The latest issue of the R Journal was released last month. Check it out here. The R Journal details new packages, updates to the R compiler, and more, and is a good read for more advanced users. I especially enjoyed this article on alternatives to the pipe operator and I’m now excited about this package for representing measurement error.

For the tidyverse lovers, dplyr has a new release (v0.8) with some important bugfixes and new features. dplyr operates on an even-odd release schedule, with only odd-numbered releases bringing breaking changes and deprecations, so upgrade without fear!

Python

A new minor release of IPython is available. This version adds the magics %pip and %conda for installing Python packages from within the current session, into the environment the running kernel uses.
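The problem these magics address is that a plain pip call can end up installing into a different Python than the one your session is running. Outside IPython, a rough equivalent is to invoke pip through the current interpreter. The sketch below only builds that command; the helper name pip_install_command is hypothetical, and "numba" is just a sample package name.

```python
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this session."""
    # sys.executable is the path of the Python currently executing this code.
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("numba")
print(command)
# To actually install, pass the list to subprocess.run(command, check=True).
```

Inside IPython itself, the magic form is simply %pip install followed by the package name.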


Recent Blog Posts

I’ve been working with dplyr a lot lately and have gained some respect for how smooth it can be to manipulate remote database tables as if they were local. “Binning Columns […]” compares a few ways to approach one of the more difficult operations on remote tables.