Posts tagged “R”

Some more graphs of Beijing's Air Pollution

A bunch of folks across the internet have been doing some great stuff with the air quality data coming out of China via official channels and the US Embassy Twitter feeds. My advisor asked for some graphs of the available data. They are posted below (all were created in R using ggplot2). If time ever permits, I’ll post some interactive visualizations.

Shiny Server on WebFaction

Update: WebFaction today released a one-click installer for node.js, obviating Step 2 below. I'm leaving it in here for posterity.

Shiny “makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use.” It’s a powerful tool with a relatively simple syntax. It’s great for local apps — but I wanted to set up a web-based app that others could access and that wasn’t beholden to Shiny and RStudio’s excellent beta server platform.

I host this site and a few others at WebFaction — an awesome service with little to no downtime, fast servers, and relatively flexible restrictions. Getting Shiny up and running on WebFaction required a little work.

Step 1: SSH into WebFaction. Follow the instructions on their website for your specific server(s).

Step 2: Make a source directory. Download and install node.js.

mkdir src
cd src
wget 'http://nodejs.org/dist/v0.10.20/node-v0.10.20.tar.gz'
tar -xzf node-v0.10.20.tar.gz
cd node-v0.10.20
python2.7 configure --prefix=$HOME
make PYTHON=python2.7
make PYTHON=python2.7 install

export NODE_PATH="$HOME/lib/node_modules:$NODE_PATH"
echo 'export NODE_PATH="$HOME/lib/node_modules:$NODE_PATH"' >> $HOME/.bashrc 
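Because node.js was configured with --prefix=$HOME, its binaries should end up in $HOME/bin. A minimal sketch for checking that the shell can find them, assuming $HOME/bin may not already be on your PATH (the helper name path_prepend is made up for illustration):

```shell
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend "$HOME/bin"
export PATH

# Report what the shell can now find: a version string or a "not found" note.
for tool in node npm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool $("$tool" --version)"
  else
    echo "$tool not found on PATH"
  fi
done
```

If either tool is reported as not found, the build in Step 2 probably did not finish, or it installed somewhere other than $HOME/bin.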

Step 3: Download and install R.

# Return to the source directory from Step 2, then build R from source
cd $HOME/src
wget 'http://cran.us.r-project.org/src/base/R-3/R-3.0.2.tar.gz'
tar -xzf R-3.0.2.tar.gz
cd R-3.0.2
./configure --prefix=$HOME
make
make install

Step 4: Make a temporary directory.

cd $HOME
mkdir tmp
chmod 777 tmp
TMPDIR=$HOME/tmp
export TMPDIR
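The export above only lasts for the current SSH session. A small sketch for making it stick, in the same spirit as the NODE_PATH line in Step 2; append_once is a hypothetical helper name, and it only adds the line if the file does not already contain it:

```shell
# Append a line to a file only if that exact line is not already in it.
append_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

With that defined, append_once 'export TMPDIR="$HOME/tmp"' "$HOME/.bashrc" can be run as often as you like without piling up duplicate lines.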

Step 5: Download the Shiny Server source and install it using npm.

git clone https://github.com/rstudio/shiny-server.git
npm install -g shiny-server/

Installing from npm directly did not work; Shiny Server would not launch. I believe this is because you’re not allowed root access on WebFaction shared accounts.

Step 6: Launch R and install whatever packages you need.

install.packages('ggplot2')
install.packages('data.table')
install.packages('devtools')  # provides install_github()
devtools::install_github("trestletech/ShinyDash")
devtools::install_github("rstudio/shiny-incubator")

Step 7: Want plots to work? In your Shiny app’s global.R file, set

options(bitmapType = 'cairo')

Next up: a cron job to keep a Shiny instance running or to restart it if it goes down… and putting Shiny behind some light authentication to keep pre-release apps out of general circulation.
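One way the cron job idea could look is a PID-file watchdog. This is a rough sketch rather than a tested WebFaction setup; the function name ensure_running, the PID-file location, and the script path mentioned afterwards are all assumptions for illustration:

```shell
# Start a command only if the process recorded in PIDFILE is no longer alive.
# Usage: ensure_running PIDFILE COMMAND [ARGS...]
ensure_running() {
  pidfile=$1
  shift
  # kill -0 sends no signal; it only checks that the PID still exists.
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "running"
    return 0
  fi
  # Launch detached from the terminal and remember the new PID.
  nohup "$@" >/dev/null 2>&1 &
  echo "$!" > "$pidfile"
  echo "started"
}
```

Saved in a script that ends with a call such as ensure_running "$HOME/tmp/shiny-server.pid" "$HOME/bin/shiny-server", it could be scheduled with a crontab entry like */5 * * * * $HOME/bin/shiny-watchdog.sh to check every five minutes. Cron runs with a minimal PATH, so full paths are the safer choice. The exact launch command for Shiny Server on a shared account is an assumption here.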

Batch Download IHME's Global Burden of Disease Data

A few requests had come in to download around 12 countries' worth of the recently released Global Burden of Disease data from the IHME website. There’s no way to quickly download multiple files; by my count, it requires you to type the country name, click a link, click a tab, and then option-click a CSV file.

The URLs had relatively similar construction, so I wrote a quick R script to download all of the data and save each country's file as a separate compressed RDS file. I also dropped a couple of redundant columns to try to save some space. The compression is pretty efficient; files of 25 to 27 MB shrank to between 6.6 and 7.4 MB. Check it out here or below.

Update (April 2015): Updated to allow users to specify download location, making it work better ‘out of the box’; users can specify whether to download as CSV or RDS (or both); fixed some other minor bugs; fixed a major change in the URL structure.
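The script itself is in R. Purely to illustrate the loop-over-similar-URLs idea, here is a shell sketch that swaps in curl and gzip for R's download and RDS steps. The base URL is a placeholder (the real IHME URL structure is not reproduced here), and the function names are hypothetical:

```shell
# Placeholder URL template; NOT the real IHME address.
BASE_URL="https://example.org/gbd"

# Build the download URL for one country code.
gbd_url() {
  printf '%s/%s.csv\n' "$BASE_URL" "$1"
}

# Download each country's CSV into OUT_DIR and gzip it.
# Usage: gbd_fetch_all OUT_DIR CODE [CODE...]
gbd_fetch_all() {
  out_dir=$1
  shift
  mkdir -p "$out_dir"
  for country in "$@"; do
    dest="$out_dir/$country.csv"
    # Skip countries that were already fetched and compressed.
    [ -f "$dest.gz" ] && continue
    if curl -fsSL -o "$dest" "$(gbd_url "$country")"; then
      gzip -f "$dest"
    else
      echo "failed: $country" >&2
    fi
  done
}
```

A call such as gbd_fetch_all "$HOME/gbd" CHN IND USA would then fetch three files, assuming country codes are the part of the URL that varies.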

R + Global Burden of Disease / Comparative Risk Assessment Data: A tutorial (version 0.1)

R can be scary for those new to it, but it is exceptionally useful for a number of things, including managing, importing, and merging text files; resaving them; and performing statistical analyses to your heart’s content. It is your friend, albeit one that you must learn to love slowly and painfully.

This brief tutorial does not serve as an introduction to R. Instead, it focuses on reading in a large, complex data set with ~1 million rows and 50+ columns. It was created to help facilitate some analysis in a GBD course at Berkeley. It will help you figure out how to do some basic manipulation and subsetting and export these subsetted data into a comma-separated text file (“csv”) for analysis in your favorite spreadsheet program. It is a work in progress and will be updated over time.

Guess which state has the most medal winners from the Great American Beer Festival?

Subtitle: Mapping the 2012 GABF Winners Using R

The Great American Beer Festival (GABF) announced its winners on October 13. Lots of amazing beers from all over the US. They have a nifty search feature which lets you (1) find beers from specific states, (2) search by year of competition, (3) search by award (gold, silver, or bronze), and (4) search by keyword.

Like a true beer-loving nerd, I was curious to see which state won the most awards and to look at the geographic distribution of winners. I also needed to learn how to make simple maps using R for some work-related stuff. The confluence of curiosity and need got me giddy… and set me to work. Turns out that making simple maps in R is… simple.

More on the details of the process in a few days (along with a table outlining the above data). In the meantime, revel in the beer mecca that is California.

all rights reserved
snarglr is written & maintained by ajay pillarisetti


