
Installing R & RStudio on a Rackspace Server


Create a cloud server to handle your R data processing needs.

Working in the field of data science, at some point you'll need computational horsepower. I'm working from a MacBook Pro, which has been great for learning, but it isn't enough for some of the things that I'd like to do. While working on a Kaggle competition, I had one algorithm take more than 12 hours. That's when I decided to look for another option.

Initially I looked at buying a used workstation with no O/S. Shopping Craigslist and eBay you can find a suitable system with 16 cores for less than $600, so this seemed like a good option. Then it occurred to me that Rackspace lets you spin up servers as needed and delete them just as quickly. The catch is that you have to put together the server that you want.

I did a little research and found that you can build your server on a small system, save an image of that server, then boot up a powerhouse with that image. That's pretty awesome. Their prices are very reasonable, and their documentation & support are fantastic to boot. I've created my image and run a couple of jobs on their hardware, and my bill so far this month is less than $20. No need to buy a workstation.

By the way, I know that this is possible with Amazon as well. I love shopping at Amazon, but AWS isn't my favorite. Call it personal preference.

Getting Started

If you know how to use R then you shouldn't have any trouble with this. I've fully documented my steps below, and walked through them a second time to make sure they work, so you should be able to get through it by following this article. Having said that, you should have some familiarity with Linux.

Quick note on Linux flavors: RStudio should work with any version of Linux that you care to use, but these instructions are specific to Ubuntu. I tried them with both 12.04 LTS and 13.10, and they apply to either version.

  1. Log into your Rackspace account (create one if needed).
  2. Click Create Server to get started.
  3. Name your server and choose the location closest to you.
  4. Select Ubuntu as your operating system, and either 12.04 (Precise Pangolin) or 13.10 (Saucy Salamander) for the version.
  5. For Flavor, select Performance 1 and 2 vCPUs with 2 GB of RAM. This should be 8¢ per hour.
  6. Scroll to the bottom of the page and click Create Server.
  7. Rackspace will give you the initial root password. Make a note of this.

Root password generated when the server is created. Save this!

  8. Once the server is ready (it'll take a few minutes), look for its IP address on the right side under Managing Your Server.

Congratulations. You now have a server in the cloud. We're done with the web interface for a while. The rest of our work will be done within the terminal.

Securing The Server

Next we'll address a few security issues and update the server.

  1. Open the terminal on your system. OS X and Linux users, you have a terminal. Windows users will need to use PuTTY.
  2. Log into your server with the following command, ssh root@xxx.xxx.xxx.xxx, where the x's represent your server's IP address. The first time you connect you'll see a message about an unrecognized RSA key. Go ahead and accept that.

$ # Log in to your server
$ ssh root@xxx.xxx.xxx.xxx

$ # Change the root password.
$ passwd

$ # Update the server
$ apt-get update
$ apt-get upgrade --show-upgraded

  1. Next you'll create a couple of new users. One is so that we no longer need to log in as root (for system security). The other will be used to run RStudio.

$ # Add a new user. Replace <sudo_username> with whatever you choose.
$ # Follow the on screen prompts to create the new user.
$ adduser <sudo_username>

$ # Grant sudo privileges to your new user.
$ adduser <sudo_username> sudo

$ # Add one more user.
$ adduser <rstudio_user>

  1. Next we'll restrict access to the server. You'll use vi to edit a text file. Just in case you get stuck, here's a list of vi commands.

$ # Prevent root from logging in. Once the file is open, scroll down until
$ # you see PermitRootLogin and change that setting to no
$ vi /etc/ssh/sshd_config

  1. Scroll through the file to find PermitRootLogin and change that setting to no.
  2. Save your changes and close the file by typing :wq and hit <Return>.
  3. Restart the ssh service to apply the change, then log out.
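
If you'd rather not edit the file by hand in vi, the PermitRootLogin change described above can also be made with sed. This is only a minimal sketch: the function name disable_root_login is made up for this example, and it assumes the setting appears as an uncommented PermitRootLogin line (if it doesn't, the line is appended). It runs against a scratch file here so you can see what it does before touching the real one.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): force "PermitRootLogin no"
# in an sshd_config-style file passed as the first argument.
disable_root_login() {
    config_file="$1"
    if grep -Eq '^[[:space:]]*PermitRootLogin[[:space:]]' "$config_file"; then
        # Rewrite the existing directive in place; sed keeps a .bak copy.
        sed -i.bak -E 's/^[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$config_file"
    else
        # No uncommented directive found: append one.
        printf 'PermitRootLogin no\n' >> "$config_file"
    fi
}

# Try it on a scratch file first rather than the live configuration.
scratch=$(mktemp)
printf 'Port 22\nPermitRootLogin yes\n' > "$scratch"
disable_root_login "$scratch"
cat "$scratch"
```

On the server you would define the function in a root shell and point it at /etc/ssh/sshd_config, then restart ssh as shown below.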

$ # Restart the ssh service to apply these updates.
$ service ssh restart

$ # Log out
$ logout

  1. This time log in as your newly created sudo user.

$ ssh <sudo_username>@xxx.xxx.xxx.xxx

Now it's time to configure a firewall.

$ # Check existing firewall rules. This will ask you for your password.
$ sudo iptables -L

$ # Create a file to store rules. You're no longer root, so use sudo.
$ sudo touch /etc/iptables.firewall.rules

$ sudo vi /etc/iptables.firewall.rules

Copy & paste this:

*filter

#  Allow all loopback (lo) traffic and reject all traffic to 127/8 that doesn't use lo
-A INPUT -i lo -j ACCEPT
-A INPUT -d 127.0.0.0/8 -j REJECT

#  Accept all established inbound connections
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

#  Allow all outbound traffic - you can modify this to only allow certain traffic
-A OUTPUT -j ACCEPT

#  Allow HTTP and HTTPS connections from anywhere (the normal ports for websites and SSL).
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT

#  Allow connections to RStudio Server (port 8787)
-A INPUT -p tcp --dport 8787 -j ACCEPT

#  Allow SSH connections
#  The --dport number should be the same port number you set in sshd_config
-A INPUT -p tcp -m state --state NEW --dport 22 -j ACCEPT

#  Allow ping
-A INPUT -p icmp -j ACCEPT

#  Log iptables denied calls
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7

#  Drop all other inbound - default deny unless explicitly allowed policy
-A INPUT -j DROP
-A FORWARD -j DROP

COMMIT

  1. Save your changes and close the file by typing :wq and hit Enter.
  2. Activate your new rules, then verify that they're running.

$ # Activate the rules.
$ sudo iptables-restore < /etc/iptables.firewall.rules

$ # Recheck the existing firewall rules.
$ sudo iptables -L
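
As a quick sanity check on the rules file itself, you can list which TCP ports it opens. The sketch below is an assumption-laden helper, not an iptables feature: list_accepted_ports is a made-up name, and it only recognizes INPUT lines written in the same shape as the rules above (a --dport followed by -j ACCEPT at the end of the line). It runs on a small sample file here.

```shell
#!/bin/sh
# Hypothetical helper: print the destination port of every INPUT rule
# in a rules file that ends in "-j ACCEPT".
list_accepted_ports() {
    sed -n -E 's/^-A INPUT .*--dport ([0-9]+) .*-j ACCEPT$/\1/p' "$1"
}

# Sample file containing a subset of the rules from this article.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*filter
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 8787 -j ACCEPT
-A INPUT -p tcp -m state --state NEW --dport 22 -j ACCEPT
-A INPUT -j DROP
COMMIT
EOF

opened_ports=$(list_accepted_ports "$sample")
echo "$opened_ports"
```

On the server you would call list_accepted_ports /etc/iptables.firewall.rules and expect to see the web ports, the RStudio port and your ssh port.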

  1. Ensure that these firewall rules are activated each time the system starts.

$ # Create a new file that calls the iptables rules.
$ sudo vi /etc/network/if-pre-up.d/firewall

Copy & paste this into the file.

#!/bin/sh
/sbin/iptables-restore < /etc/iptables.firewall.rules

  1. Make the new file executable.

$ # Make the file executable.
$ sudo chmod +x /etc/network/if-pre-up.d/firewall
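
If you prefer to skip the editor, the same startup script can be written with a here-document. This is a sketch only: HOOK_PATH is a made-up variable for this example, and by default the script is written to a temporary directory so you can try it safely. On the server the target would be /etc/network/if-pre-up.d/firewall and the commands would need root privileges.

```shell
#!/bin/sh
# HOOK_PATH is a hypothetical override; without it, write to a scratch
# directory instead of the real /etc/network/if-pre-up.d/firewall.
hook_path="${HOOK_PATH:-$(mktemp -d)/firewall}"

# Write the two-line script shown above without opening vi.
cat > "$hook_path" <<'EOF'
#!/bin/sh
/sbin/iptables-restore < /etc/iptables.firewall.rules
EOF

# Make it executable and show the result.
chmod +x "$hook_path"
ls -l "$hook_path"
```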

That's it for system setup & security. The hard part is over, and we're almost done.

Install R

  1. Next you'll install R.

$ # Install R.
$ sudo apt-get install r-base

This ran without error for me. If you have any difficulty with this, you’ll need to add your closest CRAN mirror to your /etc/apt/sources.list file. Check the CRAN Mirrors list for the server closest to you.
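
If you do end up adding a mirror, the entry for these Ubuntu releases generally has the shape sketched below. Treat it as an assumption to check against the CRAN Ubuntu instructions for your release: cran_apt_line is a made-up helper, cran.example.org is a placeholder rather than a real mirror, and the codename would be precise for 12.04 or saucy for 13.10.

```shell
#!/bin/sh
# Hypothetical helper: build a sources.list entry from a mirror URL and
# an Ubuntu codename. The URL layout is an assumption; confirm it with
# the CRAN instructions for your release.
cran_apt_line() {
    mirror_url="$1"
    codename="$2"
    # ${mirror_url%/} drops a trailing slash so the path isn't doubled.
    printf 'deb %s/bin/linux/ubuntu %s/\n' "${mirror_url%/}" "$codename"
}

# Placeholder mirror; substitute the one closest to you.
cran_apt_line "http://cran.example.org" "precise"
```

You would paste the printed line into /etc/apt/sources.list, run sudo apt-get update, and then retry the install.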

Once that’s complete, you’ll have R installed. You can now run it at any time by typing R at the prompt. Note that it’s case sensitive: typing r will get you an error message.

Install RStudio

  1. Verify that you're in your home directory (to make it easy to find this later if needed).

$ cd $HOME

  1. Install a couple of required packages.
  2. Download and install RStudio Server. Note that you'll want the 64-bit version. You can look up the latest version available on the RStudio Server Downloads Page.

$ sudo apt-get install gdebi-core
$ sudo apt-get install libapparmor1
$ wget http://download2.rstudio.org/rstudio-server-xx-amd64.deb
$ sudo gdebi rstudio-server-xx-amd64.deb

You now have RStudio up & running. You can access it by opening http://xxx.xxx.xxx.xxx:8787 in your browser.
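
If the page doesn't load, one thing to try is RStudio Server's own installation check. The wrapper below is a small sketch (check_rstudio_server is a made-up name) that only runs the check when the rstudio-server command is actually on your PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around RStudio Server's installation check.
check_rstudio_server() {
    if command -v rstudio-server >/dev/null 2>&1; then
        # Ask RStudio Server to verify its own installation.
        sudo rstudio-server verify-installation
    else
        echo "rstudio-server not found on PATH"
        return 1
    fi
}

check_rstudio_server || echo "The install may not have completed."
```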

Create the image

I don’t recommend leaving this server up & running all the time. The benefit of doing this with Rackspace is that you can power servers up & down as needed. So save your server by creating an image; you can then delete the server and boot a new one from that image whenever you need it.

Notes

Don’t lose your passwords. Even after you’ve deleted the server, you’ll need them to access any server that you launch from the image.

Also, when accessing RStudio, only log in using the <rstudio_user> user that you created. All of the work that you do in RStudio will be over an unsecured connection. This server will only be online for a few hours at most, so I'm not paranoid about security, but there's no need to tempt fate.

When you're working with the server, you'll need to find a way to load your data onto it. To save on costs I've been cleaning/preparing data on my system, then uploading it to Rackspace Files and accessing it there. Any output from the server should be stored in the $HOME directory of your <rstudio_user>.

That's it, you're done! You just created an RStudio server. Total cost to create this for me was less than 50¢. The next time you need horsepower to crunch your data, you can access 32 cores with 120 GB of RAM within minutes. Power these up & down as needed.
