R is both a language and an environment for statistical analysis.
R is available as Free Software under the GPL. For those familiar
with environments such as S, MATLAB, and SAS, R serves the same
purpose. It has powerful constructs for manipulating arrays and
packages for importing data from various data sources such as relational
databases, CSV, DBF files, and spreadsheets. In addition, it has an
environment for graphical plotting and for outputting the results to
screen, printer, and file.
For more information on R, check out the R Project for Statistical Computing.
What is PL/R?
PL/R is a PostgreSQL language extension that allows you to write
PostgreSQL functions and aggregate functions in the R statistical
computing language.
With R you can write things such as an aggregate function for median,
which doesn't exist natively in PostgreSQL as of this writing and exists
natively in only a few relational databases that I can think of (e.g. Oracle).
Even in Oracle, the function didn't appear until version 10.
Another popular use of R is generating Voronoi diagrams
using deldir (Delaunay Triangulation and Dirichlet (Voronoi) Tessellation),
a package from the Comprehensive R Archive Network (CRAN). When you
combine this with PostGIS, you have an extremely powerful environment
for doing such things as nearest neighbor searches and facility
planning.
In the past, PL/R was only supported on PostgreSQL Unix/Linux/Mac OSX
environments. Recently that has changed, and PL/R can now be run on
PostgreSQL Windows installs. For most of this exercise we will focus on
Windows installs, but will provide links to instructions for
Linux/Unix/Mac OSX installs.
Installing R and PL/R
In order to use PL/R, you must first have the R language environment
installed on the same server as PostgreSQL. In the next couple of
sections, we'll provide step-by-step instructions.
Installing PostgreSQL and PostGIS
It goes without saying: if
you don't have PostgreSQL already, please install it, preferably
with PostGIS support. Check out Getting started with PostGIS.
Installing R
- Next install R-Language:
- Pick a CRAN Mirror from http://cran.r-project.org/mirrors.html
- In the Download and Install R section, pick your OS from the
Precompiled binary section. In my case I am picking Windows (95 and
later). Note there are binary installs for Linux, MacOS X and Windows.
- If you are given a choice between base and contrib, choose base. This will
give you an install containing the base R packages. Once you are up and
running with R, you can get additional packages by using the built-in
package installer in R or by downloading them from the web, which we will do
later.
- Run the install package. As of this writing the latest version of R
is 2.10.1. The Windows install file is named R-2.10.1-win32.exe
- Once you have installed the package, open up the RGUI. NOTE: For
Windows users, this is located on the Start menu: Start -> Programs ->
R -> R 2.10.1. If for some reason you don't find it on the Start
menu, it should be located at "C:\Program Files\R\R-2.10.1\bin\Rgui.exe". On a 64-bit system, it will be under C:\Program Files (x86)\R\R-2.10.1 instead.
- Run the following command at the R GUI Console.
update.packages()
- Running the above command should pop up a dialog asking for a CRAN mirror. Pick the one closest to you and then click OK.
- A sequence of prompts will follow, asking whether you would like to replace existing packages. Go ahead and type y to each one. After that you will be running the latest versions of the currently installed packages.
Installing PL/R
Now that you have both PostgreSQL and R installed, you are ready to install the PL/R procedural language for PostgreSQL.
- Go to http://www.joeconway.com/plr/
- For non-Windows users, follow the instructions here http://www.joeconway.com/plr/doc/plr-install.html.
For Windows users:
- Download the installation file from step 6 of http://www.joeconway.com/web/guest/pl/r
- As of this writing, there is no install setup for PostgreSQL 8.3/8.4/9* for Windows, so you need to copy plr.dll into your PostgreSQL/(8.3/8.4/9.1/9.2/9.3)/lib folder.
If you are installing on PostgreSQL 9.1+, make sure to also copy the .control and .sql files to the share/extension folder.
- Set the environment variables (you get there by going to Control Panel -> System -> Advanced -> Environment Variables).
- Add an R_HOME system variable and set it to the location of your R install. If you are on a 64-bit system and running 32-bit R, it will be installed in Program Files (x86). On a 32-bit system (or a 64-bit system running a 64-bit install), it will be installed in Program Files.
If you are running R version 2.12 or above on Windows, the R bin folder has changed. Instead of bin, it's bin\i386 or bin\x64. Also, if you install the newer version, you'll need to use the binaries and manually register the paths and R_HOME yourself, since the installer will not install. You can still use the plr.dll etc. See our other Quick Intro to PL/R for more details and examples.
- Edit the Path system variable and add the R bin folder to the end of it. Do not remove existing entries; just add this one to the end.
- Restart your PostgreSQL service from Control Panel -> Services. In rare circumstances, you may need to restart the computer for the changes to take effect.
Loading PL/R functionality into a database
In order to start using PL/R in a database, you need to load the helper functions into that database. To do so, do the following:
- Using PgAdmin III - select the database you want to enable with PL/R and then click the SQL icon to get to the query window.
For users running PostgreSQL 9.1+, install by typing in SQL window:
CREATE EXTENSION plr;
If you are running PostgreSQL 9.0 or lower, you have to install using the plr.sql file. Choose File -> Open ->
path/to/PostgreSQL/8.4/contrib/plr.sql (NOTE: on Windows the default
location is C:\Program Files\PostgreSQL\8.4\contrib\plr.sql)
- Click the Green arrow to execute
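As a quick sanity check after loading PL/R, the system catalogs can be queried to confirm that the language is registered in the current database. This is only a minimal sketch that relies on the standard pg_language and pg_extension catalogs; pg_extension exists only on PostgreSQL 9.1+, so skip the second query on older versions.

```sql
-- Should return one row if the PL/R language is registered in this database
SELECT lanname FROM pg_language WHERE lanname = 'plr';

-- PostgreSQL 9.1+ only: shows the installed extension and its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'plr';
```

If the first query returns no rows, the CREATE EXTENSION or plr.sql step most likely did not run against this database.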
Testing out PL/R
Next, run the following commands from PgAdmin III or psql to test out R:
SELECT * FROM plr_environ();
SELECT load_r_typenames();
SELECT * FROM r_typenames();
SELECT plr_array_accum('{23,35}', 42);
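Since plr_environ() returns the environment as seen by the PostgreSQL server process, it can also be used to confirm that the R_HOME and Path settings from the install steps were picked up after the service restart. A small sketch, assuming the function's result columns are called name and value:

```sql
-- Show only the environment variables that matter for locating R.
-- upper() is used because Windows may report the variable as "Path".
SELECT name, value
FROM plr_environ()
WHERE upper(name) IN ('R_HOME', 'PATH');
```

If R_HOME is missing or points at the wrong folder, revisit the environment variable step and restart the PostgreSQL service again.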
Next, try to create a helper function (this was copied from http://www.joeconway.com/plr/doc/plr-pgsql-support-funcs.html) and test it with the following:
-- STRICT replaces the older WITH (isstrict) form, which newer PostgreSQL versions reject
CREATE OR REPLACE FUNCTION plr_array (text, text)
RETURNS text[]
AS '$libdir/plr','plr_array'
LANGUAGE C STRICT;
SELECT plr_array('hello','world');
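To confirm that R code itself runs inside the database, and not just the C support functions, a one-line PL/R function is enough. The sketch below uses a hypothetical function name, r_version_string, which is not part of PL/R; the body is plain R that returns R's built-in R.version.string value.

```sql
-- Hypothetical helper name; the quoted body is R code evaluated by PL/R
CREATE OR REPLACE FUNCTION r_version_string()
RETURNS text
AS 'R.version.string'
LANGUAGE plr;

-- Returns the version string of the R install that PostgreSQL loaded
SELECT r_version_string();
```

The reported version should match the R install that R_HOME points to.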
Using R In PostgreSQL
Creating Median Function in PostgreSQL using R
Below is a link on creating a median aggregate function. This basically
creates a stub aggregate function that calls the median function in R.
http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
NOTE: I ran into a problem installing median from plr-aggregate-funcs via PgAdmin: it gave an R parse error when I tried to use the function. I had to install the median function with
all the carriage returns (\r\n) removed, so to be safe, put the whole median function body on a single line as shown below. Evidently, when copying from IE, IE puts in carriage returns instead of Unix line breaks. When creating PL/R functions, make sure to use Unix line breaks instead of Windows carriage returns by using an editor such as Notepad++ that lets you specify Unix line breaks.
create or replace function r_median(_float8)
returns float as 'median(arg1)' language 'plr';
CREATE AGGREGATE median (
sfunc = plr_array_accum,
basetype = float8,
stype = _float8,
finalfunc = r_median
);
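If a PL/R function has already been created and fails with an R parse error, one way to check for the carriage return problem described in the note is to look at the stored function source. This is a rough sketch using only the standard pg_proc and pg_language catalogs; it lists PL/R functions whose body contains a carriage return character.

```sql
-- Find PL/R functions whose stored source contains a carriage return (chr(13))
SELECT p.proname
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE l.lanname = 'plr'
  AND position(chr(13) IN p.prosrc) > 0;
```

Any function listed here is a candidate for being recreated with Unix line breaks or a single-line body.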
create table foo(f0 int, f1 text, f2 float8);
insert into foo values(1,'cat1',1.21);
insert into foo values(2,'cat1',1.24);
insert into foo values(3,'cat1',1.18);
insert into foo values(4,'cat1',1.26);
insert into foo values(5,'cat1',1.15);
insert into foo values(6,'cat2',1.15);
insert into foo values(7,'cat2',1.26);
insert into foo values(8,'cat2',1.32);
insert into foo values(9,'cat2',1.30);
select f1, median(f2) from foo group by f1 order by f1;
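Working the sample data by hand, the median for cat1 should be 1.21 (the middle of five values) and for cat2 about 1.28 (the average of 1.26 and 1.30). If you happen to be on PostgreSQL 9.4 or later, which ships an ordered-set aggregate called percentile_cont, the PL/R result can be cross-checked against it. This is only a verification aid and needs the foo table from above; it will not run on older versions.

```sql
-- PostgreSQL 9.4+ only: built-in continuous percentile, where 0.5 is the median
SELECT f1,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY f2) AS builtin_median
FROM foo
GROUP BY f1
ORDER BY f1;
```

Both queries should agree for each category, apart from possible floating point display differences.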
In the next part of this series, we will cover using PL/R in conjunction with PostGIS.