PostGIS Spatial Database Engine UMN Mapserver Boston Geographic Information Systems    Checkout our PostGIS in Action book.  First chapter is a free download   PostGreSQL Object Relational Database Management System
GIS Books  Home   Consulting Services  About Boston GIS   Boston GIS Blog  Postgres OnLine Journal
PostGIS in Action is out in hard-copy,
download the first chapter
and SQL Primer for free. Tips and Tricks for PostGIS
  GIS Article comments Comments Rss
PLR Part 2: PL/R and PostGIS

PL/R and PostGIS

In this tutorial we will explore using PostGIS and PL/R together. Some examples we will quickly run thru.

  • Median Function in conjunction with PostGIS
  • Voronoi Diagrams

If you missed part one of our series and are not familiar with R or PL/R, please read
Getting started with PostGIS and PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide

Next create a new PostgreSQL database using the template_postgis database as template and load in the PLR support functions as described in the above article. For this particular exercise, we will assume the database is called testplrpostgis.

Loading Some Test GIS Data

We will be working with census data from MassGIS. Click the following "Download This Layer" - census2000blockgroups_poly.exe, from this link
Get the Massachusetts Town Polygon geometries from this link Townsurvey points - ESRI Shapefiles
Get the Massachusetts Census Blocks from this link

Get Massachusetts Schools Layer from here

Extract the files into a folder (running the exe,right-click extract, and for Linux/Unix/MacOS just extracting with unzip or stuffit or some other similar package)

Now load the shape files into your test database using shp2pgsql by doing the following at your OS commandline. Note for non-windows users or if you installed PostgreSQL in a non-standard directory - change the path to yours - default being "C:\Program Files\PostgreSQL\8.2\bin\". Also replace items in italics with those that match your environment.
"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blockgroups_poly cens2000bgpoly > cens2000bgpoly.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000bgpoly.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000bgpoly_the_geom ON cens2000bgpoly USING GIST(the_geom)"

"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\TOWNSSURVEY_POLY towns > towns.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f towns.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_the_geom ON towns USING GIST(the_geom)"

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_town ON towns USING btree(town)"

"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\SCHOOLS_PT schools > schools.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f schools.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_schools_the_geom ON towns USING GIST(the_geom)"

"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blocks_poly cens2000blocks > cens2000blocks.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000blocks.sql

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000blocks_the_geom ON cens2000blocks USING GIST(the_geom)"

"C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "vacuum analyze"

Using Median function in conjunction with POSTGIS constructs

In this example - we will find out the median population of a town census blockgroup, total population of all block groups within the town, and average population of a town census blockgroup, and count of block groups for each town that is within each towns boundary for all Massachusetts towns. Note: we have to do a geomunion here because the towns layer has multiple records per town. Unioning will consolidate so we have a single multipolygon for each town.

The below example assumes you installed the PL/R aggregate median function as described in part one of this tutorial.

SELECT, count(bg.gid) as total_bg, avg(bg.total_pop) as avg_bg_pop, 
median(bg.total_pop) as median_pop, sum(bg.total_pop) as totalpop
FROM cens2000bgpoly bg,
(SELECT, geomunion(the_geom) as the_geom 
     FROM towns GROUP BY t 
WHERE bg.the_geom && t.the_geom AND within(bg.the_geom, t.the_geom) 

The above took about 37859ms ms on my windows 2003 server and 71922 ms on my Windows XP pc, but leaving out the median call did not speed it up much so bottleneck is in the geometry joining part.
If the subquery was a commonly used construct, then we would materialize it as a table and index it appropriately.

CREATE TABLE townsingle AS 
SELECT MAX(gid) as gid,, geomunion(the_geom) as the_geom 
     FROM towns GROUP BY;

CREATE INDEX idx_townsingle_the_geom ON townsingle USING GIST(the_geom);
CREATE UNIQUE INDEX idx_townsingle_gid ON townsingle USING btree(gid); 
CREATE UNIQUE INDEX idx_townsingle_town ON townsingle USING btree(town);

Installing R Packages

Before we continue, we need to expand our R environment by installing deldir package which contains voronoi functionality. In order to do so, please follow these steps.

We will be using the deldir package - I listed some other packages here because they looked interesting, but we won't have time to explore those in too much detail in this exercise

  1. Open up your RGUI console
  2. At the console prompt type the following
    This command should pop open a dialog requesting you to pick a mirror. Pick one closest to you and click OK.
  3. At the console prompt type
    This command should pop open a dialog listing all the packages available in your selected CRAN mirror.

    If the above doesn't work try

    to get a command line listing of packages available at your chosen CRAN mirror.
  4. Hold the control-key down and select the packages listed above and then click ok
    For a list of what other packages are available and what they do check out

    Alternatively for example if you did available.packages(), you can install the packages individually by doing for example< br/> install.packages('deldir')
    where in this case deldir is the name of the package we wish to install.

Exploring Help in R

R offers some neat features as far as help goes that you can access straight from the R GUI console. Below are some useful commands to find out more about a package or drill in at examples.

  • List functions and summary about a package - help(package=<package name>) Example: help(package=deldir)
  • Search all help text for a particular phrase -'<phrase>') Example:'median')
  • See list of demos available in all installed packages - Example: demo(package = .packages(all.available = TRUE))
  • Load a library and then run a demo in that library
    demo(<demo name>) -
    Example: libary(gstat)

Delaunay Triangulation and Dirichlet Package

In this example we will use a voronoi PL/R implementation provided by Mike Leahy in the PostGIS users newsgroup -

The function and voronoi type code is repeated below: Run the below in PgAdmin or psql. Note for Windows users - if you are copying this - make sure to paste in Notepad and save the file and then open in PgAdmin or psql. This will strip the carriage returns and just leave line breaks. R parser chokes on carriage returns.

--This function uses the deldir library in R to generate voronoi polygons for an input set of points in a PostGIS table.

--  R-2.5.0 with deldir-0.0-5 installed
--  PostgreSQL-8.2.x with PostGIS-1.x and PL/R- installed

--Usage: select * from voronoi('table','point-column','id-column');

--  'table' is the table or query containing the points to be used for generating the voronoi polygons,
--  'point-column' is a single 'POINT' PostGIS geometry type (each point must be unique)
--  'id-column' is a unique identifying integer for each of the originating points (e.g., 'gid')

--Output: returns a recordset of the custom type 'voronoi', which contains the id of the
--  originating point, and a polygon geometry

create type voronoi as (id integer, polygon geometry);
create or replace function voronoi(text,text,text) returns setof voronoi as '
    # select the point x/y coordinates into a data frame...
    points <- pg.spi.exec(sprintf("select x(%2$s) as x, y(%2$s) as y from %1$s;",arg1,arg2))
    # calculate an approprate buffer distance (~10%):
    buffer = ((abs(max(points$x)-min(points$x))+abs(max(points$y)-min(points$y)))/2)*(0.10)
    # get EWKB for the overall buffer of the convex hull for all points:
    buffer <- pg.spi.exec(sprintf("select buffer(convexhull(geomunion(%2$s)),%3$.6f) as ewkb from %1$s;",arg1,arg2,buffer))
    # the following use of deldir uses high precision and digits to prevent slivers between the output polygons, and uses 
    # a relatively large bounding box with four dummy points included to ensure that points in the peripheral areas of the  
    # dataset are appropriately enveloped by their corresponding polygons:
    voro = deldir(points$x, points$y, digits=22, frac=0.00000000000000000000000001,list(ndx=2,ndy=2), rw=c(min(points$x)-abs(min(points$x)-max(points$x)), max(points$x)+abs(min(points$x)-max(points$x)), min(points$y)-abs(min(points$y)-max(points$y)), max(points$y)+abs(min(points$y)-max(points$y))))
    tiles = tile.list(voro)
    poly = array()
    id = array()
    p = 1
    for (i in 1:length(tiles)) {
        tile = tiles[[i]]
        curpoly = "POLYGON(("
        for (j in 1:length(tile$x)) {
             curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]])
        curpoly = sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])
        # this bit will find the original point that corresponds to the current polygon, along with its id and the SRID used for the 
        # point geometry (presumably this is the same for all points)...this will also filter out the extra polygons created for the 
        # four dummy points, as they will not return a result from this query:
        ipoint <- pg.spi.exec(sprintf("select %3$s as id, intersection(''SRID=''||srid(%2$s)||'';%4$s'',''%5$s'') as polygon from %1$s where intersects(%2$s,''SRID=''||srid(%2$s)||'';%4$s'');",arg1,arg2,arg3,curpoly,buffer$ewkb[1]))
        if (length(ipoint) > 0)
            poly[[p]] <- ipoint$polygon[1]
            id[[p]] <- ipoint$id[1]
            p = (p + 1)
' language 'plr';

After you have installed the above Voronoi function. Test it out with the following code

CREATE TABLE schools_voroni(gid int);

SELECT AddGeometryColumn('', 'schools_voroni', 'the_geom', 26986, 'POLYGON',2);

INSERT INTO schools_voroni(gid, the_geom)
    SELECT, v.polygon
        FROM voronoi('(SELECT gid, the_geom 
        FROM schools 
            WHERE town = ''BOSTON'' AND grades LIKE ''%3%'' AND type = ''PUB'') as bs', 'bs.the_geom', 'bs.gid') As v

ALTER TABLE schools_voroni
  ADD CONSTRAINT pk_schools_voroni PRIMARY KEY(gid);

Below is a map of the public elementary schools in Boston overlayed on the Voronoi Polygons of those schools

At this moment, some people may be thinking - Nice polygons, but what is this supposed to tell me about Boston public elementary schools? Well the short simplistic answer is that the Voronoi polygon of a school tells you theoretically what area a school serves, all else being equal. To find out more about this interesting topic, checkout Voronoi links and tidbits. For example we assume people prefer to send their kids to schools near them. With this information and using census blocks, we can theoretically figure out which areas are underserved based on population of elementary school kids in that voroni polygon area, relative to the number of schools serving that area. So a query something like (NOTE I had only population not elementary school population so this is theoretically flawed already.)

 CREATE TABLE vorhighdens As 
SELECT v.thegid, sum(pop_2000)/v.schcount As vpop, v.the_geom
FROM (SELECT MAX(gid) as thegid, COUNT(gid) As schcount, the_geom, area(the_geom) as the_area
FROM schools_voroni 
GROUP BY the_geom, area(the_geom)) As v 
    INNER JOIN cens2000blocks cb ON v.the_geom && cb.the_geom AND Within(cb.the_geom, v.the_geom)
GROUP BY v.the_geom, v.schcount, v.thegid

The darkened areas are where based on my simple model, more elementary schools need to be built. Please NOTE this model is gravely flawed for many reasons

  • I have not taken into consideration the size of each school (assumed all are the same size)
  • My model is using overall population which in certain areas - like where I live - very few kids live here since its filled with yuppie executives, DINKS, and retired wealthy folk so the population here is probably very skewed.
  • I assumed census blocks fit nicely in a voronoi zone which may or may not be true. In theory this is probably not too bad of an assumption
  • I assumed all elementary schools are created equal. For example some schools have special needs programs that others do not.

Hopefully there is enough here to demonstrate how a real model would work.

Post Comments About PLR Part 2: PL/R and PostGIS

PostGIS in Action
PostGIS ver. 3.3 Geometry/Geography Quick Guide - Cheatsheet
Part 1: Getting Started With PostGIS: An almost Idiot's Guide
PostgreSQL PL/R Support Functions
PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide
PLR Part 3: PL/R and Geospatial Data Abstraction Library (GDAL) RGDAL
Voronoi Diagrams
SUNY Stony Brook note on Voronoi Diagrams and what they are used for
WikiPedia: Voronoi (Voronoy) Diagram

I asked a GIS analyst friend of mine, named Larry Braman, to give me his definition of what a voronoi diagram is and its purpose. He gave me this very nice succinct definition. His definition is as follows:

A Voronoi diagram divides the drawing into regions around each point that are shaped so that the borders of the regions are equidistant from the two nearest points. It's kind of the vector equivalent of grid mapping where you're trying to show boundaries of influence from point data.

Not sure where that definition originated, but I like it because its short and simple to understand. Manifold has the same definition and has some other interesting tidbits on there site.

This Document is available under the GNU Free Documentation License 1.2 & for download at the BostonGIS site

Boston GIS      Copyright 2024      Paragon Corporation