Drawing every street in Romania


Ben Fry’s project All Streets left quite the impression on me back in 2007 and it’s stayed in the back of my head ever since. About a year ago I started to wonder if I could pull off something similar with tools I am comfortable with, namely JavaScript for data processing and SVG for drawing things.

And this happened:

Here’s a step-by-step account of how I got there.


1. Get the data

Geofabrik thoughtfully packages OpenStreetMap data for every country, so I grabbed the .osm.pbf for Romania. PBF is a compact binary alternative to the XML format in which OSM data is usually kept.

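OSM works with just three data types: nodes (points with a latitude and a longitude), ways (ordered lists of nodes) and relations (groups of nodes, ways and other relations).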

For our modest purposes, we only need:

  1. Ways that are labeled as streets;
  2. The nodes that comprise those ways.

2. Extracting street data from the PBF

Time to brush off our Node.js skills and extract the data from the PBF file.

After a naïve attempt at loading everything in memory, it became apparent that data at this volume needs streaming – a technique in which we read data item-by-item, with only a fraction in memory at any given time. osm-pbf-parser is a streaming parser which goes through the PBF data and outputs small sets of JSON objects in the formats below.

This is a node…

{
  type: 'node',
  id: 122321,
  lat: 53.527972600000005,
  lon: 10.0241143,
  tags: {...},
  info: {...}
}

…and this is a way

{
  type: 'way',
  id: 108,
  tags: {
    created_by: 'Potlatch 0.8',
    highway: 'living_street',
    name: 'Kitzbühler Straße',
    postal_code: '01217' 
  },
  refs: [ 442752, 231712390, 442754 ],
  info: {...}
}

Traversing the PBF file, we can do these checks to pick up the items we need:

  function isNode(item) {
    return item.type === 'node';
  }

  function isStreet(item) {
    return item.type === 'way' && item.tags.highway;
  }
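To see the two checks in action, here is a small sketch that runs them over a hand-made batch. The batch array is made up for illustration (it mimics the shapes shown above, plus a hypothetical building way) and is not real parser output.

```javascript
// Hypothetical batch, shaped like the items shown above
var batch = [
    { type: 'node', id: 122321, lat: 53.5279726, lon: 10.0241143, tags: {} },
    { type: 'way', id: 108, tags: { highway: 'living_street' }, refs: [442752, 231712390, 442754] },
    { type: 'way', id: 109, tags: { building: 'yes' }, refs: [1, 2, 3, 1] }
];

function isNode(item) {
    return item.type === 'node';
}

function isStreet(item) {
    // a way counts as a street when it carries a `highway` tag
    return item.type === 'way' && Boolean(item.tags.highway);
}

var nodeIds = batch.filter(isNode).map(function (item) { return item.id; });
var streetIds = batch.filter(isStreet).map(function (item) { return item.id; });

console.log(nodeIds, streetIds);
```

The building way is dropped by both filters, which is the point: anything that is neither a node nor a highway-tagged way never reaches the output files.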

We’ll put our extracted nodes and streets into plain-text files, with one item per line – this is a format that’s amenable to streaming so it will be easy to read them back on subsequent steps.

extract-nodes.js

var fs = require('fs');
var through2 = require('through2');
var osm_parser = require('osm-pbf-parser');

var INPUT_FILE = 'data/data.osm.pbf';
var OUTPUT_FILE = 'output/nodes.txt';

function isNode(item) {
    return item.type === 'node';
}

function serializeNode(item) {
    return item.id + ',' + item.lat + ',' + item.lon;
};

console.log('Extracting nodes from data file: ' + INPUT_FILE);
fs.createReadStream(INPUT_FILE)
    .pipe(new osm_parser())
    .pipe(
        through2.obj(function(items, enc, next) {
            var nodes = items.filter(isNode);
            if (nodes.length) {
                // end each batch with a newline so consecutive batches don't run together
                this.push(nodes.map(serializeNode).join('\n') + '\n');
            }
            next();
        })
    )
    .pipe(
        fs.createWriteStream(OUTPUT_FILE)
    )
    .on('finish', function() {
        console.log('Finished extracting nodes onto file: ' + OUTPUT_FILE);
    });

extract-streets.js

var fs = require('fs');
var through2 = require('through2');
var osm_parser = require('osm-pbf-parser');

var INPUT_FILE = 'data/data.osm.pbf';
var OUTPUT_FILE = 'output/streets.txt';

function isStreet(item) {
    return item.type === 'way' && item.tags.highway;
}

function serializeStreet(item) {
    return item.refs.join(',');
}

console.log('Extracting streets from data file: ' + INPUT_FILE);
fs.createReadStream(INPUT_FILE)
    .pipe(new osm_parser())
    .pipe(
        through2.obj(function(items, enc, next) {
            var streets = items.filter(isStreet);
            if (streets.length) {
                // end each batch with a newline so consecutive batches don't run together
                this.push(streets.map(serializeStreet).join('\n') + '\n');
            }
            next();
        })
    )
    .pipe(
        fs.createWriteStream(OUTPUT_FILE)
    ).on('finish', function() {
        console.log('Finished extracting streets onto file: ' + OUTPUT_FILE);
    });

Which outputs:

nodes.txt

360714,44.493699500000005,26.0854494
360853,44.467436600000006,26.0771428
537912,44.425765000000006,26.123137900000003
546140,44.47436450000001,26.123994300000003
...

(id,latitude,longitude)

streets.txt

656951,2260664460,3227352565,656952, ...
256700851,2152136723,659642,256705252,2152144026, ...
304797001,2382014755,310215524,255848765 ...

(node1,node2,node3,...)

Note: We’re using through2.obj() to simplify the pipework.
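For readers who haven't met through2 before: as far as I understand it, through2.obj() is shorthand for an object-mode transform stream. A rough equivalent using only Node's built-in stream module might look like the sketch below. The names nodesToLines and createNodeSerializer are made up for this example.

```javascript
var Transform = require('stream').Transform;

// Turn one batch of parsed items into 'id,lat,lon' lines.
// A non-empty batch ends with a newline so consecutive batches don't run together.
function nodesToLines(items) {
    var lines = items
        .filter(function (item) { return item.type === 'node'; })
        .map(function (item) { return item.id + ',' + item.lat + ',' + item.lon; });
    return lines.length ? lines.join('\n') + '\n' : '';
}

// Object-mode transform: receives arrays of items, emits strings
function createNodeSerializer() {
    return new Transform({
        objectMode: true,
        transform: function (items, enc, next) {
            next(null, nodesToLines(items));
        }
    });
}

console.log(nodesToLines([
    { type: 'node', id: 1, lat: 44.5, lon: 26.1 },
    { type: 'way', id: 2, tags: {}, refs: [1] }
]));
```

Keeping the per-batch logic in a plain function makes it easy to try out without piping a whole PBF file through it.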

3. Mapping node IDs to their coordinates

We now have a huge set of node coordinates and another huge set of node IDs. In order to map the IDs to the coordinates, we need to do two things:

  1. Load the nodes into some sort of database
  2. Query the database to look up the coordinates for a given ID

For storage I’ve turned to LevelDB which is a pretty straightforward, file-based database. You use it in Node through leveldown and levelup; the level package used below bundles the two.

load-nodes.js

var fs = require('fs');
var through2 = require('through2');
var split2 = require('split2');
var levelup = require('level');

var DATABASE_NAME = 'everystreet';
var INPUT_FILE = 'output/nodes.txt';

var i = 0;

console.log('creating levelDB database ' + DATABASE_NAME);
levelup(DATABASE_NAME, function(err, db) {
    var write_stream = db.createWriteStream();
    fs.createReadStream(INPUT_FILE, { encoding: 'utf8' })
        .pipe(split2())
        .pipe(through2.obj(function(line, enc, next){
            var parts = line.split(',');
            this.push({ 
                key: parts[0], 
                value: parts[1] + "," + parts[2]
            });

            // Prevent memory leak
            // See: https://github.com/rvagg/node-levelup/issues/298
            if (i++ > 999) {
                setImmediate(next);
                i = 0;
            } else {
                next();
            }
        }))
        .pipe(write_stream)
        .on('finish', function() {
            console.log('Finished importing nodes into the database ' + DATABASE_NAME);
        });
});

This creates a LevelDB database with the name everystreet (which in turn creates an everystreet folder where the data is kept), and adds all nodes with key=ID and value=lat,lon.

Note: While attempting this I ran into memory trouble; the easy fix is to defer every 1000th next() call with setImmediate. There’s also level-bulk-load, which attempts to optimize bulk writing in LevelDB, so that might be worth looking into.

Next, let’s map the node IDs to their coordinates in our street definitions.

apply-nodes.js

var fs = require('fs');
var through2 = require('through2');
var split2 = require('split2');
var levelup = require('level');
var async = require('async');

var DATABASE_NAME = 'everystreet';
var INPUT_FILE = 'output/streets.txt';
var OUTPUT_FILE = 'output/streets-with-coordinates.txt';

console.log('Applying node data from database ' + DATABASE_NAME + ' to street data from file: ' + INPUT_FILE);
levelup(DATABASE_NAME, function(err, db) {

    var write_stream = fs.createWriteStream(OUTPUT_FILE);

    fs.createReadStream(INPUT_FILE, { encoding: 'utf8' })
        .pipe(split2())
        .pipe(through2.obj(function(line, enc, next){
            async.mapSeries(line.split(','), function(node_id, callback) {
                db.get(node_id, function(err, coords) {
                    callback(err, coords);
                });
            }, function(err, result) {
                this.push(result.join(',') + '\n');
                next();
            }.bind(this));
        }))
        .pipe(write_stream)
        .on('finish', function() {
            console.log('Finished applying node data into file: ' + OUTPUT_FILE);
        });
});

We’re streaming through each street in the data file, querying the database for the coordinates – using async.mapSeries to make sure we get back the node data in the correct order – and serializing them into a plain-text file.

streets-with-coordinates.txt

44.469672200000005,26.093109000000002,44.469469600000004,26.093366600000003,...
44.46975080000001,26.092981700000003,44.4696756,26.092841000000004,...
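To picture the load-then-look-up idea without a database, here is a sketch that swaps LevelDB for a plain in-memory Map (fine for a handful of nodes, not for a whole country). The node lines come from the nodes.txt sample above; the street that joins two of them is hypothetical, as is the resolveStreet name.

```javascript
// A few lines in the nodes.txt format: id,lat,lon
var nodeLines = [
    '360714,44.493699500000005,26.0854494',
    '360853,44.467436600000006,26.0771428',
    '537912,44.425765000000006,26.123137900000003'
];

// key = node ID, value = 'lat,lon' (the same layout as in the database)
var nodes = new Map();
nodeLines.forEach(function (line) {
    var parts = line.split(',');
    nodes.set(parts[0], parts[1] + ',' + parts[2]);
});

// Replace each node ID in a street line with its coordinates, keeping the order
function resolveStreet(streetLine) {
    return streetLine.split(',').map(function (id) {
        return nodes.get(id);
    }).join(',');
}

// Hypothetical street that visits two of the nodes above
var resolved = resolveStreet('360853,360714');
console.log(resolved);
```

Array.prototype.map keeps the order here for free; with asynchronous database reads that guarantee has to come from somewhere else, which is the job async.mapSeries does in the script.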

At this point we’re done with extracting all the data we need but we still need to convert it from geographical coordinates to screen coordinates. *Takes deep breath*. Onwards!

4. Mapping geographical coordinates to screen coordinates

There are many different ways to project the Earth’s surface onto 2D space. Many maps are laid out based on the spherical Mercator projection. Assuming λ is the longitude and φ is the latitude, both expressed in radians, the formula is simple:

function mercator(λ, φ) {
  return [λ, Math.log(Math.tan(Math.PI/4 + φ/2))];
}
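A quick way to get a feel for the formula is to project one point and then undo the projection. The sketch below does that for one of the nodes from nodes.txt; the inverse function and the helper names are additions for illustration, not part of the scripts in this post.

```javascript
function toRadians(deg) {
    return deg * Math.PI / 180;
}

function toDegrees(rad) {
    return rad * 180 / Math.PI;
}

// Forward projection, same formula as above (inputs in radians)
function mercator(λ, φ) {
    return [λ, Math.log(Math.tan(Math.PI / 4 + φ / 2))];
}

// Inverse for the vertical axis: recover the latitude (in radians) from y
function inverseMercatorLat(y) {
    return 2 * Math.atan(Math.exp(y)) - Math.PI / 2;
}

// One of the nodes from nodes.txt: lat 44.4936995, lon 26.0854494
var projected = mercator(toRadians(26.0854494), toRadians(44.4936995));
var latBack = toDegrees(inverseMercatorLat(projected[1]));

console.log(projected, latBack);
```

The longitude passes through untouched, while the projected y comes out larger than the latitude in radians: Mercator stretches the map vertically more and more as you move away from the equator.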

Before we dive into it, let’s see what we need to do:

  1. Find the bounding box of all our coordinates and its aspect ratio;
  2. Transform the geographical coordinates into screen coordinates, based on the bounding box.

The bounding box of our map is the smallest rectangle that contains all the nodes. We can find it by identifying the minimum/maximum longitude and latitude in our dataset. If we transform the points that define the bounding box using the Mercator projection, we can also obtain our final map’s aspect ratio, which we’re going to use later. The script below computes both:

bbox.js

var fs = require('fs');
var through2 = require('through2');
var split2 = require('split2');

var INPUT_FILE = 'output/streets-with-coordinates.txt';
var OUTPUT_FILE = 'output/bbox.json';

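// initial values: each edge starts at the opposite extreme,
// so the first coordinate we read replaces it
var bbox = {
    north: -90, // lowest possible latitude
    south: 90, // highest possible latitude
    east: -180, // lowest possible longitude
    west: 180 // highest possible longitude
};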

function toRadians(deg) {
    return deg * Math.PI / 180;
}

function mercator(λ, φ) {
  return [λ, Math.log(Math.tan(Math.PI/4 + φ/2))];
}

function projection(lat, lon) {
    return mercator(toRadians(lon), toRadians(lat));
}

console.log('Finding bounding box in file: ' + INPUT_FILE);
fs.createReadStream(INPUT_FILE, { encoding: 'utf8' })
    .pipe(split2())
    .pipe(through2.obj(function(line, enc, next) {
        var coords = line.split(',');
        for (var j = 0; j < coords.length; j+=2) {
            var lat = parseFloat(coords[j]),
                lon = parseFloat(coords[j+1]);

            if (lat > bbox.north) {
                bbox.north = lat;
            }
            if (lat < bbox.south) {
                bbox.south = lat;
            }
            if (lon > bbox.east) {
                bbox.east = lon;
            }
            if (lon < bbox.west) {
                bbox.west = lon;
            }
        }
        next();
    }))
    .on('finish', function() {

        var nw_projected = projection(bbox.north, bbox.west);
        var se_projected = projection(bbox.south, bbox.east);

        var west = nw_projected[0];
        var north = nw_projected[1];
        var east = se_projected[0];
        var south = se_projected[1]; 

        var output = {
            bbox: bbox,
            ratio: (east - west) / (north - south)
        };

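        // synchronous write: recent Node versions throw when fs.writeFile() is called without a callback
        fs.writeFileSync(OUTPUT_FILE, JSON.stringify(output, null, 2));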
    });

The script outputs the following information:

bbox.json

{
  "bbox": {
    "north": 48.4394855,
    "south": 43.578847700000004,
    "east": 29.726612400000004,
    "west": 20.198656500000002
  },
  "ratio": 1.3601773494902782
}
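As a sanity check, the ratio can be recomputed from the four bounding box values alone. The sketch below does that and also computes the naive ratio in plain degrees for comparison; the helper names are made up for this example.

```javascript
function toRadians(deg) {
    return deg * Math.PI / 180;
}

// Vertical Mercator coordinate for a latitude given in degrees
function mercatorY(latDeg) {
    return Math.log(Math.tan(Math.PI / 4 + toRadians(latDeg) / 2));
}

// Values copied from bbox.json above
var bbox = {
    north: 48.4394855,
    south: 43.578847700000004,
    east: 29.726612400000004,
    west: 20.198656500000002
};

// Width over height of the projected bounding box
function aspectRatio(bbox) {
    var width = toRadians(bbox.east) - toRadians(bbox.west);
    var height = mercatorY(bbox.north) - mercatorY(bbox.south);
    return width / height;
}

var ratio = aspectRatio(bbox);
// Ratio in unprojected degrees, for comparison only
var degreeRatio = (bbox.east - bbox.west) / (bbox.north - bbox.south);

console.log(ratio, degreeRatio);
```

The unprojected ratio comes out at roughly 1.96, noticeably wider than the projected one. That gap is the vertical stretch Mercator applies at Romania's latitude, and it is why the ratio is taken after projecting.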

Let’s now take our geographical coordinates and transform them using the Mercator projection; afterwards, we express them as percentages within the map’s bounding box, which will make it easy for us to draw at any scale.

map-coordinates.js

var fs = require('fs');
var through2 = require('through2');
var split2 = require('split2');

var INPUT_FILE = 'output/streets-with-coordinates.txt';
var OUTPUT_FILE = 'output/streets-with-coordinates-mapped.txt';
var BBOX_FILE = 'output/bbox.json';

// read the bounding box and project it using Mercator
var o = JSON.parse(fs.readFileSync(BBOX_FILE, 'utf8'));
var nw_projected = projection(o.bbox.north, o.bbox.west);
var se_projected = projection(o.bbox.south, o.bbox.east);
var north = nw_projected[1];
var south = se_projected[1];
var west = nw_projected[0];
var east = se_projected[0];

function toRadians(deg) {
    return deg * Math.PI / 180;
}

function mercator(λ, φ) {
  return [λ, Math.log(Math.tan(Math.PI/4 + φ/2))];
}

function projection(lat, lon) {
    return mercator(toRadians(lon), toRadians(lat));
}

function percent(lonlat) {
    return [
        (lonlat[0] - west) / (east - west),
        (lonlat[1] - south) / (north - south)
    ];
}

console.log('Mapping nodes using Mercator projection from file: ' + INPUT_FILE);
fs.createReadStream(INPUT_FILE, { encoding: 'utf8' })
    .pipe(split2())
    .pipe(through2.obj(function(line, enc, next) {
        var coords = line.split(',');
        var pts = [];
        for (var i = 0; i < coords.length; i+=2) {
            pts.push(
                percent(
                    projection(coords[i], coords[i+1])
                ).join(',')
            );
        }
        this.push(pts.join(';') + '\n');
        next();
    }))
    .pipe(fs.createWriteStream(OUTPUT_FILE))
    .on('finish', function() {
        console.log('Finished mapping nodes using Mercator projection onto file: ' + OUTPUT_FILE);
    });

All nodes should now be in the [0,1] range:

streets-with-coordinates-mapped.txt

0.6186481719547,0.17686588159630384;0.6186752081839509,0.17682535252522288;...
0.618634811271534,0.1768816051533656;0.6186200442006669,0.17686656174973386;...
...
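To make the normalization step concrete, here is a self-contained sketch that pushes a single coordinate through it, using the bounding box from bbox.json and the first coordinate pair from the streets-with-coordinates sample. The project and normalize names are made up for this example.

```javascript
function toRadians(deg) {
    return deg * Math.PI / 180;
}

// Spherical Mercator, taking degrees and returning [x, y]
function project(lat, lon) {
    var φ = toRadians(lat);
    return [toRadians(lon), Math.log(Math.tan(Math.PI / 4 + φ / 2))];
}

// Values copied from bbox.json
var bbox = {
    north: 48.4394855,
    south: 43.578847700000004,
    east: 29.726612400000004,
    west: 20.198656500000002
};
var nw = project(bbox.north, bbox.west);
var se = project(bbox.south, bbox.east);

// Express a coordinate as fractions of the projected bounding box,
// with [0, 0] at the south-west corner and [1, 1] at the north-east corner
function normalize(lat, lon) {
    var p = project(lat, lon);
    return [
        (p[0] - nw[0]) / (se[0] - nw[0]),
        (p[1] - se[1]) / (nw[1] - se[1])
    ];
}

var pt = normalize(44.469672200000005, 26.093109000000002);
console.log(pt);
```

The result should land very close to the first pair in the sample above (about 0.6186 across and 0.1769 up), which places the point a little right of centre and near the bottom of the map.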

5. Drawing the map in SVG

We now have everything we need to start drawing some SVG paths. This is the structure we’re aiming for:

<svg width='...' height='...' viewBox='...'>
    <path d="M x1 y1 L x2 y2 L x3 y3 ..."/>
    ...
</svg>

Remember the map ratio we computed earlier? We can use that to derive the height of our map based on a width of our choice. And to transform our points from the [0,1] range to SVG-ready coordinates, we just need to factor in the map’s dimensions:

var map_width = 1500; 
var map_height = map_width / map_ratio; 

var screen_x = longitude * map_width;
var screen_y = (1 - latitude) * map_height;

Note: The 1 - latitude is to account for the fact that the origin of SVG coordinates is at the top left corner while our coordinates assume an origin in the bottom left corner.
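Put together, turning one street into path data could be sketched like this. The toScreen and toPathData helpers are names made up for this example, and the two-point street running from the top left corner to the bottom right corner is hypothetical.

```javascript
var map_width = 1500;
var map_ratio = 1.3601773494902782; // the ratio from bbox.json
var map_height = map_width / map_ratio;

// [0,1] coordinates (origin bottom left) to SVG pixels (origin top left)
function toScreen(pt) {
    return [pt[0] * map_width, (1 - pt[1]) * map_height];
}

// Build the `d` attribute for one street: M x1 y1 L x2 y2 L ...
function toPathData(points) {
    return 'M ' + points.map(function (pt) {
        return toScreen(pt).join(' ');
    }).join(' L ');
}

// Hypothetical street from the north-west corner to the south-east corner
var d = toPathData([[0, 1], [1, 0]]);
console.log(d);
```

The north-west corner ends up at the SVG origin and the south-east corner at the full width and height, which is a handy check that the vertical flip is the right way round.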

generate-svg.js

var fs = require('fs');
var through2 = require('through2');
var split2 = require('split2');

var DEFAULT_MAP_WIDTH = 1500; // px

var INPUT_FILE = 'output/streets-with-coordinates-mapped.txt';
var OUTPUT_FILE = 'output/streets.svg';
var BBOX_FILE = 'output/bbox.json';

var o = JSON.parse(fs.readFileSync(BBOX_FILE, 'utf8'));

var map_width = DEFAULT_MAP_WIDTH;
var map_height = map_width / o.ratio;

var write_stream = fs.createWriteStream(OUTPUT_FILE);

write_stream.write(
    "<svg xmlns='http://www.w3.org/2000/svg' width='{w}' height='{h}' viewBox='0 0 {w} {h}'>"
        .replace(/\{w\}/g, map_width)
        .replace(/\{h\}/g, map_height)
);

console.log('Generating SVG from file: ' + INPUT_FILE);
fs.createReadStream(INPUT_FILE, { encoding: 'utf8' })
    .pipe(split2())
    .pipe(through2.obj(function(line, enc, next) {
        var path_data = 'M ' + line.split(';').map(function(pt) {
            var lonlat = pt.split(',');
            return [lonlat[0] * map_width, (1 - lonlat[1]) * map_height].join(' ');
        }).join(' L ');
        this.push('<path d="' + path_data + '" stroke-width="0.1" stroke="black" fill="none"/>\n');
        next();
    }, function(flush) {
        this.push('</svg>');
        flush();
    }))
    .pipe(write_stream)
    .on('finish', function() {
        console.log('Finished generating SVG onto file: ' + OUTPUT_FILE);
    });

And voilà! We have all the streets in Romania drawn up in SVG, which you can open in your browser (as opposed to other tools).

Here it is in all its glory: streets.svg (Warning: 262MB file!)

Final thoughts

Making it printable. Loading or converting a 262MB SVG is no easy feat, but ImageMagick somehow miraculously created a wall-sized PNG image:

convert -density 900 output/streets.svg output/streets.png

Another idea worth pursuing is making Electron print out a PDF.

Optimizing the map. Taking into account the output resolution, one could simplify the paths with simplify.js to eliminate details without affecting the appearance (e.g. points that are very close together):

simplify([{x: ..., y: ... }, { x: ..., y: ... }, ...], 0.5) => [{x : ..., y: ...}, ...]
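To illustrate the idea only, here is a bare-bones radial-distance filter. This is not simplify.js itself (which, as far as I know, also runs a Douglas-Peucker pass); the function name and the sample points are made up.

```javascript
// Drop points that lie within `tolerance` of the last kept point.
// The first and last points are always kept so the ends of the path stay put.
function simplifyRadial(points, tolerance) {
    if (points.length <= 2) {
        return points.slice();
    }
    var sqTolerance = tolerance * tolerance;
    var kept = [points[0]];
    var prev = points[0];
    for (var i = 1; i < points.length - 1; i++) {
        var dx = points[i].x - prev.x;
        var dy = points[i].y - prev.y;
        // compare squared distances to avoid a square root per point
        if (dx * dx + dy * dy > sqTolerance) {
            kept.push(points[i]);
            prev = points[i];
        }
    }
    kept.push(points[points.length - 1]);
    return kept;
}

// Hypothetical screen-space points: a tight cluster, then a jump
var simplified = simplifyRadial([
    { x: 0, y: 0 }, { x: 0.1, y: 0 }, { x: 0.2, y: 0.1 },
    { x: 3, y: 4 }, { x: 3.1, y: 4 }
], 0.5);

console.log(simplified.length);
```

With a tolerance of half a pixel the two points huddled next to the start are dropped, while the far point and both ends survive. The tolerance only makes sense in screen units, so a filter like this would run after the coordinates are scaled to the map size.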

And considering the browser needs to build the DOM tree along with drawing the millions of paths in our file, one easy fix to have fewer DOM nodes is to batch the path data into multipaths of, say, a thousand paths:

var path_buffer = '';
var counter = 0;
...
  path_buffer += path_data;
  if (counter++ > 999) {
    this.push('<path d="' + path_buffer + '" stroke-width="0.1" stroke="black" fill="none"/>\n');
    counter = 0;
    path_buffer = '';
  }
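Outside of a stream, the same batching idea can be sketched as a plain function over an array of path data strings. The batchPaths name and the three tiny paths are made up; the detail worth noticing is that the last, partially filled batch still gets written.

```javascript
// Group path data strings into <path> elements holding up to `size` subpaths each
function batchPaths(pathDataList, size) {
    var elements = [];
    for (var i = 0; i < pathDataList.length; i += size) {
        // each subpath starts with its own M command, so joining them is safe
        var d = pathDataList.slice(i, i + size).join(' ');
        elements.push('<path d="' + d + '" stroke-width="0.1" stroke="black" fill="none"/>');
    }
    return elements;
}

// Three hypothetical streets, batched two at a time
var elements = batchPaths(['M 0 0 L 1 1', 'M 2 2 L 3 3', 'M 4 4 L 5 5'], 2);
console.log(elements.join('\n'));
```

In the streaming version the equivalent of that last partial batch would be pushed from the flush function, just before the closing </svg> tag.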

I hope you’ve enjoyed this short foray into mapping! You can find all the scripts discussed here on GitHub: danburzo/every-street. If you have any ideas on how to make this workflow better, I’d love to hear them!
