Toolbox

Data sets

Repo Description Notes
Words A huge dataset of words in four languages (English, German, Spanish and French) used in Atebits' game Letterpress.
corpora A collection of small corpora of interesting data for creating bots and similar projects. I also keep a repo inspired by it.
Natural Earth Vector A global, public domain map dataset available at three scales and featuring tightly integrated vector and raster data.
countries World countries in JSON, CSV and XML.
geonames Contains over 10 million geographical names covering over 9 million unique features, of which 2.8 million are populated places, plus 5.5 million alternate names.
Geofabrik OSM Data Extracts OpenStreetMap data extracts at the continent and country level.
Mapzen Metro Extracts City-sized portions of OpenStreetMap, served weekly.
all-the-cities All the 138,398 cities of the world with a population of at least 1000 inhabitants, in a big JSON array.
Awesome public datasets
whiskyverse A JSON file listing Scotch Malt Whisky Society bottles.
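As an illustration of working with one of these sets, here is a minimal Python sketch that reads the geonames tab-separated dump (for example `allCountries.txt` from the GeoNames export server) and keeps only populated places. It assumes the standard 19-column layout of the GeoNames `geoname` table, in which feature class `P` marks cities, towns and villages. The `Place` tuple and the `populated_places` function are hypothetical names chosen for this example, not part of any GeoNames tooling.

```python
from typing import Iterable, Iterator, NamedTuple

# Assumed column order of the GeoNames "geoname" dump (19 tab-separated fields).
GEONAMES_COLUMNS = (
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
)


class Place(NamedTuple):
    """Hypothetical record type holding a few useful fields of a populated place."""
    geonameid: int
    name: str
    country_code: str
    latitude: float
    longitude: float
    population: int


def populated_places(lines: Iterable[str]) -> Iterator[Place]:
    """Yield only rows whose feature class is 'P' (populated place)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GEONAMES_COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(GEONAMES_COLUMNS, fields))
        if row["feature_class"] != "P":
            continue  # drop lakes, mountains, regions, etc.
        yield Place(
            geonameid=int(row["geonameid"]),
            name=row["name"],
            country_code=row["country_code"],
            latitude=float(row["latitude"]),
            longitude=float(row["longitude"]),
            # population may be empty in some rows; treat that as 0
            population=int(row["population"] or 0),
        )
```

Because the function accepts any iterable of lines, it can stream a file opened with `open("allCountries.txt", encoding="utf-8")` row by row instead of loading the large dump into memory.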