Data sets
Repo | Description | Notes |
---|---|---|
Words | A huge dataset of words in four languages (English, German, Spanish and French) used in Atebits' game Letterpress. | |
corpora | A collection of small corpuses of interesting data for the creation of bots and similar stuff. | I also keep a repo, inspired by it. |
Natural Earth Vector | A global, public domain map dataset available at three scales and featuring tightly integrated vector and raster data. | |
countries | World countries in JSON, CSV and XML. | |
geonames | Contains over 10 million geographical names and consists of over 9 million unique features, of which 2.8 million are populated places and 5.5 million are alternate names. | |
Geofabrik OSM Data Extracts | OpenStreetMap data extracts at the continent and country level. | |
Mapzen Metro Extracts | City-sized portions of OpenStreetMap, served weekly. | |
all-the-cities | All 138,398 cities in the world with a population of at least 1,000, in one large JSON array. | |
Awesome public datasets | | |
whiskyverse | A JSON file containing Scotch Malt Whisky Society bottles. | |