wget Recipes

Recipes for website mirroring / scraping

Here are some useful tricks with wget. More of them here.

Mirror a website

wget --mirror --no-clobber --no-parent --wait=3 --execute robots=off --domains=danburzo.ro,assets.danburzo.ro --user-agent=Mozilla danburzo.ro

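A quick explanation for these flags:

--mirror turns on recursive download with infinite depth and timestamping, so only files newer than the local copy are fetched again.
--no-clobber skips files that already exist locally. Because --mirror already enables timestamping, and wget refuses to combine timestamping with --no-clobber, one of the two may need to be dropped.
--no-parent never ascends above the starting directory.
--wait=3 pauses three seconds between requests.
--execute robots=off runs the wgetrc command robots=off, which makes wget ignore robots.txt.
--domains=danburzo.ro,assets.danburzo.ro limits which domains are followed. It does not by itself turn on --span-hosts.
--user-agent=Mozilla sends Mozilla as the User-Agent header instead of wget's default.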

Download sequential URLs

wget http://example.com/records/{1..1000}
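The {1..1000} range is expanded by the shell (bash, zsh) before wget ever runs, so it is not available in plain POSIX sh. A minimal sketch of a portable alternative, assuming a hypothetical helper named sequential_urls and the same example.com URL pattern:

```shell
# Hypothetical helper: print one URL per line for the numbers first..last.
# Uses a POSIX while loop instead of brace expansion.
sequential_urls() {
  base=$1
  i=$2
  last=$3
  while [ "$i" -le "$last" ]; do
    printf '%s/%s\n' "$base" "$i"
    i=$((i + 1))
  done
}

# Usage (not run here): "-" makes wget read the URL list from standard input.
# sequential_urls http://example.com/records 1 1000 | wget --input-file=-
```

Generating the list separately also makes it easy to inspect the URLs before any request is sent.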

Download a list of URLs

The URLs can be specified in a separate file list-of-URLs.txt, with one URL per line.

wget --input-file=list-of-URLs.txt
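If the list was assembled by hand, it may contain blank lines or repeated entries. A small sketch of tidying it first, assuming a hypothetical helper named clean_url_list (not part of wget):

```shell
# Hypothetical helper: print the URLs from a list file, skipping blank
# lines and repeated entries while keeping the original order.
clean_url_list() {
  awk 'NF && !seen[$0]++' "$1"
}

# Usage (not run here): "-" makes wget read the URL list from standard input.
# clean_url_list list-of-URLs.txt | wget --input-file=-
```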

Other usages

Find broken links

wget --recursive --level=0 --spider danburzo.ro
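Here --level=0 means unlimited recursion depth, and --spider checks each link without saving anything. The results are easier to work with when written to a log file. The sketch below assumes the log ends with a "Found N broken links." line followed by one URL per line, which is what wget 1.x appears to print; the helper name broken_links_from_log and the file name spider.log are hypothetical.

```shell
# Hypothetical helper: print the broken URLs listed at the end of a
# wget --spider log. Assumes the "Found N broken links." summary format.
broken_links_from_log() {
  awk '
    /^Found [0-9]+ broken links?\./ { in_list = 1; next }
    in_list && /^http/              { print; seen = 1; next }
    in_list && seen && /^$/         { exit }
  ' "$1"
}

# Usage (not run here): write the crawl log to a file, then extract the list.
# wget --recursive --level=0 --spider --output-file=spider.log danburzo.ro
# broken_links_from_log spider.log
```

If the crawl finds no broken links, the helper prints nothing.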

Further reading