Quick’n’dirty Mediawiki file crawler


URL='http://10.0.0.1' MIME='image/jpeg' 
  bash -c 'wget -q -O - "$URL/wiki/index.php?title=Special:MIMESearch&mime=$MIME&limit=500&offset=0" 
  | grep -Po "/wiki/images[^"]+" 
  | xargs -n1 -I {} wget "$URL{}"'

What it does: it uses the “MIME search” functionality on the wiki to locate files of a certain mime type and then xargs+wget each of them.

Limitations:

  • A maximum of 500 files are downloaded
  • Downloads are not parallelized, thus slower than they could be
, , ,

Leave a Reply

Your email address will not be published. Required fields are marked *