Grey Panthers Savannah


Quick’n’dirty MediaWiki file crawler


URL='http://10.0.0.1' MIME='image/jpeg' \
  bash -c 'wget -q -O - "$URL/wiki/index.php?title=Special:MIMESearch&mime=$MIME&limit=500&offset=0" |
  grep -Po "/wiki/images[^\"]+" |
  xargs -I {} wget "$URL{}"'

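What it does: it uses the “MIME search” functionality (Special:MIMESearch) on the wiki to list files of a given MIME type, pulls every /wiki/images/... path out of the result page with grep, and then hands the paths one by one to wget via xargs.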
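To see the extraction step on its own, here is a minimal sketch run against a made-up line of HTML. The sample link is hypothetical and only mimics what a result page might contain. It uses grep -oE instead of -Po, which should match the same thing for this simple pattern and works where grep lacks PCRE support.

```shell
# Hypothetical sample of one link from a MIME search result page.
html='<a href="/wiki/images/a/ab/Example.jpg" class="internal">Example.jpg</a>'

# Keep only the matched part: everything from /wiki/images up to the closing quote.
path=$(printf '%s\n' "$html" | grep -oE '/wiki/images[^"]+')

echo "$path"
```

Whatever grep prints here is what gets appended to $URL and passed to wget.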

Limitations:

At most 500 files are downloaded, because only the first result page (limit=500, offset=0) is requested
Downloads run one at a time rather than in parallel, so they are slower than they could be
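Both limitations could probably be worked around by stepping through the offset parameter and letting xargs run several wget processes at once. The following is only a rough sketch, not something tested against a real wiki: PAGE, JOBS, MAX_PAGES and the "run" argument are hypothetical names, and it assumes that a page past the end of the results contains no /wiki/images links.

```shell
#!/usr/bin/env bash
# Sketch: page through Special:MIMESearch and download in parallel.
# Assumptions (not from the original one-liner): an empty page means
# "no more results"; PAGE, JOBS and MAX_PAGES are made-up tuning knobs.

URL="${URL:-http://10.0.0.1}"
MIME="${MIME:-image/jpeg}"
PAGE=500        # results requested per page
JOBS=4          # parallel wget processes
MAX_PAGES=100   # safety cap so a misbehaving wiki cannot loop forever

# Read HTML on stdin, print the unique /wiki/images/... paths found in it.
extract_paths() {
  grep -oE '/wiki/images[^"]+' | sort -u
}

# Print the paths from every result page, advancing offset by PAGE each time.
list_all() {
  local offset=0 page=0 paths
  while [ "$page" -lt "$MAX_PAGES" ]; do
    paths=$(wget -q -O - \
      "$URL/wiki/index.php?title=Special:MIMESearch&mime=$MIME&limit=$PAGE&offset=$offset" |
      extract_paths)
    if [ -z "$paths" ]; then
      break   # nothing found on this page: assume we are past the end
    fi
    printf '%s\n' "$paths"
    offset=$((offset + PAGE))
    page=$((page + 1))
  done
}

# Only hit the network when explicitly asked to.
if [ "${1:-}" = "run" ]; then
  list_all | sort -u | xargs -P "$JOBS" -I {} wget -q "$URL{}"
fi
```

Saved under some name such as crawl.sh (also hypothetical), it would be started with URL='http://10.0.0.1' MIME='image/jpeg' bash crawl.sh run. The -P option tells xargs how many commands to keep running at the same time; the sort -u steps are there because the same file path may show up more than once in the results.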

September 6, 2011

gpanther

bash, linux, mediawiki, wiki
