life is a rum go guv’nor, and that’s the truth

Web crawling on a budget

Justin and I submitted proposals to the Digital Media and Learning Competition. I was amazed to see the breadth of the 100 pages of submissions. There are a lot of good ideas there. Since I'm not sure the submissions will always be kept public, I wanted to archive them for later reference. Here is the Ruby script I came up with:

(1..100).each { |page| system("curl -o #{page}.html 'http://dmlcompetition.net/pligg/index.php?page=#{page}'") }
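
If curl isn't installed, the same loop can be sketched in pure Ruby with the standard library's Net::HTTP. This is only a rough sketch, not what I actually ran: the names page_url and archive_pages, the dir and fetcher parameters, and the one-second delay between requests are all my own assumptions. The URL is the one from the one-liner above.

```ruby
require "net/http"
require "uri"

# Base URL taken from the curl one-liner above.
BASE_URL = "http://dmlcompetition.net/pligg/index.php"

# Build the listing URL for a given page number (hypothetical helper).
def page_url(page)
  "#{BASE_URL}?page=#{page}"
end

# Fetch each page and save it as "<page>.html" inside dir.
# fetcher is any callable that takes a URL string and returns the body;
# delay is an assumed pause in seconds between requests, to go easy on the server.
# Returns the list of file paths written.
def archive_pages(pages, dir: ".", delay: 1, fetcher: ->(url) { Net::HTTP.get(URI(url)) })
  pages.map do |page|
    path = File.join(dir, "#{page}.html")
    File.write(path, fetcher.call(page_url(page)))
    sleep(delay) if delay > 0
    path
  end
end
```

Calling archive_pages(1..100) would do the same crawl as the one-liner, writing 1.html through 100.html to the current directory. Passing the fetcher in as a lambda makes it easy to try the loop out without touching the network.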

Ruby rocks!
