Ruby: Create a Web Spider.

If you want to write a spider/crawler that scrapes the HTML source of a given URL, I know of two ways to do it with Ruby's standard libraries. The first uses open-uri. It doesn't have many features, but it is handy enough for a simple crawler. The code below reads the HTML source from Google and prints the title of the page by matching a regular expression.

require 'open-uri'

url = "http://www.google.com/"
# Use URI.open: on Ruby 3.0+ a plain open(url) no longer fetches URLs
content = URI.open(url).read
# Capture the text between <title> and </title>
if content =~ /<title>(.*?)<\/title>/
  puts $1
end
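
The regular expression above only matches a lowercase <title> tag that sits on a single line. As a rough sketch, the match could be wrapped in a small helper that also copes with uppercase tags and titles that span several lines. The name extract_title and the sample HTML are my own assumptions for illustration, not part of open-uri.

```ruby
# Return the text of the first <title> element, or nil if there is none.
# extract_title is a hypothetical helper name, not part of any library.
def extract_title(html)
  # i: also match <TITLE>; m: let "." span newlines inside the title
  match = html.match(/<title[^>]*>(.*?)<\/title>/im)
  match && match[1].strip
end

# Made-up sample page, so the helper can be tried without a network call
sample = "<html><head><TITLE>\n  Example Page\n</TITLE></head></html>"
puts extract_title(sample)
```

The same helper can be called on the content read from either library. For anything beyond a title, a real HTML parser is likely a safer choice than a regular expression.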

The second way uses the Ruby library net/http. It offers more options than open-uri: you can send GET and POST requests, handle cookies, and use other features. My suggestion is to use this one rather than the previous example.

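require 'net/http'
require 'uri'

url = URI.parse('http://www.yahoo.com/')
# request_uri gives the path plus any query string, and "/" for an empty path
req = Net::HTTP::Get.new(url.request_uri)
# Note: Net::HTTP does not follow redirects (3xx responses) on its own
res = Net::HTTP.start(url.host, url.port) do |http|
  http.request(req)
end
content = res.body
if content =~ /<title>(.*?)<\/title>/
  puts $1
end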
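
To illustrate the POST and cookie handling mentioned above, here is a minimal sketch that builds a POST request with form fields and a Cookie header. The helper name build_post_request, the example.com URL, the form fields, and the cookie value are all placeholders I made up for illustration. The request is only built, not sent, so the snippet runs without a network connection.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST request carrying form fields and a cookie.
# build_post_request is a hypothetical helper; the URL, form fields, and
# cookie value used below are placeholders for illustration only.
def build_post_request(uri, form, cookie)
  req = Net::HTTP::Post.new(uri.request_uri)
  req.set_form_data(form)   # URL-encodes the fields into the request body
  req['Cookie'] = cookie    # e.g. a cookie received from an earlier response
  req
end

uri = URI.parse('http://www.example.com/search')
req = build_post_request(uri, { 'q' => 'ruby', 'page' => '1' }, 'session=abc123')
puts req.method   # POST
puts req.body     # q=ruby&page=1

# To actually send it:
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
#   res['Set-Cookie'] then holds any cookie the server sent back.
```

Keeping request construction in its own method makes it easy to inspect what the crawler is about to send before it touches the network.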

I hope this makes you a good Spider-Man with your Ruby scripts.

