Kash Farooq's software development blog

.NET Developer

Programmatically searching Google (Part 2): Using the RESTful interface

Posted by Kash Farooq on January 30, 2011

This post follows on from part 1, in which I perform a Google search using the .NET wrapper library project. I was curious why the library didn’t appear to provide any paging functionality and seemed to just get all the search results in one hit.

So, I’m now going to look at searching directly using Google’s RESTful API.

The URL that you hit is:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&key=MY GOOGLE KEY&start=0

You can omit the key, but Google recommends you provide it.

This returns a JSON result set like the following.

[Note: also see “Creating .NET objects from JSON using DataContractJsonSerializer“]

{"responseData":
 {"results":
 [
  {
  "GsearchResultClass":"GwebSearch",
  "unescapedUrl":"http://www.google.com/help/features.html",
  "url":"http://www.google.com/help/features.html",
  "visibleUrl":"www.google.com",
  "cacheUrl":"http://www.google.com/search?q\u003dcache:BNRWhS8EKYAJ:www.google.com",
  "title":"\u003cb\u003eSearch\u003c/b\u003e Features - \u003cb\u003eGoogle\u003c/b\u003e",
  "titleNoFormatting":"Search Features - Google",
  "content":"To find reviews and showtimes...."
  },
  {
  "GsearchResultClass":"GwebSearch",
  etc
  },
  etc
 ],
 "cursor": {
 "pages": [
  { "start": "0", "label": 1 },
  { "start": "8", "label": 2 },
  etc
  { "start": "56","label": 8 }
  ],
  "estimatedResultCount": "59600000",
  "currentPageIndex": 0,
  "moreResultsUrl": "http://www.google.com/search?oe=utf8&ie=utf8&source=uds&start=0&hl=en&q=MY SEARCH TEXT"
  }
 },
 "responseDetails": null,
 "responseStatus": 200
}

Now we’re getting somewhere. The “cursor” block at the bottom of the returned data provides paging information, a “moreResultsUrl”, an estimate of the number of results. Eveything we need.

As I have been told the number of pages (in the above example there are 8 pages) and the starting point of each page (0, 8, …, 56), I can just use my original URL again, and adjust the start parameter for each call (i.e. I don’t need to use the moreResultsUrl data provided in the cursor block).

So, I can just use:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&key=MY Google KEY&start=8

After playing with some more searches I realised that a maximum of 8 pages were being returned each time.

In my example above, each page returned 8 results. 8 pages of 8 results does not give me 59600000 results!

I tried the following URL:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&start=100

The result returned was:

{"responseData": null, "responseDetails": "out of range start", "responseStatus": 400}

Google does not allow you to search for more than 64 results!

So the only way to get more than 64 results would be to screen scrape, which is ugly (but can be made more bearable by using SGMLReader and LINQ to XML). Though I’m pretty sure Google’s T’s & C’s don’t allow screen scraping!

Advertisements

Sorry, the comment form is closed at this time.

 
%d bloggers like this: