Kash Farooq's software development blog

.NET Developer

Posts Tagged ‘Google’

Creating .NET objects from JSON using DataContractJsonSerializer

Posted by Kash Farooq on January 31, 2011

Creating .NET objects from JSON using DataContract and DataMember

Note: .NET 4 has made it far easier to do this. See Creating objects from JSON in .NET 4.

In a previous post I demonstrated that the RESTful Google Search API returns data as JSON. I needed a way to convert the JSON data into .NET objects and this post shows what I ended up with.

Here is an example of the JSON result set returned:

{"responseData":
 {"results":
 [
  {
  "GsearchResultClass":"GwebSearch",
  "unescapedUrl":"http://www.google.com/help/features.html",
  "url":"http://www.google.com/help/features.html",
  "visibleUrl":"www.google.com",
  "cacheUrl":"http://www.google.com/search?q\u003dcache:BNRWhS8EKYAJ:www.google.com",
  "title":"\u003cb\u003eSearch\u003c/b\u003e Features - \u003cb\u003eGoogle\u003c/b\u003e",
  "titleNoFormatting":"Search Features - Google",
  "content":"To find reviews and showtimes...."
  },
  {
  "GsearchResultClass":"GwebSearch",
  etc
  },
  etc
 ],
 "cursor": {
 "pages": [
  { "start": "0", "label": 1 },
  { "start": "8", "label": 2 },
  etc
  { "start": "56","label": 8 }
  ],
  "estimatedResultCount": "59600000",
  "currentPageIndex": 0,
  "moreResultsUrl": "http://www.google.com/search?oe=utf8&ie=utf8&source=uds&start=0&hl=en&q=MY SEARCH TEXT"
  }
 },
 "responseDetails": null,
 "responseStatus": 200
}

We can use System.Runtime.Serialization to create .NET objects from this JSON data.

For example, the top level JSON responseData can be represented by the following .NET type:

[DataContract]
public class GoogleSearchResults {
  [DataMember(Name = "responseData")]
  public ResponseData ResponseData { get; set; }
}

Note that I’ve specified the Name property on the DataMember attribute. Without it I would have had to name my property “responseData” rather than “ResponseData”.

The rest of the data is extracted using the classes below. Note that I can be quite selective about which data I transfer from JSON to .NET. If I don’t need, say, the “cacheUrl”, I can just omit it from my .NET objects. I can also rename data: I have put the data from “titleNoFormatting” into a property called Title:

[DataContract]
public class ResponseData
{
  [DataMember(Name="results")]
  public IEnumerable<Result> Results { get; set; }

  [DataMember(Name = "cursor")]
  public Cursor Cursor { get; set; }
}

[DataContract]
public class Cursor
{
  [DataMember(Name = "moreResultsUrl")]
  public string MoreResultsUrl { get; set; }

  [DataMember(Name = "pages")]
  public IEnumerable<Page> Pages { get; set; }
}

[DataContract]
public class Result
{
  [DataMember(Name = "url")]
  public string Url { get; set; }

  [DataMember(Name = "titleNoFormatting")]
  public string Title { get; set; }
}

[DataContract]
public class Page
{
  [DataMember(Name = "start")]
  public int Start { get; set; }

  [DataMember(Name = "label")]
  public string Label { get; set; }
}

Finally, I need a method to actually deserialize a JSON string into my .NET objects. I can use System.Runtime.Serialization.Json.DataContractJsonSerializer to do this. I have created the following generic method that takes a JSON string and the type I want it to be deserialised into, and then returns an instantiated .NET object:

// Requires: using System; using System.IO; using System.Text;
//           using System.Runtime.Serialization.Json;
public static T Deserialise<T>(string json) {
  var obj = Activator.CreateInstance<T>();
  using (var memoryStream = new MemoryStream(Encoding.Unicode.GetBytes(json))) {
    var serializer = new DataContractJsonSerializer(obj.GetType());
    obj = (T) serializer.ReadObject(memoryStream);
    return obj;
  }
}

This generic method can then be used to convert a JSON string to the specified .NET type:

string responseJson = GetJsonDataFromGoogle("MY SEARCH TERM");
var results = Deserialise<GoogleSearchResults>(responseJson);
foreach (var googleSearchResult in results.ResponseData.Results) {
  Console.WriteLine(googleSearchResult.Url);
}
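To try the approach without calling Google at all, here is a minimal, self-contained sketch that deserialises a hard-coded JSON string. The class names (SampleResult, SampleResponseData, SampleSearchResults, JsonSample) and the trimmed JSON are made up for illustration, and the helper uses typeof(T) rather than Activator.CreateInstance, which is a simplification and not the method shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical, trimmed-down contract classes, renamed so they do not
// clash with the classes in this post.
[DataContract]
public class SampleResult
{
    [DataMember(Name = "url")]
    public string Url { get; set; }

    // "titleNoFormatting" in the JSON is surfaced as Title.
    [DataMember(Name = "titleNoFormatting")]
    public string Title { get; set; }
}

[DataContract]
public class SampleResponseData
{
    [DataMember(Name = "results")]
    public List<SampleResult> Results { get; set; }
}

[DataContract]
public class SampleSearchResults
{
    [DataMember(Name = "responseData")]
    public SampleResponseData ResponseData { get; set; }

    [DataMember(Name = "responseStatus")]
    public int ResponseStatus { get; set; }
}

public static class JsonSample
{
    // Hard-coded sample: one result, plus fields ("cacheUrl",
    // "responseDetails") that none of the classes map.
    public const string Json =
        @"{""responseData"":{""results"":[{""url"":""http://www.google.com/help/features.html"","
        + @"""titleNoFormatting"":""Search Features - Google"",""cacheUrl"":""not mapped""}]},"
        + @"""responseDetails"":null,""responseStatus"":200}";

    // Assumption: building the serializer from typeof(T) means T does not
    // need to be instantiated first.
    public static T Deserialise<T>(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (T) serializer.ReadObject(stream);
        }
    }
}
```

Calling JsonSample.Deserialise<SampleSearchResults>(JsonSample.Json) should give an object with a single entry in ResponseData.Results; the unmapped “cacheUrl” and “responseDetails” fields are simply skipped.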

And for completeness, here is GetJsonDataFromGoogle():

public static string GetJsonDataFromGoogle(string searchTerm)
{
    // Escape the search term so that spaces and reserved characters survive in the query string.
    var url = string.Format("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q={0}&rsz=large&start=0", Uri.EscapeDataString(searchTerm));
    var req = (HttpWebRequest) WebRequest.Create(url);
    req.Referer = "http://mywebsite.com";
    // Dispose the response as well as the reader once the body has been read.
    using (var res = (HttpWebResponse) req.GetResponse())
    using (var streamReader = new StreamReader(res.GetResponseStream())) {
        return streamReader.ReadToEnd();
    }
}

Posted in .NET

Programmatically searching Google (Part 2): Using the RESTful interface

Posted by Kash Farooq on January 30, 2011

This post follows on from part 1, in which I performed a Google search using the .NET wrapper library project. I was curious why the library didn’t appear to provide any paging functionality and seemed to just get all the search results in one hit.

So, I’m now going to look at searching directly using Google’s RESTful API.

The URL that you hit is:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&key=MY GOOGLE KEY&start=0

You can omit the key, but Google recommends you provide it.

This returns a JSON result set like the following.

[Note: also see “Creating .NET objects from JSON using DataContractJsonSerializer”]

{"responseData":
 {"results":
 [
  {
  "GsearchResultClass":"GwebSearch",
  "unescapedUrl":"http://www.google.com/help/features.html",
  "url":"http://www.google.com/help/features.html",
  "visibleUrl":"www.google.com",
  "cacheUrl":"http://www.google.com/search?q\u003dcache:BNRWhS8EKYAJ:www.google.com",
  "title":"\u003cb\u003eSearch\u003c/b\u003e Features - \u003cb\u003eGoogle\u003c/b\u003e",
  "titleNoFormatting":"Search Features - Google",
  "content":"To find reviews and showtimes...."
  },
  {
  "GsearchResultClass":"GwebSearch",
  etc
  },
  etc
 ],
 "cursor": {
 "pages": [
  { "start": "0", "label": 1 },
  { "start": "8", "label": 2 },
  etc
  { "start": "56","label": 8 }
  ],
  "estimatedResultCount": "59600000",
  "currentPageIndex": 0,
  "moreResultsUrl": "http://www.google.com/search?oe=utf8&ie=utf8&source=uds&start=0&hl=en&q=MY SEARCH TEXT"
  }
 },
 "responseDetails": null,
 "responseStatus": 200
}

Now we’re getting somewhere. The “cursor” block at the bottom of the returned data provides paging information, a “moreResultsUrl”, and an estimate of the number of results. Everything we need.

Since the response tells me the number of pages (in the above example there are 8 pages) and the starting point of each page (0, 8, …, 56), I can just use my original URL again and adjust the start parameter for each call (i.e. I don’t need to use the moreResultsUrl data provided in the cursor block).

So, I can just use:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&key=MY GOOGLE KEY&start=8
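As a rough sketch of that idea, the helper below builds one URL per page by substituting the start offset. PagingSketch, BuildSearchUrl and BuildPageUrls are hypothetical names; the key parameter is left out (it is optional) and the search term is URL-escaped, which the URLs shown in this post do not do.

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Builds the search URL for a single page, starting at the given offset.
    public static string BuildSearchUrl(string searchTerm, int start)
    {
        return string.Format(
            "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q={0}&rsz=large&start={1}",
            Uri.EscapeDataString(searchTerm),
            start);
    }

    // Builds one URL per page offset, e.g. the "start" values listed in
    // the cursor block (0, 8, ..., 56).
    public static IEnumerable<string> BuildPageUrls(string searchTerm, IEnumerable<int> pageStarts)
    {
        foreach (var start in pageStarts)
        {
            yield return BuildSearchUrl(searchTerm, start);
        }
    }
}
```

For example, PagingSketch.BuildPageUrls("MY SEARCH TEXT", new[] { 0, 8, 16 }) yields three URLs that differ only in their start parameter, each of which could then be fetched in turn.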

After playing with some more searches I realised that a maximum of 8 pages was being returned each time.

In my example above, each page returned 8 results. 8 pages of 8 results gives me only 64 results, not 59600000!

I tried the following URL:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=MY SEARCH TEXT&rsz=large&start=100

The result returned was:

{"responseData": null, "responseDetails": "out of range start", "responseStatus": 400}

Google does not allow you to search for more than 64 results!
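An error response like the one above can be spotted in code by reading “responseStatus” and “responseDetails” before touching “responseData”. This is only a sketch: SearchEnvelope and ErrorCheckSketch are hypothetical names, and treating 200 as the only success value is an assumption based on the two responses shown in this post.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical contract that maps only the status fields of the response.
[DataContract]
public class SearchEnvelope
{
    [DataMember(Name = "responseStatus")]
    public int ResponseStatus { get; set; }

    [DataMember(Name = "responseDetails")]
    public string ResponseDetails { get; set; }
}

public static class ErrorCheckSketch
{
    // Reads just the status fields; "responseData" is not mapped and is ignored.
    public static SearchEnvelope ReadEnvelope(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(SearchEnvelope));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (SearchEnvelope) serializer.ReadObject(stream);
        }
    }

    // Assumption: 200 is the only status treated as success.
    public static bool IsSuccess(SearchEnvelope envelope)
    {
        return envelope.ResponseStatus == 200;
    }
}
```

A paging loop could call ReadEnvelope on each response and stop as soon as IsSuccess returns false, using ResponseDetails (here “out of range start”) for logging.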

So the only way to get more than 64 results would be to screen scrape, which is ugly (but can be made more bearable by using SGMLReader and LINQ to XML). Though I’m pretty sure Google’s T’s & C’s don’t allow screen scraping!

Posted in .NET

Programmatically searching Google (Part 1): The Google API for .NET

Posted by Kash Farooq on January 24, 2011

Also see Programmatically searching Google (Part 2): Using the RESTful interface

I needed to do some web searches and record the host names that I found, so I started investigating what APIs Google exposed.

I found the “Google APIs for .NET Framework” hosted on Google Code. There hasn’t been much activity on this project recently – the last release was April 2010 – and there is not much documentation, but it works and it is simple to use.

It provides two sets of libraries: one that wraps Google Search and one that wraps Google Translate.

Here is an example of a search:

public void SearchForSomethingUsingLib() {
  var client = new GwebSearchClient("http://mywebsite.com");
  var results = client.Search("google api for .NET", 100);
  foreach (var webResult in results) {
   Console.WriteLine("{0}, {1}, {2}", webResult.Title, webResult.Url, webResult.Content);
  }
}

The ‘100’ in the above example indicates the number of search results to return.

What puzzled me was that the set of results returned didn’t appear to have a concept of paging, and the library didn’t seem to have a “Search Paged” method. What if I had asked for 1000000 results? Surely the API wasn’t going to get 1000000 results in one go and load them all into a List?

I fired up Reflector. The API definitely returned a straightforward generic list. No delayed execution – one Web Request sent to Google, followed by one “new List<T>” to return the results.

I needed to investigate further, and I’ll look at that in Part 2.

Posted in .NET