life is a rum go guv’nor, and that’s the truth

Multilingual Google search mashup

For some time I have envisioned a web browser that lets me search and browse all of the world’s web pages and view them in English. I figure there have got to be lots of cool things going on in the non-English-speaking world that I would be interested in, but I never hear about them because I don’t speak those languages.

While attending the 2008 Mountain West Ruby Conference and needing something to hack on, I decided to take a crack at the project. I had already hacked Google Translate for Send2Wiki, so I figured it would be a snap to do for this project. My plan was to take the search text, run it through the translator for each of the languages to search, pass the translated queries off to the Google search sites for each of those languages, and then pass the resulting pages through Google Translate to get English versions. I soon found that Google has already done most of the work for me with their cross-language search.
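A minimal sketch of that original plan, assuming the translation and search steps are supplied as plain functions. The names `cross_language_search`, `translate`, and `search` are hypothetical stand-ins for illustration, not Google's actual API, and the two `fake_` helpers only exist so the sketch runs on its own:

```python
def cross_language_search(query, languages, translate, search):
    """Translate the query, search per language, translate hits back to English.

    translate(text, source, target) -> str   (hypothetical signature)
    search(query, language) -> list of str   (hypothetical signature)
    """
    pages_by_language = {}
    for language in languages:
        # Step 1: turn the English query into the target language.
        translated_query = translate(query, "en", language)
        # Step 2: run the translated query against that language's search.
        hits = search(translated_query, language)
        # Step 3: translate each hit back into English for reading.
        pages_by_language[language] = [
            translate(hit, language, "en") for hit in hits
        ]
    return pages_by_language


# Stand-in implementations so the sketch is runnable without any network calls.
def fake_translate(text, source, target):
    return f"[{source}->{target}] {text}"


def fake_search(query, language):
    return [f"{language} page about {query}"]


if __name__ == "__main__":
    print(cross_language_search("ruby", ["es", "fr"], fake_translate, fake_search))
```

Passing the two steps in as functions keeps the pipeline itself independent of whichever translation or search service ends up behind it.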

The only thing cross-language search doesn’t do for me is collate the results from all of the languages into a single results page. You can only search for results in a single targeted language at a time. Anyway, between sessions (a coder has always got to brag about how fast he can work, right :-) I threw together a Multilingual Google Search Mashup that does the job. As I put it together, a couple of things almost immediately stood out:

  • Wikipedia owns the top hit slot for many searches. Because their pages are essentially equivalent in the different languages, listing that entry for each of the languages isn’t especially useful.
  • Interleaving search results is difficult. Rather than try to figure out an intelligent way to order the search results from the various languages in real time, I just give the first two from each language and provide a link for getting more. I’ve got ideas for interleaving results, but none of them are too easy. Notice also that I haven’t included English in the search, which is probably where the most relevant pages will actually come from.
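The "first two from each language" approach can be sketched roughly as follows. This is an illustration under assumptions, not the mashup's actual code: the function name `collate_top_results` and the shape of the input (a dict mapping a language code to its ordered list of results) are made up for the example.

```python
def collate_top_results(results_by_language, per_language=2):
    """Take the first few results from each language, grouped by language.

    No attempt is made to interleave or re-rank across languages; the
    output simply follows the order in which the languages were given.
    """
    collated = []
    for language, results in results_by_language.items():
        for result in results[:per_language]:
            collated.append((language, result))
    return collated


if __name__ == "__main__":
    sample = {
        "es": ["es result 1", "es result 2", "es result 3"],
        "fr": ["fr result 1"],
    }
    for language, result in collate_top_results(sample):
        print(language, result)
```

Anything beyond the first `per_language` entries is simply dropped here; in the mashup that is where the per-language link for more results comes in.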

These issues make me wonder if a different approach would be preferable. Perhaps Google could annotate search results with relevant pages in different languages. This also makes me think about Google’s search result ordering. Google search results appear to be deterministic (if you execute the same search twice, the same item will show up at the top of the list). While this may be what we have come to expect, my experience with writing OER Recommender makes me believe that it isn’t necessarily the best or the fairest thing to do. When ranking pages, it is often the case that the scores of the top two or even 10 pages are statistically indistinguishable. So why should the one that happens to have a .00000001% higher score always show up first? My approach with OER Recommender was to identify a stratum of “highest ranked pages” whose scores are virtually indistinguishable and randomize their order. This seems fairer, since it is quite natural for users to click on the first item on a search results page, thus biasing it to become more and more popular.
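A rough sketch of the randomized top stratum idea, under stated assumptions: this is not OER Recommender's code, and the fixed `tolerance` cutoff is only a stand-in for whatever test decides that two scores are statistically indistinguishable. The name `shuffle_top_stratum` is hypothetical.

```python
import random


def shuffle_top_stratum(scored_pages, tolerance, rng=None):
    """Randomize the order of pages whose scores are within `tolerance`
    of the best score; everything below that keeps its ranked order.

    scored_pages: list of (page, score) tuples, in any order.
    tolerance:    assumed cutoff for "virtually indistinguishable".
    rng:          optional random.Random, useful for repeatable runs.
    """
    rng = rng or random.Random()
    ranked = sorted(scored_pages, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    top_score = ranked[0][1]
    # Because `ranked` is sorted descending, the stratum is always a prefix.
    stratum = [item for item in ranked if top_score - item[1] <= tolerance]
    rest = ranked[len(stratum):]
    rng.shuffle(stratum)
    return stratum + rest


if __name__ == "__main__":
    pages = [("a", 0.9000), ("b", 0.8999), ("c", 0.5000)]
    print(shuffle_top_stratum(pages, tolerance=0.01))
```

With these sample scores, "a" and "b" trade the top slot from one call to the next, while "c" always stays last.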

One Response to “Multilingual Google search mashup”

  1. I think you raised and implemented some great ideas. Don’t let Google hear tho or they will either swipe ur idea or come with a checkbook and try to buy u. I say resist.
