Lucene and Multi-Lingual Updates to OER Recommender

Last week I posted an update to OER Recommender. The source for the project is posted in Google Code projects: oerrecommender, recommenderd, and aggregatord. The biggest change was moving OER Recommender from my home-brewed indexing and recommendation engine to Lucene, the super fast, super easy, open source search engine. I made the move because I had heard many good things about Lucene and wanted to explore using it. In addition, Lucene supports multiple languages nicely. Because the OER Recommender web app is written in Rails, I used the acts_as_solr plugin, which depends on Solr, another Apache project that provides easy integration with web applications. Here is a list of the changes I made:

  • Added Collections. The index now contains more than 90,000 records from over 100 collections and 26 languages.
  • Added Support for Harvesting via SQI/WSDL. Support for harvesting via SQI/WSDL using Axis was added in order to harvest MERLOT via Ariadne.
  • Catalog Links. When we have both catalog links and direct links for a resource, as in the case of OER Commons and MERLOT, recommendations now provide both.
  • OAI Set Discovery. In order to get the names of collections from OER Commons, the ability to discover the OAI sets (collections) was added to the harvester.
  • Lucene. The home brewed search and recommendation system was swapped out with Lucene. This makes for faster and better searching as well as faster indexing. The full range of query syntax supported by Lucene is now supported.
  • Multi-Lingual. With Lucene in place OER Recommender can now support language-specific search and recommendation.
  • Additional Metadata. Additional metadata was added to search and recommendation results: Descriptions, Authors, Date (metadata), Date (relevance was calculated).
  • Home Page Cleanup. The home page was simplified by moving the Greasemonkey script and example resources to a separate page.
  • Search Results. Search results were modified to look similar to Google search results. In cases where the index contains both links to catalog pages and direct links for a resource, a Metadata link next to the title takes you to the catalog page. A “Related Resources” link is also provided next to each item in search results. This makes it easy to see recommendations.
  • More Recommendations Page. The geek-friendly page was replaced with a page that looks essentially like the search results page. A link to the original, more detailed page is included near the top of the page.
  • Incremental Updates Support. The recommender was modified to support incrementally updating the indexes and recommendations without losing user data. It now runs every night, harvesting the collections, then indexing and creating recommendations for new records. Once a week it re-runs all recommendations so that recommendations for existing records can point at new records.
  • Time on Page. Average time on page tracking was added and used to adapt the recommendations algorithm.
  • Localized Interface. The main web pages were translated into Spanish, French, German, Japanese, Dutch, Russian, and Chinese using Google Translate (if you speak one of those languages and want to help with the translation, feel free to send me fixes). Localization is supported via the swell Simple Localization Rails plugin. The web app also auto-detects the language set in your web browser and uses it as the default for the search interface.
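
For anyone curious how the browser language auto-detection in the last item can work, here is a minimal sketch in plain Ruby. It is not the code OER Recommender actually uses. It assumes the browser's preference arrives in the standard Accept-Language header, and the names parse_accept_language and preferred_language, the language codes, and the English fallback are all made up for illustration.

```ruby
# Interface languages from the list above, plus English.
# The two-letter codes and the English default are assumptions for this sketch.
SUPPORTED_LANGUAGES = %w[en es fr de ja nl ru zh].freeze
DEFAULT_LANGUAGE = "en"

# Parse an Accept-Language header into [language, quality] pairs,
# highest quality first. Region subtags are dropped ("es-MX" becomes "es").
# Entries with equal quality keep the order the browser sent them in.
def parse_accept_language(header)
  entries = header.to_s.split(",").each_with_index.map do |entry, index|
    tag, *params = entry.strip.split(";")
    next if tag.nil? || tag.strip.empty?

    # A missing q parameter means quality 1.0.
    q_param = params.map(&:strip).find { |param| param.start_with?("q=") }
    quality = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [tag.strip.downcase.split("-").first, quality, index]
  end

  entries.compact
         .sort_by { |_lang, quality, index| [-quality, index] }
         .map { |lang, quality, _index| [lang, quality] }
end

# Pick the most preferred language that the interface supports,
# falling back to the default when nothing matches.
def preferred_language(header, supported = SUPPORTED_LANGUAGES, default = DEFAULT_LANGUAGE)
  match = parse_accept_language(header).find do |lang, quality|
    quality > 0 && supported.include?(lang)
  end
  match ? match.first : default
end
```

In a Rails controller the header would typically be read from request.env["HTTP_ACCEPT_LANGUAGE"] and the result stored as the default interface language, which the visitor can still override.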

More on implementation later…