I’m going to start implementing a recommender system soon. Recommender systems seem to get a bad rap from many people, and my own experiences with them have not been stellar either. My basic gripe is that they speak up when they shouldn’t: when they don’t really have anything good to recommend, or at a time when I’m not prepared to listen. A good friend wouldn’t do that. On the other hand, Google might be considered a recommender system, and for the most part it does a great job. In fact, I’ve joked that instead of coming up with a fancy algorithm we should take the descriptors of the context we are trying to match, throw them into a Google search, and give back the top 5 search results.
I’m currently thinking about the problem in the context of ozmozr and the NSDL. To tackle the problem, I’ve broken it down into the following questions:
- Who – Who are we recommending things to? (people or groups)
- What – What are we recommending? (groups, feeds, stories/web pages, tags)
- When – When do we offer recommendations? (when searching, when visiting home page)
- When not – When do we refrain from making recommendations? (when we don’t reach a certain threshold of certainty)
- Criteria – What factors should we include when considering what recommendations to make? What weight should be given to each of those criteria?
- Algorithm – What algorithm should we use to implement the recommender?
- Implementation – How do we implement the recommender? (use R to do the analysis and store the results in the DB, present it via Rails)
- Priority – What is the priority of implementation? Where will we get the biggest payoff for our efforts?
The factors considered in the algorithm will depend on what is being recommended and who it is being recommended to.
Recommending Stories to Users
Our approach will combine content filtering (co), collaborative filtering (cl), and rational analysis (ra) (rules that make sense).
Factors to consider:
- Titles, tags, and contents of stories they have read, tagged, shared, voted on, or linked to externally
- Feeds they subscribe to
- Groups they belong to
- Stories that people like them have read, tagged, or shared, and how recently they read, tagged or shared them
- Whether or not they have read the story before
- How recently the story was published
- How often the story was viewed
Algorithm:
- (Rational) Get the set of stories published within the specified recency.
- (Rational) Eliminate stories the user has read before.
- (Rational) Eliminate stories from the feeds that the user subscribes to?
- (Rational) Eliminate stories from the feeds subscribed to by groups that the user belongs to?
- (Content) Get the set of stories similar to ones that the user has read, tagged, or shared before.
- (Collaborative) Get the set of stories that similar users have read, tagged, or shared.
- Store in a “user_story_recommendation” table (user_id, story_id, content_score, collaborative_score)
- Display recommendations by querying the recommendation table, excluding the stories eliminated in steps 2-4, and ranking the rest according to a combination of the content score, the collaborative score, and recency.
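The rational steps above can be sketched as a simple filter. This is a minimal illustration in Python rather than the planned R and Rails implementation. The `Story` record, its field names, and the `candidate_stories` function are all hypothetical. Whether to exclude subscribed feeds is still an open question in the list above, so the sketch takes the excluded feeds as a parameter; passing an empty set skips that step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Story:
    # Hypothetical record; the field names are assumptions for illustration.
    story_id: int
    feed_id: int
    published_at: datetime


def candidate_stories(stories, now, max_age, read_ids, excluded_feed_ids):
    """Apply the rational steps: keep recent stories, drop ones already read,
    and drop ones from feeds the user (or the user's groups) subscribes to."""
    candidates = []
    for story in stories:
        if now - story.published_at > max_age:
            continue  # outside the specified recency window
        if story.story_id in read_ids:
            continue  # the user has read this story before
        if story.feed_id in excluded_feed_ids:
            continue  # the user already sees this feed
        candidates.append(story)
    return candidates
```

The content and collaborative steps would then score only the stories that survive this filter.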
Proposed formula for composite score: rank = r * (A * cn + B * cf), where cn = content score, cf = collaborative score, r = recency score, and A and B are arbitrary weights that let us tune the relative contribution of the content and collaborative filtering scores.
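As a rough sketch of how the proposed formula could rank stories, here is a small Python example. The function name, the default weights of 0.5, and the sample rows are assumptions made up for illustration; none of them are tuned values or real data.

```python
def composite_rank(content_score, collaborative_score, recency_score,
                   a=0.5, b=0.5):
    """rank = r * (A * cn + B * cf), following the proposed formula.
    The default weights are placeholders, not tuned values."""
    return recency_score * (a * content_score + b * collaborative_score)


# Hypothetical rows from the recommendation table:
# (story_id, content_score, collaborative_score, recency_score)
rows = [
    (101, 0.9, 0.2, 0.5),
    (102, 0.4, 0.6, 1.0),
    (103, 0.8, 0.8, 0.25),
]

# Highest composite rank first.
ranked = sorted(rows, key=lambda row: composite_rank(row[1], row[2], row[3]),
                reverse=True)
```

Because recency multiplies the whole sum, an old story with strong content and collaborative scores (103 here) can still rank below a fresh story with middling scores (102).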
Should the recency factor decay linearly, logarithmically, or exponentially?
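To make the three options concrete, here is one possible shape for each, sketched in Python. The function names, the 30-day cutoff, and the 7-day half-life are assumptions chosen only for illustration. Each function maps a story's age in days to a recency score that starts at 1 for a brand-new story.

```python
import math


def linear_decay(age_days, max_age_days=30.0):
    """Falls in a straight line from 1 to 0 at max_age_days, then stays 0."""
    return max(0.0, 1.0 - age_days / max_age_days)


def logarithmic_decay(age_days, max_age_days=30.0):
    """Drops quickly at first and then flattens, reaching 0 at max_age_days."""
    score = 1.0 - math.log1p(age_days) / math.log1p(max_age_days)
    return max(0.0, score)


def exponential_decay(age_days, half_life_days=7.0):
    """Halves every half_life_days and never quite reaches 0."""
    return 0.5 ** (age_days / half_life_days)
```

With these particular parameters, the logarithmic version penalizes a one-day-old story far more than the linear one does, while the exponential version never drops an old story to zero. Which behavior is preferable depends on how quickly stories go stale for the audience.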
Posted on May 9th, 2007 by joel
Filed under: information retrieval