Introducing Recommenders
This chapter covers:
• What recommenders are, within Mahout
• A first look at a recommender in action
• Evaluating the accuracy and quality of recommender engines
• Evaluating a recommender on a real data set: GroupLens
Each day we form opinions about things we like, don't like, and don't even care about. It happens unconsciously. You hear a song on the radio and notice it either because it's catchy or because it sounds awful, or maybe you don't notice it at all. The same thing happens with t-shirts, salads, hairstyles, ski resorts, faces, and television shows.
Although people's tastes vary, they do follow patterns. People tend to like things that are similar to other things they like. Because I love bacon-lettuce-and-tomato sandwiches, you can guess that I would enjoy a club sandwich, which is mostly the same sandwich but with turkey. Likewise, people tend to like things that similar people like.
These patterns can be used to predict such likes and dislikes. Recommendation is all about predicting these patterns of taste, and using them to discover new and desirable things you didn’t already know about.
After introducing the idea of recommendation in more depth, this chapter helps you experiment with some Mahout code to run a simple recommender engine and evaluate how well it works, giving you an immediate feel for how Mahout approaches recommendation.
2.1 What is recommendation?
You picked up this book from the shelf for a reason. Maybe you saw it next to other books you know and find useful, and figured the bookstore had put it there because people who like those books tend to like this one too. Maybe you saw this book on the shelf of a coworker who you know shares your interest in machine learning, or perhaps that coworker recommended it to you directly.
These are different but valid strategies for discovering new things. To discover items you may like, you could look at what people with similar tastes seem to like. Alternatively, you could figure out which items are like the ones you already like, again by looking at others’ apparent preferences. In fact, these describe the two broadest categories of recommender engine algorithms: “user-based” and “item-based” recommenders, both of which are well represented within Mahout.
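The user-based strategy can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not Mahout's API: the class and method names are hypothetical, preferences are assumed to be boolean (a user either likes an item or has expressed no preference), and user similarity is measured with the Jaccard (Tanimoto) coefficient, which is only one of several possible measures.

```java
import java.util.*;

/**
 * Hypothetical user-based collaborative filtering sketch (not Mahout's API).
 * Preferences are boolean: each user maps to the set of items they like.
 */
public class UserBasedSketch {

    // Jaccard (Tanimoto) similarity: size(A intersect B) / size(A union B)
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) shared.size() / union.size();
    }

    // Score each item the user does not already like by summing the similarity
    // of every other user who likes it, then return the top-scoring items.
    static List<String> recommend(Map<String, Set<String>> prefs, String user, int howMany) {
        Set<String> mine = prefs.getOrDefault(user, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : prefs.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = similarity(mine, other.getValue());
            if (sim == 0.0) continue; // users with no items in common contribute nothing
            for (String item : other.getValue()) {
                if (!mine.contains(item)) scores.merge(item, sim, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // deterministic tie-break
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> prefs = Map.of(
            "alice", Set.of("BLT", "tomato soup"),
            "bob",   Set.of("BLT", "club sandwich", "tomato soup"),
            "carol", Set.of("pizza", "calzone"));
        System.out.println(recommend(prefs, "alice", 2)); // prints [club sandwich]
    }
}
```

Here alice and bob share two of the three items either of them likes, so the club sandwich that only bob likes is recommended to alice; carol shares nothing with alice and contributes no score.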
2.2 Collaborative filtering, not content-based recommendation
Strictly speaking, the scenarios above are examples of “collaborative filtering”: producing recommendations based on, and only based on, knowledge of users’ relationships to items. These techniques require no knowledge of the properties of the items themselves. This is, in a way, an advantage: the recommender framework does not care whether the “items” are books, theme parks, flowers, or even other people, since nothing about their attributes enters into any of the input.
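Because collaborative filtering sees only which users relate to which items, an item-based recommender can be written without knowing anything about what an item is. In the hypothetical sketch below (not Mahout's API), the item type is a generic parameter `T`, so the same code would serve for books, theme parks, or people. As a simplifying assumption, item-to-item similarity is a raw co-occurrence count, that is, how many users like both items; normalized similarity measures are common alternatives.

```java
import java.util.*;

/**
 * Hypothetical item-based collaborative filtering sketch (not Mahout's API).
 * The item type T is opaque: only user-to-item relationships are used.
 */
public class ItemBasedSketch {

    // Score each candidate item by how often it co-occurs, across users,
    // with the items the target user already likes (raw co-occurrence counts).
    static <U, T extends Comparable<T>> List<T> recommend(Map<U, Set<T>> prefs, U user, int howMany) {
        Set<T> mine = prefs.getOrDefault(user, Set.of());
        Map<T, Integer> scores = new HashMap<>();
        for (Set<T> liked : prefs.values()) {
            // Number of the target user's items that this user also likes
            int overlap = 0;
            for (T item : liked) {
                if (mine.contains(item)) overlap++;
            }
            if (overlap == 0) continue;
            // Each shared item co-occurs once with every other item this user likes
            for (T candidate : liked) {
                if (!mine.contains(candidate)) scores.merge(candidate, overlap, Integer::sum);
            }
        }
        List<T> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);             // deterministic tie-break
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        // Users are integers and items are theme-park names; neither type matters to the algorithm.
        Map<Integer, Set<String>> prefs = Map.of(
            1, Set.of("Disneyland", "Legoland"),
            2, Set.of("Disneyland", "Legoland", "Efteling"),
            3, Set.of("Disneyland", "Efteling"),
            4, Set.of("Legoland", "Tivoli"));
        System.out.println(recommend(prefs, 1, 2)); // prints [Efteling, Tivoli]
    }
}
```

Nothing in the method inspects an item beyond equality and ordering; swapping `String` park names for book IDs or user IDs would not change a line of the algorithm.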
Other approaches are based on the attributes of items and are generally referred to as “content-based” recommendation techniques. For example, if a friend recommended this book to you because it’s a Manning book, and the friend likes other Manning books, then the friend is engaging in something more like content-based recommendation. The reasoning rests on an attribute of the books: the publisher. The Mahout recommender framework does not directly implement these techniques, though it offers some ways to inject item attribute information into its computations. As such, it might technically be called a collaborative filtering framework.
There is nothing wrong with these techniques; on the contrary, they can work quite well. But they are necessarily domain-specific and would be hard to codify meaningfully into a framework. To build an effective content-based book recommender, one would have to decide which attributes of a book (page count, author, publisher, color, font) are meaningful, and to what degree. None of this knowledge translates into any other domain; recommending books this way doesn’t help in recommending pizza toppings.
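For contrast, here is a hypothetical content-based sketch for books (not part of Mahout). Every choice in it is hand-supplied domain knowledge: the attributes compared (author and publisher), their weights, and the book data are arbitrary assumptions made for illustration, and none of it would carry over to pizza toppings.

```java
import java.util.*;

/**
 * Hypothetical content-based sketch: books are compared by their attributes,
 * using hand-picked weights. Not part of Mahout.
 */
public class ContentBasedSketch {

    record Book(String title, String author, String publisher) {}

    // Illustrative, domain-specific weights: an assumption, not a tuned value.
    static final double AUTHOR_WEIGHT = 2.0;
    static final double PUBLISHER_WEIGHT = 1.0;

    // Weighted count of matching attributes between two books.
    static double similarity(Book a, Book b) {
        double score = 0.0;
        if (a.author().equals(b.author())) score += AUTHOR_WEIGHT;
        if (a.publisher().equals(b.publisher())) score += PUBLISHER_WEIGHT;
        return score;
    }

    // Rank books the reader has not liked by their best match to any liked book.
    static List<String> recommend(List<Book> liked, List<Book> catalog, int howMany) {
        Map<String, Double> scores = new HashMap<>();
        for (Book candidate : catalog) {
            if (liked.contains(candidate)) continue;
            double best = 0.0;
            for (Book b : liked) best = Math.max(best, similarity(b, candidate));
            if (best > 0.0) scores.put(candidate.title(), best); // skip books with nothing in common
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // deterministic tie-break
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        Book a = new Book("A", "Smith", "Manning");
        Book b = new Book("B", "Smith", "Publisher Y");
        Book c = new Book("C", "Jones", "Manning");
        Book d = new Book("D", "Jones", "Publisher Y");
        System.out.println(recommend(List.of(a), List.of(a, b, c, d), 3)); // prints [B, C]
    }
}
```

Book B shares an author with the liked book and C shares only the publisher, so B ranks first under these weights; reversing the weights would reverse the ranking, which is exactly the kind of judgment call that makes such logic domain-specific.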