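Abstract: In recent years, microblogging applications have grown rapidly and become a widely used channel for publishing and spreading information. This growth has also fragmented online information, and addressing that problem requires user modeling, that is, a description of users' actual interests. Weibo works through users' information behavior, so mining user interests and building user models from that behavior together with Weibo content is of considerable research value. Current research on user modeling concentrates on user interest modeling. With the arrival of the Web 2.0 era, the information behavior of network users has gradually displaced traditional user information behavior as the focus of academic attention. The main difficulty in mining Weibo content lies in extracting latent topics, which requires Chinese word segmentation. Based on a review of previous research and the characteristics of Weibo itself, this thesis takes Weibo users' like and repost behavior as representative of user information behavior, uses the chi-square test as the feature selection method for user interest modeling, and uses the vector space model as the representation for the resulting user interest models.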
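This thesis takes influential verified ("big V") users in five broad categories on Weibo (Internet, law, medicine, literature, and football) as its research subjects. The LocoySpider collector (火车头采集器) is used to gather user information behavior and Weibo content data, and Python programs extract each user's interest feature words and compute their feature values. This yields three feature vectors, representing the user's repost behavior, like behavior, and original Weibo content. A weighted combination of the three vectors gives a feature value for every interest feature word of every user. Sorting the feature words by feature value produces a user interest feature vector that fuses user information behavior with Weibo content and is used to represent the user's interests.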
Thesis keywords: user information behavior; Weibo content mining; user modeling
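The abstract names the chi-square test as the feature selection method but does not give the formula. The sketch below uses the standard 2x2 contingency form of the statistic for a term and a category. That this is the exact variant used in the thesis is an assumption, and the function names `chi_square` and `chi_square_scores` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_square_scores(docs, labels, category):
    """Score every term in the corpus against one category.

    docs: list of token lists (assumed to be already word-segmented)
    labels: one category label per document
    """
    in_cat = [set(doc) for doc, lab in zip(docs, labels) if lab == category]
    out_cat = [set(doc) for doc, lab in zip(docs, labels) if lab != category]
    vocab = set().union(*in_cat, *out_cat)
    scores = {}
    for term in vocab:
        a = sum(term in doc for doc in in_cat)
        b = sum(term in doc for doc in out_cat)
        c = len(in_cat) - a
        d = len(out_cat) - b
        scores[term] = chi_square(a, b, c, d)
    return scores
```

The statistic is symmetric: a term that is strongly absent from a category scores as high as one that is strongly present, so a check on the sign of `a * d - b * c` is a common addition when only positively associated terms are wanted.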
Graduation Thesis: Foreign-Language Abstract
Title: User Modeling on Social Networks: Using User Information Behavior and Weibo Content for User Modeling
Abstract: In recent years, Weibo has developed rapidly and become a popular channel for releasing and disseminating information. However, the problem of information fragmentation has arisen along with it, and addressing this problem requires user modeling to describe users' actual interests. Weibo functions through user information behavior, so research on user modeling that combines user information behavior with Weibo content is of great importance. Current studies of user modeling focus on modeling users' interests. With the arrival of the Web 2.0 era, the academic community has paid more attention to the information behavior of network users than to traditional user information behavior. The difficulty of Weibo content mining lies in extracting latent topics, which requires Chinese word segmentation. This thesis chooses users' like and repost behavior to represent user information behavior, uses the chi-square test as the feature selection method for user modeling, and uses the vector space model to represent the result.
This thesis selects the most influential users in five major categories on Sina Weibo (Internet, law, medicine, literature, and football) as research subjects. We used LocoySpider to collect data on user information behavior and Weibo content, and used the Python programming language to select the words that represent each user's interests. We then obtained three vectors for each user, representing the features of the user's repost behavior, like behavior, and original Weibo content. The three vectors are combined by weighting, and the feature words are ranked by their weighted values to represent the user's interests.
Keywords: User Information Behavior; Weibo Content Mining; User Modeling
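The weighting step described in the abstract, in which the repost, like, and original-content vectors are combined and the feature words are ranked by the result, can be sketched as follows. The abstract does not state the weights, so the default values here are purely illustrative assumptions, and `fuse_interest_vectors` is a hypothetical name. Each vector is represented as a dictionary from feature word to feature value.

```python
def fuse_interest_vectors(repost, like, original, weights=(0.3, 0.3, 0.4)):
    """Combine three term -> value vectors into one ranked interest vector.

    repost, like, original: dicts mapping feature word -> feature value
    weights: (repost, like, original) weights; illustrative values only,
             not taken from the thesis
    Returns a list of (word, fused value) pairs, highest value first.
    """
    w_repost, w_like, w_original = weights
    terms = set(repost) | set(like) | set(original)
    fused = {
        term: (
            w_repost * repost.get(term, 0.0)
            + w_like * like.get(term, 0.0)
            + w_original * original.get(term, 0.0)
        )
        for term in terms
    }
    # Sort by fused value, largest first, to obtain the ranked feature words.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A word missing from one of the three vectors contributes zero from that source, so words supported by several kinds of behavior tend to rank higher than words that appear in only one.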
Contents
1 Introduction
1.1 Background of the Topic
1.2 Research Significance