All of the data used in the examples comes from GitHub; just download the whole repository as a single archive.
The address is: [http://github.com/pydata/pydata-book](http://github.com/pydata/pydata-book)
I am using Python 2.7. Some of the code in the book contains errors, so I debugged it until it ran on my own 2.7 setup.
```python
# coding: utf-8
import pandas as pd

# Raw strings keep the backslashes in the Windows paths literal.
# engine='python' is needed because '::' is a multi-character separator.
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\users.dat',
                      sep='::', header=None, names=unames, engine='python')

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\ratings.dat',
                        sep='::', header=None, names=rnames, engine='python')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\movies.dat',
                       sep='::', header=None, names=mnames, engine='python')

users[:5]
ratings[:5]
movies[:5]
ratings

# Merge the three tables on their shared columns (user_id, then movie_id).
data = pd.merge(pd.merge(ratings, users), movies)
data.iloc[0]

# Mean rating of each film, split by gender.
mean_rating = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
mean_rating[:5]

# Keep only the films that received at least 250 ratings.
ratings_by_title = data.groupby('title').size()
ratings_by_title[:10]
active_titles = ratings_by_title.index[ratings_by_title >= 250]
active_titles
mean_rating = mean_rating.loc[active_titles]
mean_rating

# Films rated highest by female viewers.
top_female_rating = mean_rating.sort_values(by='F', ascending=False)
top_female_rating[:10]

# Films on which male and female viewers disagree the most.
mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
sorted_by_diff = mean_rating.sort_values(by='diff')
sorted_by_diff[:15]
sorted_by_diff[::-1][:15]

# Standard deviation of the ratings per film, restricted to the active titles.
ratings_std_by_title = data.groupby('title')['rating'].std()
ratings_std_by_title = ratings_std_by_title.loc[active_titles]
ratings_std_by_title.sort_values(ascending=False)[:10]
ratings_std_by_title
```
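To try the same steps without downloading the data, here is a minimal sketch that runs the pipeline on a few made-up rows. The rows, the movie titles, and the `read_dat` helper are all hypothetical and only for illustration, and the threshold of 2 ratings stands in for the 250 used above.

```python
import io
import pandas as pd

# Hypothetical miniature stand-ins for users.dat, ratings.dat and movies.dat.
users_raw = u"""1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
"""
ratings_raw = u"""1::10::5::978300760
1::20::1::978302109
1::30::4::978301968
2::10::3::978300275
2::20::5::978824291
3::10::4::978824268
3::20::5::978824351
"""
movies_raw = u"""10::Movie A (1995)::Comedy
20::Movie B (1996)::Drama
30::Movie C (1997)::Action
"""


def read_dat(text, names):
    """Parse '::'-separated text; a multi-character sep needs engine='python'."""
    return pd.read_csv(io.StringIO(text), sep='::', header=None,
                       names=names, engine='python')


users = read_dat(users_raw, ['user_id', 'gender', 'age', 'occupation', 'zip'])
ratings = read_dat(ratings_raw, ['user_id', 'movie_id', 'rating', 'timestamp'])
movies = read_dat(movies_raw, ['movie_id', 'title', 'genres'])

# Merge on the shared columns: user_id first, then movie_id.
data = pd.merge(pd.merge(ratings, users), movies)

# Mean rating per title and gender.
mean_rating = data.pivot_table('rating', index='title',
                               columns='gender', aggfunc='mean')

# Keep titles with at least 2 ratings (the toy counterpart of 250).
ratings_by_title = data.groupby('title').size()
active_titles = ratings_by_title.index[ratings_by_title >= 2]
mean_rating = mean_rating.loc[active_titles].copy()

# Gap between male and female mean ratings, sorted ascending.
mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
sorted_by_diff = mean_rating.sort_values(by='diff')

# Standard deviation per title, restricted to the active titles.
rating_std = data.groupby('title')['rating'].std()
rating_std = rating_std.loc[active_titles].sort_values(ascending=False)

print(sorted_by_diff)
print(rating_std)
```

In this toy data, "Movie C (1997)" has a single rating and is dropped by the threshold, while "Movie B (1996)" shows both the largest gender gap and the largest spread in ratings.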