Python version: Python 3.6.1
Python IDE: JetBrains PyCharm 2018.3.6 x64
Third-party libraries: pandas, matplotlib, seaborn
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
# Use a font that can render Chinese labels
sns.set_style({'font.sans-serif': ['simhei', 'Arial']})

lianjia_df = pd.read_csv('lianjia.csv')

# Add the price per unit area
df = lianjia_df.copy()
df['PerPrice'] = (lianjia_df['Price'] / lianjia_df['Size']).round(2)

# Reorder the columns
columns = ['Region', 'District', 'Garden', 'Layout', 'Floor', 'Year', 'Size',
           'Elevator', 'Direction', 'Renovation', 'PerPrice', 'Price']
df = pd.DataFrame(df, columns=columns)
print(df.head())

# Handle anomalous Elevator values: keep only the two valid labels,
# everything else becomes NaN (both conditions must test the Elevator column)
df['Elevator'] = df.loc[(df['Elevator'] == '有电梯') | (df['Elevator'] == '无电梯'), 'Elevator']

# Fill missing Elevator values: above 6 floors assume an elevator, otherwise none
df.loc[(df['Floor'] > 6) & (df['Elevator'].isnull()), 'Elevator'] = '有电梯'
df.loc[(df['Floor'] <= 6) & (df['Elevator'].isnull()), 'Elevator'] = '无电梯'

# Handle anomalous Renovation values: drop the stray value '南北'
df['Renovation'] = df.loc[(df['Renovation'] != '南北'), 'Renovation']

# Facet grid: draw the same plot on different subsets of the dataset
# height sets the height of each facet
grid = sns.FacetGrid(df, row='Elevator', col='Renovation', palette='seismic', height=6)
grid.map(plt.scatter, 'Year', 'Price')
grid.add_legend()
plt.show()
The resulting plot shows how price relates to renovation type, elevator availability, and year.
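As a numeric complement to the plot, the same breakdown can be summarized with a grouped median. This is only a minimal sketch on made-up toy data: the rows and the renovation labels '精装' and '简装' are assumptions for illustration and are not taken from lianjia.csv.

```python
import pandas as pd

# Hypothetical toy rows that mimic the Elevator, Renovation and Price columns
toy = pd.DataFrame({
    'Elevator': ['有电梯', '有电梯', '无电梯', '无电梯', '有电梯'],
    'Renovation': ['精装', '简装', '精装', '简装', '精装'],
    'Price': [500.0, 400.0, 300.0, 200.0, 700.0],
})

# Median price for each (Elevator, Renovation) pair,
# laid out like the facet grid: rows = Elevator, columns = Renovation
summary = (
    toy.groupby(['Elevator', 'Renovation'])['Price']
       .median()
       .unstack('Renovation')
)
print(summary)
```

On the real data, replacing toy with the cleaned df should give one number per facet, which makes it easier to compare the panels.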
https://url71.ctfile.com/f/13238771-530323628-1950bb
(Access password: 8835)
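This example mainly uses Python for data analysis. When doing this kind of analysis, look for anomalous data in the plots first, then filter out the affected values.
If you have any questions, feel free to discuss them in the comments.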
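The inspect-then-filter workflow described above might look roughly like the sketch below. The helper clean_category and the toy rows are hypothetical, introduced only for illustration. The floor threshold of 6 follows the script earlier in this post.

```python
import pandas as pd

def clean_category(series, valid_values):
    """Keep values listed in valid_values; replace everything else with NaN."""
    return series.where(series.isin(valid_values))

# Hypothetical toy rows: '毛坯' is a stray value that does not belong in Elevator
toy = pd.DataFrame({
    'Floor': [3, 12, 5, 20],
    'Elevator': ['有电梯', '毛坯', '无电梯', None],
})

# Step 1: list the distinct values to spot anomalies
print(toy['Elevator'].value_counts(dropna=False))

# Step 2: mask anything that is not a valid label
toy['Elevator'] = clean_category(toy['Elevator'], ['有电梯', '无电梯'])

# Step 3: fill the gaps with the floor rule (above 6 floors: assume an elevator)
missing = toy['Elevator'].isnull()
toy.loc[missing & (toy['Floor'] > 6), 'Elevator'] = '有电梯'
toy.loc[missing & (toy['Floor'] <= 6), 'Elevator'] = '无电梯'
print(toy)
```

Using Series.where with isin states the intent (keep only known labels) a little more directly than assigning a filtered selection back to the column, but both approaches leave NaN in the rejected rows.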