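This section uses logistic regression to recognize handwritten digits in the MNIST dataset. The accuracy is only passable, though somewhat better than the NB (naive Bayes) algorithm from section 7.8.
1. Source code changes
Run as-is, the author's code raises an error and also emits warnings.
(1) Error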
Traceback (most recent call last):
  File "C:/Users/liujiannan/PycharmProjects/pythonProject/Web安全之机器学习入门/code/8-3.py", line 15, in <module>
    training_data, valid_data, test_data=load_data()
  File "C:/Users/liujiannan/PycharmProjects/pythonProject/Web安全之机器学习入门/code/8-3.py", line 10, in load_data
    training_data, valid_data, test_data = pickle.load(fp)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
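Fix: mnist.pkl.gz was pickled under Python 2, and Python 3's pickle.load decodes Python 2 8-bit strings as ASCII by default. Passing encoding="bytes" reads them as bytes objects instead: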
def load_data():
    with gzip.open('../data/MNIST/mnist.pkl.gz') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding="bytes")
    return training_data, valid_data, test_data
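The decoding failure can be reproduced without the MNIST file. The following is a minimal sketch, assuming a hand-written protocol-0 pickle (the hypothetical `PY2_PICKLE` payload) that stands in for a Python 2 `str` containing the byte 0x90; the helper `try_load` is also made up for illustration. The real mnist.pkl.gz holds NumPy arrays, so this only mirrors the mechanism, not the file's contents.

```python
import pickle

# Hypothetical payload: a protocol-0 pickle of a Python 2 `str` holding the
# single byte 0x90 (STRING opcode "S", then STOP opcode ".").
PY2_PICKLE = b"S'\\x90'\n."


def try_load(payload, **kwargs):
    """Return the unpickled object, or the exception name if decoding fails."""
    try:
        return pickle.loads(payload, **kwargs)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Default: Python 3 decodes Python 2 8-bit strings as ASCII, which fails on 0x90.
default_result = try_load(PY2_PICKLE)
# encoding="bytes": the 8-bit string is returned unchanged as a bytes object.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")
# encoding="latin1": every byte maps to one code point, so decoding cannot fail.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")

print(default_result, bytes_result, repr(latin1_result))
```

Both `encoding="bytes"` and `encoding="latin1"` avoid the error; the difference is whether Python 2 strings come back as `bytes` or as `str`.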
(2) Warnings
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:469: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)
Fix: set solver and multi_class explicitly when constructing the model:
logreg = linear_model.LogisticRegression(C=1e5, solver='liblinear', multi_class='ovr')
2. Complete code
The source below is modified from the original author's code so that it runs under Python 3:
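# -*- coding:utf-8 -*-
from sklearn import linear_model
from sklearn import model_selection
import pickle
import gzip


def load_data():
    # encoding="bytes" lets Python 3 read the pickle written by Python 2
    with gzip.open('../data/MNIST/mnist.pkl.gz') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding="bytes")
    return training_data, valid_data, test_data


if __name__ == '__main__':
    training_data, valid_data, test_data = load_data()
    x1, y1 = training_data
    x2, y2 = test_data
    # Logistic regression with solver and multi_class set explicitly (see section 1)
    logreg = linear_model.LogisticRegression(C=1e5, solver='liblinear', multi_class='ovr')
    logreg.fit(x1, y1)
    # 3-fold cross-validation accuracy, computed on the test split only
    score = model_selection.cross_val_score(logreg, x2, y2, scoring="accuracy")
    print(score)
    print(score.mean())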
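Note, however, that the cross-validation is run on x2 and y2. cross_val_score refits a copy of the estimator on each fold of the data it is given, so the model fitted on x1 and y1 is not what gets scored: the result comes from both training and testing on x2 and y2.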
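The point above can be checked with a small experiment. This is a sketch under stated assumptions, not the author's code: it uses scikit-learn's bundled 8x8 digits dataset (`load_digits`) as a stand-in for the MNIST pickle, and the split sizes, `max_iter` value, and variable names are chosen only for illustration. It compares a held-out evaluation (train on one split, score on another) with cross-validation run on the test split, once with an already-fitted model and once with a fresh one.

```python
import numpy as np
from sklearn import linear_model, model_selection
from sklearn.datasets import load_digits

# Stand-in data: scikit-learn's bundled 8x8 digits, NOT the MNIST pickle.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Held-out evaluation: fit on the training split, score on unseen test data.
fitted = linear_model.LogisticRegression(max_iter=2000)
fitted.fit(x_train, y_train)
holdout_accuracy = fitted.score(x_test, y_test)

# Cross-validation on the test split: the estimator is cloned and refit on
# each fold, so the earlier fit on x_train has no effect on these scores.
cv_fitted = model_selection.cross_val_score(
    fitted, x_test, y_test, cv=3, scoring="accuracy")
cv_fresh = model_selection.cross_val_score(
    linear_model.LogisticRegression(max_iter=2000),
    x_test, y_test, cv=3, scoring="accuracy")
same_scores = np.allclose(cv_fitted, cv_fresh)

print("held-out accuracy:", holdout_accuracy)
print("cv scores (fitted model):", cv_fitted)
print("cv scores (fresh model): ", cv_fresh)
print("identical:", same_scores)
```

If the goal is to measure how well the model trained on x1 and y1 generalizes, calling `score` (or `predict` plus a metric) on x2 and y2 is the more direct choice; cross-validation on x2 and y2 answers a different question, namely how well a model trained on part of the test split predicts the rest of it.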
3. Results
[0.76482924 0.8529853 0.8639231 ]
0.8272458792084002
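Although the author's example is somewhat flawed, logistic regression actually performs much better than this on MNIST images and can easily exceed 90% accuracy. The example is only meant to demonstrate usage, so the exact numbers should not be taken too seriously.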