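ML: XGBoost: train an XGBoost model on the mushroom dataset (22 features + 1 label; 6513 training + 1611 test samples) to predict whether a mushroom is poisonous (binary classification). The model's built-in feature importances are visualized and then used as thresholds for training models on feature subsets.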
Contents
Output Results
Design Approach
Core Code
Output Results
To be updated later…
From these results, 8 or even 5 features are already good enough. The five features are odor, spore-print-color, population, gill-spacing and gill-size.
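The post does not say how "good enough" was decided. One way to make that call repeatable is to record the (n, accuracy) pair from each "Thresh=…, n=…, Accuracy=…" line printed by the core code. You can then pick the smallest feature count whose accuracy is within a chosen tolerance of the best. The helper name `smallest_good_enough` and its `tolerance` parameter are hypothetical. This is a sketch under that assumption, not the author's procedure.

```python
def smallest_good_enough(results, tolerance=0.0):
    """Hypothetical helper: return the (n_features, accuracy) pair with the
    fewest features whose accuracy is within `tolerance` of the best one.

    `results` is a list of (n_features, accuracy) pairs, for example collected
    from the Thresh/n/Accuracy lines printed by the threshold loop.
    """
    best = max(acc for _, acc in results)
    # Keep every subset whose accuracy is close enough to the best score
    candidates = [(n, acc) for n, acc in results if acc >= best - tolerance]
    # Among those, prefer the subset that uses the fewest features
    return min(candidates, key=lambda pair: pair[0])
```

The tolerance is a judgment call. With `tolerance=0.0` the helper returns the smallest subset that ties the best accuracy. A small positive tolerance trades a little accuracy for fewer features.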
Design Approach
To be updated later…
Core Code
To be updated later…
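from numpy import sort
from matplotlib import pyplot
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier, plot_importance

# Assumes XGB_model is an XGBClassifier already fitted on (X_train, y_train),
# and that X_test, y_test come from the same train/test split.
print('XGB_model.feature_importances_:', '\n', XGB_model.feature_importances_)

# Bar chart of the raw importance scores, one bar per feature column
pyplot.bar(range(len(XGB_model.feature_importances_)), XGB_model.feature_importances_)
pyplot.show()

# XGBoost's built-in feature importance plot
plot_importance(XGB_model)
pyplot.show()

# Use each importance value as a threshold: keep the features whose importance
# is >= thresh, retrain a fresh model on that subset, and score it on the test set
thresholds = sort(XGB_model.feature_importances_)
for thresh in thresholds:
    selection = SelectFromModel(XGB_model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    # predict() already returns class labels; round() is a no-op for 0/1 labels
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y_test, predictions)
    print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1], accuracy * 100.0))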
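The n printed for each threshold comes from a simple rule. Given a numeric threshold, `SelectFromModel` keeps a feature when its importance is greater than or equal to that threshold. The NumPy sketch below reproduces only that masking step, so you can see which columns each iteration trains on without fitting any model. The function names `features_kept_per_threshold` and `select_columns` are hypothetical illustrations, not part of scikit-learn or XGBoost.

```python
import numpy as np

def features_kept_per_threshold(importances):
    """Hypothetical helper: for each threshold taken from the sorted
    importances, return (threshold, number of features with importance >= threshold).
    """
    importances = np.asarray(importances, dtype=float)
    return [(float(t), int(np.sum(importances >= t))) for t in np.sort(importances)]

def select_columns(X, importances, threshold):
    """Hypothetical helper: keep only the columns of X whose importance is >= threshold."""
    mask = np.asarray(importances, dtype=float) >= threshold
    return np.asarray(X)[:, mask]
```

The thresholds are the importances themselves. Repeated values, for example several features with importance 0, therefore make the loop retrain on the same subset more than once. That is why the same n can appear on several consecutive output lines.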