数理证明
前置知识:拉格朗日数乘法、对偶问题、核技巧
针对的是约束优化问题:
例题:
已知x>0,y>0,x+2y+2xy=8,则x+2y的最小值__。
解:
引入参数 λ \lambda λ 构造新函数L: x + 2 y + λ ( x + 2 y + 2 x y − 8 ) x+2y+\lambda(x+2y+2xy-8) x+2y+λ(x+2y+2xy−8)
分别对x,y,
λ
\lambda
λ求偏导:
L
x
=
1
+
λ
(
1
+
2
y
)
=
0
L
y
=
2
+
λ
(
2
+
2
x
)
=
0
L
λ
=
x
+
2
y
+
2
x
y
−
8
=
0
L_x = 1+\lambda(1+2y)=0\\ L_y = 2+\lambda(2+2x)=0\\ \ \ \ \ \ \ L_\lambda = x+2y+2xy-8=0\\
Lx=1+λ(1+2y)=0Ly=2+λ(2+2x)=0 Lλ=x+2y+2xy−8=0
三个方程三个未知数,可以求得x=2,y=1。
即当x=2,y=1时x+2y的 最小值为4。
用于对优化问题的转换
例如:maxmin -> minmax
默认约束优化问题是弱对偶关系,当满足KNN条件时,具有强对偶关系,即二者等价。
用于对高维特征的扩展,当样本数或超平面维数过大时,可以利用核技巧优化,将问题转化为有限维问题。
本章学习的是支持向量机中最流行的一种实现–序列最小优化(Sequential Minimal Optimization)算法。
优点:泛化错误率低,计算开销不大,结果容易理解 缺点:对参数调节和和函数的选择敏感,原始分类器不加修改仅适用于处理二分类问题。 适用数据类型:数值型和标称型数据
原理性部分
分割超平面集将不同类别的数据点分割的平面,支持向量机就是由这些分割超平面组成的分类器。
**支持向量(support vector)是指离分割超平面(separating byperplane)**最近的那些点,令支持向量的间隔最大化,就是构造支持向量机的优化目标
图中a,b,c都可以认为是一个分割超平面,显然,超分割平面b的鲁棒性即泛化能力要优于a,c。
如何建立数理模型来寻找到优质超分割平面?
那么就需要定义一个优化目标,即距离超分割面最近的那些的间隔最大化。
a
r
g
m
a
x
w
,
b
{
m
i
n
(
w
T
∗
x
+
b
)
}
arg\ max_{w,b}\ \{ min(w^T*x+b) \}
arg maxw,b {min(wT∗x+b)}
其中w,b就是待优化的参数,假如直接求解该模型是十分困难的一件事,于是开始了一系列的模型转换,感兴趣的可以结合文章开头的视频以及西瓜书进一步学习,这里只将流程梳理一遍。
同时问题的参数也变为 α \alpha α,接下来便是对该问题进行求解
机器学习实战中使用的方法时SMO算法。
SMO算法伪代码:
创建一个alpha向量并将其初始化为0向量 当迭代次数小于最大迭代次数时(外循环): 对数据集中的每个数据向量(内循环): 如果该数据向量可以被优化: 随机选择另外一个数据向量 同时优化这两个向量 如果两个向量都不能被优化,退出内循环 如果所有向量都没有被优化,增加迭代数目,继续下一次循环
实际上SMO算法是一个较为有效的贪心算法。
code
''' Created on Nov 4, 2010 Chapter 5 source file for Machine Learing in Action @author: Peter ''' from numpy import * from time import sleep def loadDataSet(fileName): dataMat = [] labelMat = [] fr = open(fileName) for line in fr.readlines(): lineArr = line.strip().split('\t') dataMat.append([float(lineArr[0]), float(lineArr[1])]) labelMat.append(float(lineArr[2])) return dataMat, labelMat def selectJrand(i, m): j = i # we want to select any J not equal to i while (j == i): j = int(random.uniform(0, m)) return j def clipAlpha(aj, H, L): if aj > H: aj = H if L > aj: aj = L return aj def smoSimple(dataMatIn, classLabels, C, toler, maxIter): dataMatrix = mat(dataMatIn) labelMat = mat(classLabels).transpose() b = 0 m, n = shape(dataMatrix) alphas = mat(zeros((m, 1))) iter = 0 while (iter < maxIter): alphaPairsChanged = 0 for i in range(m): fXi = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[i, :].T)) + b Ei = fXi - float(labelMat[i]) # if checks if an example violates KKT conditions if ((labelMat[i] * Ei < -toler) and (alphas[i] < C)) or ((labelMat[i] * Ei > toler) and (alphas[i] > 0)): j = selectJrand(i, m) fXj = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[j, :].T)) + b Ej = fXj - float(labelMat[j]) alphaIold = alphas[i].copy() alphaJold = alphas[j].copy() if (labelMat[i] != labelMat[j]): L = max(0, alphas[j] - alphas[i]) H = min(C, C + alphas[j] - alphas[i]) else: L = max(0, alphas[j] + alphas[i] - C) H = min(C, alphas[j] + alphas[i]) if L == H: print("L==H") continue eta = 2.0 * dataMatrix[i, :] * dataMatrix[j, :].T - dataMatrix[i, :] * dataMatrix[i, :].T - dataMatrix[ j, :] * dataMatrix[ j, :].T if eta >= 0 : print("eta>=0") continue alphas[j] -= labelMat[j] * (Ei - Ej) / eta alphas[j] = clipAlpha(alphas[j], H, L) if (abs(alphas[j] - alphaJold) < 0.00001): print("j not moving enough") continue alphas[i] += labelMat[j] * labelMat[i] * (alphaJold - alphas[j]) # update i by the same amount as j # the update is in the oppostie direction b1 = b - Ei - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[i, :].T - labelMat[ j] * (alphas[j] - alphaJold) * dataMatrix[i, :] * dataMatrix[j, :].T b2 = b - Ej - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[j, :].T - labelMat[ j] * (alphas[j] - alphaJold) * dataMatrix[j, :] * dataMatrix[j, :].T if (0 < alphas[i]) and (C > alphas[i]): b = b1 elif (0 < alphas[j]) and (C > alphas[j]): b = b2 else: b = (b1 + b2) / 2.0 alphaPairsChanged += 1 print("iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged)) if (alphaPairsChanged == 0): iter += 1 else: iter = 0 print("iteration number: %d" % iter) return b, alphas def kernelTrans(X, A, kTup): # calc the kernel or transform data to a higher dimensional space m, n = shape(X) K = mat(zeros((m, 1))) if kTup[0] == 'lin': K = X * A.T # linear kernel elif kTup[0] == 'rbf': for j in range(m): deltaRow = X[j, :] - A K[j] = deltaRow * deltaRow.T K = exp(K / (-1 * kTup[1] ** 2)) # divide in NumPy is element-wise not matrix like Matlab else: raise NameError('Houston We Have a Problem -- \ That Kernel is not recognized') return K class optStruct: def __init__(self, dataMatIn, classLabels, C, toler, kTup): # Initialize the structure with the parameters self.X = dataMatIn self.labelMat = classLabels self.C = C self.tol = toler self.m = shape(dataMatIn)[0] self.alphas = mat(zeros((self.m, 1))) self.b = 0 self.eCache = mat(zeros((self.m, 2))) # first column is valid flag self.K = mat(zeros((self.m, self.m))) for i in range(self.m): self.K[:, i] = kernelTrans(self.X, self.X[i, :], kTup) def calcEk(oS, k): fXk = float(multiply(oS.alphas, oS.labelMat).T * oS.K[:, k] + oS.b) Ek = fXk - float(oS.labelMat[k]) return Ek def selectJ(i, oS, Ei): # this is the second choice -heurstic, and calcs Ej maxK = -1 maxDeltaE = 0 Ej = 0 oS.eCache[i] = [1, Ei] # set valid #choose the alpha that gives the maximum delta E validEcacheList = nonzero(oS.eCache[:, 0].A)[0] if (len(validEcacheList)) > 1: for k in validEcacheList: # loop through valid Ecache values and find the one that maximizes delta E if k == i: continue # don't calc for i, waste of time Ek = calcEk(oS, k) deltaE = abs(Ei - Ek) if (deltaE > maxDeltaE): maxK = k maxDeltaE = deltaE Ej = Ek return maxK, Ej else: # in this case (first time around) we don't have any valid eCache values j = selectJrand(i, oS.m) Ej = calcEk(oS, j) return j, Ej def updateEk(oS, k): # after any alpha has changed update the new value in the cache Ek = calcEk(oS, k) oS.eCache[k] = [1, Ek] def innerL(i, oS): Ei = calcEk(oS, i) if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ( (oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)): j, Ej = selectJ(i, oS, Ei) # this has been changed from selectJrand alphaIold = oS.alphas[i].copy() alphaJold = oS.alphas[j].copy() if (oS.labelMat[i] != oS.labelMat[j]): L = max(0, oS.alphas[j] - oS.alphas[i]) H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) else: L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) H = min(oS.C, oS.alphas[j] + oS.alphas[i]) if L == H: print("L==H") return 0 eta = 2.0 * oS.K[i, j] - oS.K[i, i] - oS.K[j, j] # changed for kernel if eta >= 0: print("eta>=0") return 0 oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta oS.alphas[j] = clipAlpha(oS.alphas[j], H, L) updateEk(oS, j) # added this for the Ecache if (abs(oS.alphas[j] - alphaJold) < 0.00001): print("j not moving enough") return 0 oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j]) # update i by the same amount as j updateEk(oS, i) # added this for the Ecache #the update is in the oppostie direction b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, i] - oS.labelMat[j] * ( oS.alphas[j] - alphaJold) * oS.K[i, j] b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, j] - oS.labelMat[j] * ( oS.alphas[j] - alphaJold) * oS.K[j, j] if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 else: oS.b = (b1 + b2) / 2.0 return 1 else: return 0 def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)): # full Platt SMO oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler, kTup) iter = 0 entireSet = True alphaPairsChanged = 0 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): alphaPairsChanged = 0 if entireSet: # go over all for i in range(oS.m): alphaPairsChanged += innerL(i, oS) print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged)) iter += 1 else: # go over non-bound (railed) alphas nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] for i in nonBoundIs: alphaPairsChanged += innerL(i, oS) print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged)) iter += 1 if entireSet: entireSet = False # toggle entire set loop elif (alphaPairsChanged == 0): entireSet = True print("iteration number: %d" % iter) return oS.b, oS.alphas def calcWs(alphas, dataArr, classLabels): X = mat(dataArr) labelMat = mat(classLabels).transpose() m, n = shape(X) w = zeros((n, 1)) for i in range(m): w += multiply(alphas[i] * labelMat[i], X[i, :].T) return w def testRbf(k1=1.3): dataArr, labelArr = loadDataSet('testSetRBF.txt') b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1)) # C=200 important datMat = mat(dataArr) labelMat = mat(labelArr).transpose() svInd = nonzero(alphas.A > 0)[0] sVs = datMat[svInd] # get matrix of only support vectors labelSV = labelMat[svInd] print("there are %d Support Vectors" % shape(sVs)[0]) m, n = shape(datMat) errorCount = 0 for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1)) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the training error rate is: %f" % (float(errorCount) / m)) dataArr, labelArr = loadDataSet('testSetRBF2.txt') errorCount = 0 datMat = mat(dataArr) labelMat = mat(labelArr).transpose() m, n = shape(datMat) for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1)) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the test error rate is: %f" % (float(errorCount) / m)) def img2vector(filename): returnVect = zeros((1, 1024)) fr = open(filename) for i in range(32): lineStr = fr.readline() for j in range(32): returnVect[0, 32 * i + j] = int(lineStr[j]) return returnVect def loadImages(dirName): from os import listdir hwLabels = [] trainingFileList = listdir(dirName) # load the training set m = len(trainingFileList) trainingMat = zeros((m, 1024)) for i in range(m): fileNameStr = trainingFileList[i] fileStr = fileNameStr.split('.')[0] # take off .txt classNumStr = int(fileStr.split('_')[0]) if classNumStr == 9: hwLabels.append(-1) else: hwLabels.append(1) trainingMat[i, :] = img2vector('%s/%s' % (dirName, fileNameStr)) return trainingMat, hwLabels def testDigits(kTup=('rbf', 10)): dataArr, labelArr = loadImages('trainingDigits') b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup) datMat = mat(dataArr) labelMat = mat(labelArr).transpose() svInd = nonzero(alphas.A > 0)[0] sVs = datMat[svInd] labelSV = labelMat[svInd] print("there are %d Support Vectors" % shape(sVs)[0]) m, n = shape(datMat) errorCount = 0 for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], kTup) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the training error rate is: %f" % (float(errorCount) / m)) dataArr, labelArr = loadImages('testDigits') errorCount = 0 datMat = mat(dataArr) labelMat = mat(labelArr).transpose() m, n = shape(datMat) for i in range(m): kernelEval = kernelTrans(sVs, datMat[i, :], kTup) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print("the test error rate is: %f" % (float(errorCount) / m)) '''#######******************************** Non-Kernel VErsions below ''' #######******************************** class optStructK: def __init__(self, dataMatIn, classLabels, C, toler): # Initialize the structure with the parameters self.X = dataMatIn self.labelMat = classLabels self.C = C self.tol = toler self.m = shape(dataMatIn)[0] self.alphas = mat(zeros((self.m, 1))) self.b = 0 self.eCache = mat(zeros((self.m, 2))) # first column is valid flag def calcEkK(oS, k): fXk = float(multiply(oS.alphas, oS.labelMat).T * (oS.X * oS.X[k, :].T)) + oS.b Ek = fXk - float(oS.labelMat[k]) return Ek def selectJK(i, oS, Ei): # this is the second choice -heurstic, and calcs Ej maxK = -1 maxDeltaE = 0 Ej = 0 oS.eCache[i] = [1, Ei] # set valid #choose the alpha that gives the maximum delta E validEcacheList = nonzero(oS.eCache[:, 0].A)[0] if (len(validEcacheList)) > 1: for k in validEcacheList: # loop through valid Ecache values and find the one that maximizes delta E if k == i: continue # don't calc for i, waste of time Ek = calcEk(oS, k) deltaE = abs(Ei - Ek) if (deltaE > maxDeltaE): maxK = k maxDeltaE = deltaE Ej = Ek return maxK, Ej else: # in this case (first time around) we don't have any valid eCache values j = selectJrand(i, oS.m) Ej = calcEk(oS, j) return j, Ej def updateEkK(oS, k): # after any alpha has changed update the new value in the cache Ek = calcEk(oS, k) oS.eCache[k] = [1, Ek] def innerLK(i, oS): Ei = calcEk(oS, i) if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ( (oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)): j, Ej = selectJ(i, oS, Ei) # this has been changed from selectJrand alphaIold = oS.alphas[i].copy() alphaJold = oS.alphas[j].copy() if (oS.labelMat[i] != oS.labelMat[j]): L = max(0, oS.alphas[j] - oS.alphas[i]) H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) else: L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) H = min(oS.C, oS.alphas[j] + oS.alphas[i]) if L == H: print ("L==H") return 0 eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - oS.X[i, :] * oS.X[i, :].T - oS.X[j, :] * oS.X[j, :].T if eta >= 0: print("eta>=0") return 0 oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta oS.alphas[j] = clipAlpha(oS.alphas[j], H, L) updateEk(oS, j) # added this for the Ecache if (abs(oS.alphas[j] - alphaJold) < 0.00001): print("j not moving enough") return 0 oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j]) # update i by the same amount as j updateEk(oS, i) # added this for the Ecache #the update is in the oppostie direction b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[i, :].T - oS.labelMat[j] * ( oS.alphas[j] - alphaJold) * oS.X[i, :] * oS.X[j, :].T b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[j, :].T - oS.labelMat[j] * ( oS.alphas[j] - alphaJold) * oS.X[j, :] * oS.X[j, :].T if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 else: oS.b = (b1 + b2) / 2.0 return 1 else: return 0 def smoPK(dataMatIn, classLabels, C, toler, maxIter): # full Platt SMO oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler) iter = 0 entireSet = True alphaPairsChanged = 0 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): alphaPairsChanged = 0 if entireSet: # go over all for i in range(oS.m): alphaPairsChanged += innerL(i, oS) print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged)) iter += 1 else: # go over non-bound (railed) alphas nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] for i in nonBoundIs: alphaPairsChanged += innerL(i, oS) print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged)) iter += 1 if entireSet: entireSet = False # toggle entire set loop elif (alphaPairsChanged == 0): entireSet = True print("iteration number: %d" % iter) return oS.b, oS.alphas
import svm as svmMLiA import numpy as np import matplotlib.pyplot as plt dataArr,labelArr =svmMLiA.loadDataSet('testSet.txt') # 数据集、类别标签、常数C、容错率、退出最大循环数 b,alphas= svmMLiA.smoSimple(dataArr,labelArr,0.6,0.001,40) plt.rcParams['font.sans-serif']=['SimHei'] plt.rcParams['axes.unicode_minus'] = False # matplotlib画图中中文显示会有问题,需要这两行设置默认字体 plt.xlabel('X') plt.ylabel('Y') plt.xlim(xmax=12,xmin=-2.5) plt.ylim(ymax=10,ymin=-10) # 画两条(0-9)的坐标轴并设置轴标签x,y x1=list();x2=list() y1=list();y2=list() for j in range(len(labelArr)): if labelArr[j]==1: x1.append(dataArr[j][0]) y1.append(dataArr[j][1]) else : x2.append(dataArr[j][0]) y2.append(dataArr[j][1]) print(x1) print(y1) colors1 = '#00CED1' colors2 = '#DC143C' area = np.pi*4**2 plt.scatter(x1,y1,s=area,c=colors1,alpha=0.4,label='类别A') plt.scatter(x2,y2,s=area,c=colors2,alpha=0.4,label='类别B') plt.plot() plt.legend() plt.savefig('1.png',dpi=300) plt.show()
结语