1. Adversarial Search
Adversarial search is also called game search.
In artificial intelligence it can be defined as: a zero-sum game (such as chess) between two players who move in turn, with perfect information and deterministic rules.
Zero-sum game: strictly win-or-lose; one side's gain is the other side's loss, so for me to obtain the best outcome the opponent must end up with the worst one.
Since it is a game, we can model it (a minimal Python sketch follows the list):
Initial state: the state of the game when it begins.
Successor function: applies a move, producing the transition from one state to the next.
Terminal test: checks whether the game is over; a state in which the game has ended is called a terminal state.
Evaluation function: determines the outcome of the game, i.e. who wins, who loses, and by how much.
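As a concrete illustration of these four components, here is a minimal sketch using a toy "take 1 or 2 sticks" Nim-style game. The class and method names (NimGame, successors, ...) are made up for illustration and are not part of any library:

class NimGame:
    # Initial state: 7 sticks on the table, player 0 (us) moves first.
    def initial_state(self):
        return (7, 0)

    # Successor function: taking 1 or 2 sticks yields a new state and
    # passes the turn to the other player.
    def successors(self, state):
        sticks, player = state
        for take in (1, 2):
            if take <= sticks:
                yield (sticks - take, 1 - player)

    # Terminal test: the game is over when no sticks remain.
    def is_terminal(self, state):
        return state[0] == 0

    # Evaluation function: whoever took the last stick wins, so if it is
    # now player 1's turn, we (player 0) made the last move and win.
    def utility(self, state):
        return 1 if state[1] == 1 else -1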
2. MiniMax - The Merciless God
The Minimax algorithm is commonly used in programs for two-player games such as chess and other board games.
Minimax is a zero-sum algorithm: one side chooses, among the available options, the one that maximizes its own advantage, while the other side chooses the option that minimizes the opponent's advantage.
MiniMax is also a pessimistic algorithm: it assumes the opponent never makes a mistake, and it looks for the smallest possible loss against such a perfect opponent.
MiniMax pseudocode:

function minimax(node, depth)
    if node is a terminal node or depth = 0          // termination condition
        return the heuristic value of node
    if the adversary is to play at node              // perfect adversary: always picks what is best for it
        let α := +∞
        foreach child of node
            α := min(α, minimax(child, depth-1))
    else {we are to play at node}                    // our move: pick what is best for us
        let α := -∞
        foreach child of node
            α := max(α, minimax(child, depth-1))
    return α
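The pseudocode translates almost directly into Python. Below is a simplified sketch (no depth cutoff) that runs minimax on a hand-built toy tree, where nested lists are internal nodes and numbers are leaf utilities; the tree and names are illustrative only:

# Toy game tree: the root is our (MAX) turn, its children are the adversary's (MIN) turn.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def minimax(node, maximizing):
    # Terminal condition: a leaf node, return its heuristic value.
    if not isinstance(node, list):
        return node
    if maximizing:
        # Our move: pick the child with the largest value.
        return max(minimax(child, False) for child in node)
    # Perfect adversary: picks the child with the smallest value.
    return min(minimax(child, True) for child in node)

print(minimax(TREE, True))  # -> 3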
How can MiniMax be applied to games with more than two players?
Abstract the participants into "us" and "the opponents".
Multiple opponents are modelled as the opponent side moving several times in a row.
In each round we move once and the opponents move n-1 times (where n is the number of participants).
In other words, the single minimized value is replaced by a vector of values, one per opponent (see the turn-taking sketch below).
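A small sketch of the turn-taking logic this implies (the helper name next_turn is made up for illustration): with n agents, agent 0 is the maximizer and agents 1..n-1 are minimizers, and the search depth only increases after every agent has moved once.

def next_turn(agent_index, depth, num_agents):
    # After the last opponent (index num_agents - 1) moves, it is our turn
    # again and one full round has been completed.
    if agent_index == num_agents - 1:
        return 0, depth + 1          # back to the maximizer, one level deeper
    return agent_index + 1, depth    # next opponent, same depth

# Example with 3 agents (us + 2 ghosts): 0 -> 1 -> 2 -> 0, depth increases on the wrap-around.
print(next_turn(0, 0, 3))  # (1, 0)
print(next_turn(2, 0, 3))  # (0, 1)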
CS-188 MiniMax agent implementation:
class MinimaxAgent(MultiAgentSearchAgent):
    """
    Your minimax agent (question 2)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.

        Here are some method calls that might be useful when implementing minimax.

        gameState.getLegalActions(agentIndex):
          Returns a list of legal actions for an agent
          agentIndex=0 means Pacman, ghosts are >= 1

        gameState.generateSuccessor(agentIndex, action):
          Returns the successor game state after an agent takes an action

        gameState.getNumAgents():
          Returns the total number of agents in the game

        gameState.isWin():
          Returns whether or not the game state is a winning state

        gameState.isLose():
          Returns whether or not the game state is a losing state
        """
        "*** YOUR CODE HERE ***"
        # Indices of the ghost agents
        Ghosts = [i for i in range(1, gameState.getNumAgents())]

        # Has the game ended, or has the depth limit been reached?
        def terminate(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Compute a node's minimized value
        def min_value(state, d, ghost):
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A Min node's value is initialized to positive infinity
            v = float('inf')
            for action in state.getLegalActions(ghost):
                if ghost == Ghosts[-1]:
                    v = min(v, max_value(state.generateSuccessor(ghost, action), d + 1))
                else:
                    v = min(v, min_value(state.generateSuccessor(ghost, action), d, ghost + 1))
            return v

        # Compute a node's maximized value
        def max_value(state, d):
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A Max node's value is initialized to negative infinity
            v = -float('inf')
            for action in state.getLegalActions(0):
                v = max(v, min_value(state.generateSuccessor(0, action), d, 1))
            return v

        # Collect every legal action together with its payoff
        res = [(action, min_value(gameState.generateSuccessor(0, action), 0, 1))
               for action in gameState.getLegalActions(0)]
        # Sort by payoff in ascending order
        res.sort(key=lambda k: k[1])
        # Return the action with the largest payoff
        return res[-1][0]
3. Alpha-Beta Pruning
Alpha-beta pruning is a technique for finding the optimal minimax move while avoiding the search of subtrees of moves that can never be chosen.
Alpha: the maximum lower bound of possible solutions (the best value the maximizer can guarantee so far), initialized to negative infinity.
Beta: the minimum upper bound of possible solutions (the best value the minimizer can guarantee so far), initialized to positive infinity.
Because MiniMax assumes the opponent is perfect, our choice is the maximum value found under the minimum upper bound the opponent allows;
therefore any path that can still matter must satisfy: α ≤ value ≤ β
Any path that violates this condition can be pruned; there is no need to expand it.
Pruning only affects computational efficiency; it does not change the final result.
function alphabeta(node, depth, α, β, maximizingPlayer)
// node = the node, depth = current depth, maximizingPlayer = the maximizing player
    if depth = 0 or node is a terminal node
        return the heuristic value of the node
    if we are to play at node
        v := -∞
        foreach child of node
            v := max(v, alphabeta(child, depth - 1, α, β, FALSE))   // child = child node
            α := max(α, v)
            if β ≤ α
                break                                               // β cutoff
        return v
    else {the adversary is to play at node}
        v := +∞
        foreach child of node
            v := min(v, alphabeta(child, depth - 1, α, β, TRUE))
            β := min(β, v)
            if β ≤ α
                break                                               // α cutoff
        return v
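To see the pruning in action, here is a self-contained sketch that runs alpha-beta on the same kind of toy tree as in the minimax sketch above and counts leaf evaluations; the tree and variable names are illustrative only. It returns the same value as plain minimax while visiting fewer leaves:

TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = 0  # number of leaf evaluations

def alphabeta(node, alpha, beta, maximizing):
    global visited
    if not isinstance(node, list):     # terminal node
        visited += 1
        return node
    if maximizing:
        v = -float('inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if beta <= alpha:          # beta cutoff
                break
        return v
    else:
        v = float('inf')
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if beta <= alpha:          # alpha cutoff
                break
        return v

print(alphabeta(TREE, -float('inf'), float('inf'), True), visited)
# -> 3 7  (same value as minimax, but only 7 of the 9 leaves are evaluated)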
CS-188 MiniMax agent with alpha-beta pruning implementation:
class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        now_value = -float('inf')
        # α is initialized to negative infinity
        alpha = -float('inf')
        # β is initialized to positive infinity
        beta = float('inf')
        nextAction = Directions.STOP

        # Among all legal moves, choose the one with the largest payoff
        legal_actions = gameState.getLegalActions(0).copy()
        for next_action in legal_actions:
            nextState = gameState.generateSuccessor(0, next_action)
            next_value = self.get_node_value(nextState, 0, 1, alpha, beta)
            if next_value > now_value:
                now_value, nextAction = next_value, next_action
            alpha = max(alpha, now_value)
        return nextAction

    def get_node_value(self, gameState, cur_depth=0, agent_index=0, alpha=-1e10, beta=1e10):
        """
        Using self-defined function, alpha_value(), beta_value() to choose the most appropriate action
        Only when it's the final state, can we get the value of each node, using the self.evaluationFunction(gameState)
        Otherwise we just get the alpha/beta value we defined here.
        """
        max_party = [0, ]
        min_party = list(range(1, gameState.getNumAgents()))

        # Depth limit reached or game over: return the evaluation of this state
        if cur_depth == self.depth or gameState.isLose() or gameState.isWin():
            return self.evaluationFunction(gameState)
        # The current node is a maximizing node: try to raise α
        elif agent_index in max_party:
            return self.alpha_value(gameState, cur_depth, agent_index, alpha, beta)
        # The current node is a minimizing node: try to lower β
        elif agent_index in min_party:
            return self.beta_value(gameState, cur_depth, agent_index, alpha, beta)
        else:
            print('Errors occur in your party division !!! ')

    def alpha_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # A Max node's value is initialized to negative infinity
        v = -float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        for index, action in enumerate(legal_actions):
            next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action),
                                         cur_depth, agent_index + 1, alpha, beta)
            v = max(v, next_v)
            # v exceeds β: the minimizer above will never allow this branch, prune it
            if v > beta:
                return v
            alpha = max(alpha, v)
        return v

    def beta_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # A Min node's value is initialized to positive infinity
        v = float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        # Multi-agent game: one Pacman, several ghosts
        for index, action in enumerate(legal_actions):
            if agent_index == gameState.getNumAgents() - 1:
                # Last ghost: the next move is Pacman's, so the depth increases
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action),
                                             cur_depth + 1, 0, alpha, beta)
                v = min(v, next_v)
                # v falls below α: the maximizer above will never allow this branch, prune it
                if v < alpha:
                    return v
            else:
                # Another ghost still has to move at the same depth
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action),
                                             cur_depth, agent_index + 1, alpha, beta)
                v = min(v, next_v)
                if v < alpha:
                    return v
            beta = min(beta, v)
        return v
4. ExpectiMax
MiniMax assumes the opponent never makes a mistake, but that is not very realistic.
Moreover, when we allow for the possibility that the opponent makes mistakes, the agent can often make better decisions.
The Expectimax search algorithm is a game-theoretic algorithm for maximizing expected utility. It is a variant of the Minimax algorithm.
Whereas Minimax assumes the opponent (the minimizer) plays optimally, Expectimax does not.
Expectimax introduces chance nodes, which alternate with the max/min nodes, so that the opponent no longer has to be "perfect".
function expectiminimax(node, depth)
    if node is a terminal node or depth = 0
        return the heuristic value of node
    if the adversary is to play at node
        // Return value of minimum-valued child node
        let α := +∞
        foreach child of node
            α := min(α, expectiminimax(child, depth-1))
    else if we are to play at node
        // Return value of maximum-valued child node
        let α := -∞
        foreach child of node
            α := max(α, expectiminimax(child, depth-1))
    else if random event at node
        // Return weighted average of all child nodes' values
        let α := 0
        foreach child of node
            α := α + (Probability[child] × expectiminimax(child, depth-1))
    return α
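A small sketch of the chance-node idea on a tiny hand-built tree: the MAX player chooses between two moves, and each opponent reply is modelled as a uniformly random chance node instead of a perfect minimizer (the tree and numbers are illustrative only):

def expectimax(node, maximizing):
    # Leaves are numbers; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    if maximizing:
        # Our move: still pick the best child.
        return max(expectimax(child, False) for child in node)
    # Chance node: the opponent picks uniformly at random,
    # so we return the average of the children's values.
    values = [expectimax(child, True) for child in node]
    return sum(values) / len(values)

# Two moves for MAX; each leads to a chance node over the opponent's replies.
TREE = [[8, 2], [4, 5]]
print(expectimax(TREE, True))
# -> 5.0 (the left branch); plain minimax would instead prefer the right branch,
# since its worst case (4) beats the left branch's worst case (2).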
CS-188 ExpectiMax agent implementation:
class ExpectimaxAgent(MultiAgentSearchAgent):
    """
    Your expectimax agent (question 4)
    """

    def getAction(self, gameState):
        """
        Returns the expectimax action using self.depth and self.evaluationFunction

        All ghosts should be modeled as choosing uniformly at random from their
        legal moves.
        """
        "*** YOUR CODE HERE ***"
        maxValue = -float('inf')
        maxAction = Directions.STOP
        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=0, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
                maxAction = action
        return maxAction

    # Has the game ended, or has the depth limit been reached?
    def terminate(self, state, d):
        return state.isWin() or state.isLose() or d == self.depth

    # Compute a node's maximized value
    def maxNode(self, gameState, currentDepth):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        maxValue = -float('inf')
        # A Max node's value is obtained from its successor chance nodes
        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
        return maxValue

    # Compute a chance node's value
    def expNode(self, gameState, currentDepth, agentIndex):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        numAction = len(gameState.getLegalActions(agentIndex=agentIndex))
        totalValue = 0.0
        numAgent = gameState.getNumAgents()
        for action in gameState.getLegalActions(agentIndex=agentIndex):
            sucState = gameState.generateSuccessor(agentIndex=agentIndex, action=action)
            if agentIndex == numAgent - 1:
                sucValue = self.maxNode(sucState, currentDepth=currentDepth + 1)
            else:
                # Every ghost takes a turn in sequence (this replaces the single min value with a chain of chance nodes)
                sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=agentIndex + 1)
            totalValue += sucValue
        # Return the average value of the successor nodes
        return totalValue / numAction
5. betterEvaluationFunction
Apart from adversarial search, is there really no other way to approach the problem?
In fact, if you play the Pacman game yourself, you quickly notice that
all we need to do is eat all the food as fast as possible without being caught by a ghost (stay as close to food as possible, and as far from ghosts as possible).
We can therefore attach weights to food and ghosts: when the next step leads to food, give Pacman a reward that tells it to move toward the food;
when the next step would run into a ghost, give Pacman a penalty that tells it not to go that way.
This is easy to achieve by making the food weight positive and the ghost weight negative.
The magnitude of each weight corresponds to how tempting the food is and how frightening the ghosts are, and can be tuned as a parameter.
Also, do not forget that after Pacman eats a power pellet it is temporarily immune to ghosts; during that time it can play more aggressively and actively hunt them.
def betterEvaluationFunction(currentGameState):
    """
    Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable
    evaluation function (question 5).
    """
    newPos = currentGameState.getPacmanPosition()
    newFood = currentGameState.getFood()
    newGhostStates = currentGameState.getGhostStates()

    WEIGHT_FOOD = 10.0           # weight for food
    WEIGHT_GHOST = -10.0         # weight for ghosts
    WEIGHT_SCARED_GHOST = 100.0  # weight for scared ghosts

    # Start from the current game score
    score = currentGameState.getScore()

    # Distances to all remaining food
    distancesToFoodList = [util.manhattanDistance(newPos, foodPos) for foodPos in newFood.asList()]
    # The closer the nearest food, the larger the reward
    if len(distancesToFoodList) > 0:
        score += WEIGHT_FOOD / min(distancesToFoodList)
    else:
        score += WEIGHT_FOOD

    for ghost in newGhostStates:
        distance = manhattanDistance(newPos, ghost.getPosition())
        if distance > 0:
            # The closer a scared ghost, the larger the reward: encourage attacking it
            if ghost.scaredTimer > 0:
                score += WEIGHT_SCARED_GHOST / distance
            # Otherwise, keep away from it
            else:
                score += WEIGHT_GHOST / distance
        # A ghost is on the very next square: run!
        else:
            return -float('inf')  # Pacman is dead at this point

    return score

# Abbreviation
better = betterEvaluationFunction
The choice of weights affects the score; below are the results of several tuning runs:
1. WeightFood=10, WeightGhost=-1, WeightScaredGhost=100
2. WeightFood=5, WeightGhost=-1, WeightScaredGhost=100
3. WeightFood=10, WeightGhost=-10, WeightScaredGhost=100
4. WeightFood=5, WeightGhost=-10, WeightScaredGhost=50
5. WeightFood=5, WeightGhost=-10, WeightScaredGhost=100
From these results we can see that
the second group (WeightFood=5, WeightGhost=-1, WeightScaredGhost=100) performs best.