First, a minimal requests example (copy it into PyCharm and run it as-is):
```python
import requests

url = "https://www.baidu.com"
# Disguise the crawler as a browser so the server returns more complete response data
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36"
}
# Send the GET request
response = requests.get(url, headers=headers)
# Print the page source as a str
print(response.text)
```
If `requests` is underlined in red in the example above, read on for the installation steps.
Purpose
Send HTTP requests and receive the response data.
Installation (requests is not part of the standard library, so it must be downloaded and installed)
Press Win+R, type cmd, and press Enter to open a command prompt.
In the terminal window, run the following and wait for the installation to finish:
pip install requests
Sending a GET request
The difference between response.text and response.content
- response.text: a str, decoded with the encoding requests guesses (roughly equivalent to response.content.decode())
- response.content: the raw response body as bytes
- response.content.decode(): decodes the bytes, defaulting to UTF-8
- response.content.decode("GBK"): decodes with an explicitly specified charset
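The str/bytes relationship can be sketched offline with a plain byte string; the string below is just a stand-in for a real response body, no request is sent:

```python
# Stand-in for response.content: the raw body as bytes
raw = "百度一下".encode("utf-8")
# .decode() defaults to UTF-8, giving back a str
text = raw.decode()
print(type(raw).__name__, type(text).__name__)  # bytes str
# The same text encoded as GBK must be decoded with the matching charset
gbk_raw = "百度一下".encode("GBK")
print(gbk_raw.decode("GBK"))  # 百度一下
```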
Common character encodings: UTF-8, GBK, GB2312, ASCII.
Manually setting the encoding:

```python
response.encoding = "utf-8"
print(response.text)
```

is equivalent to

```python
print(response.content.decode())
```
Adding request headers gets you a more complete response (it makes the server think a browser is doing the browsing):
```python
import requests

url = "https://www.baidu.com"
# Fill in the values copied from your browser's developer tools
headers = {
    'User-Agent': '',
    'Cookie': ''
}
response = requests.get(url, headers=headers)
print(response.content.decode())
# Print the size of the response data
print(len(response.content.decode()))
```
1. Carry the parameters directly in the URL:

https://www.baidu.com/s?wd=python
2. Build a parameter dict when sending the request (pass it via the params argument of the get method):
```python
# Dict of query parameters to add
params = {
    " ": " ",
    " ": " "
}
```
Example:
```python
import requests

url = 'https://www.baidu.com/s?'
# Request headers
headers = {
    'User-Agent': ''
}
# Build the parameter dict
params = {
    'wd': 'python'
}
# Pass the parameter dict to the get method
response = requests.get(url, headers=headers, params=params)
# Print the final URL of the request
print(response.url)
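Under the hood, the params dict is URL-encoded into the query string. The same encoding step can be reproduced offline with the standard library's urllib.parse.urlencode (a sketch of the idea, not the exact code requests runs):

```python
from urllib.parse import urlencode

base = "https://www.baidu.com/s?"
params = {'wd': 'python'}
# Encode the dict into a query string and append it to the base URL
print(base + urlencode(params))  # https://www.baidu.com/s?wd=python
```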
Cookies are used to maintain state (for example, staying logged in).
Method 1: carry the cookie in the request headers, the same way as the User-Agent.
Method 2: use the cookies parameter to keep the session.
```python
cookies = {
    ' ': ' '
}
```
The two forms of using cookies:
```python
import requests

url = "https://home.jd.com/"
# Form 1: the cookie travels inside the request headers
headers = {
    "User-Agent": "",
    "cookie": ""
}
response = requests.get(url, headers=headers)
print(response.text)
```
```python
import requests

url = "https://home.jd.com/"
headers = {
    "User-Agent": ""
}
# Form 2: a dict of cookie name/value pairs passed via the cookies parameter
cookie = {
    "cookie": ""
}
response = requests.get(url, headers=headers, cookies=cookie)
print(response.text)
```
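If you copy the whole Cookie string from the browser's developer tools, it can be split into the dict that the cookies parameter expects. The raw_cookie value below is a hypothetical example, not a real cookie:

```python
# Hypothetical Cookie header string as copied from DevTools
raw_cookie = "token=abc123; uid=42"
# Split "name=value; name=value" pairs into a dict of cookie names to values
cookies = {p.split("=", 1)[0]: p.split("=", 1)[1] for p in raw_cookie.split("; ")}
print(cookies)  # {'token': 'abc123', 'uid': '42'}
```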
```python
import requests

url = 'https://google.com'
# timeout=2: stop waiting after 2 seconds; a Timeout exception is raised instead of hanging
try:
    response = requests.get(url, timeout=2)
except requests.exceptions.Timeout:
    print('request timed out')
```
Proxy categories:
Usage:
```python
# Keys are URL schemes; values are the proxy server addresses
proxies = {
    "http": "http://·····",
    # or
    "https": "https://·····"
}
response = requests.get(url, proxies=proxies)
```
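requests picks a proxy by matching the target URL's scheme against the keys of the proxies dict. That lookup can be sketched offline with the standard library (the proxy addresses below are placeholders, not real servers):

```python
from urllib.parse import urlsplit

# Hypothetical proxy addresses
proxies = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}
url = "https://www.baidu.com"
# The scheme of the target URL decides which proxy entry is used
print(proxies[urlsplit(url).scheme])  # http://127.0.0.1:8888
```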
Skipping CA certificate verification:

```python
# verify=False disables TLS certificate verification (an InsecureRequestWarning will be printed)
response = requests.get(url, verify=False)
```