Python教程

python爬虫的两种方式

本文主要是介绍python爬虫的两种方式,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!

1、requests方式

(1)无头部信息

import requests
url = "https://www.cnblogs.com/dearvee/p/6558571.html"
response = requests.get(url)
response.encoding = 'utf-8'
print(response.text)

(2)有头部信息

import requests
url = "https://www.cnblogs.com/dearvee/p/6558571.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
response = requests.get(url, headers=headers)
response.encoding = 'utf-8'
print(response.text)


2、urllib.request方式

(1)无Request请求

from urllib import request
url = "https://www.cnblogs.com/dearvee/p/6558571.html"
response = request.urlopen(url)
print(response.read().decode('utf-8'))

(2)构造Request请求

from urllib import request
url = "https://www.cnblogs.com/dearvee/p/6558571.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
req = request.Request(url, headers=headers)
response = request.urlopen(req)
print(response.read().decode('utf-8'))

 

3、捕获错误信息

from urllib import request, error
url = "https://www.douban.com"
try:
  req = request.Request(url)
  response = request.urlopen(req)
  print(response.read().decode('utf-8'))
except error.HTTPError as e:
  print(e)

 

 4、随机获取头部信息

from fake_useragent import UserAgent
ua = UserAgent()
print(ua.ie) #随机打印ie浏览器任意版本
print(ua.firefox) #随机打印firefox浏览器任意版本
print(ua.chrome) #随机打印chrome浏览器任意版本
print(ua.random) #随机打印任意厂家的浏览器

 

这篇关于python爬虫的两种方式的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!