import requests

# Desktop IE11 user agent so the site serves the normal desktop page.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}

# Fetch the novel's table-of-contents page.
# timeout keeps the script from hanging forever on a stalled connection.
response = requests.get('https://quanxiaoshuo.com/177913/', headers=headers, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
from bs4 import BeautifulSoup

# Parse the page. Pass the raw bytes (response.content) so from_encoding is
# honoured — BeautifulSoup ignores from_encoding when given an
# already-decoded str such as response.text (it emits a warning and the
# declared encoding is dropped).
soup = BeautifulSoup(response.content, 'html.parser', from_encoding='utf-8')  # html.parser or lxml

# Collect volume headings: each node with class "volumn" whose <b> child
# holds the volume title.
title = []
for volumn in soup.find_all(class_="volumn"):
    b = volumn.find('b')
    if b is not None:  # some "volumn" nodes carry no <b> heading
        title.append({'volumn': b.string})
# Collect chapter titles: each node with class "chapter" holds the title
# in its <a> tag's title attribute.
chapters = []
for chapter in soup.find_all(class_='chapter'):
    a = chapter.find('a')
    if a is not None:  # guard: a "chapter" node without a link would crash a.get()
        chapters.append({'chapter_title': a.get('title')})
import json

# Persist volume titles and chapter titles as JSON.
# encoding='utf-8' + ensure_ascii=False keep the Chinese text readable in
# the output files instead of \uXXXX escapes (and make the file encoding
# independent of the platform default).
with open('xylz_title.json', 'w', encoding='utf-8') as fp:
    json.dump(title, fp, ensure_ascii=False, indent=4)
with open('xylz_chapters.json', 'w', encoding='utf-8') as fp:
    json.dump(chapters, fp, ensure_ascii=False, indent=4)
import requests
from bs4 import BeautifulSoup
import json

# --- Fetch the novel's table-of-contents page --------------------------
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}
# timeout keeps the script from hanging forever on a stalled connection.
response = requests.get('https://quanxiaoshuo.com/177913/', headers=headers, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page

# --- Extract volume headings and chapter titles ------------------------
# Pass raw bytes so from_encoding is honoured — BeautifulSoup ignores
# from_encoding when given an already-decoded str (response.text).
soup = BeautifulSoup(response.content, 'html.parser', from_encoding='utf-8')  # html.parser or lxml

# Volume headings live in <b> tags inside nodes with class "volumn".
title = []
for volumn in soup.find_all(class_="volumn"):
    b = volumn.find('b')
    if b is not None:  # some "volumn" nodes carry no <b> heading
        title.append({'volumn': b.string})

# Chapter titles live in the title attribute of each chapter's <a> tag.
chapters = []
for chapter in soup.find_all(class_='chapter'):
    a = chapter.find('a')
    if a is not None:  # guard: a "chapter" node without a link would crash a.get()
        chapters.append({'chapter_title': a.get('title')})

# --- Store titles and chapters as JSON ---------------------------------
# encoding='utf-8' + ensure_ascii=False keep the Chinese text readable in
# the output files instead of \uXXXX escapes.
with open('xylz_title.json', 'w', encoding='utf-8') as fp:
    json.dump(title, fp, ensure_ascii=False, indent=4)
with open('xylz_chapters.json', 'w', encoding='utf-8') as fp:
    json.dump(chapters, fp, ensure_ascii=False, indent=4)