Who knew there were so many great dancers on there: [Python crawler] Collecting Weibo video data

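Preface

Discover what's new, anytime and anywhere! Weibo shows you every wonderful moment in the world and the story behind it. Share what you want to say and let the whole world hear you! Today we will use Python to collect good-looking videos from Weibo.

That's right, today's target is Weibo data collection, and what we are crawling is those videos of good-looking dancers.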

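Key points

requests
pprint

Development environment

Version: Python 3.8
Editor: PyCharm 2021.2

How a crawler works

Purpose: fetch internet data in bulk (text, images, audio, video)
Essence: a repeated cycle of requests and responses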

Implementation

1. Import the required modules

import requests
import pprint

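2. Find the target URL

Open the browser developer tools, select the Fetch/XHR filter in the Network panel, click the request whose response contains the video data, and note its URL:

https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor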

3. Send the network request

headers = {
    'cookie': '',  # paste your own cookie, copied from the browser
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # paste your browser's user-agent string
}
data = {
    'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'
}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
json_data = requests.post(url=url, headers=headers, data=data).json()
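
The payload above is a JSON document typed by hand inside a Python string, which is easy to get wrong. A minimal sketch of building the same URL and form data with json.dumps follows. The helper name build_channel_request is made up for this example; the URL pattern and payload shape are simply copied from the request above and are not an official description of the Weibo interface.

```python
import json


def build_channel_request(cid, count=9):
    """Return (url, form_data) for the channel list request shown above.

    Hypothetical helper: the URL pattern and payload keys are copied from
    the article, nothing more.
    """
    url = f'https://www.weibo.com/tv/api/component?page=/tv/channel/{cid}/editor'
    payload = {'Component_Channel_Editor': {'cid': str(cid), 'count': count}}
    # separators=(',', ':') produces the same compact text as the hand-written string
    data = {'data': json.dumps(payload, separators=(',', ':'))}
    return url, data


url, data = build_channel_request('4379160563414111')
print(url)
print(data['data'])
```

The returned pair can be passed on as requests.post(url=url, headers=headers, data=data).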

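4. Get the data

This request is sent once for every video in the list returned by step 3. For each entry, url_1 and data_1 are built from that entry's oid, as shown inside the loop of the complete code below.

json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()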
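
The sketch below shows one way the list from step 3 could be turned into the per-video requests used in this step. iter_playinfo_requests is a hypothetical name, and the sample response is invented data that only mimics the keys the article relies on (data, Component_Channel_Editor, list, oid, title).

```python
import json


def iter_playinfo_requests(json_data):
    """Yield (title, url_1, data_1) for every video in a channel list response."""
    for item in json_data['data']['Component_Channel_Editor']['list']:
        oid = item['oid']
        url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
        payload = {'Component_Play_Playinfo': {'oid': oid}}
        data_1 = {'data': json.dumps(payload, separators=(',', ':'))}
        yield item['title'], url_1, data_1


# Invented sample with the same shape as the real response (all values are fake)
sample = {'data': {'Component_Channel_Editor': {
    'list': [{'oid': '1034:1', 'title': 'demo one'},
             {'oid': '1034:2', 'title': 'demo two'}],
    'next_cursor': 0,
}}}

for title, url_1, data_1 in iter_playinfo_requests(sample):
    print(title, url_1, data_1['data'])
```

Each yielded url_1 and data_1 can then be sent with requests.post exactly as in the line above.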

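5. Filter the data

# 'urls' is a dict of video addresses that lack the 'https:' prefix
dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
# Take the first address in the dict and add the scheme
video_url = "https:" + next(iter(dict_urls.values()))
print(title + "\t" + video_url)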
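
The filtering step takes whichever address comes first in urls. A small sketch that isolates that choice in a function and also copes with an empty dict is given here. pick_video_url is a hypothetical helper, and the sample keys and addresses are invented; the real ones will differ.

```python
def pick_video_url(urls):
    """Return the first address in `urls` as a full https URL, or None if empty."""
    if not urls:
        return None
    first = next(iter(urls.values()))  # dicts keep insertion order in Python 3.7+
    # The article prepends 'https:', so its addresses presumably start with '//';
    # add the scheme only when it is missing
    return first if first.startswith('http') else 'https:' + first


# Invented sample; real keys and addresses will differ
sample_urls = {
    'label_a': '//example.com/video_a.mp4',
    'label_b': '//example.com/video_b.mp4',
}
print(pick_video_url(sample_urls))
print(pick_video_url({}))
```

Returning None for an empty dict lets the caller skip a video instead of stopping the whole run with an exception.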

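6. Save the data

import os

os.makedirs('video', exist_ok=True)  # create the output folder if it is missing
video_data = requests.get(video_url).content  # raw bytes of the video file
with open(os.path.join('video', f'{title}.mp4'), mode='wb') as f:
    f.write(video_data)
print(title, "downloaded successfully................")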
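
Two things can make the save step fail: the video folder may not exist, and a title may contain characters that are not allowed in file names. The sketch below is one possible way to handle both. safe_filename and save_chunks are hypothetical helpers, not part of the article. save_chunks accepts any iterable of byte chunks, so with requests it could be fed from requests.get(video_url, stream=True).iter_content(chunk_size=...), which avoids holding a whole video in memory.

```python
import re
from pathlib import Path


def safe_filename(title, max_len=80):
    """Replace characters that Windows forbids in file names and trim the length."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', title).strip()
    return cleaned[:max_len] or 'untitled'


def save_chunks(chunks, title, folder='video'):
    """Write an iterable of bytes objects to <folder>/<title>.mp4 and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = out_dir / (safe_filename(title) + '.mp4')
    with open(path, mode='wb') as f:
        for chunk in chunks:
            f.write(chunk)
    return path


print(safe_filename('dance: part 1/2?'))
```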

Complete code

import os
import pprint

import requests

headers = {
    'cookie': 'add your own',  # cookie copied from your own browser
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # fill in your browser's user-agent string
}
data = {
    'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'
}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
# Request the first page of the channel's video list
json_data = requests.post(url=url, headers=headers, data=data).json()
pprint.pprint(json_data)

ccs_list = json_data['data']['Component_Channel_Editor']['list']
# Cursor for the next page; it is not used below, so only the first page is downloaded
next_cursor = json_data['data']['Component_Channel_Editor']['next_cursor']
os.makedirs('video', exist_ok=True)  # make sure the output folder exists
for ccs in ccs_list:
    oid = ccs['oid']
    title = ccs['title']
    # Ask for the playback info of this video
    data_1 = {
        'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'
    }
    url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
    json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()
    # 'urls' is a dict of addresses that lack the 'https:' prefix; take the first one
    dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
    video_url = "https:" + next(iter(dict_urls.values()))
    print(title + "\t" + video_url)

    # Download the video and write it to video/<title>.mp4
    video_data = requests.get(video_url).content
    with open(os.path.join('video', f'{title}.mp4'), mode='wb') as f:
        f.write(video_data)
    print(title, "downloaded successfully................")
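
The complete code reads next_cursor but never uses it, so only the first page of videos is downloaded. The article does not show how the cursor is sent back to the server, so the sketch below leaves that part to a function supplied by the caller. collect_all_pages and fetch_page are hypothetical names, and the stop condition (a falsy or repeated cursor) is an assumption rather than documented behaviour.

```python
def collect_all_pages(fetch_page, max_pages=5):
    """Collect list items page by page.

    fetch_page(cursor) must return the parsed JSON of one page; cursor is
    None for the first page. How the cursor is sent is up to the caller.
    """
    items, cursor, seen = [], None, set()
    for _ in range(max_pages):  # hard upper bound so the loop always ends
        block = fetch_page(cursor)['data']['Component_Channel_Editor']
        items.extend(block['list'])
        cursor = block.get('next_cursor')
        # Assumption: a falsy or repeated cursor means there is nothing more to fetch
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
    return items


# Fake two-page source standing in for the real requests.post call
fake_pages = {
    None: {'data': {'Component_Channel_Editor': {'list': [{'oid': 'a'}], 'next_cursor': 9}}},
    9: {'data': {'Component_Channel_Editor': {'list': [{'oid': 'b'}], 'next_cursor': 0}}},
}
print(collect_all_pages(lambda cursor: fake_pages[cursor]))
```

Keeping the network call outside the loop means the paging logic can be checked with fake data before it is pointed at the real site.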
