本文主要是介绍某度图片抓取(代码),对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
import requests
from urllib.parse import quote
import jsonpath
import json
url = r'https://image.baidu.com/search/acjson'
headers = {
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
}
if __name__ == '__main__':
save_dir = r'imgs' # 需要自己创建
word=input('请输入需要查询的关键词:')
page=input('请输入需要查询的页数 (默认每页30张图片):')
word = quote(word)
k=0
for i in range(1,int(page)+1):
print(i)
pn = int(page) * 30
pre = r'https://image.baidu.com/search/acjson?'
back = f'tn=resultjson_com&logid=6505551048133465805&ipn=rj&ct=201326592&is=&fp=result&fr=&word={word}&queryWord={word}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=©right=&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&expermode=&nojc=&isAsync=&pn={pn}&rn=30&gsm=78&1641108482235='
main_url = pre+back
print(main_url)
resp = requests.get(main_url, headers=headers)
# print(resp.text)
js_p = resp.json()
ervery_page_urls=jsonpath.jsonpath(js_p,'$..thumbURL')
for img_src in ervery_page_urls:
print(img_src)
img_resp = requests.get(img_src, headers=headers)
try:
with open(save_dir+ f'/{k}.jpg', mode='wb') as f:
f.write(img_resp.content)
print(f'已经下载了{k}张!!!!正在下载第{i}页的内容!!')
k+=1
except Exception as e:
print(f'第{k}张下载失败!!!!')
pass
这篇关于某度图片抓取(代码)的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!