Scraping Rental Listings for Yanta District, Xi'an

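This article shows how to scrape rental listings for Xi'an's Yanta District with Python, using requests and BeautifulSoup. It may be a useful reference if you are working on a similar problem.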
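'''
Goal: scrape rental listings for Yanta District, Xi'an, from Anjuke (the platform is 安租客).
Fields to collect: title, address, price, landlord name, area, payment method, and landlord avatar URL.
'''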
import time

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
}
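# Collect the detail-page URLs from one listing page and scrape each of them
def get_links(url):
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    links = soup.select('div > div.zu-info > h3 > a')
    for link in links:
        href = link.get('href')
        if href:  # skip anchors that carry no href attribute
            get_info(href)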

# Scrape the fields of interest from one detail page
def get_info(url):
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    titles = soup.select('div.wrapper > h1 > div')  # title
    addresses = soup.select('div.lbox > ul.house-info-zufang.cf > li > a')  # address
    prices = soup.select('span.price > em > b')  # price
    names = soup.select('div.broker-card > div > h2')  # landlord name
    areas = soup.select('span.info-tag.no-line > em > b')  # area
    types = soup.select('li.full-line.cf > span.type')  # payment type
    imgs = soup.select('div.broker-card > div > img')  # landlord avatar URL

    # zip() stops at the shortest list, so nothing is printed if any selector matches nothing
    for title, address, price, name, area, typ, img in zip(titles, addresses, prices, names, areas, types, imgs):
        data = {
            'title': title.get_text(),
            'address': address.get_text(),
            'area': area.get_text(),
            'price': price.get_text(),
            'payment type': typ.get_text().strip(),
            'landlord name': name.get_text().strip(),
            'landlord avatar': img.get('src')
        }
        print(data)
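if __name__ == '__main__':
    # range(1, 2) covers only page 1; raise the upper bound to fetch more pages
    urls = ['https://xa.zu.anjuke.com/fangyuan/yantaqu/p{}'.format(num) for num in range(1, 2)]
    for url in urls:
        get_links(url)
        time.sleep(2)  # pause between listing pages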
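
One thing to watch in get_info: the loop is built on zip(), which stops at the shortest input. If a single selector matches nothing on a page, the whole record is silently dropped. The sketch below shows that effect and one possible alternative, which is to take the first match per field and fall back to None. It uses plain lists of strings as stand-ins for the selected elements, and the sample values and the helper names `first_or_none` and `build_record` are hypothetical, not part of the original script. In the real scraper, BeautifulSoup's `select_one`, which returns None when nothing matches, could probably play the same role.

```python
def first_or_none(matches):
    """Return the first item of a list of matches, or None if the list is empty."""
    return matches[0] if matches else None


def build_record(fields):
    """Build one record from a mapping of field name -> list of matched strings."""
    return {name: first_or_none(matches) for name, matches in fields.items()}


# Hypothetical sample: text already extracted from the selected elements.
# The avatar selector matched nothing on this imaginary page.
sample = {
    'title': ['Two-bedroom near the metro'],
    'price': ['2500'],
    'avatar': [],
}

# zip() stops at the shortest input, so one empty list yields no rows at all.
rows_from_zip = list(zip(sample['title'], sample['price'], sample['avatar']))
print(rows_from_zip)  # []

# Taking the first match per field keeps the record and marks the gap with None.
record = build_record(sample)
print(record)  # {'title': 'Two-bedroom near the metro', 'price': '2500', 'avatar': None}
```

Whether a partial record is preferable to no record depends on how the data will be used. Keeping None makes missing fields visible, which also helps to spot selectors that no longer match the page layout.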