The code is as follows:
from lxml import etree

wb_data = """
<html><div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</html>
"""

# etree.HTML parses the string and fills in missing elements such as <body>,
# which is why the absolute path below can start with /html/body.
html = etree.HTML(wb_data)

# Select every <a> element under /html/body/div/ul/li.
html_data = html.xpath('/html/body/div/ul/li/a')

# Print the text content of each matched <a> element.
for i in html_data:
    print(i.text)
The result is as follows: