
XPath Summary Notes

Author: 默默爬行的虫虫 | Published: 2022-07-05 00:22:16

Next, let's go over the extraction details. First, load the library:

from lxml import etree

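Extract an element's text: end the path with /text() and take the first item of the returned list with [0], or wrap the path in the string() function, which returns a single string (string() is a function, not a path step, so it needs no [0])
Extract an attribute value: /@attribute-name
* matches any element node
@* matches any attribute
node() matches any node of any type (elements, text, comments)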
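
The rules above can be tried out on a small document. The following is a minimal sketch; the HTML fragment and the variable names are made up for illustration and do not come from the original notes.

```python
from lxml import etree

# A tiny document invented for this illustration.
doc = etree.HTML(
    '<div id="box" class="note">'
    '<p>Hello <b>XPath</b>!</p>'
    '<a href="/a.html" title="first">A</a>'
    '</div>'
)

# text() returns only the text nodes that are direct children of <p>.
direct_text = doc.xpath('//p/text()')      # ['Hello ', '!']

# string(...) is an XPath function: one string with all descendant text.
full_text = doc.xpath('string(//p)')       # 'Hello XPath!'

# /@name selects an attribute value; the result is a list, hence [0].
href = doc.xpath('//a/@href')[0]           # '/a.html'

# @* selects every attribute of the matched element.
all_attrs = doc.xpath('//a/@*')            # ['/a.html', 'first']

# * matches child elements only; node() also matches text nodes.
elements = doc.xpath('//p/*')              # the <b> element
nodes = doc.xpath('//p/node()')            # 'Hello ', <b>, '!'

print(direct_text, full_text, href, all_attrs, len(elements), len(nodes))
```

Note the difference between the last two queries: `*` skips the text around `<b>`, while `node()` returns it as separate items.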

1. HTML parsing workflow

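import random
from time import sleep

import requests
from lxml import etree

url_02 = 'https://www.qdfd.com.cn/qdweb/realweb/fh/FhProjectInfo.jsp'
# shuzi_01 (the project ID) and header (the request headers) are assumed to be defined earlier
data_02 = {'projectID': shuzi_01}
response_02 = requests.post(url_02, data=data_02, headers=header)
if response_02.status_code == 200:
    response_02.encoding = 'GBK'
    sleep(random.uniform(0.2, 0.3))  # wait a random 0.2 to 0.3 seconds
    # Check whether the request succeeded
    # print(response_02.status_code)

    # Parse the response text into an element tree
    html_02 = etree.HTML(response_02.text)

    # Full path for reference: /html/body/div[1]/div[2]/ul[2]/table[2]/tbody/tr[position()>1]/td[2]/a
    shuzi_2 = html_02.xpath('/html/body/div[1]/div[2]/ul[2]//tr[position()>1]/td[2]/a')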
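
The same parse-then-query steps can be checked offline, without a network request. The sketch below is only an illustration: the `page` string stands in for `response_02.text`, and its markup and values are invented. It also shows one likely reason the query uses `//tr` where the commented path has `table[2]/tbody/tr`: browsers add `<tbody>` when rendering a table, but the raw HTML often does not contain it.

```python
from lxml import etree

# Stand-in for response_02.text; markup and values are invented for illustration.
page = '''
<html><body>
<table>
  <tr><th>No.</th><th>Building</th></tr>
  <tr><td>1</td><td><a href="detail.jsp?id=101">Block A</a></td></tr>
  <tr><td>2</td><td><a href="detail.jsp?id=102">Block B</a></td></tr>
</table>
</body></html>
'''

tree = etree.HTML(page)

# The raw HTML has no <tbody>, so //tr is used instead of table/tbody/tr.
# position()>1 skips the first (header) row; td[2] is the second cell.
links = tree.xpath('//table//tr[position()>1]/td[2]/a')

# Each result is an element: .text is the link text, .get() reads an attribute.
rows = [(a.text, a.get('href')) for a in links]
print(rows)
```

Because the query returns elements and not strings, the link text and the `href` can both be read from each match in a single pass.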

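# Sample document. <title> and <p class="first"> follow from the queries below;
# the remaining tags (<ul>, <li>, the second <p>) are assumed.
a = '''<html><head><title>Title</title></head>
<body>
    <ul>
        <li>List 1, item 1</li>
        <li>List 1, item 2</li>
    </ul>
    <p class="first">Text 1</p>
    <p>Text 2</p>
    <ul>
        <li>List 2, item 1</li>
        <li>List 2, item 2</li>
    </ul>
</body></html>
'''

from lxml import etree
html = etree.HTML(a)
html.xpath('//title/text()')[0] # 'Title'
html.xpath("//p[@class='first']//text()")[0] # 'Text 1'
# html.xpath(   (the note breaks off here)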
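
The queries above index the result with [0], which raises IndexError when the path matches nothing. A minimal sketch of a more forgiving lookup follows; the helper name `first_or_default` and the `sample` document are hypothetical and not part of the original notes.

```python
from lxml import etree

def first_or_default(node, path, default=None):
    """Return the first XPath match, or `default` when there is none."""
    result = node.xpath(path)
    # Path expressions yield a list; functions such as string() yield a plain value.
    if isinstance(result, list):
        return result[0] if result else default
    return result

# Small document invented for this illustration.
sample = etree.HTML(
    '<html><head><title>Title</title></head>'
    '<body><p class="first">Text 1</p></body></html>'
)

print(first_or_default(sample, '//title/text()'))
print(first_or_default(sample, "//p[@class='missing']//text()", default=''))
```

A missing element then yields the default value, so a scraping loop can continue past incomplete rows.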