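After running the instructor's code, this is the output I get. It has happened many times before. What exactly is the cause?
Source: 7-4 Hands-on: Custom Text Processing Class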
破邪返瞳
2022-07-09 22:00:44
import requests

# //div[@class='e']/a[@class='el']/p[@class='t']/span/@title
url = "https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html"
header = {
    "Accept": "text / html, application / xhtml + xml, application / xml;q = 0.9, image / avif, image / webp, image / apng, * / *;q = 0.8, application / signed - exchange;v = b3;q = 0.9",
    "Accept - Encoding": "gzip, deflate, br",
    "Accept - Language": "zh - CN, zh;q = 0.9, en;q = 0.8Cache - Control: max - age = 0",
    "Connection": "keep - alive",
    "Cookie": "_uab_collina=165724966571579439025188;guid=e43c9ae66458b2fbcf09698769fca338;partner=class_imooc_com;acw_tc=ac11000116572811682083828e00df52ec398a53440c85411c1a91d34a1c2f;acw_sc__v2=62c81b6a896474647bf585785c630e7a2ac1bc4f; ssxmod_itna=YqUxBQG=KmqDqwxl4iqYKE=xfhQQFDu0me8io=BG8x0vcIeGzDAxn40iDtoOTOBYpwxkC7GiL+hR0icdwt827wxEh4c3SnpqmDB3DEx06i1=YxiiSDCeDIDWeDiDGR8DXO50OD7qiOD7otDj4GS9qGcDYQ2+u4DCOD51GtDI4GMDqDuDGt3EorDYLNDmdtDYfNDjqQDKLX3oeD2mhWMDYPSCDDlCAH502xTsuFkqlb=MlTrdPnrxdab66pMnuXBDYo27crSAIX/j=P38ODW4Sqo9BpijAx3CYme7re5b0zdADpr9AAenB+Qe0GBDDDpPCDK4bYeD;ssxmod_itna2=YqUxBQG=KmqDqwxl4iqYKE=xfhQQFDu0me8io=BixnKLPa4PDsFkqDLBG5X2QvGDUC+9e32QD6im4Ui3o82oWCoBrc+nvdLtpkFXeyZCva8COeKvxXbyC=EWLkpR/j991N5fdOCXyMXLKdNuAfEMBoXCAkAPv85EbWdpmbR31nNE/ot102Kd0WUxIobCYAnNhnExa=2ylq2WOeuI8mX=ba9th=a/nCaDTpR1mpiVA1vmalj7WaqdYa3UlLuf=K5LTpS6V1ICbR96Ps5UdHBRZzSzdvljuhIR=jC9fz/jPvXBm180U/f/rEEgUmWX8m=cp=V66gj6P9UapmneNlPrdQKfmeqco=fI0aPpAXKRrX2pm0tPS2z2r8rbBjUhBiNR538=FDG2C0QD08DiQ1SDPAh8iDdYD===",
    "Host": "jobs.51job.com",
    "Referer": "https: // jobs.51job.com / nanjing / 133201744.html",
    "Sec - Fetch - Dest": "document",
    "Sec - Fetch - Mode": "navigate",
    "Sec - Fetch - Site": "same - origin",
    "Sec - Fetch - User": "?1",
    "Upgrade - Insecure - Requests": "1",
    "User - Agent": "Mozilla / 5.0(WindowsNT10.0;Win64;x64) AppleWebKit / 537.36(KHTML, likeGecko) Chrome / 103.0.0.0Safari / 537.36sec - ch - ua: '.Not/A)Brand';v = '99', 'Google Chrome';v = '103', 'Chromium';v = '103'",
    "sec - ch - ua - mobile": "?0",
    "sec - ch - ua - platform": "Windows",
}
response = requests.get(url=url, headers=header)
response.encoding='ascii'
print(response.text)
1 answer
好帮手慕凡
2022-07-10
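Hello! The output looks like this because the request ran into the site's anti-scraping measures. First make sure the page opens normally in your browser, then copy the request headers from that successful request.
You can open the URL directly: https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html, and copy the request headers you need from there. You can also change response.encoding to utf-8 so the text is not garbled.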
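One way to copy headers without retyping them is to paste the raw "Name: value" lines from the browser and convert them to a dict. The sketch below assumes that approach; `parse_raw_headers` and `RAW` are hypothetical names, and the sample header values are only illustrative. It also checks that no header name contains a space, since a name such as "Accept - Encoding" is not the same header as "Accept-Encoding".

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values like URLs stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        headers[name.strip()] = value.strip()
    return headers


# Illustrative sample, not a full set of headers for this site.
RAW = """
Accept: text/html,application/xhtml+xml
Host: search.51job.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64)
"""

headers = parse_raw_headers(RAW)
# Header names must not contain spaces.
assert all(" " not in name for name in headers)
print(headers)
```

The resulting dict can be passed as `headers=` to `requests.get`. The Host value should match the domain in the URL being requested.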
Reference code:
import requests

url = "https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html"
# Header names are written without spaces around the hyphens,
# and Host matches the domain in the URL.
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    # "Accept-Encoding": "gzip,deflate,br",
    # "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    # "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Cookie": "_uab_collina=165724966571579439025188;guid=e43c9ae66458b2fbcf09698769fca338;partner=class_imooc_com;acw_tc=ac11000116572811682083828e00df52ec398a53440c85411c1a91d34a1c2f;acw_sc__v2=62c81b6a896474647bf585785c630e7a2ac1bc4f; ssxmod_itna=YqUxBQG=KmqDqwxl4iqYKE=xfhQQFDu0me8io=BG8x0vcIeGzDAxn40iDtoOTOBYpwxkC7GiL+hR0icdwt827wxEh4c3SnpqmDB3DEx06i1=YxiiSDCeDIDWeDiDGR8DXO50OD7qiOD7otDj4GS9qGcDYQ2+u4DCOD51GtDI4GMDqDuDGt3EorDYLNDmdtDYfNDjqQDKLX3oeD2mhWMDYPSCDDlCAH502xTsuFkqlb=MlTrdPnrxdab66pMnuXBDYo27crSAIX/j=P38ODW4Sqo9BpijAx3CYme7re5b0zdADpr9AAenB+Qe0GBDDDpPCDK4bYeD;ssxmod_itna2=YqUxBQG=KmqDqwxl4iqYKE=xfhQQFDu0me8io=BixnKLPa4PDsFkqDLBG5X2QvGDUC+9e32QD6im4Ui3o82oWCoBrc+nvdLtpkFXeyZCva8COeKvxXbyC=EWLkpR/j991N5fdOCXyMXLKdNuAfEMBoXCAkAPv85EbWdpmbR31nNE/ot102Kd0WUxIobCYAnNhnExa=2ylq2WOeuI8mX=ba9th=a/nCaDTpR1mpiVA1vmalj7WaqdYa3UlLuf=K5LTpS6V1ICbR96Ps5UdHBRZzSzdvljuhIR=jC9fz/jPvXBm180U/f/rEEgUmWX8m=cp=V66gj6P9UapmneNlPrdQKfmeqco=fI0aPpAXKRrX2pm0tPS2z2r8rbBjUhBiNR538=FDG2C0QD08DiQ1SDPAh8iDdYD===",
    "Host": "search.51job.com",
    # "Referer": "https://jobs.51job.com/nanjing/133201744.html",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3610.2 Safari/537.36",
}
response = requests.get(url=url, headers=header)
# Decode the body as UTF-8 so Chinese text is not garbled.
response.encoding = 'utf-8'
print(response.text)
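To see why the choice of encoding matters, here is a minimal offline sketch that decodes the same bytes two ways. The sample HTML string is made up for illustration and is not taken from the 51job page. It assumes the body is UTF-8, as the reference code does.

```python
# A made-up UTF-8 body containing Chinese text (illustrative only).
body = "<title>Python 招聘</title>".encode("utf-8")

# Decoding as ASCII cannot represent the Chinese characters, so each
# non-ASCII byte becomes the replacement character U+FFFD.
as_ascii = body.decode("ascii", errors="replace")

# Decoding with the encoding the bytes were written in restores the text.
as_utf8 = body.decode("utf-8")

print(as_ascii)
print(as_utf8)
```

With requests, `response.apparent_encoding` gives a guess based on the body bytes, which can be compared against the value you plan to assign to `response.encoding`.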
Happy learning!