Printing the Crawl Tree of a Scrapy Spider in Python

yipeiwu_com, 2020-03-06, Python crawlers

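This article shows how to print the tree of pages crawled by a Scrapy spider in Python. The details follow.

The script below makes the page structure of a Scrapy crawl visible at a glance. It is simple to run: it reads a Scrapy log, either from files named on the command line or from standard input, and prints each crawled URL indented beneath the page that referred to it.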

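#!/usr/bin/env python
import fileinput
import re
from collections import defaultdict


def print_urls(allurls, referer, indent=0):
    # Print every URL reached from `referer`, indenting one level per hop.
    for url in allurls[referer]:
        print(' ' * indent + url)
        if url in allurls:
            print_urls(allurls, url, indent + 2)


def main():
    # Matches the request part of a log line: <GET url> (referer: ref)
    log_re = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')
    allurls = defaultdict(list)  # referer -> URLs crawled from it
    # fileinput reads the files named on the command line, or stdin.
    for line in fileinput.input():
        m = log_re.search(line)
        if m:
            url, ref = m.groups()
            allurls[ref].append(url)
    # Requests without a referer are logged as "referer: None".
    print_urls(allurls, 'None')


if __name__ == '__main__':
    main()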
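To see what the script produces without running a crawl, the sketch below applies the same regular expression to a few hand-written log lines. The log lines and example.com URLs are made up for illustration, modeled on the shape of Scrapy's "Crawled" messages. The helper names `build_tree` and `render` are hypothetical and not part of the original script. `render` returns the lines instead of printing them, and it adds a `seen` set, which the original does not have, as a guard against a referer loop causing endless recursion.

```python
import re
from collections import defaultdict

# Same pattern as the script above: captures the crawled URL and its referer.
LOG_RE = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')

# Hypothetical log lines (invented URLs) in the shape of Scrapy's output.
SAMPLE_LOG = [
    "[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/> (referer: None)",
    "[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/a> (referer: http://example.com/)",
    "[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/b> (referer: http://example.com/)",
    "[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/a/1> (referer: http://example.com/a)",
    "[scrapy.core.engine] INFO: Spider closed (finished)",
]


def build_tree(lines):
    """Group crawled URLs by the referer they were reached from."""
    children = defaultdict(list)
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            url, ref = m.groups()
            children[ref].append(url)
    return children


def render(children, referer='None', indent=0, seen=None):
    """Return the indented tree as a list of strings."""
    seen = set() if seen is None else seen
    out = []
    for url in children.get(referer, []):
        out.append(' ' * indent + url)
        # Expand each referer only once so a loop cannot recurse forever.
        if url in children and url not in seen:
            seen.add(url)
            out.extend(render(children, url, indent + 2, seen))
    return out


tree_lines = render(build_tree(SAMPLE_LOG))
print('\n'.join(tree_lines))
```

The root key is the string 'None' because the regular expression captures the referer as text, so the start page appears at the left margin and each deeper page is indented by two more spaces.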

I hope this article is helpful for your Python programming.

© YiPeiWu.com 【宜配屋】粤ICP备17031333号
