1. Background

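A project required access to the GitHub repo API so that repo data could be extracted for analysis. After a day of research I finally got it working, although the approach is still rather inefficient.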

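The GitHub API that lists repos returns detailed information for every repo, in JSON format. I have not yet found a way to parse JSON data that contains multiple entries, so I fell back on a rather clumsy approach: split plus re. If you know a better method, feel free to leave a comment!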
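One possible alternative to split plus re is Python's standard json module, which parses the whole response (a JSON array of repo objects) in one call. The sketch below is only an illustration: the sample string is a hand-trimmed stand-in for the real response, the ids in it are placeholders, and the helper name repo_urls is hypothetical.

```python
import json

# A hand-trimmed stand-in for the API response; the ids are placeholders.
sample = '''[
  {"id": 1, "name": "merb-core",
   "owner": {"login": "wycats", "url": "https://api.github.com/users/wycats"},
   "url": "https://api.github.com/repos/wycats/merb-core"},
  {"id": 2, "name": "rubinius",
   "owner": {"login": "rubinius", "url": "https://api.github.com/users/rubinius"},
   "url": "https://api.github.com/repos/rubinius/rubinius"}
]'''

def repo_urls(text):
    """Return the top-level "url" field of every repo in a JSON array."""
    return [repo["url"] for repo in json.loads(text)]

for url in repo_urls(sample):
    print(url)
```

Because json.loads returns ordinary lists and dicts, the nested owner "url" is never confused with the repo's own "url", which is the case the string matching has to work around.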

2. Code

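import os
import re

def GetUrl(num):
    # Fetch one batch of public repos through curl. The URL is quoted so
    # that the shell does not try to interpret the "?" character.
    response = os.popen('curl -G "https://api.github.com/repositories?since=%d"' % num).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # The response is pretty-printed JSON with one field per line.
    fields = response.split(',\n')
    for field in fields:
        # Keep only the repo's own "url" field, not "html_url" and the like.
        if pattern in field and pattern1 in field:
            # The second quoted string on the line is the URL value.
            text = re.compile('"(.*?)"').findall(field)[1]
            print(text)

if __name__ == '__main__':
    GetUrl(1000)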

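Here num is passed as the since parameter: it is a repo id rather than a page number, and the API returns repos whose id is greater than this value. By looping and increasing num, we can keep extracting repos batch after batch. Because the GitHub API is rate-limited, fetching in batches like this is a workable approach, as long as the loop stays within that limit.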
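The loop described above could be sketched roughly as follows. This is an assumption-laden illustration, not the original script: it uses urllib instead of curl, assumes each returned repo object carries "id" and "url" fields, advances since to the last id seen, and pauses between requests. The names fetch_page and iter_repo_urls and the delay value are hypothetical.

```python
import json
import time
from urllib.request import urlopen

API = "https://api.github.com/repositories?since=%d"

def fetch_page(since):
    """Download one batch of public repos whose id is greater than since."""
    with urlopen(API % since) as resp:
        return json.loads(resp.read().decode("utf-8"))

def iter_repo_urls(start, max_pages, fetch=fetch_page, delay=1.0):
    """Yield repo API URLs, moving since forward to the last id seen."""
    since = start
    for _ in range(max_pages):
        repos = fetch(since)
        if not repos:
            break  # nothing left to list
        for repo in repos:
            yield repo["url"]
        since = repos[-1]["id"]
        time.sleep(delay)  # crude throttle; the value is a placeholder

if __name__ == "__main__":
    for url in iter_repo_urls(1000, max_pages=3):
        print(url)
```

Passing the fetch function as a parameter keeps the paging logic separate from the network call, so the loop can be tried out with canned data before spending any of the rate limit.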

The result looks like this (the API URLs of the extracted repos):

https://api.github.com/repos/wycats/merb-core

https://api.github.com/repos/rubinius/rubinius

https://api.github.com/repos/mojombo/god

https://api.github.com/repos/vanpelt/jsawesome

https://api.github.com/repos/wycats/jspec

https://api.github.com/repos/defunkt/exception_logger

https://api.github.com/repos/defunkt/ambition

https://api.github.com/repos/technoweenie/restful-authentication

https://api.github.com/repos/technoweenie/attachment_fu

https://api.github.com/repos/topfunky/bong