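* Web Page / JSON Converter
Based on cl-spider; the source code is here. You may also want to read this article: 《爬虫: 零编程将网页转换为JSON》 ("Crawlers: Converting Web Pages to JSON with Zero Programming").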
******** get ********
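- Fetches the DOM tree of the page at the given uri, selects the nodes matching the given selector, reads the requested attributes from those nodes, and converts the result to JSON.
Endpoint: http://localhost:5000/get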
Parameters (names as used in the examples below): uri, selector, attrs, callback, params
Examples:
http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text","href"]
http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]
http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]&callback=myfunction
http://localhost:5000/get?uri=https://news.ycombinator.com/user&selector=table[id=hnmain]>tr:nth-child(3)>td>table>tr&attrs=["text"]&params=["id as VitoVan"]
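The example URLs above contain characters such as quotes, brackets, and spaces that generally need percent-encoding when sent from code rather than pasted into a browser. The following is a minimal sketch of how a client might assemble a get request in Python. The helper names build_get_url and fetch_json are hypothetical (they are not part of cl-spider), and the sketch assumes the service returns a UTF-8 JSON body when no callback is given.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:5000"  # the test server named in this document


def build_get_url(uri, selector, attrs, callback=None, params=None):
    """Assemble a /get request URL (hypothetical helper, not part of cl-spider).

    attrs and params are lists of strings such as "text as title";
    they are serialized as compact JSON arrays, as in the examples above.
    """
    query = {
        "uri": uri,
        "selector": selector,
        "attrs": json.dumps(attrs, separators=(",", ":")),
    }
    if params is not None:
        query["params"] = json.dumps(params, separators=(",", ":"))
    if callback is not None:
        query["callback"] = callback
    # urlencode percent-encodes quotes, brackets, spaces, and so on
    return f"{BASE}/get?{urlencode(query)}"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON.

    Assumption: the response is UTF-8 JSON (no callback parameter set).
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_get_url(
    "https://news.ycombinator.com/",
    "a",
    ["text as title", "href as uri"],
)
print(url)
```

Calling fetch_json(url) would then perform the request, provided the test server is reachable.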
******** get-block ********
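- Fetches the DOM tree of the page at the given uri, selects the nodes matching the given selector, then selects child nodes within each of them, reads the requested attributes, and converts the result to JSON.
Endpoint: http://localhost:5000/get-block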
Parameters (names as used in the examples below): uri, selector, desires, callback
Examples:
http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]
http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]&callback=myfunction
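The desires parameter in the examples above is a JSON array of objects, each with its own selector and attrs. Writing it by hand inside a URL is error-prone, so a client might build it from ordinary data structures instead. This is only a sketch: build_get_block_url is a hypothetical helper name, and the parameter layout is inferred from the example URLs rather than from cl-spider's documentation.

```python
import json
from urllib.parse import urlencode


def build_get_block_url(uri, selector, desires, callback=None,
                        base="http://localhost:5000"):
    """Assemble a /get-block request URL (hypothetical helper).

    desires is a list of dicts, each with a "selector" string and an
    "attrs" list, mirroring the structure shown in the examples above.
    """
    query = {
        "uri": uri,
        "selector": selector,
        "desires": json.dumps(desires, separators=(",", ":")),
    }
    if callback is not None:
        query["callback"] = callback
    # urlencode percent-encodes the JSON so it survives as one parameter
    return f"{base}/get-block?{urlencode(query)}"


# The same request as the first get-block example above
desires = [
    {"selector": "span.rank", "attrs": ["text as rank"]},
    {"selector": "td.title>a", "attrs": ["href as uri", "text as title"]},
    {"selector": "span.sitebit.comhead", "attrs": ["text as site"]},
]
url = build_get_block_url("https://news.ycombinator.com/", "tr.athing", desires)
print(url)
```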
* Each requested page is cached for two minutes.
* localhost:5000 is a test server; its stability cannot be guaranteed, so do not use it in production.