Such-Cute

* Web page / JSON converter (English Version)

Built on cl-spider; the source code is here. You may also want to read this article: "Crawler: Converting Web Pages to JSON with Zero Programming"


Usage:

******** get ********

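- Fetches the DOM tree of the page at the given uri, selects the nodes matching the given selector, reads the requested attributes from those nodes, and converts the result to JSON.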

Endpoint: http://localhost:5000/get

Parameters:

Examples:

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text","href"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]&callback=myfunction

http://localhost:5000/get?uri=https://news.ycombinator.com/user&selector=table[id=hnmain]>tr:nth-child(3)>td>table>tr&attrs=["text"]&params=["id as VitoVan"]
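
The example URLs above embed JSON arrays and a nested URL directly in the query string, which some HTTP clients will not escape for you. The following is a minimal Python sketch of building such a request with percent-encoding. The helper name `build_get_url` is hypothetical, and the parameter names (`uri`, `selector`, `attrs`, `params`, `callback`) are taken only from the examples above; whether the server needs the values encoded this way is an assumption.

```python
import json
from urllib.parse import quote, urlencode

# Test server address used in the examples above (not for production).
BASE = "http://localhost:5000"


def build_get_url(uri, selector, attrs, params=None, callback=None):
    """Build a /get request URL (hypothetical helper, not part of the service).

    List arguments are serialized as compact JSON, matching the
    attrs=["text","href"] form shown in the examples.
    """
    query = {
        "uri": uri,
        "selector": selector,
        "attrs": json.dumps(attrs, separators=(",", ":")),
    }
    if params is not None:
        query["params"] = json.dumps(params, separators=(",", ":"))
    if callback is not None:
        query["callback"] = callback
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BASE + "/get?" + urlencode(query, quote_via=quote)


get_url = build_get_url(
    "https://news.ycombinator.com/",
    "a",
    ["text as title", "href as uri"],
)
print(get_url)
```

The resulting URL can be opened in a browser or passed to `urllib.request.urlopen`. The shape of the response is not described in this section, so the sketch stops at building the request.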

******** get-block ********

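- Fetches the DOM tree of the page at the given uri, selects the nodes matching the given selector, then selects child nodes within each of them, reads the requested attributes, and converts the result to JSON.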

Endpoint: http://localhost:5000/get-block

Parameters:

Examples:

http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]

http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]&callback=myfunction
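
The `desires` value in the examples above is a JSON array of objects, each with its own `selector` and `attrs`, which is easy to mistype by hand. The following is a minimal Python sketch that builds the same request from a plain list of dictionaries. The helper name `build_get_block_url` is hypothetical, the parameter names come only from the examples above, and percent-encoding the values is an assumption about what the server accepts.

```python
import json
from urllib.parse import quote, urlencode

# Test server address used in the examples above (not for production).
BLOCK_BASE = "http://localhost:5000"

# One entry per child node to extract from each matched block,
# copied from the first get-block example.
desires = [
    {"selector": "span.rank", "attrs": ["text as rank"]},
    {"selector": "td.title>a", "attrs": ["href as uri", "text as title"]},
    {"selector": "span.sitebit.comhead", "attrs": ["text as site"]},
]


def build_get_block_url(uri, selector, desires, callback=None):
    """Build a /get-block request URL (hypothetical helper)."""
    query = {
        "uri": uri,
        "selector": selector,
        # Compact JSON, matching the form shown in the examples.
        "desires": json.dumps(desires, separators=(",", ":")),
    }
    if callback is not None:
        query["callback"] = callback
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BLOCK_BASE + "/get-block?" + urlencode(query, quote_via=quote)


block_url = build_get_block_url(
    "https://news.ycombinator.com/", "tr.athing", desires
)
print(block_url)
```

Passing `callback="myfunction"` appends the `callback` parameter, as in the second example.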


* Each requested page is cached for two minutes.

* localhost:5000 is a test server whose stability cannot be guaranteed; do not use it in production.