Such-Cute

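* Web Page / JSON Converter (English Version)

Built on cl-spider; the source code is here. You may also want to read this article: "Crawler: Turning Web Pages into JSON with Zero Programming".

Usage: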

******** get ********

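- Fetches the DOM tree of the page at the given uri, selects the nodes that match the given selector, reads the requested attributes from each node, and converts the result to JSON.

Endpoint: http://localhost:5000/get

Parameters:

uri --- Page address. Example: https://news.ycombinator.com/
selector --- CSS selector. Example: a or tr.athing; for more, see the CSS selector reference.
attrs --- Node attributes, given as a JSON list. An attribute can be given an alias. Example: ["text","href"] or ["text as title","href as uri"]
params --- Request parameters, given as a JSON list. Example: ["name as vito","sex as alien"]
callback --- JSONP callback function.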
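The alias syntax in attrs can be illustrated with a short Python sketch. This is not the cl-spider implementation: the function name parse_attr_spec is hypothetical, and the assumption that an attribute without an alias keeps its own name as the output key is inferred from the examples here.

```python
def parse_attr_spec(spec):
    """Split an attrs entry into (attribute, output_key).

    "text as title" -> ("text", "title")
    "href"          -> ("href", "href")   # assumed: no alias means the key is the attribute name
    """
    name, sep, alias = spec.partition(" as ")
    name = name.strip()
    return (name, alias.strip() if sep else name)


def parse_attrs(attrs):
    """Apply parse_attr_spec to every entry of an attrs list."""
    return [parse_attr_spec(spec) for spec in attrs]
```

For example, parse_attrs(["text as title", "href as uri"]) yields the pairs ("text", "title") and ("href", "uri").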

Examples:

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text","href"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]&callback=myfunction

http://localhost:5000/get?uri=https://news.ycombinator.com/user&selector=table[id=hnmain]>tr:nth-child(3)>td>table>tr&attrs=["text"]&params=["id as VitoVan"]
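The example URLs above contain quotes, brackets and spaces, which many HTTP clients will not send unescaped. Below is a minimal Python sketch that builds an equivalent /get request with every parameter percent-encoded. The helper names build_get_url and fetch_json are hypothetical, and fetch_json assumes the server answers with plain UTF-8 JSON when no callback is given.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # the test server named in this document


def build_get_url(uri, selector, attrs, params=None, callback=None):
    """Return a /get URL with all query parameters percent-encoded."""
    query = {
        "uri": uri,
        "selector": selector,
        # attrs and params are JSON lists, serialized without extra spaces
        "attrs": json.dumps(attrs, separators=(",", ":")),
    }
    if params:
        query["params"] = json.dumps(params, separators=(",", ":"))
    if callback:
        query["callback"] = callback
    return BASE_URL + "/get?" + urlencode(query)


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON (assumes no JSONP callback was set)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running locally, fetch_json(build_get_url("https://news.ycombinator.com/", "a", ["text as title", "href as uri"])) would correspond to the second example above.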

******** get-block ********

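- Fetches the DOM tree of the page at the given uri, selects the nodes that match the given selector, then selects child nodes inside each of them, reads the requested attributes from those child nodes, and converts the result to JSON.

Endpoint: http://localhost:5000/get-block

Parameters:

uri --- Page address. Example: https://news.ycombinator.com/
selector --- CSS selector. Example: a or tr.athing; for more, see the CSS selector reference.
desires --- Child nodes and the attributes wanted from them, given as a JSON list. Each desire consists of one CSS selector and one or more attributes. Example: [{"selector":"td.title>a","attrs":["href","text"]}] or [{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]
callback --- JSONP callback function.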

Example:

http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]
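Writing the desires list by hand is error-prone, so here is a minimal Python sketch that assembles it from plain data and produces a percent-encoded /get-block URL. The helper names desire, build_get_block_url and decode_desires are hypothetical; only the parameter names and the JSON shape come from the description above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def desire(selector, *attrs):
    """Build one desire: a child-node CSS selector plus one or more attributes."""
    if not attrs:
        raise ValueError("a desire needs at least one attribute")
    return {"selector": selector, "attrs": list(attrs)}


def build_get_block_url(uri, selector, desires, callback=None,
                        base_url="http://localhost:5000"):
    """Return a /get-block URL with all query parameters percent-encoded."""
    query = {
        "uri": uri,
        "selector": selector,
        "desires": json.dumps(desires, separators=(",", ":")),
    }
    if callback:
        query["callback"] = callback
    return base_url + "/get-block?" + urlencode(query)


def decode_desires(url):
    """Read the desires list back out of a URL, to check the round trip."""
    return json.loads(parse_qs(urlsplit(url).query)["desires"][0])


# The request from the example above, expressed as data.
hn_desires = [
    desire("span.rank", "text as rank"),
    desire("td.title>a", "href as uri", "text as title"),
    desire("span.sitebit.comhead", "text as site"),
]
hn_url = build_get_block_url("https://news.ycombinator.com/", "tr.athing", hn_desires)
```

Here selector picks each block (tr.athing) and every desire is evaluated inside that block, as described for get-block above.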

* Each requested page is cached for two minutes.

* localhost:5000 is a test server. Its stability cannot be guaranteed, so do not use it in production.