Such-Cute

* Translate Web Page to JSON (Chinese version)

Based on cl-spider (source is here). You may also like to read this article: Web Spider: Translate Web Page to JSON, Without Coding.


Usage:

******** get ********

- Get the HTML DOM from a uri, find the nodes matching a selector, grab the attributes you want, and convert them to JSON.

Uri: http://localhost:5000/get

Params (as used in the examples below): uri, selector, attrs, and optionally callback and params.

Example:

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text","href"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]

http://localhost:5000/get?uri=https://news.ycombinator.com/&selector=a&attrs=["text as title","href as uri"]&callback=myfunction

http://localhost:5000/get?uri=https://news.ycombinator.com/user&selector=table[id=hnmain]>tr:nth-child(3)>td>table>tr&attrs=["text"]&params=["id as VitoVan"]
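
The example URLs above contain raw brackets, quotes, and spaces, which many HTTP clients will not send unmodified. Below is a minimal sketch of building the same kind of request with percent-encoding from Python. The helper name build_get_url is hypothetical, and the sketch assumes the server decodes standard percent-encoded query parameters, which this README does not confirm.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Demo address taken from this README; replace with your own server.
BASE = "http://localhost:5000"


def build_get_url(uri, selector, attrs, callback=None, params=None, base=BASE):
    """Build a percent-encoded /get request URL (hypothetical helper)."""
    query = {
        "uri": uri,
        "selector": selector,
        # attrs is sent as a JSON array, as in the examples above
        "attrs": json.dumps(attrs, separators=(",", ":")),
    }
    if params is not None:
        query["params"] = json.dumps(params, separators=(",", ":"))
    if callback is not None:
        query["callback"] = callback
    return f"{base}/get?{urlencode(query)}"


url = build_get_url(
    "https://news.ycombinator.com/",
    "a",
    ["text as title", "href as uri"],
)
print(url)

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["attrs"][0]))
```

Passing callback="myfunction" adds the callback parameter shown in the third example.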

******** get-block ********

- Get the HTML DOM from a uri, find the block nodes matching a selector, then find the sub-nodes described by your desires, grab the attributes you want from them, and convert them to JSON.

Uri: http://localhost:5000/get-block

Params (as used in the examples below): uri, selector, desires, and optionally callback.

Example:

http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]

http://localhost:5000/get-block?uri=https://news.ycombinator.com/&selector=tr.athing&desires=[{"selector":"span.rank","attrs":["text as rank"]},{"selector":"td.title>a","attrs":["href as uri","text as title"]},{"selector":"span.sitebit.comhead","attrs":["text as site"]}]&callback=myfunction
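
The desires parameter in the examples above is a JSON array of objects, which is easy to get wrong when typed by hand into a URL. Here is a minimal sketch of assembling it from Python data structures. The helper name build_get_block_url is hypothetical, and the sketch assumes the server accepts percent-encoded query parameters, which this README does not state.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Demo address taken from this README; replace with your own server.
BASE = "http://localhost:5000"


def build_get_block_url(uri, selector, desires, callback=None, base=BASE):
    """Build a percent-encoded /get-block request URL (hypothetical helper)."""
    query = {
        "uri": uri,
        "selector": selector,
        # desires is a JSON array of {"selector": ..., "attrs": [...]} objects
        "desires": json.dumps(desires, separators=(",", ":")),
    }
    if callback is not None:
        query["callback"] = callback
    return f"{base}/get-block?{urlencode(query)}"


# Same desires as in the Hacker News example above.
desires = [
    {"selector": "span.rank", "attrs": ["text as rank"]},
    {"selector": "td.title>a", "attrs": ["href as uri", "text as title"]},
    {"selector": "span.sitebit.comhead", "attrs": ["text as site"]},
]

block_url = build_get_block_url(
    "https://news.ycombinator.com/",
    "tr.athing",
    desires,
)
print(block_url)
```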


* Each uri is cached for 2 minutes

* localhost:5000 is only a demo address for testing; please set up your own server for production