50 Commits (rrix)

Author SHA1 Message Date
Ryan 3d7a52338d remove the scrapers 2020-05-01 13:11:32 -07:00
Simon Lipp f077c13882 scraper-yggtorrent: bugfix 2017-08-12 19:30:11 +02:00
Simon Lipp 2d296bb526 new scraper: bookys 2017-08-10 10:36:34 +02:00
Simon Lipp b615704d54 scraper-yggtorrent: fix title parsing 2017-08-10 10:33:02 +02:00
Simon Lipp 0efc90c9f9 scraper-yggtorrent: typo 2017-07-27 15:55:23 +02:00
Simon Lipp 28054b6bfd Scrapers overhaul
 * Switch all python scrapers to scrapy
 * Allow scrapers to be directly called, instead of
   using `scrapy runspider`
 * Prefix scrapers with `ua-scraper-` for clarity
 * Update documentation
2017-07-26 12:08:32 +02:00
Simon Lipp 874c844f78 remove unmaintained scrapers
ipboard2json can be replaced by weboobmsg2json + weboob tapatalk module
2017-07-26 09:39:19 +02:00
Simon Lipp df56355fe1 new scraper for yggtorrent.com 2017-07-26 09:36:06 +02:00
Simon Lipp 913f961c42 new scraper for torrent9 2017-07-26 09:10:18 +02:00
Simon Lipp b0b5ea7a9a python3 support for some scrapers 2017-07-26 09:10:18 +02:00
Simon Lipp 4b20c6d92e remove torrent9 scraper (not working anymore) 2017-07-26 09:10:18 +02:00
Simon Lipp 325dab92c6 bugfix 2017-02-24 08:37:47 +01:00
Simon Lipp e4168d7b4f ua: update doc 2017-02-23 14:02:10 +01:00
Simon Lipp 1f7d3ab3a1 ua-proxify: add doc 2017-02-23 14:01:39 +01:00
Simon Lipp 84b48c6205 ggs: add support for -once 2017-02-23 14:00:24 +01:00
Simon Lipp e658b62bab ggs: better logging & timeout handling 2017-02-23 13:57:47 +01:00
Simon Lipp e0c1edec6d maildir-put: change redis cache storage format 2017-01-11 16:11:23 +01:00
Simon Lipp 5d517ae1ff edxcourses: add session to id 2017-01-11 16:10:48 +01:00
Simon Lipp 25a2bd713f scraper for torrent9 2016-12-26 19:17:06 +01:00
Simon Lipp 80cbd4a9cc scraper for bm-lyon 2016-12-26 19:08:20 +01:00
Simon Lipp c1c70412fc new scraper for t411.li 2016-12-26 19:05:55 +01:00
Simon Lipp 4537e967e9 new scrapper for myanimelist.net 2016-12-26 18:35:57 +01:00
Simon Lipp cb327a5dbc ggs: use jq for json generation in configuration file 2016-04-04 09:27:59 +02:00
Simon Lipp 4e54d30fa2 ggs: reload configuration when receiving SIGUSR1 2016-04-01 14:55:54 +02:00
Simon Lipp 2941ef15df ggs: get rid of global variables 2016-03-29 10:35:10 +02:00
Simon Lipp 83e1c1c308 update .gitignore 2016-03-23 13:56:04 +01:00
Simon Lipp 83504e2944 edxcourses: add date 2016-03-23 13:55:02 +01:00
Simon Lipp 896ce9b657 new filter: ua-proxify 2016-03-23 13:54:40 +01:00
Simon Lipp 2ee32d8d81 ua-inline: use attachment for images 2016-03-21 11:56:25 +01:00
Simon Lipp be51bd77f6 edx-courses scrapper: fixes 2016-02-26 10:20:45 +01:00
Simon Lipp f1d31d9505 add scrapper edxcourses 2016-02-25 15:42:01 +01:00
Simon Lipp 62d67fc0ae update doc 2016-02-12 09:10:13 +01:00
Simon Lipp 49b68e3b7c maildir-put: add redis support for messages cache 2016-02-12 09:06:59 +01:00
Simon Lipp de86e77838 Use default $GOPATH 2016-02-11 17:12:57 +01:00
Simon Lipp 99e4299b18 gofmt 2016-02-11 14:40:33 +01:00
Simon Lipp 36f0c8392e New scrapper: weboobmsg2json
Use [weboob](http://weboob.org) backends to get messages
2016-02-10 10:47:26 +01:00
Simon Lipp 52176a4b1d ipboard2json: shorten message ids
Since the hostname is already present in the right part, it’s useless to also
put it in the left part of the ID.
2016-02-09 14:05:45 +01:00
Simon Lipp 7e49bf3b76 maildir-put: better msg-id encoding
maildir-put currently uses a very crude scheme to generate
RFC2822-compliant message ids from ids provided by scrappers: it just
sha256-encodes the ID.

Since this is not exactly optimal for debugging purposes, change this by
properly encoding message ids according to RFC2822.
2016-02-09 11:12:10 +01:00
Simon Lipp 7c35a852eb ipboard scrapper: add type=cite attribute to blockquote tags
This allows certain UAs (Thunderbird) to present the tag as a citation
2016-02-03 10:24:13 +01:00
Simon Lipp ac15762448 scraplib.py: close response on redirection 2016-02-03 10:23:08 +01:00
Simon Lipp 4b20fe9c64 ua-inline: inline <style> tags 2016-02-03 10:22:19 +01:00
Simon Lipp 32b584e7fe maildir-put: use LF line endings 2016-02-03 10:21:40 +01:00
Simon Lipp 661ca5e4e5 Fix compilation issue 2016-02-03 10:15:11 +01:00
Simon Lipp d37c513fb6 update Makefile 2014-03-22 16:39:13 +01:00
Simon Lipp 4b1d4af8b2 new scrapper: medscape2json 2014-03-22 16:38:06 +01:00
Simon Lipp f4355b8b1a scraplib.py: add a non-persistent cookiejar 2014-03-22 15:49:11 +01:00
Simon Lipp e03c90d635 scraplib.py: simplify cookie management 2014-03-22 15:48:46 +01:00
Simon Lipp 7913eb9d77 add usage for ipboard2json and mangareader2json 2014-03-22 11:34:28 +01:00
Simon Lipp 00be42bdd9 Fix a compilation problem when GOPATH is not set 2014-03-18 16:56:58 +01:00
Simon Lipp 78b8a6a447 initial import 2014-03-18 15:01:19 +01:00