Commit graph

1186 commits

Author SHA1 Message Date
mbrandis 25cf4888ae - Adapted ShermansLagoon
- Better version of OnTheFastTrack
2014-11-14 20:37:06 +01:00
mbrandis c63f927e5c - Modified OnTheFasttrack adapting the new API. 2014-11-14 20:09:42 +01:00
mbrandis cd48801b0d - Added next and previous day at end of page. 2014-11-14 15:39:42 +01:00
Dirk Reiners fda654b5e0 Some fixes...
AbstruseGoose: fixed prev
Carciphona: fixed latest
Curtailed: fixed image and prev (moved to WP)
DorkTower: fixed image search
GrrlPower: fixed site name issue
MadamAndEve: archive not updated in a long time, but current strip is.
Works, but needs to be run daily.
PennyArcade: fixed namer
PvPonline: fixed prev
2014-10-24 16:42:32 -05:00
Dirk Reiners 77a5e09c10 Minor fix for using paths to pick comics 2014-10-24 16:39:40 -05:00
Tobias Gruetzmacher 6769e1eb36 Add StrongFemaleProtagonist.
This uses the _ParserScraper and CSS selectors.
2014-10-13 23:39:50 +02:00
Tobias Gruetzmacher 1d52d6a152 Add support for CSS selectors to HTML parser.
Each comic module author can decide if she wants to use CSS or XPath,
not a mix of both. Using CSS needs the cssselect python module and the
module gets disabled if it is unavailable.
2014-10-13 22:43:06 +02:00
Tobias Gruetzmacher 17bc454132 Bugfix: Don't assume RE patterns in base class. 2014-10-13 22:29:47 +02:00
Tobias Gruetzmacher e92a3fb3a1 New feature: Comic modules can be "disabled".
This is modeled parallel to the "adult" feature, except the user can't
override it via the command line. Each comic module can override the
classmethod getDisabledReasons and give the user a reason why this
module is disabled. The user can see the reason in the comic list (-l or
--singlelist) and the comic module refuses to run, showing the same
message.

This is currently used to disable modules that use the _ParserScraper if
the LXML python module is missing.
2014-10-13 21:43:46 +02:00
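
The "disabled" mechanism described in commit e92a3fb3a1 can be sketched roughly as follows. Only the classmethod name getDisabledReasons comes from the commit message. The class names, the required_module attribute, the dict return type and the RuntimeError are assumptions made for illustration and are not taken from the project's code.

```python
import importlib.util


class Scraper:
    """Hypothetical base class; enabled unless a subclass reports reasons."""

    @classmethod
    def getDisabledReasons(cls):
        # Assumed return type: dict mapping a short key to a reason string.
        # An empty dict means the module is enabled.
        return {}

    def run(self):
        reasons = self.getDisabledReasons()
        if reasons:
            # Refuse to run and show the same message the comic list shows.
            raise RuntimeError("; ".join(reasons.values()))
        return "ok"


class ParserScraper(Scraper):
    """Hypothetical scraper that depends on an optional third-party module."""

    required_module = "lxml"  # assumed attribute name

    @classmethod
    def getDisabledReasons(cls):
        reasons = {}
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(cls.required_module) is None:
            reasons[cls.required_module] = (
                "This module needs the %s python module, which is not installed."
                % cls.required_module
            )
        return reasons


def describe(scraper_class):
    """Format one line of a comic list, appending any disabled reasons."""
    reasons = scraper_class.getDisabledReasons()
    if not reasons:
        return scraper_class.__name__
    return "%s [disabled: %s]" % (
        scraper_class.__name__,
        "; ".join(reasons.values()),
    )
```

Because the check lives in a classmethod, the listing code can ask a module why it is disabled without instantiating it, and the run path can reuse the same reasons when it refuses to start.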
Tobias Gruetzmacher d495d95ee0 Refactor: Move repeated check into its own function. 2014-10-13 21:29:54 +02:00
Tobias Gruetzmacher 3235b8b312 Pass unicode strings to lxml.
This reverts commit fcde86e9c0 & some
more. This lets python-requests do all the encoding stuff and leaves
LXML with (hopefully) clean unicode HTML to parse.
2014-10-13 19:39:48 +02:00
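
The idea in commit 3235b8b312, decoding the page once and handing the parser clean unicode, might look like the sketch below. It swaps in the standard library's html.parser for LXML and a plain bytes.decode call for python-requests so that it runs without third-party packages. The function and class names are hypothetical.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def decode_page(raw_bytes, declared_encoding):
    # Stand-in for the HTTP layer: decode once, up front, using the
    # encoding declared for the response.
    return raw_bytes.decode(declared_encoding, errors="replace")


def find_images(raw_bytes, declared_encoding):
    # The parser only ever sees a unicode string, never raw bytes.
    text = decode_page(raw_bytes, declared_encoding)
    collector = ImageCollector()
    collector.feed(text)
    collector.close()
    return collector.sources
```

Keeping the decoding step in one place means the parser needs no encoding detection of its own, which is the trade-off the commit message describes.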
zac9 6ca200419a Update s.py 2014-09-28 19:48:26 -07:00
zac9 5b7ab5a711 Update o.py 2014-09-28 19:41:29 -07:00
zac9 491b5457b2 Added comic ShotgunShuffle 2014-09-28 06:29:02 -07:00
Bastian Kleineidam 731291979d Fixed RedMeat. 2014-09-22 22:14:31 +02:00
Bastian Kleineidam e43694c156 Don't crash on multiple HTML output runs per day. 2014-09-22 22:00:16 +02:00
Bastian Kleineidam bed49c19ad Bump up version. 2014-09-22 21:59:26 +02:00
Bastian Kleineidam 2e5114c2ec Updated votes
[ci skip]
2014-09-10 02:04:30 +02:00
Bastian Kleineidam e86586226c Updated votes
[ci skip]
2014-08-20 01:49:22 +02:00
Bastian Kleineidam e87f5993b8 Merge branch 'master' into htmlparser 2014-08-07 18:10:15 +02:00
Bastian Kleineidam f76006d89d Merge branch 'master' of github.com:wummel/dosage 2014-08-06 20:01:46 +02:00
Bastian Kleineidam b9f7fb23e7 Updated votes
[ci skip]
2014-08-06 01:56:37 +02:00
Tobias Gruetzmacher 08175d28c9 Fix Ruthe (see #73). 2014-07-31 21:27:49 +02:00
Tobias Gruetzmacher ca2d722d39 Fix DieFruehreifen (closes #73). 2014-07-31 21:18:15 +02:00
Tobias Gruetzmacher 6c7fb176b1 Add Blade Kitten as an example for the new parser. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher f9f0b75d7c Create new HTML parser based scraper class. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher fcde86e9c0 Change getPageContent to (optionally) return raw text.
This allows LXML to do its own "magic" encoding detection
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher 0e03eca8f0 Move all regular expression operations into the new class.
- Move fetchUrls, fetchUrl and fetchText.
- Move base URL handling.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher fde1fdced6 Fix some typos. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher 2567bd4e57 Convert starters and other helpers to new interface.
This allows those starters to work with future scrapers.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher 4265053846 Refactor: Move regular expression scraping into a new class.
- This also makes "<base href>" handling an internal detail of the regular
  expression scraper, future scrapers might not need that or handle it in
  another way.
2014-07-26 11:28:43 +02:00
Bastian Kleineidam 3a929ceea6 Allow comic text to be optional. Patch from TobiX 2014-07-24 20:49:57 +02:00
Bastian Kleineidam 950dd2932c Remove stray print statement. 2014-07-21 20:20:15 +02:00
Bastian Kleineidam bc6279f2ab Merge branch 'master' of github.com:wummel/dosage 2014-07-21 20:19:17 +02:00
Tobias Gruetzmacher ea5d533e30 Fix index lookups for SnowFlame and SnowFlakes. 2014-07-19 13:23:42 +02:00
Bastian Kleineidam 05f0afdf99 Updated votes
[ci skip]
2014-07-16 02:02:14 +02:00
Bastian Kleineidam dd51f1618d Updated votes
[ci skip]
2014-07-09 01:40:43 +02:00
Bastian Kleineidam 011ef49b94 Updated webpage meta info
[ci skip]
2014-07-03 22:01:51 +02:00
Bastian Kleineidam c6debcfe1c Bump up version 2014-07-03 21:49:02 +02:00
Bastian Kleineidam 920a7302a2 Set release date.
[ci skip]
2014-07-03 18:44:57 +02:00
Bastian Kleineidam 4d49d4394b Fix doc 2014-07-03 18:42:06 +02:00
Bastian Kleineidam f194e430bc TheThinHLine: fetch bigger images and name image files from sequence number. 2014-07-03 18:41:25 +02:00
Bastian Kleineidam 4845a4ccc1 Merge branch 'master' of github.com:wummel/dosage 2014-07-03 17:12:42 +02:00
Bastian Kleineidam 641daa738b Updated list of comics 2014-07-03 17:12:25 +02:00
Bastian Kleineidam 93fe5d5987 Minor useragent refactoring 2014-07-03 17:12:25 +02:00
Bastian Kleineidam 4c2a339e25 Fix some comics. 2014-07-02 19:51:53 +02:00
Luc Fouin cb76198da7 added the thin H line, fixes #67 2014-07-02 17:14:33 +02:00
Luc Fouin 763f9b02a2 added the thin H line 2014-07-02 17:11:33 +02:00
Bastian Kleineidam b03ba158ef Fixed LookingForGroup 2014-07-01 23:44:01 +02:00
Bastian Kleineidam 2170b5a7ad Updated votes
[ci skip]
2014-06-25 01:47:24 +02:00