87 commits

Author SHA1 Message Date
Techwolf f5b7b067b7 Switch AGirlAndHerFed to parser scraper 2019-11-03 21:37:05 +01:00
Tobias Gruetzmacher fbb3a18c91 Enable warnings and fix some of them 2018-05-23 00:54:40 +02:00
Tobias Gruetzmacher 75aa7207ea Some minor fixes to make some modules work again. 2017-11-27 01:04:35 +01:00
Tobias Gruetzmacher 7e0adf1d96 Unify more WordPress-based modules. 2017-05-22 01:17:05 +02:00
Tobias Gruetzmacher b8484cde50 Fix some more modules. 2017-05-15 00:27:28 +02:00
Tobias Gruetzmacher ebbb27d05d Move xpath_class to helpers module. 2017-02-13 22:41:17 +01:00
Tobias Gruetzmacher 20ca5d7fc2 Fix some modules. 2017-02-06 00:05:05 +01:00
Tobias Gruetzmacher c4a184d173 Remove some vanished modules. 2017-01-12 02:01:10 +01:00
Tobias Gruetzmacher 3f9feec041 Allow modules to ignore some HTTP error codes.
This is necessary since it seems some webservers out there are
misconfigured to deliver actual content with an HTTP error code...
2016-11-01 18:25:02 +01:00
Tobias Gruetzmacher 9a6a310b76 Fixup copyright years. 2016-10-29 00:21:41 +02:00
Tobias Gruetzmacher b1d2650615 Fix some modules (a&b). 2016-09-29 01:29:01 +02:00
Tobias Gruetzmacher 4006ced43d Move all HijinksEnsue comics into alphabetic files. 2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher c3f32dfef7 Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
Tobias Gruetzmacher 0468f2f31a Refactor: Convert starter to simple method. 2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher 42e43fa4e6 Read starter parameters from class.
This allows starters to be specified in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher 8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and, honestly, I'm not sure I got it right this time. But I
think the current behaviour best matches what web browsers try to do:

1. Let Requests figure out the encoding from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
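The two-step encoding decision described in the commit message above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from the commit: `charset_from_content_type`, `prepare_for_parser` and `XML_DECL_ENCODING` are hypothetical names, the header parsing stands in for what Requests does, and stripping the declared encoding is only one assumed way to keep a parser such as LXML from disagreeing with the HTTP header. The actual hand-off to LXML is left out.

```python
import re
from email.message import Message

# Hypothetical pattern: the encoding pseudo-attribute of a leading XML declaration.
XML_DECL_ENCODING = re.compile(
    r'^(\s*<\?xml\b[^>]*?)\s+encoding\s*=\s*["\'][^"\'>]*["\']([^>]*\?>)'
)


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type header, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def prepare_for_parser(body, content_type):
    """Return decoded text if the header names an encoding, else the raw bytes."""
    charset = charset_from_content_type(content_type)
    if charset is None:
        # Step 2: the HTTP header names no encoding, so let the parser guess.
        return body
    try:
        # Step 1: the HTTP header overrides everything else.
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to letting the parser guess.
        return body
    # Drop the declared encoding so it cannot contradict the header.
    return XML_DECL_ENCODING.sub(r"\1\2", text, count=1)
```

In this sketch, a `str` result means the header decided the encoding, while a `bytes` result means the decision is deferred to the parser.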
Tobias Gruetzmacher 0d453a6858 Move Flowerlark Studios into alphabetical files. 2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher bb1f20d867 Remove make_scraper for most WordPress comics.
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher 7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Helge Stasch 17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Damjan Košir c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Tobias Gruetzmacher 7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Damjan Košir 545a67111e fixed Alice 2015-06-01 15:15:34 +12:00
Damjan Košir 5569439c43 fixed 16 comics 2015-05-25 21:57:06 +12:00
DirkReiners b8ef6958b9 Merge branch 'master' of https://github.com/webcomics/dosage 2015-04-24 15:38:36 -05:00
Tobias Gruetzmacher ff21df596b Remove descriptions and genres (closes #9).
Maintaining the descriptions creates quite a bit of overhead (finding
them, copying them, checking if they are still correct) for a minimal
user benefit.

PS: Viewing this diff should be easier in a difftool that shows changes
in a line, for example kdiff3.
2015-04-20 20:29:09 +02:00
DirkReiners 8f3a9f660a Fixed ASofterWorld 2015-04-16 18:35:21 -05:00
Dirk Reiners fda654b5e0 Some fixes...
AbstruseGoose: fixed prev
Carciphona: fixed latest
Curtailed: fixed image and prev (moved to WP)
DorkTower: fixed image search
GrrlPower: fixed site name issue
MadamAndEve: archive not updated in a long time, but current strip is.
Works, but needs to be run daily.
PennyArcade: fixed namer
PvPonline: fixed prev
2014-10-24 16:42:32 -05:00
Bastian Kleineidam 875e431edc Provide page data in shouldSkipUrl() function 2014-02-10 21:58:09 +01:00
Bastian Kleineidam cc5ee572fb Fix some comics 2014-01-24 23:17:21 +01:00
Bastian Kleineidam 4d63920434 Updated copyright. 2014-01-05 16:50:57 +01:00
Bastian Kleineidam f488935072 Fix AbstruseGoose and QuestionableContent. 2013-12-22 08:01:58 +01:00
Peter B 36dcadc7d4 Store alt text from AbstruseGoose 2013-12-03 21:56:54 -05:00
Bastian Kleineidam 48e417c647 Fixed some comics. 2013-11-18 22:01:30 +01:00
Bastian Kleineidam 7760985601 Fix broken comics 2013-11-12 18:33:14 +01:00
Bastian Kleineidam 45a5ef9064 Removed AetheriaEpics 2013-11-07 21:23:15 +01:00
Bastian Kleineidam ef4ae435a5 Fix several comics. 2013-07-18 20:39:53 +02:00
Bastian Kleineidam 1c1b0aaf18 Comic fixes. 2013-05-25 23:24:33 +02:00
Bastian Kleineidam dcacbf0b9a Fix some comics. 2013-04-28 19:58:38 +02:00
Bastian Kleineidam 51d84131eb Added ARedTailsDream 2013-04-25 20:37:27 +02:00
Bastian Kleineidam 4988e79e6e Added some descriptions. 2013-04-19 06:31:12 +02:00
Bastian Kleineidam e37a80fdc1 Add some descriptions. 2013-04-14 09:02:14 +02:00
Bastian Kleineidam f15f993851 s/baseurl/baseUrl/g 2013-04-13 20:58:00 +02:00
Bastian Kleineidam 7e593cf7e8 Add firstStripUrls. 2013-04-10 23:57:09 +02:00
Bastian Kleineidam fb05c10808 Sort entries. 2013-04-10 18:36:33 +02:00
Bastian Kleineidam 5127d4c895 Use re.escape and add some firstStripUrl. 2013-04-10 18:19:11 +02:00
Bastian Kleineidam 17fe58b864 Fix some comics. 2013-03-19 20:45:18 +01:00
Bastian Kleineidam 88e28f3923 Fix some comics and add language tag. 2013-03-08 22:33:05 +01:00