Commit graph

844 commits

Author SHA1 Message Date
Tobias Gruetzmacher 215d597573 Remove DrunkDuck for now.
- It's been disabled for ages
- Needs a major rework
- I don't want to add that many comics anyways...
- This also gets rid of make_scraper :)
2016-06-05 22:22:17 +02:00
Tobias Gruetzmacher 67d0d38100 Migrate SnafuComics to single-class module. 2016-06-05 22:12:16 +02:00
Tobias Gruetzmacher 125c96e9dc Remove command to download ALL comics... 2016-06-05 21:57:56 +02:00
Tobias Gruetzmacher df2048cb34 Keep track of removed and moved comics (fixes #41).
I plan on keeping this list for at least ~ 2 releases and then purging
older entries...
2016-06-05 21:47:58 +02:00
Tobias Gruetzmacher 9b755a7e6c Restore BobWhite. 2016-06-05 18:32:27 +02:00
Tobias Gruetzmacher 603fd62a1e Fix workaround for PyInstaller... 2016-06-05 16:01:35 +02:00
Tobias Gruetzmacher 295b53a2d3 Fix name overrides (broken by 51008a). 2016-06-05 10:03:29 +02:00
Tobias Gruetzmacher 844bec09ba Remove another dead comic from ComicFury. 2016-06-05 01:06:04 +02:00
Tobias Gruetzmacher 12123961a4 Fix error in PyInstaller packaged application. 2016-06-05 00:34:16 +02:00
André-Patrick Bubel 2b8e948868 Add String Theory comic 2016-06-01 11:19:17 +00:00
André-Patrick Bubel 192751073c Add KillSixBillionDemons comic 2016-05-31 07:28:32 +00:00
Tobias Gruetzmacher 807bee6342 Migrate GoComics to single-class module. 2016-05-23 00:01:10 +02:00
Tobias Gruetzmacher 2c8e57bdea Migrate Creators to single-class module. 2016-05-22 23:56:59 +02:00
Tobias Gruetzmacher f5dff27b0a Migrate SmackJeeves to single-class module. 2016-05-22 23:54:21 +02:00
Tobias Gruetzmacher 1ea20e1743 Migrate WebcomicFactory to single-class module. 2016-05-22 23:40:58 +02:00
Tobias Gruetzmacher c62a7283a2 Migrate ComicFury to single-class module. 2016-05-22 23:31:53 +02:00
Tobias Gruetzmacher 1834bf179f Migrate Arcamax to single-class module. 2016-05-22 23:17:24 +02:00
Tobias Gruetzmacher f29472c143 Make auto-update script more flexible. 2016-05-22 23:06:05 +02:00
Tobias Gruetzmacher e4650d5941 Remove make_scraper from Nitrocosm. 2016-05-21 14:35:53 +02:00
Tobias Gruetzmacher b6eb8ab8ef Remove make_scraper from SandraAndWoo 2016-05-21 14:12:11 +02:00
Tobias Gruetzmacher 4630ea047c Implement Oglaf's strange navigation (fixes #33)
(also should fix wummel#91)
2016-05-21 02:38:07 +02:00
Tobias Gruetzmacher 51008a975b Refactor: Introduce generator methods for scrapers
This allows one comic module class to generate multiple scrapers. This
change is to support a more dynamic module system as described in #42.
2016-05-21 01:29:36 +02:00
Tobias Gruetzmacher 89cfd9d310 Add comics from catomix.com. 2016-05-16 23:55:41 +02:00
Tobias Gruetzmacher a6cf4e7040 Fix some more comic modules. 2016-05-16 23:16:29 +02:00
Tobias Gruetzmacher be1a63da0c Update GoComics comic list. 2016-05-16 18:26:45 +02:00
Tobias Gruetzmacher 6d3f74142c Move command line tool into package.
This way we can use the default Python console_scripts install process.
2016-05-16 14:57:47 +02:00
Tobias Gruetzmacher b9d9564085 Fix Dilbert (fixes #44). 2016-05-16 01:21:23 +02:00
Tobias Gruetzmacher e9b3c487c0 Remove some dead comics. 2016-05-16 01:10:20 +02:00
Tobias Gruetzmacher bd60155d9f Some more ComicFury comics gone... 2016-05-16 00:53:22 +02:00
Tobias Gruetzmacher 849e60e795 Remove make_scraper magic from webcomiceu. 2016-05-07 03:20:01 +02:00
Tobias Gruetzmacher 975d2376bf Another round of comic module fixes. 2016-05-07 01:50:10 +02:00
Tobias Gruetzmacher efe1308db2 Replace home-grown Python2/3 compat. with six. 2016-05-05 23:33:48 +02:00
Tobias Gruetzmacher 77ed0218e0 Fix some comic modules. 2016-05-05 20:55:14 +02:00
Tobias Gruetzmacher bb2ac39639 Fix some URLs. 2016-05-05 10:12:03 +02:00
Tobias Gruetzmacher d05316e3ac Seems ComicFury is deleting comics regularly...
Well, there's nothing we can do: Remove them.
2016-05-04 08:26:53 +02:00
Tobias Gruetzmacher 0c1aa9e8bd Move libxml < 2.9.3 workaround to base class. 2016-05-02 23:22:06 +02:00
Tobias Gruetzmacher b93a8fde65 Move PensAndTales comics and fix them. 2016-05-02 22:32:14 +02:00
Tobias Gruetzmacher 4006ced43d Move all HijinksEnsue comics into alphabetic files. 2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher d5f91ecfd2 Fix some modules in m.py. 2016-04-30 01:59:28 +02:00
Tobias Gruetzmacher 1d52d33311 Remove missing SmackJeeves comics. 2016-04-30 00:56:20 +02:00
Tobias Gruetzmacher d796f3476c Fix some modules in d.py. 2016-04-30 00:44:18 +02:00
Tobias Gruetzmacher cc16fea880 Fix some modules in c.py 2016-04-29 00:35:02 +02:00
Tobias Gruetzmacher 1d94439715 Fix some more comic modules. 2016-04-27 00:31:27 +02:00
Tobias Gruetzmacher 8b1ac4eb35 Fix "tagsoup" on SmackJeeves
Unfortunatly, browsers render < outside of HTML tags differently then
libXML until recently (libXML 2.9.3), so we need to preprocess pages
before parsing them...

(This was fixed in libXML commit 140c25)
2016-04-26 08:05:38 +02:00
Tobias Gruetzmacher 035d6e94e4 Allow output level for warnings and errors. 2016-04-26 07:53:53 +02:00
Tobias Gruetzmacher 8ddf553eb4 Fix some more SmackJeeves modules. 2016-04-22 01:04:47 +02:00
Tobias Gruetzmacher fd85c8583a Unify similar code in fetchUrl and fetchText 2016-04-22 00:42:46 +02:00
Tobias Gruetzmacher 6574997e01 Refactor: All the other class methods.
Turns out, it would have been better if all methods had been instance
methods and not class methods. This finished a big chunk of the rework
needed for #42.
2016-04-21 23:52:31 +02:00
Tobias Gruetzmacher 0d436b8ca9 Refactor: url modifiers to normal methods.
As before, to implement #42 these might want to access information from
the instance, so they should be normal methods.
2016-04-21 21:39:25 +02:00
Tobias Gruetzmacher c3f32dfef7 Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
Tobias Gruetzmacher 5bd2a49f48 Add debug output on matched XPath/CSS expression. 2016-04-20 23:51:54 +02:00
Tobias Gruetzmacher fe51a449df Update SmackJeeves
- Now uses _ParserScraper, which makes the pattern quite a bit more
  generic and IMHO more readable
- remove make_scraper magic
- No new comics, only fixed existing ones and removed some dead ones.
2016-04-20 23:36:45 +02:00
Tobias Gruetzmacher 190cd3b063 Convert language & getDisabledReasons to methods.
Both are more properties of a webcomic (this is part of the design
changes for #42)
2016-04-19 23:53:46 +02:00
Tobias Gruetzmacher df46907f39 Register EXSLT extensions by default.
This allows comic module authors to use the full power of regular
expressions in XPath expression, see http://exslt.org/regexp/regexp.html
for usage. Please be aware that these use the prefix re: instead of
regexp: here.
2016-04-19 23:48:14 +02:00
Tobias Gruetzmacher 4204f5f1e4 Send "If-Modified-Since" header for images. 2016-04-19 00:36:50 +02:00
Tobias Gruetzmacher 13a3409854 Remove some comics that are gone or block us. 2016-04-17 19:42:43 +02:00
Tobias Gruetzmacher 1fbc844077 Update GoComics. 2016-04-17 18:40:09 +02:00
Tobias Gruetzmacher 73e958670d Update ComicFury (again). 2016-04-17 16:19:44 +02:00
Tobias Gruetzmacher b0481a01f7 Update languages. 2016-04-16 13:14:12 +02:00
Tobias Gruetzmacher 3329027e4b Update ComicFury. 2016-04-16 13:13:47 +02:00
Tobias Gruetzmacher ee99c087d7 Remove prevUrlMatchesStripUrl.
It was only used for one test.
2016-04-16 01:14:26 +02:00
Tobias Gruetzmacher 92a688457a Remove useless indirection. 2016-04-15 23:42:24 +02:00
Tobias Gruetzmacher 52515b5fc5 Update GoComics. 2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher 031a523846 Fix SnafuComics. 2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher 7626b1e100 Webcomics Nation is gone. 2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher 497653c448 Remove make_scraper magic from Arcamax. 2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher db87ed95e7 Use new features to make modules simpler. 2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher b266e28ae1 Remove debugging prints 😭 2016-04-13 22:59:06 +02:00
Tobias Gruetzmacher ff3b824311 Fix variable shadowing... 2016-04-13 22:43:34 +02:00
Tobias Gruetzmacher 060281e5ff Use concrete scraper objects everywhere.
This is a first step for #42. Since most access to the scraper classes
is through instances, modules can now dynamically override url and name
(name is now a property).
2016-04-13 22:17:30 +02:00
Tobias Gruetzmacher 0468f2f31a Refactor: Convert starter to simple method. 2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher 16004e43e4 Use default bounceStarter for site modules. 2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher 9028724a74 Clean up update helper scripts. 2016-04-13 00:52:16 +02:00
Tobias Gruetzmacher 42e43fa4e6 Read starter parameters from class.
This allows to specify starters in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher b865a171f9 Remove some broken comics. 2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher 4e2e4ac529 Prevent scraper from moving to a different comic. 2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher 443ab119e9 Refresh GoComics list from online directory. 2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher 0e385a3697 Update GoComics (no change in supported comics)
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher ad7a297964 Fix WLP comics. 2016-04-11 01:07:21 +02:00
Damjan Košir af2e57d850 Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...

Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher f6e605e146 Fix unicode error in text search. 2016-04-10 13:16:30 +02:00
Tobias Gruetzmacher bc10bd9a4d Streamline color output.
- Depend on external colorama instead of embedding an old copy.
- Move most output code into output module.
- Convert pager to context manager.
2016-04-10 03:45:00 +02:00
Tobias Gruetzmacher bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher 0033a8046b Fix creators module. 2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher 8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and honestly, I'm not sure if I did get it right this time. But I
think, the current behaviour matches best what web browsers try to do:

1. Let Requests figure out the content from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
Tobias Gruetzmacher 183d18e7bc Skip non-image on xkcd. 2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher 9feaf245f2 Fixed & removed some comics in s.py. 2016-04-06 00:40:13 +02:00
Tobias Gruetzmacher 6bbdcfb341 BloomingFaeries: Don't download every page twice.
(Also, simplify namer, switch to _ParserScraper)
2016-04-05 23:58:43 +02:00
Tobias Gruetzmacher 8db6f8e8b7 Fix ZapComics, remove ZebraGirl.
- ZebraGirl is now ComicFury/ZebraGirl...
2016-04-04 00:27:11 +02:00
Tobias Gruetzmacher 0bcfb8a82e Move ComicControl into common module.
- Move all comics using ComicControl into alphabetical files.
- Add BalderDash & Picklewhistle
2016-04-04 00:12:53 +02:00
Tobias Gruetzmacher 0d453a6858 Move Flowerlark Studios into alphabetical files. 2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher a9f0dfdce4 Merge pull request #39 from peterjanes/peterjanes/sherman-fix
Fix Sherman's Lagoon
2016-04-03 22:20:04 +02:00
Tobias Gruetzmacher 926439cd14 Every comic need an url. 2016-04-03 22:03:16 +02:00
Tobias Gruetzmacher 2c6decb7f5 Move WebcomicFactory in its own module.
Also, add an updater script for it.
2016-04-03 21:31:56 +02:00
Peter Janes 759bd0c360 Fix Sherman's Lagoon 2016-04-03 14:54:41 -04:00
Tobias Gruetzmacher bb1f20d867 Remove make_scraper for most WordPress comics.
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher 7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher d6db1d0b81 Fix a conflict with IPython. 2016-03-20 23:57:07 +01:00
Tobias Gruetzmacher 90dfceaeb1 Remove dead modules (& format). 2016-03-20 20:48:42 +01:00