Tobias Gruetzmacher
be1a63da0c
Update GoComics comic list.
2016-05-16 18:26:45 +02:00
Tobias Gruetzmacher
b9d9564085
Fix Dilbert (fixes #44).
2016-05-16 01:21:23 +02:00
Tobias Gruetzmacher
e9b3c487c0
Remove some dead comics.
2016-05-16 01:10:20 +02:00
Tobias Gruetzmacher
bd60155d9f
Some more ComicFury comics gone...
2016-05-16 00:53:22 +02:00
Tobias Gruetzmacher
849e60e795
Remove make_scraper magic from webcomiceu.
2016-05-07 03:20:01 +02:00
Tobias Gruetzmacher
975d2376bf
Another round of comic module fixes.
2016-05-07 01:50:10 +02:00
Tobias Gruetzmacher
efe1308db2
Replace home-grown Python2/3 compat. with six.
2016-05-05 23:33:48 +02:00
Tobias Gruetzmacher
77ed0218e0
Fix some comic modules.
2016-05-05 20:55:14 +02:00
Tobias Gruetzmacher
bb2ac39639
Fix some URLs.
2016-05-05 10:12:03 +02:00
Tobias Gruetzmacher
d05316e3ac
Seems ComicFury is deleting comics regularly...
Well, there's nothing we can do: Remove them.
2016-05-04 08:26:53 +02:00
Tobias Gruetzmacher
0c1aa9e8bd
Move libxml < 2.9.3 workaround to base class.
2016-05-02 23:22:06 +02:00
Tobias Gruetzmacher
b93a8fde65
Move PensAndTales comics and fix them.
2016-05-02 22:32:14 +02:00
Tobias Gruetzmacher
4006ced43d
Move all HijinksEnsue comics into alphabetic files.
2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher
d5f91ecfd2
Fix some modules in m.py.
2016-04-30 01:59:28 +02:00
Tobias Gruetzmacher
1d52d33311
Remove missing SmackJeeves comics.
2016-04-30 00:56:20 +02:00
Tobias Gruetzmacher
d796f3476c
Fix some modules in d.py.
2016-04-30 00:44:18 +02:00
Tobias Gruetzmacher
cc16fea880
Fix some modules in c.py
2016-04-29 00:35:02 +02:00
Tobias Gruetzmacher
1d94439715
Fix some more comic modules.
2016-04-27 00:31:27 +02:00
Tobias Gruetzmacher
8b1ac4eb35
Fix "tagsoup" on SmackJeeves
Unfortunately, browsers render a < outside of HTML tags differently than
libXML did until recently (libXML 2.9.3), so we need to preprocess pages
before parsing them.
(This was fixed in libXML commit 140c25.)
2016-04-26 08:05:38 +02:00
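A minimal sketch of such a preprocessing step. It assumes a simplified form of the HTML5 tokenizer rule: a < only opens markup when it is followed by an ASCII letter, /, ! or ?, and any other < is rendered as text, so it can be escaped as &lt; before the page reaches libXML. The name fix_tagsoup is hypothetical, and a plain regex pass like this would also touch < inside scripts or comments.

```python
import re

# Hypothetical helper: matches a "<" that cannot start a start tag, end tag,
# comment/doctype ("<!") or processing instruction ("<?"), i.e. a "<" that an
# HTML5 browser would display as literal text.
STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def fix_tagsoup(html):
    """Escape stray "<" characters so older libXML versions parse like browsers."""
    return STRAY_LT.sub("&lt;", html)


if __name__ == "__main__":
    print(fix_tagsoup("I <3 comics: 1 < 2 <b>bold</b>"))
```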
Tobias Gruetzmacher
8ddf553eb4
Fix some more SmackJeeves modules.
2016-04-22 01:04:47 +02:00
Tobias Gruetzmacher
6574997e01
Refactor: All the other class methods.
It turns out it would have been better if all methods had been instance
methods rather than class methods. This finishes a big chunk of the rework
needed for #42.
2016-04-21 23:52:31 +02:00
Tobias Gruetzmacher
0d436b8ca9
Refactor: url modifiers to normal methods.
As before, to implement #42 these might want to access information from
the instance, so they should be normal methods.
2016-04-21 21:39:25 +02:00
Tobias Gruetzmacher
c3f32dfef7
Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
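To illustrate why the instance is the natural home for the namer, here is a minimal sketch in which two instances of the same class name their files differently. The class name, its constructor and the file_prefix attribute are hypothetical and not dosage's actual API; the namer(image_url, page_url) parameters are likewise an assumption.

```python
import posixpath
from urllib.parse import urlsplit


class HypotheticalScraper(object):
    """Illustration only: a scraper whose file naming depends on instance data."""

    def __init__(self, name, file_prefix=""):
        self.name = name
        self.file_prefix = file_prefix

    def namer(self, image_url, page_url):
        # Take the last path component of the image URL (query string dropped)
        # and prepend a per-instance prefix, which a classmethod could not see.
        filename = posixpath.basename(urlsplit(image_url).path)
        return self.file_prefix + filename


if __name__ == "__main__":
    english = HypotheticalScraper("Example", file_prefix="en-")
    print(english.namer("https://example.com/img/strip1.png?v=2", "https://example.com/1"))
```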
Tobias Gruetzmacher
fe51a449df
Update SmackJeeves
- Now uses _ParserScraper, which makes the pattern quite a bit more
generic and IMHO more readable
- Remove make_scraper magic
- No new comics, only fixed existing ones and removed some dead ones.
2016-04-20 23:36:45 +02:00
Tobias Gruetzmacher
13a3409854
Remove some comics that are gone or block us.
2016-04-17 19:42:43 +02:00
Tobias Gruetzmacher
1fbc844077
Update GoComics.
2016-04-17 18:40:09 +02:00
Tobias Gruetzmacher
73e958670d
Update ComicFury (again).
2016-04-17 16:19:44 +02:00
Tobias Gruetzmacher
3329027e4b
Update ComicFury.
2016-04-16 13:13:47 +02:00
Tobias Gruetzmacher
ee99c087d7
Remove prevUrlMatchesStripUrl.
It was only used for one test.
2016-04-16 01:14:26 +02:00
Tobias Gruetzmacher
52515b5fc5
Update GoComics.
2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher
031a523846
Fix SnafuComics.
2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher
7626b1e100
Webcomics Nation is gone.
2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher
497653c448
Remove make_scraper magic from Arcamax.
2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher
db87ed95e7
Use new features to make modules simpler.
2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher
060281e5ff
Use concrete scraper objects everywhere.
This is a first step towards #42. Since most access to the scraper classes
now happens through instances, modules can dynamically override url and
name (name is now a property).
2016-04-13 22:17:30 +02:00
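A minimal sketch of what concrete scraper objects make possible, assuming a hypothetical Scraper base class and a hypothetical HostedComic module for a site with one subdomain per comic. Instead of generating one class per comic, the module instantiates one object per comic, and each instance carries its own url and name, with name exposed as a read-only property.

```python
class Scraper(object):
    """Hypothetical base class: url and name live on the instance."""

    def __init__(self, name, url):
        self._name = name
        self.url = url

    @property
    def name(self):
        # Read-only property, so subclasses may compute the name dynamically.
        return self._name


class HostedComic(Scraper):
    """Hypothetical module for a comic-hosting site with one subdomain per comic."""

    def __init__(self, name, subdomain):
        super(HostedComic, self).__init__(
            "Hosted/" + name, "https://%s.example.com/" % subdomain)


# One concrete object per comic instead of one generated class per comic.
COMICS = [
    HostedComic("Foo", "foo"),
    HostedComic("BarBaz", "barbaz"),
]
```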
Tobias Gruetzmacher
0468f2f31a
Refactor: Convert starter to simple method.
2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher
16004e43e4
Use default bounceStarter for site modules.
2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher
42e43fa4e6
Read starter parameters from class.
This allows specifying starters in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
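A minimal sketch of the idea of reading starter parameters from the class. The latest_search attribute and the stubbed-out fetch method are hypothetical (dosage's real attribute names and helpers may differ): the base class implements the starter logic once, and a comic module only declares data.

```python
import re


class Scraper(object):
    url = None
    # Hypothetical: regex whose group 1 is the URL of the newest strip.
    latest_search = None

    def fetch(self, url):
        # Stand-in for an HTTP request; not part of this sketch.
        raise NotImplementedError("network access is not part of this sketch")

    def starter(self):
        # Without a declared search pattern, the start page is the latest strip.
        if self.latest_search is None:
            return self.url
        match = re.search(self.latest_search, self.fetch(self.url))
        if match is None:
            raise ValueError("latest strip link not found on %s" % self.url)
        return match.group(1)


class IndirectComic(Scraper):
    """Hypothetical module: only data, no starter code of its own."""
    url = "https://comic.example.com/"
    latest_search = r'<a class="latest" href="([^"]+)"'

    def fetch(self, url):
        # Canned page so the example runs offline.
        return '<a class="latest" href="https://comic.example.com/strip/99">Latest</a>'
```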
Tobias Gruetzmacher
b865a171f9
Remove some broken comics.
2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher
4e2e4ac529
Prevent scraper from moving to a different comic.
2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher
443ab119e9
Refresh GoComics list from online directory.
2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher
0e385a3697
Update GoComics (no change in supported comics)
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher
ad7a297964
Fix WLP comics.
2016-04-11 01:07:21 +02:00
Damjan Košir
af2e57d850
Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't the latest comic...
Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher
fa98f6ddbf
Move more comics to common WordPressScraper.
2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher
bb5b6ffcec
Fix comics in module a.py.
2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher
0033a8046b
Fix creators module.
2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher
8768ff07b6
Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right, and honestly, I'm not sure I got it right this time. But I think
the current behaviour best matches what web browsers try to do:
1. Let Requests figure out the encoding from the HTTP header. This
overrides everything else. We need to "trick" LXML into accepting our
decision if the document contains an XML declaration that might
disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
encoding and be done with it.
2016-04-06 22:22:22 +02:00
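The two steps above can be sketched with the standard library alone. This is a hypothetical helper (prepare_document and header_charset are illustrative names, not dosage functions): it reads the charset from a Content-Type header value, and if one is present it decodes the body itself and strips a leading XML declaration, since lxml is understood to reject str input that carries an encoding declaration. Otherwise it returns the raw bytes so the parser can guess.

```python
import re
from email.message import Message

# An XML declaration at the very start of the document, e.g.
# <?xml version="1.0" encoding="iso-8859-1"?>
XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)


def header_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def prepare_document(body, content_type):
    """Decide who picks the encoding: the HTTP header (step 1) or the parser (step 2)."""
    charset = header_charset(content_type)
    if charset is None:
        # Step 2: no charset in the header, so hand raw bytes to the parser.
        return body
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Charset name unknown to Python: fall back to letting the parser guess.
        return body
    # Step 1: the header wins. Drop an XML declaration that might claim a
    # different encoding, so the parser sees only the already-decoded text.
    return XML_DECLARATION.sub("", text, count=1)
```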
Tobias Gruetzmacher
183d18e7bc
Skip non-image on xkcd.
2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher
9feaf245f2
Fixed & removed some comics in s.py.
2016-04-06 00:40:13 +02:00