Tobias Gruetzmacher
17bc454132
Bugfix: Don't assume RE patterns in base class.
2014-10-13 22:29:47 +02:00
Tobias Gruetzmacher
e92a3fb3a1
New feature: Comic modules can be "disabled".
...
This is modeled parallel to the "adult" feature, except the user can't
override it via the command line. Each comic module can override the
classmethod getDisabledReasons and give the user a reason why this
module is disabled. The user can see the reason in the comic list (-l or
--singlelist) and the comic module refuses to run, showing the same
message.
This is currently used to disable modules that use the _ParserScraper if
the LXML python module is missing.
2014-10-13 21:43:46 +02:00
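
The mechanism described in this commit can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class names `Scraper` and `ParserScraper`, the `run` helper, the dictionary return type, and the message text are all assumptions; only the classmethod name `getDisabledReasons` and the lxml check come from the commit message. The sketch is written for current Python rather than the Python version used in 2014.

```python
import importlib.util


class Scraper:
    """Hypothetical base class; the real class layout is an assumption."""

    @classmethod
    def getDisabledReasons(cls):
        """Return a mapping of reason key -> message; empty means enabled."""
        return {}


class ParserScraper(Scraper):
    """Hypothetical stand-in for a scraper that depends on lxml."""

    @classmethod
    def getDisabledReasons(cls):
        reasons = {}
        # find_spec returns None when the top-level module cannot be found.
        if importlib.util.find_spec("lxml") is None:
            reasons["lxml"] = "This module needs the lxml Python module."
        return reasons


def run(scraper_cls):
    """Refuse to run a disabled module and report its reasons instead."""
    reasons = scraper_cls.getDisabledReasons()
    if reasons:
        return "disabled: " + "; ".join(reasons.values())
    return "running"
```

Because the reasons come from the module itself and not from a command-line flag, the same text can be shown in the comic list and when the module refuses to run.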
Tobias Gruetzmacher
d495d95ee0
Refactor: Move repeated check into its own function.
2014-10-13 21:29:54 +02:00
Tobias Gruetzmacher
3235b8b312
Pass unicode strings to lxml.
...
This reverts commit fcde86e9c0 & some more. This lets python-requests do
all the encoding stuff and leaves LXML with (hopefully) clean unicode
HTML to parse.
2014-10-13 19:39:48 +02:00
Bastian Kleineidam
e87f5993b8
Merge branch 'master' into htmlparser
2014-08-07 18:10:15 +02:00
Bastian Kleineidam
f76006d89d
Merge branch 'master' of github.com:wummel/dosage
2014-08-06 20:01:46 +02:00
Bastian Kleineidam
b9f7fb23e7
Updated votes
...
[ci skip]
2014-08-06 01:56:37 +02:00
Tobias Gruetzmacher
08175d28c9
Fix Ruthe (see #73).
2014-07-31 21:27:49 +02:00
Tobias Gruetzmacher
ca2d722d39
Fix DieFruehreifen (closes #73).
2014-07-31 21:18:15 +02:00
Tobias Gruetzmacher
6c7fb176b1
Add Blade Kitten as an example for the new parser.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
f9f0b75d7c
Create new HTML parser based scraper class.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fcde86e9c0
Change getPageContent to (optionally) return raw text.
...
This allows LXML to do its own "magic" encoding detection.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
0e03eca8f0
Move all regular expression operations into the new class.
...
- Move fetchUrls, fetchUrl and fetchText.
- Move base URL handling.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fde1fdced6
Fix some typos.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
2567bd4e57
Convert starters and other helpers to new interface.
...
This allows those starters to work with future scrapers.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
4265053846
Refactor: Move regular expression scraping into a new class.
...
- This also makes "<base href>" handling an internal detail of the regular
expression scraper; future scrapers might not need that or might handle it
in another way.
2014-07-26 11:28:43 +02:00
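
For illustration, "<base href>" handling inside a regular expression scraper might look something like the sketch below. The pattern `BASE_RE` and the helper `resolve` are hypothetical names and are not taken from the project; the sketch only shows the general idea of resolving links against a page's base URL when one is declared.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern; real-world markup may need a more tolerant regex.
BASE_RE = re.compile(r'<base\s+href="([^"]+)"', re.IGNORECASE)


def resolve(page_url, html, link):
    """Resolve link against the page's <base href> if present, else the page URL."""
    match = BASE_RE.search(html)
    # A base href may itself be relative, so resolve it against the page URL.
    base = urljoin(page_url, match.group(1)) if match else page_url
    return urljoin(base, link)
```

Keeping this lookup inside the regex-based class means a scraper built on a real HTML parser is free to read the base URL from the parsed tree instead.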
Bastian Kleineidam
3a929ceea6
Allow comic text to be optional. Patch from TobiX
2014-07-24 20:49:57 +02:00
Bastian Kleineidam
950dd2932c
Remove stray print statement.
2014-07-21 20:20:15 +02:00
Bastian Kleineidam
bc6279f2ab
Merge branch 'master' of github.com:wummel/dosage
2014-07-21 20:19:17 +02:00
Tobias Gruetzmacher
ea5d533e30
Fix index lookups for SnowFlame and SnowFlakes.
2014-07-19 13:23:42 +02:00
Bastian Kleineidam
05f0afdf99
Updated votes
...
[ci skip]
2014-07-16 02:02:14 +02:00
Bastian Kleineidam
dd51f1618d
Updated votes
...
[ci skip]
2014-07-09 01:40:43 +02:00
Bastian Kleineidam
011ef49b94
Updated webpage meta info
...
[ci skip]
2014-07-03 22:01:51 +02:00
Bastian Kleineidam
c6debcfe1c
Bump up version
2014-07-03 21:49:02 +02:00
Bastian Kleineidam
920a7302a2
Set release date.
...
[ci skip]
2014-07-03 18:44:57 +02:00
Bastian Kleineidam
4d49d4394b
Fix doc
2014-07-03 18:42:06 +02:00
Bastian Kleineidam
f194e430bc
TheThinHLine: fetch bigger images and name image files from sequence number.
2014-07-03 18:41:25 +02:00
Bastian Kleineidam
4845a4ccc1
Merge branch 'master' of github.com:wummel/dosage
2014-07-03 17:12:42 +02:00
Bastian Kleineidam
641daa738b
Updated list of comics
2014-07-03 17:12:25 +02:00
Bastian Kleineidam
93fe5d5987
Minor useragent refactoring
2014-07-03 17:12:25 +02:00
Bastian Kleineidam
4c2a339e25
Fix some comics.
2014-07-02 19:51:53 +02:00
Luc Fouin
cb76198da7
added the thin H line, fixes #67
2014-07-02 17:14:33 +02:00
Luc Fouin
763f9b02a2
added the thin H line
2014-07-02 17:11:33 +02:00
Bastian Kleineidam
b03ba158ef
Fixed LookingForGroup
2014-07-01 23:44:01 +02:00
Bastian Kleineidam
2170b5a7ad
Updated votes
...
[ci skip]
2014-06-25 01:47:24 +02:00
Bastian Kleineidam
3485e2ac54
Added Whomp.
2014-06-24 20:48:49 +02:00
wummel
a0086bfcd8
Merge pull request #63 from sehrgut/master
...
Updated GirlGenius to new markup
2014-06-24 20:40:15 +02:00
Bastian Kleineidam
923b4d73d5
Merge branch 'master' of github.com:wummel/dosage
2014-06-23 22:46:47 +02:00
Bastian Kleineidam
fc6c54709f
Remove freecode submit code.
2014-06-23 22:21:03 +02:00
Peter B
8f1c864ec3
Added Safely Endangered
2014-06-17 01:05:11 -04:00
Keith Beckman
236b840363
Updated GirlGenius to new markup
...
GG markup has changed, so I fixed the prevSearch regex to find the
"previous" button on the redesigned page.
As well, I set multipleImagesPerStrip to true, since there are quite a
few comics with multiple images that were being discarded.
2014-06-13 16:43:40 -04:00
Bastian Kleineidam
94090da813
Use Pypi download.
...
[ci skip]
2014-06-09 14:24:18 +02:00
Bastian Kleineidam
54a10568d9
Don't check since author_email is not set.
2014-06-09 13:53:18 +02:00
Bastian Kleineidam
52b8a0aef1
No need to rename dist file.
2014-06-09 13:50:42 +02:00
Bastian Kleineidam
68afeaf82d
Make appname lowercase.
2014-06-09 13:24:58 +02:00
Bastian Kleineidam
531e612834
Updated webpage meta info
...
[ci skip]
2014-06-09 13:23:35 +02:00
Bastian Kleineidam
4a87741eff
Correct distribution file name
2014-06-09 08:22:30 +02:00
Bastian Kleineidam
bb8021e3ea
Use sdist to construct release.
2014-06-09 07:55:40 +02:00
Bastian Kleineidam
cc651b18ac
Bump up version and upload to Pypi.
2014-06-08 21:58:14 +02:00
Bastian Kleineidam
c3f69cd6bb
Updated changelog.
...
[ci skip]
2014-06-08 13:41:03 +02:00