Tobias Gruetzmacher
d495d95ee0
Refactor: Move repeated check into its own function.
2014-10-13 21:29:54 +02:00
Tobias Gruetzmacher
3235b8b312
Pass unicode strings to lxml.
This reverts commit fcde86e9c0 & some more. This lets python-requests do
all the encoding stuff and leaves LXML with (hopefully) clean unicode HTML
to parse.
2014-10-13 19:39:48 +02:00
Bastian Kleineidam
e87f5993b8
Merge branch 'master' into htmlparser
2014-08-07 18:10:15 +02:00
Bastian Kleineidam
f76006d89d
Merge branch 'master' of github.com:wummel/dosage
2014-08-06 20:01:46 +02:00
Bastian Kleineidam
b9f7fb23e7
Updated votes
[ci skip]
2014-08-06 01:56:37 +02:00
Tobias Gruetzmacher
08175d28c9
Fix Ruthe (see #73).
2014-07-31 21:27:49 +02:00
Tobias Gruetzmacher
ca2d722d39
Fix DieFruehreifen (closes #73).
2014-07-31 21:18:15 +02:00
Tobias Gruetzmacher
6c7fb176b1
Add Blade Kitten as an example for the new parser.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
f9f0b75d7c
Create new HTML parser based scraper class.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fcde86e9c0
Change getPageContent to (optionally) return raw text.
This allows LXML to do its own "magic" encoding detection
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
0e03eca8f0
Move all regular expression operations into the new class.
- Move fetchUrls, fetchUrl and fetchText.
- Move base URL handling.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fde1fdced6
Fix some typos.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
2567bd4e57
Convert starters and other helpers to new interface.
This allows those starters to work with future scrapers.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
4265053846
Refactor: Move regular expression scraping into a new class.
- This also makes "<base href>" handling an internal detail of the regular
expression scraper; future scrapers might not need that or might handle it
in another way.
2014-07-26 11:28:43 +02:00
Bastian Kleineidam
3a929ceea6
Allow comic text to be optional. Patch from TobiX
2014-07-24 20:49:57 +02:00
Bastian Kleineidam
950dd2932c
Remove stray print statement.
2014-07-21 20:20:15 +02:00
Bastian Kleineidam
bc6279f2ab
Merge branch 'master' of github.com:wummel/dosage
2014-07-21 20:19:17 +02:00
Tobias Gruetzmacher
ea5d533e30
Fix index lookups for SnowFlame and SnowFlakes.
2014-07-19 13:23:42 +02:00
Bastian Kleineidam
05f0afdf99
Updated votes
[ci skip]
2014-07-16 02:02:14 +02:00
Bastian Kleineidam
dd51f1618d
Updated votes
[ci skip]
2014-07-09 01:40:43 +02:00
Bastian Kleineidam
011ef49b94
Updated webpage meta info
[ci skip]
2014-07-03 22:01:51 +02:00
Bastian Kleineidam
c6debcfe1c
Bump up version
2014-07-03 21:49:02 +02:00
Bastian Kleineidam
920a7302a2
Set release date.
[ci skip]
2014-07-03 18:44:57 +02:00
Bastian Kleineidam
4d49d4394b
Fix doc
2014-07-03 18:42:06 +02:00
Bastian Kleineidam
f194e430bc
TheThinHLine: fetch bigger images and name image files from sequence number.
2014-07-03 18:41:25 +02:00
Bastian Kleineidam
4845a4ccc1
Merge branch 'master' of github.com:wummel/dosage
2014-07-03 17:12:42 +02:00
Bastian Kleineidam
641daa738b
Updated list of comics
2014-07-03 17:12:25 +02:00
Bastian Kleineidam
93fe5d5987
Minor useragent refactoring
2014-07-03 17:12:25 +02:00
Bastian Kleineidam
4c2a339e25
Fix some comics.
2014-07-02 19:51:53 +02:00
Luc Fouin
cb76198da7
added the thin H line, fixes #67
2014-07-02 17:14:33 +02:00
Luc Fouin
763f9b02a2
added the thin H line
2014-07-02 17:11:33 +02:00
Bastian Kleineidam
b03ba158ef
Fixed LookingForGroup
2014-07-01 23:44:01 +02:00
Bastian Kleineidam
2170b5a7ad
Updated votes
[ci skip]
2014-06-25 01:47:24 +02:00
Bastian Kleineidam
3485e2ac54
Added Whomp.
2014-06-24 20:48:49 +02:00
wummel
a0086bfcd8
Merge pull request #63 from sehrgut/master
Updated GirlGenius to new markup
2014-06-24 20:40:15 +02:00
Bastian Kleineidam
923b4d73d5
Merge branch 'master' of github.com:wummel/dosage
2014-06-23 22:46:47 +02:00
Bastian Kleineidam
fc6c54709f
Remove freecode submit code.
2014-06-23 22:21:03 +02:00
Peter B
8f1c864ec3
Added Safely Endangered
2014-06-17 01:05:11 -04:00
Keith Beckman
236b840363
Updated GirlGenius to new markup
GG markup has changed, so I fixed the prevSearch regex to find the
"previous" button on the redesigned page.
As well, I set multipleImagesPerStrip to true, since there are quite a
few comics with multiple images that were being discarded.
2014-06-13 16:43:40 -04:00
Bastian Kleineidam
94090da813
Use PyPI download.
[ci skip]
2014-06-09 14:24:18 +02:00
Bastian Kleineidam
54a10568d9
Don't check since author_email is not set.
2014-06-09 13:53:18 +02:00
Bastian Kleineidam
52b8a0aef1
No need to rename dist file.
2014-06-09 13:50:42 +02:00
Bastian Kleineidam
68afeaf82d
Make appname lowercase.
2014-06-09 13:24:58 +02:00
Bastian Kleineidam
531e612834
Updated webpage meta info
[ci skip]
2014-06-09 13:23:35 +02:00
Bastian Kleineidam
4a87741eff
Correct distribution file name
2014-06-09 08:22:30 +02:00
Bastian Kleineidam
bb8021e3ea
Use sdist to construct release.
2014-06-09 07:55:40 +02:00
Bastian Kleineidam
cc651b18ac
Bump up version and upload to PyPI.
2014-06-08 21:58:14 +02:00
Bastian Kleineidam
c3f69cd6bb
Updated changelog.
[ci skip]
2014-06-08 13:41:03 +02:00
Bastian Kleineidam
00e424aed0
Fix zenpencils.
2014-06-08 13:40:42 +02:00
Bastian Kleineidam
687d27d534
Stripping should be done in normaliseUrl.
2014-06-08 10:12:33 +02:00