Commit graph

960 commits

Author SHA1 Message Date
Tobias Gruetzmacher
d495d95ee0 Refactor: Move repeated check into its own function. 2014-10-13 21:29:54 +02:00
Tobias Gruetzmacher
3235b8b312 Pass unicode strings to lxml.
This reverts commit fcde86e9c0 & some
more. This lets python-requests do all the encoding stuff and leaves
LXML with (hopefully) clean unicode HTML to parse.
2014-10-13 19:39:48 +02:00
Bastian Kleineidam
e87f5993b8 Merge branch 'master' into htmlparser 2014-08-07 18:10:15 +02:00
Bastian Kleineidam
f76006d89d Merge branch 'master' of github.com:wummel/dosage 2014-08-06 20:01:46 +02:00
Bastian Kleineidam
b9f7fb23e7 Updated votes
[ci skip]
2014-08-06 01:56:37 +02:00
Tobias Gruetzmacher
08175d28c9 Fix Ruthe (see #73). 2014-07-31 21:27:49 +02:00
Tobias Gruetzmacher
ca2d722d39 Fix DieFruehreifen (closes #73). 2014-07-31 21:18:15 +02:00
Tobias Gruetzmacher
6c7fb176b1 Add Blade Kitten as an example for the new parser. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
f9f0b75d7c Create new HTML parser based scraper class. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fcde86e9c0 Change getPageContent to (optionally) return raw text.
This allows LXML to do its own "magic" encoding detection
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
0e03eca8f0 Move all regular expression operations into the new class.
- Move fetchUrls, fetchUrl and fetchText.
- Move base URL handling.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fde1fdced6 Fix some typos. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
2567bd4e57 Convert starters and other helpers to new interface.
This allows those starters to work with future scrapers.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
4265053846 Refactor: Move regular expression scraping into a new class.
- This also makes "<base href>" handling an internal detail of the regular
  expression scraper; future scrapers might not need that or might handle
  it in another way.
2014-07-26 11:28:43 +02:00
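For illustration, a minimal sketch of what "<base href>" handling inside a regular expression scraper could look like. This is not the project's actual code: the patterns and the names resolve_base and fetch_image_urls are hypothetical, and the sketch assumes double-quoted attributes with src directly after the tag name.

```python
import re
from urllib.parse import urljoin

# Hypothetical patterns; real pages may quote or order attributes differently.
BASE_HREF = re.compile(r'<base\s+href="([^"]+)"', re.IGNORECASE)
IMG_SRC = re.compile(r'<img\s+src="([^"]+)"', re.IGNORECASE)


def resolve_base(page_url, html):
    """Return the URL that relative links should be resolved against."""
    match = BASE_HREF.search(html)
    # A relative <base href> is itself resolved against the page URL.
    return urljoin(page_url, match.group(1)) if match else page_url


def fetch_image_urls(page_url, html):
    """Find image URLs by regex and make them absolute."""
    base = resolve_base(page_url, html)
    return [urljoin(base, src) for src in IMG_SRC.findall(html)]
```

Keeping the base lookup inside the regex-based helper means a parser-based scraper can resolve links in whatever way suits its library.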
Bastian Kleineidam
3a929ceea6 Allow comic text to be optional. Patch from TobiX 2014-07-24 20:49:57 +02:00
Bastian Kleineidam
950dd2932c Remove stray print statement. 2014-07-21 20:20:15 +02:00
Bastian Kleineidam
bc6279f2ab Merge branch 'master' of github.com:wummel/dosage 2014-07-21 20:19:17 +02:00
Tobias Gruetzmacher
ea5d533e30 Fix index lookups for SnowFlame and SnowFlakes. 2014-07-19 13:23:42 +02:00
Bastian Kleineidam
05f0afdf99 Updated votes
[ci skip]
2014-07-16 02:02:14 +02:00
Bastian Kleineidam
dd51f1618d Updated votes
[ci skip]
2014-07-09 01:40:43 +02:00
Bastian Kleineidam
011ef49b94 Updated webpage meta info
[ci skip]
2014-07-03 22:01:51 +02:00
Bastian Kleineidam
c6debcfe1c Bump up version 2014-07-03 21:49:02 +02:00
Bastian Kleineidam
920a7302a2 Set release date.
[ci skip]
2014-07-03 18:44:57 +02:00
Bastian Kleineidam
4d49d4394b Fix doc 2014-07-03 18:42:06 +02:00
Bastian Kleineidam
f194e430bc TheThinHLine: fetch bigger images and name image files from sequence number. 2014-07-03 18:41:25 +02:00
Bastian Kleineidam
4845a4ccc1 Merge branch 'master' of github.com:wummel/dosage 2014-07-03 17:12:42 +02:00
Bastian Kleineidam
641daa738b Updated list of comics 2014-07-03 17:12:25 +02:00
Bastian Kleineidam
93fe5d5987 Minor useragent refactoring 2014-07-03 17:12:25 +02:00
Bastian Kleineidam
4c2a339e25 Fix some comics. 2014-07-02 19:51:53 +02:00
Luc Fouin
cb76198da7 added the thin H line, fixes #67 2014-07-02 17:14:33 +02:00
Luc Fouin
763f9b02a2 added the thin H line 2014-07-02 17:11:33 +02:00
Bastian Kleineidam
b03ba158ef Fixed LookingForGroup 2014-07-01 23:44:01 +02:00
Bastian Kleineidam
2170b5a7ad Updated votes
[ci skip]
2014-06-25 01:47:24 +02:00
Bastian Kleineidam
3485e2ac54 Added Whomp. 2014-06-24 20:48:49 +02:00
wummel
a0086bfcd8 Merge pull request #63 from sehrgut/master
Updated GirlGenius to new markup
2014-06-24 20:40:15 +02:00
Bastian Kleineidam
923b4d73d5 Merge branch 'master' of github.com:wummel/dosage 2014-06-23 22:46:47 +02:00
Bastian Kleineidam
fc6c54709f Remove freecode submit code. 2014-06-23 22:21:03 +02:00
Peter B
8f1c864ec3 Added Safely Endangered 2014-06-17 01:05:11 -04:00
Keith Beckman
236b840363 Updated GirlGenius to new markup
GG markup has changed, so I fixed the prevSearch regex to find the
"previous" button on the redesigned page.

As well, I set multipleImagesPerStrip to true, since there are quite a
few comics with multiple images that were being discarded.
2014-06-13 16:43:40 -04:00
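As a rough sketch of the effect described in this commit message: a flag like multipleImagesPerStrip decides whether only the first regex match or every match on the page is kept. The markup, the pattern, and the find_images helper below are hypothetical and are not taken from the GirlGenius module.

```python
import re

# Hypothetical image pattern, for illustration only.
IMAGE_SEARCH = re.compile(r'<img[^>]+src="([^"]+\.jpg)"')


def find_images(html, multiple_images_per_strip=False):
    """Return matched image URLs; keep only the first unless the flag is set."""
    urls = IMAGE_SEARCH.findall(html)
    if not multiple_images_per_strip:
        # Without the flag, any additional images on the page are dropped.
        return urls[:1]
    return urls
```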
Bastian Kleineidam
94090da813 Use Pypi download.
[ci skip]
2014-06-09 14:24:18 +02:00
Bastian Kleineidam
54a10568d9 Don't check since author_email is not set. 2014-06-09 13:53:18 +02:00
Bastian Kleineidam
52b8a0aef1 No need to rename dist file. 2014-06-09 13:50:42 +02:00
Bastian Kleineidam
68afeaf82d Make appname lowercase. 2014-06-09 13:24:58 +02:00
Bastian Kleineidam
531e612834 Updated webpage meta info
[ci skip]
2014-06-09 13:23:35 +02:00
Bastian Kleineidam
4a87741eff Correct distribution file name 2014-06-09 08:22:30 +02:00
Bastian Kleineidam
bb8021e3ea Use sdist to construct release. 2014-06-09 07:55:40 +02:00
Bastian Kleineidam
cc651b18ac Bump up version and upload to Pypi. 2014-06-08 21:58:14 +02:00
Bastian Kleineidam
c3f69cd6bb Updated changelog.
[ci skip]
2014-06-08 13:41:03 +02:00
Bastian Kleineidam
00e424aed0 Fix zenpencils. 2014-06-08 13:40:42 +02:00
Bastian Kleineidam
687d27d534 Stripping should be done in normaliseUrl. 2014-06-08 10:12:33 +02:00