1544 commits

Author SHA1 Message Date
Tobias Gruetzmacher
abb72a3a24 Fix CloneManga modules. 2017-02-13 23:41:45 +01:00
Tobias Gruetzmacher
ebbb27d05d Move xpath_class to helpers module. 2017-02-13 22:41:17 +01:00
Tobias Gruetzmacher
20ab279cde Clean up SmackJeeves...
Currently only covers already existing modules: Removed 11 broken
modules, added 2 and tried to update comic names and the adult and
endOfLife flags from their index. This isn't helped by the fact that
their search seems to skip some comics...
2017-02-13 01:46:49 +01:00
Tobias Gruetzmacher
83187b0554 Fix ViiviJaWagner. 2017-02-12 20:29:57 +01:00
Tobias Gruetzmacher
657e61811d Update list of old and removed modules. 2017-02-12 20:17:07 +01:00
Tobias Gruetzmacher
3b6af33ecb Some small module fixes. 2017-02-12 20:15:25 +01:00
Tobias Gruetzmacher
5359dd8629 Update ComicFury again... 2017-02-12 19:50:51 +01:00
Tobias Gruetzmacher
9895014655 Fix PHD with an ugly hack... 2017-02-12 16:21:36 +01:00
Tobias Gruetzmacher
b57945efd1 Update GoComic modules. 2017-02-12 12:21:01 +01:00
Tobias Gruetzmacher
ebe98bc8ba Fix some modules. 2017-02-12 02:16:38 +01:00
Tobias Gruetzmacher
20ca5d7fc2 Fix some modules. 2017-02-06 00:05:05 +01:00
gruetzkopf
edb49faa8b Add support for 'The Monster under the Bed' 2017-01-22 00:11:05 +01:00
Tobias Gruetzmacher
c4a184d173 Remove some vanished modules. 2017-01-12 02:01:10 +01:00
Tobias Gruetzmacher
36ac459bed Add removed GoComics modules to old list. 2017-01-12 01:22:13 +01:00
Tobias Gruetzmacher
a183e812ae Update GoComics module for new site layout.
(fixes #77)
2017-01-11 02:21:05 +01:00
Tobias Gruetzmacher
061efaac6e New module for ComicSherpa (removed from GoComics) 2017-01-11 01:34:52 +01:00
John Safrit
969e633877 Fix pattern for The Devils Panties 2017-01-08 17:39:59 -05:00
Tobias Gruetzmacher
3f9feec041 Allow modules to ignore some HTTP error codes.
This is necessary since it seems some webservers out there are
misconfigured to deliver actual content with an HTTP error code...
2016-11-01 18:25:02 +01:00
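A minimal sketch of the idea in commit 3f9feec041, assuming a per-module list of tolerated status codes. The names `HTTPStatusError`, `check_status` and `ignored_codes` are hypothetical and are not taken from the Dosage source.

```python
class HTTPStatusError(Exception):
    """Raised for an HTTP error status that the module does not tolerate."""


def check_status(status_code, ignored_codes=()):
    """Raise for 4xx/5xx codes unless the module opted to ignore them."""
    if status_code >= 400 and status_code not in ignored_codes:
        raise HTTPStatusError("HTTP error %d" % status_code)
    return status_code
```

A module for a misconfigured server might pass `ignored_codes=(404,)`, so that a page delivered with a 404 status is still parsed.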
Tobias Gruetzmacher
46b7a374f6 Small GoComics update. 2016-11-01 02:51:00 +01:00
Tobias Gruetzmacher
f7f4e130bf Small fix to the WLP module. 2016-11-01 02:27:29 +01:00
Tobias Gruetzmacher
bc755d09a3 Apply link modifier to all links.
This was previously only the "previous link modifier", now it can also
modify "next" and "latest" links. Additionally, the modifier is given
the current URL, so those cases can be distinguished.
2016-11-01 01:50:44 +01:00
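As an illustration of commit bc755d09a3, a link modifier that receives both the current URL and the link it found could look roughly like this. The function name, its argument order and the example rewrite rule are assumptions made for this sketch, not the actual Dosage code.

```python
def link_modifier(from_url, to_url):
    """Rewrite a navigation link, using the current page to tell cases apart.

    Hypothetical rule: links found on archive pages point at a print view,
    so the query string is stripped there and left alone everywhere else.
    """
    if "/archive/" in from_url:
        return to_url.replace("?view=print", "")
    return to_url
```

Because the modifier sees the page the link came from, a single function can treat "previous", "next" and "latest" links differently where a site requires it.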
Tobias Gruetzmacher
7fc05f75f5 Remove broken PetiteSymphony comics. 2016-10-31 07:16:10 +01:00
Tobias Gruetzmacher
69e6318f87 Remove ScurryAndCover, too much JavaScript. 2016-10-31 07:04:00 +01:00
Tobias Gruetzmacher
47e2502ec7 Fix a bunch of comic modules. 2016-10-31 06:57:47 +01:00
Tobias Gruetzmacher
446b81fc45 Fix Wumo and friends. 2016-10-30 15:28:54 +01:00
Tobias Gruetzmacher
51ed898f5d Fix some SmackJeeves comics. 2016-10-30 14:30:45 +01:00
Tobias Gruetzmacher
b6d99945f6 Merge pull request #73 from acaranta/master
Added several SmackJeeves Comics
2016-10-30 11:55:17 +01:00
Tobias Gruetzmacher
3b9f30affd Update ComicFury modules. 2016-10-30 11:04:45 +01:00
Tobias Gruetzmacher
a02660a7d3 Replace custom @memoized with stdlib @lru_cache. 2016-10-29 00:46:49 +02:00
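For readers unfamiliar with the standard-library decorator named in commit a02660a7d3, here is a small sketch of `functools.lru_cache` memoizing a function. The function `load_listing` and the `calls` list are hypothetical and exist only to make the caching visible.

```python
from functools import lru_cache

calls = []  # records every real (uncached) invocation


@lru_cache(maxsize=None)
def load_listing(name):
    """Pretend to do expensive work; repeated calls are served from the cache."""
    calls.append(name)
    return name.upper()
```

Calling `load_listing("xkcd")` twice runs the function body once; the second call returns the cached result.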
Tobias Gruetzmacher
9a6a310b76 Fixup copyright years. 2016-10-29 00:21:41 +02:00
acaranta
83880a3cbd corrected RainbowMansion 2016-10-27 09:53:34 +02:00
acaranta
0ed823175c Added even more Smackjeeves comics 2016-10-27 06:58:57 +02:00
acaranta
a5c9a3c35c Added several SmackJeeves Comics 2016-10-26 05:25:13 +02:00
Peter Brunner
19445a83ae Fix smbc 2016-10-18 21:28:42 -04:00
Tobias Gruetzmacher
f94caa8a16 Use terminal size calculation from standard library. 2016-10-14 23:55:10 +02:00
Tobias Gruetzmacher
06be2a026b Move some ex-KeenSpot comics to shorter names. 2016-10-14 14:23:33 +02:00
Tobias Gruetzmacher
b17d6e5f22 Rework/fix KeenSpot modules. 2016-10-14 00:14:53 +02:00
Tobias Gruetzmacher
064e7976ec Add namer for Extra Fabulous Comics. 2016-10-06 00:42:50 +02:00
mostlyuseful
fce7dfff19 Add "Extra Fabulous Comics" comic 2016-10-04 17:06:50 +02:00
Tobias Gruetzmacher
f342a93aa1 Update GoComics module. 2016-10-01 03:39:36 +02:00
Tobias Gruetzmacher
c0d945a563 Update ComicFury modules. 2016-10-01 02:52:33 +02:00
Tobias Gruetzmacher
98c98ddfab Fix some more comic modules (c-f). 2016-09-30 00:15:45 +02:00
Tobias Gruetzmacher
b1d2650615 Fix some modules (a&b). 2016-09-29 01:29:01 +02:00
Damjan Košir
c04c62e92b xkcd now done with xpaths 2016-08-18 21:28:25 +12:00
Damjan Košir
9ba184eb43 fixing LoadingArtist 2016-08-16 21:20:35 +12:00
Hubert Figuière
afcd19bf5b Added Prince of Sartar Comic 2016-08-08 09:18:33 -04:00
Hubert Figuière
81821dc450 Added Space Junk Arlia comic 2016-08-08 09:18:33 -04:00
Tobias Gruetzmacher
fb37f946e0 Speed up comic module tests.
This fakes an If-Modified-Since header, so most web servers don't need
to send comic images at all. This should also reduce the amount of data
that needs to be fetched for comic module tests.
2016-08-01 00:44:34 +02:00
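The technique in commit fb37f946e0 can be sketched as follows, assuming the test suite only needs to know that an image exists and not what it contains. The helper names `conditional_headers` and `needs_download` are hypothetical.

```python
from email.utils import formatdate


def conditional_headers(timestamp):
    """Return headers asking for the body only if it changed after ``timestamp``."""
    return {"If-Modified-Since": formatdate(timestamp, usegmt=True)}


def needs_download(status_code):
    """A 304 Not Modified response carries no body, so nothing needs saving."""
    return status_code != 304
```

Passing a recent timestamp makes most servers answer with 304 and no image data, which is what keeps such tests fast.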
Tobias Gruetzmacher
4f80016bf0 Change robotparser import to make PyInstaller happy. 2016-06-06 22:42:01 +02:00
Tobias Gruetzmacher
64c8e502ca Ignore case for comic download directories.
Since we already match comics case-insensitively on the command line, this
was a logical step, even if this means changing quite a bit of code that
all tries to resolve the "comic directory" in a slightly different
way...
2016-06-06 00:08:29 +02:00
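One possible way to resolve a comic directory while ignoring case, as described in commit 64c8e502ca, is sketched below. The function `find_comic_dir` is hypothetical; the commit message suggests the real change touched several places that each resolved the directory slightly differently.

```python
import os


def find_comic_dir(basepath, name):
    """Return an existing directory matching ``name`` case-insensitively.

    Falls back to ``basepath/name`` if no such directory exists yet.
    """
    wanted = name.lower()
    for entry in sorted(os.listdir(basepath)):
        candidate = os.path.join(basepath, entry)
        if entry.lower() == wanted and os.path.isdir(candidate):
            return candidate
    return os.path.join(basepath, name)
```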
Tobias Gruetzmacher
215d597573 Remove DrunkDuck for now.
- It's been disabled for ages
- Needs a major rework
- I don't want to add that many comics anyways...
- This also gets rid of make_scraper :)
2016-06-05 22:22:17 +02:00
Tobias Gruetzmacher
67d0d38100 Migrate SnafuComics to single-class module. 2016-06-05 22:12:16 +02:00
Tobias Gruetzmacher
125c96e9dc Remove command to download ALL comics... 2016-06-05 21:57:56 +02:00
Tobias Gruetzmacher
df2048cb34 Keep track of removed and moved comics (fixes #41).
I plan on keeping this list for at least ~ 2 releases and then purging
older entries...
2016-06-05 21:47:58 +02:00
Tobias Gruetzmacher
9b755a7e6c Restore BobWhite. 2016-06-05 18:32:27 +02:00
Tobias Gruetzmacher
603fd62a1e Fix workaround for PyInstaller... 2016-06-05 16:01:35 +02:00
Tobias Gruetzmacher
295b53a2d3 Fix name overrides (broken by 51008a). 2016-06-05 10:03:29 +02:00
Tobias Gruetzmacher
844bec09ba Remove another dead comic from ComicFury. 2016-06-05 01:06:04 +02:00
Tobias Gruetzmacher
12123961a4 Fix error in PyInstaller packaged application. 2016-06-05 00:34:16 +02:00
André-Patrick Bubel
2b8e948868 Add String Theory comic 2016-06-01 11:19:17 +00:00
André-Patrick Bubel
192751073c Add KillSixBillionDemons comic 2016-05-31 07:28:32 +00:00
Tobias Gruetzmacher
807bee6342 Migrate GoComics to single-class module. 2016-05-23 00:01:10 +02:00
Tobias Gruetzmacher
2c8e57bdea Migrate Creators to single-class module. 2016-05-22 23:56:59 +02:00
Tobias Gruetzmacher
f5dff27b0a Migrate SmackJeeves to single-class module. 2016-05-22 23:54:21 +02:00
Tobias Gruetzmacher
1ea20e1743 Migrate WebcomicFactory to single-class module. 2016-05-22 23:40:58 +02:00
Tobias Gruetzmacher
c62a7283a2 Migrate ComicFury to single-class module. 2016-05-22 23:31:53 +02:00
Tobias Gruetzmacher
1834bf179f Migrate Arcamax to single-class module. 2016-05-22 23:17:24 +02:00
Tobias Gruetzmacher
f29472c143 Make auto-update script more flexible. 2016-05-22 23:06:05 +02:00
Tobias Gruetzmacher
e4650d5941 Remove make_scraper from Nitrocosm. 2016-05-21 14:35:53 +02:00
Tobias Gruetzmacher
b6eb8ab8ef Remove make_scraper from SandraAndWoo 2016-05-21 14:12:11 +02:00
Tobias Gruetzmacher
4630ea047c Implement Oglaf's strange navigation (fixes #33)
(also should fix wummel#91)
2016-05-21 02:38:07 +02:00
Tobias Gruetzmacher
51008a975b Refactor: Introduce generator methods for scrapers
This allows one comic module class to generate multiple scrapers. This
change is to support a more dynamic module system as described in #42.
2016-05-21 01:29:36 +02:00
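The generator idea from commit 51008a975b could look roughly like this: a class method returns one or more scraper instances, so a single class can stand for a whole hosting site. The class names, the `comics` attribute and the method name `getmodules` are assumptions for this sketch, not a copy of the Dosage API.

```python
class Scraper:
    """Base class: by default one class yields exactly one scraper."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def getmodules(cls):
        return (cls(cls.__name__),)


class ExampleHost(Scraper):
    """A hypothetical hosting site that carries several comics."""

    comics = ("Alpha", "Beta")

    @classmethod
    def getmodules(cls):
        # One scraper instance per hosted comic, all sharing this class.
        return tuple(cls("ExampleHost/" + comic) for comic in cls.comics)
```

With this shape, adding a comic to a hosting site means adding an entry to a list instead of generating a new class.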
Tobias Gruetzmacher
89cfd9d310 Add comics from catomix.com. 2016-05-16 23:55:41 +02:00
Tobias Gruetzmacher
a6cf4e7040 Fix some more comic modules. 2016-05-16 23:16:29 +02:00
Tobias Gruetzmacher
be1a63da0c Update GoComics comic list. 2016-05-16 18:26:45 +02:00
Tobias Gruetzmacher
6d3f74142c Move command line tool into package.
This way we can use the default Python console_scripts install process.
2016-05-16 14:57:47 +02:00
Tobias Gruetzmacher
b9d9564085 Fix Dilbert (fixes #44). 2016-05-16 01:21:23 +02:00
Tobias Gruetzmacher
e9b3c487c0 Remove some dead comics. 2016-05-16 01:10:20 +02:00
Tobias Gruetzmacher
bd60155d9f Some more ComicFury comics gone... 2016-05-16 00:53:22 +02:00
Tobias Gruetzmacher
849e60e795 Remove make_scraper magic from webcomiceu. 2016-05-07 03:20:01 +02:00
Tobias Gruetzmacher
975d2376bf Another round of comic module fixes. 2016-05-07 01:50:10 +02:00
Tobias Gruetzmacher
efe1308db2 Replace home-grown Python2/3 compat. with six. 2016-05-05 23:33:48 +02:00
Tobias Gruetzmacher
77ed0218e0 Fix some comic modules. 2016-05-05 20:55:14 +02:00
Tobias Gruetzmacher
bb2ac39639 Fix some URLs. 2016-05-05 10:12:03 +02:00
Tobias Gruetzmacher
d05316e3ac Seems ComicFury is deleting comics regularly...
Well, there's nothing we can do: Remove them.
2016-05-04 08:26:53 +02:00
Tobias Gruetzmacher
0c1aa9e8bd Move libxml < 2.9.3 workaround to base class. 2016-05-02 23:22:06 +02:00
Tobias Gruetzmacher
b93a8fde65 Move PensAndTales comics and fix them. 2016-05-02 22:32:14 +02:00
Tobias Gruetzmacher
4006ced43d Move all HijinksEnsue comics into alphabetic files. 2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher
d5f91ecfd2 Fix some modules in m.py. 2016-04-30 01:59:28 +02:00
Tobias Gruetzmacher
1d52d33311 Remove missing SmackJeeves comics. 2016-04-30 00:56:20 +02:00
Tobias Gruetzmacher
d796f3476c Fix some modules in d.py. 2016-04-30 00:44:18 +02:00
Tobias Gruetzmacher
cc16fea880 Fix some modules in c.py 2016-04-29 00:35:02 +02:00
Tobias Gruetzmacher
1d94439715 Fix some more comic modules. 2016-04-27 00:31:27 +02:00
Tobias Gruetzmacher
8b1ac4eb35 Fix "tagsoup" on SmackJeeves
Unfortunately, browsers render < outside of HTML tags differently than
libXML did until recently (libXML 2.9.3), so we need to preprocess pages
before parsing them...

(This was fixed in libXML commit 140c25)
2016-04-26 08:05:38 +02:00
Tobias Gruetzmacher
035d6e94e4 Allow output level for warnings and errors. 2016-04-26 07:53:53 +02:00
Tobias Gruetzmacher
8ddf553eb4 Fix some more SmackJeeves modules. 2016-04-22 01:04:47 +02:00
Tobias Gruetzmacher
fd85c8583a Unify similar code in fetchUrl and fetchText 2016-04-22 00:42:46 +02:00
Tobias Gruetzmacher
6574997e01 Refactor: All the other class methods.
Turns out, it would have been better if all methods had been instance
methods and not class methods. This finished a big chunk of the rework
needed for #42.
2016-04-21 23:52:31 +02:00
Tobias Gruetzmacher
0d436b8ca9 Refactor: url modifiers to normal methods.
As before, to implement #42 these might want to access information from
the instance, so they should be normal methods.
2016-04-21 21:39:25 +02:00
Tobias Gruetzmacher
c3f32dfef7 Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
Tobias Gruetzmacher
5bd2a49f48 Add debug output on matched XPath/CSS expression. 2016-04-20 23:51:54 +02:00
Tobias Gruetzmacher
fe51a449df Update SmackJeeves
- Now uses _ParserScraper, which makes the pattern quite a bit more
  generic and IMHO more readable
- remove make_scraper magic
- No new comics, only fixed existing ones and removed some dead ones.
2016-04-20 23:36:45 +02:00
Tobias Gruetzmacher
190cd3b063 Convert language & getDisabledReasons to methods.
Both are more properties of a webcomic (this is part of the design
changes for #42)
2016-04-19 23:53:46 +02:00
Tobias Gruetzmacher
df46907f39 Register EXSLT extensions by default.
This allows comic module authors to use the full power of regular
expressions in XPath expression, see http://exslt.org/regexp/regexp.html
for usage. Please be aware that these use the prefix re: instead of
regexp: here.
2016-04-19 23:48:14 +02:00
Tobias Gruetzmacher
4204f5f1e4 Send "If-Modified-Since" header for images. 2016-04-19 00:36:50 +02:00
Tobias Gruetzmacher
13a3409854 Remove some comics that are gone or block us. 2016-04-17 19:42:43 +02:00
Tobias Gruetzmacher
1fbc844077 Update GoComics. 2016-04-17 18:40:09 +02:00
Tobias Gruetzmacher
73e958670d Update ComicFury (again). 2016-04-17 16:19:44 +02:00
Tobias Gruetzmacher
b0481a01f7 Update languages. 2016-04-16 13:14:12 +02:00
Tobias Gruetzmacher
3329027e4b Update ComicFury. 2016-04-16 13:13:47 +02:00
Tobias Gruetzmacher
ee99c087d7 Remove prevUrlMatchesStripUrl.
It was only used for one test.
2016-04-16 01:14:26 +02:00
Tobias Gruetzmacher
92a688457a Remove useless indirection. 2016-04-15 23:42:24 +02:00
Tobias Gruetzmacher
52515b5fc5 Update GoComics. 2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher
031a523846 Fix SnafuComics. 2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher
7626b1e100 Webcomics Nation is gone. 2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher
497653c448 Remove make_scraper magic from Arcamax. 2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher
db87ed95e7 Use new features to make modules simpler. 2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher
b266e28ae1 Remove debugging prints 😭 2016-04-13 22:59:06 +02:00
Tobias Gruetzmacher
ff3b824311 Fix variable shadowing... 2016-04-13 22:43:34 +02:00
Tobias Gruetzmacher
060281e5ff Use concrete scraper objects everywhere.
This is a first step for #42. Since most access to the scraper classes
is through instances, modules can now dynamically override url and name
(name is now a property).
2016-04-13 22:17:30 +02:00
Tobias Gruetzmacher
0468f2f31a Refactor: Convert starter to simple method. 2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher
16004e43e4 Use default bounceStarter for site modules. 2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher
9028724a74 Clean up update helper scripts. 2016-04-13 00:52:16 +02:00
Tobias Gruetzmacher
42e43fa4e6 Read starter parameters from class.
This allows specifying starters in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher
b865a171f9 Remove some broken comics. 2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher
4e2e4ac529 Prevent scraper from moving to a different comic. 2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher
443ab119e9 Refresh GoComics list from online directory. 2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher
0e385a3697 Update GoComics (no change in supported comics)
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher
ad7a297964 Fix WLP comics. 2016-04-11 01:07:21 +02:00
Damjan Košir
af2e57d850 Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...

Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher
fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher
f6e605e146 Fix unicode error in text search. 2016-04-10 13:16:30 +02:00
Tobias Gruetzmacher
bc10bd9a4d Streamline color output.
- Depend on external colorama instead of embedding an old copy.
- Move most output code into output module.
- Convert pager to context manager.
2016-04-10 03:45:00 +02:00
Tobias Gruetzmacher
bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher
0033a8046b Fix creators module. 2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher
8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and honestly, I'm not sure if I did get it right this time. But I
think, the current behaviour matches best what web browsers try to do:

1. Let Requests figure out the content from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
Tobias Gruetzmacher
183d18e7bc Skip non-image on xkcd. 2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher
9feaf245f2 Fixed & removed some comics in s.py. 2016-04-06 00:40:13 +02:00
Tobias Gruetzmacher
6bbdcfb341 BloomingFaeries: Don't download every page twice.
(Also, simplify namer, switch to _ParserScraper)
2016-04-05 23:58:43 +02:00
Tobias Gruetzmacher
8db6f8e8b7 Fix ZapComics, remove ZebraGirl.
- ZebraGirl is now ComicFury/ZebraGirl...
2016-04-04 00:27:11 +02:00
Tobias Gruetzmacher
0bcfb8a82e Move ComicControl into common module.
- Move all comics using ComicControl into alphabetical files.
- Add BalderDash & Picklewhistle
2016-04-04 00:12:53 +02:00
Tobias Gruetzmacher
0d453a6858 Move Flowerlark Studios into alphabetical files. 2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher
a9f0dfdce4 Merge pull request #39 from peterjanes/peterjanes/sherman-fix
Fix Sherman's Lagoon
2016-04-03 22:20:04 +02:00
Tobias Gruetzmacher
926439cd14 Every comic needs a URL. 2016-04-03 22:03:16 +02:00
Tobias Gruetzmacher
2c6decb7f5 Move WebcomicFactory in its own module.
Also, add an updater script for it.
2016-04-03 21:31:56 +02:00
Peter Janes
759bd0c360 Fix Sherman's Lagoon 2016-04-03 14:54:41 -04:00
Tobias Gruetzmacher
bb1f20d867 Remove make_scraper for most WordPress comics.
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher
7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher
d6db1d0b81 Fix a conflict with IPython. 2016-03-20 23:57:07 +01:00
Tobias Gruetzmacher
90dfceaeb1 Remove dead modules (& format). 2016-03-20 20:48:42 +01:00
Tobias Gruetzmacher
f243096d49 Fix GastroPhobia, remove GeneralProtectionFault.
(& formatting)
2016-03-20 20:11:21 +01:00
Tobias Gruetzmacher
cfcfcc2468 Switch plugin loading to pkgutil.
This should work with all PEP-302 loaders that implement iter_modules.
Unfortunately, PyInstaller (which I plan to use for Windows releases)
does not support it, so we don't get around a special case. Anyways,
this should help for #22.
2016-03-20 15:13:24 +01:00
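A minimal sketch of plugin discovery with `pkgutil`, in the spirit of commit cfcfcc2468. The function name `load_plugin_modules` is hypothetical, and the PyInstaller special case mentioned in the commit message is not covered here.

```python
import importlib
import pkgutil


def load_plugin_modules(package):
    """Import and return every direct submodule of ``package``, sorted by name."""
    prefix = package.__name__ + "."
    names = sorted(
        name
        for _finder, name, _ispkg in pkgutil.iter_modules(package.__path__, prefix)
    )
    return [importlib.import_module(name) for name in names]
```

`pkgutil.iter_modules` asks each importer on the package path for its modules, which is why it works with loaders other than the plain filesystem one.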
Tobias Gruetzmacher
1af022895e Fix NuklearPower (fixes #38).
Also remove make_scraper magic.
2016-03-17 23:19:52 +01:00
Tobias Gruetzmacher
552f29e5fc Update ComicFury comics. (+871, -245)
- Remove make_scraper magic
- Switch to HTML parser
- Update parsing of comic listing.
2016-03-17 00:44:06 +01:00
Tobias Gruetzmacher
6727e9b559 Use vendored urllib3.
As long as requests ships with urllib3, we can't fall back to the
"system" urllib3, since that breaks class-identity checks.
2016-03-16 23:18:19 +01:00
Damjan Košir
615f094ef3 fixing EdmundFinney 2016-03-14 20:32:18 +13:00
Tobias Gruetzmacher
c4fcd985dd Let urllib3 handle all retries. 2016-03-13 21:30:36 +01:00
Tobias Gruetzmacher
78e13962f9 Sort scraper modules (mostly for test stability). 2016-03-13 20:24:21 +01:00
Tobias Gruetzmacher
017d35cb3c Fallback version if pkg_resources not available.
This helps for Windows packaging.
2016-03-03 01:05:36 +01:00
Johannes Schöpp
351fa7154e Modified maximum page size
Fixes #36
2016-03-01 22:19:44 +01:00
Damjan Košir
b0dc510b08 adding LastNerdsOnEarth 2016-01-03 14:16:58 +13:00
Damjan Košir
a1e79cbbf2 fixing Fragile 2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher
81827f83bc Use GitHub releases API for update checks. 2015-11-06 23:07:19 +01:00
Tobias Gruetzmacher
a41574e31a Make version fetching a bit more robust (use pbr). 2015-11-06 22:08:14 +01:00
Tobias Gruetzmacher
64f7e313d5 Remove make_scraper magic from footloosecomic.py. 2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher
7f7a69818b Remove make_scraper magic from creators module. 2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher
94470d564c Fix import for Python 3. 2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher
b819afec39 Switch build to PBR.
This gets us:
- Automatic changelog
- Automatic authors list
- Automatic git version management
2015-11-03 23:27:53 +01:00
Tobias Gruetzmacher
dc22d7b32a Add CatNine comic. 2015-11-02 23:29:56 +01:00
Tobias Gruetzmacher
10d9eac574 Remove support for very old versions of "requests". 2015-11-02 23:24:01 +01:00
MariusK
3e1ea816cc Fixed 'Ruthe' 2015-10-02 13:52:44 +02:00
Helge Stasch
48d8519efd Changed Goblins comic - moved to new scraper and fixed minor issues with some comics (old scraper was unstable for some comics of Goblins) 2015-09-28 23:50:15 +02:00
Helge Stasch
17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher
d72ceb92d5 BloomingFaeries: Remove imageUrlModifier (not needed). 2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher
abd80a1d35 Merge pull request #28 from KevinAnthony/master
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher
b737218182 ZenPencils: Allow multiple images per page. 2015-09-03 23:24:28 +02:00
Kevin Anthony
62ec1f1d18 Removed debugging print state 2015-09-02 11:22:24 -04:00
Kevin Anthony
d7180eaf99 removed bad whitespace 2015-09-02 11:04:32 -04:00
Kevin Anthony
6e8231e78a Added Namer to BloomingFaeries since the web comic author doesn't seem interested in sticking to any kind of file naming convention 2015-09-02 11:01:48 -04:00
Kevin Anthony
1045bb7d4a added comic Blooming Faeries 2015-09-02 10:13:42 -04:00
Damjan Košir
11f0aa3989 created Wordpress Scraper class 2015-08-11 21:31:45 +12:00
Damjan Košir
0a5b792c32 added Fragile (English and Spanish) 2015-08-07 23:37:10 +12:00
Damjan Košir
fd9c480d9c adding bonus panel to SWBC and multiple images flag to ParserScraper 2015-08-03 22:58:44 +12:00
Damjan Košir
f8a163a361 added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis 2015-08-03 22:40:06 +12:00
Damjan Košir
648a84e38e added Sharksplode 2015-08-03 22:20:17 +12:00
Damjan Košir
c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Damjan Košir
2201c9877a added KiwiBlitz 2015-07-31 23:09:56 +12:00
Damjan Košir
fe22df5e5b added LetsSpeakEnglish 2015-07-31 23:06:06 +12:00
Damjan Košir
79ec427fc0 added CatVersusHuman 2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher
303432fc68 Also use css expressions for textSearch. 2015-07-18 01:22:40 +02:00
Tobias Gruetzmacher
6a70bf4671 Enable some comics based on current policy. 2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher
6b0046f9b3 Fix small typos. 2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher
68d4dd463a Revert robots.txt handling.
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.

Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge amounts of web sites in search for some content
to index, it's only intended to help users keep a personal archive of
comics they are interested in. We try very hard to never download any image
twice. This fixes #24.

(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots)
2015-07-17 20:46:56 +02:00
Tobias Gruetzmacher
7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher
472afa24d3 GoComics doesn't allow spiders, disable them...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher
7c15ea50d8 Also check robots.txt on image downloads.
We DO want to honour if images are blocked by robots.txt
2015-07-15 23:50:57 +02:00
Tobias Gruetzmacher
5affd8af68 More relaxed robots.txt handling.
This is in line with how Perl's LWP::RobotUA and Google handle server
errors when fetching robots.txt: Just assume access is allowed.

See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
2015-07-15 19:11:55 +02:00
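The relaxed policy from commit 5affd8af68 can be sketched with the standard-library robots.txt parser. This assumes the caller has already tried to fetch robots.txt and passes `None` when that failed; the function name `can_fetch` and that calling convention are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, user_agent, url):
    """Decide whether ``url`` may be fetched.

    ``robots_txt`` is the body of the robots.txt file, or None if the
    server returned an error while it was being fetched.
    """
    if robots_txt is None:
        # Server error while fetching robots.txt: assume access is allowed.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```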
Tobias Gruetzmacher
88e387ad15 Add Sleepless Domain. 2015-07-12 18:31:21 +02:00
Tobias Gruetzmacher
0b6d7425e1 Remove BladeKitten.
It's not available online anymore, only in print or as a PDF download.
2015-07-11 01:29:21 +02:00
Tobias Gruetzmacher
808b624e5f Remove hard dependency on pycountry again.
This basically reverts commit 86b31dc12b.

It now works like this: If the user has pycountry installed, it is used.
If not, Dosage falls back to a small internal list generated from
pycountry by scripts/mklanguages.py.

This means additional work if we ever decide to translate Dosage, since
pycountry already has all the translations for language names...

This fixes #23.
2015-07-11 01:27:39 +02:00
Tobias Gruetzmacher
d97a9c63e4 Add Erstwhile. 2015-07-10 01:14:56 +02:00
Damjan Košir
7abca1222b added NerfNow 2015-07-07 22:18:06 +12:00
Damjan Košir
119a3cd13a added text to ScandinaviaAndTheWorld 2015-07-07 19:48:25 +12:00
Damjan Košir
5f243e3868 not a comic 2015-07-05 18:33:14 +12:00
Damjan Košir
5e7ad33fc8 Nnewts disabled 2015-07-05 18:32:33 +12:00
Damjan Košir
45012ff9c3 BladeKitten disabled 2015-07-05 18:31:38 +12:00
Tobias Gruetzmacher
0c6feec8cd Fix module name EastCoastVsWestCoast. 2015-06-24 00:51:42 +02:00
Damjan Košir
96572e8cba added TheMelvinChronicles 2015-06-12 21:00:11 +12:00
Damjan Košir
6412e6e542 fixed Spinnerette 2015-06-08 20:31:13 +12:00
Damjan Košir
3d8a49d228 realised TheWebcomicFactory is actually 28 comics... added them 2015-06-07 21:33:59 +12:00
Damjan Košir
05bb22b3ef added TheWebcomicFactory 2015-06-06 14:25:32 +12:00
Damjan Košir
c98800388e added Sithrah 2015-06-04 19:24:55 +12:00
Damjan Košir
010b4bf669 renaming comicpress to wordpress (as it's not just for the comicpress theme) 2015-06-04 19:12:40 +12:00
Damjan Košir
bc91f5f1fb added MistyTheMouse 2015-06-04 19:06:40 +12:00
Damjan Košir
e2d01e4924 fixed ScandinaviaAndTheWorld 2015-06-04 18:58:59 +12:00
Damjan Košir
545a67111e fixed Alice 2015-06-01 15:15:34 +12:00
Damjan Košir
a08ad2dc80 fixed GoGetARoomie 2015-06-01 15:11:16 +12:00
Damjan Košir
ceb19ed2bc fixed Wulffmorgenthaler (now Wumo), added TruthFacts and MeAndDanielle 2015-06-01 12:14:52 +12:00
Damjan Košir
4cd88ecdc0 fixed WormWorldSaga 2015-06-01 11:45:22 +12:00
Damjan Košir
ea6cb925a6 fixed LoadingArtist 2015-06-01 11:33:50 +12:00
Damjan Košir
e268b09567 fixed EarthsongSaga 2015-06-01 11:19:02 +12:00
Damjan Košir
29c8d2eea0 fixed Meek 2015-05-31 23:41:12 +12:00
Damjan Košir
9be6f613e4 fixed MysteriesOfTheArcana 2015-05-31 23:39:04 +12:00
Damjan Košir
3ea8236224 fixed FowlLanguage 2015-05-31 23:29:34 +12:00
Damjan Košir
c1245a85ad moved Footloose, added Cherry, Desigaspring 2015-05-31 23:23:02 +12:00
Damjan Košir
01aeebfbe4 fixed Footloose 2015-05-31 23:16:12 +12:00
Damjan Košir
029fa74067 fixed Bardsworth 2015-05-31 23:03:40 +12:00
Damjan Košir
f3036de8fd fixed Pimpette 2015-05-31 22:57:25 +12:00
Damjan Košir
df7404fd7c fixed CatsAndCameras 2015-05-31 22:50:17 +12:00
Damjan Košir
d4cc8ac857 added buni 2015-05-27 20:36:11 +12:00
Damjan Košir
9beeceffad added BusinessCat and HappyJar 2015-05-27 20:34:51 +12:00
Damjan Košir
d970d27b14 removing duplicate 2015-05-27 00:10:46 +12:00
Damjan Košir
33abd95348 fixed TheGentlemansArmchair 2015-05-26 23:48:22 +12:00
Damjan Košir
5e123ae79e fixed DarkWings (now available under the real name Eryl as well), added Ashes, Laiyu, NoMoreSavePoints and EasilyAmused 2015-05-26 23:43:15 +12:00
Damjan Košir
9adb020fc2 fixed DemolitionSquad 2015-05-26 22:59:25 +12:00
Damjan Košir
605c5f8619 fixed PokeyThePenguin 2015-05-26 22:31:43 +12:00
Damjan Košir
766b7ba99d fixed ProperBarn, added 2214 and OTE 2015-05-26 22:16:55 +12:00
Damjan Košir
2c41435ceb fixing HijiNKS ENSUE and added all 4 comics on that page 2015-05-26 22:06:55 +12:00
Damjan Košir
465e7eaf6f fixing CowboyJedi kinda... there is currently no comic on the front page and the author knows it 2015-05-26 21:35:36 +12:00
Damjan Košir
529a41397a fixing CorydonCafe 2015-05-26 21:32:25 +12:00
Damjan Košir
c3abb93e99 fixing ChainsawSuit 2015-05-26 19:53:04 +12:00
Damjan Košir
f8690af029 fixing Curvy 2015-05-26 19:47:31 +12:00
Damjan Košir
36c790fa4b fixing CraftedFables 2015-05-26 19:32:12 +12:00
Damjan Košir
7067c51056 fixed CheckerboardNightmare 2015-05-25 22:19:36 +12:00
Damjan Košir
5569439c43 fixed 16 comics 2015-05-25 21:57:06 +12:00
Damjan Košir
3edaa97fb9 fixing KatzenfutterGeleespritzer 2015-05-25 20:06:58 +12:00
Damjan Košir
8a245e1d10 fixing BloodBound 2015-05-21 00:04:07 +12:00
Damjan Košir
dc2349951a moving BroodHollow to comicpress 2015-05-21 00:00:35 +12:00
Damjan Košir
a05ae9c75d fixing PandyLand 2015-05-20 23:56:49 +12:00
Damjan Košir
fd60065591 fixing OnTheEdge 2015-05-20 23:50:18 +12:00
Damjan Košir
80b783c016 fixing CourtingDisaster 2015-05-20 23:16:54 +12:00
Damjan Košir
ff239ff58e Merge branch 'comicpress' 2015-05-20 23:12:03 +12:00
Damjan Košir
77c5dbce9b better prevSearch for comic press 2015-05-20 23:08:02 +12:00
Damjan Košir
bc4e7a03f2 fixed BroodHollow 2015-05-20 23:03:15 +12:00
Damjan Košir
8de620c78b fixed CigarroAndCerveja 2015-05-20 22:58:13 +12:00
Damjan Košir
4529fdee3b adding no downsize option 2015-05-20 22:38:29 +12:00
Damjan Košir
77a9cce00d fixing Hipsters 2015-05-19 19:49:45 +12:00
Damjan Košir
79d775a8d9 adding comicpress scraper 2015-05-16 00:15:32 +12:00
Damjan Košir
962286d391 fixed OctopusPie 2015-05-14 23:06:12 +12:00
Damjan Košir
3bbf2d5c23 fixing neko the kitty 2015-05-14 22:42:04 +12:00
Damjan Košir
f75fc62e84 fixing pebbleversion 2015-05-14 22:33:46 +12:00
Helge Stasch
5a1ef9b791 Fixed problem with LookingForGroup comic 2015-05-07 13:57:10 +02:00
Damjan Košir
9a009018c7 adding strip Moonsticks 2015-05-07 23:00:55 +12:00
Helge Stasch
64a875388f Added Comic MaxOveracts 2015-05-04 14:06:01 +02:00
Marc Winkelmann
69e5b8ad93 Shermans Lagoon and On The Fastrack working again. Also corrected name. 2015-05-02 22:27:08 +02:00
DirkReiners
1438330a94 Fixes and Additions...
Fixed SabrinaOnline
Fixed SMBC
Added StandStillStaySilent (partial, prevsearch not working yet)
2015-04-29 10:37:14 -05:00
DirkReiners
749beff7a3 Added MareInternum (marecomic.com) 2015-04-29 10:36:12 -05:00
DirkReiners
273b429fcd Merge branch 'master' of https://github.com/webcomics/dosage 2015-04-29 09:51:47 -05:00
Damjan Košir
391313972c fixed ManlyGuysDoingManlyThings 2015-04-26 23:47:38 +12:00
Damjan Košir
9837a87a43 fixed omake theater 2015-04-26 23:32:22 +12:00
Damjan Košir
8df9d20556 added doctor cat 2015-04-26 22:32:52 +12:00
Damjan Košir
dc427d6066 fixed the gamercat 2015-04-26 21:52:31 +12:00
Damjan Košir
561005887a unneeded max 2015-04-26 00:23:45 +12:00
Damjan Košir
ac7b0d7e0e adding parallel run option 2015-04-26 00:19:08 +12:00
Damjan Košir
1e94a3c7c5 now the same as official version 2015-04-25 20:52:03 +12:00
Damjan Košir
dae2698102 removing mismerge 2015-04-25 20:40:28 +12:00
Damjan Košir
dc014a7cb4 Merge remote-tracking branch 'upstream/master'
Conflicts:
	dosagelib/plugins/e.py
	dosagelib/plugins/i.py
	dosagelib/plugins/n.py
	dosagelib/plugins/s.py
	dosagelib/plugins/t.py
	dosagelib/plugins/w.py
2015-04-25 20:28:27 +12:00
DirkReiners
b8ef6958b9 Merge branch 'master' of https://github.com/webcomics/dosage 2015-04-24 15:38:36 -05:00
Helge Stasch
4cdd92dcd7 Added comic Magellan 2015-04-23 09:12:24 +02:00
Tobias Gruetzmacher
9f33c31c68 Merge pull request #12 from Freestila/master
Changed comic name, since comic is named FowlLanguage instead of FoulLan...

Conflicts:
	dosagelib/plugins/f.py
2015-04-22 22:24:26 +02:00
Tobias Gruetzmacher
bf9f45b380 Switch to setuptools and cleanup metadata.
py2exe support is gone for now, will be restored later.
2015-04-22 22:22:03 +02:00
Helge Stasch
8218e805b2 Changed comic name, since comic is named FowlLanguage instead of FoulLanguage 2015-04-22 21:25:10 +02:00
Tobias Gruetzmacher
bf9bf5e9b0 Merge pull request #11 from Freestila/master
Added "Ralf the Destroyer"
2015-04-21 23:46:11 +02:00
Tobias Gruetzmacher
86b31dc12b Depend on pycountry directly. 2015-04-21 21:56:54 +02:00
Helge Stasch
d7e9c8eb94 Added "Ralf the Destroyer" 2015-04-21 19:12:40 +02:00
Tobias Gruetzmacher
d5e7690419 Fix size comparison for RSS & HTML output.
This was always broken, but somehow worked with Python 2.7 (WTF?). Now
that we test with Pillow, this code path runs with Python 3 and throws
an error.
2015-04-21 00:01:23 +02:00
Tobias Gruetzmacher
ff21df596b Remove descriptions and genres (closes #9).
Maintaining the descriptions creates quite a bit of overhead (finding
them, copying them, checking if they are still correct) for a minimal
user benefit.

PS: Viewing this diff should be easier in a difftool that shows changes
in a line, for example kdiff3.
2015-04-20 20:29:09 +02:00
Tobias Gruetzmacher
3b33129e58 Fix ViiviJaWagner. 2015-04-18 22:45:13 +02:00
Tobias Gruetzmacher
e8af5adcb8 Update list of supported GoComics comics. 2015-04-18 02:04:31 +02:00
Tobias Gruetzmacher
f0831a1f0f Fix and update ArcaMax (fixes #8). 2015-04-17 21:53:13 +02:00
DirkReiners
99f33151e2 Merge branch 'master' of https://github.com/webcomics/dosage 2015-04-16 18:36:42 -05:00
DirkReiners
8f3a9f660a Fixed ASofterWorld 2015-04-16 18:35:21 -05:00
DirkReiners
49b964cb3c Added PS238 2015-04-16 18:20:14 -05:00
Manabi
65c021ef2b Fixed IAmArg 2015-04-15 14:43:06 -04:00
Manabi
475739ea60 Fixing DogHouseDiaries 2015-04-15 12:56:03 -04:00
Manabi
c0619e8dca Fixing DogHouseDiaries 2015-04-15 12:51:45 -04:00
Manabi
2b98a9023e Added Peanuts Begins & Wizard of Id Classics 2015-04-13 22:26:12 -04:00
Tobias Gruetzmacher
974752951b Fix xkcd (closes #3), remove adult tag (fixes wummel#85). 2015-04-12 20:06:34 +02:00
Tobias Gruetzmacher
5934f03453 Merge branch 'htmlparser' - I think it's ready.
This closes pull request #70.
2015-04-01 22:13:55 +02:00
Tobias Gruetzmacher
614c25e278 Fix coding style. 2015-03-22 17:13:53 +01:00
Tobias Gruetzmacher
e94e2ae432 Merge pull request #95 from serenitas50/master
Added comic Beetlebum (http://blog.beetlebum.de/).
2015-03-22 17:04:36 +01:00
Tobias Gruetzmacher
b5ed4c56b6 Merge pull request #94 from Manabi/master
Added definition for Drive comic

Conflicts:
	dosagelib/plugins/g.py
2015-03-22 16:34:07 +01:00
Tobias Gruetzmacher
b5368b366a Merge Gaia(German), SandraAndWoo(German) into common base.
This also fixes #97 by correcting the imageSearch regex.
2015-02-04 19:41:52 +01:00
Manabi
f85464ccb2 Fixed unclosed ' error
Lines 293/294 should have been one line, this is now fixed.
2015-02-02 04:35:49 -05:00
Manabi
190f53ee4d Fixing name of GunnkriggCourt
Existing name was missing a g.
2015-02-02 04:24:32 -05:00
Serenitas50
94004846cd Added comic Beetlebum (http://blog.beetlebum.de/). 2015-01-31 22:07:35 -02:00
Manabi
a5b0d0c5de Added definition for Drive comic 2015-01-26 04:21:24 -05:00
Dirk Reiners
b710d3fa81 Merge branch 'master' of https://github.com/wummel/dosage 2015-01-16 13:24:48 -06:00
Dirk Reiners
c6f0dd6117 PiledHigherAndDeeper: Fix for new website format 2015-01-16 12:06:17 -06:00
Dirk Reiners
e25270c866 Dilbert: Fix for new website format 2015-01-16 12:05:53 -06:00
Dirk Reiners
3724eba835 Cyanide And Happiness: Fix for new website format 2015-01-16 12:05:36 -06:00
Tobias Gruetzmacher
f8531eca57 Move SinFest back to KeenSpot namespace. 2015-01-16 00:16:28 +01:00
Tobias Gruetzmacher
4733153d01 Merge pull request #87 from rpglover64/master
Update SinFest to work with new website.
2015-01-16 00:15:04 +01:00
Alex Rozenshteyn
a0506b22f0 Update ZenPencils URL. 2014-12-16 13:51:52 -05:00
Alex Rozenshteyn
51996e45ed Update SinFest to work with new website. 2014-12-16 12:01:54 -05:00
Tobias Gruetzmacher
2c1ff889fa Fix scope in HTML output. 2014-12-10 00:57:17 +01:00
Tobias Gruetzmacher
b7bc16650a Merge branch 'carlosefonseca/master' 2014-12-10 00:07:21 +01:00
Tobias Gruetzmacher
5af4f45505 Merge branch 'zac9/patch-2' 2014-12-10 00:03:08 +01:00
Tobias Gruetzmacher
32265c99d7 Merge branch 'zac9/patch-1' 2014-12-10 00:00:51 +01:00
Carlos Fonseca
04cc07a466 Added comic Nimona 2014-12-08 13:28:37 +00:00
mbrandis
25cf4888ae - Adapted ShermansLagoon
- Better version of OnTheFastTrack
2014-11-14 20:37:06 +01:00
mbrandis
c63f927e5c - Modified OnTheFasttrack adapting the new API. 2014-11-14 20:09:42 +01:00
mbrandis
cd48801b0d - Added next and previous day at end of page. 2014-11-14 15:39:42 +01:00
Dirk Reiners
fda654b5e0 Some fixes...
AbstruseGoose: fixed prev
Carciphona: fixed latest
Curtailed: fixed image and prev (moved to WP)
DorkTower: fixed image search
GrrlPower: fixed site name issue
MadamAndEve: archive not updated in a long time, but current strip is.
Works, but needs to be run daily.
PennyArcade: fixed namer
PvPonline: fixed prev
2014-10-24 16:42:32 -05:00
Dirk Reiners
77a5e09c10 Minor fix for using paths to pick comics 2014-10-24 16:39:40 -05:00
Tobias Gruetzmacher
6769e1eb36 Add StrongFemaleProtagonist.
This uses the _ParserScraper and CSS selectors.
2014-10-13 23:39:50 +02:00
Tobias Gruetzmacher
1d52d6a152 Add support for CSS selectors to HTML parser.
Each comic module author can decide if she wants to use CSS or XPath,
not a mix of both. Using CSS needs the cssselect python module and the
module gets disabled if it is unavailable.
2014-10-13 22:43:06 +02:00
Tobias Gruetzmacher
17bc454132 Bugfix: Don't assume RE patterns in base class. 2014-10-13 22:29:47 +02:00
Tobias Gruetzmacher
e92a3fb3a1 New feature: Comic modules can be "disabled".
This is modeled parallel to the "adult" feature, except the user can't
override it via the command line. Each comic module can override the
classmethod getDisabledReasons and give the user a reason why this
module is disabled. The user can see the reason in the comic list (-l or
--singlelist) and the comic module refuses to run, showing the same
message.

This is currently used to disable modules that use the _ParserScraper if
the LXML python module is missing.
2014-10-13 21:43:46 +02:00
Tobias Gruetzmacher
d495d95ee0 Refactor: Move repeated check into its own function. 2014-10-13 21:29:54 +02:00
Tobias Gruetzmacher
3235b8b312 Pass unicode strings to lxml.
This reverts commit fcde86e9c0 & some
more. This lets python-requests do all the encoding stuff and leaves
LXML with (hopefully) clean unicode HTML to parse.
2014-10-13 19:39:48 +02:00
zac9
6ca200419a Update s.py 2014-09-28 19:48:26 -07:00
zac9
5b7ab5a711 Update o.py 2014-09-28 19:41:29 -07:00
zac9
491b5457b2 Added comic ShotgunShuffle 2014-09-28 06:29:02 -07:00
Bastian Kleineidam
731291979d Fixed RedMeat. 2014-09-22 22:14:31 +02:00
Bastian Kleineidam
e43694c156 Don't crash on multiple HTML output runs per day. 2014-09-22 22:00:16 +02:00
Bastian Kleineidam
e87f5993b8 Merge branch 'master' into htmlparser 2014-08-07 18:10:15 +02:00
Tobias Gruetzmacher
08175d28c9 Fix Ruthe (see #73). 2014-07-31 21:27:49 +02:00
Tobias Gruetzmacher
ca2d722d39 Fix DieFruehreifen (closes #73). 2014-07-31 21:18:15 +02:00
Tobias Gruetzmacher
6c7fb176b1 Add Blade Kitten as an example for the new parser. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
f9f0b75d7c Create new HTML parser based scraper class. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fcde86e9c0 Change getPageContent to (optionally) return raw text.
This allows LXML to do its own "magic" encoding detection
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
0e03eca8f0 Move all regular expression operation into the new class.
- Move fetchUrls, fetchUrl and fetchText.
- Move base URL handling.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
fde1fdced6 Fix some typos. 2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
2567bd4e57 Convert starters and other helpers to new interface.
This allows those starters to work with future scrapers.
2014-07-26 11:28:43 +02:00
Tobias Gruetzmacher
4265053846 Refactor: Move regular expression scraping into a new class.
- This also makes "<base href>" handling an internal detail of the regular
  expression scraper, future scrapers might not need that or handle it in
  another way.
2014-07-26 11:28:43 +02:00
Bastian Kleineidam
3a929ceea6 Allow comic text to be optional. Patch from TobiX 2014-07-24 20:49:57 +02:00
Bastian Kleineidam
950dd2932c Remove stray print statement. 2014-07-21 20:20:15 +02:00
Tobias Gruetzmacher
ea5d533e30 Fix index lookups for SnowFlame and SnowFlakes. 2014-07-19 13:23:42 +02:00
Bastian Kleineidam
4d49d4394b Fix doc 2014-07-03 18:42:06 +02:00
Bastian Kleineidam
f194e430bc TheThinHLine: fetch bigger images and name image files from sequence number. 2014-07-03 18:41:25 +02:00
Bastian Kleineidam
4845a4ccc1 Merge branch 'master' of github.com:wummel/dosage 2014-07-03 17:12:42 +02:00
Bastian Kleineidam
641daa738b Updated list of comics 2014-07-03 17:12:25 +02:00
Bastian Kleineidam
93fe5d5987 Minor useragent refactoring 2014-07-03 17:12:25 +02:00
Bastian Kleineidam
4c2a339e25 Fix some comics. 2014-07-02 19:51:53 +02:00
Luc Fouin
cb76198da7 added the thin H line, fixes #67 2014-07-02 17:14:33 +02:00
Luc Fouin
763f9b02a2 added the thin H line 2014-07-02 17:11:33 +02:00
Bastian Kleineidam
b03ba158ef Fixed LookingForGroup 2014-07-01 23:44:01 +02:00
Bastian Kleineidam
3485e2ac54 Added Whomp. 2014-06-24 20:48:49 +02:00
wummel
a0086bfcd8 Merge pull request #63 from sehrgut/master
Updated GirlGenius to new markup
2014-06-24 20:40:15 +02:00
Peter B
8f1c864ec3 Added Safely Endangered 2014-06-17 01:05:11 -04:00
Keith Beckman
236b840363 Updated GirlGenius to new markup
GG markup has changed, so I fixed the prevSearch regex to find the
"previous" button on the redesigned page.

As well, I set multipleImagesPerStrip to true, since there are quite a
few comics with multiple images that were being discarded.
2014-06-13 16:43:40 -04:00
Bastian Kleineidam
68afeaf82d Make appname lowercase. 2014-06-09 13:24:58 +02:00
Bastian Kleineidam
00e424aed0 Fix zenpencils. 2014-06-08 13:40:42 +02:00
Bastian Kleineidam
687d27d534 Stripping should be done in normaliseUrl. 2014-06-08 10:12:33 +02:00
Bastian Kleineidam
c528fd1822 Merge branch 'master' of github.com:wummel/dosage 2014-06-08 10:07:36 +02:00
Bastian Kleineidam
0ee5c08771 Match zoom image for GoComics pages. 2014-06-08 10:06:34 +02:00
Peter B
78954da9d7 fix StandStillStaySilent, strip urls when downloading 2014-06-04 01:58:16 -04:00
Peter B
71ed9ad69d fixed foul language 2014-06-04 01:35:40 -04:00
Bastian Kleineidam
62a3a55b82 Fixed LoadingArtist 2014-03-26 19:59:42 +01:00
Bastian Kleineidam
813e6876fc Add missing @classmethod 2014-03-26 19:59:42 +01:00
Bastian Kleineidam
c2cf58560e Remove unused import. 2014-03-26 19:59:42 +01:00
Bastian Kleineidam
4bb31953ad Fix PennyArcade 2014-03-26 19:59:42 +01:00
Freestila
0faf4a722b Update o.py
Removed procedure for "I am over 18" button, since this button no longer exists
2014-03-05 09:28:34 +01:00
Bastian Kleineidam
348dd5e6c0 Add documentation 2014-03-04 20:53:19 +01:00
Bastian Kleineidam
3108c9124a Fix thread import for py3 2014-03-04 20:50:34 +01:00
Bastian Kleineidam
18972d3830 Remove old waitSeconds parameter. 2014-03-04 18:38:46 +01:00
Bastian Kleineidam
15ef59262a Make threads interruptable. 2014-03-04 18:38:46 +01:00
Tobias Gruetzmacher
33801376f9 Fix indentation. 2014-02-27 22:31:21 +01:00
Tobias Gruetzmacher
1bcac66c03 Mark MonsieurLeChien as french. 2014-02-27 22:30:02 +01:00
Tobias Gruetzmacher
8e2ba15410 Merge pull request #60 from Freestila/master
Added comics - looks good
2014-02-27 22:24:57 +01:00
Luc Fouin
da9f518a7a add french comic M. Le Chien 2014-02-27 17:45:29 +01:00
Freestila
53ebb51b10 Added comic DungeonsAndDenizens 2014-02-27 15:08:07 +01:00
Freestila
b8fefb37c0 Added comic Underling 2014-02-20 12:54:40 +01:00
Freestila
3d19d45e81 Added 1 sec wait because of permanent timeout / connection pool exceeded from server 2014-02-20 12:54:13 +01:00
Freestila
67c31284f1 Added comic GrimTales from Down Below 2014-02-18 21:12:29 +01:00
Freestila
de0bb1c9d5 Added comic "The Landscaper" 2014-02-18 21:00:43 +01:00
Freestila
96f61542ee Added comic "Die Fruehreifen" 2014-02-18 21:00:19 +01:00
Peter B
b44b751efa Fixed EvilInc comics. Closes #58 2014-02-14 19:33:13 -05:00
Bastian Kleineidam
f50ef910be Skip CyanideAndHappiness videos 2014-02-10 21:58:26 +01:00
Bastian Kleineidam
875e431edc Provide page data in shouldSkipUrl() function 2014-02-10 21:58:09 +01:00
Bastian Kleineidam
73e1af7aba Fixed FredoAndPidjin 2014-02-06 19:57:56 +01:00
Peter B
d86442efed Added Oh Joy Sex Toy. 2014-01-30 22:45:50 -05:00
Peter B
add63d6d6c Added The Gentleman's Armchair Comic. 2014-01-30 22:32:46 -05:00
Tobias Gruetzmacher
44ef1831bf Sluggy Freelance has some pages with multiple comics.
See for example SluggyFreelance:010422
2014-01-28 19:08:39 +01:00
wummel
6b8854e7b2 Merge pull request #55 from Lugoues/upstream
Added MrLovenstein Comic
2014-01-26 05:49:50 -08:00
Bastian Kleineidam
cc5ee572fb Fix some comics 2014-01-24 23:17:21 +01:00
Peter B
66f6b08163 Added MrLovenstein Comic 2014-01-23 20:23:24 -05:00
Bastian Kleineidam
1a56fbb3dd Fix DemolitionSquad 2014-01-20 19:01:47 +01:00
Bastian Kleineidam
8b0f149c2b Updated copyright 2014-01-19 13:16:22 +01:00
Peter B
740bcb72ce Added Eat That Toast 2014-01-12 19:08:02 -05:00
Peter B
124cf99665 Added Poorly Drawn lines replacing GoComic's version. 2014-01-12 19:08:02 -05:00
Bastian Kleineidam
e738454cb1 Correct drunkduck disablement comment. 2014-01-11 20:04:52 +01:00
Peter B
d0031b65c8 Added "Stand Still. Stay Silent." comic. 2014-01-08 11:08:19 -05:00
Bastian Kleineidam
69bffc9c92 Fix invalid description. 2014-01-06 16:25:42 +01:00
Bastian Kleineidam
264a20a4db Disable disallowed drunkduck comics. 2014-01-06 09:58:24 +01:00
Bastian Kleineidam
3f4be55332 Merge branch 'upstream' of https://github.com/Lugoues/dosage into Lugoues-upstream 2014-01-06 09:38:25 +01:00
Bastian Kleineidam
d98c2a52dd Skip phdcomic video URL. 2014-01-06 08:20:58 +01:00
Peter B
ceca4ba102 Added FoulLanguage Comic 2014-01-06 00:34:37 -05:00
Peter B
1de57ea1fe added Camp Comic 2014-01-05 23:09:19 -05:00
Bastian Kleineidam
ef17268ace Fix comic list output. 2014-01-05 17:37:13 +01:00
Bastian Kleineidam
5fe48d013a Increase wait interval. 2014-01-05 17:14:19 +01:00
Bastian Kleineidam
4d63920434 Updated copyright. 2014-01-05 16:50:57 +01:00
Bastian Kleineidam
b6c913e2d5 Wait some time between requests. 2014-01-05 16:23:45 +01:00
Bastian Kleineidam
1affe58370 Use thread name in log output. 2014-01-05 16:17:34 +01:00
Bastian Kleineidam
bb18295798 Use realpath to detect symlinked instances. 2014-01-05 11:16:57 +01:00
Bastian Kleineidam
d9edeb1343 Limit cyanideandhappiness filename length 2014-01-05 11:08:15 +01:00
Bastian Kleineidam
9172aba146 Remove stray print 2014-01-05 10:50:25 +01:00
Bastian Kleineidam
1f38895681 Ensure only one instance of dosage is running to prevent accidental DoS on sites with multiple comics. 2014-01-05 10:36:22 +01:00
Bastian Kleineidam
732b50811d Only ensure the maximum width. 2013-12-22 13:38:29 +01:00
Bastian Kleineidam
f488935072 Fix AbstruseGoose and QuestionableContent. 2013-12-22 08:01:58 +01:00
Bastian Kleineidam
a1a773dd52 Fix loader in frozen executables. 2013-12-18 20:55:23 +01:00
Bastian Kleineidam
5c5aa166c7 Fix gocomic image matcher 2013-12-12 22:54:03 +01:00
Bastian Kleineidam
799d3040f0 Refactoring 2013-12-11 17:54:39 +01:00
Bastian Kleineidam
f23aa86a2c Get larger Gocomic images. 2013-12-11 17:53:52 +01:00
Bastian Kleineidam
b5d973e2d4 Only resize really big images. 2013-12-11 00:01:29 +01:00
Bastian Kleineidam
5ad423c15e Limit image size also in HTML. 2013-12-10 19:59:19 +01:00
Bastian Kleineidam
c3078ed855 Added EdmundFinney, Gaia, GaiaGerman, InternetWebcomic,
NotInventedHere, RedsPlanet, RomanticallyApocalyptic,
  ScandinaviaAndTheWorld, TheGamerCat, Weregeek
2013-12-10 19:50:21 +01:00
Damjan Košir
4e40f02642 added comic Gaia in German 2013-12-10 18:02:20 +13:00
Damjan Košir
4e5717be57 added comic Gaia 2013-12-10 17:08:15 +13:00
Damjan Košir
f48b22b512 added comic Not Invented Here 2013-12-10 16:40:44 +13:00
Damjan Košir
e181b287c9 added comic Romantically Apocalyptic 2013-12-10 16:39:30 +13:00
Damjan Košir
58b62dbad3 added comic Scandinavia and the World 2013-12-10 16:37:35 +13:00
Damjan Košir
5982e27c7b added comic Red's Planet 2013-12-10 16:34:47 +13:00
Damjan Košir
4f47792dee added comic The Gamer Cat 2013-12-10 16:33:07 +13:00
Damjan Košir
b53ca04ee7 added comic Internet Webcomic 2013-12-10 16:32:16 +13:00
Damjan Košir
f095f6309e added comic Edmund Finney's Quest to Find the Meaning of Life 2013-12-10 16:31:03 +13:00
Bastian Kleineidam
67c2203e7e Ensure maximum aspect ratio in RSS images. 2013-12-08 15:55:39 +01:00
Bastian Kleineidam
df9a381ae4 Document getfp() function. 2013-12-08 11:46:26 +01:00
Bastian Kleineidam
03fff069ee Apply same file checks as for image files. 2013-12-05 18:29:15 +01:00
Bastian Kleineidam
599672acbf Fix xkcd text regex. Closes #46 2013-12-05 18:29:15 +01:00
Bastian Kleineidam
7343932a5a Strip whitespace from image text. 2013-12-04 18:07:13 +01:00
wummel
0378c9d855 Merge pull request #45 from Lugoues/master
Store alt text from AbstruseGoose
2013-12-04 09:01:50 -08:00
Bastian Kleineidam
c583e8717e Store large xkcd images. 2013-12-04 17:56:54 +01:00
Bastian Kleineidam
0e5c59133c Provide HTML page data for image URL modifier function. 2013-12-04 17:54:55 +01:00
Peter B
36dcadc7d4 Store alt text from AbstruseGoose 2013-12-03 21:56:54 -05:00
Bastian Kleineidam
3c5424c2ef Add text in RSS and HTML output. 2013-11-29 20:32:54 +01:00
Bastian Kleineidam
142c418dc0 Store alt text from xkcd comics. 2013-11-29 20:27:11 +01:00
Bastian Kleineidam
0eaf9a3139 Add text search in comic strips. 2013-11-29 20:26:49 +01:00
Bastian Kleineidam
468b34034b cyanideandhappiness skip URL 2013-11-29 18:31:34 +01:00
Bastian Kleineidam
9514a8eeae Fixed ForLackOfABetterComic 2013-11-27 20:49:35 +01:00
Bastian Kleineidam
7d05b666da Updated RSS link name 2013-11-25 21:20:48 +01:00
Bastian Kleineidam
01085d56c2 Regenerated. 2013-11-24 12:19:54 +01:00
Bastian Kleineidam
48e417c647 Fixed some comics. 2013-11-18 22:01:30 +01:00
Bastian Kleineidam
f6fc604745 Fix GoComics image URL. 2013-11-14 21:30:51 +01:00
Bastian Kleineidam
44f8c81111 Updated from edits. 2013-11-12 20:13:58 +01:00
Bastian Kleineidam
7760985601 Fix broken comics 2013-11-12 18:33:14 +01:00
Bastian Kleineidam
45a5ef9064 Removed AetheriaEpics 2013-11-07 21:23:15 +01:00
Bastian Kleineidam
f74b18c2e5 Remove unused import. 2013-11-07 21:22:49 +01:00
Bastian Kleineidam
ca17332942 Call self.starter() on indexed comics since it might set cookies. 2013-11-07 20:48:10 +01:00
Bastian Kleineidam
74cca6bac3 Fixed oglaf comic skipping. 2013-11-07 20:47:31 +01:00
Bastian Kleineidam
1f282147dc Fix drunkduck comics. 2013-11-07 17:12:38 +01:00
Bastian Kleineidam
3e6414e0e5 Updated plugins 2013-11-07 07:28:47 +01:00
Bastian Kleineidam
86257c8364 Remove duplicate variable 2013-08-28 20:50:07 +02:00
Faldrian
93318c1d0c Added DarthsAndDroids 2013-08-19 20:14:47 +02:00
Bastian Kleineidam
ef4ae435a5 Fix several comics. 2013-07-18 20:39:53 +02:00
Bastian Kleineidam
eb4ee1a251 Add EatLiver and JimBenton 2013-07-16 18:01:44 +02:00
Bastian Kleineidam
934546954b Added MarriedToTheSea, NatalieDee 2013-07-10 18:43:53 +02:00
Bastian Kleineidam
d5172074d5 Fix some comics. 2013-07-09 22:21:17 +02:00
Bastian Kleineidam
8d5ae7b1bb Updated plugins. 2013-07-09 22:21:12 +02:00
Bastian Kleineidam
38f2e9e625 Fix typo. 2013-07-04 21:00:06 +02:00
wummel
dc6c90dbf3 Merge pull request #36 from pataluc/master
Added Go Get A Roomie
2013-07-04 11:56:31 -07:00
Bastian Kleineidam
327cb35aee Add dosagelib.__version__ 2013-07-04 20:55:43 +02:00
Bastian Kleineidam
02132893b2 Fix shermanslagoon namer. 2013-07-04 20:20:26 +02:00
Bastian Kleineidam
f78d28fba8 Merge branch 'master' of https://github.com/mbrandis/dosage into mbrandis-master 2013-07-04 20:03:11 +02:00
Bastian Kleineidam
8559184d69 Updated plugins 2013-07-04 12:22:36 +02:00
Bastian Kleineidam
a27ab5460b Add ICanBarelyDraw 2013-07-04 12:22:20 +02:00
Luc Fouin
495b1149bd added GoGetARoomie 2013-07-04 11:08:16 +02:00
mbrandis
ccf50cad89 Corrected description. 2013-06-24 22:46:39 +02:00
Bastian Kleineidam
da957ce329 Updated linuxcom 2013-06-24 20:27:43 +02:00
Bastian Kleineidam
36b8dcea04 Merge branch 'patch-2' of https://github.com/mbrandis/dosage into mbrandis-patch-2 2013-06-24 20:23:10 +02:00
Bastian Kleineidam
6bd534eaed Fix OnTheFasttrack 2013-06-24 20:19:33 +02:00
mbrandis
3b0393ccf6 Update s.py
Added Sherman's Lagoon, namer is not perfect.
2013-06-23 23:30:57 +03:00
mbrandis
265c03fc82 Update l.py
Added Linux.com Friday Funnies.
2013-06-23 22:28:08 +02:00
mbrandis
ef5ac2128b Update o.py
Please consider adding this comic.
2013-06-23 22:26:32 +02:00
Bastian Kleineidam
1c1b0aaf18 Comic fixes. 2013-05-25 23:24:33 +02:00
Bastian Kleineidam
66dccef537 Fix keenspot description type. 2013-05-22 22:29:20 +02:00
Bastian Kleineidam
ef878eed7c Updated plugins. 2013-05-22 07:19:16 +02:00
Bastian Kleineidam
b41fdf99ae Fix LookingForGroup. 2013-05-21 18:48:03 +02:00
Bastian Kleineidam
1478f22099 Output fixes. 2013-04-30 20:26:36 +02:00
Bastian Kleineidam
5f6ed7e05d More output stuff. 2013-04-30 07:24:54 +02:00
Bastian Kleineidam
1a6416eb1b Updated wormworld chapter. 2013-04-30 06:42:49 +02:00
Bastian Kleineidam
ebdc1e6359 More unicode output fixes. 2013-04-30 06:41:19 +02:00
Bastian Kleineidam
d6ca5aa7fd SnowFlakes is end-of-life. 2013-04-29 20:31:07 +02:00
Bastian Kleineidam
cec08b86dd DrFun is end-of-life. 2013-04-29 20:29:56 +02:00
Bastian Kleineidam
79273deb23 Correct output encoding. 2013-04-29 20:25:05 +02:00
Bastian Kleineidam
80d7defcd2 Unicode descriptions. 2013-04-29 07:35:56 +02:00
Bastian Kleineidam
459156fc1a Description must be unicode. 2013-04-29 07:27:59 +02:00
Bastian Kleineidam
64bf618b87 xkcd 2013-04-29 07:19:58 +02:00
Bastian Kleineidam
dcacbf0b9a Fix some comics. 2013-04-28 19:58:38 +02:00
Bastian Kleineidam
f9a48e6cb9 Updated scripted comic plugins. 2013-04-27 07:47:17 +02:00
Bastian Kleineidam
8783b53012 Fix GirlGenius strip url. 2013-04-26 19:52:45 +02:00
Bastian Kleineidam
cafa37fcb1 All scrapers must have a URL. 2013-04-26 06:53:05 +02:00
Bastian Kleineidam
05dbc51d3e Detect completed end-of-life comics. 2013-04-25 22:40:06 +02:00
Bastian Kleineidam
871de6a8ce Prefer GoComics over Creators since they have a better naming scheme. 2013-04-25 21:50:45 +02:00
Bastian Kleineidam
4716ecd71d Carciphona description. 2013-04-25 21:50:27 +02:00
Bastian Kleineidam
ba6e0c09a4 Added Unsound. 2013-04-25 21:38:18 +02:00
Bastian Kleineidam
725824f067 Added TwoGuysAndGuy 2013-04-25 21:23:31 +02:00
Bastian Kleineidam
382c4c05ad Added TheDreamlandChronicles 2013-04-25 21:20:48 +02:00
Bastian Kleineidam
8418ea471d Added SabrinaOnline. 2013-04-25 21:14:32 +02:00
Bastian Kleineidam
3e74dc9956 Fix MadamAndEve. 2013-04-25 21:09:42 +02:00
Bastian Kleineidam
c95a447305 Updated docs. 2013-04-25 21:06:20 +02:00
Bastian Kleineidam
6c773e21c7 Added Lackadaisy. 2013-04-25 21:06:12 +02:00
Bastian Kleineidam
1ae674782d Added GirlGenius 2013-04-25 20:58:24 +02:00
Bastian Kleineidam
96fc129fea Add GeneralProtectionFault (disallowed by robots.txt) 2013-04-25 20:54:48 +02:00
Bastian Kleineidam
f20df8b692 Added Curtailed. 2013-04-25 20:46:05 +02:00
Bastian Kleineidam
c114a834dd Added Carciphona 2013-04-25 20:40:15 +02:00
Bastian Kleineidam
51d84131eb Added ARedTailsDream 2013-04-25 20:37:27 +02:00
Bastian Kleineidam
dbdbdd09de Fix SMBC 2013-04-25 20:32:21 +02:00
Bastian Kleineidam
52ee7228ef Fix DorkTower image regex. 2013-04-25 19:01:38 +02:00
Bastian Kleineidam
aca3c959af Improve CtrlAltDel image names. 2013-04-25 19:01:21 +02:00
wummel
1c29f22270 Merge pull request #27 from dromaludaire/master
Fix some SMBC download links
2013-04-22 10:40:24 -07:00
Benjamin Sigonneau
f1da47edef [SMBC] Fix regexp, mainly concerns older strips 2013-04-21 22:39:06 +02:00
Sven Hartge
8e34239b27 Fix typo in regex for SandraOnTheRocks. 2013-04-20 19:59:01 +02:00
Sven Hartge
40f2aed8f0 Add Sandra on the Rocks. 2013-04-20 18:51:06 +02:00
Bastian Kleineidam
4988e79e6e Added some descriptions. 2013-04-19 06:31:12 +02:00
Bastian Kleineidam
e37a80fdc1 Add some descriptions. 2013-04-14 09:02:14 +02:00
Bastian Kleineidam
f15f993851 s/baseurl/baseUrl/g 2013-04-13 20:58:00 +02:00
Bastian Kleineidam
c246b41d64 Code formatting. 2013-04-13 08:00:11 +02:00
Bastian Kleineidam
522af89af5 Add some descriptions. 2013-04-13 08:00:03 +02:00
Bastian Kleineidam
3a03554d26 Ensure unicode output to fix encoding errors. 2013-04-12 21:02:31 +02:00
Bastian Kleineidam
35c031ca81 Fixed some comics. 2013-04-11 18:27:43 +02:00
Bastian Kleineidam
6ca4eaa492 Code cleanup. 2013-04-11 18:27:43 +02:00
Bastian Kleineidam
7e593cf7e8 Add firstStripUrls. 2013-04-10 23:57:09 +02:00
Bastian Kleineidam
a0c7f54871 Fix zwarwald 2013-04-10 20:14:43 +02:00
Bastian Kleineidam
fb05c10808 Sort entries. 2013-04-10 18:36:33 +02:00
Bastian Kleineidam
54283775a8 Add ForLackOfABetterComic 2013-04-10 18:20:39 +02:00
Bastian Kleineidam
d00310f017 Add EverydayBlues 2013-04-10 18:20:08 +02:00
Bastian Kleineidam
8b99b59056 Added DamnLol 2013-04-10 18:19:38 +02:00
Bastian Kleineidam
5127d4c895 Use re.escape and add some firstStripUrl. 2013-04-10 18:19:11 +02:00
Bastian Kleineidam
3213eebd75 Added ZenPencils. 2013-04-09 19:38:47 +02:00
Bastian Kleineidam
e040dd0d50 Added Science. 2013-04-09 19:38:16 +02:00
Bastian Kleineidam
68f14971e8 Added RealmOfAtland. 2013-04-09 19:37:47 +02:00
Bastian Kleineidam
f9179e9de5 Added GoblinsComic 2013-04-09 19:37:24 +02:00
Bastian Kleineidam
f71961acbc Added ExtraOrdinary. 2013-04-09 19:36:51 +02:00
Bastian Kleineidam
190ffcd390 Use str() for robotparser. 2013-04-09 19:36:00 +02:00
Bastian Kleineidam
b9dc385ff2 Implemented voting 2013-04-09 19:33:50 +02:00
Bastian Kleineidam
4528281ddd Voting part 2 2013-04-08 21:20:01 +02:00
Bastian Kleineidam
e762f269b7 First part of voting stuff. 2013-04-08 20:19:10 +02:00
Bastian Kleineidam
7584f0b647 Add version update check. 2013-04-08 20:17:02 +02:00
Bastian Kleineidam
781bac0ca2 Feed text content instead of binary to robots.txt parser. 2013-04-07 18:11:29 +02:00
Bastian Kleineidam
bd1d41b83c Write encoded data in binary format. 2013-04-05 19:27:30 +02:00
Bastian Kleineidam
0fbc005377 A Python3 fix. 2013-04-05 18:57:44 +02:00
Bastian Kleineidam
97522bc5ae Use tuples rather than lists. 2013-04-05 18:55:19 +02:00
Bastian Kleineidam
adb31d84af Use HTMLParser.unescape instead of rolling our own function. 2013-04-05 18:53:19 +02:00
Bastian Kleineidam
1c9f64bc27 Better name for Sketchesnatched. 2013-04-05 18:47:51 +02:00
Bastian Kleineidam
9e26640407 Augment SketcheSnatched 2013-04-05 07:31:22 +02:00
Bastian Kleineidam
50b742721b SketcheSnatched 2013-04-05 07:20:50 +02:00
Bastian Kleineidam
3936cfa9ce Another fix. 2013-04-05 06:56:33 +02:00
Bastian Kleineidam
6aa588860d Code cleanup 2013-04-05 06:36:05 +02:00
Bastian Kleineidam
fabe872d1d Fix SnowFlame 2013-04-04 18:32:37 +02:00
Bastian Kleineidam
8150dabfa6 Remove SarahZero 2013-04-04 18:32:29 +02:00
Bastian Kleineidam
5d6e210c98 Fix Curvy 2013-04-04 18:30:27 +02:00
Bastian Kleineidam
b3cbad37bc Remove CaribbeanBlue 2013-04-04 18:30:16 +02:00
Bastian Kleineidam
80c24a10c0 Fix WebDesignerCOTW 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
62af4b875e Fix Precocious 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
421e31c961 Fix Oglaf 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
d794919e73 Fix LasLindas 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
08a3587df6 Fix KatzenfutterGeleespritzer 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
c57226f1c0 Remove GreystoneInn 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
3078c5ec73 Fix ExtraLife and EyeOfRamalach 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
3fd4cfea0d Fix DasLebenIstKeinPonyhof 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
460c5be689 Add POST support to urlopen(). 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
44c3fb9f16 Remove broken scripted plugins. 2013-04-04 18:30:02 +02:00
Bastian Kleineidam
0054ebfe0b Some Python3 fixes. 2013-04-03 20:32:43 +02:00
Bastian Kleineidam
2c0ca04882 Fix warning for scrapers with multiple image patterns. 2013-04-03 20:32:19 +02:00
Bastian Kleineidam
f53a516219 Use output logging instead of print statement. 2013-04-03 20:31:10 +02:00
Bastian Kleineidam
a972729c0d Add WebDesignerCOTW 2013-04-03 20:30:51 +02:00
Bastian Kleineidam
fdab3b7b35 Add StuffNoOneToldMe 2013-04-03 20:30:29 +02:00
Bastian Kleineidam
43255872c3 Added SnowFlakes. 2013-04-03 20:30:16 +02:00
Bastian Kleineidam
6303a1cb20 Updated scripted plugins. 2013-04-03 20:27:12 +02:00
Bastian Kleineidam
f737486754 Fix hagar. 2013-03-26 20:12:26 +01:00
Bastian Kleineidam
a2f343226f Remove duplicate dilbert. 2013-03-26 20:02:13 +01:00
Bastian Kleineidam
1d7f7a8517 Fix genre list 2013-03-26 19:58:22 +01:00
Bastian Kleineidam
141d5113de Fix hagarthehorrible 2013-03-26 19:54:00 +01:00
Bastian Kleineidam
b62f1ba69d Code cleanup. 2013-03-26 17:36:06 +01:00
Bastian Kleineidam
31d95d1d03 Remove DerFlix 2013-03-26 17:35:58 +01:00
Bastian Kleineidam
3dd2daf223 Updated scripted plugins. 2013-03-26 17:35:47 +01:00
Bastian Kleineidam
92150ddbda Add HagarTheHorrible 2013-03-26 17:35:10 +01:00
Bastian Kleineidam
75f3d59e85 Fix Eriadan 2013-03-26 17:34:56 +01:00
Bastian Kleineidam
de3ce2ec95 Fix WormWorldSaga* 2013-03-26 17:34:27 +01:00
Bastian Kleineidam
a3d74c5a0e Fix BratHalla and BrentalFloss 2013-03-26 17:33:51 +01:00
Bastian Kleineidam
10985ae614 Add genre tags. 2013-03-26 17:33:27 +01:00
Bastian Kleineidam
fcdc67ef92 Fix documentation. 2013-03-26 17:29:20 +01:00
Bastian Kleineidam
110a67cda4 Retry failed page content downloads (e.g. timeouts). 2013-03-25 19:49:09 +01:00
Bastian Kleineidam
ec33276fd7 Print stacktrace on image errors. 2013-03-25 19:48:47 +01:00
Bastian Kleineidam
1a7dfc02d2 Add Schuelert 2013-03-25 19:48:32 +01:00
Bastian Kleineidam
af10385da1 Add firstStripUrl for KevinAndKell. 2013-03-25 19:48:19 +01:00
Bastian Kleineidam
940a04b499 Fix comic searching. 2013-03-25 19:48:01 +01:00
Bastian Kleineidam
bafe981917 Add DrMcNinja. 2013-03-25 19:47:44 +01:00
Bastian Kleineidam
9d1f286424 Improved documentation. 2013-03-25 19:47:29 +01:00
Bastian Kleineidam
c99827935b Updated plugins with scripts. 2013-03-25 19:40:38 +01:00
Bastian Kleineidam
01c2afc264 Print exception tracebacks. 2013-03-25 19:39:37 +01:00
Tobias Gruetzmacher
a1b5bfb68f Add a simple event consumer to write JSON metadata.
This drops a file named dosage.json in each comic directory. This is
still not perfect, but something to build upon.
2013-03-24 16:55:30 +01:00
Tobias Gruetzmacher
0a218c0283 Add event comicPageLink for every previous link.
This event allows a listener to build connections between pages.
2013-03-24 16:36:02 +01:00
Bastian Kleineidam
9f08b21a7e Get correct images of gocomic strips. 2013-03-24 14:13:33 +01:00
Bastian Kleineidam
179ba7f49f Add release info. 2013-03-21 19:04:59 +01:00
Bastian Kleineidam
2b98cf0079 CucumberQuest fixes. 2013-03-21 18:38:40 +01:00
Bastian Kleineidam
3f6df92fef Added some comics, fixed some. 2013-03-21 18:33:16 +01:00
Bastian Kleineidam
448e80eaed Added MyCartoons 2013-03-20 21:42:04 +01:00
Bastian Kleineidam
2e3907d942 Add Katzenfuttergeleespritzer and ParallelUniversum 2013-03-20 17:39:49 +01:00
Bastian Kleineidam
3937cfba4b Added SandraAndWooGerman 2013-03-19 20:54:16 +01:00
Bastian Kleineidam
78fb63859c Add DemolitionSquad. 2013-03-19 20:45:59 +01:00
Bastian Kleineidam
17fe58b864 Fix some comics. 2013-03-19 20:45:18 +01:00
Bastian Kleineidam
178d8f80b2 Fix Dilbert image naming. 2013-03-18 18:15:19 +01:00
Bastian Kleineidam
88224fe21a Add DogHouseDiaries, update changelog. 2013-03-15 07:04:19 +01:00
Bastian Kleineidam
6a2f55ddef Don't stop on image regex errors. 2013-03-15 07:03:54 +01:00
Bastian Kleineidam
e88cf514a7 Fixes for FonFlatter 2013-03-13 18:31:58 +01:00
Bastian Kleineidam
e739eb7992 Added CucumberQuest 2013-03-12 21:36:13 +01:00
Bastian Kleineidam
79f0f4b36c Added OrnerBoy 2013-03-12 21:23:26 +01:00
Bastian Kleineidam
502a35166f Added KickInTheHead 2013-03-12 21:16:17 +01:00
Bastian Kleineidam
7c4ac0df7b Add and fix some comics. 2013-03-12 20:49:46 +01:00
Bastian Kleineidam
43f20270d0 Allow a list of regular expressions for image and previous link search. 2013-03-12 20:48:26 +01:00
Bastian Kleineidam
6de26aeeaa Updated keenspot scraper and its comic list. 2013-03-12 20:47:52 +01:00
Bastian Kleineidam
2bf7d16090 Updated comicgenesis comic list. 2013-03-12 20:47:38 +01:00
Bastian Kleineidam
58abcb282d Added GeeksNextDoor. 2013-03-11 22:51:45 +01:00
Bastian Kleineidam
737f1e189d Added FullFrontalNerdity. 2013-03-11 22:45:30 +01:00
Bastian Kleineidam
538523f86c Add keenspot. 2013-03-11 22:03:17 +01:00
Bastian Kleineidam
a16bf6c16b Rename keenspot to comicgenesis and enable it. 2013-03-11 21:50:49 +01:00
Bastian Kleineidam
f0eaba0f69 Reenable comicgenesis comics. 2013-03-11 20:33:56 +01:00
Bastian Kleineidam
950a958e30 Updated for release. [ci skip] 2013-03-11 20:14:27 +01:00
Bastian Kleineidam
7eaf12caf6 Fix LookingForGroup 2013-03-11 19:56:37 +01:00
Bastian Kleineidam
7ee73caf3c Allow multiple event output and improve HTML output. 2013-03-11 17:33:59 +01:00
Bastian Kleineidam
75e576f2de Embed images in html output. 2013-03-09 21:39:43 +01:00
Bastian Kleineidam
8b0a523f77 Page comic listings. 2013-03-09 09:00:50 +01:00
Bastian Kleineidam
5ccf44c36a Embed images in html output. 2013-03-08 22:38:11 +01:00
Bastian Kleineidam
88e28f3923 Fix some comics and add language tag. 2013-03-08 22:33:05 +01:00
Bastian Kleineidam
b368f125bc Fix some comics. 2013-03-08 06:47:00 +01:00
Bastian Kleineidam
4c344765ff Add option to wait before downloading. 2013-03-08 06:46:50 +01:00
Bastian Kleineidam
0ee0822e00 Fix some comics. 2013-03-08 00:06:55 +01:00
Bastian Kleineidam
2bdf0d588d Simplify exception handling. 2013-03-08 00:06:50 +01:00
Bastian Kleineidam
1d7410c038 Added Zwarwald and AhoiPolloi 2013-03-07 23:51:55 +01:00
Bastian Kleineidam
8259a01d64 Fix URLs with no content type header. 2013-03-07 23:08:37 +01:00
Bastian Kleineidam
1cc7d39047 Fix some comics. 2013-03-07 23:08:17 +01:00
Bastian Kleineidam
0215ae82af Fix some comics. 2013-03-07 19:54:18 +01:00
Bastian Kleineidam
e96c68c378 Fix dorktower. 2013-03-07 18:24:12 +01:00
Bastian Kleineidam
7d8786c1d2 Code cleanup. 2013-03-07 18:22:49 +01:00
Bastian Kleineidam
736d9aa8cf Code cleanup. 2013-03-07 18:22:39 +01:00
Bastian Kleineidam
23c20bfe32 Fix some comics. 2013-03-07 18:22:24 +01:00
Bastian Kleineidam
d1e5ad2696 Set proper HTML5 doctype and encoding for HTML output. 2013-03-07 18:21:05 +01:00
Bastian Kleineidam
9f13af7750 Retry empty downloads and don't set a manual modification time. 2013-03-07 18:20:38 +01:00
Bastian Kleineidam
6f2aebe8c0 Updated copyright. 2013-03-07 18:19:50 +01:00
Bastian Kleineidam
106a15b6c3 Add missing attribute. 2013-03-06 20:23:43 +01:00
Bastian Kleineidam
10eb1ff5ec Fix dilbert filenames. 2013-03-06 20:21:20 +01:00
Bastian Kleineidam
d7925ba4a2 Sort comics. 2013-03-06 20:21:10 +01:00
Bastian Kleineidam
bae2a96d8b Added some comic strips and cleaned up the scraper code. 2013-03-06 20:00:30 +01:00
Bastian Kleineidam
3a22c05050 Catch WindowsError when initializing colorama. 2013-03-05 21:15:25 +01:00
Bastian Kleineidam
01177e25f0 Updated generated comic lists. 2013-03-05 19:06:00 +01:00
Bastian Kleineidam
c13aa323d8 Code cleanup [ci skip] 2013-03-04 21:44:26 +01:00
Bastian Kleineidam
61a02630b8 Remove duplicate comic entries. 2013-03-04 19:40:10 +01:00
Bastian Kleineidam
4047859c5b Fix BrentalFlossFit 2013-03-04 19:37:26 +01:00
wummel
c0440266cf Merge pull request #12 from TobiX/some-new-comics
Some new comics
2013-03-04 10:13:51 -08:00
Bastian Kleineidam
3712799ee0 Add imageUrlModifier() for scrapers. 2013-03-04 19:10:27 +01:00
Bastian Kleineidam
44d696c4af Flush file contents to disk and check for empty files. 2013-03-04 19:10:26 +01:00
Bastian Kleineidam
60b160bcdf Prevent double slash in support url 2013-03-04 19:10:26 +01:00
Tobias Gruetzmacher
fc3fab8500 Add Namesake. 2013-03-03 22:41:11 +01:00
Tobias Gruetzmacher
bf13b13ab6 Add StickyDillyBuns.
One of the Pixie Trix Comix.
2013-03-03 22:03:27 +01:00
Tobias Gruetzmacher
1af8a99594 Added MenageA3 (ma3comics.com).
One of the Pixie Trix Comix.
2013-03-03 21:52:08 +01:00
Tobias Gruetzmacher
d668f5fc1e Add DangerouslyChloe.
One of the Pixie Trix Comix.
2013-03-03 21:31:44 +01:00
Tobias Gruetzmacher
4036ce06ef Add MagickChicks.
One of the Pixie Trix Comix.
2013-03-03 20:50:21 +01:00
Tobias Gruetzmacher
af57e018a1 Add ShadowGirls. 2013-03-03 18:59:16 +01:00
Tobias Gruetzmacher
5c85e9a2f2 Add BrentalFloss.
"Flossed in Time" does not work ATM since there are errors in the image
URL.
2013-03-03 18:58:21 +01:00
Tobias Gruetzmacher
89f1170ff4 Add AlphaLuna and AlphaLuna/Spanish. 2013-03-03 15:58:40 +01:00
Bastian Kleineidam
fba7f6e527 Updated comic plugins. 2013-03-01 20:55:55 +01:00
Bastian Kleineidam
d7daf67e08 Fix some comics. 2013-02-27 19:40:54 +01:00
Bastian Kleineidam
09df20cd1f Fix some comics and increase travis test number. 2013-02-26 06:12:46 +01:00
Bastian Kleineidam
41c954b309 Another try on URL quoting. 2013-02-23 09:08:08 +01:00
Bastian Kleineidam
953dc62ffd Fix some comics. 2013-02-23 09:07:44 +01:00
Bastian Kleineidam
ec6e59e53c Fix Chucklebrain 2013-02-22 20:29:05 +01:00
Bastian Kleineidam
6793aecbd3 Fix OneQuestion. 2013-02-22 20:23:47 +01:00
Bastian Kleineidam
889056b8e3 Fix PicPakDog 2013-02-22 19:43:33 +01:00
Bastian Kleineidam
2eb7b43dd2 Remove drunkduck awards. 2013-02-21 19:51:10 +01:00
Bastian Kleineidam
f36ed46d6a Fix tests which hit the first URL. 2013-02-21 19:48:21 +01:00
Bastian Kleineidam
d0c3492cc7 Catch robots.txt errors. 2013-02-21 19:48:04 +01:00
Bastian Kleineidam
b453c442c2 Fix some comics. 2013-02-21 19:47:37 +01:00
Bastian Kleineidam
1a84431456 Add Caggage 2013-02-21 19:47:21 +01:00
Bastian Kleineidam
292c58633c Fix AstronomyPOTD 2013-02-20 20:52:37 +01:00
Bastian Kleineidam
725001155a Updated generated comics. 2013-02-20 20:52:23 +01:00
Bastian Kleineidam
ae0e9feea1 Remember skipped URLs. 2013-02-20 20:51:39 +01:00
Bastian Kleineidam
91c32515d5 Fix some comics. 2013-02-19 20:58:04 +01:00
Bastian Kleineidam
8e2a01f19f Fix some comics. 2013-02-18 20:55:54 +01:00
Bastian Kleineidam
79795115f0 Do not sort module lists. 2013-02-18 20:40:35 +01:00
Bastian Kleineidam
be1694592e Do not stream page content URLs. 2013-02-18 20:38:59 +01:00
Bastian Kleineidam
96edb60e01 Fix some comics. 2013-02-18 20:38:44 +01:00
Bastian Kleineidam
17f1988197 Fix Catalyst 2013-02-18 20:03:54 +01:00
Bastian Kleineidam
270510bdc5 Fix AstronomyPOTD 2013-02-18 20:03:42 +01:00
Bastian Kleineidam
6155b022a6 Allow selected strips without images. 2013-02-18 20:03:27 +01:00
Bastian Kleineidam
4f03963b9e Code cleanup. 2013-02-18 20:02:16 +01:00
Bastian Kleineidam
c4191158ec Sort scrapers only when listing them. 2013-02-18 20:01:50 +01:00
Bastian Kleineidam
dc9334cca9 Fix scraperclass function. Closes issue #7. 2013-02-18 19:59:16 +01:00
Bastian Kleineidam
495b6d006d Fix some comics. 2013-02-16 14:54:08 +01:00
Bastian Kleineidam
a99fbbcf45 Fix ASofterWorld 2013-02-16 14:18:43 +01:00
Bastian Kleineidam
da9eee3bc0 Updated copyright. 2013-02-15 18:32:36 +01:00
Bastian Kleineidam
deae84d8fa Updated comicfury. 2013-02-14 21:28:34 +01:00
Bastian Kleineidam
40de445d8c Allow multiple comic name matches. 2013-02-13 22:18:05 +01:00
Bastian Kleineidam
8a33871df8 Fix some comicfury stuff. 2013-02-13 22:17:39 +01:00
Bastian Kleineidam
93c48fb7e2 Make _BasicScraper hashable. 2013-02-13 20:00:16 +01:00
Bastian Kleineidam
23a1acd398 Add firstStripUrl to scrapers. 2013-02-13 19:59:59 +01:00
Bastian Kleineidam
312d117ff3 Rename get_scrapers to get_scraperclasses 2013-02-13 19:59:13 +01:00
Bastian Kleineidam
96bf9ef523 Recognize internal server errors. 2013-02-13 17:54:10 +01:00
Bastian Kleineidam
752bf1c6ef Updated plugins. 2013-02-13 17:53:25 +01:00
Bastian Kleineidam
e3722c1220 Add SandraAndWoo, SupernormalStep 2013-02-13 17:53:11 +01:00
Bastian Kleineidam
c422a23e27 Add ManlyGuysDoingManlyThings 2013-02-13 17:52:49 +01:00
Bastian Kleineidam
7da45ffe11 Fix LasLindas 2013-02-13 17:52:32 +01:00
Bastian Kleineidam
f16e860f1e Only cache robots.txt URL on memoize. 2013-02-13 17:52:07 +01:00
Bastian Kleineidam
7a98cf7599 Updated copyright. 2013-02-13 06:28:35 +01:00
Bastian Kleineidam
67af7bd115 Fix GUComics 2013-02-13 06:27:46 +01:00
Bastian Kleineidam
e38a766db3 Updated generated plugins. 2013-02-12 21:54:56 +01:00
Bastian Kleineidam
49ddcecb72 Fix Petitesymphony. 2013-02-12 21:14:57 +01:00
Bastian Kleineidam
093c2dcddc Fix EyeOfRamalach 2013-02-12 21:14:44 +01:00
Bastian Kleineidam
7375fa042f Fix AlienShores 2013-02-12 21:14:32 +01:00
Bastian Kleineidam
9ec4a44953 Remove universal strips since they are almost all duplicated and the rest is useless. 2013-02-12 20:56:02 +01:00
Bastian Kleineidam
10f6a1caa1 Correct path quoting. 2013-02-12 17:55:33 +01:00
Bastian Kleineidam
ebfc6cba70 Fix LookingForGroup. 2013-02-12 17:55:13 +01:00
Bastian Kleineidam
6d0fffd825 Always use connection pooling. 2013-02-12 17:55:13 +01:00
Bastian Kleineidam
82ada5fba0 Updated copyright. 2013-02-11 19:54:50 +01:00
Bastian Kleineidam
a35c54525d Work around a bug in python requests. 2013-02-11 19:52:59 +01:00
Bastian Kleineidam
14f0a6fe78 Do not prefetch content with requests >= 1.0 2013-02-11 19:45:21 +01:00
Bastian Kleineidam
67836942d8 Simplify the fetchUrl code. 2013-02-11 19:43:46 +01:00
Bastian Kleineidam
3f0816efe2 Updated copyright 2013-02-10 18:25:21 +01:00
Bastian Kleineidam
9fa9af639b Remove duplicate comic. 2013-02-10 18:24:21 +01:00
Bastian Kleineidam
1c24fca199 Updated comic from generated lists. 2013-02-10 15:07:21 +01:00
wummel
a61c4b4096 Merge pull request #6 from TobiX/for-upstream-2013-02-08
Fix for Spinnerette, 2 new comics
2013-02-09 23:03:45 -08:00
Bastian Kleineidam
e9b63210f9 Add encoding, inline images and guid tags to RSS output. 2013-02-10 08:00:32 +01:00
Bastian Kleineidam
77f3d152c0 Fix imageSearch pattern. 2013-02-08 21:03:23 +01:00
Tobias Gruetzmacher
e67b86c32f Add ParadigmShift.
The file names for this are a bit inconsistent...
2013-02-07 23:57:34 +01:00
Tobias Gruetzmacher
4b6d7c54af Add SkinDeep.
Filenames for this are all over the place :(
2013-02-07 23:57:34 +01:00
Tobias Gruetzmacher
b32dc6fd40 Fix Spinnerette.
The old expression was matching "Previous issue" first and skipping all
comics.
2013-02-07 23:57:34 +01:00
Bastian Kleineidam
419ae5fbcf Raise ValueError when HTML file already exists. 2013-02-07 20:48:03 +01:00
Bastian Kleineidam
1a0cd1ee6b Print HTTP client headers. 2013-02-07 18:28:56 +01:00
Bastian Kleineidam
e16b86d768 Allow debug level to be set. 2013-02-07 18:28:40 +01:00
Bastian Kleineidam
68d58640e8 Added some comics. 2013-02-06 22:27:40 +01:00
Bastian Kleineidam
c19cb93a14 Added some comics. 2013-02-06 22:08:36 +01:00
Bastian Kleineidam
137e30b3ac Added Nedroid comic strip. 2013-02-06 07:03:29 +01:00
Bastian Kleineidam
052e510085 Added HijinksEnsue comic strip. 2013-02-06 06:58:06 +01:00
Bastian Kleineidam
af9d8e90f0 Add missing url variable. 2013-02-06 06:36:50 +01:00
Bastian Kleineidam
a90875f018 Updated copyright. 2013-02-05 19:52:10 +01:00
Bastian Kleineidam
f18b5d5542 Fix Arcamax comics. 2013-02-05 19:51:55 +01:00
Bastian Kleineidam
1451047877 Rename latestUrl to url 2013-02-05 19:51:46 +01:00
Bastian Kleineidam
7f78bea1af Always have an url attribute in comic scrapers. 2013-02-04 21:00:26 +01:00