1133 commits

Author SHA1 Message Date
Dirk Reiners
050a0dc97c MenageA3 naming fix 2018-04-23 08:07:41 +02:00
Dirk Reiners
cba9edbdec LifeAintNoPontFarm added 2018-04-23 08:06:13 +02:00
Dirk Reiners
01c1b04778 CyanideAndHappiness fix 2018-04-23 07:53:22 +02:00
Dirk Reiners
fbd2ac2246 Handle get_terminal_size() returning 0 (fixes #106) 2018-04-23 07:50:28 +02:00
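A minimal sketch of the kind of guard this fix describes, assuming a fallback width of 80 columns. The helper name `safe_columns` and the constant `DEFAULT_COLUMNS` are hypothetical and not taken from the project.

```python
import shutil

DEFAULT_COLUMNS = 80  # assumed fallback width, not taken from the project


def safe_columns(reported=None):
    """Return a usable terminal width even when the reported width is 0."""
    if reported is None:
        # Ask the standard library; some environments report 0 columns.
        reported = shutil.get_terminal_size().columns
    # Fall back to the default whenever the width is not positive.
    return reported if reported > 0 else DEFAULT_COLUMNS
```

Passing the width in explicitly keeps the guard easy to test without a real terminal.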
Peter Janes
2a2ff2d545 GoComics no longer has nav on the comic's home page. 2018-04-06 14:09:13 -04:00
Tobias Gruetzmacher
1fe98d2f7f Use a different div class for GoComics (fixes #102). 2018-03-23 00:29:40 +01:00
Tobias Gruetzmacher
2dbd3382f7 Update LeastICouldDo (fixes #99) 2017-12-15 00:00:25 +01:00
Tobias Gruetzmacher
75aa7207ea Some minor fixes to make some modules work again. 2017-11-27 01:04:35 +01:00
Tobias Gruetzmacher
405c4c0b43 Recreate SluggyFreelance module (fixes #96). 2017-11-26 20:23:33 +01:00
Tobias Gruetzmacher
90685d9b0c Only support modern versions of PyCountry. 2017-11-26 19:29:48 +01:00
Damjan Košir
79a2516c61 deathbulge fix 2017-11-17 21:49:47 +13:00
Tobias Gruetzmacher
d88f6aeee3 Replace online tests with mocks.
We want to test our code, not the comic modules.
2017-10-15 14:54:44 +02:00
Tobias Gruetzmacher
f1b83748ed When testing the command line, call main method.
Previously, we were spawning the main binary in a subprocess, which is
fragile and interacts poorly with some testing frameworks...
2017-10-15 14:54:44 +02:00
Tobias Gruetzmacher
ac2ca54570 Remove handlers after director run. 2017-10-15 14:54:44 +02:00
Damjan Košir
24862715d5 realised we have a scraper for CMS MenageA3 uses 2017-10-03 21:47:32 +13:00
Damjan Košir
0e0dcf1f8e redoing MenageA3 with ParserScraper (previous search regex was broken) 2017-10-02 21:52:40 +13:00
Tobias Gruetzmacher
6369203bc0 Merge pull request #92 from clonejo/feature/commitstrip
add a comic plugin for CommitStrip
2017-09-20 22:46:46 +02:00
Damjan Košir
89a902651c Merge remote-tracking branch 'origin/master' 2017-09-19 22:36:48 +12:00
Damjan Košir
a9d7b4de12 added Deathbulge 2017-09-19 22:36:19 +12:00
clonejo
331faae3ea add a comic plugin for CommitStrip 2017-09-18 21:31:15 +02:00
glyphy
ad8374d7b8 Fixing the Menagea3 plugin (#91)
I've changed the menagea3 plugin so it should work with the
new directory structure found on the site.
2017-09-04 21:19:46 +02:00
Tobias Gruetzmacher
7e0adf1d96 Unify more WordPress-based modules. 2017-05-22 01:17:05 +02:00
Tobias Gruetzmacher
42f66c07b0 Random module fixes. 2017-05-22 00:30:31 +02:00
Tobias Gruetzmacher
f8def5b9db Bugfix: StandardError does not exist in Python 3. 2017-05-21 23:37:09 +02:00
Tobias Gruetzmacher
a99098d5ad Update GoComics module. 2017-05-21 23:10:32 +02:00
Tobias Gruetzmacher
1400879dc8 Fix another set of modules (e, k). 2017-05-17 00:11:29 +02:00
Tobias Gruetzmacher
4ee99eb196 Merge pull request #85 from sizlo/improveordering
Preserve the order of images in multi image strips for ordered symlink folders
2017-05-16 23:09:46 +02:00
Tim Brier
95e48b8d8d Keep track of the order of images for multi-image strips in the JSON output 2017-05-15 10:56:47 +01:00
Tobias Gruetzmacher
8b90aa5cfb Some minor style fixes. 2017-05-15 00:54:02 +02:00
Tobias Gruetzmacher
b8484cde50 Fix some more modules. 2017-05-15 00:27:28 +02:00
Tobias Gruetzmacher
ddd3fb418c Remove some broken comics from ComicFury module. 2017-05-14 22:45:12 +02:00
Tobias Gruetzmacher
09687c91f4 Fix some SmackJeeves comics. 2017-05-12 00:32:25 +02:00
sizlo
a83911aa67 Favour the first image we found when we're not expecting multiple images 2017-04-18 21:59:04 +01:00
sizlo
8d84361de4 Preserve the order we found images in when removing duplicate images 2017-04-18 21:58:12 +01:00
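Order-preserving duplicate removal, as described in the commit above, could look roughly like this. It is an illustrative sketch only; `unique_in_order` is a hypothetical name and the project's actual code may differ.

```python
def unique_in_order(urls):
    """Drop repeated image URLs while keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen:
            # First time this URL shows up: remember it and keep it.
            seen.add(url)
            result.append(url)
    return result
```

Unlike converting the list to a plain `set`, this keeps the first occurrence of each image in the position where it was found on the page.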
Tobias Gruetzmacher
593975d907 Minor cleanups for new modules (see #84). 2017-04-16 01:28:17 +02:00
Tim Brier
233da3e052 Add support for SurvivingTheWorld and TumbleDryComics (#84) 2017-04-16 01:11:30 +02:00
Tobias Gruetzmacher
0973570295 Fix a bunch of modules. 2017-04-16 01:06:41 +02:00
Tobias Gruetzmacher
e6f18a2027 Clean up ComicGenesis 2017-02-27 18:20:54 +01:00
Tobias Gruetzmacher
23dccb184e Replace PyInstaller version hack with something better. 2017-02-14 22:07:52 +01:00
Tobias Gruetzmacher
abb72a3a24 Fix CloneManga modules. 2017-02-13 23:41:45 +01:00
Tobias Gruetzmacher
ebbb27d05d Move xpath_class to helpers module. 2017-02-13 22:41:17 +01:00
Tobias Gruetzmacher
20ab279cde Clean up SmackJeeves...
Currently only covers already existing modules: Removed 11 broken
modules, added 2 and tried to update comic names and the adult and
endOfLife flags from their index. This isn't helped by the fact that
their search seems to skip some comics...
2017-02-13 01:46:49 +01:00
Tobias Gruetzmacher
83187b0554 Fix ViiviJaWagner. 2017-02-12 20:29:57 +01:00
Tobias Gruetzmacher
657e61811d Update list of old and removed modules. 2017-02-12 20:17:07 +01:00
Tobias Gruetzmacher
3b6af33ecb Some small module fixes. 2017-02-12 20:15:25 +01:00
Tobias Gruetzmacher
5359dd8629 Update ComicFury again... 2017-02-12 19:50:51 +01:00
Tobias Gruetzmacher
9895014655 Fix PHD with an ugly hack... 2017-02-12 16:21:36 +01:00
Tobias Gruetzmacher
b57945efd1 Update GoComic modules. 2017-02-12 12:21:01 +01:00
Tobias Gruetzmacher
ebe98bc8ba Fix some modules. 2017-02-12 02:16:38 +01:00
Tobias Gruetzmacher
20ca5d7fc2 Fix some modules. 2017-02-06 00:05:05 +01:00
gruetzkopf
edb49faa8b Add support for 'The Monster under the Bed' 2017-01-22 00:11:05 +01:00
Tobias Gruetzmacher
c4a184d173 Remove some vanished modules. 2017-01-12 02:01:10 +01:00
Tobias Gruetzmacher
36ac459bed Add removed GoComics modules to old list. 2017-01-12 01:22:13 +01:00
Tobias Gruetzmacher
a183e812ae Update GoComics module for new site layout.
(fixes #77)
2017-01-11 02:21:05 +01:00
Tobias Gruetzmacher
061efaac6e New module for ComicSherpa (removed from GoComics) 2017-01-11 01:34:52 +01:00
John Safrit
969e633877 Fix pattern for The Devils Panties 2017-01-08 17:39:59 -05:00
Tobias Gruetzmacher
3f9feec041 Allow modules to ignore some HTTP error codes.
This is necessary since it seems some webservers out there are
misconfigured to deliver actual content with an HTTP error code...
2016-11-01 18:25:02 +01:00
Tobias Gruetzmacher
46b7a374f6 Small GoComics update. 2016-11-01 02:51:00 +01:00
Tobias Gruetzmacher
f7f4e130bf Small fix to the WLP module. 2016-11-01 02:27:29 +01:00
Tobias Gruetzmacher
bc755d09a3 Apply link modifier to all links.
This was previously only the "previous link modifier", now it can also
modify "next" and "latest" links. Additionally, the modifier is given
the current URL, so those cases can be distinguished.
2016-11-01 01:50:44 +01:00
Tobias Gruetzmacher
7fc05f75f5 Remove broken PetiteSymphony comics. 2016-10-31 07:16:10 +01:00
Tobias Gruetzmacher
69e6318f87 Remove ScurryAndCover, too much JavaScript. 2016-10-31 07:04:00 +01:00
Tobias Gruetzmacher
47e2502ec7 Fix a bunch of comic modules. 2016-10-31 06:57:47 +01:00
Tobias Gruetzmacher
446b81fc45 Fix Wumo and friends. 2016-10-30 15:28:54 +01:00
Tobias Gruetzmacher
51ed898f5d Fix some SmackJeeves comics. 2016-10-30 14:30:45 +01:00
Tobias Gruetzmacher
b6d99945f6 Merge pull request #73 from acaranta/master
Added several SmackJeeves Comics
2016-10-30 11:55:17 +01:00
Tobias Gruetzmacher
3b9f30affd Update ComicFury modules. 2016-10-30 11:04:45 +01:00
Tobias Gruetzmacher
a02660a7d3 Replace custom @memoized with stdlib @lru_cache. 2016-10-29 00:46:49 +02:00
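For readers unfamiliar with the stdlib decorator named above, a small sketch of how `functools.lru_cache` memoizes a function. `load_listing` and `calls` are hypothetical stand-ins for illustration, not names from the project.

```python
from functools import lru_cache

calls = []  # records real invocations, only to make the caching visible


@lru_cache(maxsize=None)
def load_listing(name):
    """Pretend to do expensive work once per distinct argument."""
    calls.append(name)
    return name.upper()
```

Repeated calls with the same argument are answered from the cache, and `load_listing.cache_info()` reports hits and misses.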
Tobias Gruetzmacher
9a6a310b76 Fixup copyright years. 2016-10-29 00:21:41 +02:00
acaranta
83880a3cbd corrected RainbowMansion 2016-10-27 09:53:34 +02:00
acaranta
0ed823175c Added even more Smackjeeves comics 2016-10-27 06:58:57 +02:00
acaranta
a5c9a3c35c Added several SmackJeeves Comics 2016-10-26 05:25:13 +02:00
Peter Brunner
19445a83ae Fix smbc 2016-10-18 21:28:42 -04:00
Tobias Gruetzmacher
f94caa8a16 Use terminal size calculation from standard library. 2016-10-14 23:55:10 +02:00
Tobias Gruetzmacher
06be2a026b Move some ex-KeenSpot comics to shorter names. 2016-10-14 14:23:33 +02:00
Tobias Gruetzmacher
b17d6e5f22 Rework/fix KeenSpot modules. 2016-10-14 00:14:53 +02:00
Tobias Gruetzmacher
064e7976ec Add namer for Extra Fabulous Comics. 2016-10-06 00:42:50 +02:00
mostlyuseful
fce7dfff19 Add "Extra Fabulous Comics" comic 2016-10-04 17:06:50 +02:00
Tobias Gruetzmacher
f342a93aa1 Update GoComics module. 2016-10-01 03:39:36 +02:00
Tobias Gruetzmacher
c0d945a563 Update ComicFury modules. 2016-10-01 02:52:33 +02:00
Tobias Gruetzmacher
98c98ddfab Fix some more comic modules (c-f). 2016-09-30 00:15:45 +02:00
Tobias Gruetzmacher
b1d2650615 Fix some modules (a&b). 2016-09-29 01:29:01 +02:00
Damjan Košir
c04c62e92b xkcd now hone with xpaths 2016-08-18 21:28:25 +12:00
Damjan Košir
9ba184eb43 fixing LoadingArtist 2016-08-16 21:20:35 +12:00
Hubert Figuière
afcd19bf5b Added Prince of Sartar Comic 2016-08-08 09:18:33 -04:00
Hubert Figuière
81821dc450 Added Space Junk Arlia comic 2016-08-08 09:18:33 -04:00
Tobias Gruetzmacher
fb37f946e0 Speed up comic module tests.
This fakes an If-Modified-Since header, so most web servers don't need
to send comic images at all. This should also reduce the amount of data
that needs to be fetched for comic module tests.
2016-08-01 00:44:34 +02:00
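The conditional request described above relies on an HTTP date in the `If-Modified-Since` header. A minimal sketch of building such a header with the standard library follows; `conditional_headers` is a hypothetical helper, and how the project picks the timestamp is not shown in this log.

```python
from email.utils import formatdate


def conditional_headers(last_modified_epoch):
    """Build request headers asking the server to skip unchanged files."""
    # HTTP dates are always expressed in GMT.
    stamp = formatdate(last_modified_epoch, usegmt=True)
    return {"If-Modified-Since": stamp}
```

A server that honours the header can answer with 304 Not Modified and no body, which is what saves the image transfer.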
Tobias Gruetzmacher
4f80016bf0 Change robotparser import to make PyInstaller happy. 2016-06-06 22:42:01 +02:00
Tobias Gruetzmacher
64c8e502ca Ignore case for comic download directories.
Since we already match comics case-insensitive on the command line, this
was a logical step, even if this means changing quite a bit of code that
all tries to resolve the "comic directory" in a slightly different
way...
2016-06-06 00:08:29 +02:00
Tobias Gruetzmacher
215d597573 Remove DrunkDuck for now.
- It's been disabled for ages
- Needs a major rework
- I don't want to add that many comics anyways...
- This also gets rid of make_scraper :)
2016-06-05 22:22:17 +02:00
Tobias Gruetzmacher
67d0d38100 Migrate SnafuComics to single-class module. 2016-06-05 22:12:16 +02:00
Tobias Gruetzmacher
125c96e9dc Remove command to download ALL comics... 2016-06-05 21:57:56 +02:00
Tobias Gruetzmacher
df2048cb34 Keep track of removed and moved comics (fixes #41).
I plan on keeping this list for at least ~ 2 releases and then purging
older entries...
2016-06-05 21:47:58 +02:00
Tobias Gruetzmacher
9b755a7e6c Restore BobWhite. 2016-06-05 18:32:27 +02:00
Tobias Gruetzmacher
603fd62a1e Fix workaround for PyInstaller... 2016-06-05 16:01:35 +02:00
Tobias Gruetzmacher
295b53a2d3 Fix name overrides (broken by 51008a). 2016-06-05 10:03:29 +02:00
Tobias Gruetzmacher
844bec09ba Remove another dead comic from ComicFury. 2016-06-05 01:06:04 +02:00
Tobias Gruetzmacher
12123961a4 Fix error in PyInstaller packaged application. 2016-06-05 00:34:16 +02:00
André-Patrick Bubel
2b8e948868 Add String Theory comic 2016-06-01 11:19:17 +00:00
André-Patrick Bubel
192751073c Add KillSixBillionDemons comic 2016-05-31 07:28:32 +00:00
Tobias Gruetzmacher
807bee6342 Migrate GoComics to single-class module. 2016-05-23 00:01:10 +02:00
Tobias Gruetzmacher
2c8e57bdea Migrate Creators to single-class module. 2016-05-22 23:56:59 +02:00
Tobias Gruetzmacher
f5dff27b0a Migrate SmackJeeves to single-class module. 2016-05-22 23:54:21 +02:00
Tobias Gruetzmacher
1ea20e1743 Migrate WebcomicFactory to single-class module. 2016-05-22 23:40:58 +02:00
Tobias Gruetzmacher
c62a7283a2 Migrate ComicFury to single-class module. 2016-05-22 23:31:53 +02:00
Tobias Gruetzmacher
1834bf179f Migrate Arcamax to single-class module. 2016-05-22 23:17:24 +02:00
Tobias Gruetzmacher
f29472c143 Make auto-update script more flexible. 2016-05-22 23:06:05 +02:00
Tobias Gruetzmacher
e4650d5941 Remove make_scraper from Nitrocosm. 2016-05-21 14:35:53 +02:00
Tobias Gruetzmacher
b6eb8ab8ef Remove make_scraper from SandraAndWoo 2016-05-21 14:12:11 +02:00
Tobias Gruetzmacher
4630ea047c Implement Oglaf's strange navigation (fixes #33)
(also should fix wummel#91)
2016-05-21 02:38:07 +02:00
Tobias Gruetzmacher
51008a975b Refactor: Introduce generator methods for scrapers
This allows one comic module class to generate multiple scrapers. This
change is to support a more dynamic module system as described in #42.
2016-05-21 01:29:36 +02:00
Tobias Gruetzmacher
89cfd9d310 Add comics from catomix.com. 2016-05-16 23:55:41 +02:00
Tobias Gruetzmacher
a6cf4e7040 Fix some more comic modules. 2016-05-16 23:16:29 +02:00
Tobias Gruetzmacher
be1a63da0c Update GoComics comic list. 2016-05-16 18:26:45 +02:00
Tobias Gruetzmacher
6d3f74142c Move command line tool into package.
This way we can use the default Python console_scripts install process.
2016-05-16 14:57:47 +02:00
Tobias Gruetzmacher
b9d9564085 Fix Dilbert (fixes #44). 2016-05-16 01:21:23 +02:00
Tobias Gruetzmacher
e9b3c487c0 Remove some dead comics. 2016-05-16 01:10:20 +02:00
Tobias Gruetzmacher
bd60155d9f Some more ComicFury comics gone... 2016-05-16 00:53:22 +02:00
Tobias Gruetzmacher
849e60e795 Remove make_scraper magic from webcomiceu. 2016-05-07 03:20:01 +02:00
Tobias Gruetzmacher
975d2376bf Another round of comic module fixes. 2016-05-07 01:50:10 +02:00
Tobias Gruetzmacher
efe1308db2 Replace home-grown Python2/3 compat. with six. 2016-05-05 23:33:48 +02:00
Tobias Gruetzmacher
77ed0218e0 Fix some comic modules. 2016-05-05 20:55:14 +02:00
Tobias Gruetzmacher
bb2ac39639 Fix some URLs. 2016-05-05 10:12:03 +02:00
Tobias Gruetzmacher
d05316e3ac Seems ComicFury is deleting comics regularly...
Well, there's nothing we can do: Remove them.
2016-05-04 08:26:53 +02:00
Tobias Gruetzmacher
0c1aa9e8bd Move libxml < 2.9.3 workaround to base class. 2016-05-02 23:22:06 +02:00
Tobias Gruetzmacher
b93a8fde65 Move PensAndTales comics and fix them. 2016-05-02 22:32:14 +02:00
Tobias Gruetzmacher
4006ced43d Move all HijinksEnsue comics into alphabetic files. 2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher
d5f91ecfd2 Fix some modules in m.py. 2016-04-30 01:59:28 +02:00
Tobias Gruetzmacher
1d52d33311 Remove missing SmackJeeves comics. 2016-04-30 00:56:20 +02:00
Tobias Gruetzmacher
d796f3476c Fix some modules in d.py. 2016-04-30 00:44:18 +02:00
Tobias Gruetzmacher
cc16fea880 Fix some modules in c.py 2016-04-29 00:35:02 +02:00
Tobias Gruetzmacher
1d94439715 Fix some more comic modules. 2016-04-27 00:31:27 +02:00
Tobias Gruetzmacher
8b1ac4eb35 Fix "tagsoup" on SmackJeeves
Unfortunately, browsers render < outside of HTML tags differently than
libXML until recently (libXML 2.9.3), so we need to preprocess pages
before parsing them...

(This was fixed in libXML commit 140c25)
2016-04-26 08:05:38 +02:00
Tobias Gruetzmacher
035d6e94e4 Allow output level for warnings and errors. 2016-04-26 07:53:53 +02:00
Tobias Gruetzmacher
8ddf553eb4 Fix some more SmackJeeves modules. 2016-04-22 01:04:47 +02:00
Tobias Gruetzmacher
fd85c8583a Unify similar code in fetchUrl and fetchText 2016-04-22 00:42:46 +02:00
Tobias Gruetzmacher
6574997e01 Refactor: All the other class methods.
Turns out, it would have been better if all methods had been instance
methods and not class methods. This finished a big chunk of the rework
needed for #42.
2016-04-21 23:52:31 +02:00
Tobias Gruetzmacher
0d436b8ca9 Refactor: url modifiers to normal methods.
As before, to implement #42 these might want to access information from
the instance, so they should be normal methods.
2016-04-21 21:39:25 +02:00
Tobias Gruetzmacher
c3f32dfef7 Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
Tobias Gruetzmacher
5bd2a49f48 Add debug output on matched XPath/CSS expression. 2016-04-20 23:51:54 +02:00
Tobias Gruetzmacher
fe51a449df Update SmackJeeves
- Now uses _ParserScraper, which makes the pattern quite a bit more
  generic and IMHO more readable
- remove make_scraper magic
- No new comics, only fixed existing ones and removed some dead ones.
2016-04-20 23:36:45 +02:00
Tobias Gruetzmacher
190cd3b063 Convert language & getDisabledReasons to methods.
Both are more properties of a webcomic (this is part of the design
changes for #42)
2016-04-19 23:53:46 +02:00
Tobias Gruetzmacher
df46907f39 Register EXSLT extensions by default.
This allows comic module authors to use the full power of regular
expressions in XPath expression, see http://exslt.org/regexp/regexp.html
for usage. Please be aware that these use the prefix re: instead of
regexp: here.
2016-04-19 23:48:14 +02:00
Tobias Gruetzmacher
4204f5f1e4 Send "If-Modified-Since" header for images. 2016-04-19 00:36:50 +02:00
Tobias Gruetzmacher
13a3409854 Remove some comics that are gone or block us. 2016-04-17 19:42:43 +02:00
Tobias Gruetzmacher
1fbc844077 Update GoComics. 2016-04-17 18:40:09 +02:00
Tobias Gruetzmacher
73e958670d Update ComicFury (again). 2016-04-17 16:19:44 +02:00
Tobias Gruetzmacher
b0481a01f7 Update languages. 2016-04-16 13:14:12 +02:00
Tobias Gruetzmacher
3329027e4b Update ComicFury. 2016-04-16 13:13:47 +02:00
Tobias Gruetzmacher
ee99c087d7 Remove prevUrlMatchesStripUrl.
It was only used for one test.
2016-04-16 01:14:26 +02:00
Tobias Gruetzmacher
92a688457a Remove useless indirection. 2016-04-15 23:42:24 +02:00
Tobias Gruetzmacher
52515b5fc5 Update GoComics. 2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher
031a523846 Fix SnafuComics. 2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher
7626b1e100 Webcomics Nation is gone. 2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher
497653c448 Remove make_scraper magic from Arcamax. 2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher
db87ed95e7 Use new features to make modules simpler. 2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher
b266e28ae1 Remove debugging prints 😭 2016-04-13 22:59:06 +02:00
Tobias Gruetzmacher
ff3b824311 Fix variable shadowing... 2016-04-13 22:43:34 +02:00
Tobias Gruetzmacher
060281e5ff Use concrete scraper objects everywhere.
This is a first step for #42. Since most access to the scraper classes
is through instances, modules can now dynamically override url and name
(name is now a property).
2016-04-13 22:17:30 +02:00
Tobias Gruetzmacher
0468f2f31a Refactor: Convert starter to simple method. 2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher
16004e43e4 Use default bounceStarter for site modules. 2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher
9028724a74 Clean up update helper scripts. 2016-04-13 00:52:16 +02:00
Tobias Gruetzmacher
42e43fa4e6 Read starter parameters from class.
This allows specifying starters in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher
b865a171f9 Remove some broken comics. 2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher
4e2e4ac529 Prevent scraper from moving to a different comic. 2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher
443ab119e9 Refresh GoComics list from online directory. 2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher
0e385a3697 Update GoComics (no change in supported comics)
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher
ad7a297964 Fix WLP comics. 2016-04-11 01:07:21 +02:00
Damjan Košir
af2e57d850 Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...

Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher
fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher
f6e605e146 Fix unicode error in text search. 2016-04-10 13:16:30 +02:00
Tobias Gruetzmacher
bc10bd9a4d Streamline color output.
- Depend on external colorama instead of embedding an old copy.
- Move most output code into output module.
- Convert pager to context manager.
2016-04-10 03:45:00 +02:00
Tobias Gruetzmacher
bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher
0033a8046b Fix creators module. 2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher
8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and honestly, I'm not sure if I did get it right this time. But I
think, the current behaviour matches best what web browsers try to do:

1. Let Requests figure out the content from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
Tobias Gruetzmacher
183d18e7bc Skip non-image on xkcd. 2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher
9feaf245f2 Fixed & removed some comics in s.py. 2016-04-06 00:40:13 +02:00
Tobias Gruetzmacher
6bbdcfb341 BloomingFaeries: Don't download every page twice.
(Also, simplify namer, switch to _ParserScraper)
2016-04-05 23:58:43 +02:00
Tobias Gruetzmacher
8db6f8e8b7 Fix ZapComics, remove ZebraGirl.
- ZebraGirl is now ComicFury/ZebraGirl...
2016-04-04 00:27:11 +02:00
Tobias Gruetzmacher
0bcfb8a82e Move ComicControl into common module.
- Move all comics using ComicControl into alphabetical files.
- Add BalderDash & Picklewhistle
2016-04-04 00:12:53 +02:00
Tobias Gruetzmacher
0d453a6858 Move Flowerlark Studios into alphabetical files. 2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher
a9f0dfdce4 Merge pull request #39 from peterjanes/peterjanes/sherman-fix
Fix Sherman's Lagoon
2016-04-03 22:20:04 +02:00
Tobias Gruetzmacher
926439cd14 Every comic needs a URL. 2016-04-03 22:03:16 +02:00
Tobias Gruetzmacher
2c6decb7f5 Move WebcomicFactory into its own module.
Also, add an updater script for it.
2016-04-03 21:31:56 +02:00
Peter Janes
759bd0c360 Fix Sherman's Lagoon 2016-04-03 14:54:41 -04:00
Tobias Gruetzmacher
bb1f20d867 Remove make_scraper for most WordPress comics.
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher
7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher
d6db1d0b81 Fix a conflict with IPython. 2016-03-20 23:57:07 +01:00
Tobias Gruetzmacher
90dfceaeb1 Remove dead modules (& format). 2016-03-20 20:48:42 +01:00
Tobias Gruetzmacher
f243096d49 Fix GastroPhobia, remove GeneralProtectionFault.
(& formatting)
2016-03-20 20:11:21 +01:00
Tobias Gruetzmacher
cfcfcc2468 Switch plugin loading to pkgutil.
This should work with all PEP-302 loaders that implement iter_modules.
Unfortunately, PyInstaller (which I plan to use for Windows releases)
does not support it, so we don't get around a special case. Anyways,
this should help for #22.
2016-03-20 15:13:24 +01:00
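A rough sketch of plugin discovery via `pkgutil.iter_modules`, as the commit above describes. The function name `list_plugin_modules` is hypothetical, and the PyInstaller special case mentioned in the message is not covered here.

```python
import importlib
import pkgutil


def list_plugin_modules(package_name):
    """Return the names of all direct submodules of a package, sorted."""
    package = importlib.import_module(package_name)
    # iter_modules works with any loader that implements it for this path.
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))
```

For example, `list_plugin_modules("json")` lists the submodules of the standard `json` package. Each name could then be imported with `importlib.import_module`.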
Tobias Gruetzmacher
1af022895e Fix NuklearPower (fixes #38).
Also remove make_scraper magic.
2016-03-17 23:19:52 +01:00
Tobias Gruetzmacher
552f29e5fc Update ComicFury comics. (+871, -245)
- Remove make_scraper magic
- Switch to HTML parser
- Update parsing of comic listing.
2016-03-17 00:44:06 +01:00
Tobias Gruetzmacher
6727e9b559 Use vendored urllib3.
As long as requests ships with urllib3, we can't fall back to the
"system" urllib3, since that breaks class-identity checks.
2016-03-16 23:18:19 +01:00
Damjan Košir
615f094ef3 fixing EdmundFinney 2016-03-14 20:32:18 +13:00
Tobias Gruetzmacher
c4fcd985dd Let urllib3 handle all retries. 2016-03-13 21:30:36 +01:00
Tobias Gruetzmacher
78e13962f9 Sort scraper modules (mostly for test stability). 2016-03-13 20:24:21 +01:00
Tobias Gruetzmacher
017d35cb3c Fallback version if pkg_resources not available.
This helps for Windows packaging.
2016-03-03 01:05:36 +01:00
Johannes Schöpp
351fa7154e Modified maximum page size
Fixes #36
2016-03-01 22:19:44 +01:00
Damjan Košir
b0dc510b08 adding LastNerdsOnEarth 2016-01-03 14:16:58 +13:00
Damjan Košir
a1e79cbbf2 fixing Fragile 2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher
81827f83bc Use GitHub releases API for update checks. 2015-11-06 23:07:19 +01:00
Tobias Gruetzmacher
a41574e31a Make version fetching a bit more robust (use pbr). 2015-11-06 22:08:14 +01:00
Tobias Gruetzmacher
64f7e313d5 Remove make_scraper magic from footloosecomic.py. 2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher
7f7a69818b Remove make_scraper magic from creators module. 2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher
94470d564c Fix import for Python 3. 2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher
b819afec39 Switch build to PBR.
This gets us:
- Automatic changelog
- Automatic authors list
- Automatic git version management
2015-11-03 23:27:53 +01:00
Tobias Gruetzmacher
dc22d7b32a Add CatNine comic. 2015-11-02 23:29:56 +01:00
Tobias Gruetzmacher
10d9eac574 Remove support for very old versions of "requests". 2015-11-02 23:24:01 +01:00
MariusK
3e1ea816cc Fixed 'Ruthe' 2015-10-02 13:52:44 +02:00
Helge Stasch
48d8519efd Changed Goblins comic - moved to new scraper and fixed minor issues with some comics (old scraper was unstable for some comics of Goblins) 2015-09-28 23:50:15 +02:00
Helge Stasch
17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher
d72ceb92d5 BloomingFaeries: Remove imageUrlModifier (not needed). 2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher
abd80a1d35 Merge pull request #28 from KevinAnthony/master
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher
b737218182 ZenPencils: Allow multiple images per page. 2015-09-03 23:24:28 +02:00
Kevin Anthony
62ec1f1d18 Removed debugging print state 2015-09-02 11:22:24 -04:00
Kevin Anthony
d7180eaf99 removed bad whitespace 2015-09-02 11:04:32 -04:00
Kevin Anthony
6e8231e78a Added Namer to BloomingFaeries since the web comic author doesn't seem interested in sticking to any kind of file naming convention 2015-09-02 11:01:48 -04:00
Kevin Anthony
1045bb7d4a added comic Blooming Faeries 2015-09-02 10:13:42 -04:00
Damjan Košir
11f0aa3989 created Wordpress Scraper class 2015-08-11 21:31:45 +12:00
Damjan Košir
0a5b792c32 added Fragile (English and Spanish) 2015-08-07 23:37:10 +12:00
Damjan Košir
fd9c480d9c adding bonus panel to SWBC and multiple images flag to ParserScraper 2015-08-03 22:58:44 +12:00
Damjan Košir
f8a163a361 added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis 2015-08-03 22:40:06 +12:00
Damjan Košir
648a84e38e added Sharksplode 2015-08-03 22:20:17 +12:00
Damjan Košir
c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Damjan Košir
2201c9877a added KiwiBlitz 2015-07-31 23:09:56 +12:00
Damjan Košir
fe22df5e5b added LetsSpeakEnglish 2015-07-31 23:06:06 +12:00
Damjan Košir
79ec427fc0 added CatVersusHuman 2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher
303432fc68 Also use css expressions for textSearch. 2015-07-18 01:22:40 +02:00
Tobias Gruetzmacher
6a70bf4671 Enable some comics based on current policy. 2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher
6b0046f9b3 Fix small typos. 2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher
68d4dd463a Revert robots.txt handling.
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.

Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge amounts of web sites in search for some content
to index, it's only intended to help users keep a personal archive of
comics they are interested in. We try very hard to never download any image
twice. This fixes #24.

(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots)
2015-07-17 20:46:56 +02:00
Tobias Gruetzmacher
7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher
472afa24d3 GoComics doesn't allow spiders, disable them...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher
7c15ea50d8 Also check robots.txt on image downloads.
We DO want to honour if images are blocked by robots.txt
2015-07-15 23:50:57 +02:00
Tobias Gruetzmacher
5affd8af68 More relaxed robots.txt handling.
This is in line with how Perl's LWP::RobotUA and Google handle server
errors when fetching robots.txt: Just assume access is allowed.

See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
2015-07-15 19:11:55 +02:00
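The relaxed policy above can be sketched with the standard `urllib.robotparser` module. This is a simplification under stated assumptions: `may_fetch` is a hypothetical helper, and it treats every non-200 answer as "allowed", which is broader than the server-error case the commit describes.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(status, robots_body, agent, url):
    """Decide whether a URL may be fetched given the robots.txt response."""
    # Assumption: any failure to retrieve robots.txt means "no restrictions".
    if status != 200:
        return True
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)
```

Separating the HTTP status from the parsing step keeps the "assume allowed on error" decision explicit and testable without network access.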
Tobias Gruetzmacher
88e387ad15 Add Sleepless Domain. 2015-07-12 18:31:21 +02:00
Tobias Gruetzmacher
0b6d7425e1 Remove BladeKitten.
It's not available online anymore, only in print or as a PDF download.
2015-07-11 01:29:21 +02:00
Tobias Gruetzmacher
808b624e5f Remove hard dependency on pycountry again.
This basically reverts commit 86b31dc12b.

It now works like this: If the user has pycountry installed, it is used.
If not, Dosage falls back to a small internal list generated from
pycountry by scripts/mklanguages.py.

This means additional work if we ever decide to translate Dosage, since
pycountry already has all the translations for language names...

This fixes #23.
2015-07-11 01:27:39 +02:00
Tobias Gruetzmacher
d97a9c63e4 Add Erstwhile. 2015-07-10 01:14:56 +02:00
Damjan Košir
7abca1222b added NerfNow 2015-07-07 22:18:06 +12:00
Damjan Košir
119a3cd13a added text to ScandinaviaAndTheWorld 2015-07-07 19:48:25 +12:00
Damjan Košir
5f243e3868 not a comic 2015-07-05 18:33:14 +12:00
Damjan Košir
5e7ad33fc8 Nnewts disabled 2015-07-05 18:32:33 +12:00
Damjan Košir
45012ff9c3 BladeKitten disabled 2015-07-05 18:31:38 +12:00
Tobias Gruetzmacher
0c6feec8cd Fix module name EastCoastVsWestCoast. 2015-06-24 00:51:42 +02:00
Damjan Košir
96572e8cba added TheMelvinChronicles 2015-06-12 21:00:11 +12:00
Damjan Košir
6412e6e542 fixed Spinnerette 2015-06-08 20:31:13 +12:00
Damjan Košir
3d8a49d228 realised TheWebcomicFactory is actually 28 comics... added them 2015-06-07 21:33:59 +12:00
Damjan Košir
05bb22b3ef added TheWebcomicFactory 2015-06-06 14:25:32 +12:00
Damjan Košir
c98800388e added Sithrah 2015-06-04 19:24:55 +12:00
Damjan Košir
010b4bf669 renaming comicpress to wordpress (as it's not just for the comicpress theme) 2015-06-04 19:12:40 +12:00
Damjan Košir
bc91f5f1fb added MistyTheMouse 2015-06-04 19:06:40 +12:00
Damjan Košir
e2d01e4924 fixed ScandinaviaAndTheWorld 2015-06-04 18:58:59 +12:00
Damjan Košir
545a67111e fixed Alice 2015-06-01 15:15:34 +12:00
Damjan Košir
a08ad2dc80 fixed GoGetARoomie 2015-06-01 15:11:16 +12:00
Damjan Košir
ceb19ed2bc fixed Wulffmorgenthaler (now Wumo), added TruthFacts and MeAndDanielle 2015-06-01 12:14:52 +12:00
Damjan Košir
4cd88ecdc0 fixed WormWorldSaga 2015-06-01 11:45:22 +12:00
Damjan Košir
ea6cb925a6 fixed LoadingArtist 2015-06-01 11:33:50 +12:00
Damjan Košir
e268b09567 fixed EarthsongSaga 2015-06-01 11:19:02 +12:00
Damjan Košir
29c8d2eea0 fixed Meek 2015-05-31 23:41:12 +12:00
Damjan Košir
9be6f613e4 fixed MysteriesOfTheArcana 2015-05-31 23:39:04 +12:00
Damjan Košir
3ea8236224 fixed FowlLanguage 2015-05-31 23:29:34 +12:00
Damjan Košir
c1245a85ad moved Footloose, added Cherry, Desigaspring 2015-05-31 23:23:02 +12:00
Damjan Košir
01aeebfbe4 fixed Footloose 2015-05-31 23:16:12 +12:00
Damjan Košir
029fa74067 fixed Bardsworth 2015-05-31 23:03:40 +12:00
Damjan Košir
f3036de8fd fixed Pimpette 2015-05-31 22:57:25 +12:00
Damjan Košir
df7404fd7c fixed CatsAndCameras 2015-05-31 22:50:17 +12:00
Damjan Košir
d4cc8ac857 added buni 2015-05-27 20:36:11 +12:00
Damjan Košir
9beeceffad added BusinessCat and HappyJar 2015-05-27 20:34:51 +12:00
Damjan Košir
d970d27b14 removing duplicate 2015-05-27 00:10:46 +12:00
Damjan Košir
33abd95348 fixed TheGentlemansArmchair 2015-05-26 23:48:22 +12:00
Damjan Košir
5e123ae79e fixed DarkWings (now available under the real name Eryl as well), added Ashes, Laiyu, NoMoreSavePoints and EasilyAmused 2015-05-26 23:43:15 +12:00
Damjan Košir
9adb020fc2 fixed DemolitionSquad 2015-05-26 22:59:25 +12:00
Damjan Košir
605c5f8619 fixed PokeyThePenguin 2015-05-26 22:31:43 +12:00
Damjan Košir
766b7ba99d fixed ProperBarn, added 2214 and OTE 2015-05-26 22:16:55 +12:00
Damjan Košir
2c41435ceb fixing HijiNKS ENSUE and added all 4 comics on that page 2015-05-26 22:06:55 +12:00
Damjan Košir
465e7eaf6f fixing CowboyJedi kinda... there is currently no comic on the front page and the author knows it 2015-05-26 21:35:36 +12:00
Damjan Košir
529a41397a fixing CorydonCafe 2015-05-26 21:32:25 +12:00
Damjan Košir
c3abb93e99 fixing ChainsawSuit 2015-05-26 19:53:04 +12:00
Damjan Košir
f8690af029 fixing Curvy 2015-05-26 19:47:31 +12:00
Damjan Košir
36c790fa4b fixing CraftedFables 2015-05-26 19:32:12 +12:00
Damjan Košir
7067c51056 fixed CheckerboardNightmare 2015-05-25 22:19:36 +12:00
Damjan Košir
5569439c43 fixed 16 comics 2015-05-25 21:57:06 +12:00
Damjan Košir
3edaa97fb9 fixing KatzenfutterGeleespritzer 2015-05-25 20:06:58 +12:00
Damjan Košir
8a245e1d10 fixing BloodBound 2015-05-21 00:04:07 +12:00
Damjan Košir
dc2349951a moving BroodHollow to comicpress 2015-05-21 00:00:35 +12:00
Damjan Košir
a05ae9c75d fixing PandyLand 2015-05-20 23:56:49 +12:00
Damjan Košir
fd60065591 fixing OnTheEdge 2015-05-20 23:50:18 +12:00
Damjan Košir
80b783c016 fixing CourtingDisaster 2015-05-20 23:16:54 +12:00
Damjan Košir
ff239ff58e Merge branch 'comicpress' 2015-05-20 23:12:03 +12:00
Damjan Košir
77c5dbce9b better prevSearch for comic press 2015-05-20 23:08:02 +12:00
Damjan Košir
bc4e7a03f2 fixed BroodHollow 2015-05-20 23:03:15 +12:00
Damjan Košir
8de620c78b fixed CigarroAndCerveja 2015-05-20 22:58:13 +12:00
Damjan Košir
4529fdee3b adding no downsize option 2015-05-20 22:38:29 +12:00
Damjan Košir
77a9cce00d fixing Hipsters 2015-05-19 19:49:45 +12:00
Damjan Košir
79d775a8d9 adding comicpress scraper 2015-05-16 00:15:32 +12:00
Damjan Košir
962286d391 fixed OctopusPie 2015-05-14 23:06:12 +12:00
Damjan Košir
3bbf2d5c23 fixing neko the kitty 2015-05-14 22:42:04 +12:00
Damjan Košir
f75fc62e84 fixing pebbleversion 2015-05-14 22:33:46 +12:00