Tobias Gruetzmacher
52515b5fc5
Update GoComics.
2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher
031a523846
Fix SnafuComics.
2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher
7626b1e100
Webcomics Nation is gone.
2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher
497653c448
Remove make_scraper magic from Arcamax.
2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher
db87ed95e7
Use new features to make modules simpler.
2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher
b266e28ae1
Remove debugging prints 😭
2016-04-13 22:59:06 +02:00
Tobias Gruetzmacher
ff3b824311
Fix variable shadowing...
2016-04-13 22:43:34 +02:00
Tobias Gruetzmacher
060281e5ff
Use concrete scraper objects everywhere.
...
This is a first step for #42. Since most access to the scraper classes
is through instances, modules can now dynamically override url and name
(name is now a property).
2016-04-13 22:17:30 +02:00
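A minimal sketch of what "name is now a property" can look like once scrapers are used as instances. All class names and the example URL are hypothetical and only illustrate the idea; they are not taken from the project's code.

```python
class Scraper:
    """Hypothetical base class, not the project's actual implementation."""

    url = None  # class-level default, may be replaced per instance

    @property
    def name(self):
        # Default: derive the comic name from the class name.
        return type(self).__name__


class ComicFurySketch(Scraper):
    """Hypothetical module that serves many comics from a single class."""

    def __init__(self, name, subdomain):
        self._name = name
        # The instance attribute shadows the class-level default.
        self.url = "http://%s.example.invalid/" % subdomain

    @property
    def name(self):
        # Overridden dynamically, based on per-instance data.
        return "ComicFury/" + self._name
```

With instances, one class can stand for many comics, for example `ComicFurySketch("ZebraGirl", "zebragirl")`, without generating a new class per comic.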
Tobias Gruetzmacher
0468f2f31a
Refactor: Convert starter to simple method.
2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher
16004e43e4
Use default bounceStarter for site modules.
2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher
9028724a74
Clean up update helper scripts.
2016-04-13 00:52:16 +02:00
Tobias Gruetzmacher
42e43fa4e6
Read starter parameters from class.
...
This allows specifying starters in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher
b865a171f9
Remove some broken comics.
2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher
4e2e4ac529
Prevent scraper from moving to a different comic.
2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher
443ab119e9
Refresh GoComics list from online directory.
2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher
0e385a3697
Update GoComics (no change in supported comics)
...
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher
ad7a297964
Fix WLP comics.
2016-04-11 01:07:21 +02:00
Damjan Košir
af2e57d850
Added comic ScurryAndCover...
...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...
Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher
fa98f6ddbf
Move more comics to common WordPressScraper.
2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher
f6e605e146
Fix unicode error in text search.
2016-04-10 13:16:30 +02:00
Tobias Gruetzmacher
bc10bd9a4d
Streamline color output.
...
- Depend on external colorama instead of embedding an old copy.
- Move most output code into output module.
- Convert pager to context manager.
2016-04-10 03:45:00 +02:00
Tobias Gruetzmacher
bb5b6ffcec
Fix comics in module a.py.
2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher
0033a8046b
Fix creators module.
2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher
8768ff07b6
Fix AhoiPolloi, be a bit smarter about encoding.
...
HTML character encoding in the context of HTTP is quite tricky to get
right and, honestly, I'm not sure I got it right this time. But I think
the current behaviour best matches what web browsers try to do:
1. Let Requests figure out the encoding from the HTTP header. This
   overrides everything else. We need to "trick" LXML into accepting our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
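The two-step encoding decision described in the commit above can be sketched with the standard library alone. This is an illustration under assumptions, not the code from the commit: the function names are made up, the header is parsed with `email.message.Message` instead of Requests, and the "trick" is assumed to be stripping a leading XML declaration before handing text to the parser.

```python
import re
from email.message import Message

# Matches a leading XML declaration such as <?xml version="1.0" encoding="..."?>
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def header_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def prepare_for_parser(body, content_type):
    """Return decoded text if the header names a charset, else the raw bytes."""
    charset = header_charset(content_type)
    if charset is None:
        # Step 2: no encoding in the header, let the parser guess from bytes.
        return body
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown codec name in the header: fall back to guessing.
        return body
    # Step 1: the header wins, so drop a possibly conflicting XML declaration.
    return XML_DECL.sub("", text, count=1)
```

The result would then be passed to the HTML parser: text when the header decided the encoding, bytes otherwise.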
Tobias Gruetzmacher
183d18e7bc
Skip non-image on xkcd.
2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher
9feaf245f2
Fixed & removed some comics in s.py.
2016-04-06 00:40:13 +02:00
Tobias Gruetzmacher
6bbdcfb341
BloomingFaeries: Don't download every page twice.
...
(Also, simplify namer, switch to _ParserScraper)
2016-04-05 23:58:43 +02:00
Tobias Gruetzmacher
8db6f8e8b7
Fix ZapComics, remove ZebraGirl.
...
- ZebraGirl is now ComicFury/ZebraGirl...
2016-04-04 00:27:11 +02:00
Tobias Gruetzmacher
0bcfb8a82e
Move ComicControl into common module.
...
- Move all comics using ComicControl into alphabetical files.
- Add BalderDash & Picklewhistle
2016-04-04 00:12:53 +02:00
Tobias Gruetzmacher
0d453a6858
Move Flowerlark Studios into alphabetical files.
2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher
a9f0dfdce4
Merge pull request #39 from peterjanes/peterjanes/sherman-fix
...
Fix Sherman's Lagoon
2016-04-03 22:20:04 +02:00
Tobias Gruetzmacher
926439cd14
Every comic needs a URL.
2016-04-03 22:03:16 +02:00
Tobias Gruetzmacher
2c6decb7f5
Move WebcomicFactory into its own module.
...
Also, add an updater script for it.
2016-04-03 21:31:56 +02:00
Peter Janes
759bd0c360
Fix Sherman's Lagoon
2016-04-03 14:54:41 -04:00
Tobias Gruetzmacher
bb1f20d867
Remove make_scraper for most WordPress comics.
...
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher
7f1e136d8b
Sort comics alphabetically & PEP8 style fixes.
2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher
d6db1d0b81
Fix a conflict with IPython.
2016-03-20 23:57:07 +01:00
Tobias Gruetzmacher
90dfceaeb1
Remove dead modules (& format).
2016-03-20 20:48:42 +01:00
Tobias Gruetzmacher
f243096d49
Fix GastroPhobia, remove GeneralProtectionFault.
...
(& formatting)
2016-03-20 20:11:21 +01:00
Tobias Gruetzmacher
cfcfcc2468
Switch plugin loading to pkgutil.
...
This should work with all PEP-302 loaders that implement iter_modules.
Unfortunately, PyInstaller (which I plan to use for Windows releases)
does not support it, so we can't get around a special case. Anyway,
this should help for #22.
2016-03-20 15:13:24 +01:00
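A minimal sketch of plugin discovery via `pkgutil.iter_modules`, as named in the commit above. The function name `load_plugin_modules` is hypothetical, and the sorting is an assumption added here to make the load order deterministic.

```python
import importlib
import pkgutil


def load_plugin_modules(package):
    """Import and return every direct submodule of an already imported package."""
    # iter_modules asks the package's loaders which modules they provide.
    # The prefix turns "text" into a full name such as "email.mime.text".
    names = sorted(
        info.name
        for info in pkgutil.iter_modules(package.__path__, package.__name__ + ".")
    )
    return [importlib.import_module(name) for name in names]
```

Calling it on any regular package, for example the standard library's `email.mime`, returns its submodules as module objects. A loader without `iter_modules` support yields nothing here, which is why such environments need a special case.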
Tobias Gruetzmacher
1af022895e
Fix NuklearPower (fixes #38).
...
Also remove make_scraper magic.
2016-03-17 23:19:52 +01:00
Tobias Gruetzmacher
552f29e5fc
Update ComicFury comics. (+871, -245)
...
- Remove make_scraper magic
- Switch to HTML parser
- Update parsing of comic listing.
2016-03-17 00:44:06 +01:00
Tobias Gruetzmacher
6727e9b559
Use vendored urllib3.
...
As long as requests ships with urllib3, we can't fall back to the
"system" urllib3, since that breaks class-identity checks.
2016-03-16 23:18:19 +01:00
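Why two copies of the same library break class-identity checks can be shown without either library. In this sketch the two classes stand in for the same exception class loaded once from a vendored copy and once from a system copy; all names are hypothetical.

```python
def make_error_class():
    # Each call creates a new, distinct class object, just as importing the
    # same source file under two different package paths would.
    class ConnectError(Exception):
        pass

    return ConnectError


VendoredError = make_error_class()  # stands in for the vendored copy
SystemError_ = make_error_class()   # stands in for the "system" copy


def caught_by(handler_cls, raised_cls):
    """Return True if an `except handler_cls` clause catches raised_cls."""
    try:
        raise raised_cls("boom")
    except handler_cls:
        return True
    except Exception:
        return False
```

`caught_by(VendoredError, SystemError_)` is false even though both classes share a name and a body, so `except` clauses and `isinstance` checks written against one copy miss objects created by the other.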
Damjan Košir
615f094ef3
fixing EdmundFinney
2016-03-14 20:32:18 +13:00
Tobias Gruetzmacher
c4fcd985dd
Let urllib3 handle all retries.
2016-03-13 21:30:36 +01:00
Tobias Gruetzmacher
78e13962f9
Sort scraper modules (mostly for test stability).
2016-03-13 20:24:21 +01:00
Tobias Gruetzmacher
017d35cb3c
Fallback version if pkg_resources not available.
...
This helps for Windows packaging.
2016-03-03 01:05:36 +01:00
Johannes Schöpp
351fa7154e
Modified maximum page size
...
Fixes #36
2016-03-01 22:19:44 +01:00
Damjan Košir
b0dc510b08
adding LastNerdsOnEarth
2016-01-03 14:16:58 +13:00
Damjan Košir
a1e79cbbf2
fixing Fragile
2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher
81827f83bc
Use GitHub releases API for update checks.
2015-11-06 23:07:19 +01:00
Tobias Gruetzmacher
a41574e31a
Make version fetching a bit more robust (use pbr).
2015-11-06 22:08:14 +01:00
Tobias Gruetzmacher
64f7e313d5
Remove make_scraper magic from footloosecomic.py.
2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher
7f7a69818b
Remove make_scraper magic from creators module.
2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher
94470d564c
Fix import for Python 3.
2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher
b819afec39
Switch build to PBR.
...
This gets us:
- Automatic changelog
- Automatic authors list
- Automatic git version management
2015-11-03 23:27:53 +01:00
Tobias Gruetzmacher
dc22d7b32a
Add CatNine comic.
2015-11-02 23:29:56 +01:00
Tobias Gruetzmacher
10d9eac574
Remove support for very old versions of "requests".
2015-11-02 23:24:01 +01:00
MariusK
3e1ea816cc
Fixed 'Ruthe'
2015-10-02 13:52:44 +02:00
Helge Stasch
48d8519efd
Changed Goblins comic - moved to new scraper and fixed minor issues with some comics (old scraper was unstable for some comics of Goblins)
2015-09-28 23:50:15 +02:00
Helge Stasch
17fbdf2bf7
Added comic "Ahoy Earth"
2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher
d72ceb92d5
BloomingFaeries: Remove imageUrlModifier (not needed).
2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher
abd80a1d35
Merge pull request #28 from KevinAnthony/master
...
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher
b737218182
ZenPencils: Allow multiple images per page.
2015-09-03 23:24:28 +02:00
Kevin Anthony
62ec1f1d18
Removed debugging print statement
2015-09-02 11:22:24 -04:00
Kevin Anthony
d7180eaf99
removed bad whitespace
2015-09-02 11:04:32 -04:00
Kevin Anthony
6e8231e78a
Added Namer to BloomingFaeries since the web comic author doesn't seem interested in sticking to any kind of file naming convention
2015-09-02 11:01:48 -04:00
Kevin Anthony
1045bb7d4a
added comic Blooming Faeries
2015-09-02 10:13:42 -04:00
Damjan Košir
11f0aa3989
created Wordpress Scraper class
2015-08-11 21:31:45 +12:00
Damjan Košir
0a5b792c32
added Fragile (English and Spanish)
2015-08-07 23:37:10 +12:00
Damjan Košir
fd9c480d9c
adding bonus panel to SWBC and multiple images flag to ParserScraper
2015-08-03 22:58:44 +12:00
Damjan Košir
f8a163a361
added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis
2015-08-03 22:40:06 +12:00
Damjan Košir
648a84e38e
added Sharksplode
2015-08-03 22:20:17 +12:00
Damjan Košir
c19806b681
added AoiHouse
2015-07-31 23:33:30 +12:00
Damjan Košir
2201c9877a
added KiwiBlitz
2015-07-31 23:09:56 +12:00
Damjan Košir
fe22df5e5b
added LetsSpeakEnglish
2015-07-31 23:06:06 +12:00
Damjan Košir
79ec427fc0
added CatVersusHuman
2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher
303432fc68
Also use css expressions for textSearch.
2015-07-18 01:22:40 +02:00
Tobias Gruetzmacher
6a70bf4671
Enable some comics based on current policy.
2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher
6b0046f9b3
Fix small typos.
2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher
68d4dd463a
Revert robots.txt handling.
...
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.
Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge numbers of web sites in search of some content
to index; it's only intended to help users keep a personal archive of
comics they are interested in. We try very hard to never download any
image twice. This fixes #24.
(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots )
2015-07-17 20:46:56 +02:00
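The policy described above (robots.txt applies to page downloads, not to image downloads) could be sketched with `urllib.robotparser` as follows. The function name, the user-agent string and the example URLs are assumptions made for illustration.

```python
from urllib import robotparser

USER_AGENT = "Dosage"  # assumed user-agent string for this sketch


def may_download(robots, url, is_image):
    """Apply robots.txt rules to pages only; images are always allowed."""
    if is_image:
        # Image downloads are not checked against robots.txt.
        return True
    return robots.can_fetch(USER_AGENT, url)
```

Here `robots` is a `RobotFileParser` that has already been fed the site's rules, for example through its `parse()` method with the lines of robots.txt.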
Tobias Gruetzmacher
7d3bd15c2f
Remove AbleAndBaker, site is gone.
2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher
472afa24d3
GoComics doesn't allow spiders, disable them...
...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher
7c15ea50d8
Also check robots.txt on image downloads.
...
We DO want to honour it if images are blocked by robots.txt.
2015-07-15 23:50:57 +02:00
Tobias Gruetzmacher
5affd8af68
More relaxed robots.txt handling.
...
This is in line with how Perl's LWP::RobotUA and Google handle server
errors when fetching robots.txt: Just assume access is allowed.
See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
2015-07-15 19:11:55 +02:00
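A minimal sketch of the relaxed handling, assuming the caller has already fetched robots.txt and knows the HTTP status code. The function name is hypothetical, and treating every non-200 status as "allowed" is a simplification of the behaviour described above.

```python
from urllib import robotparser


def build_robots(status, body):
    """Build a RobotFileParser; any fetch error means everything is allowed."""
    rp = robotparser.RobotFileParser()
    if status != 200:
        # Could not get robots.txt: assume access is allowed.
        rp.allow_all = True
    else:
        rp.parse(body.splitlines())
    return rp
```

Note that `RobotFileParser.read()` in the standard library is stricter by default: it treats 401 and 403 responses as "disallow everything".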
Tobias Gruetzmacher
88e387ad15
Add Sleepless Domain.
2015-07-12 18:31:21 +02:00
Tobias Gruetzmacher
0b6d7425e1
Remove BladeKitten.
...
It's not available online anymore, only in print or as a PDF download.
2015-07-11 01:29:21 +02:00
Tobias Gruetzmacher
808b624e5f
Remove hard dependency on pycountry again.
...
This basically reverts commit 86b31dc12b.
It now works like this: If the user has pycountry installed, it is used.
If not, Dosage falls back to a small internal list generated from
pycountry by scripts/mklanguages.py.
This means additional work if we ever decide to translate Dosage, since
pycountry already has all the translations for language names...
This fixes #23.
2015-07-11 01:27:39 +02:00
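The optional-dependency pattern described above might look roughly like this. It is a sketch under assumptions: the fallback table here is a tiny hand-written stand-in for the generated list, the function name is made up, and the `alpha_2` keyword assumes a recent pycountry release (older releases used a different keyword).

```python
try:
    import pycountry  # optional dependency
except ImportError:
    pycountry = None

# Hand-written stand-in for the small internal list mentioned above.
FALLBACK_LANGUAGES = {"de": "German", "en": "English", "fr": "French"}


def language_name(code):
    """Return the English name for a two-letter language code."""
    if pycountry is not None:
        try:
            lang = pycountry.languages.get(alpha_2=code)
        except LookupError:
            lang = None
        if lang is not None:
            return lang.name
    # Without pycountry, or for unknown codes, use the internal list and
    # fall back to the code itself as a last resort.
    return FALLBACK_LANGUAGES.get(code, code)
```

The rest of the program only calls `language_name()` and never needs to know which source answered.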
Tobias Gruetzmacher
d97a9c63e4
Add Erstwhile.
2015-07-10 01:14:56 +02:00
Damjan Košir
7abca1222b
added NerfNow
2015-07-07 22:18:06 +12:00
Damjan Košir
119a3cd13a
added text to ScandinaviaAndTheWorld
2015-07-07 19:48:25 +12:00
Damjan Košir
5f243e3868
not a comic
2015-07-05 18:33:14 +12:00
Damjan Košir
5e7ad33fc8
Nnewts disabled
2015-07-05 18:32:33 +12:00
Damjan Košir
45012ff9c3
BladeKitten disabled
2015-07-05 18:31:38 +12:00
Tobias Gruetzmacher
0c6feec8cd
Fix module name EastCoastVsWestCoast.
2015-06-24 00:51:42 +02:00
Damjan Košir
96572e8cba
added TheMelvinChronicles
2015-06-12 21:00:11 +12:00
Damjan Košir
6412e6e542
fixed Spinnerette
2015-06-08 20:31:13 +12:00
Damjan Košir
3d8a49d228
realised TheWebcomicFactory is actually 28 comics... added them
2015-06-07 21:33:59 +12:00
Damjan Košir
05bb22b3ef
added TheWebcomicFactory
2015-06-06 14:25:32 +12:00
Damjan Košir
c98800388e
added Sithrah
2015-06-04 19:24:55 +12:00
Damjan Košir
010b4bf669
renaming comicpress to wordpress (as it's not just for the comicpress theme)
2015-06-04 19:12:40 +12:00
Damjan Košir
bc91f5f1fb
added MistyTheMouse
2015-06-04 19:06:40 +12:00
Damjan Košir
e2d01e4924
fixed ScandinaviaAndTheWorld
2015-06-04 18:58:59 +12:00
Damjan Košir
545a67111e
fixed Alice
2015-06-01 15:15:34 +12:00
Damjan Košir
a08ad2dc80
fixed GoGetARoomie
2015-06-01 15:11:16 +12:00
Damjan Košir
ceb19ed2bc
fixed Wulffmorgenthaler (now Wumo), added TruthFacts and MeAndDanielle
2015-06-01 12:14:52 +12:00
Damjan Košir
4cd88ecdc0
fixed WormWorldSaga
2015-06-01 11:45:22 +12:00
Damjan Košir
ea6cb925a6
fixed LoadingArtist
2015-06-01 11:33:50 +12:00
Damjan Košir
e268b09567
fixed EarthsongSaga
2015-06-01 11:19:02 +12:00
Damjan Košir
29c8d2eea0
fixed Meek
2015-05-31 23:41:12 +12:00
Damjan Košir
9be6f613e4
fixed MysteriesOfTheArcana
2015-05-31 23:39:04 +12:00
Damjan Košir
3ea8236224
fixed FowlLanguage
2015-05-31 23:29:34 +12:00
Damjan Košir
c1245a85ad
moved Footloose, added Cherry, Desigaspring
2015-05-31 23:23:02 +12:00
Damjan Košir
01aeebfbe4
fixed Footloose
2015-05-31 23:16:12 +12:00
Damjan Košir
029fa74067
fixed Bardsworth
2015-05-31 23:03:40 +12:00
Damjan Košir
f3036de8fd
fixed Pimpette
2015-05-31 22:57:25 +12:00
Damjan Košir
df7404fd7c
fixed CatsAndCameras
2015-05-31 22:50:17 +12:00
Damjan Košir
d4cc8ac857
added buni
2015-05-27 20:36:11 +12:00
Damjan Košir
9beeceffad
added BusinessCat and HappyJar
2015-05-27 20:34:51 +12:00
Damjan Košir
d970d27b14
removing duplicate
2015-05-27 00:10:46 +12:00
Damjan Košir
33abd95348
fixed TheGentlemansArmchair
2015-05-26 23:48:22 +12:00
Damjan Košir
5e123ae79e
fixed DarkWings (now available under the real name Eryl as well), added Ashes, Laiyu, NoMoreSavePoints and EasilyAmused
2015-05-26 23:43:15 +12:00
Damjan Košir
9adb020fc2
fixed DemolitionSquad
2015-05-26 22:59:25 +12:00
Damjan Košir
605c5f8619
fixed PokeyThePenguin
2015-05-26 22:31:43 +12:00
Damjan Košir
766b7ba99d
fixed ProperBarn, added 2214 and OTE
2015-05-26 22:16:55 +12:00
Damjan Košir
2c41435ceb
fixing HijiNKS ENSUE and added all 4 comics on that page
2015-05-26 22:06:55 +12:00
Damjan Košir
465e7eaf6f
fixing CowboyJedi kinda... there is currently no comic on the front page and the author knows it
2015-05-26 21:35:36 +12:00
Damjan Košir
529a41397a
fixing CorydonCafe
2015-05-26 21:32:25 +12:00
Damjan Košir
c3abb93e99
fixing ChainsawSuit
2015-05-26 19:53:04 +12:00
Damjan Košir
f8690af029
fixing Curvy
2015-05-26 19:47:31 +12:00
Damjan Košir
36c790fa4b
fixing CraftedFables
2015-05-26 19:32:12 +12:00
Damjan Košir
7067c51056
fixed CheckerboardNightmare
2015-05-25 22:19:36 +12:00
Damjan Košir
5569439c43
fixed 16 comics
2015-05-25 21:57:06 +12:00
Damjan Košir
3edaa97fb9
fixing KatzenfutterGeleespritzer
2015-05-25 20:06:58 +12:00
Damjan Košir
8a245e1d10
fixing BloodBound
2015-05-21 00:04:07 +12:00
Damjan Košir
dc2349951a
moving BroodHollow to comicpress
2015-05-21 00:00:35 +12:00
Damjan Košir
a05ae9c75d
fixing PandyLand
2015-05-20 23:56:49 +12:00
Damjan Košir
fd60065591
fixing OnTheEdge
2015-05-20 23:50:18 +12:00
Damjan Košir
80b783c016
fixing CourtingDisaster
2015-05-20 23:16:54 +12:00
Damjan Košir
ff239ff58e
Merge branch 'comicpress'
2015-05-20 23:12:03 +12:00
Damjan Košir
77c5dbce9b
better prevSearch for comic press
2015-05-20 23:08:02 +12:00
Damjan Košir
bc4e7a03f2
fixed BroodHollow
2015-05-20 23:03:15 +12:00
Damjan Košir
8de620c78b
fixed CigarroAndCerveja
2015-05-20 22:58:13 +12:00
Damjan Košir
4529fdee3b
adding no downsize option
2015-05-20 22:38:29 +12:00
Damjan Košir
77a9cce00d
fixing Hipsters
2015-05-19 19:49:45 +12:00
Damjan Košir
79d775a8d9
adding comicpress scraper
2015-05-16 00:15:32 +12:00
Damjan Košir
962286d391
fixed OctopusPie
2015-05-14 23:06:12 +12:00
Damjan Košir
3bbf2d5c23
fixing neko the kitty
2015-05-14 22:42:04 +12:00
Damjan Košir
f75fc62e84
fixing pebbleversion
2015-05-14 22:33:46 +12:00
Helge Stasch
5a1ef9b791
Fixed problem with LookingForGroup comic
2015-05-07 13:57:10 +02:00