690 commits

Author SHA1 Message Date
Tobias Gruetzmacher
6727e9b559 Use vendored urllib3.
As long as requests ships with urllib3, we can't fall back to the
"system" urllib3, since that breaks class-identity checks.
2016-03-16 23:18:19 +01:00
Damjan Košir
615f094ef3 fixing EdmundFinney 2016-03-14 20:32:18 +13:00
Tobias Gruetzmacher
c4fcd985dd Let urllib3 handle all retries. 2016-03-13 21:30:36 +01:00
Tobias Gruetzmacher
78e13962f9 Sort scraper modules (mostly for test stability). 2016-03-13 20:24:21 +01:00
Tobias Gruetzmacher
017d35cb3c Fallback version if pkg_resources not available.
This helps for Windows packaging.
2016-03-03 01:05:36 +01:00
Johannes Schöpp
351fa7154e Modified maximum page size
Fixes #36
2016-03-01 22:19:44 +01:00
Damjan Košir
b0dc510b08 adding LastNerdsOnEarth 2016-01-03 14:16:58 +13:00
Damjan Košir
a1e79cbbf2 fixing Fragile 2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher
81827f83bc Use GitHub releases API for update checks. 2015-11-06 23:07:19 +01:00
Tobias Gruetzmacher
a41574e31a Make version fetching a bit more robust (use pbr). 2015-11-06 22:08:14 +01:00
Tobias Gruetzmacher
64f7e313d5 Remove make_scraper magic from footloosecomic.py. 2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher
7f7a69818b Remove make_scraper magic from creators module. 2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher
94470d564c Fix import for Python 3. 2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher
b819afec39 Switch build to PBR.
This gets us:
- Automatic changelog
- Automatic authors list
- Automatic git version management
2015-11-03 23:27:53 +01:00
Tobias Gruetzmacher
dc22d7b32a Add CatNine comic. 2015-11-02 23:29:56 +01:00
Tobias Gruetzmacher
10d9eac574 Remove support for very old versions of "requests". 2015-11-02 23:24:01 +01:00
MariusK
3e1ea816cc Fixed 'Ruthe' 2015-10-02 13:52:44 +02:00
Helge Stasch
48d8519efd Changed Goblins comic - moved to new scraper and fixed minor issues with some comics (old scraper was unstable for some comics of Goblins) 2015-09-28 23:50:15 +02:00
Helge Stasch
17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher
d72ceb92d5 BloomingFaeries: Remove imageUrlModifier (not needed). 2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher
abd80a1d35 Merge pull request #28 from KevinAnthony/master
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher
b737218182 ZenPencils: Allow multiple images per page. 2015-09-03 23:24:28 +02:00
Kevin Anthony
62ec1f1d18 Removed debugging print state 2015-09-02 11:22:24 -04:00
Kevin Anthony
d7180eaf99 removed bad whitespace 2015-09-02 11:04:32 -04:00
Kevin Anthony
6e8231e78a Added Namer to BloomingFaeries since the web comic author doesn't seem interested in sticking to any kind of file naming convention 2015-09-02 11:01:48 -04:00
Kevin Anthony
1045bb7d4a added comic Blooming Faeries 2015-09-02 10:13:42 -04:00
Damjan Košir
11f0aa3989 created Wordpress Scraper class 2015-08-11 21:31:45 +12:00
Damjan Košir
0a5b792c32 added Fragile (English and Spanish) 2015-08-07 23:37:10 +12:00
Damjan Košir
fd9c480d9c adding bonus panel to SWBC and multiple images flag to ParserScraper 2015-08-03 22:58:44 +12:00
Damjan Košir
f8a163a361 added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis 2015-08-03 22:40:06 +12:00
Damjan Košir
648a84e38e added Sharksplode 2015-08-03 22:20:17 +12:00
Damjan Košir
c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Damjan Košir
2201c9877a added KiwiBlitz 2015-07-31 23:09:56 +12:00
Damjan Košir
fe22df5e5b added LetsSpeakEnglish 2015-07-31 23:06:06 +12:00
Damjan Košir
79ec427fc0 added CatVersusHuman 2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher
303432fc68 Also use css expressions for textSearch. 2015-07-18 01:22:40 +02:00
Tobias Gruetzmacher
6a70bf4671 Enable some comics based on current policy. 2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher
6b0046f9b3 Fix small typos. 2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher
68d4dd463a Revert robots.txt handling.
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.

Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge amounts of web sites in search for some content
to index, it's only intended to help users keep a personal archive of
comics they are interested in. We try very hard to never download any image
twice. This fixes #24.

(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots)
2015-07-17 20:46:56 +02:00
Tobias Gruetzmacher
7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher
472afa24d3 GoComics doesn't allow spiders, disable them...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher
7c15ea50d8 Also check robots.txt on image downloads.
We DO want to honour if images are blocked by robots.txt
2015-07-15 23:50:57 +02:00
Tobias Gruetzmacher
5affd8af68 More relaxed robots.txt handling.
This is in line with how Perl's LWP::RobotUA and Google handle server
errors when fetching robots.txt: Just assume access is allowed.

See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
2015-07-15 19:11:55 +02:00
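The relaxed handling described in commit 5affd8af68 can be sketched roughly as follows. This is a minimal illustration, not Dosage's actual code: the function name `robots_allows`, the default user agent string, and the choice to treat every status of 400 or above as "allowed" are assumptions made for the example. It works on an already-fetched response so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(status, body, url, user_agent="Dosage"):
    """Decide whether `url` may be fetched, given a robots.txt response.

    `status` and `body` are the HTTP status code and text of the
    robots.txt response (hypothetical inputs for this sketch).
    """
    # Assumption: any error status while fetching robots.txt means
    # "no usable rules", so access is assumed to be allowed.
    if status >= 400:
        return True
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this sketch, a failed robots.txt fetch never blocks a download, while a successfully fetched file is still honoured.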
Tobias Gruetzmacher
88e387ad15 Add Sleepless Domain. 2015-07-12 18:31:21 +02:00
Tobias Gruetzmacher
0b6d7425e1 Remove BladeKitten.
It's not available online anymore, only in print or as a PDF download.
2015-07-11 01:29:21 +02:00
Tobias Gruetzmacher
808b624e5f Remove hard dependency on pycountry again.
This basically reverts commit 86b31dc12b.

It now works like this: If the user has pycountry installed, it is used.
If not, Dosage falls back to a small internal list generated from
pycountry by scripts/mklanguages.py.

This means additional work if we ever decide to translate Dosage, since
pycountry already has all the translations for language names...

This fixes #23.
2015-07-11 01:27:39 +02:00
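The optional-dependency fallback described in commit 808b624e5f might look something like the sketch below. The table `_FALLBACK_LANGUAGES` and the function `language_name` are hypothetical names for illustration; per the commit message, the real internal list is generated by scripts/mklanguages.py. The `alpha_2` keyword is the one used by recent pycountry releases, which is also an assumption about the installed version.

```python
try:
    import pycountry
except ImportError:
    # pycountry is optional; fall back to the small built-in table.
    pycountry = None

# Hypothetical excerpt of an internal table (illustration only).
_FALLBACK_LANGUAGES = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
}


def language_name(code):
    """Return the English name for a two-letter language code."""
    if pycountry is not None:
        try:
            lang = pycountry.languages.get(alpha_2=code)
        except LookupError:
            lang = None
        if lang is not None:
            return lang.name
    # Unknown codes are returned unchanged rather than raising.
    return _FALLBACK_LANGUAGES.get(code, code)
```

Callers then use `language_name("de")` without caring whether pycountry is installed.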
Tobias Gruetzmacher
d97a9c63e4 Add Erstwhile. 2015-07-10 01:14:56 +02:00
Damjan Košir
7abca1222b added NerfNow 2015-07-07 22:18:06 +12:00
Damjan Košir
119a3cd13a added text to ScandinaviaAndTheWorld 2015-07-07 19:48:25 +12:00
Damjan Košir
5f243e3868 not a comic 2015-07-05 18:33:14 +12:00