Commit graph

697 commits

Tobias Gruetzmacher
7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher
d6db1d0b81 Fix a conflict with IPython. 2016-03-20 23:57:07 +01:00
Tobias Gruetzmacher
90dfceaeb1 Remove dead modules (& format). 2016-03-20 20:48:42 +01:00
Tobias Gruetzmacher
f243096d49 Fix GastroPhobia, remove GeneralProtectionFault.
(& formatting)
2016-03-20 20:11:21 +01:00
Tobias Gruetzmacher
cfcfcc2468 Switch plugin loading to pkgutil.
This should work with all PEP-302 loaders that implement iter_modules.
Unfortunately, PyInstaller (which I plan to use for Windows releases)
does not support it, so we can't avoid a special case. Anyway,
this should help for #22.
2016-03-20 15:13:24 +01:00
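The pkgutil approach described above can be sketched roughly as follows; the function and parameter names are illustrative, not Dosage's actual ones:

```python
import importlib
import pkgutil


def load_plugin_modules(package):
    """Yield every module found inside the given package.

    pkgutil.iter_modules() works with any PEP 302 loader that
    implements it, which makes this more portable than scanning
    the filesystem directly.
    """
    for _finder, name, _ispkg in pkgutil.iter_modules(package.__path__):
        yield importlib.import_module(package.__name__ + '.' + name)
```

A frozen PyInstaller build would still need a special case, since its loader does not implement iter_modules().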
Tobias Gruetzmacher
1af022895e Fix NuklearPower (fixes #38).
Also remove make_scraper magic.
2016-03-17 23:19:52 +01:00
Tobias Gruetzmacher
552f29e5fc Update ComicFury comics. (+871, -245)
- Remove make_scraper magic
- Switch to HTML parser
- Update parsing of comic listing.
2016-03-17 00:44:06 +01:00
Tobias Gruetzmacher
6727e9b559 Use vendored urllib3.
As long as requests ships with urllib3, we can't fall back to the
"system" urllib3, since that breaks class-identity checks.
2016-03-16 23:18:19 +01:00
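The class-identity problem behind this commit can be demonstrated without requests at all: two textually identical classes loaded in separate module namespaces are distinct objects, so isinstance() checks across the copies fail. This standalone sketch shows why the urllib3 bundled inside requests must be the one Dosage imports:

```python
import types

# Simulate the "system" urllib3 and the copy vendored inside requests by
# executing the same class definition in two separate module namespaces.
src = "class TimeoutError(Exception):\n    pass\n"

vendored = types.ModuleType('vendored_urllib3')
system = types.ModuleType('system_urllib3')
exec(src, vendored.__dict__)
exec(src, system.__dict__)

err = vendored.TimeoutError('timed out')         # what requests would raise
assert isinstance(err, vendored.TimeoutError)    # caught via the vendored copy
assert not isinstance(err, system.TimeoutError)  # NOT caught via the system copy
```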
Damjan Košir
615f094ef3 fixing EdmundFinney 2016-03-14 20:32:18 +13:00
Tobias Gruetzmacher
c4fcd985dd Let urllib3 handle all retries. 2016-03-13 21:30:36 +01:00
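Delegating retries to urllib3 typically means mounting an HTTPAdapter configured with a urllib3 Retry object; the retry counts and backoff values below are illustrative, not the ones Dosage uses:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Let urllib3 retry transient failures (connection errors, 5xx responses)
# instead of re-implementing a retry loop around each request.
retry = Retry(total=3, backoff_factor=0.5,
              status_forcelist=[500, 502, 503, 504])

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
```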
Tobias Gruetzmacher
78e13962f9 Sort scraper modules (mostly for test stability). 2016-03-13 20:24:21 +01:00
Tobias Gruetzmacher
017d35cb3c Fallback version if pkg_resources not available.
This helps for Windows packaging.
2016-03-03 01:05:36 +01:00
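A pkg_resources fallback of this kind usually looks like the sketch below; the distribution name and the hard-coded fallback string are hypothetical:

```python
try:
    import pkg_resources
except ImportError:  # e.g. a frozen Windows build without setuptools
    pkg_resources = None

FALLBACK_VERSION = '2.16'  # hypothetical baked-in release number


def get_version(dist_name='dosage'):
    """Ask pkg_resources for the installed version, falling back to a
    hard-coded string when that machinery is unavailable."""
    if pkg_resources is not None:
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            pass
    return FALLBACK_VERSION
```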
Johannes Schöpp
351fa7154e Modified maximum page size
Fixes #36
2016-03-01 22:19:44 +01:00
Damjan Košir
b0dc510b08 adding LastNerdsOnEarth 2016-01-03 14:16:58 +13:00
Damjan Košir
a1e79cbbf2 fixing Fragile 2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher
81827f83bc Use GitHub releases API for update checks. 2015-11-06 23:07:19 +01:00
Tobias Gruetzmacher
a41574e31a Make version fetching a bit more robust (use pbr). 2015-11-06 22:08:14 +01:00
Tobias Gruetzmacher
64f7e313d5 Remove make_scraper magic from footloosecomic.py. 2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher
7f7a69818b Remove make_scraper magic from creators module. 2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher
94470d564c Fix import for Python 3. 2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher
b819afec39 Switch build to PBR.
This gets us:
- Automatic changelog
- Automatic authors list
- Automatic git version management
2015-11-03 23:27:53 +01:00
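With PBR, the build boilerplate shrinks to a stub, since the version, ChangeLog, and AUTHORS are all derived from git metadata at build time. A typical PBR setup.py looks like this (the standard PBR pattern, not necessarily Dosage's file verbatim):

```python
# setup.py -- all real metadata lives in setup.cfg; PBR computes the
# version from git tags and generates ChangeLog and AUTHORS at build time.
import setuptools

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True,
)
```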
Tobias Gruetzmacher
dc22d7b32a Add CatNine comic. 2015-11-02 23:29:56 +01:00
Tobias Gruetzmacher
10d9eac574 Remove support for very old versions of "requests". 2015-11-02 23:24:01 +01:00
MariusK
3e1ea816cc Fixed 'Ruthe' 2015-10-02 13:52:44 +02:00
Helge Stasch
48d8519efd Changed Goblins comic: moved to the new scraper and fixed minor issues with some comics (the old scraper was unstable for some Goblins strips) 2015-09-28 23:50:15 +02:00
Helge Stasch
17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher
d72ceb92d5 BloomingFaeries: Remove imageUrlModifier (not needed). 2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher
abd80a1d35 Merge pull request #28 from KevinAnthony/master
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher
b737218182 ZenPencils: Allow multiple images per page. 2015-09-03 23:24:28 +02:00
Kevin Anthony
62ec1f1d18 Removed debugging print statement 2015-09-02 11:22:24 -04:00
Kevin Anthony
d7180eaf99 removed bad whitespace 2015-09-02 11:04:32 -04:00
Kevin Anthony
6e8231e78a Added Namer to BloomingFaeries since the webcomic author doesn't seem interested in sticking to any file naming convention 2015-09-02 11:01:48 -04:00
Kevin Anthony
1045bb7d4a added comic Blooming Faeries 2015-09-02 10:13:42 -04:00
Damjan Košir
11f0aa3989 created Wordpress Scraper class 2015-08-11 21:31:45 +12:00
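A shared CMS scraper class of this kind typically just centralizes the selectors common to a theme, so each comic only declares its own URL. Everything below (class names, selectors, URL) is a hypothetical sketch, not Dosage's actual WordPress scraper:

```python
class WordPressScraper:
    """Selectors shared by comics running stock WordPress comic themes."""
    imageSearch = '#comic img'
    prevSearch = 'a.comic-nav-previous'
    multipleImagesPerStrip = False


class SomeWordPressComic(WordPressScraper):
    # Each comic then only needs its own start URL.
    url = 'https://example.com/'
```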
Damjan Košir
0a5b792c32 added Fragile (English and Spanish) 2015-08-07 23:37:10 +12:00
Damjan Košir
fd9c480d9c adding bonus panel to SWBC and multiple images flag to ParserScraper 2015-08-03 22:58:44 +12:00
Damjan Košir
f8a163a361 added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis 2015-08-03 22:40:06 +12:00
Damjan Košir
648a84e38e added Sharksplode 2015-08-03 22:20:17 +12:00
Damjan Košir
c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Damjan Košir
2201c9877a added KiwiBlitz 2015-07-31 23:09:56 +12:00
Damjan Košir
fe22df5e5b added LetsSpeakEnglish 2015-07-31 23:06:06 +12:00
Damjan Košir
79ec427fc0 added CatVersusHuman 2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher
303432fc68 Also use css expressions for textSearch. 2015-07-18 01:22:40 +02:00
Tobias Gruetzmacher
6a70bf4671 Enable some comics based on current policy. 2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher
6b0046f9b3 Fix small typos. 2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher
68d4dd463a Revert robots.txt handling.
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.

Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge numbers of web sites in search of content to
index; it's only intended to help users keep a personal archive of the
comics they are interested in. We try very hard to never download any image
twice. This fixes #24.

(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots)
2015-07-17 20:46:56 +02:00
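The reverted policy (consult robots.txt for page fetches, never for image downloads) can be sketched with the standard library's robotparser; the `Dosage` user-agent string here is an assumption:

```python
import urllib.robotparser


def allowed_to_fetch(rp, url, is_image):
    """Apply the policy above. rp is a urllib.robotparser.RobotFileParser
    already loaded with the host's robots.txt rules."""
    if is_image:
        # Image downloads are exempt: each image is fetched once, for a
        # personal archive, so Dosage is not acting as a spider here.
        return True
    return rp.can_fetch('Dosage', url)
```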
Tobias Gruetzmacher
7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher
472afa24d3 GoComics doesn't allow spiders, disable them...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher
7c15ea50d8 Also check robots.txt on image downloads.
We DO want to honour robots.txt when it blocks image downloads.
2015-07-15 23:50:57 +02:00
Tobias Gruetzmacher
5affd8af68 More relaxed robots.txt handling.
This is in line with how Perl's LWP::RobotUA and Google handle server
errors when fetching robots.txt: just assume access is allowed.

See https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
2015-07-15 19:11:55 +02:00
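The relaxed behaviour can be sketched as: parse the rules on a successful fetch, and otherwise fall back to RobotFileParser's allow_all flag, mirroring how LWP::RobotUA and Google treat an unreachable robots.txt. The status-code handling below is a simplified assumption:

```python
import urllib.robotparser


def robots_for_response(status_code, body_lines):
    """Build robots.txt rules from an HTTP response, assuming access is
    allowed whenever the file itself could not be fetched."""
    rp = urllib.robotparser.RobotFileParser()
    if 200 <= status_code < 300:
        rp.parse(body_lines)
    else:
        # Missing robots.txt (4xx) or a server error (5xx): assume
        # everything is allowed instead of locking the whole site out.
        rp.allow_all = True
    return rp
```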