Commit graph

580 commits

Author SHA1 Message Date
Damjan Košir af2e57d850 Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...

Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher 0033a8046b Fix creators module. 2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher 8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and honestly, I'm not sure if I did get it right this time. But I
think, the current behaviour matches best what web browsers try to do:

1. Let Requests figure out the content from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
Tobias Gruetzmacher 183d18e7bc Skip non-image on xkcd. 2016-04-06 00:50:01 +02:00
Tobias Gruetzmacher 9feaf245f2 Fixed & removed some comics in s.py. 2016-04-06 00:40:13 +02:00
Tobias Gruetzmacher 6bbdcfb341 BloomingFaeries: Don't download every page twice.
(Also, simplify namer, switch to _ParserScraper)
2016-04-05 23:58:43 +02:00
Tobias Gruetzmacher 8db6f8e8b7 Fix ZapComics, remove ZebraGirl.
- ZebraGirl is now ComicFury/ZebraGirl...
2016-04-04 00:27:11 +02:00
Tobias Gruetzmacher 0bcfb8a82e Move ComicControl into common module.
- Move all comics using ComicControl into alphabetical files.
- Add BalderDash & Picklewhistle
2016-04-04 00:12:53 +02:00
Tobias Gruetzmacher 0d453a6858 Move Flowerlark Studios into alphabetical files. 2016-04-03 22:58:01 +02:00
Tobias Gruetzmacher a9f0dfdce4 Merge pull request #39 from peterjanes/peterjanes/sherman-fix
Fix Sherman's Lagoon
2016-04-03 22:20:04 +02:00
Tobias Gruetzmacher 926439cd14 Every comic need an url. 2016-04-03 22:03:16 +02:00
Tobias Gruetzmacher 2c6decb7f5 Move WebcomicFactory in its own module.
Also, add an updater script for it.
2016-04-03 21:31:56 +02:00
Peter Janes 759bd0c360 Fix Sherman's Lagoon 2016-04-03 14:54:41 -04:00
Tobias Gruetzmacher bb1f20d867 Remove make_scraper for most WordPress comics.
- Dropped KatzenfutterGeleespritzer, because robots.txt.
- Move all WordPress/ComicPress scrapers into alphabetical files.
- Move _WordPressScraper & _ComicPress scraper into common.py.
- Some smaller PEP8 fixes.
2016-04-02 00:19:53 +02:00
Tobias Gruetzmacher 7f1e136d8b Sort comics alphabetically & PEP8 style fixes. 2016-03-31 23:13:54 +02:00
Tobias Gruetzmacher 90dfceaeb1 Remove dead modules (& format). 2016-03-20 20:48:42 +01:00
Tobias Gruetzmacher f243096d49 Fix GastroPhobia, remove GeneralProtectionFault.
(& formatting)
2016-03-20 20:11:21 +01:00
Tobias Gruetzmacher cfcfcc2468 Switch plugin loading to pkgutil.
This should work with all PEP-302 loaders that implement iter_modules.
Unfortunatly, PyInstaller (which I plan to use for Windows releases)
does not support it, so we don't get around a special case. Anyways,
this should help for #22.
2016-03-20 15:13:24 +01:00
Tobias Gruetzmacher 1af022895e Fix NuklearPower (fixes #38).
Also remove make_scraper magic.
2016-03-17 23:19:52 +01:00
Tobias Gruetzmacher 552f29e5fc Update ComicFury comics. (+871, -245)
- Remove make_scraper magic
- Switch to HTML parser
- Update parsing of comic listing.
2016-03-17 00:44:06 +01:00
Damjan Košir 615f094ef3 fixing EdmundFinney 2016-03-14 20:32:18 +13:00
Damjan Košir b0dc510b08 adding LastNerdsOnEarth 2016-01-03 14:16:58 +13:00
Damjan Košir a1e79cbbf2 fixing Fragile 2016-01-03 14:08:49 +13:00
Tobias Gruetzmacher 64f7e313d5 Remove make_scraper magic from footloosecomic.py. 2015-11-05 00:03:13 +01:00
Tobias Gruetzmacher 7f7a69818b Remove make_scraper magic from creators module. 2015-11-04 23:43:31 +01:00
Tobias Gruetzmacher 94470d564c Fix import for Python 3. 2015-11-03 23:40:45 +01:00
Tobias Gruetzmacher dc22d7b32a Add CatNine comic. 2015-11-02 23:29:56 +01:00
MariusK 3e1ea816cc Fixed 'Ruthe' 2015-10-02 13:52:44 +02:00
Helge Stasch 48d8519efd Changed Goblins comic - moved to new scraper and fixed minor issues with some comics (old scrapper was unstable for some comics of Goblins) 2015-09-28 23:50:15 +02:00
Helge Stasch 17fbdf2bf7 Added comic "Ahoy Earth" 2015-09-27 00:44:47 +02:00
Tobias Gruetzmacher d72ceb92d5 BloomingFaeries: Remove imageUrlModifier (not needed). 2015-09-04 00:37:05 +02:00
Tobias Gruetzmacher abd80a1d35 Merge pull request #28 from KevinAnthony/master
added comic Blooming Faeries
2015-09-03 23:26:37 +02:00
Tobias Gruetzmacher b737218182 ZenPencils: Allow multiple images per page. 2015-09-03 23:24:28 +02:00
Kevin Anthony 62ec1f1d18 Removed debugging print state 2015-09-02 11:22:24 -04:00
Kevin Anthony d7180eaf99 removed bad whitespace 2015-09-02 11:04:32 -04:00
Kevin Anthony 6e8231e78a Added Namer to BloomingFaeries since the web comic author doesn't seem intrested in sticking to any kind of file naming convention 2015-09-02 11:01:48 -04:00
Kevin Anthony 1045bb7d4a added comic Blooming Faeries 2015-09-02 10:13:42 -04:00
Damjan Košir 11f0aa3989 created Wordpress Scraper class 2015-08-11 21:31:45 +12:00
Damjan Košir 0a5b792c32 added Fragile (English and Spanish) 2015-08-07 23:37:10 +12:00
Damjan Košir fd9c480d9c adding bonus panel to SWBC and multiple images flag to ParserScraper 2015-08-03 22:58:44 +12:00
Damjan Košir f8a163a361 added a CMS ComicControl, moved some existing comics there, added StreetFighter and Metacarpolis 2015-08-03 22:40:06 +12:00
Damjan Košir 648a84e38e added Sharksplode 2015-08-03 22:20:17 +12:00
Damjan Košir c19806b681 added AoiHouse 2015-07-31 23:33:30 +12:00
Damjan Košir 2201c9877a added KiwiBlitz 2015-07-31 23:09:56 +12:00
Damjan Košir fe22df5e5b added LetsSpeakEnglish 2015-07-31 23:06:06 +12:00
Damjan Košir 79ec427fc0 added CatVersusHuman 2015-07-30 22:16:34 +12:00
Tobias Gruetzmacher 6a70bf4671 Enable some comics based on current policy. 2015-07-18 01:21:29 +02:00
Tobias Gruetzmacher 6b0046f9b3 Fix small typos. 2015-07-18 00:11:44 +02:00
Tobias Gruetzmacher 68d4dd463a Revert robots.txt handling.
This brings us back to only honouring robots.txt on page downloads, not
on image downloads.

Rationale: Dosage is not a "robot" in the classical sense. It's not
designed to spider huge amounts of web sites in search for some content
to index, it's only intended to help users keep a personal archive of
comics he is interested in. We try very hard to never download any image
twice. This fixes #24.

(Precedent for this rationale: Google Feedfetcher:
https://support.google.com/webmasters/answer/178852?hl=en#robots)
2015-07-17 20:46:56 +02:00
Tobias Gruetzmacher 7d3bd15c2f Remove AbleAndBaker, site is gone. 2015-07-16 00:49:48 +02:00
Tobias Gruetzmacher 472afa24d3 GoComics doesn't allow spiders, disable them...
This removes 757 comics, including quite popular ones like Calvin and
Hobbes, Garfield, FoxTrot, etc. :(
2015-07-16 00:36:10 +02:00
Tobias Gruetzmacher 88e387ad15 Add Sleepless Domain. 2015-07-12 18:31:21 +02:00
Tobias Gruetzmacher 0b6d7425e1 Remove BladeKitten.
It's not available online anymore, only in print or as a PDF download.
2015-07-11 01:29:21 +02:00
Tobias Gruetzmacher d97a9c63e4 Add Erstwhile. 2015-07-10 01:14:56 +02:00
Damjan Košir 7abca1222b added NerfNow 2015-07-07 22:18:06 +12:00
Damjan Košir 119a3cd13a added text to ScandinaviaAndTheWorld 2015-07-07 19:48:25 +12:00
Damjan Košir 5f243e3868 not a comic 2015-07-05 18:33:14 +12:00
Damjan Košir 5e7ad33fc8 Nnewts disabled 2015-07-05 18:32:33 +12:00
Damjan Košir 45012ff9c3 BladeKitten disabled 2015-07-05 18:31:38 +12:00
Tobias Gruetzmacher 0c6feec8cd Fix module name EastCoastVsWestCoast. 2015-06-24 00:51:42 +02:00
Damjan Košir 96572e8cba added TheMelvinChronicles 2015-06-12 21:00:11 +12:00
Damjan Košir 6412e6e542 fixed Spinnerette 2015-06-08 20:31:13 +12:00
Damjan Košir 3d8a49d228 realised TheWebcomicFactory is actually 28 comics... added them 2015-06-07 21:33:59 +12:00
Damjan Košir 05bb22b3ef added TheWebcomicFactory 2015-06-06 14:25:32 +12:00
Damjan Košir c98800388e added Sithrah 2015-06-04 19:24:55 +12:00
Damjan Košir 010b4bf669 renaming comicpress to wordpress (as it's not just for the comicpress theme) 2015-06-04 19:12:40 +12:00
Damjan Košir bc91f5f1fb added MistyTheMouse 2015-06-04 19:06:40 +12:00
Damjan Košir e2d01e4924 fixed ScandinaviaAndTheWorld 2015-06-04 18:58:59 +12:00
Damjan Košir 545a67111e fixed Alice 2015-06-01 15:15:34 +12:00
Damjan Košir a08ad2dc80 fixed GoGetARoomie 2015-06-01 15:11:16 +12:00
Damjan Košir ceb19ed2bc fixed Wulffmorgenthaler (now Wumo), added TruthFacts and MeAndDanielle 2015-06-01 12:14:52 +12:00
Damjan Košir 4cd88ecdc0 fixed WormWorldSaga 2015-06-01 11:45:22 +12:00
Damjan Košir ea6cb925a6 fixed LoadingArtist 2015-06-01 11:33:50 +12:00
Damjan Košir e268b09567 fixed EarthsongSaga 2015-06-01 11:19:02 +12:00
Damjan Košir 29c8d2eea0 fixed Meek 2015-05-31 23:41:12 +12:00
Damjan Košir 9be6f613e4 fixed MysteriesOfTheArcana 2015-05-31 23:39:04 +12:00
Damjan Košir 3ea8236224 fixed FowlLanguage 2015-05-31 23:29:34 +12:00
Damjan Košir c1245a85ad moved Footloose, added Cherry, Desigaspring 2015-05-31 23:23:02 +12:00
Damjan Košir 01aeebfbe4 fixed Footloose 2015-05-31 23:16:12 +12:00
Damjan Košir 029fa74067 fixed Bardsworth 2015-05-31 23:03:40 +12:00
Damjan Košir f3036de8fd fixed Pimpette 2015-05-31 22:57:25 +12:00
Damjan Košir df7404fd7c fixed CatsAndCameras 2015-05-31 22:50:17 +12:00
Damjan Košir d4cc8ac857 added buni 2015-05-27 20:36:11 +12:00
Damjan Košir 9beeceffad added BusinessCat and HappyJar 2015-05-27 20:34:51 +12:00
Damjan Košir d970d27b14 removing duplicate 2015-05-27 00:10:46 +12:00
Damjan Košir 33abd95348 fixed TheGentlemansArmchair 2015-05-26 23:48:22 +12:00
Damjan Košir 5e123ae79e fixed DarkWings (now available under the real name Eryl as well), added Ashes, Laiyu, NoMoreSavePoints and EasilyAmused 2015-05-26 23:43:15 +12:00
Damjan Košir 9adb020fc2 fixed DemolitionSquad 2015-05-26 22:59:25 +12:00
Damjan Košir 605c5f8619 fixed PokeyThePenguin 2015-05-26 22:31:43 +12:00
Damjan Košir 766b7ba99d fixed ProperBarn, added 2214 and OTE 2015-05-26 22:16:55 +12:00
Damjan Košir 2c41435ceb fixing HijiNKS ENSUE and added all 4 comics on that page 2015-05-26 22:06:55 +12:00
Damjan Košir 465e7eaf6f fixing CowboyJedi kinda... there is currently no comic on the front page and the author knows it 2015-05-26 21:35:36 +12:00
Damjan Košir 529a41397a fixing CorydonCafe 2015-05-26 21:32:25 +12:00
Damjan Košir c3abb93e99 fixing ChainsawSuit 2015-05-26 19:53:04 +12:00
Damjan Košir f8690af029 fixing Curvy 2015-05-26 19:47:31 +12:00
Damjan Košir 36c790fa4b fixing CraftedFables 2015-05-26 19:32:12 +12:00
Damjan Košir 7067c51056 fixed CheckerboardNightmare 2015-05-25 22:19:36 +12:00
Damjan Košir 5569439c43 fixed 16 comics 2015-05-25 21:57:06 +12:00