Commit graph

875 commits

Author SHA1 Message Date
Tobias Gruetzmacher
d2861d029f Fix "Extra Fabulous Comics" (fixes #129) 2019-12-03 19:50:41 +01:00
Tobias Gruetzmacher
ddba32391b Add BlondeSunrise (fixes #142) 2019-12-03 00:14:57 +01:00
Jakob Kogler
6fd3282047 Add comic "turnoff" (closes #139) 2019-12-01 21:46:00 +01:00
hkocharyan
31309a26d2 fixed oglaf next and previous tags (#141) 2019-11-19 20:56:54 +01:00
Tobias Gruetzmacher
00d0201c5f Fix a bunch of flake8 issues 2019-11-04 00:16:25 +01:00
Techwolf
255fbfa1b4 Add Evon 2019-11-03 23:51:58 +01:00
Techwolf
b230ef31d8 Add Guardia 2019-11-03 23:51:27 +01:00
Techwolf
70223bd38f Add YouSayItFirst 2019-11-03 23:50:47 +01:00
Techwolf
a5a868c2e7 Add NicoleAndDerek 2019-11-03 23:50:47 +01:00
Techwolf
be6a6722b5 Add UnlikeMinerva 2019-11-03 23:50:47 +01:00
Techwolf
fcdbd563a2 Add HavocInc 2019-11-03 23:44:49 +01:00
Techwolf
a575e5e0f0 Add Dissonance 2019-11-03 23:44:24 +01:00
Techwolf
14a01c3e47 Add OutOfPlacers 2019-11-03 23:43:23 +01:00
Techwolf
edc59a86e3 Add PlushAndBlood 2019-11-03 23:42:46 +01:00
Techwolf
4c7a654dcc Add WhiteNoiseLee 2019-11-03 23:42:25 +01:00
Techwolf
12b048d449 Add Savestate 2019-11-03 23:41:48 +01:00
Techwolf
4a783c11ec Add NotAVillain 2019-11-03 23:41:28 +01:00
Techwolf
013d10a1f2 Add SmackJeeves/FurryExperience 2019-11-03 23:40:58 +01:00
Techwolf
2a7d63b7eb Add CrimsonFlag 2019-11-03 23:39:53 +01:00
Techwolf
e06f31784e Add Supercell 2019-11-03 23:38:12 +01:00
Techwolf
f4e3c09717 Add CollegeCatastrophe 2019-11-03 23:37:46 +01:00
Techwolf
59c68bc447 Add NineToNine 2019-11-03 23:37:33 +01:00
Techwolf
a444798460 Add SwordsAndSausages 2019-11-03 23:37:13 +01:00
Techwolf
99ee4147f7 Add SuburbanJungleRoughHousing 2019-11-03 23:36:50 +01:00
Techwolf
f564989e36 Add SuburbanJungle 2019-11-03 23:36:31 +01:00
Techwolf
8a987d3d97 Add ButImACatPerson 2019-11-03 23:36:02 +01:00
Techwolf
44b65f9aac Add OrderOfTheBlackDog 2019-11-03 23:35:40 +01:00
Techwolf
15a5953120 Add ATaleOfTails 2019-11-03 23:35:06 +01:00
Techwolf
323bfc3a6a Add Dreamkeepers 2019-11-03 23:34:38 +01:00
Techwolf
70e78a87de Add CarryOn 2019-11-03 23:34:04 +01:00
Techwolf
e565b083be Add CavesAndCritters 2019-11-03 23:31:17 +01:00
Techwolf
6d76193a9f Add IslaAukate and IslaAukateColor 2019-11-03 23:29:02 +01:00
Techwolf
dd6e536a55 Add Kaerwyn and BlackTapestries 2019-11-03 23:26:03 +01:00
Techwolf
0eccdf737a Add Housepets 2019-11-03 21:53:13 +01:00
Techwolf
48ebffc756 Add HowToBeAWerewolf 2019-11-03 21:39:52 +01:00
Techwolf
f5b7b067b7 Switch AGirlAndHerFed to parser scraper 2019-11-03 21:37:05 +01:00
Techwolf
ed3acd2d2f Fix TheWhiteboard 2019-11-03 21:35:58 +01:00
Techwolf
b055a8574f Fix DominicDeegan 2019-11-03 21:34:53 +01:00
Techwolf
1b87afad7e Fix GrrlPower 2019-11-03 21:34:18 +01:00
Techwolf
9796f994e3 Add SSDD 2019-11-03 21:33:03 +01:00
Techwolf
6b319fdda8 Fix SabrinaOnline 2019-11-03 21:31:50 +01:00
Techwolf
b2db51c361 Add OriginalLife 2019-11-03 21:22:56 +01:00
Techwolf
016516e984 Switch BetterDays to parser scraper 2019-11-03 21:22:29 +01:00
Techwolf
5ca5da51fc Fix Curtailed 2019-11-03 21:20:23 +01:00
Techwolf
79618e2a2f Add first strip URL for PS238 2019-11-03 21:19:10 +01:00
Techwolf
764a8ce6f6 Fix SlightlyDamned 2019-11-03 21:17:31 +01:00
Tobias Gruetzmacher
328b3cd072 Add new namer "joinPathPartsNamer"
Additionally, switch some comics which benefit from it to the new namer.
This fixes #127.
2019-06-30 20:52:15 +02:00
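The idea of a namer that joins parts of URLs into a file name can be sketched as below. This is a minimal illustration, not the project's code. The function name `join_path_parts_namer`, its parameters (tuples of path-segment indices for the page URL and the image URL) and the `_` separator are assumptions made for this example.

```python
from urllib.parse import urlsplit


def join_path_parts_namer(page_parts, image_parts, joinchar='_'):
    """Hypothetical sketch: return a namer that builds a file name from
    selected path segments of the page URL and the image URL."""
    def namer(image_url, page_url):
        # Split each URL path on "/" so segments can be picked by index
        page_segments = urlsplit(page_url).path.split('/')
        image_segments = urlsplit(image_url).path.split('/')
        parts = ([page_segments[i] for i in page_parts] +
                 [image_segments[i] for i in image_parts])
        return joinchar.join(parts)
    return namer


# Made-up URLs: combine the date segment of the page with the image file name
namer = join_path_parts_namer(page_parts=(-2,), image_parts=(-1,))
print(namer('https://example.com/img/strip.png',
            'https://example.com/comic/2019-06-30/'))
```

With these URLs the namer yields `2019-06-30_strip.png`: the trailing slash on the page URL makes index `-2` the date segment.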
Mikkel Høgh
b8b488670e New comic: The Rock Cocks (#138) 2019-06-30 19:46:39 +02:00
Mikkel Høgh
f29d14c3b4 Fix Gunnerkrigg Court prevSearch matcher (#135) 2019-06-26 23:25:01 +02:00
Mikkel Høgh
78ac7144b2 Fix Girls with Slingshots matchers (#136)
Domain name and URLs have changed slightly.

Fixes #105.
2019-06-26 23:22:45 +02:00
Gervásio Júnior
6c8814fe40 Fix multiple imgs for json flag & ZenPencils bouncer (#133)
When using the JSON output flag, if the page has more than one image,
dictionary indexing cannot be used as list.

For the ZenPencils comic, the bouncer is missing, saving the page url
as the root url.
2019-06-19 07:09:33 +02:00
Arthur Caranta
ffbf494765 Added support for Tripping Over You comic (#130) 2019-04-17 20:28:17 +02:00
Damjan Košir
78e8f05360 added MonkeyUser 2018-08-28 22:13:48 +12:00
erpbridge
e7410ce26b Update for new LookingForGroup site format (#123)
Site uses a WordPress build, but does not explicitly match
any existing scrapers. Fixes #120.
2018-08-12 23:39:51 +02:00
erpbridge
62d3692d3b Update for ElGoonishShive and ElGoonishShiveNP (#122)
Strip no longer supports ID numbers after the May 21 site revamp, per Dan Shive. Code here switched to ComicControl. Tested and verified locally. (fixes #118)
2018-08-10 09:13:43 +02:00
Tobias Gruetzmacher
49ec3cc3fa Fix (and simplify) GoComics expressions (fixes #117) 2018-07-14 11:00:27 +02:00
Tobias Gruetzmacher
6ba1e49bfd Kenneth Reitz’s Code Style™
See http://docs.python-requests.org/en/master/dev/contributing/#kenneth-reitz-s-code-style

Effectively, this removes "visual" indents.
2018-06-29 19:26:17 +02:00
Tobias Gruetzmacher
fbb3a18c91 Enable warnings and fix some of them 2018-05-23 00:54:40 +02:00
Dirk Reiners
050a0dc97c MenageA3 naming fix 2018-04-23 08:07:41 +02:00
Dirk Reiners
cba9edbdec LifeAintNoPontFarm added 2018-04-23 08:06:13 +02:00
Dirk Reiners
01c1b04778 CyanideAndHappiness fix 2018-04-23 07:53:22 +02:00
Peter Janes
2a2ff2d545 GoComics no longer has nav on the comic's home page. 2018-04-06 14:09:13 -04:00
Tobias Gruetzmacher
1fe98d2f7f Use a different div class for GoComics (fixes #102). 2018-03-23 00:29:40 +01:00
Tobias Gruetzmacher
2dbd3382f7 Update LeastICouldDo (fixes #99) 2017-12-15 00:00:25 +01:00
Tobias Gruetzmacher
75aa7207ea Some minor fixes to make some modules work again. 2017-11-27 01:04:35 +01:00
Tobias Gruetzmacher
405c4c0b43 Recreate SluggyFreelance module (fixes #96). 2017-11-26 20:23:33 +01:00
Damjan Košir
79a2516c61 deathbulge fix 2017-11-17 21:49:47 +13:00
Tobias Gruetzmacher
d88f6aeee3 Replace online tests with mocks.
We want to test our code, not the comic modules.
2017-10-15 14:54:44 +02:00
Damjan Košir
24862715d5 realised we have a scraper for the CMS MenageA3 uses 2017-10-03 21:47:32 +13:00
Damjan Košir
0e0dcf1f8e redoing MenageA3 with ParserScraper (previous search regex was broken) 2017-10-02 21:52:40 +13:00
Tobias Gruetzmacher
6369203bc0 Merge pull request #92 from clonejo/feature/commitstrip
add a comic plugin for CommitStrip
2017-09-20 22:46:46 +02:00
Damjan Košir
89a902651c Merge remote-tracking branch 'origin/master' 2017-09-19 22:36:48 +12:00
Damjan Košir
a9d7b4de12 added Deathbulge 2017-09-19 22:36:19 +12:00
clonejo
331faae3ea add a comic plugin for CommitStrip 2017-09-18 21:31:15 +02:00
glyphy
ad8374d7b8 Fixing the Menagea3 plugin (#91)
I've changed the menagea3 plugin so it should work with the
new directory structure found on the site.
2017-09-04 21:19:46 +02:00
Tobias Gruetzmacher
7e0adf1d96 Unify more WordPress-based modules. 2017-05-22 01:17:05 +02:00
Tobias Gruetzmacher
42f66c07b0 Random module fixes. 2017-05-22 00:30:31 +02:00
Tobias Gruetzmacher
a99098d5ad Update GoComics module. 2017-05-21 23:10:32 +02:00
Tobias Gruetzmacher
1400879dc8 Fix another set of modules (e, k). 2017-05-17 00:11:29 +02:00
Tobias Gruetzmacher
8b90aa5cfb Some minor style fixes. 2017-05-15 00:54:02 +02:00
Tobias Gruetzmacher
b8484cde50 Fix some more modules. 2017-05-15 00:27:28 +02:00
Tobias Gruetzmacher
ddd3fb418c Remove some broken comics from ComicFury module. 2017-05-14 22:45:12 +02:00
Tobias Gruetzmacher
09687c91f4 Fix some SmackJeeves comics. 2017-05-12 00:32:25 +02:00
Tobias Gruetzmacher
593975d907 Minor cleanups for new modules (see #84). 2017-04-16 01:28:17 +02:00
Tim Brier
233da3e052 Add support for SurvivingTheWorld and TumbleDryComics (#84) 2017-04-16 01:11:30 +02:00
Tobias Gruetzmacher
0973570295 Fix a bunch of modules. 2017-04-16 01:06:41 +02:00
Tobias Gruetzmacher
e6f18a2027 Clean up ComicGenesis 2017-02-27 18:20:54 +01:00
Tobias Gruetzmacher
abb72a3a24 Fix CloneManga modules. 2017-02-13 23:41:45 +01:00
Tobias Gruetzmacher
ebbb27d05d Move xpath_class to helpers module. 2017-02-13 22:41:17 +01:00
Tobias Gruetzmacher
20ab279cde Clean up SmackJeeves...
Currently only covers already existing modules: Removed 11 broken
modules, added 2 and tried to update comic names and the adult and
endOfLife flags from their index. This isn't helped by the fact that
their search seems to skip some comics...
2017-02-13 01:46:49 +01:00
Tobias Gruetzmacher
83187b0554 Fix ViiviJaWagner. 2017-02-12 20:29:57 +01:00
Tobias Gruetzmacher
657e61811d Update list of old and removed modules. 2017-02-12 20:17:07 +01:00
Tobias Gruetzmacher
3b6af33ecb Some small module fixes. 2017-02-12 20:15:25 +01:00
Tobias Gruetzmacher
5359dd8629 Update ComicFury again... 2017-02-12 19:50:51 +01:00
Tobias Gruetzmacher
9895014655 Fix PHD with an ugly hack... 2017-02-12 16:21:36 +01:00
Tobias Gruetzmacher
b57945efd1 Update GoComic modules. 2017-02-12 12:21:01 +01:00
Tobias Gruetzmacher
ebe98bc8ba Fix some modules. 2017-02-12 02:16:38 +01:00
Tobias Gruetzmacher
20ca5d7fc2 Fix some modules. 2017-02-06 00:05:05 +01:00
gruetzkopf
edb49faa8b Add support for 'The Monster under the Bed' 2017-01-22 00:11:05 +01:00
Tobias Gruetzmacher
c4a184d173 Remove some vanished modules. 2017-01-12 02:01:10 +01:00
Tobias Gruetzmacher
36ac459bed Add removed GoComics modules to old list. 2017-01-12 01:22:13 +01:00
Tobias Gruetzmacher
a183e812ae Update GoComics module for new site layout.
(fixes #77)
2017-01-11 02:21:05 +01:00
Tobias Gruetzmacher
061efaac6e New module for ComicSherpa (removed from GoComics) 2017-01-11 01:34:52 +01:00
John Safrit
969e633877 Fix pattern for The Devils Panties 2017-01-08 17:39:59 -05:00
Tobias Gruetzmacher
3f9feec041 Allow modules to ignore some HTTP error codes.
This is necessary since it seems some webservers out there are
misconfigured to deliver actual content with an HTTP error code...
2016-11-01 18:25:02 +01:00
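A per-module opt-out for HTTP error codes could look roughly like this. It is a sketch under assumptions: the attribute name `allow_errors`, the classes and the `check_page` method are hypothetical, and the real project may check the status elsewhere (for example right after the HTTP request).

```python
class HTTPStatusError(Exception):
    """Raised when a page comes back with an error status we do not tolerate."""


class BaseScraper:
    # Hypothetical per-module setting: error codes whose body is still used
    allow_errors = ()

    def check_page(self, status_code, body):
        # Reject 4xx/5xx responses unless this module explicitly allows them
        if status_code >= 400 and status_code not in self.allow_errors:
            raise HTTPStatusError('HTTP error %d' % status_code)
        return body


class MisconfiguredSiteComic(BaseScraper):
    # Made-up example: this site serves real comic pages with a 404 status
    allow_errors = (404,)
```

Keeping the list on the class means the default stays strict, and only modules for misconfigured sites relax it.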
Tobias Gruetzmacher
46b7a374f6 Small GoComics update. 2016-11-01 02:51:00 +01:00
Tobias Gruetzmacher
f7f4e130bf Small fix to the WLP module. 2016-11-01 02:27:29 +01:00
Tobias Gruetzmacher
bc755d09a3 Apply link modifier to all links.
This was previously only the "previous link modifier", now it can also
modify "next" and "latest" links. Additionally, the modifier is given
the current URL, so those cases can be distinguished.
2016-11-01 01:50:44 +01:00
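A link modifier that receives both the current URL and the target link might be sketched as follows. The method name `link_modifier`, its `(from_url, to_url)` signature and the "tracking query string" quirk are assumptions for illustration; the point is only that the current URL lets one hook treat "previous", "next" and "latest" links differently.

```python
from urllib.parse import urlsplit, urlunsplit


class BaseScraper:
    def link_modifier(self, from_url, to_url):
        """Applied to every previous, next and latest link; default: unchanged."""
        return to_url


class ExampleComic(BaseScraper):
    def link_modifier(self, from_url, to_url):
        # Made-up quirk: only links found on the front page carry a tracking
        # query string, so the current URL decides whether to strip it.
        if urlsplit(from_url).path in ('', '/'):
            return urlunsplit(urlsplit(to_url)._replace(query=''))
        return to_url
```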
Tobias Gruetzmacher
7fc05f75f5 Remove broken PetiteSymphony comics. 2016-10-31 07:16:10 +01:00
Tobias Gruetzmacher
69e6318f87 Remove ScurryAndCover, too much JavaScript. 2016-10-31 07:04:00 +01:00
Tobias Gruetzmacher
47e2502ec7 Fix a bunch of comic modules. 2016-10-31 06:57:47 +01:00
Tobias Gruetzmacher
446b81fc45 Fix Wumo and friends. 2016-10-30 15:28:54 +01:00
Tobias Gruetzmacher
51ed898f5d Fix some SmackJeeves comics. 2016-10-30 14:30:45 +01:00
Tobias Gruetzmacher
b6d99945f6 Merge pull request #73 from acaranta/master
Added several SmackJeeves Comics
2016-10-30 11:55:17 +01:00
Tobias Gruetzmacher
3b9f30affd Update ComicFury modules. 2016-10-30 11:04:45 +01:00
Tobias Gruetzmacher
9a6a310b76 Fixup copyright years. 2016-10-29 00:21:41 +02:00
acaranta
83880a3cbd corrected RainbowMansion 2016-10-27 09:53:34 +02:00
acaranta
0ed823175c Added even more Smackjeeves comics 2016-10-27 06:58:57 +02:00
acaranta
a5c9a3c35c Added several SmackJeeves Comics 2016-10-26 05:25:13 +02:00
Peter Brunner
19445a83ae Fix smbc 2016-10-18 21:28:42 -04:00
Tobias Gruetzmacher
06be2a026b Move some ex-KeenSpot comics to shorter names. 2016-10-14 14:23:33 +02:00
Tobias Gruetzmacher
b17d6e5f22 Rework/fix KeenSpot modules. 2016-10-14 00:14:53 +02:00
Tobias Gruetzmacher
064e7976ec Add namer for Extra Fabulous Comics. 2016-10-06 00:42:50 +02:00
mostlyuseful
fce7dfff19 Add "Extra Fabulous Comics" comic 2016-10-04 17:06:50 +02:00
Tobias Gruetzmacher
f342a93aa1 Update GoComics module. 2016-10-01 03:39:36 +02:00
Tobias Gruetzmacher
c0d945a563 Update ComicFury modules. 2016-10-01 02:52:33 +02:00
Tobias Gruetzmacher
98c98ddfab Fix some more comic modules (c-f). 2016-09-30 00:15:45 +02:00
Tobias Gruetzmacher
b1d2650615 Fix some modules (a&b). 2016-09-29 01:29:01 +02:00
Damjan Košir
c04c62e92b xkcd now done with XPaths 2016-08-18 21:28:25 +12:00
Damjan Košir
9ba184eb43 fixing LoadingArtist 2016-08-16 21:20:35 +12:00
Hubert Figuière
afcd19bf5b Added Prince of Sartar Comic 2016-08-08 09:18:33 -04:00
Hubert Figuière
81821dc450 Added Space Junk Arlia comic 2016-08-08 09:18:33 -04:00
Tobias Gruetzmacher
215d597573 Remove DrunkDuck for now.
- It's been disabled for ages
- Needs a major rework
- I don't want to add that many comics anyways...
- This also gets rid of make_scraper :)
2016-06-05 22:22:17 +02:00
Tobias Gruetzmacher
67d0d38100 Migrate SnafuComics to single-class module. 2016-06-05 22:12:16 +02:00
Tobias Gruetzmacher
df2048cb34 Keep track of removed and moved comics (fixes #41).
I plan on keeping this list for at least ~ 2 releases and then purging
older entries...
2016-06-05 21:47:58 +02:00
Tobias Gruetzmacher
9b755a7e6c Restore BobWhite. 2016-06-05 18:32:27 +02:00
Tobias Gruetzmacher
844bec09ba Remove another dead comic from ComicFury. 2016-06-05 01:06:04 +02:00
André-Patrick Bubel
2b8e948868 Add String Theory comic 2016-06-01 11:19:17 +00:00
André-Patrick Bubel
192751073c Add KillSixBillionDemons comic 2016-05-31 07:28:32 +00:00
Tobias Gruetzmacher
807bee6342 Migrate GoComics to single-class module. 2016-05-23 00:01:10 +02:00
Tobias Gruetzmacher
2c8e57bdea Migrate Creators to single-class module. 2016-05-22 23:56:59 +02:00
Tobias Gruetzmacher
f5dff27b0a Migrate SmackJeeves to single-class module. 2016-05-22 23:54:21 +02:00
Tobias Gruetzmacher
1ea20e1743 Migrate WebcomicFactory to single-class module. 2016-05-22 23:40:58 +02:00
Tobias Gruetzmacher
c62a7283a2 Migrate ComicFury to single-class module. 2016-05-22 23:31:53 +02:00
Tobias Gruetzmacher
1834bf179f Migrate Arcamax to single-class module. 2016-05-22 23:17:24 +02:00
Tobias Gruetzmacher
f29472c143 Make auto-update script more flexible. 2016-05-22 23:06:05 +02:00
Tobias Gruetzmacher
e4650d5941 Remove make_scraper from Nitrocosm. 2016-05-21 14:35:53 +02:00
Tobias Gruetzmacher
b6eb8ab8ef Remove make_scraper from SandraAndWoo 2016-05-21 14:12:11 +02:00
Tobias Gruetzmacher
4630ea047c Implement Oglaf's strange navigation (fixes #33)
(also should fix wummel#91)
2016-05-21 02:38:07 +02:00
Tobias Gruetzmacher
51008a975b Refactor: Introduce generator methods for scrapers
This allows one comic module class to generate multiple scrapers. This
change is to support a more dynamic module system as described in #42.
2016-05-21 01:29:36 +02:00
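The generator-method idea (one module class yielding several scraper instances, each with its own name) can be sketched like this. The class names, the `getmodules` classmethod and the `ExampleHost/` name prefix are illustrative assumptions, not a description of the project's actual API.

```python
class Scraper:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        # Name is per instance, so one class can back many comics
        return self._name

    @classmethod
    def getmodules(cls):
        """Default: one class yields exactly one scraper instance."""
        return (cls(cls.__name__),)


class ExampleHost(Scraper):
    """Hypothetical hosting site serving several comics with one layout."""

    def __init__(self, name, sub):
        super().__init__('ExampleHost/' + name)
        self.url = 'https://example.com/%s/' % sub

    @classmethod
    def getmodules(cls):
        return (
            cls('FooComic', 'foo'),
            cls('BarComic', 'bar'),
        )
```

A loader would then iterate over `getmodules()` of every class instead of treating each class as one comic.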
Tobias Gruetzmacher
89cfd9d310 Add comics from catomix.com. 2016-05-16 23:55:41 +02:00
Tobias Gruetzmacher
a6cf4e7040 Fix some more comic modules. 2016-05-16 23:16:29 +02:00
Tobias Gruetzmacher
be1a63da0c Update GoComics comic list. 2016-05-16 18:26:45 +02:00
Tobias Gruetzmacher
b9d9564085 Fix Dilbert (fixes #44). 2016-05-16 01:21:23 +02:00
Tobias Gruetzmacher
e9b3c487c0 Remove some dead comics. 2016-05-16 01:10:20 +02:00
Tobias Gruetzmacher
bd60155d9f Some more ComicFury comics gone... 2016-05-16 00:53:22 +02:00
Tobias Gruetzmacher
849e60e795 Remove make_scraper magic from webcomiceu. 2016-05-07 03:20:01 +02:00
Tobias Gruetzmacher
975d2376bf Another round of comic module fixes. 2016-05-07 01:50:10 +02:00
Tobias Gruetzmacher
efe1308db2 Replace home-grown Python2/3 compat. with six. 2016-05-05 23:33:48 +02:00
Tobias Gruetzmacher
77ed0218e0 Fix some comic modules. 2016-05-05 20:55:14 +02:00
Tobias Gruetzmacher
bb2ac39639 Fix some URLs. 2016-05-05 10:12:03 +02:00
Tobias Gruetzmacher
d05316e3ac Seems ComicFury is deleting comics regularly...
Well, there's nothing we can do: Remove them.
2016-05-04 08:26:53 +02:00
Tobias Gruetzmacher
0c1aa9e8bd Move libxml < 2.9.3 workaround to base class. 2016-05-02 23:22:06 +02:00
Tobias Gruetzmacher
b93a8fde65 Move PensAndTales comics and fix them. 2016-05-02 22:32:14 +02:00
Tobias Gruetzmacher
4006ced43d Move all HijinksEnsue comics into alphabetic files. 2016-05-02 01:25:34 +02:00
Tobias Gruetzmacher
d5f91ecfd2 Fix some modules in m.py. 2016-04-30 01:59:28 +02:00
Tobias Gruetzmacher
1d52d33311 Remove missing SmackJeeves comics. 2016-04-30 00:56:20 +02:00
Tobias Gruetzmacher
d796f3476c Fix some modules in d.py. 2016-04-30 00:44:18 +02:00
Tobias Gruetzmacher
cc16fea880 Fix some modules in c.py 2016-04-29 00:35:02 +02:00
Tobias Gruetzmacher
1d94439715 Fix some more comic modules. 2016-04-27 00:31:27 +02:00
Tobias Gruetzmacher
8b1ac4eb35 Fix "tagsoup" on SmackJeeves
Unfortunately, browsers render < outside of HTML tags differently than
libXML did until recently (libXML 2.9.3), so we need to preprocess pages
before parsing them...

(This was fixed in libXML commit 140c25)
2016-04-26 08:05:38 +02:00
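The preprocessing step can be illustrated with a small regular-expression pass that escapes a stray `<` before the page reaches the parser. This is only a sketch: the assumption that a `<` followed by whitespace, `=` or a digit can never open a tag is a heuristic chosen for the example, not necessarily the rule the project uses.

```python
import re

# Heuristic (assumed): "<" followed by whitespace, "=" or a digit is text, not a tag
STRAY_LT = re.compile(r'<(?=[\s=0-9])')


def escape_stray_lt(html):
    """Replace stray "<" characters with "&lt;" so old libxml versions
    treat them as text, the way browsers do."""
    return STRAY_LT.sub('&lt;', html)


print(escape_stray_lt('<p>1 < 2 and 3<=4</p>'))
```

Real tags such as `<p>` and `</p>` are left alone because they start with a letter or `/`; the output here is `<p>1 &lt; 2 and 3&lt;=4</p>`.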
Tobias Gruetzmacher
8ddf553eb4 Fix some more SmackJeeves modules. 2016-04-22 01:04:47 +02:00
Tobias Gruetzmacher
6574997e01 Refactor: All the other class methods.
Turns out, it would have been better if all methods had been instance
methods and not class methods. This finished a big chunk of the rework
needed for #42.
2016-04-21 23:52:31 +02:00
Tobias Gruetzmacher
0d436b8ca9 Refactor: url modifiers to normal methods.
As before, to implement #42 these might want to access information from
the instance, so they should be normal methods.
2016-04-21 21:39:25 +02:00
Tobias Gruetzmacher
c3f32dfef7 Refactor: Make namer a method.
When #42 is realized, the naming of files might differ between comic
modules, so the namer's logical location is the instance, not the class.
2016-04-21 08:20:49 +02:00
Tobias Gruetzmacher
fe51a449df Update SmackJeeves
- Now uses _ParserScraper, which makes the pattern quite a bit more
  generic and IMHO more readable
- remove make_scraper magic
- No new comics, only fixed existing ones and removed some dead ones.
2016-04-20 23:36:45 +02:00
Tobias Gruetzmacher
13a3409854 Remove some comics that are gone or block us. 2016-04-17 19:42:43 +02:00
Tobias Gruetzmacher
1fbc844077 Update GoComics. 2016-04-17 18:40:09 +02:00
Tobias Gruetzmacher
73e958670d Update ComicFury (again). 2016-04-17 16:19:44 +02:00
Tobias Gruetzmacher
3329027e4b Update ComicFury. 2016-04-16 13:13:47 +02:00
Tobias Gruetzmacher
ee99c087d7 Remove prevUrlMatchesStripUrl.
It was only used for one test.
2016-04-16 01:14:26 +02:00
Tobias Gruetzmacher
52515b5fc5 Update GoComics. 2016-04-15 00:26:14 +02:00
Tobias Gruetzmacher
031a523846 Fix SnafuComics. 2016-04-14 23:52:35 +02:00
Tobias Gruetzmacher
7626b1e100 Webcomics Nation is gone. 2016-04-14 22:46:52 +02:00
Tobias Gruetzmacher
497653c448 Remove make_scraper magic from Arcamax. 2016-04-14 00:17:59 +02:00
Tobias Gruetzmacher
db87ed95e7 Use new features to make modules simpler. 2016-04-13 23:28:43 +02:00
Tobias Gruetzmacher
060281e5ff Use concrete scraper objects everywhere.
This is a first step for #42. Since most access to the scraper classes
is through instances, modules can now dynamically override url and name
(name is now a property).
2016-04-13 22:17:30 +02:00
Tobias Gruetzmacher
0468f2f31a Refactor: Convert starter to simple method. 2016-04-13 20:01:51 +02:00
Tobias Gruetzmacher
16004e43e4 Use default bounceStarter for site modules. 2016-04-13 01:24:13 +02:00
Tobias Gruetzmacher
42e43fa4e6 Read starter parameters from class.
This allows starters to be specified in a more declarative and dynamic way.
2016-04-12 23:11:39 +02:00
Tobias Gruetzmacher
b865a171f9 Remove some broken comics. 2016-04-12 08:21:06 +02:00
Tobias Gruetzmacher
4e2e4ac529 Prevent scraper from moving to a different comic. 2016-04-12 08:10:47 +02:00
Tobias Gruetzmacher
443ab119e9 Refresh GoComics list from online directory. 2016-04-12 00:36:33 +02:00
Tobias Gruetzmacher
0e385a3697 Update GoComics (no change in supported comics)
- remove make_scraper magic
- switch to _ParserScraper
2016-04-11 22:42:01 +02:00
Tobias Gruetzmacher
ad7a297964 Fix WLP comics. 2016-04-11 01:07:21 +02:00
Damjan Košir
af2e57d850 Added comic ScurryAndCover...
- Yay, funky JavaScript parsing!
- Start page isn't latest comic...

Updated-by: Tobias Gruetzmacher <tobias-git@23.gs>
2016-04-11 00:09:53 +02:00
Tobias Gruetzmacher
fa98f6ddbf Move more comics to common WordPressScraper. 2016-04-10 23:04:34 +02:00
Tobias Gruetzmacher
bb5b6ffcec Fix comics in module a.py. 2016-04-07 23:21:31 +02:00
Tobias Gruetzmacher
0033a8046b Fix creators module. 2016-04-07 00:20:03 +02:00
Tobias Gruetzmacher
8768ff07b6 Fix AhoiPolloi, be a bit smarter about encoding.
HTML character encoding in the context of HTTP is quite tricky to get
right and honestly, I'm not sure if I did get it right this time. But I
think the current behaviour best matches what web browsers try to do:

1. Let Requests figure out the encoding from the HTTP header. This
   overrides everything else. We need to "trick" LXML to accept our
   decision if the document contains an XML declaration which might
   disagree with the HTTP header.
2. If the HTTP headers don't specify any encoding, let LXML guess the
   encoding and be done with it.
2016-04-06 22:22:22 +02:00
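The two-step rule above can be sketched with the standard library alone. Assumptions: the function name `decode_page` is hypothetical, the charset is read from the `Content-Type` header with `email.message.Message`, and the "trick" is modelled as stripping a leading XML declaration from the decoded text, since lxml may reject decoded strings that still carry an encoding declaration. The project itself relies on Requests and LXML for these steps.

```python
import re
from email.message import Message

# An XML declaration at the very start of the document, e.g. <?xml ... ?>
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>')


def decode_page(content_type, raw):
    """Sketch: trust the HTTP charset if present, otherwise hand the raw
    bytes to the parser so it can guess the encoding itself."""
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        # Step 1: the header wins, so drop a declaration that might disagree
        text = raw.decode(charset, errors='replace')
        return XML_DECL.sub('', text, count=1), charset
    # Step 2: no charset in the header; return bytes untouched for sniffing
    return raw, None
```

Returning bytes in the second case leaves encoding detection to the parser, which matches step 2.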