dosage/dosagelib/plugins/h.py

# -*- coding: utf-8 -*-
# Copyright (C) 2004-2005 Tristan Seligmann and Jonathan Jacobs
# Copyright (C) 2012-2014 Bastian Kleineidam
# Copyright (C) 2015-2016 Tobias Gruetzmacher

from __future__ import absolute_import, division, print_function

from re import compile, escape
from ..scraper import _BasicScraper
from ..util import tagre
from ..helpers import bounceStarter
from .common import _WordPressScraper


class HagarTheHorrible(_BasicScraper):
    url = 'http://www.hagarthehorrible.net/'
    stripUrl = 'http://www.hagardunor.net/comicstrips_us.php?serietype=9&colortype=1&serieno=%s'
    firstStripUrl = stripUrl % '1'
    multipleImagesPerStrip = True
    imageSearch = compile(tagre("img", "src", r'(stripus\d+/(?:Hagar_The_Horrible_?|h)\d+[^ >]+)', quote=""))
    prevUrl = r'(comicstrips_us\.php\?serietype\=9\&colortype\=1\&serieno\=\d+)'
    prevSearch = compile(tagre("a", "href", prevUrl, after="Previous"))
    help = 'Index format: number'

    @classmethod
    def starter(cls):
        """Return last gallery link."""
        url = 'http://www.hagardunor.net/comics.php'
        data = cls.getPage(url)
        pattern = compile(tagre("a", "href", cls.prevUrl))
        # Exhaust the matches so that starturl ends up as the last link found.
        for starturl in cls.fetchUrls(url, data, pattern):
            pass
        return starturl


class HappyJar(_WordPressScraper):
    url = 'http://www.happyjar.com/'


class HarkAVagrant(_BasicScraper):
    url = 'http://www.harkavagrant.com/'
    rurl = escape(url)
    starter = bounceStarter(
        url, compile(tagre("a", "href", r'(%sindex\.php\?id=\d+)' % rurl) +
                     tagre("img", "src", "buttonnext.png")))
    stripUrl = url + 'index.php?id=%s'
    firstStripUrl = stripUrl % '1'
    imageSearch = compile(tagre("img", "src", r'(%s[^"]+)' % rurl,
                                after='BORDER'))
    prevSearch = compile(tagre("a", "href", r'(%sindex\.php\?id=\d+)' % rurl) +
                         tagre("img", "src", "buttonprevious.png"))
    help = 'Index format: number'

    @classmethod
    def namer(cls, imageUrl, pageUrl):
        """Name images '<page id>-<image filename>'."""
        filename = imageUrl.rsplit('/', 1)[1]
        num = pageUrl.rsplit('=', 1)[1]
        return '%s-%s' % (num, filename)


class Hipsters(_WordPressScraper):
    url = 'http://www.hipsters-comic.com/'
    firstStripUrl = 'http://www.hipsters-comic.com/comic/hip01/'


class HorribleVille(_BasicScraper):
    url = 'http://horribleville.com/'
    stripUrl = url + 'd/%s.html'
    firstStripUrl = stripUrl % '20051220'
    imageSearch = compile(tagre("img", "src", r'(/comics/[^"]+)'))
    prevSearch = compile(tagre("a", "href", r'(/d/[^"]+)') +
                         tagre("img", "src", r'/images/previous\.png'))
    help = 'Index format: yyyymmdd'