dosage/dosagelib/plugins/x.py

# -*- coding: utf-8 -*-
# Copyright (C) 2004-2008 Tristan Seligmann and Jonathan Jacobs
# Copyright (C) 2012-2014 Bastian Kleineidam
# Copyright (C) 2015-2020 Tobias Gruetzmacher
# Copyright (C) 2019-2020 Daniel Ring
from ..scraper import _ParserScraper
from ..helpers import bounceStarter


class Xkcd(_ParserScraper):
    name = 'xkcd'
    url = 'https://xkcd.com/'
    starter = bounceStarter
    stripUrl = url + '%s/'
    firstStripUrl = stripUrl % '1'
    imageSearch = '//div[@id="comic"]//img'
    prevSearch = '//a[@rel="prev"]'
    nextSearch = '//a[@rel="next"]'
    help = 'Index format: n (unpadded)'
    textSearch = '//div[@id="comic"]//img/@title'

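    def namer(self, image_url, page_url):
        # Prefix the image's base name with the comic number taken from the
        # page URL, zero-padded to at least three digits.
        index = int(page_url.rstrip('/').rsplit('/', 1)[-1])
        name = image_url.rsplit('/', 1)[-1].split('.')[0]
        return '%03d-%s' % (index, name)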

    def imageUrlModifier(self, url, data):
        # Switch to the "_large" PNG when the page data mentions '/large/'.
        if url and '/large/' in data:
            return url.replace(".png", "_large.png")
        return url

    def shouldSkipUrl(self, url, data):
        # Skip pages that do not carry a regular comic image.
        return url in (
            self.stripUrl % '1663',  # Garden
        )