dosage/scripts/webcomicfactory.py

#!/usr/bin/env python3
# SPDX-License-Identifier: MIT
# Copyright (C) 2004-2008 Tristan Seligmann and Jonathan Jacobs
# Copyright (C) 2012-2014 Bastian Kleineidam
# Copyright (C) 2015-2016 Tobias Gruetzmacher
"""
Script to get WebComicFactory comics and save the info in a JSON file for
further processing.
"""
from scriptutil import ComicListUpdater


class WebComicFactoryUpdater(ComicListUpdater):
    """Collect the comics listed on the WebComicFactory start page."""

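    def find_first(self, url):
        """Return the URL of the first strip of the comic at ``url``.

        Falls back to ``url`` itself when the page has no "first" link.
        """
        data = self.get_url(url)

        firstlinks = data.cssselect('a.comic-nav-first')
        if not firstlinks:
            print("INFO:",
                  "No first link on »%s«, already first page?" % url)
            return url
        return firstlinks[0].attrib['href']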

    def collect_results(self):
        """Parse start page for supported comics."""
        url = 'http://www.thewebcomicfactory.com/'
        data = self.get_url(url)

        # Each thumbnail widget holds one entry: an <h2> with its name
        # and a link to its latest page.
        for comicdiv in data.cssselect('div.ceo_thumbnail_widget'):
            comicurl = comicdiv.cssselect('a')[0].attrib['href']
            # Skip the 'comic-color-key' entry before fetching anything.
            if 'comic-color-key' in comicurl:
                continue
            name = comicdiv.cssselect('h2')[0].text
            self.add_comic(name, self.find_first(comicurl))

    def get_entry(self, name, url):
        """Format one comic as a ``cls(...)`` line for the plugin module."""
        return "cls('%s',\n    '%s')," % (name, url)


if __name__ == '__main__':
    WebComicFactoryUpdater(__file__).run()