Update the "adding new comics" for parser base class

2023-06-01 23:03:59 +02:00
commit 694e6fe290
parent c5f87dee83
1 changed file with 70 additions and 54 deletions
--- a/doc/adding_new_comics.md
+++ b/doc/adding_new_comics.md
@@ -1,82 +1,98 @@
 # Adding a comic to Dosage
-To add a new comic, add a new class in one of the *.py files
+To add a new comic to a local Dosage installation, drop a Python file into
-in the dosagelib/plugins module.
+Dosage's "user plugin directory". If you don't know where that is, run `dosage
 --help`; the directory is shown at the end of the output.
-The files in dosagelib/plugins and the classes inside those files are
+Here is a complete example which is explained in detail below. Dosage provides
-sorted alphabetically. Add your comic to the appropriate filename.
+different base classes for parsing comic pages, but this tutorial only covers
-For example if the comic name is "Super duper comic", the new class
+the modern `ParserScraper` base class, which uses an HTML parser (LXML/libxml)
-should be added to dosagelib/plugins/s.py.
+to find elements on each page's DOM.
-Here is a complete example which is explained in detail below.
+```python
 from ..scraper import ParserScraper
-```
+class SuperDuperComic(ParserScraper):
-class SuperDuperComic(_BasicScraper):
+    url = 'https://superdupercomic.com/'
    url = 'http://superdupercomic.com/'
    rurl = escape(url)
    stripUrl = url + 'comics/%s'
    firstStripUrl = stripUrl % '1'
-    imageSearch = compile(tagre("img", "src", r'(%simg/[^"]+)' % rurl))
+    imageSearch = '//div[d:class("comicpane")]//img'
-    prevSearch = compile(tagre("a", "href", r'(%scomics/\d+)' % rurl, after="prev"))
+    prevSearch = '//a[@rel="prev"]'
    help = 'Index format: n (unpadded)'
 ```
 Let's look at each line in detail.
-```class SuperDuperComic(_BasicScraper):```
+```python
 class SuperDuperComic(ParserScraper):
 ```
-All comic plugin classes inherit from ``_BasicScraper``.
+All comic plugin classes inherit from `ParserScraper`. The class name
-The classname (``SuperDuperComic`` in our example) must be unique,
+(`SuperDuperComic` in our example) must be unique, regardless of upper/lower
-regardless of upper/lower characters.
+characters. The user finds comics with this class name, so be sure to select
 The user finds comics with this classname, so be sure to select
 something descriptive and easy to remember.
-```url = 'http://superdupercomic.com/'```
+```python
 url = 'https://superdupercomic.com/'
 ```
-The URL must display the latest comic picture. This is where the
+The URL must display the latest comic picture. This is where the comic image
-comic image search will start. See below for some special cases.
+search will start. See below for some special cases.
-```rurl = escape(url)```
+```python
 stripUrl = url + 'comics/%s'
 ```
-This defines a variable ``rurl`` which is used in the search patterns
+This defines what a comic strip URL looks like. In our example, all comic strip
-below. It properly escapes all regular expression special characters
+URLs look like `https://superdupercomic.com/comics/NNN` where NNN is the
-like dots or question marks.
+increasing comic number.
-```stripUrl = url + 'comics/%s'```
+```python
 firstStripUrl = stripUrl % '1'
 ```
-This defines how a comic strip URL looks like. In our example, all
+This tells Dosage what the earliest comic strip URL looks like. Dosage stops
-comic strip URLs look like ``http://superdupercomic.com/comics/NNN``
+searching for more comics when it is encountered. In our example, comic numbering
-where NNN is the increasing comic number.
+starts with `1`, so the oldest comic URL is
 `https://superdupercomic.com/comics/1`.
-```firstStripUrl = stripUrl % '1'```
+```python
 imageSearch = '//div[d:class("comicpane")]//img'
 ```
-This tells Dosage what the earliest comic strip URL looks like. Dosage
+Each comic page URL has one or more comic strip images. The `imageSearch`
-stops searching for more comics when it is encounterd. In our example
+defines an [XPath](https://quickref.me/xpath) expression to find the comic
-comic numbering starts with ``1``, so the oldest comic URL is
+strip image inside each page. Most of the time you can use your browser's
-``http://superdupercomic.com/comics/1``
+console (open with `F12`) to experiment on the real page. Dosage adds a custom
 XPath function (`d:class`) to make it easier to match HTML classes.
-```imageSearch = compile(tagre("img", "src", r'(%simg/[^"]+)' % rurl))```
+```python
 prevSearch = '//a[@rel="prev"]'
 ```
-Each comic page URL has one or more comic strip images. The imageSearch
+To search for more comics, Dosage has to look for the previous comic URL. This
-pattern must match those images in the HTML content of the page URL.
+property defines an XPath expression to find a link to the previous comic page.
 To make it easy to match HTML tags, the ``tagre()`` function is
 helpful. The first parameter is the tag name, the second the attribute
 name and the third the attribute value. So in our example the given
 pattern would match a tag like
 ``<img src="http://superdupercomic.com/img/comic1.jpg" />``.
-```prevSearch = compile(tagre("a", "href", r'(%scomics/\d+)' % rurl, after="prev"))```
+```python
 help = 'Index format: n (unpadded)'
 ```
-To search for more comics, Dosage has to look for the previous comic URL.
+Since the user can search comics from a given start point, the help can
-The ``after=`` value in ``tagre()`` matches anything between the
+describe how the comic is numbered. Running `dosage superdupercomic:100` would
-attribute value and the end of the tag.
+start getting comics from number 100 and earlier.
 So this pattern assumes each comic page URL has a link to the previous
 comic, for example ``http://superdupercomic.com/comics/100`` has a
 link ``<a href="http://superdupercomic.com/comics/99" class="prev">``.
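The URL attributes and search expressions from the example can be tried out in a plain Python session. The following is only a sketch, not Dosage code: it repeats the string formatting from the example class and uses the standard library's `xml.etree.ElementTree`, which supports a small XPath subset, as a stand-in for the LXML parser behind `ParserScraper`. The page fragment is made up, and the `@class` test is a simplified substitute for Dosage's `d:class` function.

```python
import xml.etree.ElementTree as ET

# Values copied from the SuperDuperComic example.
url = 'https://superdupercomic.com/'
stripUrl = url + 'comics/%s'
firstStripUrl = stripUrl % '1'

# Made-up, well-formed page fragment (assumption for illustration only).
page = ET.fromstring(
    '<html><body>'
    '<div class="comicpane"><img src="/img/comic100.png"/></div>'
    '<a rel="prev" href="/comics/99">Previous</a>'
    '</body></html>'
)

# ElementTree needs a relative path ('.//'); the plugin uses '//a[@rel="prev"]'.
prev_link = page.find('.//a[@rel="prev"]')
prev_href = prev_link.get('href')

# Simplified stand-in for d:class("comicpane"): exact match on the attribute.
image = page.find('.//div[@class="comicpane"]//img')
image_src = image.get('src')

print(firstStripUrl, prev_href, image_src)
```

Real comic pages are rarely well-formed XML, so this only helps to get a feeling for the expressions; the browser console remains the better tool for testing against the live page.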
-``help = 'Index format: n (unpadded)'``
+## Contribute a module to Dosage
-Since the user can search comics from a given start point, the help
+If you don't know how to use git and/or set up a Python development environment,
-must describe how the comic is numbered. Running
+that's fine! You can [create an
-``dosage superdupercomic:100`` would start getting comics from number
+issue](https://github.com/webcomics/dosage/issues/new) on GitHub and paste the
-100 and earlier.
+source of your new module into it, and a Dosage developer will take care of
 integrating the module into Dosage.
 Otherwise, integrate your new comic module into one of the `*.py` files in
 the dosagelib/plugins module.
 The files in dosagelib/plugins and the classes inside those files are sorted
 alphabetically. Add your comic to the appropriate filename. For example, if the
 comic name is "Super duper comic", the new class should be added to
 dosagelib/plugins/s.py.
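The file naming rule can be expressed as a tiny helper. This is a sketch only: `plugin_file_for` is a hypothetical name and not part of Dosage, and it covers just the case described above, a comic name that starts with a letter.

```python
def plugin_file_for(comic_name: str) -> str:
    """Return the plugin file for a comic name that starts with a letter."""
    first = comic_name.strip()[:1].lower()
    if not first.isalpha():
        # Names that do not start with a letter are outside this sketch.
        raise ValueError(f'unsupported comic name: {comic_name!r}')
    return f'dosagelib/plugins/{first}.py'


target = plugin_file_for('Super duper comic')
print(target)
```

Inside that file, the new class goes at its alphabetical position among the existing classes.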