A modified version of https://github.com/webcomics/dosage
8768ff07b6
HTML character encoding in the context of HTTP is quite tricky to get right and honestly, I'm not sure if I did get it right this time. But I think, the current behaviour matches best what web browsers try to do: 1. Let Requests figure out the content from the HTTP header. This overrides everything else. We need to "trick" LXML to accept our decision if the document contains an XML declaration which might disagree with the HTTP header. 2. If the HTTP headers don't specify any encoding, let LXML guess the encoding and be done with it. |
||
---|---|---|
doc | ||
dosagelib | ||
scripts | ||
tests | ||
.codeclimate.yml | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
.travis.yml | ||
COPYING | ||
dosage | ||
MANIFEST.in | ||
README.md | ||
requirements.txt | ||
setup.cfg | ||
setup.py | ||
tox.ini |
Dosage
Dosage is a comic strip downloader and archiver.
Dosage is designed to keep a local copy of specific webcomics and other picture-based content such as Picture of the Day sites. With the dosage commandline script you can get the latest strip of a webcomic, or catch-up to the last strip downloaded, or download a strip for a particular date/index (if the webcomic's site layout allows this).
Multiple webcomics can be downloaded in parallel, making the update of comic strips faster.
See http://dosage.rocks/ for more info.