Bastian Kleineidam | c246b41d64 | Code formatting. | 2013-04-13 08:00:11 +02:00
Bastian Kleineidam | 35c031ca81 | Fixed some comics. | 2013-04-11 18:27:43 +02:00
Bastian Kleineidam | 190ffcd390 | Use str() for robotparser. | 2013-04-09 19:36:00 +02:00
Bastian Kleineidam | b9dc385ff2 | Implemented voting | 2013-04-09 19:33:50 +02:00
Bastian Kleineidam | 4528281ddd | Voting part 2 | 2013-04-08 21:20:01 +02:00
Bastian Kleineidam | 781bac0ca2 | Feed text content instead of binary to robots.txt parser. | 2013-04-07 18:11:29 +02:00
Bastian Kleineidam | 0fbc005377 | A Python3 fix. | 2013-04-05 18:57:44 +02:00
Bastian Kleineidam | 97522bc5ae | Use tuples rather than lists. | 2013-04-05 18:55:19 +02:00
Bastian Kleineidam | adb31d84af | Use HTMLParser.unescape instead of rolling our own function. | 2013-04-05 18:53:19 +02:00
Bastian Kleineidam | 6aa588860d | Code cleanup | 2013-04-05 06:36:05 +02:00
Bastian Kleineidam | 460c5be689 | Add POST support to urlopen(). | 2013-04-04 18:30:02 +02:00
Bastian Kleineidam | 0054ebfe0b | Some Python3 fixes. | 2013-04-03 20:32:43 +02:00
Bastian Kleineidam | 2c0ca04882 | Fix warning for scrapers with multiple image patterns. | 2013-04-03 20:32:19 +02:00
Bastian Kleineidam | 110a67cda4 | Retry failed page content downloads (eg. timeouts). | 2013-03-25 19:49:09 +01:00
Bastian Kleineidam | 43f20270d0 | Allow a list of regular expressions for image and previous link search. | 2013-03-12 20:48:26 +01:00
Bastian Kleineidam | 88e28f3923 | Fix some comics and add language tag. | 2013-03-08 22:33:05 +01:00
Bastian Kleineidam | c13aa323d8 | Code cleanup [ci skip] | 2013-03-04 21:44:26 +01:00
Bastian Kleineidam | 41c954b309 | Another try on URL quoting. | 2013-02-23 09:08:08 +01:00
Bastian Kleineidam | d0c3492cc7 | Catch robots.txt errors. | 2013-02-21 19:48:04 +01:00
Bastian Kleineidam | be1694592e | Do not stream page content URLs. | 2013-02-18 20:38:59 +01:00
Bastian Kleineidam | 96bf9ef523 | Recognize internal server errors. | 2013-02-13 17:54:10 +01:00
Bastian Kleineidam | f16e860f1e | Only cache robots.txt URL on memoize. | 2013-02-13 17:52:07 +01:00
Bastian Kleineidam | 10f6a1caa1 | Correct path quoting. | 2013-02-12 17:55:33 +01:00
Bastian Kleineidam | 6d0fffd825 | Always use connection pooling. | 2013-02-12 17:55:13 +01:00
Bastian Kleineidam | a35c54525d | Work around a bug in python requests. | 2013-02-11 19:52:59 +01:00
Bastian Kleineidam | 14f0a6fe78 | Do not prefetch content with requests >= 1.0 | 2013-02-11 19:45:21 +01:00
Bastian Kleineidam | 67836942d8 | Simplify the fetchUrl code. | 2013-02-11 19:43:46 +01:00
Bastian Kleineidam | 1a0cd1ee6b | Print HTTP client headers. | 2013-02-07 18:28:56 +01:00
Bastian Kleineidam | 73700e66f0 | Cleanup | 2013-01-24 21:42:27 +01:00
Bastian Kleineidam | f1356a9ff8 | Fix URL norming, See issue #2. | 2013-01-23 21:16:22 +01:00
Bastian Kleineidam | 5479627d86 | Updated copyright. | 2013-01-09 22:21:19 +01:00
Bastian Kleineidam | 6a2f57b132 | Support requests module >= 1.0 | 2012-12-19 20:43:18 +01:00
Bastian Kleineidam | e5a04931d3 | Various fixes and additions. | 2012-12-12 17:41:29 +01:00
Bastian Kleineidam | 4def4b81bd | Add cookie feature. | 2012-12-08 21:30:23 +01:00
Bastian Kleineidam | faba7b0bca | Fix more comics. | 2012-12-08 00:45:18 +01:00
Bastian Kleineidam | e5d9002f09 | Fix more comics. | 2012-12-05 21:52:52 +01:00
Bastian Kleineidam | 387dff79a9 | Fix comics. | 2012-12-04 07:02:40 +01:00
Bastian Kleineidam | 45df462a47 | Fix some comics. | 2012-12-02 18:35:06 +01:00
Bastian Kleineidam | 451fd982d9 | Add comic scripts, add fixes and other stuff. | 2012-11-28 18:15:12 +01:00
Bastian Kleineidam | 0556ffd30a | Fix comics, improve tests, use python-requests. | 2012-11-26 18:44:31 +01:00
Bastian Kleineidam | d4eee7719d | Dynamic type generation helpers. | 2012-11-26 07:14:02 +01:00
Bastian Kleineidam | 958a788550 | Fix some comics. | 2012-11-21 21:57:26 +01:00
Bastian Kleineidam | 54eaadf4fc | Updated documentation and fix some comics. | 2012-11-20 18:53:53 +01:00
Bastian Kleineidam | 7e39b291dc | Fix some comics | 2012-11-14 20:23:30 +01:00
Bastian Kleineidam | b3e51ddc93 | Simplify tagre regex. | 2012-10-12 21:47:41 +02:00
Bastian Kleineidam | 9c032c9006 | Match before and after a tag. | 2012-10-12 21:11:44 +02:00
Bastian Kleineidam | da2b13822d | Remove stray print statement. | 2012-10-11 19:58:10 +02:00
Bastian Kleineidam | 78f44e9d9c | Improve URL retrieval. | 2012-10-11 19:53:10 +02:00
Bastian Kleineidam | c0ad053647 | Prevent empty URL matching. | 2012-10-11 18:16:29 +02:00
Bastian Kleineidam | 979c97901b | Fix tagre tests. | 2012-10-11 17:02:40 +02:00