Commit graph

107 commits

Author | SHA1 | Message | Date
Bastian Kleineidam | 781bac0ca2 | Feed text content instead of binary to robots.txt parser. | 2013-04-07 18:11:29 +02:00
Bastian Kleineidam | 0fbc005377 | A Python3 fix. | 2013-04-05 18:57:44 +02:00
Bastian Kleineidam | 97522bc5ae | Use tuples rather than lists. | 2013-04-05 18:55:19 +02:00
Bastian Kleineidam | adb31d84af | Use HTMLParser.unescape instead of rolling our own function. | 2013-04-05 18:53:19 +02:00
Bastian Kleineidam | 6aa588860d | Code cleanup | 2013-04-05 06:36:05 +02:00
Bastian Kleineidam | 460c5be689 | Add POST support to urlopen(). | 2013-04-04 18:30:02 +02:00
Bastian Kleineidam | 0054ebfe0b | Some Python3 fixes. | 2013-04-03 20:32:43 +02:00
Bastian Kleineidam | 2c0ca04882 | Fix warning for scrapers with multiple image patterns. | 2013-04-03 20:32:19 +02:00
Bastian Kleineidam | 110a67cda4 | Retry failed page content downloads (eg. timeouts). | 2013-03-25 19:49:09 +01:00
Bastian Kleineidam | 43f20270d0 | Allow a list of regular expressions for image and previous link search. | 2013-03-12 20:48:26 +01:00
Bastian Kleineidam | 88e28f3923 | Fix some comics and add language tag. | 2013-03-08 22:33:05 +01:00
Bastian Kleineidam | c13aa323d8 | Code cleanup [ci skip] | 2013-03-04 21:44:26 +01:00
Bastian Kleineidam | 41c954b309 | Another try on URL quoting. | 2013-02-23 09:08:08 +01:00
Bastian Kleineidam | d0c3492cc7 | Catch robots.txt errors. | 2013-02-21 19:48:04 +01:00
Bastian Kleineidam | be1694592e | Do not stream page content URLs. | 2013-02-18 20:38:59 +01:00
Bastian Kleineidam | 96bf9ef523 | Recognize internal server errors. | 2013-02-13 17:54:10 +01:00
Bastian Kleineidam | f16e860f1e | Only cache robots.txt URL on memoize. | 2013-02-13 17:52:07 +01:00
Bastian Kleineidam | 10f6a1caa1 | Correct path quoting. | 2013-02-12 17:55:33 +01:00
Bastian Kleineidam | 6d0fffd825 | Always use connection pooling. | 2013-02-12 17:55:13 +01:00
Bastian Kleineidam | a35c54525d | Work around a bug in python requests. | 2013-02-11 19:52:59 +01:00
Bastian Kleineidam | 14f0a6fe78 | Do not prefetch content with requests >= 1.0 | 2013-02-11 19:45:21 +01:00
Bastian Kleineidam | 67836942d8 | Simplify the fetchUrl code. | 2013-02-11 19:43:46 +01:00
Bastian Kleineidam | 1a0cd1ee6b | Print HTTP client headers. | 2013-02-07 18:28:56 +01:00
Bastian Kleineidam | 73700e66f0 | Cleanup | 2013-01-24 21:42:27 +01:00
Bastian Kleineidam | f1356a9ff8 | Fix URL norming, See issue #2. | 2013-01-23 21:16:22 +01:00
Bastian Kleineidam | 5479627d86 | Updated copyright. | 2013-01-09 22:21:19 +01:00
Bastian Kleineidam | 6a2f57b132 | Support requests module >= 1.0 | 2012-12-19 20:43:18 +01:00
Bastian Kleineidam | e5a04931d3 | Various fixes and additions. | 2012-12-12 17:41:29 +01:00
Bastian Kleineidam | 4def4b81bd | Add cookie feature. | 2012-12-08 21:30:23 +01:00
Bastian Kleineidam | faba7b0bca | Fix more comics. | 2012-12-08 00:45:18 +01:00
Bastian Kleineidam | e5d9002f09 | Fix more comics. | 2012-12-05 21:52:52 +01:00
Bastian Kleineidam | 387dff79a9 | Fix comics. | 2012-12-04 07:02:40 +01:00
Bastian Kleineidam | 45df462a47 | Fix some comics. | 2012-12-02 18:35:06 +01:00
Bastian Kleineidam | 451fd982d9 | Add comic scripts, add fixes and other stuff. | 2012-11-28 18:15:12 +01:00
Bastian Kleineidam | 0556ffd30a | Fix comics, improve tests, use python-requests. | 2012-11-26 18:44:31 +01:00
Bastian Kleineidam | d4eee7719d | Dynamic type generation helpers. | 2012-11-26 07:14:02 +01:00
Bastian Kleineidam | 958a788550 | Fix some comics. | 2012-11-21 21:57:26 +01:00
Bastian Kleineidam | 54eaadf4fc | Updated documentation and fix some comics. | 2012-11-20 18:53:53 +01:00
Bastian Kleineidam | 7e39b291dc | Fix some comics | 2012-11-14 20:23:30 +01:00
Bastian Kleineidam | b3e51ddc93 | Simplify tagre regex. | 2012-10-12 21:47:41 +02:00
Bastian Kleineidam | 9c032c9006 | Match before and after a tag. | 2012-10-12 21:11:44 +02:00
Bastian Kleineidam | da2b13822d | Remove stray print statement. | 2012-10-11 19:58:10 +02:00
Bastian Kleineidam | 78f44e9d9c | Improve URL retrieval. | 2012-10-11 19:53:10 +02:00
Bastian Kleineidam | c0ad053647 | Prevent empty URL matching. | 2012-10-11 18:16:29 +02:00
Bastian Kleineidam | 979c97901b | Fix tagre tests. | 2012-10-11 17:02:40 +02:00
Bastian Kleineidam | 17a40d4fda | Make tagre quote configurable. | 2012-10-11 15:43:29 +02:00
Bastian Kleineidam | 9d30a7004e | Only warn about missing images. | 2012-10-11 15:17:08 +02:00
Bastian Kleineidam | c707aa893d | A lot of refactoring. | 2012-10-11 12:03:12 +02:00
Bastian Kleineidam | c1dc5892c8 | Only import colorama on windows systems. | 2012-10-01 18:01:56 +02:00
Bastian Kleineidam | a53e1f63bc | Improve console size guessing. | 2012-09-27 21:59:11 +02:00
Bastian Kleineidam | f3365f6a5e | Code cleanup. | 2012-09-27 21:24:28 +02:00
Bastian Kleineidam | 1333be7225 | HTTP improvements. | 2012-09-26 16:52:45 +02:00
Bastian Kleineidam | cc2a8df98f | Document some functions. | 2012-09-26 16:47:39 +02:00
Bastian Kleineidam | 58c4cffcc8 | Match end bracket in tagre function. | 2012-09-26 14:42:05 +02:00
Bastian Kleineidam | a17782428b | Updated copyright for all source files. | 2012-06-20 22:41:04 +02:00
Bastian Kleineidam | c9082aee42 | Improved terminal functions. | 2012-06-20 22:33:26 +02:00
Bastian Kleineidam | f91fb80a39 | Initial commit to Github. | 2012-06-20 21:58:13 +02:00