Comparison

As much as we are fond of it, urlgrabber may not be for everyone. There are several options out there for fetching files, each with its own strengths and weaknesses. If you're writing a python program, there seem to be three major options: urllib2, urlgrabber, or pycURL. Here, we discuss the tradeoffs of these tools and the types of programs that might be best served by each:

urllib2

urllib2 is a pure-python module that comes with the python distribution. It's extremely well-designed and flexible. It is, however, pretty basic. urlgrabber is actually based on urllib2.

pros

urllib2 comes with python, so you don't need to include or require any other dependencies
it's pure python, so if you need to tweak some behavior, you can usually do so pretty easily. Its great design helps here.

cons

it's pure python, so it's a little slower than pycURL
it's a relatively raw interface to the underlying protocols, so if you need to do anything other than get a file object for a remote file, you're going to be hip-deep in some not-so-simple code
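
To make the "file object for a remote file" case concrete, here is a minimal sketch of roughly what urllib2 hands you out of the box. The helper name fetch_to_file and the chunk size are our own inventions for illustration, and the import fallback is only there because urllib2 became urllib.request in Python 3.

```python
import shutil

try:
    from urllib2 import urlopen          # Python 2: the module discussed here
except ImportError:
    from urllib.request import urlopen   # Python 3 successor, same call


def fetch_to_file(url, filename, chunk_size=8192):
    """Copy the contents of url into filename and return filename.

    Hypothetical helper: no retries, no progress reporting, no resuming.
    """
    remote = urlopen(url)                # a file-like object for the remote file
    try:
        with open(filename, 'wb') as local:
            shutil.copyfileobj(remote, local, chunk_size)
    finally:
        remote.close()
    return filename
```

Anything beyond this (retries, progress meters, resuming a partial download) is code you would have to write yourself.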

good matches

urllib2 might be a good match for your application if you only need very basic url access or (at the other extreme) you need to do some very crazy protocol-level stuff. Because it is pure python, you're not likely to be limited by some bug or quirk in the implementation: you can always just subclass around it if you need to.
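
As a small illustration of subclassing around a behavior, the sketch below replaces the stock redirect handler with one that refuses to follow redirects. The class name NoRedirectHandler is made up for this example, and "never follow redirects" is just one behavior you might want to change.

```python
try:
    from urllib2 import HTTPRedirectHandler, build_opener          # Python 2
except ImportError:
    from urllib.request import HTTPRedirectHandler, build_opener   # Python 3


class NoRedirectHandler(HTTPRedirectHandler):
    """Hypothetical handler that declines every HTTP redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells the opener that we will not build a follow-up
        # request, so the 3xx response should surface as an HTTPError instead.
        return None


# Passing an instance of a subclass makes build_opener use it in place of
# the default HTTPRedirectHandler.
opener = build_opener(NoRedirectHandler())
```

Fetching through opener.open(url) instead of urlopen(url) then applies the modified behavior, with the rest of the default handler chain left untouched.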

urlgrabber

urlgrabber is also pure python, with all the associated good and bad that it brings. urlgrabber offers many features over urllib2, but with little of the risk of pycURL; because urlgrabber is pure python and actually based on urllib2, you retain all of urllib2's flexibility and power, but with a much more friendly and featureful interface.

pros

urlgrabber provides most of the common features that applications will need (unlike urllib2).
it's pure python, so it's very easy to modify behavior from top to bottom should you really need to.

cons

it's pure python, so it's slower than pycURL. In fact, it's a layer over urllib2, so it's probably the slowest of the three options. That comparison may be deceptive, though: if you implement urlgrabber-like features on top of urllib2 yourself, you'll likely be in the same boat (because that's exactly what we did).
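
To give a feel for what "implementing urlgrabber-like features" means, here is a rough sketch of one such feature, retrying a failed fetch, written directly over urllib2. This is not urlgrabber's code; read_with_retry and its parameters are names we made up, and retrying is only a stand-in for the kind of extra layer involved.

```python
import time

try:
    from urllib2 import urlopen, URLError          # Python 2
except ImportError:
    from urllib.request import urlopen             # Python 3
    from urllib.error import URLError


def read_with_retry(url, retries=3, delay=1.0, opener=urlopen):
    """Return the body of url, making up to `retries` attempts.

    Hypothetical helper: only URLError (which includes HTTPError) triggers
    another attempt; any other exception propagates immediately.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    last_error = None
    for attempt in range(retries):
        try:
            remote = opener(url)
            try:
                return remote.read()
            finally:
                remote.close()
        except URLError as err:
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay)      # wait before the next attempt
    raise last_error
```

Every wrapper like this adds a little overhead per request, which is the sense in which a hand-rolled feature layer ends up costing about what urlgrabber does.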

good matches

urlgrabber may be a good choice if you need some fancy features but want to retain the flexibility of pure python.

pycURL

pycURL is a python interface to libcurl, which is a great C library. It supports many crazy features and is very fast.

pros

it's very fast because it's almost all in C.
it supports lots of extreme features like SSL-ed FTP and gopher.

cons

it's a C library, which means you must either require it, or distribute a mixed package
it's a C library, so if it behaves in a way that doesn't work for you, you're kinda SOL. You can hope that the authors change the behavior, you can distribute your own forked version, or you can use something like urllib2 or urlgrabber for special cases.

good matches

If maximum speed or exotic features (like gopher) are essential and you're sure you won't need to make fine adjustments, then pycURL may be the way to go.