What is to replace Perl then?

Posted Dec 3, 2008 23:06 UTC (Wed) by drag (subscriber, #31333)
In reply to: What is to replace Perl then? by jwb
Parent article: On the future of Perl 5

You could be right. I don't know about the relative maturity of either language's libraries of modules.

For parsing URLs, though, it's simple. From within an ipython shell:

In [1]: from urlparse import urlparse

In [2]: url = urlparse('http://lwn.net/Articles/309375/')

In [3]: print url.hostname
lwn.net

In [4]: print url.path
/Articles/309375/

If they are specified in the string, it will also pick out the port, username, password and so on.


What is to replace Perl then?

Posted Dec 3, 2008 23:38 UTC (Wed) by jwb (guest, #15467)

That wasn't exactly what I was referring to:
Python 2.5.2 (r252:60911, Oct  5 2008, 19:24:49) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib2 import urlopen
>>> urlopen('http://user:password@google.com/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 381, in open
    response = self._open(req, data)
  File "/usr/lib/python2.5/urllib2.py", line 399, in _open
    '_open', req)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 1107, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.5/urllib2.py", line 1064, in do_open
    h = http_class(host) # will parse host:port
  File "/usr/lib/python2.5/httplib.py", line 639, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.5/httplib.py", line 651, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'password@google.com'
urllib2 can't understand a URL when the authority contains a user:password pair. The fact that there exists, quite separately, a urlparse module actually reinforces my point. The tuple returned by urlparse is completely useless anywhere else in the standard library. urlopen won't accept it. urlopen takes either a string or an instance of urllib2.Request.
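
The traceback above is consistent with the host string being split at its last colon, with everything after that colon read as a port number. A simplified re-creation of that step follows. It is a sketch, not the actual httplib source, and the name `split_hostport` is made up for illustration:

```python
def split_hostport(host, default_port=80):
    """Naive split: whatever follows the last colon is assumed to be the port.

    Hypothetical helper that mimics the failure in the traceback; it is not
    the code httplib really runs.
    """
    i = host.rfind(':')
    if i < 0:
        # No colon at all, so fall back to the default port.
        return host, default_port
    try:
        port = int(host[i + 1:])
    except ValueError:
        # 'user:password@google.com' ends up here: the text after the
        # colon is 'password@google.com', which is not a number.
        raise ValueError("nonnumeric port: '%s'" % host[i + 1:])
    return host[:i], port
```

Under this assumption a plain `host:port` string splits cleanly, while an authority carrying credentials trips the integer conversion, which matches the error message reported.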

In Perl the situation is quite satisfactory. The URI module exists and works, and works harmoniously with HTTP::Message and its descendants, which in turn work harmoniously with LWP and WWW::Mechanize and so forth.

Considering Python's age and the fact that it has developed alongside the web, you would think that Python's web support would be quite mature by now, but it isn't. Python's support for basic web operations is quite bad.

What is to replace Perl then?

Posted Dec 4, 2008 0:15 UTC (Thu) by sbergman27 (guest, #10767)

"""
That wasn't exactly what I was referring to
"""

You mean when you said: "You can do anything you want in python, except the incredibly simple things like parsing URLs."? Yeah, I can see where you might really have meant you have a trivial quibble with the syntax.

Python URL parsing 101, in case anyone is interested:

---
from urlparse import urlparse

> o = urlparse('http://user:password@google.com:8080/')
> o.scheme
'http'

> o.netloc
'user:password@google.com:8080'

> o.path
'/'

> o.username
'user'

> o.password
'password'

> o.hostname
'google.com'

> o.port
8080
---

What is to replace Perl then?

Posted Dec 4, 2008 0:17 UTC (Thu) by jwb (guest, #15467)

That's great, now what are you going to do with that tuple? You can't feed it to urllib2. urllib2 wants the URL as a string, but it can't parse all the ones that urlparse can parse. See the problem?

What is to replace Perl then?

Posted Dec 4, 2008 2:40 UTC (Thu) by drag (subscriber, #31333)

Ya. I would consider that a bug.

What is to replace Perl then?

Posted Dec 4, 2008 3:42 UTC (Thu) by sbergman27 (guest, #10767)

If so, it doesn't seem to be one that many Python users care about. I spend a lot of time in Python web development communities (Django, TurboGears) and it's not something I hear complaints about.

urllib does support this directly:

---
from urllib import urlopen
> x = urlopen('http://myuser:mypasswd@google.com/')
> len(x.read())
5856
---

There have, however, been a couple of requests on the issue tracker over the last 4 years or so to add the functionality to urllib2 as well, and no actual opposition to it. Interestingly, there was activity today from a dev saying he was implementing it, noting that it would be trivial to do.

Personally, I think it's probably a good idea, but since it's just a few lines to handle this case, it doesn't really bother me.
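
For anyone wondering what those few lines might look like, here is one possible sketch. It is written against the Python 3 module names (`urllib.parse` and `urllib.request`, the successors of `urlparse` and `urllib2`), and `request_from_url` is a made-up helper name, not anything from the standard library. It strips the credentials out of the URL and sends them as a Basic `Authorization` header instead:

```python
import base64
from urllib.parse import urlsplit, urlunsplit, unquote
from urllib.request import Request

def request_from_url(url):
    """Build a Request from a URL that may embed user:password@ credentials.

    Hypothetical helper: moves the credentials into a Basic auth header so
    that the URL handed to the opener has a plain host[:port] authority.
    """
    parts = urlsplit(url)
    if parts.username is None:
        # Nothing to strip, so pass the URL through untouched.
        return Request(url)
    # Keep everything after the last '@' as the real host[:port].
    netloc = parts.netloc.rpartition('@')[2]
    clean = urlunsplit((parts.scheme, netloc, parts.path,
                        parts.query, parts.fragment))
    creds = '%s:%s' % (unquote(parts.username), unquote(parts.password or ''))
    token = base64.b64encode(creds.encode('utf-8')).decode('ascii')
    req = Request(clean)
    req.add_header('Authorization', 'Basic ' + token)
    return req
```

The returned object can then be passed to `urlopen` in place of the original string. Note that this sends the credentials pre-emptively rather than waiting for a 401 challenge, which is an assumption about what the caller wants.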

Like I said, this seems something of a cherry-picked example to "prove" that Python's url handling is "not mature". I'm sure that Python and Ruby folks could pick more than a few cherries regarding Perl's problems if they wanted to.

What is to replace Perl then?

Posted Dec 4, 2008 1:49 UTC (Thu) by jamesh (guest, #1159)

It probably isn't very helpful to you, but the RFC for HTTP URLs doesn't actually allow for putting passwords in the URL.

The urllib2 module can perform authentication though, as described in http://docs.python.org/library/urllib2.html#examples
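
A rough sketch of that handler-based approach, using the Python 3 names (`urllib.request` took over from `urllib2`). The `example.com` URL and the credentials are placeholders, and nothing here touches the network until `opener.open()` is called:

```python
from urllib.request import (HTTPBasicAuthHandler,
                            HTTPPasswordMgrWithDefaultRealm,
                            build_opener)

# Placeholder site and credentials, for illustration only.
password_mgr = HTTPPasswordMgrWithDefaultRealm()
# A realm of None means "use these credentials whatever realm the server names".
password_mgr.add_password(None, 'http://example.com/', 'user', 'password')

# The handler consults the password manager when a server replies with 401.
opener = build_opener(HTTPBasicAuthHandler(password_mgr))

# opener.open('http://example.com/some/page') would then retry with
# Basic auth credentials if the server asks for them.
```

The credentials live in the password manager rather than in the URL, so the URL string itself stays within what the HTTP RFC allows.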

What is to replace Perl then?

Posted Dec 4, 2008 2:27 UTC (Thu) by jwb (guest, #15467)

That is interesting. I hadn't noticed before that the HTTP RFC specifies 'host' instead of 'authority'. But you're right that it's of little use: such URLs are found in the wild, and when you're building a crawler you pretty much have to handle them.
