That's interesting; I hadn't noticed before that the HTTP RFC specifies 'host' rather than 'authority'. But you're right that it's of little use, because such URLs turn up in the wild anyway, and when you're building a crawler you pretty much have to handle them.
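For illustration only: RFC 3986 defines the authority as `[ userinfo "@" ] host [ ":" port ]`, so the piece a grammar built on `host [ ":" port ]` leaves out is the userinfo (e.g. `user:pass@`). Below is a minimal Python sketch, assuming the crawler just wants to detect that userinfo and drop it before fetching or deduplicating. `split_authority` and `strip_userinfo` are hypothetical helper names built on the standard library's `urllib.parse.urlsplit`, not anything from the RFC.

```python
from urllib.parse import urlsplit, urlunsplit


def split_authority(url: str):
    """Return (username, password, hostname, port) from a URL's authority.

    Note: urlsplit lowercases the hostname, and .port raises ValueError
    if the port is not a valid number.
    """
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname, parts.port


def strip_userinfo(url: str) -> str:
    """Rebuild the URL with any userinfo removed from the authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals come back without brackets; restore them.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    url = "http://user:pass@Example.com:8080/path?q=1#frag"
    print(split_authority(url))  # ('user', 'pass', 'example.com', 8080)
    print(strip_userinfo(url))   # http://example.com:8080/path?q=1#frag
```

Whether to strip the credentials, keep them for the request, or skip such URLs entirely is a policy choice for the crawler; the sketch only shows where the userinfo sits.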