Python 3, ASCII, and UTF-8

Posted Dec 21, 2017 7:42 UTC (Thu) by togga (guest, #53103)
In reply to: Python 3, ASCII, and UTF-8 by peter-b
Parent article: Python 3, ASCII, and UTF-8

The issue here, which the (somewhat smaller) "python3 community" misses, is that this "obstinacy" is measured in productivity, maintenance and development hours. I'd say it's up there with .NET interoperability and Windows "\" path/quoting issues for multiplatform scripts. Man-made problems, detached from reality, popping up out of thin air.

Python is going from "productive" to "cumbersome", measured in development hours on a day-to-day basis, and now has a hard time compensating for all its (other) weaknesses (multithreading with the GIL, performance, ...).

How much lipstick can you put on the pig? "Coercing", "C.UTF-8", "new UTF-8", "PYTHONUTF8", "-X utf8", ... My favourite is the "surrogateescape error handler".
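For anyone who hasn't met that handler, here is a minimal sketch of what it does. The file name is made up for illustration; the only thing taken from the standard library is the "surrogateescape" error handler of bytes.decode() and str.encode().

```python
# Hypothetical file name encoded as Latin-1, so it is not valid UTF-8.
raw = b"caf\xe9.txt"

# Undecodable bytes are turned into lone surrogates instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")


def strict_encode_fails(s):
    """Return True if s cannot be encoded as plain (strict) UTF-8."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

The resulting str looks like text but cannot be written out as strict UTF-8, so every consumer has to know which handler produced it.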

You can be either this or that type of script, with different settings. Best for all?
What if I need to be both, or, the most common case, something not exactly matching any of them? This mess is getting so complex that even for a pro it's almost impossible to get full control, and newbies never know what hit them when they're floored by python3 (I've helped a few...). Take for example the simple case of ctypes attributes, often read from an external source, that need to be string type. The text mess is a nasty swamp to wade through.
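A rough sketch of the ctypes case, assuming the field names arrive as bytes (say, parsed out of a binary header). The helper make_struct and the field list are hypothetical; the point is only that the names have to be decoded to str before ctypes will take them as _fields_.

```python
import ctypes


def make_struct(name, raw_fields):
    """Build a ctypes.Structure subclass from (bytes name, ctype) pairs.

    Hypothetical helper: field names are assumed to be ASCII bytes from an
    external source and are decoded to str, which ctypes expects here.
    """
    fields = [(field.decode("ascii"), ctype) for field, ctype in raw_fields]
    return type(name, (ctypes.Structure,), {"_fields_": fields})


# Field names as they might come from an external source (assumed data).
raw_fields = [(b"x", ctypes.c_int), (b"y", ctypes.c_int)]

Point = make_struct("Point", raw_fields)
p = Point(3, 4)
```

One extra decode per name does not look like much, but it is exactly the kind of step that had no equivalent in python2 scripts.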

I agree with Cyberax. Keep it simple (bytestrings) and let the user do explicit things with the data using library functions and classes (maybe include a more advanced regex) when text functionality is really needed.
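As a small illustration of that style (not a proposal for any particular API), the standard re module already works directly on bytes, so data can stay as bytes and only the piece that needs interpreting gets decoded. The input line is invented for the example.

```python
import re

# Invented input: a log line containing bytes that are not valid UTF-8.
line = b"user=\xff\xfeadmin id=42\n"

# A bytes pattern matches against bytes; nothing is decoded implicitly.
match = re.search(rb"id=(\d+)", line)

# Decode explicitly, and only the part that is known to be ASCII digits.
user_id = int(match.group(1).decode("ascii"))
```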

That said, Python2 gave me 14 pretty good years of scripting productivity (mostly attributable to numpy). With this, it's time to move on.