Fedora and Python 2

Posted Apr 8, 2018 21:27 UTC (Sun) by mjblenner (subscriber, #53463)
In reply to: Fedora and Python 2 by smurf
Parent article: Fedora and Python 2

That example code:

>>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})

isn't really a bug. The 'c_string_symbol' is the Python-side handle to the C structure field. In Python 3 it needs to be unicode (i.e., like the Python source file), since you do something like

>>> s = t(some_dll[b'c_string_symbol']) # bytes used here to get the C function symbol
>>> s.c_string_symbol()
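
A rough sketch of the same thing without needing a DLL, assuming a plain Python callback stands in for the C function (the `answer` function and its return value are made up for illustration):

```python
import ctypes

# Prototype for a C function taking no arguments and returning uint32.
PROTO = ctypes.CFUNCTYPE(ctypes.c_uint32)

# The field name is a str: it becomes a Python attribute on the structure.
Iface = type('iface', (ctypes.Structure,), {'_fields_': [('c_string_symbol', PROTO)]})

def answer():
    # Hypothetical stand-in for the real C function.
    return 42

callback = PROTO(answer)          # keep a reference so the callback stays alive
s = Iface(callback)
result = s.c_string_symbol()      # attribute access uses the str field name
print(result)
```

The field name only ever shows up on the Python side; the bytes used to look a symbol up in a library are a separate matter.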

Fedora and Python 2

Posted Apr 8, 2018 22:41 UTC (Sun) by smurf (subscriber, #17840)

This does mean that symbols which aren't well-formed UTF8 are inaccessible. There should be a way to get around this restriction, even if the only tools that actually generate those beasts are code obfuscators (if that).

On the other hand: Python is written in UTF-8 (duh) and Python's way to access symbols is by using attributes (also duh). Requiring code to cater to corner cases that don't actually occur in the real world is a surefire recipe for code bloat but doesn't help anybody.

Fedora and Python 2

Posted Apr 8, 2018 23:00 UTC (Sun) by mjblenner (subscriber, #53463)

> This does mean that symbols which aren't well-formed UTF8 are inaccessible.

Uh, no. The bit that gets the symbols from the dll is using bytes. This bit:

some_dll[b'c_function_name']

You can't refer to it in python by the same random bytes though (why would that matter?).

Fedora and Python 2

Posted Apr 11, 2018 22:06 UTC (Wed) by togga (guest, #53103)

"You can't refer to it in python by the same random bytes though (why would that matter?)."

Because scripting is mostly about automation. It's just broken in this case to convert these symbols to another representation. This is just one example of many in this theme, since many of the libraries and third-party extensions need a string representation.

In a glue language scenario (which has been a strong side of Python), if I want to grab symbol (or blob) X from system A, handle it and pass it to system B, I do not want to have Y=f(X) as intermediate representation and then do the inverse function before passing it to B.

Simple things such as reading from a UART or a socket have become a minefield as soon as you want to use something in Python dealing with strings. This is especially annoying when developing, when things might not be that clean, nice and tidy.

This whole problem-domain is something Python3 has created. For me, Python itself lacks both in performance and multi-threading and doesn't really have anything language-wise (apart from lots of third party libraries) to compensate for this loss of productivity.

Fedora and Python 2

Posted Apr 11, 2018 23:52 UTC (Wed) by smurf (subscriber, #17840)

> This whole problem-domain is something Python3 has created.

Well, all I can say is that my experience (both with Perl5 and Python2) is rather different. Our fight with Perl5, incrementally switching our corporate code base to UTF8 compatibility, was … ugly.

Thus I'm very happy about the fact that Python3 spews a large unfriendly stack dump to my terminal when I forget to specify how an external byte stream is encoded. While it's somewhat annoying when you JUST KNOW that all your data is UTF8, or latin1, or randombytes … things change, and when it "suddenly" isn't, you get mojibake. Or worse. No thanks.
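
A small illustration of that behaviour, assuming a made-up latin1 payload that someone wrongly took to be UTF8:

```python
# Hypothetical external data: "café" encoded as latin1, not UTF8.
payload = b'caf\xe9'

try:
    payload.decode('utf-8')       # wrong assumption about the encoding
    failed = False
except UnicodeDecodeError:
    failed = True                 # Python 3 refuses instead of guessing

text = payload.decode('latin-1')  # stating the real encoding works
print(failed, text)
```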

Fedora and Python 2

Posted Apr 12, 2018 16:29 UTC (Thu) by togga (guest, #53103)

I use Python as a script language in the sense of a glue language that adapts to the world; you seem to have the ambition to change the world to adapt to Python. Python and UTF8 are not THAT good :-) For me Python is rather old, bloated and tired, and I don't even want to get started with UTF8 as some sort of universal data representation...

The latter sounds to me like utopia and an endless job for achieving nothing but explains lots of the attitudes of the Python community.

Fedora and Python 2

Posted Apr 12, 2018 1:12 UTC (Thu) by mjblenner (subscriber, #53463)

> It's just broken in this case to convert these symbols to another representation.

OK. I kind of get where you're coming from. Although I'm a bit confused. Or you're a bit confused.

ctypes is an ABI interface, so having the structure field be a different name to the function symbol is of no relevance for using python to glue various other C functions together (even when passing that structure around).

i.e. here:

> type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ...

Anyway, the easy answer is to just use python strings there. ctypes function symbol lookup converts strings to utf-8, so 99%+ of the time, this will work.

Otherwise...

Decode the symbol name to a string with errors='surrogateescape' for use in Python, and use the same error handler to encode back to the original bytes for getting the symbol out of the library.
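
Roughly like this, assuming a made-up symbol name that isn't valid UTF-8:

```python
# Hypothetical symbol name containing bytes that are not valid UTF-8.
raw = b'sym_\xff\xea\xa9'

# bytes -> str: undecodable bytes become lone surrogates instead of raising.
name = raw.decode('utf-8', errors='surrogateescape')

# str -> bytes: the same error handler restores the original bytes exactly.
restored = name.encode('utf-8', errors='surrogateescape')
print(restored == raw)
```

The str form is what you'd use on the Python side, and the restored bytes are what you'd hand to the library lookup.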

Or you could add a layer of indirection between the structure field names and the function symbols.
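
A minimal sketch of that indirection, with made-up names throughout (`SYMBOLS`, `lookup` and `read_sensor` are hypothetical, and a plain dict stands in for the loaded library so the snippet runs anywhere):

```python
# Map readable Python-side names to the raw symbol bytes in the library.
SYMBOLS = {
    'read_sensor': b'\xff\xea\xa9_read',   # hypothetical non-UTF-8 symbol
}

def lookup(library, py_name):
    """Fetch a symbol from `library` using its Python-side name."""
    return library[SYMBOLS[py_name]]

# Stand-in for a ctypes library object, which also supports lookup by key.
fake_library = {b'\xff\xea\xa9_read': lambda: 7}

func = lookup(fake_library, 'read_sensor')
print(func())
```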

Fedora and Python 2

Posted Apr 12, 2018 16:36 UTC (Thu) by togga (guest, #53103)

> "Or you're a bit confused."

I'm not confused, I've just experienced lots of issues I didn't have before Python3's software castle in the air.

> "the easy answer is to just use python strings"

Isn't this kind of bloated? These strings can come from anywhere and might not even be visible in Python code at all.

> "Decode the symbol name to a string with errors='surrogateescape' for use in python, and use the same error handler to decode back to the original bytes for getting the symbol out of the library."
> "Or you could add a layer of indirection between the structure field names and the function symbols."

You mean use Python3 and stick with tons of workarounds and issues just for its sake? Change the whole world to Python3? I value my time much more than that.

Fedora and Python 2

Posted Apr 12, 2018 23:47 UTC (Thu) by dvdeug (guest, #10998)

I'm not feeling it here. Having a text file in multiple encodings is incredibly fragile and a pain to work with. If you have to deal with external, non-ASCII symbols in your program, you're going to want to change the names to something you can work with in Python, not a random set of bits. If you're automatically generating code and don't care about the Python symbol names, then encoding them using base64 is trivial (and again, you don't care about the Python symbol names so why do you care if they're line noise or encoded line noise?)

If you're just passing something from system A to system B, you shouldn't have to change the data. But there's a fairly thin region where you can choose to not unmangle something and still expect to be able to do anything with it. Stuff not being clean, nice and tidy is all the more reason to make sure you know exactly how the data you're handling is formatted.

Fedora and Python 2

Posted Apr 15, 2018 15:13 UTC (Sun) by togga (guest, #53103)

> "then encoding them using base64 is trivial (and again, you don't care about the Python symbol names so why do you care if they're line noise or encoded line noise?)"

1. Doesn't scale. Changing representation requires one additional pass over the data. Python is already slow to begin with.
2. Accessing human readable symbols is convenient when needed by scripts, tests or debug.

Fedora and Python 2

Posted Apr 15, 2018 20:53 UTC (Sun) by dvdeug (guest, #10998)

It adds time linear in the amount of text being processed. Since it only needs to touch text being processed, and processing it already takes at least time linear in the amount of text being processed, it does not change the Big O of your operation at all. It scales by definition. I suspect even in Python it will always be trivial in the amount of time it takes, but the issue is certainly not whether it scales.

We're not talking about human readable symbols; we're talking about "non-ASCII symbols in your program" that aren't Unicode. Even if the editor mangles it for you, how is bsymFFEAA9 worse than \xff\xea\xa9? Something slightly smarter than base64 would preserve human readable names and only mangle unreadable names, but the only case where not worrying about mangling is going to cause problems in Python 3 is when it's not human readable.
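
A rough sketch of such a mangler, assuming a hex scheme with a `bsym` prefix like the name above (the `mangle` function is made up for illustration, and a real one would also need to handle a genuine symbol that happens to start with `bsym`):

```python
def mangle(raw: bytes) -> str:
    """Keep readable ASCII identifiers as-is; hex-encode everything else."""
    try:
        text = raw.decode('ascii')
    except UnicodeDecodeError:
        text = None
    if text is not None and text.isidentifier():
        return text                      # human readable, leave untouched
    return 'bsym' + raw.hex().upper()    # unreadable, mangle to hex

print(mangle(b'read_sensor'))    # stays readable
print(mangle(b'\xff\xea\xa9'))   # becomes bsymFFEAA9
```

It is one pass over each name, so the cost stays linear in the amount of text.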