
McIntyre: Scanning for assembly code in Free Software packages

On his blog, Steve McIntyre writes about work he has been doing to identify assembly code in Linux packages:

In the Linaro Enterprise Group, my task for the last several weeks was to work through a huge number of packages looking for assembly code. Why? So that we could identify code that would need porting to work well on AArch64, the new 64-bit execution state coming to the ARM world Real Soon Now.

Working with some Ubuntu and Fedora developers, we generated a list of packages included in each distribution that seemed to contain assembly code of some sort. Then I worked through that list, checking to see:

  1. if there was actually any assembly there;
  2. if so, what it was for, and
  3. whether it was actually used

That work resulted in a report with his findings.



McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 6:58 UTC (Tue) by peter-b (guest, #66996) [Link] (4 responses)

I can stop using assembly code when GCC finally starts providing a 128-bit CAS on 64-bit architectures that support it (e.g. x86-64 CMPXCHG16B).

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 13:13 UTC (Tue) by stevem (subscriber, #1512) [Link]

ACK. Have you told the gcc folks this? :-)

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 15:08 UTC (Tue) by stevenb (guest, #11536) [Link] (2 responses)

Ah, you mean GCC 4.7 and later?

/* compile with "-S -O2 -mcx16"
and look for cmpxchg16b in the output */
typedef int TItype __attribute__ ((mode (TI)));

TItype m_128;

void test(TItype x_128)
{
    m_128 = __sync_val_compare_and_swap (&m_128, x_128, m_128);
}

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 15:22 UTC (Tue) by stevenb (guest, #11536) [Link] (1 responses)

Actually, at least GCC 4.5 and GCC 4.6 also. cmpxchg16b support was implemented more than 6 years ago: http://gcc.gnu.org/r122884

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 3, 2013 17:01 UTC (Wed) by peter-b (guest, #66996) [Link]

Bizarrely, the GCC on the Ubuntu workstation provided by my university lacked CMPXCHG16B support as recently as a year ago. Fortunately, if newer versions do support it, my assembly will never get called, so it's not as if I need to change my code...

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 7:57 UTC (Tue) by justincormack (subscriber, #70439) [Link] (1 responses)

Does sound like there is a lot of poorly maintained code around. Perhaps distros are packaging too much?

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 9:37 UTC (Tue) by Company (guest, #57006) [Link]

You do not need to actively maintain things that work well.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 8:43 UTC (Tue) by jcm (subscriber, #18262) [Link] (2 responses)

(disclaimer: I wrote one of the initial tools used to do assembly scanning in Fedora and generated one of the initial lists of packages)

I've called for this to be something that is done on an ongoing basis, by a neutral third party (perhaps Linux Foundation are the umbrella organization). Not just assembly scanning to find upstreams that haven't moved over to generic functions, but overall "adult supervision" of the package set, looking for very outdated packages, assembly code that needs fixing, security issues that are lying in wait, all of that.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 10:32 UTC (Tue) by error27 (subscriber, #8346) [Link]

I understand that it's expensive and hard to justify to management. But it seems like an uncontroversial idea that anyone could do (not just neutral third parties).

Was it controversial?

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 3, 2013 11:32 UTC (Wed) by Company (guest, #57006) [Link]

Is that a good idea?

I mean, we in the GNOME community do look for and remove outdated options all the time and this "adult supervision" of nerd desktops certainly helps a lot, but for source code?

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 10:56 UTC (Tue) by ssam (guest, #46587) [Link] (18 responses)

if it's old assembly written on old CPUs, then a simple C version compiled with normal GCC optimization may well be faster on current CPUs.

programmers would probably be better off giving the compiler the hints it needs to write efficient machine code http://locklessinc.com/articles/vectorize/
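As a rough illustration of the kind of hint meant here, the sketch below uses the C99 restrict qualifier to promise the compiler that two arrays do not overlap, which is often what it needs before it will vectorize a loop (for example with GCC at -O3). The function name saxpy and its signature are made up for this example and do not come from the linked article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i].
 * "restrict" tells the compiler that x and y never alias, so it may
 * load and store several elements at once without re-checking overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Whether the loop actually gets vectorized depends on the compiler version and flags, so the generated assembly is worth inspecting before drawing conclusions.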

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 13:19 UTC (Tue) by stevem (subscriber, #1512) [Link] (17 responses)

That's exactly what I'd expect to see for a lot of this code, yes. Typically, hand-written assembly code isn't likely to win much over a decent compiler; if it does then file bugs against the compiler!

There *are* places where assembly wins, and there always will be: if you know your algorithm so well, a general-purpose compiler will struggle to match you. However, it would be good to minimise the spread of such code to enhance maintainability.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 17:32 UTC (Tue) by butlerm (subscriber, #13312) [Link] (16 responses)

There are places where the semantics of the C language are too weak to avoid assembly language. Atomic operations in particular. Performance wise, there are also major issues with the inability of C code to take advantage of the carry bit. Ideally you could write (x + y) >> 32, but on a 32 bit machine you either can't do that or it is very slow.
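For reference, the (x + y) >> 32 idiom could be written in C roughly as below, by widening to a 64-bit type first. This is only a sketch with a made-up function name (add32_carry); whether a compiler for a 32-bit target turns it into a plain add followed by a carry read is exactly the performance question raised above, and is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two 32-bit values, store the low 32 bits
 * in *sum and return the carry out (0 or 1). */
uint32_t add32_carry(uint32_t x, uint32_t y, uint32_t *sum)
{
    assert(sum != NULL);
    uint64_t wide = (uint64_t)x + y;   /* cannot overflow in 64 bits */
    *sum = (uint32_t)wide;             /* low half is the 32-bit sum */
    return (uint32_t)(wide >> 32);     /* high half is the carry */
}
```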

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 17:46 UTC (Tue) by stevem (subscriber, #1512) [Link] (11 responses)

Of course, there are a number of places where C just can't/won't do the job.

Atomics is a good example. BUT: I think there's no excuse for lots and lots of people all using assembly for locking directly in their code, with all the attendant porting and maintenance problems. The compiler should have working builtins for whatever you need here, on any platform the compiler supports. If it doesn't (or they're too slow, or whatever), then that's a bug and it's easily fixable once - not in every program out there.

A lot of the other uses of assembly are similar, from what I've seen in this study. I was shocked to see how many people were using x86 assembly for trivial bitops or byte-swapping.
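To make the byte-swapping and bitops point concrete, here is a small sketch of what the compiler already offers. The wrapper names (swap32, bits_set, highest_bit) are invented for this example; the __builtin_* functions are GCC builtins that Clang also accepts. The sketch assumes unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return __builtin_bswap32(v);
}

/* Count the bits that are set. */
unsigned bits_set(uint32_t v)
{
    return (unsigned)__builtin_popcount(v);
}

/* Index of the highest set bit. v must be nonzero because
 * __builtin_clz(0) is undefined. Assumes a 32-bit unsigned int. */
unsigned highest_bit(uint32_t v)
{
    assert(v != 0);
    return 31u - (unsigned)__builtin_clz(v);
}
```

On targets with a matching instruction the compiler can emit it directly; elsewhere it falls back to a generic sequence, so the source stays the same on every port.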

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 18:23 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (10 responses)

If you have an older package that uses assembly language for performance, it might be worth re-evaluating whether the assembly code still beats the GCC output.

If you think that you need to write assembly language because you need atomic operations, you should first read the GCC manual and learn about the __sync and __atomic builtins (these are also supported by LLVM and Intel's compiler, so you aren't locking yourself into GCC). The compiler will then choose the correct implementation for the target architecture, so your program works on ARM and Sparc even if you don't know the assembly language for those architectures.
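A minimal sketch of the __atomic builtins (available since GCC 4.7) might look like the following. The function names counter_add and store_max are hypothetical, and sequentially consistent ordering is used throughout for simplicity rather than because it is always the right choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically add n to *counter and return the previous value. */
long counter_add(long *counter, long n)
{
    assert(counter != NULL);
    return __atomic_fetch_add(counter, n, __ATOMIC_SEQ_CST);
}

/* Atomically raise *slot to candidate if candidate is larger.
 * Returns true if this call stored the new value. */
bool store_max(long *slot, long candidate)
{
    long seen = __atomic_load_n(slot, __ATOMIC_SEQ_CST);
    while (seen < candidate) {
        /* On failure the builtin refreshes "seen" with the current
         * value, so the loop re-tests against up-to-date data. */
        if (__atomic_compare_exchange_n(slot, &seen, candidate, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return true;
    }
    return false;
}
```

The same source compiles on any architecture the compiler supports, which is the portability argument being made here.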

There will still be specialized cases where assembly language might help, but it makes the program less portable (unless fallback C/C++ code is provided, and then maybe it makes sense to try to reduce the gap between the C++ and the assembly performance by improving the code or possibly by helping the GCC folks to improve the compiler by providing good bug reports).
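The fallback arrangement could be sketched like this: the assembly path is compiled only on the architecture it was written for, and every other target gets plain C. Byte swapping is used purely as a stand-in for whatever the specialized code does, and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference version: works on any architecture. */
uint32_t swap32_portable(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Uses x86 inline assembly where available, plain C elsewhere. */
uint32_t swap32(uint32_t v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(v));
    return v;
#else
    return swap32_portable(v);
#endif
}

/* Cheap consistency check a test suite could run on every target. */
void swap32_check(uint32_t v)
{
    assert(swap32(v) == swap32_portable(v));
}
```

Keeping the C version around also gives a baseline to benchmark against, which makes it easier to tell when the assembly has stopped paying for itself.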

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 19:03 UTC (Tue) by Aliasundercover (guest, #69009) [Link] (9 responses)

There is another option. If it ain't broke, don't fix it. Just because you might do something different with today's tools and hardware doesn't mean you should open up old working code spending time creating new bugs. There is time to deal with portability when doing a port.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 20:22 UTC (Tue) by justincormack (subscriber, #70439) [Link] (1 responses)

A lot of this code, judging from the report, is not working. Open source code is expected to work on new architectures and with new C compilers but clearly a lot of this code does not. Upstream does not "do a port" because Debian is or Fedora is doing a port. And if a new version of gcc breaks your code because it assumed old behaviour that "ain't broke" then it is broke. Binary versions might work, but source is what matters here.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 22:55 UTC (Tue) by robert_s (subscriber, #42402) [Link]

I wonder if this is the right place to bring up the last notable case of a distribution package maintainer (no less) "fixing" things that they didn't 100.0% understand in a package. After going for a semi-automated trawl.

(Debian & OpenSSL for those who don't remember)

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 21:15 UTC (Tue) by FranTaylor (guest, #80190) [Link] (6 responses)

Anything written in assembler is clearly "broke" with respect to portability, which is the criterion in question.

To put a finer point on it, incomprehensible code that "just works" should be put high up on the list of things to FIX, not "leave alone".

Honestly your "old saw" about "leaving things alone" is just POOR ENGINEERING PRACTICE.

---

Programs must be written for people to read, and only incidentally for machines to execute.

- H. Abelson and G. Sussman (in "Structure and Interpretation of Computer Programs")

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 21:24 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

who said that this code is "incomprehensible"?

But in any case, if you re-write incomprehensible code, you are almost guaranteed that the result is code that doesn't do the job that the original did, because you don't fully understand the problems that the code is solving.

You probably understand the more obvious problems, but the subtle problems and corner cases will bite you.

That doesn't mean that you should never re-write something, but rather that when you do so, you need to recognize that you aren't going to get it right on the first try, and you need to be sure that the value of having the new code (leaner/faster/better documented/etc) is greater than the effort to re-write the code AND then debug the code after it hits the real world (including whatever damage the bugs can do).

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 3, 2013 3:45 UTC (Wed) by rsidd (subscriber, #2582) [Link]

You are taking "incomprehensible" literally. No code is incomprehensible. But taking assembly code that takes an hour to understand, and replacing it with C code that takes 5 minutes to understand, is a win, especially if you are the maintainer (it may not be worth it if it's an obscure package and you're a distro packager).

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 22:13 UTC (Tue) by Aliasundercover (guest, #69009) [Link] (3 responses)

> Honestly your "old saw" about "leaving things alone" is just POOR ENGINEERING PRACTICE.

There is a reason why software has a reputation for mickey mouse engineering. Even the things that did once work break in the endless update churn. Other fields respect leaving working designs alone until there is a genuine need to change them and time to verify those changes are correct.

Even this field respected leaving working things alone before security paranoia set in. Now we have an endless arms race with the hackers and a new set of patches every time you look away. Only hack resistance is served while all other measures of quality suffer.

Since you liked my last old saw so much I have another for you. There is no such thing as portable software, only software that has been ported.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 2, 2013 22:24 UTC (Tue) by xbobx (subscriber, #51363) [Link] (1 responses)

> > Honestly your "old saw" about "leaving things alone" is just POOR ENGINEERING PRACTICE.

> There is a reason why software has a reputation for mickey mouse engineering.

Both are true. In mechanical or civil engineering, just because a bridge hasn't fallen over yet doesn't mean that it doesn't need to be monitored for flaws and maintained to stay up to code. Then again, a perfectly good concrete bridge doesn't need to be replaced by a fancy new suspension bridge just because suspension bridges are all the rage nowadays.

Engineering is the practice of applying judgement to decide when the current solution is sufficient and can be left alone, or needs refinement and to what extent. Doing either extreme by default is going to bite you.

QotW

Posted Apr 3, 2013 20:56 UTC (Wed) by man_ls (guest, #15091) [Link]

> Engineering is the practice of applying judgement to decide when the current solution is sufficient and can be left alone, or needs refinement and to what extent. Doing either extreme by default is going to bite you.

Good Quote of the Week, if you ask me.

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 3, 2013 8:57 UTC (Wed) by ssam (guest, #46587) [Link]

the new bugs you get in an update are because some change has unforeseen consequences. this probably happens a lot because software is complex with many interdependent parts, some of them more fragile than you would expect.

so modifying any code is potentially dangerous, and needs to be tested. translating asm to C may introduce a subtle behaviour change. but if the change is in a corner case, it's quite possible that it was doing the wrong thing in asm and no one ever noticed.

maybe the asm version is fast because it does not check for alignment, or that something is non-zero (maybe poor examples). maybe when the asm was written all the data was aligned, and x was never zero, but that assumption might not always be true.

so replacing a fragile bit of asm with a robust bit of C might be a very good thing. (not that all asm is fragile, or all C is robust. but i am sure the compiler and static analysis tools can give you much better warnings for the C).

Lack of CarryOut in C

Posted Apr 2, 2013 18:45 UTC (Tue) by jreiser (subscriber, #11027) [Link] (3 responses)

> the inability of C code to take advantage of the carry bit

Amen. However, sometimes ((unsigned)(x+y) < (unsigned)x) plus a comment is good enough (courtesy of MIPS, which has no Carry in hardware.)

That still isn't good enough for decoding a big-endian bitstream, which wants both CarryOut and Zero after ((x<<=1)|CarryIn).

Lack of CarryOut in C

Posted Apr 2, 2013 19:24 UTC (Tue) by brunowolff (guest, #71160) [Link] (1 responses)

This risks getting removed during optimization.

Lack of CarryOut in C

Posted Apr 2, 2013 20:14 UTC (Tue) by pbonzini (subscriber, #60935) [Link]

Not for ((unsigned)x+(unsigned)(y) < (unsigned)x). jreiser almost got it right.
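The comparison trick with both operands unsigned could be used in a multi-word addition roughly as follows. This is a sketch only: add_limbs is a made-up name, the least significant word is assumed to come first, and the compiler is not guaranteed to turn the comparisons into add-with-carry instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Add the n-word numbers a and b (least significant word first)
 * into r and return the final carry out (0 or 1). */
unsigned add_limbs(unsigned *r, const unsigned *a, const unsigned *b, size_t n)
{
    assert(n == 0 || (r != NULL && a != NULL && b != NULL));
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i];   /* unsigned wraparound is well defined */
        unsigned c1 = s < a[i];     /* carry out of a + b */
        unsigned t = s + carry;
        unsigned c2 = t < s;        /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```

Because every operand is unsigned, the wraparound is defined behaviour and the optimizer has no licence to assume the comparison is always false.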

Lack of CarryOut in C

Posted Apr 3, 2013 3:28 UTC (Wed) by tterribe (guest, #66972) [Link]

> ((x<<=1)|CarryIn)

Conveniently, x<<=1 can be implemented as x=(unsigned)x+(unsigned)x, which reduces this to a previously solved problem. But honestly if you're decoding a bitstream a bit at a time, there are better optimizations to be done.
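Spelled out, the reduction might look like the sketch below: the shift becomes an unsigned addition, and the carry out is recovered with the same comparison trick. The name shl1_carry is hypothetical, and the caller can test the stored result against zero to get the Zero condition as well.

```c
#include <assert.h>

/* Shift *x left by one bit, moving carry_in (0 or 1) into bit 0.
 * Returns the bit that was shifted out of the top. */
unsigned shl1_carry(unsigned *x, unsigned carry_in)
{
    assert(carry_in <= 1u);
    unsigned old = *x;
    unsigned doubled = old + old;        /* same as old << 1, modulo 2^N */
    unsigned carry_out = doubled < old;  /* wrapped only if the top bit was set */
    *x = doubled | carry_in;
    return carry_out;
}
```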

McIntyre: Scanning for assembly code in Free Software packages

Posted Apr 4, 2013 0:52 UTC (Thu) by kirkengaard (guest, #15022) [Link]

I just want to raise this gem:
Ten packages contained assembly code for Vax machines; in some cases it was clear that the code in question was first written in Vax assembly then ported to C for those weird, new-fangled Sun workstations (in the 1980s).


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds