The leap second bug
Posted Jul 2, 2012 20:04 UTC (Mon) by clugstj (subscriber, #4020)
Are these polling loops or perhaps code that fails if something holds a mutex for more than a second?
Do these clowns seriously put such crap in production software?
Posted Jul 3, 2012 7:41 UTC (Tue) by farnz (guest, #17727)
As an example from a codebase I worked on (under VxWorks, not Linux, but the idea carries over) - you want a decoder thread to run immediately after the graphics thread has finished rendering a frame to screen, to repopulate the queue of decoded frames (if needed due to timestamps being missed by the graphics thread). Your graphics thread is set up to sync to vblank, and you know the expected frame rate; you don't want the decoder thread to stall completely if the graphics stalls due to complex rendering.
You know that in your environment, the decoder thread keeps a minimum of 5 decoded frames queued and ready to go; that's 80 milliseconds. You expect the graphics thread to wake the decoder up every 16 milliseconds, but know that if it doesn't, you can wait up to 50 milliseconds before you urgently need to catch up to ensure that when the graphics thread recovers, it has current data to display. You therefore set the timeout on the mutex sleep decoder-side to 50 milliseconds, knowing that if the graphics thread stalls for whatever reason, you will keep the queue filled with current frames.
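(A minimal sketch of the timed wait described above, not the original VxWorks code. Python's `threading.Event` stands in for the mutex/semaphore, and the names `decoder_wait`, `render_done`, `FRAME_MS`, `QUEUED_FRAMES` and `TIMEOUT_S` are made up for illustration.)

```python
import threading

FRAME_MS = 16          # expected wake-up interval from the graphics thread
QUEUED_FRAMES = 5      # minimum decoded frames kept ready: 5 * 16 = 80 ms
TIMEOUT_S = 0.050      # sub-second timeout: 50 ms, well inside the 80 ms cushion


def decoder_wait(render_done: threading.Event, timeout_s: float = TIMEOUT_S) -> str:
    """Block until the graphics thread signals, or until the timeout expires.

    Either way the decoder gets to run; the return value only says why.
    """
    signalled = render_done.wait(timeout_s)  # True if set, False on timeout
    render_done.clear()                      # re-arm for the next frame
    return "signalled" if signalled else "timed out"
```

The point of the sketch is that the timeout is a safety net rather than a polling interval: in the normal case the wait ends on the signal after about 16 ms, and the 50 ms path is only taken when the graphics thread stalls.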
Posted Jul 3, 2012 19:34 UTC (Tue) by slashdot (guest, #22014)
One would expect the decoder to read from the network (sleeping if no data is available) and decode frames, putting them in a shared queue, while discarding any that have a "too old" timestamp when a new one is queued (and/or discarding to keep the queue smaller than a fixed limit), and waking up the graphics thread on empty -> non-empty transition.
The graphics thread simply removes the oldest one from the queue (or waits for one to appear if the queue is empty), waits for vblank, and displays it.
No idea why you would want to hold mutexes for milliseconds (which is generally not ideal in any case), or sleep for any time interval, instead of just waiting for network I/O, for vblank, and for the decoding queue being non-empty.
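(A rough sketch of the queue design proposed in this comment, under assumptions: the class name `FrameQueue`, the `max_frames` and `max_age_s` parameters, and the optional `timeout` on `get` are hypothetical, and network input and vblank are left out.)

```python
import collections
import threading


class FrameQueue:
    """Shared queue of (timestamp, frame) pairs between decoder and graphics."""

    def __init__(self, max_frames, max_age_s):
        self._frames = collections.deque()
        self._cond = threading.Condition()
        self._max_frames = max_frames
        self._max_age_s = max_age_s

    def put(self, timestamp, frame):
        """Decoder side: queue a frame, discarding stale or excess ones."""
        with self._cond:
            was_empty = not self._frames
            # Discard frames that are "too old" relative to the new one.
            while self._frames and timestamp - self._frames[0][0] > self._max_age_s:
                self._frames.popleft()
            # Discard the oldest frames to keep the queue under its limit.
            while len(self._frames) >= self._max_frames:
                self._frames.popleft()
            self._frames.append((timestamp, frame))
            if was_empty:
                # Wake the graphics thread on the empty -> non-empty transition.
                self._cond.notify()

    def get(self, timeout=None):
        """Graphics side: remove the oldest frame, waiting if none is queued.

        Returns None only if a timeout was given and it expired.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._frames), timeout):
                return None
            return self._frames.popleft()
```

With `timeout=None`, `get` waits purely on the queue becoming non-empty, which is the behaviour this comment argues for; the graphics thread would then wait for vblank before displaying the frame.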
Posted Jul 4, 2012 9:20 UTC (Wed) by farnz (guest, #17727)
The device in question did encoded video stream conformance checking. We had a total of four interesting threads (the statistics gathering thread, the automatic reaction thread, the decoder thread, and the render thread) and three input feeds.
The statistics gathering thread gathers interesting information about the stream, and stores it for the automatic reaction thread and the render thread to use. This can take up to 4ms per frame per input, depending on the instantaneous complexity of the inputs (it does partial decode of video and audio to approximate some interesting measures of the video and audio), but normally takes about 1ms per input.
The automatic reaction thread applies a set of business rules against the statistics, and can trigger external systems to react to an out-of-bounds condition, plus indicates to the render thread that it should show an alarm state to any human user. This takes no more than 1ms with the most complicated rules permitted.
The render thread takes 1ms to render the statistics and any alarm and a further 4ms to update the video box.
The decode thread takes up to 4ms to decode each frame of a selected input. As decoded video is only for presentation to a user, it is considered low importance.
When you add it up, the statistics take 12ms (4ms for each of the three inputs). The reaction thread gets us to 13ms. The render thread needs 1ms if not showing video, for 14ms total, or 5ms if showing video, for 18ms total. The decode thread can add another 4ms to that (22ms total), and our deadline is 16.66ms per frame. We are roughly 5.3ms over budget for a single frame, in the worst case.
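(The worst-case budget above, spelled out as arithmetic. All figures come from the preceding paragraphs; the constant names are invented for this sketch.)

```python
INPUTS = 3
STATS_MS_PER_INPUT = 4     # worst case per frame per input
REACTION_MS = 1            # most complicated rules permitted
RENDER_STATS_MS = 1        # statistics and any alarm
RENDER_VIDEO_MS = 4        # updating the video box
DECODE_MS = 4              # one frame of the selected input
FRAME_DEADLINE_MS = 16.66  # per-frame deadline quoted in the text

stats = INPUTS * STATS_MS_PER_INPUT           # 12 ms
after_reaction = stats + REACTION_MS          # 13 ms
no_video = after_reaction + RENDER_STATS_MS   # 14 ms, fits in the deadline
with_video = no_video + RENDER_VIDEO_MS       # 18 ms, already too late
worst_case = with_video + DECODE_MS           # 22 ms

# How far the worst case overshoots a single frame time.
overrun_ms = worst_case - FRAME_DEADLINE_MS

print(f"worst case {worst_case} ms, over budget by {overrun_ms:.2f} ms")
```

Only the stats, reaction and stats-only render work fits inside one frame time; anything involving video pushes past the deadline in the worst case.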
We took this to our product managers, and were told that as long as the automatic reactions happen every frame, we would be OK if the UI was late (render and decode threads), but that they'd want to see at least 1 in every 4 frames of video. This was because the automation was expected to run 24/7 as a set-and-forget system, but the UI would be something only used by some customers at critical times, and could be slow to update.
We handled this by giving the reaction thread the highest possible priority; the stats thread is the next priority down, as it's more important to have the automated stuff happening than it is to keep the user updated (we expected most people to treat the product as set-and-forget). The decode thread is the next highest priority, as we want to complete a frame decode once we've started it, so that there is some video to display - we don't want to end up never able to decode a frame in time to display it. The render thread runs at a low priority. The mutex then permits the render thread to release the decode thread when the render thread has enough time left until its next deadline that it should be safe to decode a frame; if the render thread doesn't reach this point in 3 frame times, the decode thread will start anyway and claim the CPU (delaying the render thread, as the decode thread is higher priority, and this is all SCHED_FIFO scheduling).
Given the constraints, how would you have implemented it?
Posted Jul 5, 2012 2:35 UTC (Thu) by slashdot (guest, #22014)
This can for instance be accomplished by giving higher priority to the render thread, and having it wait for the decode thread if N frames have been output with less than K decoded video frames.
Alternatively, decide on a fixed rate and simply sequentially decode one frame and then display N stats frames with that same video frame.
The latter is less flexible but might allow tricks like decoding directly to the framebuffer and then XORing each stats overlay on and off, thus never copying decoded video.
Anyway, non-embedded software does not usually have those issues.
Posted Jul 5, 2012 5:20 UTC (Thu) by farnz (guest, #17727)
Fixed rate is out - while we only guaranteed a low frame rate, we also wanted to achieve a high frame rate if possible, as in the common case (main feed to transmitter, received transmission, low bit rate "we are sorry for any inconvenience" feed), we can meet full frame rate.
Your suggested solution has the same problem as a low timeout - in terms of system overhead, it's identical, plus it now needs extra analysis to verify that the decode thread will run often enough. The advantage of a 50ms timeout is that it's obvious that the decode thread will run once every 50ms in the worst case. In general, it's a bad idea to introduce complexity for the maintenance programmer - if it's not obvious, there's a good chance they'll miss it, make an apparently unrelated change, and now the render thread isn't kicking until you've missed 4 frames, instead of waking up after you've missed 3.
And in non-embedded systems, you get small timeouts by calculation - e.g. "wait for the result of the work I've just asked another thread to do, or for the application layer keepalive timeout to expire". If you've already done 239 seconds of work on this request, and the keepalive timer is 4 minutes, the computed time to sleep will be under a second. Adding extra application code to make the timeout sloppy (e.g. send the keepalive early if the remaining time is less than 5 seconds) is extra complexity for a rare case that isn't even needed in the absence of kernel/libc bugs (and one of the powerful points of open source is that you can fix kernel/libc bugs if they affect you, instead of having to have everyone work around them).
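(A small sketch of how a sub-second timeout falls out of that calculation. The helper names `remaining_timeout` and `wait_for_result`, and the use of `threading.Event` for "the result from another thread", are assumptions for illustration.)

```python
import threading
import time

KEEPALIVE_S = 240.0  # the 4 minute application-layer keepalive from the text


def remaining_timeout(started_at, now, keepalive_s=KEEPALIVE_S):
    """Seconds left until the keepalive is due, never negative."""
    return max(0.0, started_at + keepalive_s - now)


def wait_for_result(result_ready: threading.Event, started_at: float) -> bool:
    """Wait for the worker's result, but no longer than the keepalive allows.

    Returns True if the result arrived, False if the keepalive is now due.
    """
    timeout = remaining_timeout(started_at, time.monotonic())
    return result_ready.wait(timeout)
```

Nobody wrote "sleep for 0.6 seconds" anywhere: after a little over 239 seconds of work, `remaining_timeout` simply returns about 0.6, and that is the value handed to the timed wait.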
Posted Jul 4, 2012 2:13 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246)
Posted Jul 9, 2012 23:37 UTC (Mon) by dlang (✭ supporter ✭, #313)
Posted Jul 10, 2012 0:14 UTC (Tue) by jzbiciak (✭ supporter ✭, #5246)
I had to disable various extensions to slay the memory leaks that were driving Firefox up to the 10-12GB range on me regularly. GC would regularly insert 3+ second pauses and run up the soft-fault count. (At least, I assume that's what it was doing... I had soft-fault counts in the billions.)
Posted Jul 10, 2012 0:19 UTC (Tue) by dlang (✭ supporter ✭, #313)
This has been going on for several years, but is getting less common on recent versions (I am now running the Aurora versions everywhere, so my 'recent' may be newer than your 'current' :-)
Posted Jul 10, 2012 0:34 UTC (Tue) by jzbiciak (✭ supporter ✭, #5246)
Posted Jul 10, 2012 1:01 UTC (Tue) by dlang (✭ supporter ✭, #313)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds