
Scheduling influences on simulation outcomes?


Posted Jun 5, 2025 10:37 UTC (Thu) by taladar (subscriber, #68407)
Parent article: The importance of free software to science

Wouldn't the simulation programs need to take some additional precautions for full reproducibility of the results, similar to what compilers do for reproducible builds or what games do for reproducible seed-based procedural generation?

I am thinking of things like not using shared PRNGs from multiple threads where the scheduling order might then give each thread different parts of its (otherwise deterministic) output depending on which thread is scheduled first.

But beyond that some auto-configuration of the program might also be a problem, e.g. detecting the RAM size or CPU core count and scaling operations by that by spawning more threads or processing larger batches at a time.



Forwards-compatibility is hard

Posted Jun 5, 2025 11:20 UTC (Thu) by farnz (subscriber, #17727) [Link]

You also get into having to think about what might be different in the future; for example, I've seen problems with an algorithm in the late 1990s that assumed x87 FPUs and was reproducible on all x86 CPUs of the era, but not on future x86 family CPUs (using SSE2 instead of x87), or on MIPS CPUs.

And then there are things like using 32-bit counters because you can't overflow them in reasonable time; this is true when things are slow enough, but as they get faster, it can become false. For example, an Ethernet packet is a minimum of 672 bit times on the wire; in 1995, a 32-bit packet counter represented over 8 hours of packets at the maximum standardised rate. However, today's maximum standardised rate (from 2024) is 800 Gbit/s, which overflows the same counter in a bit over 3.6 seconds.
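The arithmetic behind those two figures can be checked with a short sketch. The 672 bit times and the 800 Gbit/s rate come from the comment above; the 100 Mbit/s figure for 1995 is an assumption (the comment does not name the rate), and `seconds_to_overflow` is a hypothetical helper name.

```python
# Time for a 32-bit packet counter to wrap when a link carries
# back-to-back minimum-size Ethernet packets.
MIN_PACKET_BIT_TIMES = 672   # figure quoted in the comment above
COUNTER_VALUES = 2 ** 32     # distinct values of a 32-bit counter


def seconds_to_overflow(link_bits_per_second):
    """Seconds until the counter wraps at the given line rate."""
    packets_per_second = link_bits_per_second / MIN_PACKET_BIT_TIMES
    return COUNTER_VALUES / packets_per_second


# Assumption: 100 Mbit/s as the fastest standardised rate in 1995.
hours_1995 = seconds_to_overflow(100e6) / 3600
seconds_2024 = seconds_to_overflow(800e9)

print(f"1995: {hours_1995:.2f} hours, 2024: {seconds_2024:.2f} seconds")
```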

The best we can reasonably ask for is that it's possible to follow your documentation and reproduce your results - that can include documenting the hardware, OS and other details of the system you produced the result on. That way, it becomes possible for a future reimplementation of your algorithm to do the A/B comparison between your system and their new system, even if they've had to get help from a museum to build up the required hardware to reproduce your results.

Scheduling influences on simulation outcomes?

Posted Jun 5, 2025 13:01 UTC (Thu) by fenncruz (subscriber, #81417) [Link]

Yes, these are problems that can be solved if people care enough. It takes work and dedication to get it working with an existing codebase, and to keep it that way as people add new code (ask me how I know :) ).

On your last point, you actually want fixed-size batches for reproducibility, and then distribute each batch to a thread as the thread becomes free. That way you always add your floating-point numbers in the same order ((a+b)+c ≠ a+(b+c) in floating-point maths). Think about summing the elements of an array broken into one chunk per thread: naively, more threads would mean more intermediate values that need to get summed up. With fixed-size blocks it doesn't matter whether someone runs your code with 1 thread or 100; the number of intermediate values is the same. So you get the same answer when the intermediate values get added up (in block order) at the end.
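A minimal sketch of that idea, assuming a plain Python list of floats and a thread pool; `BLOCK_SIZE`, `sequential_sum` and `block_sum` are hypothetical names chosen for illustration, not anything from a real simulation code.

```python
from concurrent.futures import ThreadPoolExecutor

# Floating-point addition is not associative:
left = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6

BLOCK_SIZE = 1024  # fixed; must never be derived from the thread count


def sequential_sum(values):
    """Plain left-to-right addition, so the order is explicit."""
    total = 0.0
    for value in values:
        total += value
    return total


def block_sum(values, workers):
    """Sum fixed-size blocks in parallel, then combine in block order."""
    blocks = [values[i:i + BLOCK_SIZE]
              for i in range(0, len(values), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever thread
        # happened to finish first.
        partials = list(pool.map(sequential_sum, blocks))
    return sequential_sum(partials)


data = [1.0 / (i + 1) for i in range(10_000)]
same = block_sum(data, workers=1) == block_sum(data, workers=8)
print(left == right, same)
```

The block boundaries and the order in which partial sums are combined depend only on `BLOCK_SIZE` and the data, so the thread count changes the speed but not the bits of the result.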

On random numbers, you would need each thread to have its own stream, plus some way to initialise each block of work with its own seed (per block, not per thread, again because the number of threads might vary between users).
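One way to sketch per-block seeding, using only the Python standard library. Deriving the seed by hashing a master seed together with the block index is an assumption made for illustration, and `block_rng`, `simulate_block` and `run` are hypothetical names.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def block_rng(master_seed, block_index):
    """Private generator whose seed depends only on (master seed, block)."""
    digest = hashlib.sha256(f"{master_seed}:{block_index}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def simulate_block(master_seed, block_index, samples=1000):
    """Stand-in for real work: sum some random draws for one block."""
    rng = block_rng(master_seed, block_index)
    total = 0.0
    for _ in range(samples):
        total += rng.random()
    return total


def run(master_seed, n_blocks, workers):
    """Process all blocks on a pool; results come back in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: simulate_block(master_seed, i),
                             range(n_blocks)))


print(run(42, 16, workers=1) == run(42, 16, workers=8))
```

No generator is shared between threads, so the scheduling order cannot change which numbers a block sees.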

Scheduling influences on simulation outcomes?

Posted Jun 5, 2025 16:46 UTC (Thu) by fraetor (subscriber, #161147) [Link]

In HPC land you often have a slightly different way of thinking.

As supercomputers are expensive you are often thinking in terms of The Computer rather than a computer. Porting to a different machine is usually a significant task, and to maintain maximum performance you are often using lots of non-portable tricks that will need adjusting. Things like memory size, core count, or core binding are often configured relatively statically.

Because of this a supercomputer port is usually accompanied by an evaluation phase where subject matter experts look at the outputs and decide if it is close enough.

Once you have completed a port, Known Good Outputs (KGOs) are very commonly used within that single machine to ensure that any output change, even to the most insignificant of bits, can be explained.
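A bit-for-bit KGO check could look something like the following sketch. It assumes the output is a flat sequence of doubles and that the KGO is stored as a SHA-256 digest; both are illustrative assumptions rather than how any particular HPC site does it, and `output_digest` and `matches_kgo` are hypothetical names.

```python
import hashlib
import struct


def output_digest(values):
    """Hash the exact IEEE 754 bit patterns of the output values."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def matches_kgo(values, kgo_digest):
    """True only if every bit of the output matches the known good run."""
    return output_digest(values) == kgo_digest


kgo = output_digest([0.1 + 0.2, 1.0])       # recorded from a trusted run
print(matches_kgo([0.1 + 0.2, 1.0], kgo))   # identical bits
print(matches_kgo([0.3, 1.0], kgo))         # differs in the last bit
```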


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds