Popcorn Linux pops up on linux-kernel
Each node in a Popcorn system is a separate Linux host sitting on the network. Popcorn itself is started by loading a kernel module that is charged with connecting the larger system together. The module reads a list of IP addresses (IPv4 only) directly from a file (/etc/popcorn/nodes by default). Each machine will make a TCP connection to every node listed ahead of itself in this file, then wait for an incoming connection from every node listed afterward. Thereafter, each node is known by an integer ID which is simply its position in the nodes file.
There is a hard-coded maximum of 62 nodes. No sort of authentication is done for incoming node connections, which might seem like a bit of a security issue; indeed, the patch set warns against running Popcorn on machines connected to the Internet. There does not seem to be any provision for nodes going up or down or being absent entirely. Comments in the patch set say that the TCP-based communication system "is intended for Popcorn testing and development purposes only", suggesting that, someday, somebody will get around to implementing something better.
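The connection rule described above (dial every node listed earlier, wait for every node listed later) can be sketched in a few lines of C. This is only an illustration of the ordering, not code from the patch set; the names `peer_action`, `action_for_peer()`, and `outgoing_connections()` are invented for the example, and node IDs are assumed to start at zero.

```c
#include <assert.h>

#define MAX_NODES 62  /* the hard-coded limit mentioned above */

/* Hypothetical: what node my_id does about peer_id while the mesh forms. */
enum peer_action { PEER_SELF, PEER_CONNECT, PEER_ACCEPT };

static enum peer_action action_for_peer(int my_id, int peer_id)
{
    assert(my_id >= 0 && my_id < MAX_NODES);
    assert(peer_id >= 0 && peer_id < MAX_NODES);

    if (peer_id == my_id)
        return PEER_SELF;
    /* Nodes listed earlier in the file are dialed; later ones dial us. */
    return peer_id < my_id ? PEER_CONNECT : PEER_ACCEPT;
}

/* Count the outgoing connections node my_id makes among nr_nodes nodes. */
static int outgoing_connections(int my_id, int nr_nodes)
{
    int count = 0;

    for (int peer = 0; peer < nr_nodes; peer++)
        if (action_for_peer(my_id, peer) == PEER_CONNECT)
            count++;
    return count;
}
```

Under this rule every pair of nodes ends up with exactly one connection between them, since only the later-listed node of the pair initiates it.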
System calls
Unsurprisingly, some new system calls are needed to allow applications to work within the larger Popcorn Linux system. To start with, an application can query the set of available nodes with this new system call:
struct popcorn_node_info {
    unsigned int status;
    int arch;
    int distance;
};

int popcorn_get_node_info(int *my_nid, struct popcorn_node_info *nodes, int len);
On return, my_nid will contain the ID of the node the caller is running on, and nodes, an array of len popcorn_node_info structs indexed by node ID, will be filled in with the details of the first len nodes (up to the number that actually exist, of course). The status field in each entry will be zero if the corresponding node is offline, one otherwise. The arch field will describe the node's architecture (POPCORN_ARCH_X86 in the current patch set, since x86 is the only supported architecture), and distance is always zero.
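As a rough illustration of how a caller might use this interface, the sketch below counts the online nodes. Since the real system call exists only in the Popcorn patch set, it is replaced here by a stand-in, `fake_popcorn_get_node_info()`, which pretends that three nodes exist and that node 1 is offline. The stand-in, its zero-on-success return convention, and the numeric value of `POPCORN_ARCH_X86` are all assumptions made for the example.

```c
#include <assert.h>

#define POPCORN_ARCH_X86 1   /* placeholder value, not taken from the patches */
#define FAKE_NR_NODES    3   /* size of the pretend cluster */

struct popcorn_node_info {
    unsigned int status;
    int arch;
    int distance;
};

/* Stand-in for the real call: fills at most len entries, returns 0. */
static int fake_popcorn_get_node_info(int *my_nid,
                                      struct popcorn_node_info *nodes, int len)
{
    static const unsigned int fake_status[FAKE_NR_NODES] = { 1, 0, 1 };

    assert(my_nid && nodes && len >= 0);
    *my_nid = 0;
    for (int i = 0; i < len && i < FAKE_NR_NODES; i++) {
        nodes[i].status = fake_status[i];
        nodes[i].arch = POPCORN_ARCH_X86;
        nodes[i].distance = 0;
    }
    return 0;
}

/* Count entries whose status is nonzero, meaning the node is online. */
static int count_online_nodes(void)
{
    /* Zeroed up front, so slots past the last real node read as offline. */
    struct popcorn_node_info nodes[8] = {0};
    int my_nid = -1;
    int online = 0;

    if (fake_popcorn_get_node_info(&my_nid, nodes, 8) != 0)
        return -1;
    for (int i = 0; i < 8; i++)
        if (nodes[i].status != 0)
            online++;
    return online;
}
```

Zeroing the array before the call matters because only the entries for nodes that actually exist are filled in.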
The system call to move the current thread to another node is:
int popcorn_migrate(int node_id, void *uregs);
where node_id identifies the node to which the thread should be moved, and uregs is, for some reason, the contents of the processor registers to be restored when the thread resumes on the new node. The passing of the processor registers separately might be an artifact of a Popcorn feature that is not part of this patch set: the ability to move threads to remote nodes with a different processor architecture. In the posted patch set, threads can only move themselves; the underlying code is written to allow other processes to force a thread to move, though.
While popcorn_migrate() looks like a general facility to move threads around, in practice it seems to be a bit more limited than that. A moved ("remote") thread retains a connection to its "origin" node; indeed, the original thread is still present on the origin node, it is just prevented from executing while the remote thread is running. A remote thread can only be moved back to the origin, so migrating a thread between two remote nodes would be a two-step operation, first moving it back to the origin, then out to the new node.
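The back-to-origin restriction can be expressed as a small planning function. The following is a sketch of the rule as described here, not of anything in the patch set; `plan_migration()` is a made-up helper that simply lists the node IDs one would pass to successive popcorn_migrate() calls.

```c
#include <assert.h>

/*
 * Hypothetical helper: fill hops[] with the node IDs to migrate to, in
 * order, and return how many hops are needed (0, 1, or 2).
 */
static int plan_migration(int origin, int current, int target, int hops[2])
{
    assert(hops);

    if (target == current)
        return 0;                   /* already there */
    if (current == origin || target == origin) {
        hops[0] = target;           /* leaving or returning to the origin */
        return 1;
    }
    hops[0] = origin;               /* remote to remote: go home first */
    hops[1] = target;
    return 2;
}
```

A thread whose origin is node 0 and which is currently on node 3 would thus need two hops, through node 0, to reach node 5.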
The current execution status of a thread can be had with the last of the new system calls in this patch set:
struct popcorn_thread_status {
    int current_nid;
    int peer_nid;
    pid_t peer_pid;
};

int popcorn_get_thread_status(struct popcorn_thread_status *status);
This call will fill in status with the current node ID for the calling thread, the other node it is connected to, and its process ID on that node. If the thread is not currently migrated, both current_nid and peer_nid will be the ID of the origin node.
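Going by that description, a thread could tell whether it has been migrated by comparing the two node IDs. The helper below is a minimal sketch under that reading; `thread_is_migrated()` is an invented name, and it operates on an already filled-in structure rather than invoking the system call itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

struct popcorn_thread_status {
    int current_nid;
    int peer_nid;
    pid_t peer_pid;
};

/*
 * Assumption drawn from the description: an unmigrated thread reports the
 * origin node in both fields, so differing IDs mean it is running remotely.
 */
static bool thread_is_migrated(const struct popcorn_thread_status *st)
{
    assert(st);
    return st->current_nid != st->peer_nid;
}
```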
Supporting remote threads
Once a thread has been moved to another node, more work must be done to keep things synchronized. For example, if the remote thread exits, the origin thread must be made to exit too. A signal sent to the origin thread must be propagated to the remote version where the work is actually being done. All of this is handled by intercepting various actions and sending messages across the inter-node connections to cause the right things to happen. Some especially complicated code appears to be making futexes work across machines.
Migrating a thread sets up the basic information it needs to run, but leaves a lot of stuff behind; in particular, almost the entirety of the thread's memory-layout information still lives on the origin node, where it might well be shared with other threads. It is not surprising that memory management is the focus of some of the most complex code in the Popcorn patch set.
The set of virtual memory areas (VMAs) describing the thread's address-space layout will be shared with any other threads running in the same process — threads that probably have not been migrated to the same target node. So, while the migrated thread needs to mirror that VMA arrangement, it has little ability to change it without coordinating with the origin node. For VMAs, this coordination is handled by actually executing almost all operations at the origin.
Thus, for example, if the migrated thread calls mmap(), that call will be intercepted and shipped back to the origin for execution. The origin node will send back a response describing the result of the operation; the migrated thread's memory layout will then be adjusted to match. Other calls, including brk() and madvise(), are handled in the same way.
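The delegation pattern can be modeled in user space. In this toy sketch, which bears no relation to the actual kernel code, a plain function call stands in for the request/response messages between nodes; the `layout` structure, `origin_mmap()`, and `remote_mmap()` are all invented for the illustration.

```c
#include <assert.h>

#define MAX_VMAS 16

struct vma { unsigned long start, len; };

struct layout {
    struct vma vmas[MAX_VMAS];
    int nr;
    unsigned long next_free;   /* only meaningful on the origin */
};

/* The origin performs the real operation and picks the address. */
static unsigned long origin_mmap(struct layout *origin, unsigned long len)
{
    unsigned long addr = origin->next_free;

    origin->vmas[origin->nr++] = (struct vma){ addr, len };
    origin->next_free += len;
    return addr;
}

/* The remote node forwards the request, then mirrors the result locally. */
static unsigned long remote_mmap(struct layout *origin, struct layout *mirror,
                                 unsigned long len)
{
    unsigned long addr;

    assert(origin && mirror && len > 0);
    if (origin->nr == MAX_VMAS || mirror->nr == MAX_VMAS)
        return 0;                         /* no room: report failure */
    addr = origin_mmap(origin, len);      /* stands in for a message exchange */
    mirror->vmas[mirror->nr++] = (struct vma){ addr, len };
    return addr;
}
```

The point of the arrangement is that the origin's view is authoritative; the remote node never chooses an address on its own.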
Pages of actual memory need to be handled a bit differently, though, or performance will suffer horribly. Popcorn implements a protocol to allow the ownership of pages to move between nodes, much like ownership of cache lines can move between processors. Read-only copies of pages can be spread across a set of nodes, but only one node can be modifying a specific page at any given time. Much of the coordination is handled, once again, by the origin node, which handles tasks like sending and receiving copies of pages, invalidating pages on remote nodes, revoking page ownership, and more.
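A single-writer, multiple-reader scheme of this sort can be illustrated with a small state model. The sketch below is a generic invalidation-based protocol written for this article, not Popcorn's implementation; the `page_state` structure and both functions are hypothetical, and node IDs are assumed to fit in a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

struct page_state {
    int owner;          /* node that most recently gained write access */
    uint64_t sharers;   /* bit n set: node n holds a valid copy */
};

/* A read fault on some node: it receives a read-only copy of the page. */
static void page_read(struct page_state *pg, int node)
{
    assert(pg && node >= 0 && node < 64);
    pg->sharers |= UINT64_C(1) << node;
}

/*
 * A write fault: every other copy is invalidated and ownership moves to
 * the writer. Returns the number of remote copies invalidated.
 */
static int page_write(struct page_state *pg, int node)
{
    uint64_t self, stale;
    int invalidated = 0;

    assert(pg && node >= 0 && node < 64);
    self = UINT64_C(1) << node;
    stale = pg->sharers & ~self;
    while (stale) {                 /* count the copies being dropped */
        stale &= stale - 1;
        invalidated++;
    }
    pg->sharers = self;
    pg->owner = node;
    return invalidated;
}
```

Each invalidation implies at least one message on the wire, which is why write sharing between nodes is so much more expensive than read sharing.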
The patch set also adds a new madvise() operation, MADV_RELEASE, which explicitly releases a remote node's ownership of a range of pages.
Will it pop soon?
There is a lot more to Popcorn Linux than what has been posted to the list so far. There is a mechanism to run multiple kernels on the same machine, for example, using a modified version of the kexec mechanism. There is a whole fault-tolerance project underway. There is a mechanism to offload low-demand virtual machines to slow (but power-efficient) embedded boards, possibly running a different processor architecture. And more; the web site is well-populated with academic papers describing various parts of the system.
Popcorn Linux seems like an interesting project, so readers unfamiliar with how kernel development works may be surprised to see that this patch set, which was posted on April 29 and has received a fair amount of attention on various Internet sites since, has not seen a single response on the mailing list. The reason for that is relatively straightforward, though: what has been posted is a pile of code, rather than a patch series that is intended for serious review and consideration. Patch 1 introduces the system calls, for example, but the structure definitions they rely on don't show up until Patch 5, and the messaging infrastructure, without which nothing works, shows up last. Your editor can attest that reading a patch series organized in this way is not a simple task; many busy kernel developers are unlikely to try.
One often hears complaints that the work done in academic settings almost never makes it into Linux; this seems paradoxical, given that the open development process behind Linux should be a natural fit for academic developers. This patch set shows where the roadblocks are, though. It represents many years' worth of work, but none of that work was directed toward creating a patch set that is ready to be considered for merging.
To get this work even considered for upstream, the Popcorn Linux developers will have to do a number of things. The patches will have to be reworked into a bisectable series, where each patch stands alone and can be considered on its own merits. The temporary messaging system will have to be replaced with something that is robust, secure, and fast. Performance benchmarks will have to be prepared, including the impact of Popcorn Linux on systems that are not using any of its features. Documentation is distressingly optional in kernel development, but a few pages of introductory material might help developers review the patches. And so on.
Doing all of that would be a lot of work, even before one gets into the code changes that are likely to become necessary during the review process. This work is expensive and is unlikely to lead to the publication of even a single thesis or academic paper. It is unsurprising that getting code of this complexity upstream tends to look unappealing to academic researchers. So that work is rarely done.
Sadly, that seems likely to be the fate of Popcorn Linux as well, unless somebody
can come up with the funding and the motivation to make it suitable for the
decidedly non-academic Linux kernel. Even if it is not merged, though,
Popcorn Linux may eventually inspire some energetic developer to adopt some
of its best ideas and get them upstream in some form. There is a lot of
interesting work to be found in this project; hopefully some of it will
eventually graduate from the academic setting and onto our systems.
Index entries for this article | |
---|---|
Kernel | Academic systems |
Kernel | Clusters |
Kernel | Popcorn Linux |
Posted May 5, 2020 13:29 UTC (Tue) by bof (subscriber, #110741)
Posted May 5, 2020 13:49 UTC (Tue) by pabs (subscriber, #43278)
Posted May 5, 2020 16:39 UTC (Tue) by ballombe (subscriber, #9523)
Posted May 5, 2020 17:15 UTC (Tue) by k3ninho (subscriber, #50375)
That's not to say that a single system image is wrong, but that we'd look to create such a thing in a different way in our present tooling.
K3n.
Posted May 5, 2020 19:59 UTC (Tue) by ballombe (subscriber, #9523)
SSI is used for HPC tasks which have a working set measured in terabytes, where traditional MPI is inadequate since it requires a copy of the working set on each node. HP will sell you SSI with 64TB of RAM.
Posted May 5, 2020 21:37 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
Posted May 5, 2020 21:50 UTC (Tue) by acarno (subscriber, #123476)
Source: I worked on this project in grad school (not on this portion of it though -- I doubt the parts I worked on will ever make it out of academia).
Posted May 5, 2020 23:40 UTC (Tue) by pabs (subscriber, #43278)
Posted May 6, 2020 10:38 UTC (Wed) by ballombe (subscriber, #9523)
The technology was killed by the cost of fast interconnect and the falling cost of RAM and multicore systems.
Posted May 5, 2020 15:58 UTC (Tue) by ldearquer (guest, #137451)
Posted May 6, 2020 5:36 UTC (Wed) by epa (subscriber, #39769)
Posted May 7, 2020 8:15 UTC (Thu) by ldearquer (guest, #137451)
Posted May 5, 2020 16:49 UTC (Tue) by iabervon (subscriber, #722)
Joking, of course, but it does seem like academic computer science has a good standard for presenting work that's lacking a good way to connect the presentation to the actual code written in doing the research.
Posted May 7, 2020 9:24 UTC (Thu) by ecree (guest, #95790)
Alternatively, get someone with deep pockets (the LF?) to spin up a new funding agency that focuses on stuff for your journal.
Posted May 7, 2020 15:02 UTC (Thu) by iabervon (subscriber, #722)
Posted May 6, 2020 12:21 UTC (Wed) by mfuzzey (subscriber, #57966)
Ignoring the added stuff in */popcorn and drivers/msg_layer (which should probably be popcorn_msg_layer) the number of lines changed is smaller than I would have expected for something like this.
Posted May 6, 2020 19:20 UTC (Wed) by Lennie (subscriber, #49641)
If you really need SSI with terabytes of RAM, this is much cheaper than buying dedicated machines from SGI or IBM. However, do not expect a similar level of performance; high-speed interconnect is costly.
They even supported automatic process migration from a CPU on one system to a CPU on another.
They also supported automatic detection and dynamic addition of nodes.
Also, Kerrighed's latest version, as stated on their website, is from 2010 and is based on kernel 2.6.30.