Grinberg: Linux on an 8-bit micro?
uARM is certainly no speed demon. It takes about 2 hours to boot to bash prompt ("init=/bin/bash" kernel command line). Then 4 more hours to boot up the entire Ubuntu ("exec init" and then login). Starting X takes a lot longer. The effective emulated CPU speed is about 6.5 kHz, which is on par with what you'd expect emulating a 32-bit CPU & MMU on a measly 8-bit micro. Curiously enough, once booted, the system is somewhat usable. You can type a command and get a reply within a minute. That is to say that you can, in fact, use it. I used it today to format an SD card, for example. This is definitely not the fastest, but I think it may be the cheapest, slowest, simplest to hand assemble, lowest part count, and lowest-end Linux PC. The board is hand-soldered using wires; there is not even a requirement for a printed circuit board.
Posted Mar 29, 2012 23:31 UTC (Thu)
by lkundrak (subscriber, #43452)
[Link]
Posted Mar 30, 2012 2:06 UTC (Fri)
by jengelh (guest, #33263)
[Link]
Posted Mar 30, 2012 2:09 UTC (Fri)
by bshotts (subscriber, #2597)
[Link] (5 responses)
Posted Mar 30, 2012 3:13 UTC (Fri)
by felixfix (subscriber, #242)
[Link] (4 responses)
The earliest Unix computers were a few years earlier, but with more core and bigger disks. I'd guess roughly the same cycle time, but 36-bit words (PDP-6, -10), and I would guess 32K or 64K of that. The CDC 7600 supercomputer from 1968 had a basic instruction time of 27.5 nsec, with memory read/write about ten times that (though pipelined), and, if memory serves, some small multiple of 64K 60-bit words, i.e. maybe 1-2 MB of core. Full floating point divide was only a few cycles, 2-3? 5? Seymour Cray was a frickin genius.
Posted Mar 30, 2012 3:53 UTC (Fri)
by donbarry (guest, #10485)
[Link]
"I'd guess roughly the same cycle time, but 36-bit words (PDP-6, -10)"
According to the CDC 7600 hardware reference manual, floating divide (rounded or unrounded) on the processor took 20 minor cycles. Floating product took 5. Add/subtract were 4. Population count (58 cycles on the 6400) was down to 2 cycles on the 7600.
I cut my teeth on a Cyber 74. Ahh, those were the days.
Posted Mar 30, 2012 4:59 UTC (Fri)
by eru (subscriber, #2753)
[Link] (1 responses)
Unix development was done mostly on PDP-11 series machines, which were 16-bit with 8-bit bytes. Force-fitting C to work on those 36-bit architectures that were not byte-addressable was done much later, I believe. The C language would probably have been very different if Kernighan & Ritchie had been using the word-addressable CPUs that were common in those days.
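For illustration (my example, not eru's): C defines pointer arithmetic and sizeof in units of chars, which maps directly onto a byte-addressed machine like the PDP-11, and is exactly what a word-addressed 36-bit machine has to fake:

#include <stdio.h>

/* Stepping through an int array one char at a time: trivial with byte
   addressing, but on a word-addressed machine every char pointer must
   carry extra "which character within this word" state, which is why C
   implementations for such machines were awkward. */
int main(void)
{
    int words[4] = {1, 2, 3, 4};
    unsigned char *p = (unsigned char *)words;

    for (size_t i = 0; i < sizeof words; i++)
        printf("byte %zu = %u\n", i, p[i]);
    return 0;
}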
Posted Apr 21, 2012 1:48 UTC (Sat)
by ssavitzky (subscriber, #2855)
[Link]
LISP and FORTRAN were based on the IBM 709.
Posted Apr 6, 2012 8:01 UTC (Fri)
by Cato (guest, #7643)
[Link]
Very interesting, but isn't this still Linux on ARM? The ARM architecture just happens to be implemented with an emulator running on an 8-bit processor.
Posted Mar 30, 2012 5:07 UTC (Fri)
by eru (subscriber, #2753)
[Link] (10 responses)
I recall there was once a project to make a Unix-like operating system for 8-bit Z80-series microprocessors, called "UZI". Let's see if Google finds traces of it... Here: http://www.dougbraun.com/uzi.html
Posted Mar 30, 2012 5:29 UTC (Fri)
by scientes (guest, #83068)
[Link]
Pretty much sums up the usefulness of these types of efforts.
(not that it wouldn't be good to have the kernel so fast that such emulations could be more useful)
Posted Mar 30, 2012 5:32 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
Posted Mar 31, 2012 3:18 UTC (Sat)
by xtifr (guest, #143)
[Link] (3 responses)
(I actually got Minix running under OS/2's "virtual DOS machine" at one point, which was pretty fun. Didn't work with Linux, though--not even those early versions.)
Posted Mar 31, 2012 5:32 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Okay, okay, I'm getting off your lawn.
Posted Mar 31, 2012 9:16 UTC (Sat)
by juliank (guest, #45896)
[Link]
The parent post talked about the 8086, though, anyway.
Posted Mar 31, 2012 15:00 UTC (Sat)
by jzbiciak (guest, #5246)
[Link]
Well, ok, they differed slightly in rare cases where you were doing something tricky to expose the different prefetch depths on the BIU (bus interface unit), e.g. self-modifying code that patches a byte just ahead of the program counter: if the byte is already in the prefetch queue, the stale copy executes. As I recall, that was pretty much the only way to tell the two apart in software across clock rates and system architectures.
Posted Mar 30, 2012 9:25 UTC (Fri)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Mar 30, 2012 11:05 UTC (Fri)
by pboddie (guest, #50784)
[Link]
Wikipedia will undoubtedly list others. The project described in the article is quite an achievement in so many ways, but I'd be interested to see something cross-compiled to the architecture in question.
Posted Mar 30, 2012 12:55 UTC (Fri)
by jzbiciak (guest, #5246)
[Link]
Posted Mar 30, 2012 11:55 UTC (Fri)
by slashdot (guest, #22014)
[Link] (6 responses)
According to Atmel, the ATmega1284P runs at up to 20 MHz, so 6.5 kHz means his emulation scheme results in a roughly 3000x slowdown (20 MHz / 6.5 kHz ≈ 3100), which means his emulator utterly sucks.
Having no binary translation, only about 300 KB/s of RAM bandwidth, and limited or no cache is probably the issue (my Sandy Bridge does around 20 GB/s to RAM and 110 GB/s to L1 by comparison).
Posted Mar 30, 2012 12:26 UTC (Fri)
by slashdot (guest, #22014)
[Link] (4 responses)
So native execution is not feasible (and implementing 32-bit ops via 8-bit ops would kill code density anyway).
The optimal approach is then to compile via gcc to a virtual 32-bit no-MMU architecture designed for emulation speed (maybe there's something existing already), and then emulate that, using a software cache scheme with some of the internal RAM.
But overall, the problem is that the CPU is really utter crap (no external RAM, not 32-bit, only 20 MHz...).
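For what it's worth, here is a minimal sketch of the kind of dispatch loop such a purpose-built virtual architecture implies. The opcodes and encoding are invented for illustration; this is not an existing target:

#include <stdint.h>

/* Toy register VM: 16 virtual 32-bit registers, byte-coded instructions.
   A real design would pick an encoding that lets avr-gcc turn the switch
   into a compact jump table and keep the register file in internal SRAM. */
enum { OP_HALT, OP_LOADI, OP_ADD, OP_JNZ };

typedef struct {
    uint32_t r[16];  /* virtual 32-bit registers */
    uint32_t pc;     /* index into code[] */
} vm_t;

static void vm_run(vm_t *vm, const uint8_t *code)
{
    for (;;) {
        switch (code[vm->pc++]) {
        case OP_HALT:
            return;
        case OP_LOADI: {                  /* LOADI rd, imm8 */
            uint8_t rd = code[vm->pc++];
            vm->r[rd] = code[vm->pc++];
            break;
        }
        case OP_ADD: {                    /* ADD rd, rs */
            uint8_t rd = code[vm->pc++];
            vm->r[rd] += vm->r[code[vm->pc++]];
            break;
        }
        case OP_JNZ: {                    /* JNZ rs, target8 */
            uint8_t rs = code[vm->pc++];
            uint8_t target = code[vm->pc++];
            if (vm->r[rs])
                vm->pc = target;
            break;
        }
        }
    }
}

int main(void)
{
    /* r1 = 2; r2 = 3; r1 += r2; halt */
    const uint8_t prog[] = { OP_LOADI, 1, 2, OP_LOADI, 2, 3,
                             OP_ADD, 1, 2, OP_HALT };
    vm_t vm = {0};
    vm_run(&vm, prog);
    return (int)vm.r[1];  /* 5 */
}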
Posted Mar 30, 2012 12:52 UTC (Fri)
by jzbiciak (guest, #5246)
[Link] (3 responses)
It looks like his core memRead/memWrite functions for the DRAM are about 28/29 cycles, assuming all the instructions are single-cycle. That's with a full RAS/CAS every access, if I followed the code correctly. Here's the core part that runs with interrupts disabled (ie. the actual bus toggle):
out 0x02, r20 ;PORTA = rB
out 0x05, r24 ;PORTB = rT
sbi 0x09, 7 ;PIND = (1 << 7) ;nRAS
nop
out 0x02, r22 ;PORTA = cB
out 0x05, r19 ;PORTB = cT
sbi 0x03, 4 ;PINB = (1 << 4) ;nCAS
nop
nop
in r24, 0x06 ;r24 = PINC
nop
sbi 0x03, 4 ;PINB = (1 << 4) ;nCAS
sbi 0x09, 7 ;PIND = (1 << 7) ;nRAS
out 0x3F, r25 ;restore SREG
If you only re-RAS when crossing a row boundary, this could speed things up a bit, perhaps doubling the throughput to DRAM for sequential accesses. (By "re-RAS", I mean you can hold RAS low across multiple accesses, only taking it high when the row address changes, or when refresh kicks in.) Combine that with the dcache part that he mentioned, and RAM is no longer the bottleneck.
I see no problem driving the RAM in software. For large, cheap DRAMs on a controller like this, that's pretty much your only option; otherwise, you need glue logic to convert bus cycles from the controller into the appropriate DRAM sequencing. Like I said, you could get most of the benefit through caching. Looking at the code, he's got hooks for it there, it's just not implemented.
Overall, it looks like programmer time was optimized over execution time. For a fun hobby project like this, it's hard to argue against that. :-) Now that it's all running, execution time can be optimized if it makes sense to do so.
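To make the re-RAS idea concrete, here's a rough C-level sketch of page-mode access. This is not Grinberg's code: the helper names, the 12-bit row/column split (assuming a 16 MB part), and the sizes are all invented for illustration, and the helpers stand in for the port wiggling shown in the assembly above:

#include <stdint.h>

extern void ras_high(void);             /* close the open row (precharge) */
extern void select_row(uint16_t row);   /* drive row address, pull nRAS low */
extern void select_col(uint16_t col);   /* drive column address, pulse nCAS */
extern uint8_t data_in(void);           /* sample the DRAM data pins */

static uint16_t cur_row = 0xFFFF;  /* no row open yet; the refresh handler
                                      would also reset this to 0xFFFF */

uint8_t dram_read(uint32_t addr)
{
    uint16_t row = (addr >> 12) & 0xFFF;
    uint16_t col = addr & 0xFFF;

    if (row != cur_row) {   /* only re-RAS on a row crossing */
        ras_high();
        select_row(row);
        cur_row = row;
    }
    select_col(col);        /* CAS-only cycle within the open row */
    return data_in();
}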
Posted Mar 30, 2012 13:03 UTC (Fri)
by jzbiciak (guest, #5246)
[Link]
Raw bandwidth to the DRAM goes up, and you get faster access inside the cache. Win on both counts.
I'm sure he's fully aware of this. He points out himself that he only does really basic byte accesses to the DRAM to keep the software simple. Like I said before, it looks optimized for programmer time and "get it working", which are two very important optimization targets for a purely fun project like this. :-)
And let me be clear, it does look like a lot of fun, even if it's not at all practical. Then again, I implemented a serial terminal (complete with a subset of ANSI/VT-xxx decoding) on an Intellivision, so what do I know? (Here's me slowly typing in VIM, logged into my Linux box: https://www.youtube.com/watch?v=dG0nm2Do5Lo )
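To make the caching side concrete as well, here is a minimal sketch of a direct-mapped, write-through software cache in front of the bit-banged DRAM. Everything here (line count, line size, the dram_read/dram_write interface) is invented for illustration; the emulator has hooks for this but no implementation:

#include <stdint.h>

/* 64 lines x 16 bytes = 1 kB of cached data (plus tags), small enough
   to live in the ATmega's 16 kB internal SRAM. Write-through keeps the
   DRAM consistent without dirty-line bookkeeping. */
#define LINE_SIZE 16u
#define NUM_LINES 64u

extern uint8_t dram_read(uint32_t addr);             /* slow, bit-banged */
extern void dram_write(uint32_t addr, uint8_t val);

typedef struct {
    uint32_t tag;
    uint8_t valid;
    uint8_t data[LINE_SIZE];
} line_t;

static line_t cache[NUM_LINES];

uint8_t cached_read(uint32_t addr)
{
    uint32_t tag = addr / LINE_SIZE;
    line_t *l = &cache[tag % NUM_LINES];

    if (!l->valid || l->tag != tag) {     /* miss: fill the whole line; */
        uint32_t base = tag * LINE_SIZE;  /* a sequential fill is where */
        for (uint8_t i = 0; i < LINE_SIZE; i++)  /* page mode pays off  */
            l->data[i] = dram_read(base + i);
        l->tag = tag;
        l->valid = 1;
    }
    return l->data[addr % LINE_SIZE];
}

void cached_write(uint32_t addr, uint8_t val)
{
    uint32_t tag = addr / LINE_SIZE;
    line_t *l = &cache[tag % NUM_LINES];

    if (l->valid && l->tag == tag)        /* update on hit */
        l->data[addr % LINE_SIZE] = val;
    dram_write(addr, val);                /* write-through */
}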
Posted Mar 30, 2012 15:33 UTC (Fri)
by jthill (subscriber, #56558)
[Link] (1 responses)
Heh.
But then, my brother and I timeshared an Apple II when we were kids by swapping out page zero, hooking it up to a DECwriter? I think that was it. The serial line was software-polled, so performance was abysmal; input wasn't bad, but printing was done a line at a time. Timeslicing the serial driver was too daunting; we had the idea but never really attempted it.
I'll always remember turning away from that with a bit of regret. Slicing big chunks out of response time _always_ makes sense.
Posted Mar 30, 2012 16:30 UTC (Fri)
by jzbiciak (guest, #5246)
[Link]
"Slicing big chunks out of response time _always_ makes sense."
You won't hear me argue against that, except maybe in certain batch-oriented circumstances. That wasn't really the axis I was considering, though, with my "if it makes sense to do so" comment. I was really thinking more along the lines of "Will he spend time polishing this hack, or will he move on to the next big hack?" Is getting the ARM simulation up to 10-15 kHz more or less interesting than some other hack X that catches his fancy? Hard to say. :-)
Posted Apr 1, 2012 15:33 UTC (Sun)
by andreasb (guest, #80258)
[Link]
Ported to what? AVR? There has been gcc for AVR for ages. The emulator is in fact written in C with the Makefile using avr-gcc.
And what exactly should be compiled natively? Linux?
Note that the ATmega1284P has all of 128 kB of flash and 16 kB of SRAM, and flash is the only place it can execute code from, as is common for small microcontroller architectures.
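To be concrete about that Harvard split: with avr-gcc, constant data has to be placed in flash explicitly and read back through the avr-libc pgmspace accessors (a standard API; the table below is just an invented example):

#include <stdint.h>
#include <avr/pgmspace.h>

/* The AVR data address space is SRAM; a plain "static const" table
   would be copied to SRAM at startup. PROGMEM keeps it in flash, and
   pgm_read_byte() fetches it with the LPM instruction. */
static const uint8_t lookup[4] PROGMEM = { 0x10, 0x20, 0x40, 0x80 };

uint8_t get_entry(uint8_t i)
{
    return pgm_read_byte(&lookup[i]);
}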
Posted Mar 30, 2012 12:49 UTC (Fri)
by osma (subscriber, #6912)
[Link] (3 responses)
What next, starting up Java programs on this setup? So then you would have a real JVM running on an 8-bit CPU...
Posted Mar 30, 2012 13:45 UTC (Fri)
by paulj (subscriber, #341)
[Link] (1 responses)
Basically, it has long been proven that anything we'd recognise as a computer with memory can emulate any other such computer. The number of bits doesn't matter.
Posted Mar 30, 2012 15:46 UTC (Fri)
by salas (guest, #72871)
[Link]
With enough memory.
Posted Mar 30, 2012 17:48 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Then you could run that emulator using the JavaScript Visual 6502 emulator, at an approximate one millionfold slowdown over the original CPU.
And *then* you'd have an unusably slow Ubuntu boot process.
Posted Mar 30, 2012 14:44 UTC (Fri)
by hamjudo (guest, #363)
[Link] (1 responses)
For Release 1-April-2012: Privilege Escalation Vulnerability
Hypercalls to the emulator are not adequately restricted in Dmitry Grinberg's excellent ARM emulator. An unprivileged user or process can gain full root access.
Users are cautioned to avoid ecommerce sites where transactions take more than an hour each, as that could indicate that the site is hosted on one of these systems.
Posted Mar 30, 2012 18:33 UTC (Fri)
by man_ls (guest, #15091)
[Link]
Thanks for the advance laughs! I was wondering if your comment was a (-n intended) joke until I pushed "reply" and read the subject.