
Grinberg: Linux on an 8-bit micro?

Posted Mar 30, 2012 12:26 UTC (Fri) by slashdot (guest, #22014)
In reply to: Grinberg: Linux on an 8-bit micro? by slashdot
Parent article: Grinberg: Linux on an 8-bit micro?

OK, he actually has a bigger problem: he's driving the external RAM in software!

So native execution is not feasible (and implementing 32-bit ops via 8-bit ops would kill code density anyway).

The optimal approach, then, is to compile via gcc for a virtual 32-bit no-MMU architecture designed for emulation speed (perhaps one already exists), and then emulate that, using a software cache scheme backed by some of the internal RAM.

But overall, the problem is that the CPU is really utter crap (no external RAM interface, not 32-bit, only 20 MHz...).



Grinberg: Linux on an 8-bit micro?

Posted Mar 30, 2012 12:52 UTC (Fri) by jzbiciak (guest, #5246)

It looks like his core memRead/memWrite functions for the DRAM take about 28/29 cycles, assuming all the instructions are single-cycle. That's with a full RAS/CAS cycle on every access, if I followed the code correctly. Here's the core part that runs with interrupts disabled (i.e., the actual bus toggling):

	out 0x02, r20	;PORTA = rB
	out 0x05, r24	;PORTB = rT
	sbi 0x09, 7	;PIND = (1 << 7)	;nRAS
	nop

	out 0x02, r22	;PORTA = cB
	out 0x05, r19	;PORTB = cT
	
	sbi 0x03, 4	;PINB = (1 << 4)	;nCAS
	
	nop
	nop
	
	in r24, 0x06	;r24 = PINC
	nop
	
	sbi 0x03, 4	;PINB = (1 << 4)	;nCAS
	sbi 0x09, 7	;PIND = (1 << 7)	;nRAS
	
	out 0x3F, r25	;restore SREG

If you only re-RAS when crossing a row boundary, this could speed up a bit, perhaps doubling the throughput to DRAM for sequential accesses. Combine that with the dcache part that he mentioned, and RAM is no longer the bottleneck. (By "re-RAS", I mean you can hold RAS low across multiple accesses, only taking it high when the row address changes, or when refresh kicks in.)

I see no problem driving the RAM in software. For large, cheap DRAMs on a controller like this, that's pretty much your only option. Otherwise, you need glue logic to convert bus cycles from the controller into the appropriate DRAM sequencing. Like I said, you could get most of the benefit through caching, though.

Looking at the code, he's already got hooks for it; it's just not implemented.

It also looks like programmer time was optimized over execution time. For a fun hobby project like this, that's hard to argue against. :-) Now that it's all running, execution time can be optimized if it makes sense to do so.

Grinberg: Linux on an 8-bit micro?

Posted Mar 30, 2012 13:03 UTC (Fri) by jzbiciak (guest, #5246)

Actually, if he wrote a "readRow" and "writeRow" that mated to an on-chip software cache, it's conceivable that the memory bandwidth would more than double. The part of the code I quoted above is about half of the memory read function. Once you've done all the row decode, if you know you're reading the whole row, you can loop through the columns fairly quickly (subject to the timing requirements of the DRAM).

Raw bandwidth to the DRAM goes up, and you get faster access inside the cache. Win on both counts.

I'm sure he's fully aware of this. He points out himself that he only does really basic byte accesses to the DRAM to keep the software simple. Like I said before, it looks optimized for programmer time and "get it working", which are two very important optimization targets for a purely fun project like this. :-)

And let me be clear, it does look like a lot of fun, even if it's not at all practical. Then again, I implemented a serial terminal (complete with a subset of ANSI/VT-xxx decoding) on an Intellivision, so what do I know? (Here's me slowly typing in VIM, logged into my Linux box: https://www.youtube.com/watch?v=dG0nm2Do5Lo )

Grinberg: Linux on an 8-bit micro?

Posted Mar 30, 2012 15:33 UTC (Fri) by jthill (subscriber, #56558)

> if it makes sense

Heh.

But then, my brother and I timeshared an Apple II by swapping out page zero when we were kids, hooking it up to a DECwriter, I think it was. The serial line was software-polled, so performance was abysmal: input wasn't bad, but printing was done a line at a time. Timeslicing the serial driver was too daunting; we had the idea but never really attempted it.

I'll always remember turning away from that with a bit of regret. Slicing big chunks out of response time _always_ makes sense.

Grinberg: Linux on an 8-bit micro?

Posted Mar 30, 2012 16:30 UTC (Fri) by jzbiciak (guest, #5246)

> Slicing big chunks out of response time _always_ makes sense.

You won't hear me argue against that, except maybe in certain batch-oriented circumstances. That wasn't really the axis I was considering, though, with my "if it makes sense to do so" comment.

I was really thinking more along the lines of "Will he spend time polishing this hack, or will he move on to the next big hack?" Is getting the ARM simulation up to 10-15 kHz more or less interesting than some other hack X that catches his fancy? Hard to say. :-)


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds