Date:	Sun, 27 Feb 2000 16:04:13 +0100 (CET)
From:	Ingo Molnar <mingo@chiara.csoma.elte.hu>
To:	Andrea Arcangeli <andrea@suse.de>
Subject: new IRQ scalability changes in 2.3.48

On Sun, 27 Feb 2000, Andrea Arcangeli wrote:

> I ported the SMP irq affinity code and the per-irq-desc locking to alpha
> (plus the ->end semantic change). [...]

here is a summary of all the IA32 IRQ scalability changes which were added
as of 2.3.48, so that other architectures can make sense of these changes
and potentially adopt them:

	- per-IRQ-source spinlocks and per-IRQ-controller spinlocks
	  increasing scalability: now two IRQ handlers on two CPUs
	  can run do_IRQ in parallel. Note that level-triggered PCI IRQ
	  handlers never actually take the IRQ-controller spinlock in the
	  'IRQ handling fast path'.

	- got rid of the global_irq_count shared variable; it was
	  cache-pingponging like hell under multi-CPU interrupt
	  load. The irqs_running() function does it all now, so cli()/sti()
	  got a bit slower, but it's worth it. Otherwise the change is
	  meant to leave behavior unchanged.

	- Reworked (level-triggered) IO-APIC IRQ handlers to never touch
	  the IO-APIC registers and keep the interrupt unacked in the
	  local APIC while the handler is running. This sped up
	  'null IRQ latency' considerably, works better with hardware
	  features like focus-CPU, and improves IRQ atomicity. The
	  'legacy' edge-triggered IO-APIC IRQ sources still need the
	  slower method to work reliably.

	- per-CPU IRQ statistics, for better cache behavior

	- explicit IRQ affinity (to a group of CPUs) can be set through
	  /proc/irq/*. Extended the IRQ controller function template with
	  ->set_affinity(). See Documentation/IRQ-affinity.txt for more.

	- added /proc/irq/prof_cpu_mask, to enable profiling on a single
	  CPU only. (useful to determine the true idleness of a CPU, and
	  other interesting things when using CPU-affine IRQs.)

	- the irq_handler->end() semantics had to be changed slightly to
	  allow the fastest possible IO-APIC IRQ handling on x86.
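To make the per-IRQ locking and per-CPU statistics items concrete, here is a minimal user-space sketch in C. It is not the 2.3.48 code: each IRQ descriptor carries its own lock, so two CPUs handling different IRQs never touch the same lock, and interrupt counts live in per-CPU rows so the hot path only writes CPU-local memory. NR_CPUS, NR_IRQS, the 64-byte cache-line size and every *_sketch name are hypothetical; a C11 atomic test-and-set loop stands in for the kernel's spinlock_t, and the IRQ-controller lock is left out.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4   /* hypothetical machine size */
#define NR_IRQS 16  /* hypothetical number of IRQ sources */

/* Stand-in for spinlock_t: a C11 atomic test-and-set lock. */
typedef struct { atomic_int locked; } spinlock_sketch_t;

static void spin_lock_sketch(spinlock_sketch_t *l)
{
    /* keep swapping in 1 until the previous value was 0 (unlocked) */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;
}

static void spin_unlock_sketch(spinlock_sketch_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Per-IRQ-source descriptor: the lock protects only this IRQ's state. */
struct irq_desc_sketch {
    spinlock_sketch_t lock;
    unsigned int status;
};

/* Per-CPU statistics row, aligned to an assumed 64-byte cache line so
 * that CPUs counting interrupts never write to the same line. */
struct irq_cpustat_sketch {
    _Alignas(64) unsigned int count[NR_IRQS];
};

static struct irq_desc_sketch irq_desc[NR_IRQS];
static struct irq_cpustat_sketch irq_stat[NR_CPUS];

static void do_irq_sketch(int cpu, int irq)
{
    assert(cpu >= 0 && cpu < NR_CPUS && irq >= 0 && irq < NR_IRQS);
    irq_stat[cpu].count[irq]++;            /* CPU-local write, no lock */
    spin_lock_sketch(&irq_desc[irq].lock); /* only this IRQ's lock */
    irq_desc[irq].status ^= 1u;            /* placeholder for handler work */
    spin_unlock_sketch(&irq_desc[irq].lock);
}

/* Slow, rare path (e.g. reading /proc/interrupts): sum the per-CPU rows.
 * Concurrent readers may see slightly stale values, which is fine for stats. */
static unsigned int irq_total(int irq)
{
    unsigned int sum = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += irq_stat[cpu].count[irq];
    return sum;
}
```

The trade-off is the one described above: the frequent operation (counting an interrupt) stays CPU-local, and the infrequent one (reporting a total) pays for a loop over all CPUs.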
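The global_irq_count item swaps one shared counter for per-CPU counters that irqs_running() scans. A rough single-threaded model of that idea follows, assuming one counter per CPU that only its owner writes; the names, NR_CPUS and the 64-byte alignment are hypothetical, and real SMP code would also need the appropriate memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical */

/* One "currently in an IRQ handler" counter per CPU, each on its own
 * (assumed 64-byte) cache line. Only the owning CPU ever writes it. */
struct local_irq_sketch {
    _Alignas(64) unsigned int count;
};

static struct local_irq_sketch local_irq[NR_CPUS];

static void irq_enter_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    local_irq[cpu].count++; /* no shared cache line is written */
}

static void irq_exit_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS && local_irq[cpu].count > 0);
    local_irq[cpu].count--;
}

/* Instead of reading one global counter, scan every CPU. This makes the
 * rare global cli() path O(NR_CPUS), but keeps the hot IRQ entry/exit
 * path free of writes to shared memory. */
static bool irqs_running_sketch(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (local_irq[cpu].count)
            return true;
    return false;
}
```

This is why cli()/sti() get slower while interrupt handling under multi-CPU load stops bouncing a cache line around.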
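Both the per-IRQ affinity setting and /proc/irq/prof_cpu_mask are CPU bitmasks. The sketch below assumes the /proc/irq/* entries take a hexadecimal mask (so 3 selects CPUs 0 and 1); the parsing rules, the profiling hook and all names are hypothetical illustrations, not the kernel's ->set_affinity() or profiling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4 /* hypothetical */

/* Parse a hex CPU mask such as "3\n". Rejects empty or garbage input and
 * masks that name no existing CPU (an IRQ must be deliverable somewhere). */
static bool parse_cpu_mask(const char *text, unsigned long *mask_out)
{
    char *end;
    unsigned long mask = strtoul(text, &end, 16);

    if (end == text)
        return false;                 /* no hex digits at all */
    while (*end == '\n' || *end == ' ')
        end++;                        /* tolerate echo's trailing newline */
    if (*end != '\0')
        return false;
    mask &= (1UL << NR_CPUS) - 1;     /* drop bits for nonexistent CPUs */
    if (mask == 0)
        return false;
    *mask_out = mask;
    return true;
}

static bool cpu_in_mask(unsigned long mask, int cpu)
{
    return (mask >> cpu) & 1UL;
}

/* Profiling hook: only record timer-tick samples on CPUs in the mask. */
static unsigned long prof_cpu_mask = ~0UL; /* default: every CPU */
static unsigned long prof_samples[NR_CPUS];

static void profile_tick_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!cpu_in_mask(prof_cpu_mask, cpu))
        return;
    prof_samples[cpu]++;
}
```

Restricting profiling to one CPU this way is what makes it possible to see how idle that CPU really is when IRQs are bound to other CPUs.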

architectures that are currently using (a hw-adapted version of) the IA32
IRQ architecture are: Alpha, IA64, SH and ARM.

> I checked it works fine here. The sys_dp264 is the only port that
> actively uses SMP irq affinity (because it's the only one capable
> of SMP irq scaling), and so it's also the only one that currently needs
> lowlevel controller locking. There are also a few common code changes
> (the irq_stat is useless on alpha; alpha has a better cpu_data
> smp struct where all the per-cpu things get allocated). There are a
> few IA32 irq.c cleanups for some 64-bit issues. [...]

yep. In 2.5 the IA32 irq.c will probably be moved into kernel/irq.c, so
it's important to keep it 64-bit clean. Since there are 11 different
architectures in the main tree now (and 2-3 not-yet-integrated ones), this
definitely cannot happen now, but it will be very important to do in 2.5.

Manfred Spraul does have some ideas/patches wrt. per-CPU data structures -
i believe these concepts have to be unified in 2.5 as well (together with
the unification of the irq code). Sparc64 has had these per-CPU data
structures for ages.

a related 'SMP-scalability' note: i've implemented a new type of
read-write spinlock which does not cause cacheline pingpong in the read
path (and is thus extremely scalable and cache-friendly). David Miller
added his own ideas and ported it to Sparc; this should show up soon.
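The general idea behind a read-write lock without read-side cacheline pingpong can be sketched as follows. This is not Ingo's or David Miller's implementation; it is a minimal C11 model assuming per-CPU reader counters, each on its own 64-byte line, plus a single writer flag. Readers write only their own CPU's counter and merely read the writer flag, which can stay cached on every CPU; writers pay the cost by raising the flag and waiting for every CPU's counter to drain. All names and NR_CPUS are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4 /* hypothetical */

struct br_cpu_sketch {
    _Alignas(64) atomic_int readers; /* written only by readers on this CPU */
};

struct brlock_sketch {
    struct br_cpu_sketch cpu[NR_CPUS];
    _Alignas(64) atomic_int writer;  /* read by readers, written by writers */
};

static void br_read_lock_sketch(struct brlock_sketch *br, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    for (;;) {
        atomic_fetch_add(&br->cpu[cpu].readers, 1); /* announce (seq_cst) */
        if (!atomic_load(&br->writer))
            return;                                 /* no writer: proceed */
        atomic_fetch_sub(&br->cpu[cpu].readers, 1); /* back off */
        while (atomic_load(&br->writer))
            ;                                       /* wait for the writer */
    }
}

static void br_read_unlock_sketch(struct brlock_sketch *br, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub(&br->cpu[cpu].readers, 1);
}

static void br_write_lock_sketch(struct brlock_sketch *br)
{
    while (atomic_exchange(&br->writer, 1))
        ;                                /* exclude other writers */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        while (atomic_load(&br->cpu[cpu].readers))
            ;                            /* wait until this CPU's readers drain */
}

static void br_write_unlock_sketch(struct brlock_sketch *br)
{
    atomic_store(&br->writer, 0);
}
```

Because readers increment before checking the flag and the writer sets the flag before checking the counters, sequentially consistent atomics guarantee that at least one side sees the other, so a reader and a writer can never both proceed. Nested readers on the same CPU simply bump the counter again.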

