Processor Groups Linux Kernel Patch
(Asymmetric Multiprocessing for Linux)
THIS IS WAAAY OUTDATED! It's basically just task and IRQ processor
affinity, which can now be handled by schedutils and careful
/proc/irq/#/smp_affinity handling. I'm just keeping this here for historical
reasons. -- john.c (10/16/2005)
This was one of my first forays into seeing what could be done with an OS
scheduler to take advantage of SMP and NUMA multiprocessor environments to
help speed up numerical computations. It is an ugly, ugly hack that was
more of a proof of concept. It did work, but the benefits gained are very
small, especially on an SMP machine. I made this patch around the
2.4.0-test days of the Linux kernel. It allowed an administrator to
specify, at compile time, a certain number of processors in a
multiprocessor system to be "just" application processors (i.e., never be
tied down with OS tasks). As a side effect, it also allowed a user to tie
a process to run only on a particular CPU. Most of the work for
this has (had) already been done by SGI (and others). In essence, it
would allow users of an SMP machine to use asymmetric multiprocessing.
Benefits of assigning a process to a specific processor
If a process is assigned to only one processor, then it will never have to
're-prime' the cache because it's been switched to running on a new
processor (and a new cache). For CPU/Memory intensive tasks, this
can lead to a small increase in performance. On a multiprocessor system, we
also have the luxury of modifying the scheduler to allow the task to run
uninterrupted on that processor if we want, because normal machine activity
(interrupts, other processes, login shells, etc.) will continue to be
handled by other processors. This also maximises the application's use of
the cache, since we guarantee that nothing, neither other processes nor
the expiry of a scheduling quantum, will interrupt the program.
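For readers who want to try the "one process, one processor" idea on a current kernel, here is a minimal sketch. It does not use this patch's interface. It assumes a Linux system where Python's os.sched_setaffinity is available, which wraps the standard affinity system call. The helper name pin_to_cpu is hypothetical. Note that this only restricts where the process may run; it does not keep other tasks off that CPU.

```python
import os

def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the calling process) to a single CPU.

    Returns the affinity set that was in effect before the change,
    so the caller can restore it later.
    """
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {cpu})
    return previous

if __name__ == "__main__":
    # Pick the lowest-numbered CPU this process is already allowed to use.
    target = min(os.sched_getaffinity(0))
    old = pin_to_cpu(target)
    print("now restricted to:", sorted(os.sched_getaffinity(0)))
    os.sched_setaffinity(0, old)  # restore the original affinity
```

Returning the previous affinity set makes it easy to undo the pinning once the CPU-bound part of a job has finished.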
The Problems
- The performance increases for all of these things -only- appear for
CPU/Memory bound tasks, which is a very small number of applications. Any
I/O waiting will mean that the application processor sits idle while it
could be doing other, useful work.
- On an SMP machine, you are still forced to share the CPU bus and
memory bus bandwidth with other processors in the system, so while you get
the benefits of your own processor and cache, you still only get 'your
share' of the available bandwidth. On a NUMA system, this is a different
story, and this patch could be much more useful on a NUMA system where you
aren't necessarily limited by such sharing. Unfortunately, I don't have a
NUMA machine to play with to test out this theory :). Anyone wishing to
donate one is more than welcome to contact me.
- There's a school of thought, subscribed to by Linus and most other
Linux developers out there, that this case would come about naturally if
Linux's scheduler were perfect, and thus we should work on modifying the
scheduler to be better rather than coming up with ugly hacks like this. In
general, I agree. But I also think there's nothing wrong with performance
being your top goal, and in the interim using hacks such as this to help get
your project done.
The Patch
I submitted this patch to the SGI LinuxScalability list. It generated
a little discussion, but no one seemed that interested in general, so I lost
interest for now. And then IBM came out with their Linux Scalability
Project and no one ever really posted to the SGI list but me and one guy
from SGI. I guess everything happens on IBM's list now; I don't have much
time to keep track.
This patch consists of a few parts. First, the kernel patch, which is
against kernel 2.4.0-test6 (I believe). Don't expect it to work with later
kernels; I've never tried, and I don't even have my old dual-processor PII
machine to test on anymore. But there should be enough there to figure out
what I was trying to do. That is available here: procgroup-2.4.0-test6.diff
Second, once you've booted your kernel, you can optionally stop interrupts
from being routed to your application processors. To do so, go into
/proc/irq/#/smp_affinity for each interrupt and write the bitmask of CPUs
that the interrupt is allowed to be processed on. Set the bit to 1 for each
OS CPU and to 0 for each application CPU.
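The bitmask described above can be worked out by hand, but a short sketch may make the bit layout clearer. It rests on assumptions not stated on this page: bit i of the mask stands for CPU i, and the mask is written as hexadecimal text, which is the format the mainline per-IRQ affinity files accept. The function names os_cpu_mask and write_irq_mask are hypothetical, and the path of the affinity file is left to the caller.

```python
def os_cpu_mask(total_cpus, app_cpus):
    """Return a bitmask with bit i set if CPU i is an OS CPU.

    CPUs listed in `app_cpus` are application processors and get a 0 bit,
    so interrupts using this mask are never routed to them.
    """
    app = set(app_cpus)
    if any(c < 0 or c >= total_cpus for c in app):
        raise ValueError("application CPU index out of range")
    mask = 0
    for cpu in range(total_cpus):
        if cpu not in app:
            mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU must remain an OS CPU")
    return mask

def write_irq_mask(path, mask):
    """Write the mask as hexadecimal text to an affinity file (needs root)."""
    with open(path, "w") as f:
        f.write(f"{mask:x}\n")

if __name__ == "__main__":
    # 4 CPUs, with CPUs 2 and 3 reserved as application processors:
    # bits 0 and 1 stay set, giving binary 0011.
    print(f"{os_cpu_mask(4, [2, 3]):x}")  # prints 3
```

On a dual-processor machine with CPU 1 reserved for applications, the same function gives a mask of 1, meaning every interrupt stays on CPU 0.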
Finally, you need the "assign2proc" program, which is available in source
form here. It's ugly, it's a gaping security hole, but it does work. It's
been so long, I don't remember the syntax. Use the source, Luke. I'm not
going to touch it again unless there's some interest generated (by me or
someone else). Here's the source for assign2proc: assign2proc.c
Conclusions
After getting this hack up and running I ran a few tests and was wholly
unimpressed with the minuscule speedups gained. And since no one else seemed
interested in the hack, I've given up on it. Future work should be
concentrated on optimising the smarts of the current Linux scheduler rather
than bastardized hacks such as this. But I'm posting it anyway, even after
all this time, because it was an interesting project, I had fun doing it,
and maybe someone will stumble across it someday and get inspired.
Feel free to email me at john@deater.net or clemej@alum.rpi.edu.
You can visit my poorly maintained
homepage as well... Maybe even look at my resume?