2.6.0-test9-mm4 must-fix list
[Posted November 19, 2003 by corbet]
Must-fix bugs
=============
drivers/char/
~~~~~~~~~~~~~
o TTY locking is broken.
o see FIXME in do_tty_hangup(). This causes ppp BUGs in local_bh_enable()
o Other problems: aviro, dipankar, Alan have details.
o somebody will have to document the tty driver and ldisc API
drivers/tty
~~~~~~~~~~~
o viro: tty_driver refcounting, tty/misc/upper levels of sound still not
completely fixed.
drivers/block/
~~~~~~~~~~~~~~
o ideraid hasn't been ported to 2.5 at all yet.
We need to understand whether the proposed BIO split code will suffice
for this.
drivers/input/
~~~~~~~~~~~~~~
o rmk: unconverted keyboard/mouse drivers (there's a deadline of 2.6.0
currently on these remaining in my/Linus' tree.)
o viro: large absence of locking.
o viro: parport is nearly as bad as that and there the code is more hairy.
IMO parport is more of "figure out what API changes are needed for its
users, get them done ASAP, then fix generic layer at leisure"
o (Albert Cahalan) Lots of people (check Google) get this message from the
kernel:
psmouse.c: Lost synchronization, throwing 2 bytes away.
(the number of bytes will be 1, 2, or 3)
At work, I get it when there is heavy NFS traffic. The mouse goes crazy,
jumping around and doing random cut-and-paste all over everything. This
is with a decently fast and modern PC.
o There seem to be too many reports of keyboards and mice failing or acting
strangely.
drivers/misc/
~~~~~~~~~~~~~
o rmk: UCB1[23]00 drivers, currently sitting in drivers/misc in the ARM
tree. (touchscreen, audio, gpio type devices.)
These need to be moved out of drivers/misc/ and into real places
o viro: actually, misc.c has a good chance to die. With cdev-cidr that's
trivial.
drivers/net/
~~~~~~~~~~~~
drivers/net/irda/
~~~~~~~~~~~~~~~~~
o dongle drivers need to be converted to sir-dev
o irport needs to be converted to sir-kthread
o new drivers (irtty-sir/smsc-ircc2/donauboe) need more testing
o rmk: Refuse IrDA initialisation if sizeof(structures) is incorrect (I'm
not sure if we still need this; I think gcc 2.95.3 on ARM shows this
problem though.)
drivers/pci/
~~~~~~~~~~~~
o alan: Some cardbus crashes the system
(bugzilla, please?)
drivers/pcmcia/
~~~~~~~~~~~~~~~
o alan: This is a locking disaster.
(rmk, brodo: in progress)
drivers/pld/
~~~~~~~~~~~~
o rmk: EPXA (ARM platform) PLD hotswap drivers (drivers/pld)
(rmk: will work out what to do here. maybe drivers/arm/)
drivers/video/
~~~~~~~~~~~~~~
o Lots of drivers don't compile, others do but don't work.
drivers/scsi/
~~~~~~~~~~~~~
o hch: large parts of the locking are hosed or nonexistent
(Mike Anderson, Patrick Mansfield, Badari Pulavarty)
o shost->my_devices isn't locked down at all
o there are lots of members of struct Scsi_Host/scsi_device/scsi_cmnd
with very unclear locking, many of them probably want to become
atomic_t's or bitmaps (for the 1bit bitfields).
o there's lots of volatile abuse in the scsi code that needs to be
thought about.
o there are some global variables incremented without any locks
o Convert am53c974, dpt_i2o, initio and pci2220i to DMA-mapping
o Make inia100, cpqfc, pci2000 and dc390t compile
o Convert
wd33c93 based: a2091 a3000 gpv11 mvme147 sgiwd93
53c7xx based: amiga7xxx bvme6000 mvme16x initio am53c974 pci2000
pci2220i dc390t
To new error handling
It also might be possible to shift the 53c7xx based drivers over to
53c700 which does the new EH stuff, but I don't have the hardware to check
such a shift.
For the non-compiling stuff, I've probably missed a few that just aren't
compilable on my platforms, so any updates would be welcome. Also, are
some of our non-compiling or unconverted drivers obsolete?
o rmk: I have a pending todo: I need to put the scsi error handling through
a workout on my scsi bus from hell to make sure it does the right thing
and doesn't get wedged.
o James B: USB hot-removal crash: "It's a known scsi refcounting issue."
o James B: refcounting issues in SCSI and in the block layer.
fs/
~~~
o AIO/direct-IO writes can race with truncate and wreck filesystems.
(Badari has a patch)
o hch: devfs: there's a fundamental lookup vs devfsd race that's only
fixable by introducing a lookup vs devfs deadlock. I can't see how this is
fixable without getting rid of the current devfsd design. Mandrake seems
to have a workaround for this so this is at least not triggered so easily,
but that's not what I'd consider a fix.
o viro: fs/char_dev.c needs removal of aeb stuff and merge of cdev-cidr.
In progress.
o forward-port sct's O_DIRECT fixes (Badari has a patch)
o viro: there is some generic stuff for namei/namespace/super, but that's a
slow-merge and can go in 2.6 just fine
o andi: soft mounts also need to be fixed - there are quite a lot of
uninterruptible waits in sunrpc/nfs
o trond: NFS has a mmap-versus-truncate problem
kernel/sched.c
~~~~~~~~~~~~~~
o Starvation, general interactivity need close monitoring.
kernel/
~~~~~~~
o Alan: 32bit uid support is *still* broken for process accounting.
Create a 32bit uid, turn accounting on. Shock horror it doesn't work
because the field is 16bit. We need an acct structure flag day for 2.6
IMHO
(alan has patch)
o viro: core sysctl code is racy. And its interaction with sysfs.
o (ingo) rwsems (on x86) are limited to 32766 waiting processes. This
means that setting pid_max to above 32K is unsafe :-(
An option is to use CONFIG_RWSEM_GENERIC_SPINLOCK variant all the time,
for all archs, and not inline any part of the ops.
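The 32-bit uid accounting item above comes down to plain integer truncation.
A minimal userspace sketch, assuming a record with a 16-bit uid field;
struct acct_sketch, record_uid() and uid_roundtrips() are hypothetical names
for illustration and are not the kernel's accounting structure or API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an accounting record with a 16-bit uid field. */
struct acct_sketch {
    uint16_t ac_uid16;
};

/* Store a 32-bit uid in the 16-bit field; the upper 16 bits are discarded. */
static void record_uid(struct acct_sketch *rec, uint32_t uid)
{
    assert(rec != NULL);
    rec->ac_uid16 = (uint16_t)uid;
}

/* Returns 1 if the uid survives a round trip through the record, else 0. */
static int uid_roundtrips(uint32_t uid)
{
    struct acct_sketch rec;

    record_uid(&rec, uid);
    return rec.ac_uid16 == uid;
}
```

In this sketch any uid above 65535 is recorded as some other, smaller uid,
which is why widening the field changes the record layout for every consumer
at once.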
lib/kobject.c
~~~~~~~~~~~~~
o kobject refcounting (comments from Al Viro):
_anything_ can grab a temporary reference to kobject. IOW, if kobject is
embedded into something that could be freed - it _MUST_ have a destructor
and that destructor _MUST_ be the destructor for containing object.
Any violation of the above (and we already have a bunch of those) is a
user-triggerable memory corruption.
We can tolerate it for a while in 2.5 (e.g. during work on a subsystem we
can decide to switch to that way of handling objects and have subsystem
vulnerable for a while), but all such windows must be closed before 2.6
and during 2.6 we can't open them at all.
o All block drivers which control multiple gendisks with a single
request_queue are broken, due to one-to-one assumptions in the request
queue sysfs hookup.
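The kobject lifetime rule above (an embedded refcounted object must have a
destructor, and it must be the destructor of the containing object) can be
sketched in userspace C. Everything here is a hypothetical miniature:
struct kobj_sketch, struct widget and the helper functions are illustrative
names, not the kernel's kobject API, and the plain int refcount is not
thread-safe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a refcounted object embedded in something else. */
struct kobj_sketch {
    int refcount;
    void (*release)(struct kobj_sketch *kobj);
};

/* The containing object; the embedded member controls its lifetime. */
struct widget {
    int id;
    struct kobj_sketch kobj;
};

static int widgets_freed; /* only here so the destructor can be observed */

static struct kobj_sketch *kobj_sketch_get(struct kobj_sketch *kobj)
{
    assert(kobj->refcount > 0);
    kobj->refcount++;
    return kobj;
}

/* Drop a reference; the last put runs the destructor. */
static void kobj_sketch_put(struct kobj_sketch *kobj)
{
    assert(kobj->refcount > 0);
    if (--kobj->refcount == 0)
        kobj->release(kobj);
}

/* Destructor for the container: recover the widget and free all of it. */
static void widget_release(struct kobj_sketch *kobj)
{
    struct widget *w =
        (struct widget *)((char *)kobj - offsetof(struct widget, kobj));

    free(w);
    widgets_freed++;
}

static struct widget *widget_create(int id)
{
    struct widget *w = malloc(sizeof(*w));

    if (w == NULL)
        return NULL;
    w->id = id;
    w->kobj.refcount = 1; /* reference held by the creator */
    w->kobj.release = widget_release;
    return w;
}
```

The point of the design is that the owner never calls free() directly: it
drops its reference, and the memory goes away only when the last temporary
holder has dropped its reference too.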
mm/
~~~
o GFP_DMA32 (or something like that). Lots of ideas. jejb, zaitcev,
willy, arjan, wli.
Specifically, 64-bit systems need to be able to enforce 32-bit addressing
limits for device metadata like network cards' ring buffers and SCSI
command descriptors.
o access_process_vm() doesn't flush right. We probably need new flushing
primitives to do this (davem?)
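The 32-bit addressing limit in the GFP_DMA32 item above amounts to a range
check on a buffer's bus address. A minimal sketch, assuming a plain 4 GiB
limit; fits_dma32() and DMA32_LIMIT are hypothetical names, not an existing
kernel interface:

```c
#include <assert.h>
#include <stdint.h>

/* Highest address reachable by a device limited to 32-bit addressing. */
#define DMA32_LIMIT UINT64_C(0xFFFFFFFF)

/*
 * Returns 1 if the buffer [addr, addr + len) lies entirely below 4 GiB,
 * else 0. Written so that addr + len cannot overflow.
 */
static int fits_dma32(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    if (addr > DMA32_LIMIT)
        return 0;
    return len - 1 <= DMA32_LIMIT - addr;
}
```

An allocator flag along the lines proposed would let a driver ask for memory
that already satisfies this check, instead of testing and retrying after the
fact.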
modules
~~~~~~~
(Rusty)
net/
~~~~
(davem)
o UDP apps can in theory deadlock, because the ip_append_data path can end
up sleeping while the socket lock is held.
It is normally OK to sleep with the socket lock held. But in this case
the sleep happens while waiting for socket memory/space to become
available; if another context needs to take the socket lock to free up the
space, we could hang.
I sent a rough patch on how to fix this to Alexey, and he is analyzing
the situation. I expect a final fix from him next week or so.
o Semantics for IPSEC during operations such as TCP connect suck currently.
When we first try to connect to a destination, we may need to ask the
IPSEC key management daemon to resolve the IPSEC routes for us. For the
purposes of what the kernel needs to do, you can think of it like ARP. We
can't send the packet out properly until we resolve the path.
What happens now for IPSEC is basically this:
O_NONBLOCK: returns -EAGAIN over and over until route is resolved
!O_NONBLOCK: Sleeps until route is resolved
These semantics are total crap. The solution, which Alexey is working
on, is to allow incomplete routes to exist. These "incomplete" routes
merely put the packet onto a "resolution queue", and once the key manager
does its thing we finish the output of the packet. This is precisely how
ARP works.
I don't know when Alexey will be done with this.
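The "resolution queue" idea in the IPSEC item above can be sketched in
userspace C: output never blocks and never returns -EAGAIN; packets for an
unresolved route are parked and flushed once resolution completes. All names
(struct route_sketch, route_output(), route_resolved(), PENDING_MAX) are
hypothetical, and the fixed-size queue is an assumption made to keep the
example short:

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 8

/* Hypothetical route entry that may still be waiting for resolution. */
struct route_sketch {
    int resolved;
    int pending[PENDING_MAX]; /* packet ids parked until resolution */
    size_t npending;
    int sent;                 /* packets actually transmitted */
};

/* Stand-in for handing a packet to the device. */
static void transmit(struct route_sketch *rt, int pkt)
{
    (void)pkt;
    rt->sent++;
}

/* Returns 0 if the packet was sent or queued, -1 if the queue is full. */
static int route_output(struct route_sketch *rt, int pkt)
{
    assert(rt != NULL);
    if (rt->resolved) {
        transmit(rt, pkt);
        return 0;
    }
    if (rt->npending == PENDING_MAX)
        return -1;
    rt->pending[rt->npending++] = pkt;
    return 0;
}

/* Called once the key manager (or, for ARP, the reply) completes the route. */
static void route_resolved(struct route_sketch *rt)
{
    size_t i;

    rt->resolved = 1;
    for (i = 0; i < rt->npending; i++)
        transmit(rt, rt->pending[i]);
    rt->npending = 0;
}
```

With this shape the caller sees the same behaviour whether or not O_NONBLOCK
is set; what to do when the queue fills up is a policy question the sketch
leaves open.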
net/*/netfilter/
~~~~~~~~~~~~~~~~
(Rusty)
o Rework conntrack hashing.
o Module relationship bogosity fix (trivial, have patch).
sound/
~~~~~~
o rmk: several OSS drivers for SA11xx-based hardware in need of
ALSA-ification and L3 bus support code for these.
o rmk: linux/sound/drivers/mpu401/mpu401.c and
linux/sound/drivers/virmidi.c complained about 'errno' at some time in the
past, need to confirm whether this is still a problem.
o rmk: need to complete ALSA-ification of the WaveArtist driver for both
NetWinder and other stuff (there are some fairly fundamental differences in
the way the mixer needs to be handled for the NetWinder.)
(Issues with forward-porting 2.4 bugfixes.)
(Killing off OSS is 2.7 material)
global
~~~~~~
o alan, Albert Cahalan: 1000 HZ timer increases the need for a stable time
source. Many laptops, SMI can lose ticks. ACPI timers? TSC?
o viro: 64-bit dev_t (not a mustfix for 2.6.0). 32-bit dev_t is done, 64-bit
means extra work on nfsd/raid/etc.
o alan: Forward port 2.4 fixes
- Security fixes including execve holes, execve vs proc races
- SiS IRQ routing for newer SiS and older Intel
o There are about 60 or 70 security related checks that need doing
(copy_user etc) from Stanford tools. (badari is looking into this, and
hollisb)
o A couple of hundred real looking bugzilla bugs
o viro: cdev rework. Mostly done.