The following is a raw transcript from the second discussion of the 2.6.0
"must-fix" list; this one looked (mostly) at new features and wishlist
items.
|
<hanna> |
Welcome everybody. Thanks for attending again. Same rules as last time.
|
|
<hanna> |
Don't talk unless you have something to add. Keep in-depth technical
|
|
<hanna> |
discussions to the level appropriate for keeping this meeting around
|
|
<hanna> |
an hour. Let's get started.
|
| aebr (~aeb@a213-84-53-62.adsl.xs4all.nl) has joined channel #lse
|
|
<hanna> |
akpm, ready to go. thanks for doing this again!
|
|
<akpm> |
np, thanks.
|
|
dm (~dm@12.98.126.212) has joined channel #lse
|
|
<akpm> |
latest paperwork is at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/must-fix/must-fix-4a.txt
|
|
orjan (~orjan@c213-89-50-36.cm-upc.chello.se) has joined channel #lse
|
|
nevdull (~rick@32.97.110.142) has joined channel #lse
|
|
<akpm> |
it's not substantively different from v4 so I didn't put out a new one formally
|
|
<akpm> |
we need to do "power management" from the bugs section - that was missed last week
|
|
msa (~maalanen@instlogin.cs.abo.fi) has joined channel #lse
|
|
mingo (~mingo@66.187.233.200) has joined channel #lse
|
|
<akpm> |
So mochel has a bunch of pending work there.
|
|
<akpm> |
o New device power management core code, both for individual devices,
|
|
<akpm> |
and for global state transitions.
|
|
<akpm> |
o A generic user interface for triggering system power state transitions.
|
|
<akpm> |
o Arch-independent code for performing state transitions, that calls
|
|
<akpm> |
platform-specific methods along the way.
|
|
<akpm> |
o A better suspend-to-disk mechanism than swsusp.
|
|
<akpm> |
mochel: what's the status of all that?
|
|
<mochel> |
i'm testing/debugging it right now
|
|
<akpm> |
mochel: ok. Anything to add there?
|
|
<mochel> |
it's not critical, and is fairly self-contained
|
|
ChanServ (services@services.oftc.net) has changed mode for #lse to +o riel
|
|
<mochel> |
i.e. it doesn't need to hold up 2.6.0, and could safely be added later.
|
|
<Alan> |
mochel: do we have the ability to get a device suspend callback after we finally turn off IRQs now?
|
|
<mochel> |
Alan: yes.
|
|
putch (~planchett@81.49.240.199) has joined channel #lse
|
|
vana (vana@147.32.240.58) has joined channel #lse
|
|
<akpm> |
Now there's a bunch of things from Alan
|
|
<akpm> |
o PCI locking
|
|
<putch> |
hi everyone
|
|
<akpm> |
gregkh is working on that
|
|
<akpm> |
(it's a dupe)
|
|
dipankar (~dipankar@129.42.208.139) has quit: Remote host closed the connection
|
|
<akpm> |
o Frame buffer restore codepaths (that requires some deep PCI magic)
|
|
putch (~planchett@81.49.240.199) has left channel #lse
|
|
dipankar (~dipankar@32.97.110.66) has joined channel #lse
|
|
<akpm> |
I don't know what the story is there. Is it critical?
|
|
<Alan> |
akpm: I don't recognize that as one of mine
|
|
<Alan> |
there are pending issues about how we get frame buffer back ok from swsuspend and also a pci locking related X + hotplug one that I think is post 2.6
|
|
<akpm> |
Alan: OK. You did send that actually ;)
|
|
<akpm> |
o XFree86 hooks
|
|
<akpm> |
o AGP restoration
|
|
<akpm> |
o DRI restoration
|
|
aeb (~aeb@24.132.5.101) has quit: Ping timeout: 488 seconds
|
|
<Alan> |
akpm: XFree86 hooks I think has to be post 2.6 now
|
|
<akpm> |
they all sound related.
|
|
<akpm> |
Alan: OK, what are the implications of not having them?
|
|
<davej> |
AGP has the hooks in place, just needs someone to do the chipset relevant bits
|
|
<Alan> |
akpm: you suspend the laptop and resume and your first 3d app crashes
|
|
<Alan> |
the box
|
|
<Alan> |
davej: for most of them the resume == init
|
|
<davej> |
*nod*
|
|
<mochel> |
it's gonna be hell and will take time.
|
|
<hch> |
davej: actually the agp hooks are gone now
|
|
<davej> |
I'd rather that was tested and patched instead of just blindly changed though
|
|
<mochel> |
but, they're all feature-related, and not bug-related..
|
|
<hch> |
instead the agp drivers use the pci hooks
|
|
<davej> |
hch: the generic ones yes
|
|
<akpm> |
Alan: that sounds fairly fatal. The workaround is to exit X?
|
|
<Alan> |
akpm: if AGP/DRI is modular otherwise reboot
|
|
<Alan> |
and if your desktop is 3d based it's really messy
|
|
<Alan> |
DRI stuff X folks are working ok
|
|
<Alan> |
s/ok/on
|
|
<akpm> |
Alan: does 2.4 crash in this manner as well?
|
|
<Alan> |
AGP stuff in general means calling the chip init routine again to reload the AGP memory data
|
|
<Alan> |
akpm: vendor 2.4 has patches
|
|
<Alan> |
that's how I know calling the init routine normally does all you need
|
|
dmo (~dmo@208.186.192.194) has joined channel #lse
|
|
<Alan> |
Also for APM suspend (as opposed to ACPI) most APM bioses will save/restore them
|
|
<akpm> |
Alan: so what is the conclusion here? Rely on vendor semi-fixes for 2.6?
|
|
hbaum_ (~hbaum@32.97.110.142) has joined channel #lse
|
|
hbaum_ (~hbaum@32.97.110.142) has quit: Client Quit
|
|
<Alan> |
akpm: if the hooks work it's just another "this driver wants a few fixes"
|
|
<davej> |
it's important, but not 'let's hold back 2.6.0' important IMO
|
|
<akpm> |
davej: OK. And you have a handle on what's needed?
|
|
<Alan> |
davej: right - 2.6.0 with working hooks doesn't stop 2.6.1 with fixes for AGP resume in drivers
|
|
<davej> |
alan, akpm: sure
|
|
lvmguy (~mauelshag@208.210.149.34) has joined channel #lse
|
|
<mochel> |
i can make sure the hooks are there.
|
|
<mochel> |
at least from the driver core..
|
|
<akpm> |
I'm surprised I've never seen a bug report of this. I assume the APM restore is working well?
|
|
<Alan> |
akpm: depends on the laptop vendor
|
|
<akpm> |
hm, OK. next.
|
|
<akpm> |
o IDE suspend/resume without races (Ben is looking at this a little)
|
|
oxymoron (~oxymoron@waste.org) has joined channel #lse
|
|
dipankar (~dipankar@32.97.110.66) has quit: Remote host closed the connection
|
|
<akpm> |
o How to deal with devices that babble (some stuff we have to global IRQ
|
|
<akpm> |
off to save, and global IRQ on -after- we recover with APM)
|
|
<akpm> |
Alan: which devices do you mean here?
|
|
<Alan> |
akpm: Ben's IDE approach is elegant and seems to solve the problem (he's queuing the suspend/resume on the request queue)
|
|
<akpm> |
that's neat
|
|
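A minimal sketch of the idea Alan describes above (suspend/resume travelling through the request queue), written as a user-space toy model. The class, its method names, and the choice to put RESUME at the head of the queue are assumptions made for illustration; none of this is Ben's actual IDE code or a kernel API.

```python
from collections import deque


class ToyRequestQueue:
    """Toy model of a block request queue (hypothetical, not a kernel API)."""

    def __init__(self):
        self.pending = deque()   # requests waiting for the device
        self.suspended = False   # True once SUSPEND has been processed
        self.done = []           # requests the "device" has handled, in order

    def submit(self, request):
        # Ordinary I/O and SUSPEND both go to the tail, so SUSPEND runs
        # only after every request that was queued before it.
        self.pending.append(request)

    def resume(self):
        # Assumption: RESUME goes to the head so it runs before any I/O
        # that piled up while the device was asleep.
        self.pending.appendleft("RESUME")

    def run(self):
        while self.pending:
            if self.suspended and self.pending[0] != "RESUME":
                break  # device is asleep: leave the I/O queued
            request = self.pending.popleft()
            if request == "SUSPEND":
                self.suspended = True
            elif request == "RESUME":
                self.suspended = False
            self.done.append(request)


queue = ToyRequestQueue()
queue.submit("read A")
queue.submit("SUSPEND")
queue.submit("write B")      # arrives after the suspend request
queue.run()
print(queue.done, list(queue.pending))
queue.resume()
queue.run()
print(queue.done)
```

Because the power transition is just another queue entry, it is ordered against in-flight I/O by the same mechanism that orders the I/O itself, which is where the "without races" property would come from in this model.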
dardhal (~dardhal@213.0.201.144) has joined channel #lse
|
|
<Alan> |
akpm: devices that babble is resolved- Mochel confirmed there is at least one pm callback point after irqs are disabled
* akpm deletes
|
|
dipankar (~dipankar@202.88.171.30) has joined channel #lse
|
|
<akpm> |
o Pat's swsusp rework?
|
|
<mochel> |
Alan: and another on resume before irqs are enabled
|
|
<Alan> |
mochel: perfect
|
|
<mochel> |
akpm: part of same pm work. doing suspend to disk to verify generic model works.
|
|
<akpm> |
mochel: yup. And you're not actually looking at suspend-to-swap as such are you?
|
|
<mochel> |
akpm: no. it's configurable, but currently doing suspend-to-dedicated partition.
|
|
<akpm> |
mochel: ok, that's a dupe anyway
|
|
<akpm> |
last up in PM:
|
|
<akpm> |
o Pat: There are already CPU device structures; MTRRs should be a
|
|
<akpm> |
dynamically registered interface of CPUs, which implies there needs
|
|
<akpm> |
to be some other glue to know that there are MTRRs that need to be
|
|
<akpm> |
saved/restored.
|
|
<akpm> |
is that a "bug"?
|
|
<mochel> |
akpm: yes, but not serious.
|
|
<davej> |
wasn't there already a patch for the MTRR sysfs stuff ?
|
|
<mochel> |
akpm: some work needs to be done in that area, and it shouldn't be much of a problem
|
|
<mochel> |
davej: yes, but they wanted to register mtrrs as devices, which they clearly aren't
|
|
ddaM (~dominant@210.186.172.65) has joined channel #lse
|
|
<akpm> |
so what's the conclusion here? Nice to have, nobody's actively working it?
|
|
<mochel> |
akpm: i'll take care of it in the next couple of weeks.
|
|
<akpm> |
mochel: ok, thanks.
|
|
cryogen (~cryogen@cryogen.noc.oftc.net) has joined channel #lse
|
|
<akpm> |
Not-ready features and speedups
|
|
<akpm> |
drivers/block/
|
|
<akpm> |
o Framework for selecting IO schedulers
|
|
<akpm> |
this is basically the last missing piece before we can merge the anticipatory scheduler and CFQ
|
|
<akpm> |
Jens will be doing this.
|
|
<hch> |
what's the state of CFQ?
|
|
<akpm> |
o CFQ scheduler. Seems to work but Jens planning significant rework.
|
|
<rdunlap> |
runtime-selectable per device or what?
|
|
<akpm> |
hch: experimental but stable, basically.
|
|
<akpm> |
rdunlap: yes
|
|
<akpm> |
per queue
|
|
<wli> |
I'd like to have runtime-selectable by queue
|
|
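A rough sketch of what "runtime-selectable per queue" could look like, as a user-space toy. The two elevators, the registry, and every name here are hypothetical stand-ins; the real framework is the piece Jens is said to be writing and may look nothing like this.

```python
# Toy elevators: each takes the pending sector numbers of one queue and
# returns the order in which they would be dispatched.
def noop(sectors):
    return list(sectors)        # first come, first served


def sweep(sectors):
    return sorted(sectors)      # one ascending pass across the disk


# Hypothetical registry of available elevators, keyed by name.
ELEVATORS = {"noop": noop, "sweep": sweep}


class ToyBlockQueue:
    """One queue per device; each queue carries its own elevator choice."""

    def __init__(self, name, elevator="sweep"):
        self.name = name
        self.sectors = []
        self.elevator = None
        self.set_elevator(elevator)

    def set_elevator(self, elevator):
        # Runtime switch: affects this queue only.
        if elevator not in ELEVATORS:
            raise ValueError("unknown elevator: %s" % elevator)
        self.elevator = elevator

    def add(self, sector):
        self.sectors.append(sector)

    def dispatch(self):
        order = ELEVATORS[self.elevator](self.sectors)
        self.sectors = []
        return order


disk = ToyBlockQueue("disk")
raid = ToyBlockQueue("raid")
raid.set_elevator("noop")       # e.g. a controller that does its own sorting
for sector in (90, 10, 50):
    disk.add(sector)
    raid.add(sector)
print(disk.dispatch(), raid.dispatch())
```

Keeping the choice on the queue rather than global is what lets one device opt out of OS-side reordering without changing behaviour for the others.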
<akpm> |
o Anticipatory scheduler. Working OK now, still has problems with seeky
|
|
<akpm> |
OLTP-style loads.
|
|
<akpm> |
AS is still under development a bit. So the selectable queue feature is really the only way in which we can acceptably merge it.
|
|
<akpm> |
But I think we need to push ahead with this regardless because it makes such a difference with most things.
|
|
<oxymoron> |
Has there been any thought towards automatically shrinking the 'anticipation window' when it's not working?
|
|
<akpm> |
We'll never be better than deadline for databases though.
|
|
<wli> |
akpm: is the TCQ issue resolved?
|
|
<akpm> |
oxymoron: we do lots and lots of things like that.
|
|
<wli> |
(performance issue)
|
|
<akpm> |
wli: it's getting better, but no, not fully.
|
|
<Alan> |
akpm: the selectable stuff is important also because we have controllers that want to turn off smarts in the OS, or which interact badly because they do their own prediction
|
|
<hch> |
yupp s390 dasd tries to select a noop scheduler
|
|
<akpm> |
Alan: yup. I think the driver makes that decision at present, but being able to set the noop elevator at runtime will be possible
|
|
<hch> |
in 2.4 they still need some patches for that
|
|
<hch> |
the noop elevator is totally broken in 2.5 IIRC
|
|
<akpm> |
hch: hm, what driver uses it?
|
|
<hch> |
akpm: S/390 dasd
|
|
<hch> |
in fact it doesn't use it anymore in 2.5 because of that brokenness
|
|
<akpm> |
hch: OK. I'll note that
|
|
<davej> |
does 2.5 s390 even work right now?
|
|
<Alan> |
aacraid will want to use it too from timings I've done with 2.4
|
|
<Alan> |
and I suspect other raid
|
|
<hanna> |
davej, I believe so
|
|
<hch> |
davej: they posted a big patchkit some time ago
|
|
<davej> |
apart from the s390x -> s390 merge, I don't recall seeing much recently
|
|
<hch> |
davej: I think Linus dropped it as usual
|
|
<gerrit> |
davej: s390 is staying up to date, but right, not all patches are in tree.
|
|
<akpm> |
hch: no, they got consolidated
|
|
<hch> |
akpm: that was the patchkit before the ignored one :)
|
|
<akpm> |
hch ;)
|
|
<gerrit> |
akpm: s390 shouldn't hold 2.6.0. They are aware of the closedown but focused on new products with 2.4 at the moment
|
|
<akpm> |
next up: the qlogic disas^Wsaga
|
|
tytso (~tytso@dsl092-109-027.nyc2.dsl.speakeasy.net) has joined channel #lse
|
|
<akpm> |
we seem to have about six drivers, none of which work.
|
|
<akpm> |
hch: what's the plan there?
|
|
<Jes> |
akpm, scsi or fc?
|
|
<akpm> |
Jes: both
|
|
<hch> |
akpm: my suggestion would be to merge and fixup the feral driver
|
|
<akpm> |
hch: OK. What's this new driver which jejb is playing with?
|
|
<hch> |
akpm: yes
|
|
<Jes> |
jejb and I are working on qla1280, but jejb is also playing with the feral
|
|
<hch> |
he's also playing with the fc-only qlogic driver, too
|
|
<hch> |
and qla1280 together with Jes
|
|
<hch> |
the feral driver has the advantage that it covers all qlogic hbas
|
|
pryzbyj (~pryzbyj@24.59.52.42) has joined channel #lse
|
|
<hch> |
scsi/fc and pci/sbus
|
|
<akpm> |
hch: mjacob has plans to do some significant restructuring of the driver
|
|
<hch> |
akpm: yes
|
|
<hch> |
akpm: that's what I meant
|
|
<akpm> |
sounds like stuff is happening anyway
|
|
<wli> |
hch: should we be concerned about the size of the BSD emulation layer?
|
|
arnd (arnd@80.138.129.137) has joined channel #lse
* hch kicks wli
|
|
<wli> |
I'll take that as a "no".
|
|
<hch> |
wli: read the code
|
|
<akpm> |
next up?
|
|
<akpm> |
o cryptoloop: jmorris: There's no cryptoloop in the 2.4 mainline kernel,
|
|
<akpm> |
but I think every distro ships some version. It would probably be useful
|
|
<akpm> |
to have crypto natively supported in 2.6, with backward compatibility for
|
|
<akpm> |
the majority of 2.4 users.
|
|
patman (~patman@bi01p1.co.us.ibm.com) has joined channel #lse
* akpm looks for a loop maintainer
|
|
<viro> |
akpm: -ENOENT or axboe
|
|
keith (~keith@32.97.110.142) has joined channel #lse
|
|
<davej> |
akpm: ISTR hugh was looking at loop stuff a while back with views to merging some new variant of the loop code
|
|
<hch> |
Adam Richter also has a loop rewrite
|
|
<akpm> |
davej: several people play with loop.
|
|
<viro> |
hch: er
|
|
<hch> |
and one of those cryptoloop guys
|
|
<hch> |
neither of them looks sane
|
|
<viro> |
hch: I'd seen his loop rewrites - quite a few of them
|
|
<viro> |
hch: _not_ a happy sight
|
|
<davej> |
any sane bits in any of the 15 rewrites going on ?
|
|
<oxymoron> |
Might make sense to merge distro version post 2.6.0?
|
|
<akpm> |
davej: they're all too damn big
|
|
<davej> |
ick
|
|
<akpm> |
loop needs a little restructuring to use the crypto api
|
|
<akpm> |
basically move it away from virtual addresses, use page/offset instead.
|
|
<hch> |
s/ to use the crypto api// :)
|
|
<oxymoron> |
What's become of the deadlock issues?
|
|
<akpm> |
oxymoron: there are a few memory-pressure related nasties. That's something I keep meaning to look at. But it'll be another bandaid I expect
|
|
<oxymoron> |
akpm: Mostly fixable with mempool stuff, I presume..
|
|
<akpm> |
If we had someone to work it seriously then forking off a loop-ng for a while may be appropriate
|
|
<riel> |
yes, any crypto memory pressure deadlocks should be fixable with mempool
|
|
<riel> |
memory usage is predictable
|
|
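A small user-space sketch of the mempool idea riel refers to: reserve a known number of elements up front so that a path with predictable memory usage can still make progress when the normal allocator fails. The class and allocator below are toys with hypothetical names, not the kernel's mempool interface.

```python
class FlakyAllocator:
    """Stand-in for the normal allocator; can be told to fail."""

    def __init__(self):
        self.fail = False

    def __call__(self):
        if self.fail:
            raise MemoryError("simulated memory pressure")
        return bytearray(16)


class ToyMempool:
    """Keeps min_nr elements in reserve for use when allocation fails."""

    def __init__(self, min_nr, alloc_fn):
        self.min_nr = min_nr
        self.alloc_fn = alloc_fn
        # The reserve is filled at creation time, before any pressure.
        self.reserve = [alloc_fn() for _ in range(min_nr)]

    def alloc(self):
        try:
            return self.alloc_fn()          # normal path first
        except MemoryError:
            if self.reserve:
                return self.reserve.pop()   # fall back to the reserve
            raise  # a real pool would wait here for an element to be freed

    def free(self, element):
        # Refill the reserve before handing memory back to the system.
        if len(self.reserve) < self.min_nr:
            self.reserve.append(element)


allocator = FlakyAllocator()
pool = ToyMempool(2, allocator)
allocator.fail = True            # memory pressure begins
first = pool.alloc()             # served from the reserve
second = pool.alloc()            # served from the reserve
pool.free(first)                 # completion refills the reserve
third = pool.alloc()             # still succeeds
print(len(pool.reserve))
```

The guarantee only holds if the number of elements a single operation needs is bounded and no larger than the reserve, which is the "memory usage is predictable" condition.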
<akpm> |
but until someone signs up to this, it ain't going anywhere.
|
|
<riel> |
point
|
|
<oxymoron> |
I personally think a better way to do crypto here is to be able to shim it in between arbitrary block devices - thinner, less complicated.
|
|
obiwan (~hussein@61.6.137.45) has left channel #lse: Client Exiting
|
|
<hch> |
oxymoron: *nod*
|
|
<Alan> |
oxym: probably -> 2.7
|
|
<akpm> |
next up?
|
|
<oxymoron> |
Next..
|
|
<hch> |
mixup of loop on bdev and loop on file is one of the reasons for really messy cruft in loop.c
|
|
<akpm> |
drivers/md/
|
|
<akpm> |
o ioctl interface cleanup patch is ready (redo the structure layouts)
|
|
<akpm> |
o A port of the 2.4 snapshot target is in progress
|
|
<viro> |
hch: there's an elegant way to deal with that, but that's for #kernel
|
|
<thornber> |
patchset will be on its way for RFC
|
|
<akpm> |
this is all happening - not much to say really. Unless someone has something to add?
|
|
<thornber> |
1 outstanding problem...
|
|
<hch> |
akpm: dm is doing some really nasty stuff in the ioctl code
|
|
<thornber> |
table reload needs to be split into load/commit
|
|
<hch> |
e.g. tried to give the user a bdev->bd_count value
|
|
<thornber> |
so no allocs occur while the dev is suspended
|
|
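A toy illustration of the load/commit split thornber describes: all allocation and validation happen in a "load" step while the device is still live, so that "commit" is only a swap performed during the suspended window. The class and method names are hypothetical and are not the device-mapper ioctl interface.

```python
class ToyMappedDevice:
    """Toy device with a live mapping table and an optional staged one."""

    def __init__(self, table):
        self.live = list(table)
        self.staged = None
        self.suspended = False

    def load(self, targets):
        # Phase 1: build and validate the new table. This is the only
        # step that allocates, and it is refused while suspended.
        if self.suspended:
            raise RuntimeError("load must happen while the device is live")
        if not targets:
            raise ValueError("empty table")
        self.staged = list(targets)

    def commit(self):
        # Phase 2: swap in the staged table. Nothing is allocated between
        # suspend and resume, so the swap cannot stall on memory.
        if self.staged is None:
            raise RuntimeError("nothing loaded")
        self.suspended = True
        self.live, self.staged = self.staged, None
        self.suspended = False


dev = ToyMappedDevice(["linear sda 0"])
dev.load(["linear sdb 0", "linear sdc 0"])
print(dev.live)      # still the old table until commit
dev.commit()
print(dev.live)
```

A failed load leaves the live table untouched, so errors surface before the device is ever suspended.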
<viro> |
hch: seconded
|
|
<hch> |
which is of course completely meaningless once it returned
|
|
<hch> |
and other stuff like that
|
|
<thornber> |
hch: agreed
|
|
<hch> |
I think there needs to be a new ioctl protocol revision to fix that stuff before 2.6.0
|
|
<thornber> |
y, there is one coming
|
|
<viro> |
hch: moreover, they do rather messy lookup by textual representation of major:minor (sic)
|
|
<akpm> |
hch, viro: the md developers are sitting on a ton of patches.
|
|
<hch> |
thornber: what happened to the filesystem interface, btw?
|
|
<hch> |
thornber: iirc that was a must deliver for merging dm
|
|
<thornber> |
I wrote one, no one liked it, gregkh volunteered to do another
|
|
<thornber> |
still waiting ..
|
|
<viro> |
thornber, gregkh: Cc patches to me, I can help with that
|
|
<gregkh> |
thornber: mine is nowhere near complete, I haven't touched it in a long time.
|
|
<gregkh> |
thornber: -ENOTIME
|
|
<hch> |
viro: yupp, that open underlying stuff by dev_t stuff has to go
|
|
<thornber> |
the interface is a separate module to core dm
|
|
<viro> |
thornber: -> #kernel
|
|
<hch> |
I already pointed that out to the sistina folks when doing open_bdev_excl()
|
|
<gregkh> |
thornber: exactly, I didn't touch the core dm stuff almost at all.
|
|
<viro> |
thornber: and private email
|
|
bongani (~bongani@196.30.125.234) has joined channel #lse
|
|
andmike (~andmike@32.97.110.142) has joined channel #lse
|
|
<akpm> |
thornber: when do you expect to publish the current patchset?
|
|
<thornber> |
I can push something tomorrow
|
|
<akpm> |
thornber: ok thanks, that should be fun
|
|
<akpm> |
anyone have anything else on devicemapper?
|
|
<akpm> |
fs/
|
|
<Alan> |
akpm: other than being sure we get the block splitting to redo ideraid/dm with it no
|
|
<akpm> |
Alan: yup, we covered that last week. Jens and neilb are doing things, and Arjan seems to be OK with the plans
|
|
<akpm> |
o ext3 lock_kernel() removal: that part works OK and is mergeable. But
|
|
<akpm> |
we'll also need to make lock_journal() a spinlock, and that's deep surgery.
|
|
<akpm> |
this is a sore point
|
|
<akpm> |
One of our main journalling filesystems rather sucks on SMP.
|
|
<akpm> |
Alex Tomas has a humongous patch. I need to review it.
|
|
<wli> |
I've not seen many that do well, largely due to PAE interactions.
|
|
<akpm> |
wli: ?
|
|
madd (~dominant@210.186.172.12) has joined channel #lse
|
|
<viro> |
akpm: humongous patch -> return to sender, ask to split, review results of split...
|
|
<wli> |
akpm: heavy usage of vmalloc() OOM's xfs quickly, reiserfs tends to create a lot of bh's for some reason, jfs has an analogous global semaphore.
|
|
<hch> |
wli: where do you see 'heavy use of vmalloc' ?
|
|
<wli> |
hch: er, sorry, vmap()
|
|
<hch> |
wli: again, where do you see heavy use?
|
|
<hch> |
wli: biggest user is log recovery..
|
|
<wli> |
hch: I got OOM's; I assumed they were from that since I didn't see bh's.
|
|
W0rf (~worf@193.171.247.56) has joined channel #lse
|
|
<wli> |
hch: log recovery is something else. Wasn't that then.
|
|
<gerrit> |
akpm: you've seen dave hansen's testing results from that patch?
|
|
<akpm> |
gerrit: not that I recall
|
|
<gerrit> |
akpm: he and mingming can do more stress testing as an adjunct to additional patch review
|
|
<gerrit> |
akpm: lots of testing, no failures. I'll have him resend the details
|
|
<akpm> |
gerrit: there are races which the eyeball detects - it needs careful review.
|
|
<akpm> |
next up
|
|
<akpm> |
o ext3 and ext2 block allocators have serious failure modes - interleaved
|
|
<akpm> |
allocations.
|
|
<akpm> |
in ext3 this is related.
|
|
<akpm> |
sct has some plans wrt prealloc for ext3
|
|
<akpm> |
we'll get there if we have time.
|
|
<akpm> |
o Integrate Chris Mason's 2.4 reiserfs ordered data and data journaling
|
|
<akpm> |
patches. They make reiserfs a lot safer.
|
|
<oxymoron> |
Actual failure or pessimal performance?
|
|
<akpm> |
oxymoron: awful layout
|
|
<viro> |
akpm: unrelated to 2.5/2.6 boundary
|
|
<viro> |
akpm: such stuff is fs-local and can go at any point once it gets sufficient review/beating
|
|
<akpm> |
viro: yup, a lot of the wishlist features are like that
|
|
<akpm> |
Chris made noises about getting the ordered-data code going by OLS-time.
|
|
<akpm> |
next, nfs client
|
|
<plars> |
akpm: any plans to separate the ones critical for 2.6 from the ones that aren't?
|
|
ddaM (~dominant@210.186.172.65) has quit: Ping timeout: 492 seconds
|
|
<akpm> |
plars: OK, I'll go through and take a shot at prioritising them
|
|
<viro> |
akpm: add sysctl handling cleanup/fixes to the list, while you are at it.
|
|
janetinc (~janetinc@bi01p1.co.us.ibm.com) has quit: Remote host closed the connection
|
|
<akpm> |
viro: which ones?
|
|
<viro> |
akpm: the thing's racy, but it's common for 2.2/2.4/2.5
|
|
<akpm> |
viro: you mean the core sysctl code?
|
|
<viro> |
akpm: yes
|
|
<akpm> |
ok, noted
|
|
<viro> |
akpm: and its interaction with procfs
|
|
<akpm> |
I don't see a lot to say about nfs client. It has active maintainership and stuff is happening.
|
|
<akpm> |
Anyone have anything to add?
|
|
<hch> |
cpumask_t
|
|
<hch> |
irq.c consolidation
|
|
<akpm> |
hch: wrt NFS client
|
|
<akpm> |
I'd very much like for something like Peter Braam's 'lookup with
|
|
<akpm> |
intent' or (better yet) for a proper dentry->open() to be integrated with
|
|
<akpm> |
path_walk()/open_namei(). I'm still working on the latter (Peter has
|
|
<akpm> |
already completed the lookup with intent stuff).
|
|
<davej> |
akpm: there was the mmap + truncate problem trond mentioned. still no fix for that afaik
|
|
<wli> |
hch: those should probably go toward the end since I think they aren't on the original agenda
|
|
<hch> |
akpm: sorry, misparsed
|
|
<dmc> |
davej: Paul McKenney is working on a fix.
|
|
<akpm> |
are we done with fs/ ?
|
|
<davej> |
dmc: that's trivial to repeat (at least here), so I'll be happy to test that when it shows up
|
|
<dmc> |
davej: it's also a race for local fs, though apparently less fatal in most cases.
|
|
_viro (~al@user-2ivf6b6.dialup.mindspring.com) has joined channel #lse
|
|
<akpm> |
davej: can you refresh our minds on what the problem was?
|
|
<davej> |
akpm: 'fsx dies very quickly'
|
|
<oxymoron> |
Wasn't this the one with the 'morton pages'?
|
|
jbrocklin (~joe@dynamic-204-011.natpool.uc.edu) has joined channel #lse
|
|
<davej> |
I suspect it's the same reason my kernel compiles over nfs fail
|
|
<hch> |
akpm: please repost the lookup thing, viro is the person who probably cares for it most
|
|
<akpm> |
I'd very much like for something like Peter Braam's 'lookup with
|
|
<akpm> |
intent' or (better yet) for a proper dentry->open() to be integrated with
|
|
<akpm> |
path_walk()/open_namei(). I'm still working on the latter (Peter has
|
|
<akpm> |
already completed the lookup with intent stuff).
|
|
Numbex_L (~knoppix@128.195.31.157) has joined channel #lse
|
|
<dmc> |
oxy: I think it's the same, yeah.
|
|
<akpm> |
viro: that's from trond. Are you familiar with the lookup-with-intent stuff? Sound sane?
|
|
dmo (~dmo@208.186.192.194) has quit: Quit: Client Exiting
|
|
<akpm> |
hmm
|
|
dmo (~dmo@208.186.192.194) has joined channel #lse
|
|
<akpm> |
let's move on
|
|
<akpm> |
kernel/
|
|
<akpm> |
o rusty: Zippel's Reference count simplification. Tricky code, but cuts
|
|
<akpm> |
about 120 lines from module.c. Patch exists, needs stressing.
|
|
<akpm> |
o rusty: /proc/kallsyms. What most people really wanted from /proc/ksyms.
|
|
<akpm> |
Patch exists.
|
|
pryzbyj (~pryzbyj@24.59.52.42) has quit: Ping timeout: 488 seconds
|
|
<akpm> |
o rusty: Fix module-failed-init races by starting module "disabled". Patch
|
|
<akpm> |
exists, requires some subsystems (ie. add_partition) to explicitly say
|
|
<akpm> |
"make module live now". Without patch we are no worse off than 2.4 etc.
|
|
<akpm> |
that's all happening
|
|
<akpm> |
o Integrate userspace irq balancing daemon.
|
|
viro (~al@165.247.149.112) has quit: Ping timeout: 485 seconds
|
|
<wli> |
Does that actually require integration? I thought the API for userspace to set things was already in place.
|
|
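For context, a sketch of what a userspace balancer would compute. The assumption here is that the "API for userspace" wli mentions is the per-IRQ `/proc/irq/<n>/smp_affinity` file, which takes a hexadecimal bitmask of allowed CPUs; the transcript does not name it. The helper names are hypothetical, and the snippet only builds the path and mask rather than writing to `/proc`.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string with one bit set per allowed CPU."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_path(irq):
    """Return the procfs file that holds the affinity mask for one IRQ."""
    return "/proc/irq/%d/smp_affinity" % irq


# Example: steer IRQ 19 to CPUs 0 and 2 (values chosen for illustration).
print(affinity_path(19), affinity_mask([0, 2]))
```

A daemon built on this would periodically read interrupt counts, decide on a placement, and write the resulting mask to each file as root.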
<mbligh> |
Does it need any integration? it works?
|
|
__viro (~al@user-2ivf6os.dialup.mindspring.com) has joined channel #lse
|
|
habanero (~habanero@192.35.232.241) has quit: Quit: Client Exiting
|
|
<oxymoron> |
wli: There was some question about racy proc interface..
|
|
<davej> |
there is the ongoing 'rip out inkernel balancer' argument I suppose
|
|
<dhansen> |
some people want to put it in the kernel tree
|
|
<akpm> |
wli: apparently. That seems to be a bit stalled.
|
|
<hch> |
dhansen: sounds stupid
|
|
<akpm> |
dhansen: yes, I'd like to see it in the main tree as a delivery mechanism
|
|
<wli> |
I'm not terribly happy with the organization of the surrounding APIC manipulation code but it's 100% organizational.
|
|
<hch> |
dhansen: it's not really tied more to the kernel than say runon
|
|
<wli> |
AFAIK all the functional issues are there.
|
|
<akpm> |
dhansen: so that people can bang on it, add support for new architectures, so it stays in-sync with the kernel etc
|
|
<wli> |
s/there/taken care of/
|
|
<mbligh> |
The only problem I can see is that there's some extra code size there, the config option can deal with that easily
|
|
<oxymoron> |
akpm: It's the whole klibc/modtools argument over again..
|
|
<hch> |
akpm: there's lots of userspace for which this argument makes much more sense..
|
|
<akpm> |
hch: view it as an experiment. I think one of the reasons why people tend to put too much stuff in-kernel is that we have no convenient distribution channel for userspace.
|
|
<mingo> |
akpm: true. Performance-wise both solutions are equivalent.
* Alan agrees - solve the real problem
|
|
<hch> |
akpm: that's more an argument for a kernel-tools BK repo on kernel.org
|
|
<oxymoron> |
akpm: If it were something that more than 1% of users used, this might be a good test case..
|
|
_viro (~al@user-2ivf6b6.dialup.mindspring.com) has quit: Ping timeout: 492 seconds
|
|
<wli> |
akpm: I would say something very similar wrt. dhcp clients for nfsroot; it's critical to getting the system running (hell, in that case it can't boot at all).
|
|
<__viro> |
speaking of which, what's the situation with klibc merge?
|
|
<wli> |
(dhcp clients meant to be used from initramfs)
|
|
<Alan> |
wli: all the mainstream nfs root stuff uses initrd dhcp clients already - that works out
|
|
<akpm> |
wli: not really. dhcp client isn't the sort of thing which kernel hackers need to bang on regularly. IRQ balancing _is_.
|
|
<__viro> |
gregkh, IIRC, you were the last one to touch it
|
|
<oxymoron> |
viro: Conspicuously absent from must-fix?
|
|
<wli> |
akpm: hmm, I at least have enough diskless machines for it to matter but I'm not everyone I guess.
|
|
<gregkh> |
gregkh: I touched klibc last, yes.
|
|
<gregkh> |
__viro: I touched klibc last, yes.
|
|
__viro (~al@user-2ivf6os.dialup.mindspring.com) is now known as viro
|
|
<gregkh> |
__viro: but haven't looked at the merge in a while, sorry, Linus wanted some kernel code ported to userspace to use it before he would take it.
|
|
<oxymoron> |
akpm: But do they need to bang on it and the kernel code simultaneously?
|
|
<gregkh> |
viro: and I didn't have any readily available, so I moved on to other stuff.
|
|
<hch> |
do we need klibc for 2.6?
|
|
<viro> |
gregkh: -> #kernel, then
|
|
<viro> |
hch: IWBN
|
|
<akpm> |
oxymoron: probably
|
|
<gregkh> |
hch: I'd say yes, but no users of it have stepped forward yet.
|
|
<gregkh> |
hch: so until then, I can't justify it.
|
|
<viro> |
gregkh: I have a bunch of such things
|
|
<gregkh> |
viro: great, let's talk later.
|
|
<akpm> |
wrt irqbalance, I guess we're waiting for someone who cares to actually do something with it.
|
|
<akpm> |
I don't see this as a showstopper really. People can boot with noirqbalance and download arjan's stuff.
|
|
<akpm> |
next?
|
|
<akpm> |
o kexec. Seems to work, is in -mm
|
|
<akpm> |
this seems fairly intrusive and late. I'm not sure how much pull it has.
|
|
<mbligh> |
is incredibly useful for those of us with 5 minute reboot times
|
|
<viro> |
akpm: IIRC, there were API issues with it. Had that been resolved?
|
|
Steph (~sglass@pixpat.austin.ibm.com) has joined channel #lse
|
|
Numbex_L (~knoppix@128.195.31.157) is now known as barryn
|
|
dardhal (~dardhal@213.0.201.144) has quit: Remote host closed the connection
|
|
<akpm> |
viro: Linus had opinions. I need to ping him, see if he's happy with it as-is.
|
|
<akpm> |
so no, not yet.
|
|
<viro> |
akpm: we have enough sys_too_fscking_ugly() already...
|
|
<phillips> |
akpm, I'll vote for it
|
|
<wli> |
Alan: some issues with boot PROM interactions I brought up elsewhere
|
|
<akpm> |
viro: do you have specific probs w/ kexec?
|
|
<akpm> |
mbligh: does it work on numaq?
|
|
<viro> |
akpm: I'll need to go through my old notes and current patch
|
|
<mbligh> |
akpm, I'll make it work when I get back from vacation
|
|
<akpm> |
next...
|
|
<akpm> |
o rmk: modules / /proc/kcore / vmalloc This needs sorting and testing to
|
|
<akpm> |
ensure that stuff like gdb vmlinux /proc/kcore works as expected. I
|
|
<akpm> |
believe this is the only show stopper preventing any ARM platform being
|
|
<akpm> |
built in Linus' kernel.
|
|
<akpm> |
all rmk's problems are small ones ;)
|
|
<akpm> |
o rmk: lib/inflate.c must not use static variables (causes these to be
|
|
<akpm> |
referenced via GOTOFF relocations in PIC decompressor. We have a PIC
|
|
<akpm> |
decompressor to avoid having to hard code a per platform zImage link
|
|
<akpm> |
address into the makefiles.)
|
|
<akpm> |
mm/
|
|
<akpm> |
o objrmap: concerns over page reclaim performance at high sharing levels,
|
|
<akpm> |
and interoperation with nonlinear mappings is hairy.
* akpm hides
|
|
<willy> |
the kcore stuff is a problem for ia64 too ... someone posted a proposal yesterday, iirc
|
|
<rdunlap> |
tony luck
|
|
<Alan> |
willy: has the question of some of the nonlinear strange mappings and non coherent cache stuff been resolved ?
|
|
<wli> |
akpm: I think that one is largely teaching it how to handle Morton pages in one way or another.
|
|
<phillips> |
objrmap should be _EXPERIMENTAL_
|
|
<willy> |
Alan: I don't think anyone's thought about it much. Nobody runs Oracle on PA/Linux.
|
|
<dmc> |
wli: mckenney's fix should solve that one.
|
|
<hch> |
phillips: you don't tell me you want it ifdefed, do you?
|
|
<phillips> |
hch, yup
|
|
Alan (~alan@213.105.254.86) has quit: Quit: gotta go
|
|
<wli> |
dmc: what's the current take on remap_file_pages()? I know there are ways to deal with it but am not sure what the preferred method is these days
|
|
<willy> |
thinking about it, there are definitely limitations on how you can map stuff in a coherent way on PA-RISC. The best way is probably just to require that things are only remapped as if we had 8MB pages
|
|
<dmc> |
wli: that one is solved for objrmap. It wasn't all that hairy, really.
|
|
<mbligh> |
akpm, I had some preliminary code to fix the high sharing stuff. I gave it to dmc, it's not finished though
|
|
<dmc> |
mbligh: I'm messing with it now.
|
|
<mbligh> |
cool
|
|
<akpm> |
but we're not at 2.6.10 yet ;)
|
|
jfk (~jfk@213.76.228.208) has joined channel #lse
|
|
<mbligh> |
;-)
|
|
<mbligh> |
it's really pretty small, actually
|
|
andmike (~andmike@32.97.110.142) has quit: Quit: Client exiting
|
|
<wli> |
I wasn't under the impression it was known whether range coalescing actually resolved the performance issues.
|
|
<mbligh> |
we could change scanning to work on a per-process RSS in 2.6.10
|
|
<oxymoron> |
Speaking of 2.6.10, have we heard from folks like Google on mm issues lately?
|
|
<riel> |
mbligh: LOL
|
|
<akpm> |
oxymoron: they're on 2.4.18
|
|
<riel> |
mbligh: besides, it doesn't combine with memory zones
|
|
<mbligh> |
riel, but that's a corner-case, right? ;-)
|
|
<akpm> |
o Reintroduce and make /proc/sys/vm/freepages writable again so that boxes can be tuned for heavy interrupt load.
|
|
<akpm> |
that's in progress
|
|
<akpm> |
we done with mm/?
|
|
<akpm> |
net/
|
|
<oxymoron> |
akpm: Got an appropriate place for my lost async write errors?
|
|
<akpm> |
oxymoron: noted
|
|
<riel> |
mbligh: it's a very sharp corner though, you don't want to bump into it
|
|
<akpm> |
also atomic i_size patches
|
|
<akpm> |
in net/ davem is cooking up MPLS support, which apparently IPSEC wants.
|
|
<oxymoron> |
Isn't that patented?
|
|
<akpm> |
o Sometimes we generate IP fragments when it truly isn't necessary.
|
|
<akpm> |
net/ is boring. It just works all the time.
|
|
<wli> |
heh
|
|
<akpm> |
net/*/netfilter/
|
|
<akpm> |
o Lots of misc. cleanups, which are happening slowly.
|
|
<mbligh> |
NAPI still sucks
|
|
<dhansen> |
mbligh: you think that from one test we did a year ago. Not exactly conclusive
|
|
<akpm> |
mbligh: it's mainly for routers I suspect.
|
|
<hch> |
yupp, SGI folks seem to have problems on big boxens with NAPI
|
|
alan (~alan@213.105.254.86) has joined channel #lse
|
|
<akpm> |
o davem: Netfilter needs to stop linearizing packets as much as possible.
|
|
<mbligh> |
akpm, would be nice if it didn't slow down machines dramatically that aren't flat out on interfaces
|
|
<akpm> |
rusty is working through this apparently
|
|
<hch> |
unfortunately I couldn't convince them to report it to the lists instead of cooking up half-baked workarounds
|
|
<mbligh> |
dhansen: I've seen similar stuff from others
|
|
<mbligh> |
hch, what did they do? up the interrupt latency?
|
|
<hch> |
mbligh: back out napi support from tg3.c :)
|
|
<akpm> |
next??
|
|
madd (~dominant@210.186.172.12) has quit: Ping timeout: 485 seconds
|
|
<akpm> |
arch/i386/
|
|
<akpm> |
o Also PC9800 merge needs finishing to the point we want for 2.6 (not all)
|
|
<wli> |
Alan: Are the parts we don't want the Kanji input support?
|
|
<hch> |
I haven't heard anything from the pc9800 folks for ages
|
|
<alan> |
akpm: not critical
|
|
<hch> |
wli: their console changes aren't mergeable
|
|
<wli> |
alan: it'd be nice if it built so API changes down there could get propagated
|
|
<alan> |
console is 2.7 stuff
|
|
<viro> |
alan: maybe
|
|
<wli> |
Are those the only bits that have to be left out or is there more that has to get cut?
|
|
<hch> |
wli: no idea
|
|
<alan> |
there are a few possible more merges
|
|
<hch> |
wli: without some feedback from the pc98 folks it's impossible to tell
|
|
<wli> |
Is Osamu Tomita the only contact point?
|
|
<davej> |
their floppy.c clone probably wants some of the recent fixes that went into floppy.c too come to think of it
|
|
<hch> |
wli: I think so
|
|
<alan> |
wli: i'll talk to osamu
|
|
<akpm> |
o ES7000 wants merging (now we are all happy with it). That shouldn't be a
|
|
<akpm> |
big problem.
|
|
<mbligh> |
seems to be in reasonable shape now
|
|
<viro> |
BTW, paride/p{g,t}.c desperately need cleanup similar to block paride drivers
|
|
<viro> |
I can do that - it won't take much
|
|
<wli> |
I think that's pretty much bouncing it in the direction of the emperor penguin.
|
|
<alan> |
done for -ac e7000 seems fine
|
|
<akpm> |
ok. Anything else for arch/i386/?
|
|
<davej> |
PAT stuff maybe
|
|
<akpm> |
for agp?
|
|
<akpm> |
mtrr exhaustion?
|
|
<davej> |
thats my interest, but other drivers could use it too
|
|
<davej> |
the framebuffer ones spring to mind
|
|
<mbligh> |
akpm, can we merge the early printk stuff?
|
|
<wli> |
freitag's stuff suffices.
|
|
<akpm> |
mbligh: sounds like a good idea. All the patches I've seen have been crufty tho
|
|
<wli> |
even less suffices actually.
|
|
<wli> |
Shoving a console registration super-early appears to DTRT.
|
|
<wli> |
(an explicit one that is)
|
|
<akpm> |
wli: we'd need a vga driver for it
|
|
<akpm> |
next..
|
|
<wli> |
akpm: Possible. VGA is of low utility for that though.
|
|
<mbligh> |
wli, for most people, VGA is what's needed.
|
|
<akpm> |
wli: is it? Diagnosing early lockup on PCs?
|
|
<mbligh> |
for me, I want serial.
|
|
<oxymoron> |
earlyconsole=foo?
|
|
<dhansen> |
serial probably needs early command-line parsing, which is the ugliest part of the early printk stuff
|
|
<wli> |
akpm: hmm, could do something for end-user lockups
|
|
<akpm> |
next is "global" things
|
|
<akpm> |
o 64-bit dev_t
|
|
<viro> |
akpm: BTW, one more item:
|
|
<akpm> |
viro: yup
|
|
<viro> |
akpm: cleaning up options-parsers in filesystems
|
|
<viro> |
akpm: patch exists, needs porting
|
|
<mbligh> |
dhansen: re command line, yeah ... I think just merging VGA would be a good first step.
|
|
<viro> |
akpm: infrastructure in there can help with the aforementioned command-line parsing stuff
|
|
<akpm> |
viro: table-driven?
|
|
<viro> |
akpm: more or less
|
|
<akpm> |
viro: yes, sorely needed
|
|
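A table-driven option parser of the kind mentioned above might look roughly like the following sketch. It is not viro's patch: the token names, the `opt_table` layout and the `match_opt()` helper are all hypothetical, invented here only to illustrate the idea of replacing hand-rolled string comparisons with one lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option tokens for an imaginary filesystem. */
enum opt_token { OPT_UID, OPT_GID, OPT_QUIET, OPT_ERR };

struct opt_entry {
    enum opt_token token;
    const char *name;   /* text before '=' (or the whole flag) */
    int takes_arg;      /* nonzero if "name=value" is expected */
};

/* Adding an option means adding one row here, not another strcmp() chain. */
static const struct opt_entry opt_table[] = {
    { OPT_UID,   "uid",   1 },
    { OPT_GID,   "gid",   1 },
    { OPT_QUIET, "quiet", 0 },
};

/*
 * Look up one "name" or "name=value" option in the table.
 * On a match that takes an argument, *arg points at the value text;
 * otherwise *arg is NULL. Unknown or malformed options give OPT_ERR.
 */
static enum opt_token match_opt(const char *opt, const char **arg)
{
    size_t i;

    assert(opt != NULL && arg != NULL);
    *arg = NULL;
    for (i = 0; i < sizeof(opt_table) / sizeof(opt_table[0]); i++) {
        const struct opt_entry *e = &opt_table[i];
        size_t len = strlen(e->name);

        if (strncmp(opt, e->name, len) != 0)
            continue;
        if (e->takes_arg && opt[len] == '=') {
            *arg = opt + len + 1;
            return e->token;
        }
        if (!e->takes_arg && opt[len] == '\0')
            return e->token;
    }
    return OPT_ERR;
}
```

The same table shape could in principle serve early kernel command-line parsing, which is the reuse viro hints at; whether the real infrastructure works this way is an assumption.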
<akpm> |
64-bit dev_t we kinda covered last week. It is a showstopper, although not a 2.6.0 showstopper I guess.
|
|
<akpm> |
o We need a kernel side API for reporting error events to userspace (could
|
|
<akpm> |
be async to 2.6 itself)
|
|
<akpm> |
davem proposed that this be based off netlink
|
|
<akpm> |
he had a prototype patch which was simple.
|
|
<akpm> |
I'm not sure what the implications are for drivers etc.
|
|
<oxymoron> |
akpm: lots of printk -> netlink work to make it useful.
|
|
<akpm> |
this appears to be the whole subsystem logging problem.
|
|
<viro> |
akpm: there will be problems with initialization order
|
|
<akpm> |
it seems too late and with too little momentum for anything radical to be happening
|
|
<rdunlap> |
oxymoron: and userspace binary interface to kernel (a la ioctl) -- or am i mistaken?
|
|
<davej> |
akpm: sounds like carrion grade stuff
|
|
<rdunlap> |
and enterprise
|
|
<wli> |
+
|
|
<oxymoron> |
rdunlap: No, I think it would channel through existing netlink if..
|
|
<akpm> |
davej: yup. But it's also an opportunity to clean things up and rationalise cruft, etc. But most of that is a 2.7 janitorial activity
|
|
<davej> |
enterprise logging.. to boldly go...
|
|
<davej> |
akpm: agreed.
|
|
<oxymoron> |
Well past 10/31..
|
|
<davej> |
akpm: though if the infrastructure is trivial and non-intrusive...
|
|
<akpm> |
davej: yes. Someone who cares needs to ping davem and try to move that forward
|
|
patman (~patman@bi01p1.co.us.ibm.com) has left channel #lse: Client Exiting
|
|
bongani (~bongani@196.30.125.234) has quit: Max SendQ exceeded
|
|
<akpm> |
next, a couple of kbuild things
|
|
<akpm> |
o Kai: Introduce a sane, easy and standard way to build external modules
|
|
<akpm> |
o Kai: Allow separate src/objdir
|
|
<akpm> |
these both sound important.
|
|
<akpm> |
kai does good work. It'll happen.
|
|
<davej> |
'nice to have', can go in after 2.6 IMO.
|
|
<akpm> |
yup
|
|
bongani (~bongani@196.30.125.234) has joined channel #lse
|
|
<akpm> |
o general confusion over firmware policy:
|
|
<akpm> |
o do we mandate that it be uploaded from userspace?
|
|
<akpm> |
o Is binary-blob-in-kernel-image OK?
|
|
<akpm> |
o Each driver (wireless, scsi, etc) seems to do it in a different,
|
|
<akpm> |
private manner.
|
|
<davej> |
I'll be surprised if the firmware stuff happens for 2.6
|
|
<gregkh> |
akpm: there's a patch on lkml that is looking good for this.
|
|
<akpm> |
I don't get 2.6 feelings when I see this discussed
|
|
<gregkh> |
akpm: as a way for drivers to standardise on how to do this.
|
|
<akpm> |
gregkh: ok. How much effort is it to port drivers?
|
|
<gregkh> |
akpm: drivers can be converted/moved to it later in 2.6.
|
|
<gregkh> |
akpm: doesn't look like much at all.
|
|
<oxymoron> |
davej: I don't think we need to fix all the broken instances now, but setting a firm policy for everything future would be nice.
|
|
<willy> |
binary blob in kernelspace is getting drivers taken out by debian, FWIW
|
|
<davej> |
oxymoron: true.
|
|
<gregkh> |
akpm: any new drivers should use it, existing ones can convert later.
|
|
<davej> |
willy: ouch.
* willy disapproves, but i don't maintain the kernel package
|
|
ueimor (~romieu@213.41.134.224) is now known as ueimor-away
|
|
<akpm> |
gregkh: nice. How does it work btw?
|
|
<gregkh> |
willy: I've discussed this with them in the past. That's their loss.
|
|
<gregkh> |
akpm: uses sysfs and /sbin/hotplug to notify userspace that firmware should be sent to the device.
|
|
<oxymoron> |
gregkh: Their hands are tied..
|
|
ndabney (~smurf@208.186.192.194) has quit: Quit: ircII EPIC4-1.1.11 -- Are we there yet?
|
|
<gregkh> |
oxymoron: hey, I've said for 3 years for someone to send me a patch, no one has. That shows how serious they are about this.
|
|
<akpm> |
drivers/acpi/
|
|
<gregkh> |
akpm: anyway, should go into the 2.5 tree soon.
|
|
<davej> |
willy: how about an extra kernel-image in non-free? 8-)
|
|
<akpm> |
Any general comments on the ACPI situation?
|
|
<gregkh> |
akpm: getting better, but still needs work.
|
|
<davej> |
borked on quite a few boxes here
|
|
<willy> |
akpm: As long as Linus is taking patches, seems under control
|
|
<hch> |
akpm: ia64 tree has some changes
|
|
<willy> |
hch: err.. in 2.5?
|
|
<hch> |
I wonder why david didn't feed them to grover yet
|
|
<hch> |
willy: yes
|
|
<davej> |
I have ~3 boxes that have the 'NIC stops getting packets' bug, and another which doesnt even boot with acpi
|
|
<hch> |
willy: not as massive as in 2.4 8)
|
|
<davej> |
theyre in bugzilla, hopefully grover et al will get to the bottom of them eventually
|
|
<oxymoron> |
Not sure if it's supposed to work on my T30..
|
|
<davej> |
oxymoron: it should at least make it bootable.
|
|
<willy> |
hch: well, what's in 2.4 is a different version than marcelo's tree. In 2.4, I'm just replacing the ACPI bits with fresh bits from Andy and everything's working fine
|
|
<mochel> |
it seems the acpi irq routing code could use a serious rewrite
|
|
<plars> |
bug #586 was the ACPI one we were seeing on some 8-ways but Andy fixed it recently
|
|
<mochel> |
it's completely flaky across different systems.
|
|
<oxymoron> |
davej: Boots, ACPI sticks directories in /proc rather than /proc/acpi..
|
|
<mochel> |
but, i don't know how plausible it is for 2.6
|
|
<wli> |
plars: Are the fixes for the SMI stuff and tables overflowing the boot-time virtually remapped area so ACPI can function properly on x440 merged yet?
|
|
<davej> |
its the only thing that bothers me for 2.6-test tbh
* akpm has bad premonitions about ACPI
|
|
<davej> |
its the only thing that stops boxes booting I've seen (apart from user error upgrading .configs from 2.4)
|
|
<mbligh> |
wli, ACPI works on 8-way but not 16, IIRC
|
|
<akpm> |
wli: an SMI fix was merged
|
|
<wli> |
SMI is okay
|
|
<oxymoron> |
I've spent several days trying to get APM or ACPI suspend working with X on my new laptop..
|
|
<wli> |
mbligh: don't be coy; the issue is a table overflowing the virtually remapped area, no?
|
|
<mochel> |
oxymoron: acpi suspend will not work
|
|
<mbligh> |
wli, check with jstultz
|
|
<akpm> |
mochel: why not?
|
|
<mbligh> |
not sure he knew, last time I spoke to him.
|
|
<mochel> |
akpm: that's the first thing we talked about :)
|
|
<oxymoron> |
mochel: Oh, the irq chatter?
|
|
<wli> |
okay worse comes to worse I'll debug it myself
|
|
<plars> |
wli: I don't even have a 440 to test on nowadays
|
|
<mochel> |
oxymoron: that, few drivers have suspend/resume methods implemented, the code is fragile, and in many places FITH
|
|
jstultz (~moog@pixpat.austin.ibm.com) has joined channel #lse
|
|
davej_ (~davej@80.194.74.10) has joined channel #lse
|
|
<wli> |
speaking of the devil
|
|
<jstultz> |
wli: i was summoned?
|
|
<wli> |
jstultz: what's the 16x x440 vs. ACPI issue? I thought it was a table overflowing a virtually mapped area
|
|
<akpm> |
mochel: memory fails me. Is ACPI suspend important, and can it be got going?
|
|
bongani (~bongani@196.30.125.234) has quit: Max SendQ exceeded
|
|
<jstultz> |
wli: 2.5 has the patch that fixes 16x x440 w/ ACPI. it was the clear-smi-fix patch
|
|
<mochel> |
akpm: important, but not critical.
|
|
<wli> |
jstultz: so I'm thinking of an earlier issue no longer present. Thanks.
|
|
<oxymoron> |
akpm: Laptops are much less useful without it..
|
|
<mochel> |
akpm: can be got going; working on it now..
|
|
<jstultz> |
wli: i can only hope. let me know if you still see problems.
|
|
<akpm> |
mochel: great, thanks.
|
|
bongani (~bongani@196.30.125.234) has joined channel #lse
|
|
<akpm> |
enough acpi?
|
|
<akpm> |
drivers/block/
|
|
<akpm> |
o Floppy is almost unusably buggy still
|
|
<barryn> |
floppy seems fine for me as a user
|
|
<rdunlap> |
me too, but alan referred to some corner case data corruption IIRC
|
|
<akpm> |
ho hum. Can a few more people please test their floppies?
|
|
<rdunlap> |
i will ..more
|
|
davej__ (~davej@81.86.107.140) has joined channel #lse
|
|
alan (~alan@213.105.254.86) has quit: Quit: EPIC - EOF from stdin
|
|
<akpm> |
drivers/char/
|
|
<akpm> |
o Alan: Multiple serious bugs in the DRI drivers (most now with patches
|
|
<akpm> |
thankfully). "The badness I know about is almost entirely IRQ mishandling.
|
|
<akpm> |
DRI failing to mask PCI irqs on exit paths."
|
|
<akpm> |
o Various suspect things in AGP.
|
|
<davej__> |
I've got a bunch of DRI patches queued up. They'll go this week
|
|
<davej__> |
AGP is 'getting there'
|
|
<davej__> |
backend is pretty much ok, frontend is (in linus' words - shit)
|
|
<akpm> |
ok, "in progress" is good, thanks.
|
|
<akpm> |
drivers/isdn/
|
|
<akpm> |
a bunch of things from kai there.
|
|
<akpm> |
drivers/net/
|
|
<hch> |
davej: backend still needs lots of work, too
|
|
<akpm> |
o davej: Either Wireless network drivers or PCMCIA broke somewhen. A
|
|
<akpm> |
configuration that worked fine under 2.4 doesn't receive any packets. Need
|
|
<akpm> |
to look into this more to make sure I don't have any misconfiguration that
|
|
<akpm> |
just 'happened to work' under 2.4
|
|
<davej__> |
hch: sure, but its lots better than it was
|
|
<davej__> |
hch: nothing worth holding up 2.6 for
|
|
<davej__> |
akpm: going to retry wireless stuff this week some time to be sure I didnt goof that up
|
|
<davej__> |
works fine under 2.4, breaks under 2.5 last I tried
|
|
<akpm> |
davej__: probably the drivers broke.
|
|
<davej__> |
could well be
|
|
<akpm> |
davej__: Jeremy Fitzhardinge here is looking at airo. Not pretty
|
|
<davej__> |
I've been diffing net drivers from 2.4 and auditing the changes, there's nothing amazing so far, but I've not got to wireless/ yet
* akpm discovers another arch/i386/ section
|
|
<akpm> |
o 2.5.x won't boot on some 440GX
|
|
<akpm> |
alan: Problem understood now, feasible fix in 2.4/2.4-ac. (440GX has two
|
|
<akpm> |
IRQ routers, we use the $PIR table with the PIIX, but the 440GX doesnt use
|
|
<akpm> |
the PIIX for its IRQ routing). Fall back to BIOS for 440GX works and Intel
|
|
<akpm> |
concurs.
|
|
<akpm> |
o 2.5.x doesn't handle VIA APIC right yet.
|
|
<akpm> |
1. We must write the PCI_INTERRUPT_LINE
|
|
<akpm> |
2. We have quirk handlers that seem to trash it.
|
|
<davej__> |
will fall out in the 2.4 fixes I'm accumulating
|
|
<davej__> |
unless _A_ beats me to it
|
|
<akpm> |
o ACPI needs the relax patches merging to work on lots of laptops
|
|
<akpm> |
o ECC driver questions are not yet sorted (DaveJ is working on this)
|
|
<davej__> |
ECC stuff is Dan Hollis
|
|
barryn (~knoppix@128.195.31.157) has left channel #lse
|
|
<davej__> |
I looked at it, and ran away
|
|
<riel> |
yeah, scary stuff ;)
|
|
<riel> |
unfortunately
|
|
<akpm> |
davej__: heh. I thought it was purely userspace?
|
|
ndabney (~ndabney@208.186.192.194) has joined channel #lse
|
|
<davej__> |
akpm: no, it does various pci jiggery pokery
|
|
<davej__> |
at the least it needs splitting into pieces
|
|
<davej__> |
but there are a lot of other bits in there that need cleaning/fixing too
|
|
<akpm> |
I had it going on my 480nx box, don't recall altering the kernel.
|
|
<davej__> |
post 2.6 driver addition perhaps.
|
|
<akpm> |
450nx
|
|
<akpm> |
yup
|
|
<akpm> |
arch/x86_64/
|
|
<davej__> |
acpi relax stuff - talk to Andi.
|
|
davej_ (~davej@80.194.74.10) has quit: Ping timeout: 485 seconds
|
|
<davej__> |
he did the ones for SuSE, so has a handle on that
|
|
<akpm> |
o time handling is broken. Need to move up 2.4 time.c code
|
|
<akpm> |
that reminds me.
|
|
<akpm> |
the gettimeofday-goes-backwards bug
|
|
<riel> |
is that TSCs getting out of sync ?
|
|
<davej__> |
vojtech solved a bunch of those recently for amd64, might be worth bugging him
|
|
<wli> |
akpm: how many minutes backward do you want it to go? I think I'm going about 3 minutes backward atm.
|
|
<davej__> |
theres still a bunch of places we fuck with the RTC without locking too
|
|
<davej__> |
which could explain it
|
|
<akpm> |
riel: apparently it happens when something blocks interrupts for a lot of ticks.
|
|
<akpm> |
riel: with HZ=1000 it got 10x worse
|
|
<davej__> |
ftape is the only one that springs to mind, but I think there are other drivers too
|
|
<riel> |
that might be a different thing, then
|
|
<oxymoron> |
I think SMM was mentioned as a culprit.
|
|
<akpm> |
davej__: SMM and ACPI and APM will do it I think.
|
|
<davej__> |
k
|
|
<akpm> |
davej__: basically out of our control
|
|
<davej__> |
well, SMM is. but we can trap ACPI/APM at least
|
|
<akpm> |
So david m-t did some work on that and I think people were OK with it.
|
|
<jstultz> |
akpm: are we getting reports outside of laptops seeing gtod going backwards on i386?
|
|
<akpm> |
It needs someone to pick it up and do the ia32 hooks, generally finish it off
|
|
<akpm> |
jstultz: I don't recall any reports at all actually
|
|
<jstultz> |
akpm: I believe the lost ticks code already there will compensate for a number of lost ticks (although once we get past a second or so, the low bits of the TSC can wrap)
|
|
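The wrap concern can be illustrated with a small sketch. This is not the kernel's lost-ticks code; `ticks_elapsed()` and `wrap_seconds()` are hypothetical helpers, and the only assumption is that tick accounting is derived from the low 32 bits of a cycle counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how many timer ticks went by between two samples of the low
 * 32 bits of a cycle counter. Unsigned subtraction copes with a single
 * wrap of the 32-bit value, but two or more wraps are indistinguishable
 * from a short interval, so long stalls are silently under-counted.
 */
static uint32_t ticks_elapsed(uint32_t tsc_low_then, uint32_t tsc_low_now,
                              uint32_t cycles_per_tick)
{
    uint32_t delta;

    assert(cycles_per_tick != 0);
    delta = tsc_low_now - tsc_low_then;   /* arithmetic modulo 2^32 */
    return delta / cycles_per_tick;
}

/* Seconds until the low 32 bits wrap at a given clock rate in Hz. */
static double wrap_seconds(double hz)
{
    return 4294967296.0 / hz;             /* 2^32 cycles */
}
```

At a 2 GHz clock the low 32 bits wrap roughly every 2.1 seconds, so interrupts blocked for longer than that cannot be recovered from the low word alone.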
<akpm> |
jstultz: hm, I thought it was decided that lost-ticks didn't solve the problem.
|
|
<jstultz> |
akpm: well, the lost ticks patch was i386 specific..
|
|
<akpm> |
jstultz: have you been following david m-t's work?
|
|
<jstultz> |
akpm: i've been swamped this weekend. i apologize i've been unable to.
|
|
bongani (~bongani@196.30.125.234) has quit: Max SendQ exceeded
|
|
davej__ (~davej@81.86.107.140) is now known as davej_
|
|
viro (~al@user-2ivf6os.dialup.mindspring.com) has quit: Quit: Leaving
|
|
<jstultz> |
akpm: i'm not opposed to it, but I'm not sure how the i386 implementation would be done.
|
|
<akpm> |
jstultz: ah, OK. Please have a think about it anyway.
|
|
<akpm> |
next..
|
|
<akpm> |
x86-64: o need to coredump 64bit vsyscall code with dwarf2
|
|
<akpm> |
o move 64bit signal trampolines into vsyscall code and add dwarf2 for it.
|
|
<akpm> |
o describe kernel assembly with dwarf2 annotations for kgdb (currently
|
|
<akpm> |
waiting on some binutils changes for this)
|
|
<akpm> |
that's all happening
|
|
bongani (~bongani@196.30.125.234) has joined channel #lse
|
|
<akpm> |
arch/alpha/
|
|
<akpm> |
o rth: Ptrace writes are broken. This means we can't (reliably) set
|
|
<akpm> |
breakpoints or modify variables from gdb.
|
|
<akpm> |
arch/arm/
|
|
<akpm> |
o rmk: missing raw keyboard translation tables for all ARM machines.
|
|
<akpm> |
arch/others/
|
|
<akpm> |
o SH/SH-64 need resyncing, as do some other ports. No impact on
|
|
<akpm> |
mainstream platforms hopefully.
|
|
<akpm> |
sounds like all that is under control.
|
|
<akpm> |
that's the end of the list.
|
|
<hch> |
o IA64 needs merging, has impact on core code
|
|
<akpm> |
cpumask_t. Where's it at?
|
|
<davej_> |
what areas?
|
|
<hch> |
davej: all over the place
|
|
<davej_> |
ugh
|
|
<wli> |
akpm: ftp://ftp.kernel.org/pub/linux/kernel/people/wli/cpu/
|
|
<akpm> |
wli: bah. I meant what's its status?
|
|
<wli> |
akpm: been talking to i386 subarch maintainers about the stuff, jejb seems to be the only one currently responding, but updates for everything but pc9800 are all set up.
|
|
<akpm> |
wli: other architectures?
|
|
<hch> |
wli: pc98 doesn't compile in mainline, you don't have to care
|
|
<hch> |
akpm: I've done ppc32
|
|
<akpm> |
wli: what is the impact on small smp?
|
|
<wli> |
akpm: ia64 is being handled 100% by SGI, ppc64 is being handled by antonb
|
|
<wli> |
akpm: zero. It collapses back to identical to the original.
|
|
<hch> |
UP only because SMP is far from compiling
|
|
<hch> |
akpm: none
|
|
<akpm> |
wli: how important is it?
|
|
<hch> |
akpm: the boxens that needs this exist now
|
|
<hch> |
akpm: SGI has this patched into a 2.4 product tree
|
|
<wli> |
akpm: without it we knock 2-3 vendors who have been shipping such boxen for over 5 years out of 2.6 mainline
|
|
<akpm> |
wli: if the patch is applied and there's code which still does it the "old way", will that code fail to compile?
|
|
<phillips> |
<jstultz> akpm: are we getting reports outside of laptops seeing gtod going backwards on i386? <- I hereby submit one (sorry for the lag)
|
|
<wli> |
akpm: no, the arithmetic is allowed to happen on the "narrow" cpumasks as normal.
|
|
<akpm> |
phillips: please send jstultz angry emails ;)
|
|
<phillips> |
roger
|
|
<akpm> |
wli: can it be arranged so that it breaks?
|
|
<wli> |
akpm: yes
|
|
<wli> |
akpm: either (1) by artificially increasing NR_CPUS or (2) by introducing extra wrapping around smaller cpumasks
|
|
<akpm> |
wli: it would be best I think.
|
|
<akpm> |
wli: catching bugs with big NR_CPUS would be acceptable
|
|
<wli> |
akpm: the codegen would change but only slightly
|
|
<akpm> |
wli: so how much remains to be done?
|
|
<wli> |
akpm: I checked and it's slightly different with the structure wrapper around it
|
|
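The "structure wrapper" option can be sketched as below. This is an illustration only, not wli's patch: `cpumask_sketch_t`, the accessor names and the `NR_CPUS` value are hypothetical. The point is that once the mask is a struct, old-style code such as `mask & (1UL << cpu)` no longer compiles and has to be converted to accessors.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_CPUS 64  /* hypothetical value chosen for the sketch */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* A struct cannot be used with bitwise operators, so stale code breaks loudly. */
typedef struct { unsigned long bits[MASK_LONGS]; } cpumask_sketch_t;

static void mask_clear(cpumask_sketch_t *m)
{
    memset(m, 0, sizeof(*m));
}

static void mask_set_cpu(cpumask_sketch_t *m, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test_cpu(const cpumask_sketch_t *m, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return (m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}
```

When `NR_CPUS` fits in one word the struct holds a single `unsigned long`, which is consistent with the remark that codegen changes only slightly.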
<hch> |
akpm: the other arches need fixing
|
|
<wli> |
akpm: converting the rest of non-i386 after ppc32 and ia64.
|
|
<wli> |
akpm: with a big bag of cross-compilers I could do it myself in < 24 hours.
|
|
<wli> |
akpm: assembling that array of cross-compilers will take longer than that of course
|
|
<akpm> |
wli: minimal approach is just to rudely break things. But sending the maintainers a best-effort uncompiled patch would suit
|
|
<wli> |
akpm: that could be done in < 24 hours also.
|
|
<akpm> |
wli: OK, can you send me a patch when you think the time is right?
|
|
<wli> |
I anticipate a couple cycles of post and resend.
|
|
<wli> |
akpm: will do. diff vs. -mm in a few days?
|
|
<akpm> |
wli: sure. -mm diff is too damn big at present. I'm thinking of dropping keec.
|
|
<akpm> |
kexec
|
|
bongani (~bongani@196.30.125.234) has quit: Max SendQ exceeded
|
|
zaxl (alex@212.39.68.18) has quit: Quit: .
|
|
<akpm> |
hch: irq.c consolidation. Nice to have, but late.
|
|
<wli> |
akpm: minor nit: SGI wants some switchover to call-by-reference for certain things to automatically happen for large enough stuff. I'll have it out with the rest and it can bounce separately.
|
|
<hch> |
akpm: it's just code reshuffing
|
|
<hch> |
akpm: reshuffling
|
|
<hch> |
akpm: no actual code changes
|
|
<hch> |
akpm: e.g. it would have saved me from doing all the cpumask_t changes for PPC32
|
|
<akpm> |
wli: ouch
|
|
htj (~htj@213.237.17.105) has quit: Quit: Client exiting
|
|
bongani (~bongani@196.30.125.234) has joined channel #lse
|
|
<hch> |
Andrey Panin has patches
|
|
<wli> |
akpm: smaller than it sounds; I'll let ppl look
|
|
<hch> |
maybe you can ask him to resend against -mm
|
|
<wli> |
hch: IIRC ppc64 needs to do some arch-private stuff
|
|
<akpm> |
hch: I know. I almost sucked them up, but wimped out.
|
|
<oxymoron> |
wli: Several arches do.
|
|
<wli> |
hch: basically it has 2**24 interrupts and needs to be a bit sparse...
|
|
<hch> |
wli: for what?
|
|
<hanna> |
coming up on 2.5 hours...
|
|
<hch> |
wli: consolidated irq.c is a config option
|
|
<wli> |
hch: ah; no problem then.
|
|
<hch> |
wli: if the arch doesn't want it it doesn't have to
|
|
<wli> |
hch: they'll be fine.
|
|
<hch> |
wli: it's just to collect the 7 or 8 dupes
|
|
bongani (~bongani@196.30.125.234) has quit: Client Quit
|
|
<akpm> |
OK, I have one more thing here
|
|
<akpm> |
o aio: fs IO isn't async at present. suparna has restart patches, they're
|
|
<akpm> |
in -mm. Need to get Ben to review/comment.
|
|
<akpm> |
late, a bit intrusive, a bit messy, but AIO seems fairly pointless without them.
|
|
<akpm> |
or an equiv
|
|
<gerrit> |
akpm: agreed that we need them. we'll do whatever we need to clean, polish, measure, test...
|
|
<wli> |
is massive testing enough or do we need something deeper?
|
|
<gerrit> |
akpm: still needs a lot of perf work on it...
|
|
<akpm> |
wli: I'd like to see numbers from real workloads.
|
|
<oxymoron> |
There was talk about exploitable aio/O_DIRECT vs truncate race?
|
|
<akpm> |
wli: a: cvs co db2. b: hack
|
|
<wli> |
akpm: okay I know a couple of ppl are on that
|
|
<wli> |
akpm: the equivalent of that is already done IIRC, I've already seen numbers flying around internal to IBM
|
|
<akpm> |
oxymoron: yes, we need to fix that up. adding an AIO/DIO-vs-truncate rwsem will plug it simply enough
|
|
<hch> |
akpm: oracle supports aio
|
|
<hch> |
akpm: maybe get the oracle folks to do something useful and put some benchmarks up?
|
|
<akpm> |
hch: yup, that's happening
|
|
<akpm> |
there's also the aiopoll patch. Ben had concerns wrt its scalability and I have concerns wrt its testability.
|
|
<akpm> |
Does anyone have anything else?
|
|
<hch> |
akpm: I don't think it matters whether aio poll goes into 2.6.0 or 2.6.<n>
|
|
<akpm> |
recursive spinlocks?
|
|
<hch> |
if it goes in at all
|
|
<akpm> |
hch: sure.
|
|
<gerrit> |
recursive spinlock: die die die
|
|
<akpm> |
I think we're done here folks. Thanks again.
|
|
<hanna> |
Looks like we are done! Thanks akpm and everyone else. Looks like we should be focusing on checking things off this list primarily.
|
|
rdunlap (~rddunlap@208.186.192.194) has quit: Quit: cooked
|
|
<gerrit> |
akpm: any chance you could post a +items added -items closed to the top of your list?
|
|
<akpm> |
hanna: I'll go through the late-features list and assign priorities to them
|
|
<willy> |
akpm: I sent a couple more to you ..
|
|
<hanna> |
akpm, I don't see a need to do this again, do you?
|
|
cliffman (~cliffw@208.186.192.194) has quit: Quit: Client exiting
|
|
<akpm> |
willy: I have those, thanks
|
|
<akpm> |
hanna: no I don't think so
|