
2.5.59-mm5

From:  Andrew Morton <akpm@digeo.com>
To:  linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject:  2.5.59-mm5
Date:  Thu, 23 Jan 2003 19:50:44 -0800


http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/

.  -mm3 and -mm4 were not announced - they were sync-up patches as we
  worked on the I/O scheduler.

.  -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
  Here's the scoop:

  The problem being addressed here is (mainly) kernel behaviour when there
  is a stream of writeout happening, and someone submits a read.

  In 2.4.x, the disk queues contain up to 30 megabytes of writes (say, one
  second's worth).  When a read is submitted the 2.4 I/O scheduler will try
  to insert that at the right place between the writes.  Usually, there is no
  right place and the read is appended to the queue.  That is: it will be
  serviced in one second.

  But the problem with reads is that they are dependent - neither the
  application nor the kernel can submit read #N until read #N-1 has
  completed.  So something as simple as

	cat /usr/src/linux/kernel/*.c > /dev/null

  requires several hundred dependent reads.  And in the presence of a
  streaming write, each and every one of those reads gets stuck at the end of
  the queue, and takes a second to propagate to the head.  The `cat' takes
  hundreds of seconds.
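
  The arithmetic behind that estimate can be sketched as below.  This is
  only a toy model: the function name and the figure of 300 reads are
  assumptions chosen for illustration, not measurements.

```python
def cat_duration_seconds(num_dependent_reads, queue_drain_seconds):
    """Estimate total time when every read waits for the whole write queue.

    Dependent reads cannot overlap: read #N is only submitted once
    read #N-1 has completed, so the per-read waits add up serially.
    """
    return num_dependent_reads * queue_drain_seconds


# Assumed figures: 300 dependent reads, each stuck behind roughly one
# second of queued writes.
print(cat_duration_seconds(300, 1.0))  # 300.0
```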

  The celebrated read-latency2 patch recognises the fact that appending a
  read to a tail of writes is dumb, and puts the read near the head of the
  queue of writes.  It provides an improvement of up to 30x.  The deadline
  I/O scheduler in 2.5 does the same thing: if reads are queued up, promote
  them past writes, even if those writes have been waiting longer.
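
  As a rough illustration of the difference between appending and
  promoting, here is a toy queue model.  It is not the read-latency2 or
  deadline code (the real schedulers also sort by sector and bound how long
  writes may be starved); the function names and the ("R"/"W", id) request
  tuples are made up for this sketch.

```python
def enqueue_at_tail(queue, request):
    """Naive policy: every request, read or write, goes to the tail."""
    queue.append(request)


def enqueue_promoting_reads(queue, request):
    """Toy promotion policy: a read goes ahead of all queued writes,
    but behind any reads which are already waiting at the head."""
    kind, _ = request
    if kind != "R":
        queue.append(request)
        return
    index = 0
    while index < len(queue) and queue[index][0] == "R":
        index += 1
    queue.insert(index, request)


# Five writes are already queued when a read arrives.
tail_queue = [("W", i) for i in range(5)]
promoted_queue = [("W", i) for i in range(5)]
enqueue_at_tail(tail_queue, ("R", 0))
enqueue_promoting_reads(promoted_queue, ("R", 0))
print(tail_queue.index(("R", 0)))      # 5: behind every write
print(promoted_queue.index(("R", 0)))  # 0: ahead of every write
```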


  So far so good, but these fixes are still dumb, because we're solving
  the dependent read problem by creating a seek storm.  Every time someone
  submits a read, we stop writing, seek over and service the read, and then
  *immediately* seek back and start servicing writes again.

  But in the common case, the application which submitted a read is about
  to go and submit another one, close by on disk to the first.  So whoops, we
  have to seek back to service that one as well.


  So what anticipatory scheduling does is very simple: if an application
  has performed a read, do *nothing at all* for a few milliseconds.  Just
  return to userspace (or to the filesystem) in the expectation that the
  application or filesystem will quickly submit another read which is
  close by.

  If the application _does_ submit the read then fine - we service that
  quickly.  If it does not submit a read then we lose: the scheduler times
  out and goes back to doing writes.
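
  The effect on seeking can be sketched with a small counting model.  This
  is an illustration only and does not reflect the actual patch: the
  function name, the 5ms anticipation window and the 1ms gaps between reads
  are all assumptions.  The head is taken to sit either in the write area or
  in the read area, and each move between the two counts as one seek.

```python
def count_seeks(gaps_ms, anticipation_ms):
    """Count seeks needed to service a chain of dependent reads.

    gaps_ms[i] is the delay between read i completing and read i+1 being
    submitted.  anticipation_ms is how long the scheduler idles after a
    read before returning to writes (0 means it seeks back immediately).
    """
    seeks = 1  # seek from the write area to the first read
    for gap in gaps_ms:
        if gap > anticipation_ms:
            # Gave up waiting: seek back to the writes, then seek over
            # again once the next read finally arrives.
            seeks += 2
        # Otherwise the next read arrived in time and the head is
        # already nearby, so no seek is charged.
    seeks += 1  # final seek back to the write area
    return seeks


gaps = [1, 1, 1]  # four dependent reads, 1ms apart
print(count_seeks(gaps, anticipation_ms=0))  # 8
print(count_seeks(gaps, anticipation_ms=5))  # 2
```

  In this model a reader which thinks for longer than the window, for
  example gaps of [1, 20, 1], still costs 4 seeks with the 5ms window: that
  is the "we lose" case above.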

  The end result is a large reduction in seeking - decreased read latency,
  increased read bandwidth and increased write bandwidth.


  The code as-is has rough spots and still needs quite some work.  But it
  appears to be stable.  The test which I have concentrated on is "how long
  does my laptop take to compile util-linux when there is a continuous write
  happening".  On ext2, mounted noatime:

	2.4.20:                 538 seconds
	2.5.59:                 400 seconds
	2.5.59-mm5:             70 seconds
	No streaming write:     48 seconds
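
  For a quick sense of scale, the ratios between those timings can be
  worked out as follows.  The dictionary simply restates the numbers above;
  the helper name is made up for this sketch.

```python
timings_seconds = {
    "2.4.20": 538,
    "2.5.59": 400,
    "2.5.59-mm5": 70,
    "No streaming write": 48,
}


def ratio(slower, faster, timings=timings_seconds):
    """Return how many times longer `slower` took than `faster`."""
    return timings[slower] / timings[faster]


# Speedup of -mm5 over the older kernels under a streaming write.
print(round(ratio("2.4.20", "2.5.59-mm5"), 1))  # 7.7
print(round(ratio("2.5.59", "2.5.59-mm5"), 1))  # 5.7
# Remaining cost of the streaming write on -mm5, relative to an idle disk.
print(round(ratio("2.5.59-mm5", "No streaming write"), 2))  # 1.46
```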

  A couple of VFS changes were needed as well.

  More details on anticipatory scheduling may be found at

	http://www.cs.rice.edu/~ssiyer/r/antsched/




Changes since 2.5.59-mm2:


+preempt-locking.patch

 Speed up the smp preempt locking.

+ext2-allocation-failure-fix.patch

 ext2 ENOSPC crash fix

+ext2_new_block-fixes.patch

 ext2 cleanups

+hangcheck-timer.patch

 A form of software watchdog

+slab-irq-fix.patch

 Fix a BUG() in slab when memory exhaustion happens at a bad time.

+sendfile-security-hooks.patch

 Reinstate lost security hooks around sendfile()

+buffer-io-accounting.patch

 Fix IO-wait accounting

+aic79xx-linux-2.5.59-20030122.patch

 aic7xxx driver update

+topology-remove-underbars.patch

 cleanup

+mandlock-oops-fix.patch

 file locking fix

+reiserfs_file_write.patch

 reworked reiserfs write code.

-exit_mmap-fix2.patch

 Dropped

+generic_file_readonly_mmap-fix.patch

 Fix MAP_PRIVATE mmaps for filesystems which don't support ->writepage()

+seq_file-page-defn.patch

 Compile fix

+exit_mmap-fix-ppc64.patch
+exit_mmap-ia64-fix.patch

 Fix the exit_mmap() problem in arch code.

+show_task-fix.patch

 Fix oops in show_task()

+scsi-iothread.patch

 software suspend fix

+numaq-ioapic-fix2.patch

 NUMAQ stuff

+misc.patch

 Random fixes

+writeback-sync-cleanup.patch

 remove some junk from fs-writeback.c

+dont-wait-on-inode.patch

 Fix large delays in the writeback path

+unlink-latency-fix.patch

 Fix large delays in unlink()

+anticipatory_io_scheduling-2_5_59-mm3.patch

 Anticipatory scheduling implementation



All 65 patches:


kgdb.patch

devfs-fix.patch

deadline-np-42.patch
  (undescribed patch)

deadline-np-43.patch
  (undescribed patch)

setuid-exec-no-lock_kernel.patch
  remove lock_kernel() from exec of setuid apps

buffer-debug.patch
  buffer.c debugging

warn-null-wakeup.patch

reiserfs-readpages.patch
  reiserfs v3 readpages support

fadvise.patch
  implement posix_fadvise64()

ext3-scheduling-storm.patch
  ext3: fix scheduling storm and lockups

auto-unplug.patch
  self-unplugging request queues

less-unplugging.patch
  Remove most of the blk_run_queues() calls

lockless-current_kernel_time.patch
  Lockless current_kernel_time()

scheduler-tunables.patch
  scheduler tunables

htlb-2.patch
  hugetlb: fix MAP_FIXED handling

kirq.patch

kirq-up-fix.patch
  Subject: Re: 2.5.59-mm1

ext3-truncate-ordered-pages.patch
  ext3: explicitly free truncated pages

prune-icache-stats.patch
  add stats for page reclaim via inode freeing

vma-file-merge.patch

mmap-whitespace.patch

read_cache_pages-cleanup.patch
  cleanup in read_cache_pages()

remove-GFP_HIGHIO.patch
  remove __GFP_HIGHIO

quota-lockfix.patch
  quota locking fix

quota-offsem.patch
  quota semaphore fix

oprofile-p4.patch

oprofile_cpu-as-string.patch
  oprofile cpu-as-string

preempt-locking.patch
  Subject: spinlock efficiency problem [was 2.5.57 IO slowdown with CONFIG_PREEMPT enabled]

wli-11_pgd_ctor.patch
  (undescribed patch)

wli-11_pgd_ctor-update.patch
  pgd_ctor update

stack-overflow-fix.patch
  stack overflow checking fix

ext2-allocation-failure-fix.patch
  Subject: [PATCH] ext2 allocation failures

ext2_new_block-fixes.patch
  ext2_new_block cleanups and fixes

hangcheck-timer.patch
  hangcheck-timer

slab-irq-fix.patch
  slab IRQ fix

Richard_Henderson_for_President.patch
  Subject: [PATCH] Richard Henderson for President!

parenthesise-pgd_index.patch
  Subject: i386 pgd_index() doesn't parenthesize its arg

sendfile-security-hooks.patch
  Subject: [RFC][PATCH] Restore LSM hook calls to sendfile

macro-double-eval-fix.patch
  Subject: Re: i386 pgd_index() doesn't parenthesize its arg

mmzone-parens.patch
  asm-i386/mmzone.h macro paren/eval fixes

blkdev-fixes.patch
  blkdev.h fixes

remove-will_become_orphaned_pgrp.patch
  remove will_become_orphaned_pgrp()

buffer-io-accounting.patch
  correct wait accounting in wait_on_buffer()

aic79xx-linux-2.5.59-20030122.patch
  aic7xxx update

MAX_IO_APICS-ifdef.patch
  MAX_IO_APICS #ifdef'd wrongly

dac960-error-retry.patch
  Subject: [PATCH] linux2.5.56 patch to DAC960 driver for error retry

topology-remove-underbars.patch
  Remove __ from topology macros

mandlock-oops-fix.patch
  ftruncate/truncate oopses with mandatory locking

put_user-warning-fix.patch
  Subject: Re: Linux 2.5.59

reiserfs_file_write.patch
  Subject: reiserfs file_write patch

vmlinux-fix.patch
  vmlinux fix

smalldevfs.patch
  smalldevfs

sound-firmware-load-fix.patch
  soundcore.c referenced non-existent errno variable

generic_file_readonly_mmap-fix.patch
  Fix generic_file_readonly_mmap()

seq_file-page-defn.patch
  Include <asm/page.h> in fs/seq_file.c, as it uses PAGE_SIZE

exit_mmap-fix-ppc64.patch

exit_mmap-ia64-fix.patch
  Fix ia64's 64bit->32bit app switching

show_task-fix.patch
  Subject: [PATCH] 2.5.59: show_task() oops

scsi-iothread.patch
  scsi_eh_* needs to run even during suspend

numaq-ioapic-fix2.patch
  NUMAQ io_apic programming fix

misc.patch
  misc fixes

writeback-sync-cleanup.patch

dont-wait-on-inode.patch

unlink-latency-fix.patch

anticipatory_io_scheduling-2_5_59-mm3.patch
  Subject: [PATCH] 2.5.59-mm3 antic io sched




Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds