[PATCH] make the pagecache lock irq-safe.

From:  Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
To:  bk-commits-head-AT-vger.kernel.org
Subject:  [PATCH] make the pagecache lock irq-safe.
Date:  Mon, 12 Apr 2004 20:10:41 +0000

ChangeSet 1.1928, 2004/04/12 13:10:41-07:00, akpm@osdl.org

	[PATCH] make the pagecache lock irq-safe.
	
	Intro to these patches:
	
	- Major surgery against the pagecache, radix-tree and writeback code.  This
	  work is to address the O_DIRECT-vs-buffered data exposure horrors which
	  we've been struggling with for months.
	
	  As a side-effect, 32 bytes are saved from struct inode and eight bytes
	  are removed from struct page.  This comes at a cost of approximately 2.5
	  bits per page in the radix tree nodes on 4k pagesize, assuming the
	  pagecache is densely populated.  Not all pages are pagecache; other pages
	  gain the full 8 byte saving.
	
	  This change will break any arch code which is using page->list and will
	  also break any arch code which is using page->lru of memory which was
	  obtained from slab.
	
	  The basic problem which we (mainly Daniel McNeil) have been struggling
	  with is in getting a really reliable fsync() across the page lists while
	  other processes are performing writeback against the same file.  It's like
	  juggling four bars of wet soap with your eyes shut while someone is
	  whacking you with a baseball bat.  Daniel pretty much has the problem
	  plugged but I suspect that's just because we don't have testcases to
	  trigger the remaining problems.  The complexity and additional locking
	  which those patches add is worrisome.
	
	  So the approach taken here is to remove the page lists altogether and
	  replace the list-based writeback and wait operations with in-order
	  radix-tree walks.
	
	  The radix-tree code has been enhanced to support "tagging" of pages, for
	  later searches for pages which have a particular tag set.  This means that
	  we can ask the radix tree code "find me the next 16 dirty pages starting at
	  pagecache index N" and it will do that in O(log64(N)) time.
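
As a rough illustration of the tagged lookup described above, here is a minimal user-space sketch in C. It is not the kernel's radix-tree code: the fixed two-level layout and the names `toy_root`, `toy_leaf`, `toy_insert`, `toy_tag_dirty` and `toy_gang_lookup_dirty` are hypothetical, and the tree only covers indices 0 to 4095. Each node keeps a 64-bit dirty bitmap next to its 64 slots, so a search can skip a clean subtree with a single test.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FANOUT 64	/* slots per node, as in the log64 above */

struct toy_leaf {
	void *slot[TOY_FANOUT];
	uint64_t dirty;		/* bit i set: slot[i] is tagged dirty */
};

struct toy_root {
	struct toy_leaf *child[TOY_FANOUT];
	uint64_t dirty;		/* bit i set: child[i] holds a dirty slot */
};

/* Store item at index idx.  Returns 0 on success, -1 on failure. */
static int toy_insert(struct toy_root *root, unsigned int idx, void *item)
{
	unsigned int hi = idx / TOY_FANOUT, lo = idx % TOY_FANOUT;

	if (hi >= TOY_FANOUT)
		return -1;	/* outside the range this toy tree covers */
	if (!root->child[hi]) {
		root->child[hi] = calloc(1, sizeof(struct toy_leaf));
		if (!root->child[hi])
			return -1;
	}
	root->child[hi]->slot[lo] = item;
	return 0;
}

/* Tag an existing entry dirty and propagate the tag to the root. */
static void toy_tag_dirty(struct toy_root *root, unsigned int idx)
{
	unsigned int hi = idx / TOY_FANOUT, lo = idx % TOY_FANOUT;

	if (hi >= TOY_FANOUT || !root->child[hi] || !root->child[hi]->slot[lo])
		return;
	root->child[hi]->dirty |= UINT64_C(1) << lo;
	root->dirty |= UINT64_C(1) << hi;
}

/*
 * Collect up to max dirty indices >= start, in ascending index order.
 * Returns the number of indices written to out[].
 */
static unsigned int toy_gang_lookup_dirty(const struct toy_root *root,
		unsigned int start, unsigned int *out, unsigned int max)
{
	unsigned int first = start / TOY_FANOUT, hi, lo, n = 0;

	assert(out != NULL);
	for (hi = first; hi < TOY_FANOUT && n < max; hi++) {
		if (!(root->dirty & (UINT64_C(1) << hi)))
			continue;	/* whole subtree is clean: skip it */
		lo = (hi == first) ? start % TOY_FANOUT : 0;
		for (; lo < TOY_FANOUT && n < max; lo++)
			if (root->child[hi]->dirty & (UINT64_C(1) << lo))
				out[n++] = hi * TOY_FANOUT + lo;
	}
	return n;
}
```

If pages 900, 3 and 512 are inserted and then 900 and 3 are tagged dirty in that order, a lookup starting at index 0 returns 3 followed by 900. The results come back in index order regardless of the order in which the pages were dirtied, and the clean page at 512 is never visited slot by slot.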
	
	  This affects I/O scheduling potentially quite significantly.  It is no
	  longer the case that the kernel will submit pages for I/O in the order in
	  which the application dirtied them.  We instead submit them in file-offset
	  order all the time.
	
	  This is likely to be advantageous when applications are seeking all over
	  a large file randomly writing small amounts of data.  I haven't performed
	  much benchmarking, but tiobench random write throughput seems to be
	  increased by 30%.  Other tests appear to be unaltered.  dbench may have got
	  10-20% quicker, but it's variable.
	
	  There is one large file which everyone seeks all over randomly writing
	  small amounts of data: the blockdev mapping which caches filesystem
	  metadata.  The kernel's IO submission patterns for this are now ideal.
	
	
	  Because writeback and wait-for-writeback use a tree walk instead of a
	  list walk they are no longer livelockable.  This probably means that we no
	  longer need to hold i_sem across O_SYNC writes and perhaps fsync() and
	  fdatasync().  This may be beneficial for databases: multiple processes
	  writing and syncing different parts of the same file at the same time can
	  now all submit and wait upon writes to just their own little bit of the
	  file, so we can get a lot more data into the queues.
	
	  It is trivial to implement a part-file-fdatasync() as well, so
	  applications can say "sync the file from byte N to byte M", and multiple
	  applications can do this concurrently.  This is easy for ext2 filesystems,
	  but probably needs lots of work for data-journalled filesystems and XFS and
	  it probably doesn't offer much benefit over an i_semless O_SYNC write.
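
To make the "sync the file from byte N to byte M" idea concrete, the sketch below maps an inclusive byte range onto the range of pagecache indices that a tree walk would have to cover. It assumes a 4k page size; `TOY_PAGE_SHIFT`, `struct toy_page_range` and `toy_byte_range_to_pages` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4k pages */

struct toy_page_range {
	uint64_t first;		/* index of the page holding byte N */
	uint64_t last;		/* index of the page holding byte M */
};

/* Map the inclusive byte range [n, m] to an inclusive page index range. */
static struct toy_page_range toy_byte_range_to_pages(uint64_t n, uint64_t m)
{
	struct toy_page_range r;

	assert(n <= m);
	r.first = n >> TOY_PAGE_SHIFT;
	r.last = m >> TOY_PAGE_SHIFT;
	return r;
}
```

Under these assumptions, one process syncing bytes 0 to 4095 covers only page 0, while another syncing bytes 8192 to 12287 covers only page 2, so the two walks never touch the same pages.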
	
	
	  These patches can end up making ext3 (even) slower:
	
		for i in 1 2 3 4
		do
			dd if=/dev/zero of=$i bs=1M count=2000 &
		done
	
	  runs awfully slow on SMP.  This is, yet again, because all the file
	  blocks are jumbled up and the per-file linear writeout causes tons of
	  seeking.  The above test runs sweetly on UP because on UP we don't
	  allocate blocks to different files in parallel.
	
	  Mingming and Badari are working on getting block reservation working for
	  ext3 (preallocation on steroids).  That should fix ext3 up.
	
	
	This patch:
	
	- Later, we'll need to access the radix trees from inside disk I/O
	  completion handlers.  So make mapping->page_lock irq-safe, and rename it
	  to tree_lock so that any missed conversion fails to compile.
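
The reason for the conversion can be shown with a toy single-CPU model in user-space C. This is not kernel code: `struct toy_cpu` and the `toy_*` helpers are hypothetical. The model counts a "deadlock" whenever an interrupt handler tries to take a lock that the interrupted code already holds, which is the situation a plain spin_lock() would permit once completion handlers start taking tree_lock.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	bool irqs_enabled;
	bool irq_pending;	/* raised while interrupts were disabled */
	bool lock_held;
	int handled;		/* handler runs that got the lock */
	int deadlocks;		/* handler runs that would spin forever */
};

/* The I/O completion handler: it needs the lock to update the tree. */
static void toy_irq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held)
		cpu->deadlocks++;	/* the holder can never run to unlock */
	else
		cpu->handled++;		/* lock, update, unlock */
}

/* Deliver an interrupt now if possible, otherwise leave it pending. */
static void toy_raise_irq(struct toy_cpu *cpu)
{
	if (cpu->irqs_enabled)
		toy_irq_handler(cpu);
	else
		cpu->irq_pending = true;
}

static void toy_spin_lock(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held);
	cpu->lock_held = true;
}

static void toy_spin_unlock(struct toy_cpu *cpu)
{
	cpu->lock_held = false;
}

/* Models spin_lock_irq(): disable interrupts, then take the lock. */
static void toy_spin_lock_irq(struct toy_cpu *cpu)
{
	cpu->irqs_enabled = false;
	toy_spin_lock(cpu);
}

/* Models spin_unlock_irq(): drop the lock, then re-enable interrupts. */
static void toy_spin_unlock_irq(struct toy_cpu *cpu)
{
	toy_spin_unlock(cpu);
	cpu->irqs_enabled = true;
	if (cpu->irq_pending) {
		cpu->irq_pending = false;
		toy_irq_handler(cpu);
	}
}
```

In this model, raising an interrupt between toy_spin_lock() and toy_spin_unlock() records a deadlock, while the same interrupt raised inside the toy_spin_lock_irq() section is held back and handled cleanly after the unlock. Note that the patch uses the _irq variants, which re-enable interrupts unconditionally on unlock, and that where two tree_locks nest (mm/swap_state.c) only the outer one uses the _irq form because interrupts are already off when the inner lock is taken.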


# This patch includes the following deltas:
#	           ChangeSet	1.1927  -> 1.1928 
#	      fs/cifs/file.c	1.41    -> 1.42   
#	      mm/readahead.c	1.41    -> 1.42   
#	         mm/vmscan.c	1.199   -> 1.200  
#	       mm/swapfile.c	1.92    -> 1.93   
#	          fs/mpage.c	1.45    -> 1.46   
#	  include/linux/fs.h	1.299   -> 1.300  
#	        mm/filemap.c	1.230   -> 1.231  
#	     mm/swap_state.c	1.62    -> 1.63   
#	           ipc/shm.c	1.34    -> 1.35   
#	         fs/buffer.c	1.226   -> 1.227  
#	 mm/page-writeback.c	1.77    -> 1.78   
#	       mm/truncate.c	1.12    -> 1.13   
#	          fs/inode.c	1.115   -> 1.116  
#	   fs/fs-writeback.c	1.46    -> 1.47   
#

 fs/buffer.c         |    8 ++++----
 fs/cifs/file.c      |   10 +---------
 fs/fs-writeback.c   |    4 ++--
 fs/inode.c          |    2 +-
 fs/mpage.c          |   10 +++++-----
 include/linux/fs.h  |    2 +-
 ipc/shm.c           |    2 --
 mm/filemap.c        |   50 +++++++++++++++++++++++++-------------------------
 mm/page-writeback.c |   10 +++++-----
 mm/readahead.c      |    8 ++++----
 mm/swap_state.c     |   22 +++++++++++-----------
 mm/swapfile.c       |    8 ++++----
 mm/truncate.c       |    8 ++++----
 mm/vmscan.c         |   13 ++++---------
 14 files changed, 71 insertions(+), 86 deletions(-)


diff -Nru a/fs/buffer.c b/fs/buffer.c
--- a/fs/buffer.c	Tue Apr 13 01:32:22 2004
+++ b/fs/buffer.c	Tue Apr 13 01:32:22 2004
@@ -396,7 +396,7 @@
  * Hack idea: for the blockdev mapping, i_bufferlist_lock contention
  * may be quite high.  This code could TryLock the page, and if that
  * succeeds, there is no need to take private_lock. (But if
- * private_lock is contended then so is mapping->page_lock).
+ * private_lock is contended then so is mapping->tree_lock).
  */
 static struct buffer_head *
 __find_get_block_slow(struct block_device *bdev, sector_t block, int unused)
@@ -867,14 +867,14 @@
 	spin_unlock(&mapping->private_lock);
 
 	if (!TestSetPageDirty(page)) {
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		if (page->mapping) {	/* Race with truncate? */
 			if (!mapping->backing_dev_info->memory_backed)
 				inc_page_state(nr_dirty);
 			list_del(&page->list);
 			list_add(&page->list, &mapping->dirty_pages);
 		}
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 	}
 	
@@ -1254,7 +1254,7 @@
  * inode to its superblock's dirty inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
- * mapping->page_lock and the global inode_lock.
+ * mapping->tree_lock and the global inode_lock.
  */
 void fastcall mark_buffer_dirty(struct buffer_head *bh)
 {
diff -Nru a/fs/cifs/file.c b/fs/cifs/file.c
--- a/fs/cifs/file.c	Tue Apr 13 01:32:22 2004
+++ b/fs/cifs/file.c	Tue Apr 13 01:32:22 2004
@@ -898,11 +898,9 @@
 		if(list_empty(pages))
 			break;
 
-		spin_lock(&mapping->page_lock);
 		page = list_entry(pages->prev, struct page, list);
 
 		list_del(&page->list);
-		spin_unlock(&mapping->page_lock);
 
 		if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
 			page_cache_release(page);
@@ -962,14 +960,10 @@
 	pagevec_init(&lru_pvec, 0);
 
 	for(i = 0;i<num_pages;) {
-		spin_lock(&mapping->page_lock);
-		if(list_empty(page_list)) {
-			spin_unlock(&mapping->page_lock);
+		if(list_empty(page_list))
 			break;
-		}
 		page = list_entry(page_list->prev, struct page, list);
 		offset = (loff_t)page->index << PAGE_CACHE_SHIFT;
-	        spin_unlock(&mapping->page_lock);
 
 		/* for reads over a certain size could initiate async read ahead */
 
@@ -989,12 +983,10 @@
 			cFYI(1,("Read error in readpages: %d",rc));
 			/* clean up remaing pages off list */
             
-			spin_lock(&mapping->page_lock);
 			while (!list_empty(page_list) && (i < num_pages)) {
 				page = list_entry(page_list->prev, struct page, list);
 				list_del(&page->list);
 			}
-			spin_unlock(&mapping->page_lock);
 			break;
 		} else if (bytes_read > 0) {
 			pSMBr = (struct smb_com_read_rsp *)smb_read_data;
diff -Nru a/fs/fs-writeback.c b/fs/fs-writeback.c
--- a/fs/fs-writeback.c	Tue Apr 13 01:32:22 2004
+++ b/fs/fs-writeback.c	Tue Apr 13 01:32:22 2004
@@ -159,10 +159,10 @@
 	 * read speculatively by this cpu before &= ~I_DIRTY  -- mikulas
 	 */
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))
 		list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	spin_unlock(&inode_lock);
 
 	ret = do_writepages(mapping, wbc);
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Tue Apr 13 01:32:22 2004
+++ b/fs/inode.c	Tue Apr 13 01:32:22 2004
@@ -187,7 +187,7 @@
 	sema_init(&inode->i_sem, 1);
 	init_rwsem(&inode->i_alloc_sem);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
-	spin_lock_init(&inode->i_data.page_lock);
+	spin_lock_init(&inode->i_data.tree_lock);
 	init_MUTEX(&inode->i_data.i_shared_sem);
 	atomic_set(&inode->i_data.truncate_count, 0);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
diff -Nru a/fs/mpage.c b/fs/mpage.c
--- a/fs/mpage.c	Tue Apr 13 01:32:22 2004
+++ b/fs/mpage.c	Tue Apr 13 01:32:22 2004
@@ -635,7 +635,7 @@
 	if (get_block == NULL)
 		writepage = mapping->a_ops->writepage;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	while (!list_empty(&mapping->io_pages) && !done) {
 		struct page *page = list_entry(mapping->io_pages.prev,
 					struct page, list);
@@ -655,10 +655,10 @@
 		list_add(&page->list, &mapping->locked_pages);
 
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 
 		/*
-		 * At this point we hold neither mapping->page_lock nor
+		 * At this point we hold neither mapping->tree_lock nor
 		 * lock on the page itself: the page may be truncated or
 		 * invalidated (changing page->mapping to NULL), or even
 		 * swizzled back from swapper_space to tmpfs file mapping.
@@ -695,12 +695,12 @@
 			unlock_page(page);
 		}
 		page_cache_release(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 	}
 	/*
 	 * Leave any remaining dirty pages on ->io_pages
 	 */
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	if (bio)
 		mpage_bio_submit(WRITE, bio);
 	return ret;
diff -Nru a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h	Tue Apr 13 01:32:22 2004
+++ b/include/linux/fs.h	Tue Apr 13 01:32:22 2004
@@ -322,7 +322,7 @@
 struct address_space {
 	struct inode		*host;		/* owner: inode, block_device */
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
-	spinlock_t		page_lock;	/* and spinlock protecting it */
+	spinlock_t		tree_lock;	/* and spinlock protecting it */
 	struct list_head	clean_pages;	/* list of clean pages */
 	struct list_head	dirty_pages;	/* list of dirty pages */
 	struct list_head	locked_pages;	/* list of locked pages */
diff -Nru a/ipc/shm.c b/ipc/shm.c
--- a/ipc/shm.c	Tue Apr 13 01:32:22 2004
+++ b/ipc/shm.c	Tue Apr 13 01:32:22 2004
@@ -380,9 +380,7 @@
 
 		if (is_file_hugepages(shp->shm_file)) {
 			struct address_space *mapping = inode->i_mapping;
-			spin_lock(&mapping->page_lock);
 			*rss += (HPAGE_SIZE/PAGE_SIZE)*mapping->nrpages;
-			spin_unlock(&mapping->page_lock);
 		} else {
 			struct shmem_inode_info *info = SHMEM_I(inode);
 			spin_lock(&info->lock);
diff -Nru a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c	Tue Apr 13 01:32:22 2004
+++ b/mm/filemap.c	Tue Apr 13 01:32:22 2004
@@ -59,7 +59,7 @@
  *    ->private_lock		(__free_pte->__set_page_dirty_buffers)
  *      ->swap_list_lock
  *        ->swap_device_lock	(exclusive_swap_page, others)
- *          ->mapping->page_lock
+ *          ->mapping->tree_lock
  *
  *  ->i_sem
  *    ->i_shared_sem		(truncate->invalidate_mmap_range)
@@ -78,12 +78,12 @@
  *
  *  ->inode_lock
  *    ->sb_lock			(fs/fs-writeback.c)
- *    ->mapping->page_lock	(__sync_single_inode)
+ *    ->mapping->tree_lock	(__sync_single_inode)
  *
  *  ->page_table_lock
  *    ->swap_device_lock	(try_to_unmap_one)
  *    ->private_lock		(try_to_unmap_one)
- *    ->page_lock		(try_to_unmap_one)
+ *    ->tree_lock		(try_to_unmap_one)
  *    ->zone.lru_lock		(follow_page->mark_page_accessed)
  *
  *  ->task->proc_lock
@@ -93,7 +93,7 @@
 /*
  * Remove a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe.  The caller must hold a write_lock on the mapping's page_lock.
+ * is safe.  The caller must hold a write_lock on the mapping's tree_lock.
  */
 void __remove_from_page_cache(struct page *page)
 {
@@ -114,9 +114,9 @@
 	if (unlikely(!PageLocked(page)))
 		PAGE_BUG(page);
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	__remove_from_page_cache(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 }
 
 static inline int sync_page(struct page *page)
@@ -148,9 +148,9 @@
 	if (mapping->backing_dev_info->memory_backed)
 		return 0;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	ret = do_writepages(mapping, &wbc);
 	return ret;
 }
@@ -185,7 +185,7 @@
 
 restart:
 	progress = 0;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
         while (!list_empty(&mapping->locked_pages)) {
 		struct page *page;
 
@@ -199,7 +199,7 @@
 		if (!PageWriteback(page)) {
 			if (++progress > 32) {
 				if (need_resched()) {
-					spin_unlock(&mapping->page_lock);
+					spin_unlock_irq(&mapping->tree_lock);
 					__cond_resched();
 					goto restart;
 				}
@@ -209,16 +209,16 @@
 
 		progress = 0;
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 
 		wait_on_page_writeback(page);
 		if (PageError(page))
 			ret = -EIO;
 
 		page_cache_release(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 
 	/* Check for outstanding write errors */
 	if (test_and_clear_bit(AS_ENOSPC, &mapping->flags))
@@ -267,7 +267,7 @@
 
 	if (error == 0) {
 		page_cache_get(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		error = radix_tree_insert(&mapping->page_tree, offset, page);
 		if (!error) {
 			SetPageLocked(page);
@@ -275,7 +275,7 @@
 		} else {
 			page_cache_release(page);
 		}
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();
 	}
 	return error;
@@ -411,11 +411,11 @@
 	 * We scan the hash list read-only. Addition to and removal from
 	 * the hash-list needs a held write-lock.
 	 */
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page)
 		page_cache_get(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
 
@@ -428,11 +428,11 @@
 {
 	struct page *page;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page && TestSetPageLocked(page))
 		page = NULL;
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
 
@@ -454,15 +454,15 @@
 {
 	struct page *page;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 repeat:
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page) {
 		page_cache_get(page);
 		if (TestSetPageLocked(page)) {
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			lock_page(page);
-			spin_lock(&mapping->page_lock);
+			spin_lock_irq(&mapping->tree_lock);
 
 			/* Has the page been truncated while we slept? */
 			if (page->mapping != mapping || page->index != offset) {
@@ -472,7 +472,7 @@
 			}
 		}
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
 
@@ -546,12 +546,12 @@
 	unsigned int i;
 	unsigned int ret;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, start, nr_pages);
 	for (i = 0; i < ret; i++)
 		page_cache_get(pages[i]);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return ret;
 }
 
diff -Nru a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c	Tue Apr 13 01:32:22 2004
+++ b/mm/page-writeback.c	Tue Apr 13 01:32:22 2004
@@ -472,12 +472,12 @@
 	if (wait)
 		wait_on_page_writeback(page);
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	list_del(&page->list);
 	if (test_clear_page_dirty(page)) {
 		list_add(&page->list, &mapping->locked_pages);
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		ret = mapping->a_ops->writepage(page, &wbc);
 		if (ret == 0 && wait) {
 			wait_on_page_writeback(page);
@@ -487,7 +487,7 @@
 		page_cache_release(page);
 	} else {
 		list_add(&page->list, &mapping->clean_pages);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		unlock_page(page);
 	}
 	return ret;
@@ -515,7 +515,7 @@
 		struct address_space *mapping = page->mapping;
 
 		if (mapping) {
-			spin_lock(&mapping->page_lock);
+			spin_lock_irq(&mapping->tree_lock);
 			if (page->mapping) {	/* Race with truncate? */
 				BUG_ON(page->mapping != mapping);
 				if (!mapping->backing_dev_info->memory_backed)
@@ -523,7 +523,7 @@
 				list_del(&page->list);
 				list_add(&page->list, &mapping->dirty_pages);
 			}
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			if (!PageSwapCache(page))
 				__mark_inode_dirty(mapping->host,
 							I_DIRTY_PAGES);
diff -Nru a/mm/readahead.c b/mm/readahead.c
--- a/mm/readahead.c	Tue Apr 13 01:32:22 2004
+++ b/mm/readahead.c	Tue Apr 13 01:32:22 2004
@@ -230,7 +230,7 @@
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
 		unsigned long page_offset = offset + page_idx;
 		
@@ -241,16 +241,16 @@
 		if (page)
 			continue;
 
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		page = page_cache_alloc_cold(mapping);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		if (!page)
 			break;
 		page->index = page_offset;
 		list_add(&page->list, &page_pool);
 		ret++;
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 
 	/*
 	 * Now start the IO.  We ignore I/O errors - if the page is not
diff -Nru a/mm/swap_state.c b/mm/swap_state.c
--- a/mm/swap_state.c	Tue Apr 13 01:32:22 2004
+++ b/mm/swap_state.c	Tue Apr 13 01:32:22 2004
@@ -25,7 +25,7 @@
 
 struct address_space swapper_space = {
 	.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC),
-	.page_lock	= SPIN_LOCK_UNLOCKED,
+	.tree_lock	= SPIN_LOCK_UNLOCKED,
 	.clean_pages	= LIST_HEAD_INIT(swapper_space.clean_pages),
 	.dirty_pages	= LIST_HEAD_INIT(swapper_space.dirty_pages),
 	.io_pages	= LIST_HEAD_INIT(swapper_space.io_pages),
@@ -182,9 +182,9 @@
   
 	entry.val = page->index;
 
-	spin_lock(&swapper_space.page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
 	__delete_from_swap_cache(page);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 
 	swap_free(entry);
 	page_cache_release(page);
@@ -195,8 +195,8 @@
 	struct address_space *mapping = page->mapping;
 	int err;
 
-	spin_lock(&swapper_space.page_lock);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
+	spin_lock(&mapping->tree_lock);
 
 	err = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
 	if (!err) {
@@ -204,8 +204,8 @@
 		___add_to_page_cache(page, &swapper_space, entry.val);
 	}
 
-	spin_unlock(&mapping->page_lock);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock(&mapping->tree_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 
 	if (!err) {
 		if (!swap_duplicate(entry))
@@ -231,8 +231,8 @@
 
 	entry.val = page->index;
 
-	spin_lock(&swapper_space.page_lock);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
+	spin_lock(&mapping->tree_lock);
 
 	err = radix_tree_insert(&mapping->page_tree, index, page);
 	if (!err) {
@@ -240,8 +240,8 @@
 		___add_to_page_cache(page, mapping, index);
 	}
 
-	spin_unlock(&mapping->page_lock);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock(&mapping->tree_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 
 	if (!err) {
 		swap_free(entry);
diff -Nru a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c	Tue Apr 13 01:32:22 2004
+++ b/mm/swapfile.c	Tue Apr 13 01:32:22 2004
@@ -253,10 +253,10 @@
 		/* Is the only swap cache user the cache itself? */
 		if (p->swap_map[swp_offset(entry)] == 1) {
 			/* Recheck the page count with the pagecache lock held.. */
-			spin_lock(&swapper_space.page_lock);
+			spin_lock_irq(&swapper_space.tree_lock);
 			if (page_count(page) - !!PagePrivate(page) == 2)
 				retval = 1;
-			spin_unlock(&swapper_space.page_lock);
+			spin_unlock_irq(&swapper_space.tree_lock);
 		}
 		swap_info_put(p);
 	}
@@ -324,13 +324,13 @@
 	retval = 0;
 	if (p->swap_map[swp_offset(entry)] == 1) {
 		/* Recheck the page count with the pagecache lock held.. */
-		spin_lock(&swapper_space.page_lock);
+		spin_lock_irq(&swapper_space.tree_lock);
 		if ((page_count(page) == 2) && !PageWriteback(page)) {
 			__delete_from_swap_cache(page);
 			SetPageDirty(page);
 			retval = 1;
 		}
-		spin_unlock(&swapper_space.page_lock);
+		spin_unlock_irq(&swapper_space.tree_lock);
 	}
 	swap_info_put(p);
 
diff -Nru a/mm/truncate.c b/mm/truncate.c
--- a/mm/truncate.c	Tue Apr 13 01:32:22 2004
+++ b/mm/truncate.c	Tue Apr 13 01:32:22 2004
@@ -62,7 +62,7 @@
  * This is for invalidate_inode_pages().  That function can be called at
  * any time, and is not supposed to throw away dirty pages.  But pages can
  * be marked dirty at any time too.  So we re-check the dirtiness inside
- * ->page_lock.  That provides exclusion against the __set_page_dirty
+ * ->tree_lock.  That provides exclusion against the __set_page_dirty
  * functions.
  */
 static int
@@ -74,13 +74,13 @@
 	if (PagePrivate(page) && !try_to_release_page(page, 0))
 		return 0;
 
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	if (PageDirty(page)) {
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		return 0;
 	}
 	__remove_from_page_cache(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	ClearPageUptodate(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Tue Apr 13 01:32:22 2004
+++ b/mm/vmscan.c	Tue Apr 13 01:32:22 2004
@@ -354,7 +354,6 @@
 				goto keep_locked;
 			if (!may_write_to_queue(mapping->backing_dev_info))
 				goto keep_locked;
-			spin_lock(&mapping->page_lock);
 			if (test_clear_page_dirty(page)) {
 				int res;
 				struct writeback_control wbc = {
@@ -364,9 +363,6 @@
 					.for_reclaim = 1,
 				};
 
-				list_move(&page->list, &mapping->locked_pages);
-				spin_unlock(&mapping->page_lock);
-
 				SetPageReclaim(page);
 				res = mapping->a_ops->writepage(page, &wbc);
 				if (res < 0)
@@ -381,7 +377,6 @@
 				}
 				goto keep;
 			}
-			spin_unlock(&mapping->page_lock);
 		}
 
 		/*
@@ -415,7 +410,7 @@
 		if (!mapping)
 			goto keep_locked;	/* truncate got there first */
 
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 
 		/*
 		 * The non-racy check for busy page.  It is critical to check
@@ -423,7 +418,7 @@
 		 * not in use by anybody. 	(pagecache + us == 2)
 		 */
 		if (page_count(page) != 2 || PageDirty(page)) {
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}
 
@@ -431,7 +426,7 @@
 		if (PageSwapCache(page)) {
 			swp_entry_t swap = { .val = page->index };
 			__delete_from_swap_cache(page);
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			swap_free(swap);
 			__put_page(page);	/* The pagecache ref */
 			goto free_it;
@@ -439,7 +434,7 @@
 #endif /* CONFIG_SWAP */
 
 		__remove_from_page_cache(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		__put_page(page);
 
 free_it:




Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds