From: Ian Campbell <Ian.Campbell@citrix.com>
To: David Miller <email@example.com>
Subject: [PATCH 0/4] skb paged fragment destructors
Date: Wed, 9 Nov 2011 15:01:35 +0000
Cc: Jesse Brandeburg <firstname.lastname@example.org>, <email@example.com>
The following series makes use of the skb fragment API (which is in 3.2) to add a per-paged-fragment destructor callback. This can be used by creators of skbs who are interested in the lifecycle of the pages included in that skb after they have handed it off to the network stack. I think these have all been posted before, but have been backed up behind the skb fragment API.

An earlier mail (see the links at the end) contains some more background and rationale, but basically the completed series will allow entities which inject pages into the networking stack to receive a notification when the stack has really finished with those pages (i.e. including retransmissions, clones, pull-ups etc.) and not just when the original skb is finished with. This is beneficial to the many subsystems which wish to inject pages into the network stack without giving up full ownership of those pages' lifecycle. It implements something broadly along the lines of what was described in the earlier netdev discussion linked below.

I have also included a patch to the RPC subsystem which uses this API to fix the bug I describe in the linux-nfs thread linked below.

I presented this work at LPC in September and there was a question/concern raised (by Jesse Brandeburg, IIRC) regarding the overhead of adding this extra field per fragment. If I understand correctly, there have been performance regressions in the past when allocations outgrew one allocation size bucket and therefore started using the next.
The change in data structure size resulting from this series is:

                                            BEFORE  AFTER
  AMD64: sizeof(struct skb_frag_struct)  =      16     24
         sizeof(struct skb_shared_info)  =     344    488
         sizeof(struct sk_buff)          =     240    240

  i386:  sizeof(struct skb_frag_struct)  =       8     12
         sizeof(struct skb_shared_info)  =     188    260
         sizeof(struct sk_buff)          =     192    192

(I think these are representative of 32- and 64-bit arches generally.)

On amd64 this doesn't in itself push shared_info over a slab boundary, but since the linear data is part of the same allocation, the amount of linear data which will push us into the next size is reduced from 168 to 24 bytes, which is effectively the same thing as pushing directly into the next size. On i386 we go straight to the next bucket (although the 68 bytes of available slack for the linear area becomes 252 in that larger size).

I'm not sure if this is a showstopper, or whether the particular issue with slab still exists (or maybe it was slab/slub/slob specific?). I need to find some benchmark which might demonstrate the issue (presumably something where frames are commonly 24 < size < 168). Jesse, any hints on how to test this, or references to the previous occurrence(s), would be gratefully accepted.

Possible solutions all seem a bit fugly:

 * suitably sized slab caches appropriate to these new sizes (no suitable number leaps out at me...)
 * split the linear data allocation and the shinfo allocation into two. I suspect this will have its own performance implications. On the positive side, skb_shared_info could come from its own fixed-size pool/cache, which might have some benefits
 * steal a bit in the pointer to indicate a destructor pointer vs. a regular struct page pointer (moving the struct page into the destructor data structure for that case). This stops us sharing a single destructor between multiple pages, but that isn't critical
 * add a bitmap to shinfo indicating the kind of each frag.
   Requires modification of anywhere which propagates the page member (doable, but another huge series)
 * some sort of pointer compression scheme?

I'm sure there are others I'm not seeing right now.

Cheers,
Ian.

 http://marc.info/?l=linux-netdev&m=131072801125521&...
 http://marc.info/?l=linux-netdev&m=130925719513084&...
 http://marc.info/?l=linux-nfs&m=122424132729720&w=2