I am curious as to why you allow only single outstanding SRCU callback per thread. The problem with RCU is that it allows basically unbounded number of outstanding callbacks, so why just not bound number of outstanding callbacks in SRCU? Memory blocks are frequently quite small, so that subsystem can tolerate up to let's say 1000 pending memory blocks. Restriction on single pending callback looks quite severe (may cause unnecessary blocking), why not provide:
int init_srcu_struct(struct srcu_struct *sp, int limit_of_pending_callbacks);
While limit is not reached synchronize_srcu() is non blocking, otherwise it waits for grace period. I think in many situation it will make synchronize_srcu() practically non-blocking.
Or it's just not worth doing (because of the additional implementation complexity)?
One more question (it does not directly relates to SRCU, but I remember you were providing some computations regarding required number of grace periods somewhere, I hope I am not mixing up your reasoning now).
You are removing all memory fences from reader side, including release fence in read_unlock(). In order to compensate this you are waiting for additional grace periods before executing callbacks. But on some architectures (IA-32, Intel 64, SPARC TSO) release fence is implied with every store, so isn't it possible to reduce the number of required grace periods before executing callbacks on these architectures?
I.e. something like:
#ifdef ACQUIRE_RELEASE_FENCES_ARE_IMPLIED_ON_ARCH // defined for x86 etc
#define NUMBER_OF_GRACE_PERIODS_BEFORE_EXECUTING_CALLBACKS 2
#define NUMBER_OF_GRACE_PERIODS_BEFORE_EXECUTING_CALLBACKS 4
Have you considered such variant? Is it worth doing?
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds