Kevent subsystem.
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: netdev@vger.kernel.org
Subject: [1/2] Kevent subsystem.
Date: Thu, 9 Feb 2006 16:56:19 +0300
Cc: David Miller <davem@davemloft.net>, Jamal Hadi Salim <hadi@cyberus.ca>, Stephen Hemminger <shemminger@osdl.org>
Kevent subsystem.

Here is an initial draft of the kevent subsystem, which incorporates several AIO/kqueue design notes and ideas. Kevent can be used for both edge- and level-triggered notifications. It supports socket notifications (accept and receive), inode notifications (create/remove), generic poll()/select() notifications, timer notifications and network AIO notifications.

storage - each source of events (socket, inode, timer, aio) has a struct kevent_storage embedded into it, which is essentially a list of registered interests for that source of events.
user - the abstraction which holds all requested kevents; it is similar to FreeBSD's kqueue.
kevent - a set of interests for a given source of events (storage).

Each kevent is currently queued into three lists:
 * kevent_user->kevent_list - list of all registered kevents.
 * kevent_user->ready_list - list of ready kevents.
 * kevent_storage->list - list of all interests for the given kevent_storage.

When a kevent is queued into a storage, it lives there until removed by kevent_dequeue(). When some activity is noticed in a given storage, its kevent_storage->list is scanned for kevents which match the activity event. If such kevents are found and they are not already in kevent_user->ready_list, they are appended to its tail.

ioctl(WAIT) waits until either the requested number of kevents is ready, or the timeout elapses, or at least one kevent is ready; its behaviour depends on the parameters. Any event can be added/removed/modified by ioctl. I suspect people do not like such an interface, but it is the simplest one. (A rough userspace sketch of driving this interface is appended at the end of this mail.)

The main goal is to implement an AIO set of networking operations.

Below are a couple of benchmarks of simple web servers based on kevent_poll and kevent_socket notifications, against the same server using epoll() and, just for reference (they are obviously in different weight classes), Apache2.

Running httperf with 30k connections, a maximum burst size of 3k connections and a 1 sec timeout between bursts, against a single-threaded handmade HTTP server (using sendfile() for the file) on a Xeon 2.4 GHz, 1 GB RAM, HT enabled, 1 Gbit network, with a modified version of the kevent_socket notification mechanism and userspace daemon:

[s0mbre@uganda httperf-0.8]$ ./httperf --server pcix --num-conns 30000 --rate 3000 --timeout 1
httperf --timeout=1 --client=0/1 --server=pcix --port=80 --uri=/ --rate=3000 --send-buffer=4096 \
 --recv-buffer=16384 --num-conns=30000 --num-calls=1
Request rate: 2623.7 req/s (0.4 ms/req)
Errors: total 1964 client-timo 555 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 1409

Unfortunately such high rates cannot be obtained all the time due to port/socket rollover. Both epoll and kevent_poll show about 1600-1800 requests per second in this setup, with a much higher number of errors (up to 7 times more).

More details [1], benchmarks [2] against FreeBSD kqueue [3] and some conclusions are available on the project's homepage:
[1] http://tservice.net.ru/~s0mbre/old/?section=projects&...
[2] http://tservice.net.ru/~s0mbre/blog/2006/01/16#2006_01_16_1 [3] http://tservice.net.ru/~s0mbre/blog/2006/01/21#2006_01_21 Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> diff --git a/fs/file_table.c b/fs/file_table.c index c3a5e2f..7d73a2b 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -89,6 +89,9 @@ struct file *get_empty_filp(void) if (security_file_alloc(f)) goto fail_sec; +#ifdef CONFIG_KEVENT_POLL + kevent_storage_init(KEVENT_POLL, KEVENT_MASK_EMPTY, f, &f->st); +#endif eventpoll_init_file(f); atomic_set(&f->f_count, 1); f->f_uid = current->fsuid; @@ -135,6 +138,9 @@ void fastcall __fput(struct file *file) might_sleep(); fsnotify_close(file); +#ifdef CONFIG_KEVENT_POLL + kevent_storage_fini(&file->st); +#endif /* * The function eventpoll_release() should be the first called * in the file cleanup chain. diff --git a/fs/inode.c b/fs/inode.c index d8d04bd..185bd67 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -22,6 +22,7 @@ #include <linux/cdev.h> #include <linux/bootmem.h> #include <linux/inotify.h> +#include <linux/kevent.h> /* * This is needed for the following functions: @@ -165,6 +166,9 @@ static struct inode *alloc_inode(struct } memset(&inode->u, 0, sizeof(inode->u)); inode->i_mapping = mapping; +#if defined CONFIG_KEVENT_INODE || defined CONFIG_KEVENT_SOCKET + kevent_storage_init(KEVENT_INODE, KEVENT_MASK_EMPTY, inode, &inode->st); +#endif } return inode; } @@ -173,6 +177,9 @@ void destroy_inode(struct inode *inode) { if (inode_has_buffers(inode)) BUG(); +#if defined CONFIG_KEVENT_INODE || defined CONFIG_KEVENT_SOCKET + kevent_storage_fini(&inode->st); +#endif security_inode_free(inode); if (inode->i_sb->s_op->destroy_inode) inode->i_sb->s_op->destroy_inode(inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index cc35b6a..6566600 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -224,6 +224,9 @@ extern int dir_notify_enable; #include <asm/atomic.h> #include <asm/semaphore.h> #include <asm/byteorder.h> +#ifdef CONFIG_KEVENT +#include <linux/kevent_storage.h> +#endif struct iovec; struct nameidata; @@ -483,6 +486,10 @@ struct inode { struct semaphore inotify_sem; /* protects the watches list */ #endif +#ifdef CONFIG_KEVENT_INODE + struct kevent_storage st; +#endif + unsigned long i_state; unsigned long dirtied_when; /* jiffies of first dirtying */ @@ -616,6 +623,9 @@ struct file { struct list_head f_ep_links; spinlock_t f_ep_lock; #endif /* #ifdef CONFIG_EPOLL */ +#ifdef CONFIG_KEVENT_POLL + struct kevent_storage st; +#endif struct address_space *f_mapping; }; extern spinlock_t files_lock; diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 03b8e79..85449ac 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -15,6 +15,7 @@ #include <linux/dnotify.h> #include <linux/inotify.h> +#include <linux/kevent.h> /* * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir @@ -56,6 +57,7 @@ static inline void fsnotify_nameremove(s isdir = IN_ISDIR; dnotify_parent(dentry, DN_DELETE); inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name); + kevent_inode_notify_parent(dentry, KEVENT_INODE_REMOVE); } /* @@ -65,6 +67,7 @@ static inline void fsnotify_inoderemove( { inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL); inotify_inode_is_dead(inode); + kevent_inode_remove(inode); } /* @@ -74,6 +77,7 @@ static inline void fsnotify_create(struc { inode_dir_notify(inode, DN_CREATE); inotify_inode_queue_event(inode, IN_CREATE, 0, name); + kevent_inode_notify(inode, 
KEVENT_INODE_CREATE); } /* @@ -83,6 +87,7 @@ static inline void fsnotify_mkdir(struct { inode_dir_notify(inode, DN_CREATE); inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0, name); + kevent_inode_notify(inode, KEVENT_INODE_CREATE); } /* diff --git a/include/linux/kevent.h b/include/linux/kevent.h new file mode 100644 index 0000000..3e164d1 --- /dev/null +++ b/include/linux/kevent.h @@ -0,0 +1,247 @@ +/* + * kevent.h + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef __KEVENT_H +#define __KEVENT_H + +/* + * Kevent request flags. + */ + +#define KEVENT_REQ_ONESHOT 0x1 /* Process this event only once and then dequeue. */ + +/* + * Kevent return flags. + */ +#define KEVENT_RET_BROKEN 0x1 /* Kevent is broken. */ +#define KEVENT_RET_DONE 0x2 /* Kevent processing was finished successfully. */ + +/* + * Kevent type set. + */ +enum { + KEVENT_SOCKET = 0, + KEVENT_INODE, + KEVENT_TIMER, + KEVENT_POLL, + + KEVENT_MAX, +}; + +/* + * Per-type event sets. + * Number of per-event sets should be exactly as number of kevent types. + */ + +/* + * Timer events. + */ +enum { + KEVENT_TIMER_FIRED = 0x1, +}; + +/* + * Socket events. + */ +enum { + KEVENT_SOCKET_RECV = 0x1, + KEVENT_SOCKET_ACCEPT = 0x2, +}; + +/* + * Inode events. + */ +enum { + KEVENT_INODE_CREATE = 0x1, + KEVENT_INODE_REMOVE = 0x2, +}; + +/* + * Poll events. + */ +enum { + KEVENT_POLL_POLLIN = 0x0001, + KEVENT_POLL_POLLPRI = 0x0002, + KEVENT_POLL_POLLOUT = 0x0004, + KEVENT_POLL_POLLERR = 0x0008, + KEVENT_POLL_POLLHUP = 0x0010, + KEVENT_POLL_POLLNVAL = 0x0020, + + KEVENT_POLL_POLLRDNORM = 0x0040, + KEVENT_POLL_POLLRDBAND = 0x0080, + KEVENT_POLL_POLLWRNORM = 0x0100, + KEVENT_POLL_POLLWRBAND = 0x0200, + KEVENT_POLL_POLLMSG = 0x0400, + KEVENT_POLL_POLLREMOVE = 0x1000, +}; + +#define KEVENT_MASK_ALL 0xffffffff /* Mask of all possible event values. */ +#define KEVENT_MASK_EMPTY 0x0 /* Empty mask of ready events. */ + +struct kevent_id +{ + __u32 raw[2]; +}; + +struct ukevent +{ + struct kevent_id id; /* Id of this request, e.g. socket number, file descriptor and so on... */ + __u32 type; /* Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on... */ + __u32 event; /* Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED... */ + __u32 req_flags; /* Per-event request flags */ + __u32 ret_flags; /* Per-event return flags */ + __u32 ret_data[2]; /* Event return data. Event originator fills it with anything it likes. */ + __u32 user[2]; /* User's data. It is not used, just copied to/from user. */ +}; + +enum { + KEVENT_CTL_ADD = 0, + KEVENT_CTL_REMOVE, + KEVENT_CTL_MODIFY, +}; + +struct kevent_user_control +{ + unsigned int cmd; /* Control command, e.g. KEVENT_ADD, KEVENT_REMOVE... */ + unsigned int num; /* Number of ukevents this strucutre controls. 
*/ + unsigned int timeout; /* Timeout in milliseconds waiting for "num" events to become ready. */ +}; + +#define KEVENT_USER_SYMBOL 'K' +#define KEVENT_USER_CTL _IOWR(KEVENT_USER_SYMBOL, 0, struct kevent_user_control) +#define KEVENT_USER_WAIT _IOWR(KEVENT_USER_SYMBOL, 1, struct kevent_user_control) + +#ifdef __KERNEL__ + +#include <linux/types.h> +#include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/kevent_storage.h> +#include <asm/semaphore.h> + +struct inode; +struct dentry; +struct sock; + +struct kevent; +struct kevent_storage; +typedef int (* kevent_callback_t)(struct kevent *); + +struct kevent +{ + struct ukevent event; + spinlock_t lock; /* This lock protects ukevent manipulations, e.g. ret_flags changes. */ + + struct list_head kevent_entry; /* Entry of user's queue. */ + struct list_head storage_entry; /* Entry of origin's queue. */ + struct list_head ready_entry; /* Entry of user's ready. */ + + struct kevent_user *user; /* User who requested this kevent. */ + struct kevent_storage *st; /* Kevent container. */ + + kevent_callback_t callback; /* Is called each time new event has been caught. */ + kevent_callback_t enqueue; /* Is called each time new event is queued. */ + kevent_callback_t dequeue; /* Is called each time new event is dequeued. */ + + void *priv; /* Private data for different storages. + * poll()/select storage has a list of wait_queue_t containers + * for each ->poll() { poll_wait()' } here. + */ +}; + +#define KEVENT_HASH_MASK 0xff + +struct kevent_list +{ + struct list_head kevent_list; /* List of all kevents. */ + spinlock_t kevent_lock; /* Protects all manipulations with queue of kevents. */ +}; + +struct kevent_user +{ + struct kevent_list kqueue[KEVENT_HASH_MASK+1]; + unsigned int kevent_num; /* Number of queued kevents. */ + + struct list_head ready_list; /* List of ready kevents. */ + unsigned int ready_num; /* Number of ready kevents. */ + spinlock_t ready_lock; /* Protects all manipulations with ready queue. */ + + unsigned int max_ready_num; /* Requested number of kevents. */ + + struct semaphore ctl_mutex; /* Protects against simultaneous kevent_user control manipulations. */ + struct semaphore wait_mutex; /* Protects against simultaneous kevent_user waits. */ + wait_queue_head_t wait; /* Wait until some events are ready. */ + + atomic_t refcnt; /* Reference counter, increased for each new kevent. 
*/ +}; + +#define KEVENT_MAX_REQUESTS PAGE_SIZE/sizeof(struct kevent) + +struct kevent *kevent_alloc(gfp_t mask); +void kevent_free(struct kevent *k); +int kevent_enqueue(struct kevent *k); +int kevent_dequeue(struct kevent *k); +int kevent_init(struct kevent *k); +void kevent_requeue(struct kevent *k); +int kevent_storage_enqueue(struct kevent_storage *st, struct kevent *k); +void kevent_storage_dequeue(struct kevent_storage *st, struct kevent *k); + +#define list_for_each_entry_reverse_safe(pos, n, head, member) \ + for (pos = list_entry((head)->prev, typeof(*pos), member), \ + n = list_entry(pos->member.prev, typeof(*pos), member); \ + prefetch(pos->member.prev), &pos->member != (head); \ + pos = n, n = list_entry(pos->member.prev, typeof(*pos), member)) + +int kevent_break(struct kevent *k); +int kevent_init(struct kevent *k); +int kevent_init_socket(struct kevent *k); +int kevent_init_inode(struct kevent *k); +int kevent_init_timer(struct kevent *k); +int kevent_init_poll(struct kevent *k); + +void kevent_storage_ready(struct kevent_storage *st, kevent_callback_t ready_callback, u32 event); +int kevent_storage_init(__u32 type, __u32 event, void *origin, struct kevent_storage *st); +void kevent_storage_fini(struct kevent_storage *st); + +#ifdef CONFIG_KEVENT_INODE +void kevent_inode_notify(struct inode *inode, u32 event); +void kevent_inode_notify_parent(struct dentry *dentry, u32 event); +void kevent_inode_remove(struct inode *inode); +#else +static inline void kevent_inode_notify(struct inode *inode, u32 event) +{ +} +static inline void kevent_inode_notify_parent(struct dentry *dentry, u32 event) +{ +} +static inline void kevent_inode_remove(struct inode *inode) +{ +} +#endif /* CONFIG_KEVENT_INODE */ +#ifdef CONFIG_KEVENT_SOCKET +void kevent_socket_notify(struct sock *sock, u32 event); +#else +static inline void kevent_socket_notify(struct sock *sock, u32 event) +{ +} +#endif +#endif /* __KERNEL__ */ +#endif /* __KEVENT_H */ diff --git a/include/linux/kevent_storage.h b/include/linux/kevent_storage.h new file mode 100644 index 0000000..7c68170 --- /dev/null +++ b/include/linux/kevent_storage.h @@ -0,0 +1,21 @@ +#ifndef __KEVENT_STORAGE_H +#define __KEVENT_STORAGE_H + +struct kevent_storage +{ + __u32 type; /* Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on... */ + __u32 event; /* Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED, + * which were NOT updated by any kevent in origin's queue, + * i.e. when new event happens and there are no requests in + * origin's queue for that event, it will be placed here. + * New events are ORed with old one, so when new kevent is being added + * into origin's queue, it just needs to check if requested event + * is in this mask, and if so, return positive value from ->enqueu() + */ + void *origin; /* Originator's pointer, e.g. struct sock or struct file. Can be NULL. */ + struct list_head list; /* List of queued kevents. */ + unsigned int qlen; /* Number of queued kevents. */ + spinlock_t lock; /* Protects users queue. 
*/ +}; + +#endif /* __KEVENT_STORAGE_H */ diff --git a/include/net/sock.h b/include/net/sock.h index 982b4ec..e7eaaad 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -48,6 +48,7 @@ #include <linux/netdevice.h> #include <linux/skbuff.h> /* struct sk_buff */ #include <linux/security.h> +#include <linux/kevent.h> #include <linux/filter.h> @@ -444,12 +445,28 @@ static inline int sk_stream_memory_free( extern void sk_stream_rfree(struct sk_buff *skb); +struct socket_alloc { + struct socket socket; + struct inode vfs_inode; +}; + +static inline struct socket *SOCKET_I(struct inode *inode) +{ + return &container_of(inode, struct socket_alloc, vfs_inode)->socket; +} + +static inline struct inode *SOCK_INODE(struct socket *socket) +{ + return &container_of(socket, struct socket_alloc, socket)->vfs_inode; +} + static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk) { skb->sk = sk; skb->destructor = sk_stream_rfree; atomic_add(skb->truesize, &sk->sk_rmem_alloc); sk->sk_forward_alloc -= skb->truesize; + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } static inline void sk_stream_free_skb(struct sock *sk, struct sk_buff *skb) @@ -470,6 +487,7 @@ static inline void sk_add_backlog(struct sk->sk_backlog.tail = skb; } skb->next = NULL; + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } #define sk_wait_event(__sk, __timeo, __condition) \ @@ -664,21 +682,6 @@ static inline struct kiocb *siocb_to_kio return si->kiocb; } -struct socket_alloc { - struct socket socket; - struct inode vfs_inode; -}; - -static inline struct socket *SOCKET_I(struct inode *inode) -{ - return &container_of(inode, struct socket_alloc, vfs_inode)->socket; -} - -static inline struct inode *SOCK_INODE(struct socket *socket) -{ - return &container_of(socket, struct socket_alloc, socket)->vfs_inode; -} - extern void __sk_stream_mem_reclaim(struct sock *sk); extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind); diff --git a/include/net/tcp.h b/include/net/tcp.h index d78025f..77626a1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -950,6 +950,7 @@ static __inline__ int tcp_prequeue(struc tp->ucopy.memory = 0; } else if (skb_queue_len(&tp->ucopy.prequeue) == 1) { wake_up_interruptible(sk->sk_sleep); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); if (!inet_csk_ack_scheduled(sk)) inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, (3 * TCP_RTO_MIN) / 4, diff --git a/init/Kconfig b/init/Kconfig index 9fc0759..4a4f26c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -224,6 +224,8 @@ config KOBJECT_UEVENT Say Y, unless you are building a system requiring minimal memory consumption. +source "kernel/kevent/Kconfig" + config IKCONFIG bool "Kernel .config support" ---help--- diff --git a/kernel/Makefile b/kernel/Makefile index 4f5a145..7c5aa88 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -32,6 +32,7 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SECCOMP) += seccomp.o obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o +obj-$(CONFIG_KEVENT) += kevent/ ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is diff --git a/kernel/kevent/Kconfig b/kernel/kevent/Kconfig new file mode 100644 index 0000000..5aae2ef --- /dev/null +++ b/kernel/kevent/Kconfig @@ -0,0 +1,33 @@ +config KEVENT + bool "Kernel event notification mechanism" + help + This option enables event queue mechanism. 
+ It can be used as replacement for poll()/select(), AIO callback invocations, + advanced timer notifications and other kernel object status changes. + +config KEVENT_SOCKET + bool "Kernel event notifications for sockets" + depends on NET && KEVENT + help + This option enables notifications through KEVENT subsystem of + sockets operations, like new packet receiving conditions, ready for accept + conditions and so on. + +config KEVENT_INODE + bool "Kernel event notifications for inodes" + depends on KEVENT + help + This option enables notifications through KEVENT subsystem of + inode operations, like file creation, removal and so on. + +config KEVENT_TIMER + bool "Kernel event notifications for timers" + depends on KEVENT + help + This option allows to use timers through KEVENT subsystem. + +config KEVENT_POLL + bool "Kernel event notifications for poll()/select()" + depends on KEVENT + help + This option allows to use kevent subsystem for poll()/select() notifications. diff --git a/kernel/kevent/Makefile b/kernel/kevent/Makefile new file mode 100644 index 0000000..4609205 --- /dev/null +++ b/kernel/kevent/Makefile @@ -0,0 +1,5 @@ +obj-y := kevent.o kevent_user.o kevent_init.o +obj-$(CONFIG_KEVENT_SOCKET) += kevent_socket.o +obj-$(CONFIG_KEVENT_INODE) += kevent_inode.o +obj-$(CONFIG_KEVENT_TIMER) += kevent_timer.o +obj-$(CONFIG_KEVENT_POLL) += kevent_poll.o diff --git a/kernel/kevent/kevent.c b/kernel/kevent/kevent.c new file mode 100644 index 0000000..61a442f --- /dev/null +++ b/kernel/kevent/kevent.c @@ -0,0 +1,300 @@ +/* + * kevent.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/mempool.h> +#include <linux/sched.h> +#include <linux/wait.h> +#include <linux/kevent.h> + +static kmem_cache_t *kevent_cache; + +/* + * Attempts to add an event into appropriate origin's queue. + * Returns positive value if this event is ready immediately, + * negative value in case of error and zero if event has been queued. + * ->enqueue() callback must increase origin's reference counter. + */ +int kevent_enqueue(struct kevent *k) +{ + if (k->event.type >= KEVENT_MAX) + return -E2BIG; + + if (!k->enqueue) { + kevent_break(k); + return -EINVAL; + } + + return k->enqueue(k); +} + +/* + * Remove event from the appropriate queue. + * ->dequeue() callback must decrease origin's reference counter. + */ +int kevent_dequeue(struct kevent *k) +{ + if (k->event.type >= KEVENT_MAX) + return -E2BIG; + + if (!k->dequeue) { + kevent_break(k); + return -EINVAL; + } + + return k->dequeue(k); +} + +/* + * Must be called before event is going to be added into some origin's queue. 
+ * Initializes ->enqueue(), ->dequeue() and ->callback() callbacks. + * If failed, kevent should not be used or kevent_enqueue() will fail to add + * this kevent into origin's queue with setting + * KEVENT_RET_BROKEN flag in kevent->event.ret_flags. + */ +int kevent_init(struct kevent *k) +{ + int err; + + spin_lock_init(&k->lock); + k->kevent_entry.next = LIST_POISON1; + k->storage_entry.next = LIST_POISON1; + k->ready_entry.next = LIST_POISON1; + + if (k->event.type >= KEVENT_MAX) + return -E2BIG; + + switch (k->event.type) { + case KEVENT_SOCKET: + err = kevent_init_socket(k); + break; + case KEVENT_INODE: + err = kevent_init_inode(k); + break; + case KEVENT_TIMER: + err = kevent_init_timer(k); + break; + case KEVENT_POLL: + err = kevent_init_poll(k); + break; + default: + err = -ENODEV; + } + + return err; +} + +/* + * Checks if "event" is requested by given kevent and if so setup kevent's ret_data. + * Also updates storage's mask of pending events. + * Returns set of events requested by user which are ready. + */ +static inline u32 kevent_set_event(struct kevent_storage *st, struct kevent *k, u32 event) +{ + u32 ev = event & k->event.event; + + st->event &= ~ev; + if (ev) + k->event.ret_data[1] = ev; + + return ev; +} + +/* + * Called from ->enqueue() callback when reference counter for given + * origin (socket, inode...) has been increased. + */ +int kevent_storage_enqueue(struct kevent_storage *st, struct kevent *k) +{ + unsigned long flags; + u32 ev; + + k->st = st; + spin_lock_irqsave(&st->lock, flags); + + spin_lock(&k->lock); + ev = kevent_set_event(st, k, st->event); + spin_unlock(&k->lock); + + list_add_tail(&k->storage_entry, &st->list); + st->qlen++; + + spin_unlock_irqrestore(&st->lock, flags); +#if 0 + if (ev) { + spin_lock_irqsave(&k->user->ready_lock, flags); + list_add_tail(&k->ready_entry, &k->user->ready_list); + k->user->ready_num++; + spin_unlock_irqrestore(&k->user->ready_lock, flags); + wake_up(&k->user->wait); + } +#endif + return !!ev; +} + +/* + * Dequeue kevent from origin's queue. + * It does not decrease origin's reference counter in any way and must be called before it, + * so storage itself must be valid. + * It is called from ->dequeue() callback. + */ +void kevent_storage_dequeue(struct kevent_storage *st, struct kevent *k) +{ + unsigned long flags; + + spin_lock_irqsave(&st->lock, flags); + list_del(&k->storage_entry); + st->qlen--; + spin_unlock_irqrestore(&st->lock, flags); +} + +static void __kevent_requeue(struct kevent *k, u32 event) +{ + int err, broken; + + err = k->callback(k); + if (err < 0) + kevent_break(k); + + spin_lock(&k->lock); + broken = (k->event.ret_flags & KEVENT_RET_BROKEN); + spin_unlock(&k->lock); + + if (err || broken) { + spin_lock(&k->lock); + k->event.ret_flags |= KEVENT_RET_DONE; + if (event & k->event.event) + k->event.ret_data[0] = event & k->event.event; + spin_unlock(&k->lock); + + list_del(&k->storage_entry); + list_add_tail(&k->storage_entry, &k->st->list); + + spin_lock(&k->user->ready_lock); + if (k->ready_entry.next == LIST_POISON1) { + list_add_tail(&k->ready_entry, &k->user->ready_list); + k->user->ready_num++; + } + spin_unlock(&k->user->ready_lock); + wake_up(&k->user->wait); + } +} + +void kevent_requeue(struct kevent *k) +{ + unsigned long flags; + + spin_lock_irqsave(&k->st->lock, flags); + __kevent_requeue(k, 0); + spin_unlock_irqrestore(&k->st->lock, flags); +} + +/* + * Called each time some activity in origin (socket, inode...) is noticed. 
+ */ +void kevent_storage_ready(struct kevent_storage *st, kevent_callback_t ready_callback, u32 event) +{ + unsigned long flags; + struct kevent *k, *n; + u32 ev; + unsigned int qlen; + + spin_lock_irqsave(&st->lock, flags); + + st->event |= event; + qlen = st->qlen; + + if (qlen) { + list_for_each_entry_safe(k, n, &st->list, storage_entry) { + if (qlen-- <= 0) + break; + + if (ready_callback) + ready_callback(k); + + spin_lock(&k->lock); + ev = (k->event.event & event); + if (!ev) { + spin_unlock(&k->lock); + continue; + } + kevent_set_event(st, k, event); + spin_unlock(&k->lock); + + __kevent_requeue(k, event); + } + } + spin_unlock_irqrestore(&st->lock, flags); +} + +int kevent_storage_init(__u32 type, __u32 event, void *origin, struct kevent_storage *st) +{ + spin_lock_init(&st->lock); + st->type = type; + st->event = event; + st->origin = origin; + st->qlen = 0; + INIT_LIST_HEAD(&st->list); + return 0; +} + +void kevent_storage_fini(struct kevent_storage *st) +{ + kevent_storage_ready(st, kevent_break, KEVENT_MASK_ALL); +} + +struct kevent *kevent_alloc(gfp_t mask) +{ + struct kevent *k; + + if (kevent_cache) + k = kmem_cache_alloc(kevent_cache, mask); + else + k = kzalloc(sizeof(struct kevent), mask); + + return k; +} + +void kevent_free(struct kevent *k) +{ + if (kevent_cache) + kmem_cache_free(kevent_cache, k); + else + kfree(k); +} + +int __init kevent_sys_init(void) +{ + int err = 0; + + kevent_cache = kmem_cache_create("kevent_cache", sizeof(struct kevent), + 0, 0, NULL, NULL); + if (!kevent_cache) + err = -ENOMEM; + + return err; +} + +late_initcall(kevent_sys_init); diff --git a/kernel/kevent/kevent_init.c b/kernel/kevent/kevent_init.c new file mode 100644 index 0000000..2a75c40 --- /dev/null +++ b/kernel/kevent/kevent_init.c @@ -0,0 +1,70 @@ +/* + * kevent_init.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/errno.h> +#include <linux/kevent.h> + +int kevent_break(struct kevent *k) +{ + unsigned long flags; + + spin_lock_irqsave(&k->lock, flags); + k->event.ret_flags |= KEVENT_RET_BROKEN; + spin_unlock_irqrestore(&k->lock, flags); + printk("%s: k=%p.\n", __func__, k); + return 0; +} + +#ifndef CONFIG_KEVENT_SOCKET +int kevent_init_socket(struct kevent *k) +{ + kevent_break(k); + return -ENODEV; +} +#endif + +#ifndef CONFIG_KEVENT_INODE +int kevent_init_inode(struct kevent *k) +{ + kevent_break(k); + return -ENODEV; +} +#endif + +#ifndef CONFIG_KEVENT_TIMER +int kevent_init_timer(struct kevent *k) +{ + kevent_break(k); + return -ENODEV; +} +#endif + +#ifndef CONFIG_KEVENT_POLL +int kevent_init_poll(struct kevent *k) +{ + kevent_break(k); + return -ENODEV; +} +#endif diff --git a/kernel/kevent/kevent_inode.c b/kernel/kevent/kevent_inode.c new file mode 100644 index 0000000..b6d12f6 --- /dev/null +++ b/kernel/kevent/kevent_inode.c @@ -0,0 +1,110 @@ +/* + * kevent_inode.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/timer.h> +#include <linux/file.h> +#include <linux/kevent.h> +#include <linux/fs.h> + +static int kevent_inode_enqueue(struct kevent *k) +{ + struct file *file; + struct inode *inode; + int err, fput_needed; + + file = fget_light(k->event.id.raw[0], &fput_needed); + if (!file) + return -ENODEV; + + err = -EINVAL; + if (!file->f_dentry || !file->f_dentry->d_inode) + goto err_out_fput; + + inode = igrab(file->f_dentry->d_inode); + if (!inode) + goto err_out_fput; + + err = kevent_storage_enqueue(&inode->st, k); + if (err < 0) + goto err_out_iput; + + fput_light(file, fput_needed); + return 0; + +err_out_iput: + iput(inode); +err_out_fput: + fput_light(file, fput_needed); + return err; +} + +static int kevent_inode_dequeue(struct kevent *k) +{ + struct inode *inode = k->st->origin; + + kevent_storage_dequeue(k->st, k); + iput(inode); + + return 0; +} + +static int kevent_inode_callback(struct kevent *k) +{ + return 1; +} + +int kevent_init_inode(struct kevent *k) +{ + k->enqueue = &kevent_inode_enqueue; + k->dequeue = &kevent_inode_dequeue; + k->callback = &kevent_inode_callback; + return 0; +} + +void kevent_inode_notify_parent(struct dentry *dentry, u32 event) +{ + struct dentry *parent; + struct inode *inode; + + spin_lock(&dentry->d_lock); + parent = dentry->d_parent; + inode = parent->d_inode; + + dget(parent); + spin_unlock(&dentry->d_lock); + kevent_inode_notify(inode, KEVENT_INODE_REMOVE); + dput(parent); +} + +void kevent_inode_remove(struct inode *inode) +{ + kevent_storage_fini(&inode->st); +} + +void kevent_inode_notify(struct inode *inode, u32 event) +{ + kevent_storage_ready(&inode->st, NULL, event); +} diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c new file mode 100644 index 0000000..12b06bb --- /dev/null +++ b/kernel/kevent/kevent_poll.c @@ -0,0 +1,213 @@ +/* + * kevent_poll.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/timer.h> +#include <linux/file.h> +#include <linux/kevent.h> +#include <linux/poll.h> +#include <linux/fs.h> + +static kmem_cache_t *kevent_poll_container_cache; +static kmem_cache_t *kevent_poll_priv_cache; + +struct kevent_poll_ctl +{ + struct poll_table_struct pt; + struct kevent *k; +}; + +struct kevent_poll_wait_container +{ + struct list_head container_entry; + wait_queue_head_t *whead; + wait_queue_t wait; + struct kevent *k; +}; + +struct kevent_poll_private +{ + struct list_head container_list; + spinlock_t container_lock; +}; + +static int kevent_poll_wait_callback(wait_queue_t *wait, unsigned mode, int sync, void *key) +{ + struct kevent_poll_wait_container *cont = container_of(wait, struct kevent_poll_wait_container, wait); + struct kevent *k = cont->k; + struct file *file = k->st->origin; + unsigned long flags; + u32 revents, event; + + revents = file->f_op->poll(file, NULL); + spin_lock_irqsave(&k->lock, flags); + event = k->event.event; + spin_unlock_irqrestore(&k->lock, flags); + + kevent_storage_ready(k->st, NULL, revents); + + return 0; +} + +static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead, struct poll_table_struct *poll_table) +{ + struct kevent *k = container_of(poll_table, struct kevent_poll_ctl, pt)->k; + struct kevent_poll_private *priv = k->priv; + struct kevent_poll_wait_container *cont; + unsigned long flags; + + cont = kmem_cache_alloc(kevent_poll_container_cache, SLAB_KERNEL); + if (!cont) { + kevent_break(k); + return; + } + + cont->k = k; + init_waitqueue_func_entry(&cont->wait, kevent_poll_wait_callback); + cont->whead = whead; + + spin_lock_irqsave(&priv->container_lock, flags); + list_add_tail(&cont->container_entry, &priv->container_list); + spin_unlock_irqrestore(&priv->container_lock, flags); + + add_wait_queue(whead, &cont->wait); +} + +static int kevent_poll_enqueue(struct kevent *k) +{ + struct file *file; + int err, ready = 0; + unsigned int revents; + struct kevent_poll_ctl ctl; + struct kevent_poll_private *priv; + + file = fget(k->event.id.raw[0]); + if (!file) + return -ENODEV; + + err = -EINVAL; + if (!file->f_op || !file->f_op->poll) + goto err_out_fput; + + err = -ENOMEM; + priv = kmem_cache_alloc(kevent_poll_priv_cache, SLAB_KERNEL); + if (!priv) + goto err_out_fput; + + spin_lock_init(&priv->container_lock); + INIT_LIST_HEAD(&priv->container_list); + + k->priv = priv; + + ctl.k = k; + init_poll_funcptr(&ctl.pt, &kevent_poll_qproc); + + revents = file->f_op->poll(file, &ctl.pt); + if (revents & k->event.event) + ready = 1; + + err = kevent_storage_enqueue(&file->st, k); + if (err < 0) + goto err_out_free; + + return ready; + +err_out_free: + kmem_cache_free(kevent_poll_priv_cache, priv); +err_out_fput: + fput(file); + return err; +} + +static int kevent_poll_dequeue(struct kevent *k) +{ + struct file *file = k->st->origin; + struct kevent_poll_private *priv = k->priv; + struct kevent_poll_wait_container *w, *n; + unsigned long flags; + + kevent_storage_dequeue(k->st, k); + + spin_lock_irqsave(&priv->container_lock, flags); + list_for_each_entry_safe(w, n, &priv->container_list, container_entry) { + list_del(&w->container_entry); + remove_wait_queue(w->whead, &w->wait); + kmem_cache_free(kevent_poll_container_cache, w); + } + spin_unlock_irqrestore(&priv->container_lock, flags); + + kmem_cache_free(kevent_poll_priv_cache, priv); + k->priv = NULL; + + fput(file); + 
+ return 0; +} + +static int kevent_poll_callback(struct kevent *k) +{ + struct file *file = k->st->origin; + unsigned int revents = file->f_op->poll(file, NULL); + return (revents & k->event.event); +} + +int kevent_init_poll(struct kevent *k) +{ + if (!kevent_poll_container_cache || !kevent_poll_priv_cache) + return -ENOMEM; + + k->enqueue = &kevent_poll_enqueue; + k->dequeue = &kevent_poll_dequeue; + k->callback = &kevent_poll_callback; + return 0; +} + + +static int __init kevent_poll_sys_init(void) +{ + kevent_poll_container_cache = kmem_cache_create("kevent_poll_container_cache", sizeof(struct kevent_poll_wait_container), + 0, 0, NULL, NULL); + if (!kevent_poll_container_cache) { + printk(KERN_ERR "Failed to create kevent poll container cache.\n"); + return -ENOMEM; + } + + kevent_poll_priv_cache = kmem_cache_create("kevent_poll_priv_cache", sizeof(struct kevent_poll_private), + 0, 0, NULL, NULL); + if (!kevent_poll_priv_cache) { + printk(KERN_ERR "Failed to create kevent poll private data cache.\n"); + kmem_cache_destroy(kevent_poll_container_cache); + kevent_poll_container_cache = NULL; + return -ENOMEM; + } + + printk(KERN_INFO "Kevent poll()/select() subsystem has been initialized.\n"); + return 0; +} + +static void __exit kevent_poll_sys_fini(void) +{ + kmem_cache_destroy(kevent_poll_priv_cache); + kmem_cache_destroy(kevent_poll_container_cache); +} + +module_init(kevent_poll_sys_init); +module_exit(kevent_poll_sys_fini); diff --git a/kernel/kevent/kevent_socket.c b/kernel/kevent/kevent_socket.c new file mode 100644 index 0000000..1606eb6 --- /dev/null +++ b/kernel/kevent/kevent_socket.c @@ -0,0 +1,118 @@ +/* + * kevent_socket.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/timer.h> +#include <linux/file.h> +#include <linux/tcp.h> +#include <linux/kevent.h> + +#include <net/sock.h> +#include <net/request_sock.h> +#include <net/inet_connection_sock.h> + +static int kevent_socket_callback(struct kevent *k) +{ + struct inode *inode = k->st->origin; + struct sock *sk = SOCKET_I(inode)->sk; + int rmem; + + if (k->event.event & KEVENT_SOCKET_RECV) { + int ret = 0; + + if ((rmem = atomic_read(&sk->sk_rmem_alloc)) > 0 || !skb_queue_empty(&sk->sk_receive_queue)) + ret = 1; + if (sk->sk_shutdown & RCV_SHUTDOWN) + ret = 1; + if (ret) + return ret; + } + if ((k->event.event & KEVENT_SOCKET_ACCEPT) && + (!reqsk_queue_empty(&inet_csk(sk)->icsk_accept_queue) || + reqsk_queue_len_young(&inet_csk(sk)->icsk_accept_queue))) { + k->event.ret_data[1] = reqsk_queue_len(&inet_csk(sk)->icsk_accept_queue); + return 1; + } + + return 0; +} + +static int kevent_socket_enqueue(struct kevent *k) +{ + struct file *file; + struct inode *inode; + int err, fput_needed; + + file = fget_light(k->event.id.raw[0], &fput_needed); + if (!file) + return -ENODEV; + + err = -EINVAL; + if (!file->f_dentry || !file->f_dentry->d_inode) + goto err_out_fput; + + inode = igrab(file->f_dentry->d_inode); + if (!inode) + goto err_out_fput; + + err = kevent_storage_enqueue(&inode->st, k); + if (err < 0) + goto err_out_iput; + + err = kevent_socket_callback(k); + + fput_light(file, fput_needed); + return err; + +err_out_iput: + iput(inode); +err_out_fput: + fput_light(file, fput_needed); + return err; +} + +static int kevent_socket_dequeue(struct kevent *k) +{ + struct inode *inode = k->st->origin; + + kevent_storage_dequeue(k->st, k); + iput(inode); + + return 0; +} + +int kevent_init_socket(struct kevent *k) +{ + k->enqueue = &kevent_socket_enqueue; + k->dequeue = &kevent_socket_dequeue; + k->callback = &kevent_socket_callback; + return 0; +} + +void kevent_socket_notify(struct sock *sk, u32 event) +{ + if (sk->sk_socket) + kevent_storage_ready(&SOCK_INODE(sk->sk_socket)->st, NULL, event); +} diff --git a/kernel/kevent/kevent_timer.c b/kernel/kevent/kevent_timer.c new file mode 100644 index 0000000..3ae05c2 --- /dev/null +++ b/kernel/kevent/kevent_timer.c @@ -0,0 +1,111 @@ +/* + * kevent_timer.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/timer.h> +#include <linux/jiffies.h> +#include <linux/kevent.h> + +static void kevent_timer_func(unsigned long data) +{ + struct kevent *k = (struct kevent *)data; + struct timer_list *t = k->st->origin; + + kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL); + mod_timer(t, jiffies + msecs_to_jiffies(k->event.id.raw[0])); +} + +static int kevent_timer_enqueue(struct kevent *k) +{ + struct timer_list *t; + struct kevent_storage *st; + int err; + + t = kmalloc(sizeof(struct timer_list) + sizeof(struct kevent_storage), GFP_KERNEL); + if (!t) + return -ENOMEM; + + init_timer(t); + t->function = kevent_timer_func; + t->expires = jiffies + msecs_to_jiffies(k->event.id.raw[0]); + t->data = (unsigned long)k; + + st = (struct kevent_storage *)(t+1); + err = kevent_storage_init(k->event.type, KEVENT_MASK_EMPTY, t, st); + if (err) + goto err_out_free; + + err = kevent_storage_enqueue(st, k); + if (err < 0) + goto err_out_st_fini; + + add_timer(t); + + return 0; + +err_out_st_fini: + kevent_storage_fini(st); +err_out_free: + kfree(t); + + return err; +} + +static int kevent_timer_dequeue(struct kevent *k) +{ + struct kevent_storage *st = k->st; + struct timer_list *t = st->origin; + + if (!t) + return -ENODEV; + + del_timer_sync(t); + + kevent_storage_dequeue(st, k); + + kfree(t); + + return 0; +} + +static int kevent_timer_callback(struct kevent *k) +{ + struct kevent_storage *st = k->st; + struct timer_list *t = st->origin; + + if (!t) + return -ENODEV; + + k->event.ret_data[0] = (__u32)jiffies; + return 1; +} + +int kevent_init_timer(struct kevent *k) +{ + k->enqueue = &kevent_timer_enqueue; + k->dequeue = &kevent_timer_dequeue; + k->callback = &kevent_timer_callback; + return 0; +} diff --git a/kernel/kevent/kevent_user.c b/kernel/kevent/kevent_user.c new file mode 100644 index 0000000..314ad18 --- /dev/null +++ b/kernel/kevent/kevent_user.c @@ -0,0 +1,526 @@ +/* + * kevent_user.c + * + * 2006 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/fs.h> +#include <linux/device.h> +#include <linux/poll.h> +#include <linux/kevent.h> +#include <linux/jhash.h> +#include <asm/uaccess.h> +#include <asm/semaphore.h> + +static struct class *kevent_user_class; +static char kevent_name[] = "kevent"; +static int kevent_user_major; +static DECLARE_MUTEX(kevent_user_mutex); + +static int kevent_user_open(struct inode *, struct file *); +static int kevent_user_release(struct inode *, struct file *); +static int kevent_user_ioctl(struct inode *, struct file *, unsigned int, unsigned long); +static unsigned int kevent_user_poll(struct file *, struct poll_table_struct *); + +static struct file_operations kevent_user_fops = { + .open = kevent_user_open, + .release = kevent_user_release, + .ioctl = kevent_user_ioctl, + .poll = kevent_user_poll, + .owner = THIS_MODULE, +}; + +static unsigned int kevent_user_poll(struct file *file, struct poll_table_struct *wait) +{ + struct kevent_user *u = file->private_data; + unsigned int mask; + + poll_wait(file, &u->wait, wait); + mask = 0; + + if (u->ready_num) + mask |= POLLIN | POLLRDNORM; + + return mask; +} + +static int kevent_user_open(struct inode *inode, struct file *file) +{ + struct kevent_user *u; + int i; + + u = kzalloc(sizeof(struct kevent_user), GFP_KERNEL); + if (!u) + return -ENOMEM; + + INIT_LIST_HEAD(&u->ready_list); + spin_lock_init(&u->ready_lock); + u->ready_num = 0; + + for (i=0; i<KEVENT_HASH_MASK+1; ++i) { + INIT_LIST_HEAD(&u->kqueue[i].kevent_list); + spin_lock_init(&u->kqueue[i].kevent_lock); + } + u->kevent_num = 0; + + init_MUTEX(&u->ctl_mutex); + init_MUTEX(&u->wait_mutex); + init_waitqueue_head(&u->wait); + u->max_ready_num = 0; + + atomic_set(&u->refcnt, 1); + + file->private_data = u; + + return 0; +} + +static inline void kevent_user_get(struct kevent_user *u) +{ + atomic_inc(&u->refcnt); +} + +static inline void kevent_user_put(struct kevent_user *u) +{ + if (atomic_dec_and_test(&u->refcnt)) + kfree(u); +} + +#if 0 +static inline unsigned int kevent_user_hash(struct ukevent *uk) +{ + unsigned int h = (uk->user[0] ^ uk->user[1]) ^ (uk->id.raw[0] ^ uk->id.raw[1]); + + h = (((h >> 16) & 0xffff) ^ (h & 0xffff)) & 0xffff; + h = (((h >> 8) & 0xff) ^ (h & 0xff)) & KEVENT_HASH_MASK; + + return h; +} +#else +static inline unsigned int kevent_user_hash(struct ukevent *uk) +{ + return jhash_1word(uk->id.raw[0], uk->id.raw[1]) & KEVENT_HASH_MASK; +} +#endif + +/* + * Remove kevent from user's list of all events, + * dequeue it from storage and decrease user's reference counter, + * since this kevent does not exist anymore. That is why it is freed here. 
+ */ +static void kevent_finish_user(struct kevent *k, int lock) +{ + struct kevent_user *u = k->user; + unsigned long flags; + + if (lock) { + unsigned int hash = kevent_user_hash(&k->event); + struct kevent_list *l = &u->kqueue[hash]; + + spin_lock_irqsave(&l->kevent_lock, flags); + list_del(&k->kevent_entry); + u->kevent_num--; + spin_unlock_irqrestore(&l->kevent_lock, flags); + } else { + list_del(&k->kevent_entry); + u->kevent_num--; + } + kevent_dequeue(k); + kevent_user_put(u); + kevent_free(k); +} + +/* + * Dequeue one entry from user's ready queue. + */ +static struct kevent *__kqueue_dequeue_one_ready(struct list_head *q, unsigned int *qlen) +{ + struct kevent *k = NULL; + unsigned int len = *qlen; + + if (len) { + k = list_entry(q->next, struct kevent, ready_entry); + list_del(&k->ready_entry); + *qlen = len - 1; + } + + return k; +} + +static struct kevent *kqueue_dequeue_ready(struct kevent_user *u) +{ + unsigned long flags; + struct kevent *k; + + spin_lock_irqsave(&u->ready_lock, flags); + k = __kqueue_dequeue_one_ready(&u->ready_list, &u->ready_num); + spin_unlock_irqrestore(&u->ready_lock, flags); + + return k; +} + +struct kevent *kevent_search(struct ukevent *uk, struct kevent_user *u) +{ + struct kevent *k; + unsigned int hash = kevent_user_hash(uk); + struct kevent_list *l = &u->kqueue[hash]; + int found = 0; + unsigned long flags; + + spin_lock_irqsave(&l->kevent_lock, flags); + list_for_each_entry(k, &l->kevent_list, kevent_entry) { + spin_lock(&k->lock); + if (k->event.user[0] == uk->user[0] && k->event.user[1] == uk->user[1] && + k->event.id.raw[0] == uk->id.raw[0] && k->event.id.raw[1] == uk->id.raw[1]) { + found = 1; + spin_unlock(&k->lock); + break; + } + spin_unlock(&k->lock); + } + spin_unlock_irqrestore(&l->kevent_lock, flags); + + return (found)?k:NULL; +} + +/* + * No new entry can be added or removed from any list at this point. + * It is not permitted to call ->ioctl() and ->release() in parallel. 
+ */ +static int kevent_user_release(struct inode *inode, struct file *file) +{ + struct kevent_user *u = file->private_data; + struct kevent *k, *n; + int i; + + down(&kevent_user_mutex); + + for (i=0; i<KEVENT_HASH_MASK+1; ++i) { + struct kevent_list *l = &u->kqueue[i]; + + list_for_each_entry_safe(k, n, &l->kevent_list, kevent_entry) + kevent_finish_user(k, 1); + } + + kevent_user_put(u); + file->private_data = NULL; + + up(&kevent_user_mutex); + + return 0; +} + +static int kevent_user_ctl_modify(struct kevent_user *u, struct kevent_user_control *ctl, unsigned long arg) +{ + int err = 0, i; + struct kevent *k; + unsigned long flags; + struct ukevent uk; + + if (down_interruptible(&u->ctl_mutex)) + return -ERESTARTSYS; + + for (i=0; i<ctl->num; ++i) { + if (copy_from_user(&uk, (const void __user *)arg, sizeof(struct ukevent))) { + err = -EINVAL; + break; + } + + k = kevent_search(&uk, u); + if (k) { + spin_lock_irqsave(&k->lock, flags); + k->event.event = uk.event; + k->event.req_flags = uk.req_flags; + spin_unlock_irqrestore(&k->lock, flags); + kevent_requeue(k); + } else + uk.ret_flags |= KEVENT_RET_BROKEN; + + uk.ret_flags |= KEVENT_RET_DONE; + + if (copy_to_user((void __user *)arg, &uk, sizeof(struct ukevent))) { + err = -EINVAL; + break; + } + + arg += sizeof(struct ukevent); + } + + up(&u->ctl_mutex); + + return err; +} + +static int kevent_user_ctl_remove(struct kevent_user *u, struct kevent_user_control *ctl, unsigned long arg) +{ + int err = 0, i; + struct kevent *k; + struct ukevent uk; + + if (down_interruptible(&u->ctl_mutex)) + return -ERESTARTSYS; + + for (i=0; i<ctl->num; ++i) { + if (copy_from_user(&uk, (const void __user *)arg, sizeof(struct ukevent))) { + err = -EINVAL; + break; + } + + k = kevent_search(&uk, u); + if (k) { + kevent_finish_user(k, 1); + } else + uk.ret_flags |= KEVENT_RET_BROKEN; + + uk.ret_flags |= KEVENT_RET_DONE; + + if (copy_to_user((void __user *)arg, &uk, sizeof(struct ukevent))) { + err = -EINVAL; + break; + } + + arg += sizeof(struct ukevent); + } + + up(&u->ctl_mutex); + + return err; +} + +/* + * Copy all ukevents from userspace, allocate kevent for each one and add them + * into appropriate kevent_storages, e.g. sockets, inodes and so on... + * If something goes wrong, all events will be dequeued and negative error will be returned. + * On success zero is returned and ctl->num will be a number of finished events, either completed + * or failed. Array of finished events (struct ukevent) will be placed behind kevent_user_control structure. + * User must run through that array and check ret_flags field of each ukevent structure + * to determine if it is fired or failed event. 
+ */ +static int kevent_user_ctl_add(struct kevent_user *u, struct kevent_user_control *ctl, unsigned long arg) +{ + int err = 0, cerr = 0, num = 0, knum = 0, i; + struct kevent *k; + void __user *orig, *ctl_addr; + unsigned long flags; + + if (down_interruptible(&u->ctl_mutex)) + return -ERESTARTSYS; + + orig = (void __user *)arg; + ctl_addr = (void __user *)(arg - sizeof(struct kevent_user_control)); + + for (i=0; i<ctl->num; ++i) { + k = kevent_alloc(GFP_KERNEL); + if (!k) { + err = -ENOMEM; + break; + } + + if (copy_from_user(&k->event, (const void __user *)arg, sizeof(struct ukevent))) { + kevent_free(k); + err = -EINVAL; + break; + } + + arg += sizeof(struct ukevent); + + err = kevent_init(k); + if (err) { + kevent_free(k); + break; + } + k->user = u; + + err = kevent_enqueue(k); + if (err) { + if (copy_to_user(orig, &k->event, sizeof(struct ukevent))) + cerr = -EINVAL; + + orig += sizeof(struct ukevent); + + if (err < 0) + kevent_free(k); + num++; + } + + if (err >= 0) { + unsigned int hash = kevent_user_hash(&k->event); + struct kevent_list *l = &u->kqueue[hash]; + + spin_lock_irqsave(&l->kevent_lock, flags); + list_add_tail(&k->kevent_entry, &l->kevent_list); + u->kevent_num++; + kevent_user_get(u); + spin_unlock_irqrestore(&l->kevent_lock, flags); + knum++; + } + } + + if (err < 0) + goto err_out_remove; + + ctl->num = num; + if (copy_to_user(ctl_addr, ctl, sizeof(struct kevent_user_control))) + cerr = -EINVAL; + + if (cerr) + err = cerr; + if (!err) + err = num; + +err_out_remove: + up(&u->ctl_mutex); + + return err; +} + +/* + * Waits until at least ctl->ready_num events are ready or timeout and returns + * number of ready events (in case of timeout) or number of requested events. + */ +static int kevent_user_wait(struct file *file, struct kevent_user *u, struct kevent_user_control *ctl, unsigned long arg) +{ + struct kevent *k; + int cerr = 0, num = 0; + void __user *ptr = (void __user *)(arg + sizeof(struct kevent_user_control)); + + if (down_interruptible(&u->wait_mutex)) + return -ERESTARTSYS; + + if (!(file->f_flags & O_NONBLOCK)) { + if (ctl->timeout) + wait_event_interruptible_timeout(u->wait, + u->ready_num >= ctl->num, msecs_to_jiffies(ctl->timeout)); + else + wait_event_interruptible_timeout(u->wait, + u->ready_num > 0, msecs_to_jiffies(1000)); + } + while (num < ctl->num && (k = kqueue_dequeue_ready(u))) { + if (copy_to_user(ptr + num*sizeof(struct ukevent), &k->event, sizeof(struct ukevent))) + cerr = -EINVAL; + /* + * If it is one-shot kevent, it has been removed already from + * origin's queue, so we can easily free it here. 
+ */ + if (k->event.req_flags & KEVENT_REQ_ONESHOT) + kevent_finish_user(k, 1); + ++num; + } + + ctl->num = num; + if (copy_to_user((void __user *)arg, ctl, sizeof(struct kevent_user_control))) + cerr = -EINVAL; + + up(&u->wait_mutex); + + return (cerr)?cerr:num; +} + +static int kevent_user_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) +{ + int err = -ENODEV; + struct kevent_user_control ctl; + struct kevent_user *u = file->private_data; + + if (copy_from_user(&ctl, (const void __user *)arg, sizeof(struct kevent_user_control))) + return -EINVAL; + + down(&kevent_user_mutex); + + switch (cmd) { + case KEVENT_USER_CTL: + switch (ctl.cmd) { + case KEVENT_CTL_ADD: + err = kevent_user_ctl_add(u, &ctl, arg+sizeof(struct kevent_user_control)); + break; + case KEVENT_CTL_REMOVE: + err = kevent_user_ctl_remove(u, &ctl, arg+sizeof(struct kevent_user_control)); + break; + case KEVENT_CTL_MODIFY: + err = kevent_user_ctl_modify(u, &ctl, arg+sizeof(struct kevent_user_control)); + break; + default: + err = -EINVAL; + break; + } + break; + case KEVENT_USER_WAIT: + err = kevent_user_wait(file, u, &ctl, arg); + break; + default: + break; + } + + up(&kevent_user_mutex); + + return err; +} + +static int __devinit kevent_user_init(void) +{ + struct class_device *dev; + int err = 0; + + kevent_user_major = register_chrdev(0, kevent_name, &kevent_user_fops); + if (kevent_user_major < 0) { + printk(KERN_ERR "Failed to register \"%s\" char device: err=%d.\n", kevent_name, kevent_user_major); + return -ENODEV; + } + + kevent_user_class = class_create(THIS_MODULE, "kevent"); + if (IS_ERR(kevent_user_class)) { + printk(KERN_ERR "Failed to register \"%s\" class: err=%ld.\n", kevent_name, PTR_ERR(kevent_user_class)); + err = PTR_ERR(kevent_user_class); + goto err_out_unregister; + } + + dev = class_device_create(kevent_user_class, NULL, MKDEV(kevent_user_major, 0), NULL, kevent_name); + if (IS_ERR(dev)) { + printk(KERN_ERR "Failed to create %d.%d class device in \"%s\" class: err=%ld.\n", + kevent_user_major, 0, kevent_name, PTR_ERR(dev)); + err = PTR_ERR(dev); + goto err_out_class_destroy; + } + + printk("KEVENT subsystem: chardev helper: major=%d.\n", kevent_user_major); + + return 0; + +err_out_class_destroy: + class_destroy(kevent_user_class); +err_out_unregister: + unregister_chrdev(kevent_user_major, kevent_name); + + return err; +} + +static void __devexit kevent_user_fini(void) +{ + class_device_destroy(kevent_user_class, MKDEV(kevent_user_major, 0)); + class_destroy(kevent_user_class); + unregister_chrdev(kevent_user_major, kevent_name); +} + +module_init(kevent_user_init); +module_exit(kevent_user_fini); diff --git a/net/core/sock.c b/net/core/sock.c index 13cc3be..cdc3f82 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1201,6 +1201,7 @@ static void sock_def_wakeup(struct sock if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible_all(sk->sk_sleep); read_unlock(&sk->sk_callback_lock); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } static void sock_def_error_report(struct sock *sk) @@ -1210,6 +1211,7 @@ static void sock_def_error_report(struct wake_up_interruptible(sk->sk_sleep); sk_wake_async(sk,0,POLL_ERR); read_unlock(&sk->sk_callback_lock); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } static void sock_def_readable(struct sock *sk, int len) @@ -1219,6 +1221,7 @@ static void sock_def_readable(struct soc wake_up_interruptible(sk->sk_sleep); sk_wake_async(sk,1,POLL_IN); read_unlock(&sk->sk_callback_lock); + kevent_socket_notify(sk, 
KEVENT_SOCKET_RECV); } static void sock_def_write_space(struct sock *sk) @@ -1235,6 +1238,8 @@ static void sock_def_write_space(struct /* Should agree with poll, otherwise some programs break */ if (sock_writeable(sk)) sk_wake_async(sk, 2, POLL_OUT); + + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } read_unlock(&sk->sk_callback_lock); diff --git a/net/core/stream.c b/net/core/stream.c index 15bfd03..67ab414 100644 --- a/net/core/stream.c +++ b/net/core/stream.c @@ -36,6 +36,7 @@ void sk_stream_write_space(struct sock * wake_up_interruptible(sk->sk_sleep); if (sock->fasync_list && !(sk->sk_shutdown & SEND_SHUTDOWN)) sock_wake_async(sock, 2, POLL_OUT); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } } diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bf2e230..15d3340 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3016,6 +3016,7 @@ static void tcp_ofo_queue(struct sock *s __skb_unlink(skb, &tp->out_of_order_queue); __skb_queue_tail(&sk->sk_receive_queue, skb); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; if(skb->h.th->fin) tcp_fin(skb, sk, skb->h.th); @@ -3079,8 +3080,8 @@ queue_and_out: !sk_stream_rmem_schedule(sk, skb)) goto drop; } - sk_stream_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); + sk_stream_set_owner_r(skb, sk); } tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; if(skb->len) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 4d5021e..c2a542b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -62,6 +62,7 @@ #include <linux/jhash.h> #include <linux/init.h> #include <linux/times.h> +#include <linux/kevent.h> #include <net/icmp.h> #include <net/inet_hashtables.h> @@ -1007,6 +1008,7 @@ int tcp_v4_conn_request(struct sock *sk, reqsk_free(req); } else { inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT); + kevent_socket_notify(sk, KEVENT_SOCKET_ACCEPT); } return 0; -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
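For illustration only (not part of the patch above): a minimal userspace sketch of how the KEVENT_USER_CTL/KEVENT_USER_WAIT ioctl interface described in this mail might be driven. It registers a periodic 1-second timer kevent and waits for it to fire. The /dev/kevent node name, the availability of the userspace-visible part of linux/kevent.h to userspace builds, and the back-to-back buffer layout are assumptions inferred from the patch, not something the patch documents.

/*
 * Illustrative userspace sketch (assumptions: the char device shows up as
 * /dev/kevent, and the pre-__KERNEL__ part of linux/kevent.h is usable
 * from userspace).
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/kevent.h>

int main(void)
{
	/*
	 * The ioctl handlers expect struct kevent_user_control immediately
	 * followed by an array of struct ukevent in one userspace buffer.
	 */
	struct {
		struct kevent_user_control ctl;
		struct ukevent ev[1];
	} req;
	int fd, err;

	fd = open("/dev/kevent", O_RDWR);	/* device node name is an assumption */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&req, 0, sizeof(req));
	req.ctl.cmd = KEVENT_CTL_ADD;
	req.ctl.num = 1;
	req.ev[0].type = KEVENT_TIMER;
	req.ev[0].event = KEVENT_TIMER_FIRED;
	req.ev[0].id.raw[0] = 1000;	/* period in msec, as used by kevent_timer_enqueue() */
	req.ev[0].req_flags = 0;	/* no KEVENT_REQ_ONESHOT: kevent stays registered */

	err = ioctl(fd, KEVENT_USER_CTL, &req);
	if (err < 0) {
		perror("KEVENT_USER_CTL(ADD)");
		return 1;
	}

	/* Wait up to ctl.timeout msec for ctl.num events to become ready. */
	req.ctl.num = 1;
	req.ctl.timeout = 2000;
	err = ioctl(fd, KEVENT_USER_WAIT, &req);
	if (err < 0) {
		perror("KEVENT_USER_WAIT");
		return 1;
	}

	/* Ready ukevents are copied back behind the control structure. */
	printf("ready: %u, ret_data[0] (jiffies at fire time): %u\n",
	       req.ctl.num, req.ev[0].ret_data[0]);

	close(fd);
	return 0;
}

With the 1000 msec period above, the timer kevent should become ready roughly once a second; since KEVENT_REQ_ONESHOT is not set, subsequent KEVENT_USER_WAIT calls would keep returning it until it is removed with KEVENT_CTL_REMOVE.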