NAPI performance - a weighty matter
Posted Jun 17, 2005 0:38 UTC (Fri) by hadi
Parent article: NAPI performance - a weighty matter
I just read the conclusion and, although I've never posted here before, I think it is misleading, so let me set the record straight: the entire explanation of the behavior has nothing to do with NAPI ;->
Maybe an analogy would help:
Let's say you had a restaurant that could seat 64 people (the weight). And let's say you also had a queue outside the restaurant that could accommodate up to 256 people (the RX ring size), and any person arriving when the queue was full was stamped by the bouncer never to come back (a strange restaurant; it's a bad remake of the Seinfeld "Soup Nazi" episode).
Let's also say that this is a strange queue: every time the bouncer lets someone into the restaurant, the person behind them doesn't move forward to take the empty slot unless the bouncer tells them to.
On Tuesdays, the 2-for-1 day, people arrive a lot faster than they leave the restaurant.
Now let's see what the bouncer (the e1000 driver) was doing on that Tuesday:
The bouncer lets 64 people in but doesn't move the queue forward to fill the empty slots until all 64 are done eating. Because people are arriving faster than they are leaving, the 256-person queue fills up and everyone arriving after that is sent away. All the while, those 64 empty slots sit at the front of the queue ;-> (i.e. no replenishing is happening).
Let's say we hired a new bouncer who lets the queue move forward every time someone goes into the restaurant (so one more person can move into the last slot of the queue). Of course, the implicit assumption is that every time someone goes in, it is because someone else is done eating (i.e. a packet has been processed). If this policy were followed, more people would eat at that restaurant over a given period, because the new bouncer would turn relatively fewer people away.
So as you can see, this really has nothing to do with the seating of the restaurant (the weight); it has to do with how fast people can eat and how fast new ones can come in (assuming the bouncer was doing the right thing to begin with, which the e1000 wasn't).
On Tuesdays people take longer to eat; to improve capacity, we need to figure out why they take that long.
So, on to the conclusions, and to refute them ;->
Bullet 1 of the conclusion:
-Weight (the size of the restaurant) has no effect on this specific issue.
Get yourself a smarter bouncer ;-> You are wrong if you think the smarter bouncer is the one that lets only 10 people into the restaurant on Tuesdays and 20 on Wednesdays. To reiterate: the smarter bouncer is the one that lets a new person into the restaurant every time someone leaves.
Bullet 2 of the conclusion:
- As a result of the above, quota has nothing to do with how much work the system can handle. What matters is how fast the customers arrive and how fast they are fed.
Bullet 3 of the conclusion:
- That's exactly what NAPI does already. Interrupts are never re-enabled unless absolutely no more incoming packets are detected.
Now, on what the weight and quota are really for:
The drivers that have packets are scheduled using what's known as a Deficit Round Robin (DRR) algorithm to deliver packets to the system. This scheme enforces fairness among NICs with incoming packets: if a 10 Mbps NIC has packets, it should not be crowded out because a 10 Gbps card has more packets to send. The weight is the maximum number of packets a specific NIC gets to send up the stack in one turn. If you wanted to make one NIC more important than another, you would give it a higher weight (which is what Stephen's patch will allow).
Overall, on that thread:
I think the question that needs asking is why people are taking so long in the restaurant. Is it because they don't get their food on time, or because they don't get their bills on time? Now that would be a very useful exercise. Unfortunately, most of the thread was spent explaining how to improve NAPI.
I think one thing that should have been turned off is connection tracking (conntrack).
I actually don't think replenishing a descriptor after every packet is the best scheme, but that's an entirely different topic, and I have said enough already.