net: Reset NAPI bit if IPI failed

If an RPS CPU goes offline during hotplug, IPI delivery to that RPS
core can fail. This happens when unruly drivers use the netif_rx API
in the wrong context, for two reasons:

a) Using the netif_rx API in non-preemptive context adds enough
   latency that IPI delivery to an RPS core might fail, because the
   softirq trigger becomes unpredictable.

b) Using netif_rx this way becomes an architectural issue, because we
   are trying to do two things in two different contexts. We set the
   NAPI bit in one context and send the IPI in another. Since a
   context switch is allowed in between, the remote CPU is allowed to
   finish its hotplug.

If there were no context switch in the first place, which is typically
achieved either by using the correct version of netif_rx or by
switching to the NAPI framework, the remote CPU would not be allowed to
go to the CPU DOWN state. This is by design: the hotplug framework
makes the dying remote CPU wait until at least one context switch has
happened on all other CPUs. If preemption is disabled, the dying CPU
has to wait until preemption is enabled and a context switch happens.

This patch catches these unruly drivers and handles IPI misses by
clearing the NAPI state on remote RPS CPUs.

For more documentation on hotplug and preemption cases, please refer to
https://lwn.net/Articles/569686/

CRs-Fixed: 2062245
Change-Id: I072f91bdb4d7e444e3624e8e010ef1b66a67b1ed
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
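
The recovery described above (undo the "scheduled" mark when the IPI cannot be delivered) can be sketched in user space roughly as follows. This is only an illustrative model, not the patch itself: `struct rps_cpu`, `send_ipi()`, `napi_try_schedule()` and `kick_remote()` are hypothetical names, `NAPI_SCHED_BIT` merely stands in for the kernel's NAPI scheduled bit, and IPI failure is simulated with an `online` flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's NAPI "scheduled" state bit. */
#define NAPI_SCHED_BIT 0x1u

/* Hypothetical model of a remote RPS CPU's backlog state. */
struct rps_cpu {
    atomic_uint napi_state; /* bit 0 set => backlog NAPI marked scheduled */
    bool online;            /* false once the CPU has finished hotplug */
};

static void rps_cpu_init(struct rps_cpu *cpu, bool online)
{
    atomic_init(&cpu->napi_state, 0u);
    cpu->online = online;
}

/* Simulated IPI: assumed to fail when the target CPU is offline. */
static bool send_ipi(const struct rps_cpu *cpu)
{
    return cpu->online;
}

/* Step 1 (first context): mark NAPI scheduled.
 * Returns true only if this caller newly set the bit. */
static bool napi_try_schedule(struct rps_cpu *cpu)
{
    unsigned int old = atomic_fetch_or(&cpu->napi_state, NAPI_SCHED_BIT);
    return (old & NAPI_SCHED_BIT) == 0;
}

/* Step 2 (possibly a different context): kick the remote CPU.
 * If the IPI is missed, clear the bit so the state is not left stuck
 * at "scheduled" with nobody going to process the backlog. */
static bool kick_remote(struct rps_cpu *cpu)
{
    if (send_ipi(cpu))
        return true;

    atomic_fetch_and(&cpu->napi_state, ~NAPI_SCHED_BIT);
    return false;
}
```

In this model, calling `napi_try_schedule()` and then `kick_remote()` on a CPU whose `online` flag is false leaves `napi_state` at zero again, so a later attempt can set the bit afresh instead of finding it permanently set.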