Hand-rolled linked list for Parcel performance.
While recently looking at pprof data, we noticed that Parcel.obtain() and recycle() were some of the most heavily used methods inside the system process. On the surface, this makes sense because these methods are typically invoked at least twice for every Binder call.

Internally these methods had been maintaining a pool of cached Parcel objects to avoid GC pressure, but unfortunately that pool was backed by an array, which resulted in O(n) scanning under heavy load and increased the lock contention encountered by all Binder calls.

This change greatly reduces that contention by using a hand-rolled linked list approach, mirroring android.os.Message. For a 1-thread benchmark, this new approach has almost 2x throughput, and for a 16-thread benchmark it has almost 8x throughput.

As part of making this change we evaluated several approaches, including using pure-GC with no pooling, a single AtomicReference, and a pool of several AtomicReferences. To measure these we wrote ParcelPoolBenchmark, which simulates various levels of Binder load using 1, 4 and 16 threads. Below are the relative benchmark results compared to the previous approach before this CL:

                 1 thread   4 threads   16 threads
    Pure GC       131.74%      32.76%       43.90%
    Single AR      95.22%      25.54%       13.66%
    Pooled AR      57.65%      16.21%       11.55%
    Linked list    52.66%      18.06%       12.55%

On balance, the linked list approach performs well across the board, and we bias towards it over the two AtomicReference approaches since it performs slightly better on the single-threaded case, which is the most representative of typical Binder load across all processes.

Bug: 165032569
Test: ./frameworks/base/libs/hwui/tests/scripts/prep_generic.sh little && atest CorePerfTests:android.os.ParcelObtainPerfTest
Change-Id: I190b1c8f7fd59855c3c2d36032512279691e2c04
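
For readers unfamiliar with the pattern, the linked list pool described in this message can be sketched roughly as follows. This is a minimal illustration and not the actual Parcel code: the class name PooledItem, the payload field, the poolSize() helper and the MAX_POOL_SIZE value of 32 are all assumptions made up for the example. The point it shows is that both obtain() and recycle() touch only the list head, so the work done while holding the lock is O(1) instead of an O(n) array scan.

```java
/** Hypothetical pooled object; stands in for Parcel in this sketch. */
final class PooledItem {
    // Single lock guarding the pool head and size (assumed design).
    private static final Object sPoolSync = new Object();
    // Assumed cap so the pool cannot grow without bound.
    private static final int MAX_POOL_SIZE = 32;

    private static PooledItem sPoolHead; // head of the free list
    private static int sPoolSize;        // number of items currently pooled

    private PooledItem next;  // intrusive link, only meaningful while pooled
    private boolean inPool;   // guards against double recycle
    int payload;              // placeholder for real object state

    private PooledItem() {}

    /** Pops the head of the free list in O(1), or allocates if the pool is empty. */
    static PooledItem obtain() {
        synchronized (sPoolSync) {
            PooledItem item = sPoolHead;
            if (item != null) {
                sPoolHead = item.next;
                item.next = null;
                item.inPool = false;
                sPoolSize--;
                return item;
            }
        }
        // Allocate outside the lock to keep the critical section short.
        return new PooledItem();
    }

    /** Resets state and pushes this item onto the free list in O(1). */
    void recycle() {
        synchronized (sPoolSync) {
            if (inPool) {
                throw new IllegalStateException("already recycled");
            }
            payload = 0;
            if (sPoolSize < MAX_POOL_SIZE) {
                next = sPoolHead;
                sPoolHead = this;
                inPool = true;
                sPoolSize++;
            }
            // If the pool is full the item is simply left for the GC.
        }
    }

    /** Test helper for this sketch only. */
    static int poolSize() {
        synchronized (sPoolSync) {
            return sPoolSize;
        }
    }
}
```

A caller would pair the two calls around each use, for example PooledItem item = PooledItem.obtain(); followed by item.recycle(); once the item is no longer needed. The link lives inside the pooled object itself, so no separate node allocation is needed on either path.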