Add zygote native fork loop
Do not return to Java mode between consecutive fork operations. This greatly reduces the Zygote overhead, since we no longer need to stop and restart Java daemons.

By not switching back to Java mode, and being careful about what memory we touch between forks, we also keep the Zygote heaps much more stable, facilitating page sharing between the zygote and all its children. Under normal operation we should no longer allocate any memory in the zygote between forks. That applies to both the Java and C++ heap. This makes the zygote behave much more like the mental model many of us had assumed: It has nearly constant memory contents, which are copy-on-write cloned at each fork.

This does not apply to the initial system server and webzygote forks, which are currently still handled differently.

This includes:

1. Add ZygoteCommandBuffer, and switch the argument parsing code to use it. This slightly reduces allocation and enables (3).

2. Support process specialization in the child, even when the arguments are already known. Leverages existing Usap code.

3. Add support for forking multiple child processes directly to the ZygoteCommandBuffer data structure. This directly uses the buffer internals, and avoids returning to Java so long as it can handle the zygote commands it sees.

FUNCTIONALITY CHANGE: We now limit the total size of the zygote command, rather than the number of arguments.

Initial performance observations:

[ These are not perfect, since I'm comparing to numbers before I started. There may have been other moving parts, but they should be minor. ]

System-server-observed launch latency:

[ Not the best metric, but easy to measure. In particular, this does not represent a significant reduction in application launch time. ]
Based on measuring the last 10 launches in a lightly used cf AOSP instance, the system server latency from requesting an app launch to response with the pid (which does not require the child to execute anything) went from an average of about 10.7(25) msecs to 6.8(9) and 7.9(16) in two tries with the CL. (The parenthetical numbers are maxima from among the 10; the variance appears to have decreased appreciably.)

Dirty pages:

The number of private dirty pages in the zygote itself appears to have decreased from about 4000 to about 2200. The number of dalvik-main private dirty pages went from about 1500 to nearly zero.

Initially the ART benchmarking service claimed -1.88% in PSS. But this is not consistently repeatable.

Drive-by fix:

Call setAllowNetworkingForProcess on usap / native loop path.

Bug: 159631815
Bug: 174211442
Test: Boots AOSP
Change-Id: I90d2e381bada1b6c9857666d5e87372b6a4c1a70
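
Illustration of the FUNCTIONALITY CHANGE noted above (limiting total command size rather than argument count). This is a minimal sketch only, not the actual ZygoteCommandBuffer code. The class name BoundedCommandSketch, the parse method, and the maxBytes parameter are hypothetical, and the wire format (a decimal argument count on the first line, followed by one argument per line) is assumed for the purpose of the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the real ZygoteCommandBuffer. */
public final class BoundedCommandSketch {
    private BoundedCommandSketch() {}

    /**
     * Parses one command of the assumed form "<argc>\n<arg1>\n...<argN>\n".
     * The only limit enforced is the total byte size of the command;
     * the number of arguments is not capped separately.
     */
    public static List<String> parse(byte[] raw, int maxBytes) {
        if (raw.length > maxBytes) {
            throw new IllegalArgumentException(
                    "command too large: " + raw.length + " > " + maxBytes);
        }
        int pos = 0;
        int end = indexOfNewline(raw, pos);
        int argc = Integer.parseInt(
                new String(raw, pos, end - pos, StandardCharsets.UTF_8));
        if (argc < 0) {
            throw new IllegalArgumentException("negative argument count");
        }
        pos = end + 1;
        List<String> args = new ArrayList<>();
        for (int i = 0; i < argc; i++) {
            end = indexOfNewline(raw, pos);
            args.add(new String(raw, pos, end - pos, StandardCharsets.UTF_8));
            pos = end + 1;
        }
        return args;
    }

    /** Returns the index of the next '\n' at or after from, or throws if none. */
    private static int indexOfNewline(byte[] raw, int from) {
        for (int i = from; i < raw.length; i++) {
            if (raw[i] == '\n') {
                return i;
            }
        }
        throw new IllegalArgumentException("truncated command");
    }
}
```

With a size-based limit, a command with many short arguments is accepted as long as it fits in the buffer, while a command with few but very long arguments is rejected. A fixed upper bound on command size is also what would let a parser work out of a preallocated buffer instead of allocating per argument, which fits the goal of not allocating in the zygote between forks.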