Commit 69d44b0b authored by Hans Boehm

Add zygote native fork loop

Do not return to Java mode between consecutive fork operations.
This greatly reduces the Zygote overhead, since we no longer need to
stop and restart Java daemons.

By not switching back to Java mode, and being careful about what memory
we touch between forks, we also keep the Zygote heaps much more stable,
facilitating page sharing between the zygote and all its children.
Under normal operation we should no longer allocate any memory in the
zygote between forks. That applies to both the Java and C++ heap.
This makes the zygote behave much more like the mental model many
of us had assumed: It has nearly constant memory contents, which are
copy-on-write cloned at each fork. This does not apply to the initial
system server and webzygote forks, which are currently still handled
differently.

This includes

1. Add ZygoteCommandBuffer, and switch the argument parsing code to use it.
This slightly reduces allocation and enables (3).

2. Support process specialization in the child, even when the arguments
are already known. This leverages existing Usap code.

3. Add support for forking multiple child processes directly to the
ZygoteCommandBuffer data structure. This directly uses the buffer
internals, and avoids returning to Java so long as it can handle the
zygote commands it sees.
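
As a rough illustration of the control flow in (3), here is a minimal
sketch. It is not the code in this change: the real loop is native code
operating on ZygoteCommandBuffer internals. The names native_fork_loop,
fork_child and can_handle are hypothetical, and the fork itself is passed
in as a callback so the sketch stays side-effect free.

```python
def native_fork_loop(commands, fork_child, can_handle):
    """Fork one child per command until a command needs the slow path.

    commands:   iterable of already-parsed zygote commands (hypothetical shape)
    fork_child: callback that forks and returns the child pid (hypothetical)
    can_handle: predicate, True if the command is a plain fork request

    Returns (pids, index), where index is the position of the first command
    the loop could not handle, or None if every command was handled.
    """
    pids = []
    for i, cmd in enumerate(commands):
        if not can_handle(cmd):
            # Only here would control return to Java mode.
            return pids, i
        # Stay in the loop: fork and go straight to the next command.
        pids.append(fork_child(cmd))
    return pids, None
```

The point of the sketch is the single exit condition: the loop keeps
forking for as long as commands are ones it understands, and only hands
control back when it meets one it does not.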

FUNCTIONALITY CHANGE:

We now limit the total size of the zygote command, rather than the
number of arguments.
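
To make the difference concrete, here is a minimal sketch of a parser
that bounds total command bytes and places no cap on the argument count.
It assumes a wire format of an argument-count line followed by one
argument per line. The limit value and the names MAX_COMMAND_BYTES and
parse_command are hypothetical and are not taken from this change.

```python
MAX_COMMAND_BYTES = 32 * 1024  # hypothetical limit, for illustration only


def parse_command(data, max_bytes=MAX_COMMAND_BYTES):
    """Parse b"<argc>\\n<arg1>\\n...<argN>\\n" from the start of data.

    Returns (args, consumed), where consumed is the number of bytes used.
    Raises ValueError if the command is malformed or exceeds max_bytes.
    """
    end = data.find(b"\n")
    if end < 0:
        raise ValueError("missing argument count")
    if end + 1 > max_bytes:
        raise ValueError("command too large")
    argc = int(data[:end].decode("ascii"))
    pos = end + 1
    args = []
    for _ in range(argc):
        end = data.find(b"\n", pos)
        if end < 0:
            raise ValueError("truncated command")
        # The check is on bytes consumed so far, never on argc itself.
        if end + 1 > max_bytes:
            raise ValueError("command too large")
        args.append(data[pos:end].decode("utf-8"))
        pos = end + 1
    return args, pos
```

Under this scheme a command with many short arguments is accepted, while
a command with a single oversized argument is rejected.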

Initial performance observations:

[ These are not perfect, since I'm comparing to numbers before I
started. There may have been other moving parts, but they should be
minor. ]

System-server-observed launch latency:

[Not the best metric, but easy to measure. In particular, this
does not represent a significant reduction in application launch
time.]

Based on measuring the last 10 launches in a lightly used cf AOSP
instance, the system server latency from requesting an app launch to
response with the pid (which does not require the child to execute
anything) went from an average of about 10.7(25) msecs to 6.8(9) and
7.9(16) in two tries with the CL. (The parenthetical numbers are
maxima from among the 10; the variance appears to have decreased
appreciably.)

Dirty pages:

The number of private dirty pages in the zygote itself appears to have
decreased from about 4000 to about 2200. The number of dalvik-main
private dirty pages went from about 1500 to nearly zero.
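
For anyone wanting to reproduce counts like these, one possible approach
is to sum the Private_Dirty fields in the text of /proc/<pid>/smaps. The
sketch below is not necessarily how the numbers above were obtained. The
function name private_dirty_pages is hypothetical, and a 4 KiB page size
is assumed.

```python
import re

# A mapping header line starts with "<start>-<end> " in hex.
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")
PAGE_KB = 4  # assumption: 4 KiB pages


def private_dirty_pages(smaps_text, name_filter=""):
    """Count private dirty pages in smaps text.

    If name_filter is given, only mappings whose name contains it are
    counted (for example "dalvik-main").
    """
    total_kb = 0
    current_name = ""
    for line in smaps_text.splitlines():
        if _HEADER.match(line):
            # Fields: range, perms, offset, dev, inode, then optional name.
            parts = line.split(None, 5)
            current_name = parts[5].strip() if len(parts) > 5 else ""
        elif line.startswith("Private_Dirty:") and name_filter in current_name:
            total_kb += int(line.split()[1])  # value is reported in kB
    return total_kb // PAGE_KB
```

Calling it with the contents of the zygote's smaps file gives the
process-wide total; passing name_filter="dalvik-main" restricts the count
to mappings with that substring in their name.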

Initially the ART benchmarking service claimed -1.88% in PSS, but this
is not consistently repeatable.

Drive-by fix:

Call setAllowNetworkingForProcess on usap / native loop path.

Bug: 159631815
Bug: 174211442
Test: Boots AOSP
Change-Id: I90d2e381bada1b6c9857666d5e87372b6a4c1a70
parent 30d9bfa0