Small changes to the memory benchmark

Many people used the memory benchmark from my site and filed bugs for memory leaks, GC pause times, and similar issues. Most of them were fixed over the last few months.

I made some small changes to the script today because the main benchmark window kept references to already-closed windows; now I delete those references manually. I also removed the recently shut down 280slides.com page. If anybody sees problems with other pages, please tell me. I am happy to add new pages.
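A minimal sketch of that fix, with hypothetical names (the real benchmark uses window.open; here `openFn` is injected so the idea stands on its own): the point is that nulling out the entries lets the GC reclaim the closed windows.

```javascript
// Track child windows opened by the benchmark window. Before the fix,
// closed windows stayed referenced in `opened` and could not be collected.
function makeWindowTracker(openFn) {
  var opened = [];  // the benchmark window's references to child windows

  return {
    open: function (url) {
      opened.push(openFn(url));
    },
    closeAll: function () {
      for (var i = 0; i < opened.length; i++) {
        opened[i].close();
        opened[i] = null;  // drop the reference so the GC can reclaim it
      }
      opened.length = 0;   // previously the array kept closed windows alive
    },
    count: function () { return opened.length; }
  };
}
```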

Everybody who is interested in memory behavior across browsers should check out the recent measurements from PC Advisor. The results are getting better and better. Congratulations to Nick Nethercote and the MemShrink team!

Posted in Uncategorized | 1 Comment

Scalability cont.

Thanks for all the comments on my last post! Due to popular demand I also ran the tests on Opera. During the first run Opera crashed after opening 140 tabs. The crash ID was "crash20110803153559", whatever that means.
The following runs were more successful, but I noticed a slowdown in rendering after about 120 tabs.
The timing results for Opera 11.50 are:

real    6m55.074s
user   5m23.217s
sys     1m13.607s

The results are similar to Firefox. Opera uses a single process with 290 threads and 2.5GB of RAM. For comparison: Firefox uses 27 threads and 2.02GB of RAM.

I also ran the V8 benchmarks on Opera with the 150 other tabs open:
Score: 2719
Richards: 2665
DeltaBlue: 2027
Crypto: 3213
RayTrace: 4440
EarleyBoyer: 3135
RegExp: 1008
Splay: 4515

Opera with a single tab:
Score: 4028
Richards: 3901
DeltaBlue: 2999
Crypto: 4625
RayTrace: 5596
EarleyBoyer: 4744
RegExp: 1543
Splay: 7766

The JavaScript performance is pretty good: Firefox scored 3954 with 150 open tabs and 5125 with a single tab. The browser is still responsive, but closing 150 tabs at once takes about 30 seconds. I can't provide any insight into the Opera event loop. Maybe some Opera developers can help me out here?

Posted in Firefox GC | 21 Comments

Scalability

I like breaking stuff! Whenever I try a new feature, it breaks. This used to be an annoying "skill" when I started programming, but over time I learned to appreciate it.

Currently I am testing various memory allocation strategies for Firefox. Over the last few weeks we have learned how important good memory allocation is: we saw impressive memory reductions for regular workloads, but I was still a little worried about huge workloads. How do we scale to 100+ tabs?

Testing scalability shouldn't be too hard: take the browser and open many, many tabs. Doing that manually gets boring, so I borrowed a script from Nick Nethercote. The new version includes about 150 web pages from the most-popular-pages list. The script opens a new page every 1.5 seconds until all 150 pages are open, waits 90 seconds for all pages to load, and then shows a text box indicating that the test has finished. I close all windows except one and close the browser afterwards. The results are measured with the time command on my 1.5-year-old dual-core MacBook Pro with 8GB RAM. The script can be found here if you want to try it yourself.
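The timing logic of such a driver can be sketched as follows. This is my own reconstruction, not the actual script: the real version drives window.open with setTimeout, while `buildSchedule` here just computes when each page should be opened and when the "finished" box should appear.

```javascript
var OPEN_DELAY_MS = 1500;   // open a new page every 1.5 seconds
var SETTLE_MS = 90 * 1000;  // then wait 90 seconds for all loads to finish

// Returns [{at, action, url}, ...] ending with a single "done" event.
function buildSchedule(pages) {
  var events = pages.map(function (url, i) {
    return { at: i * OPEN_DELAY_MS, action: "open", url: url };
  });
  var lastOpen = (pages.length - 1) * OPEN_DELAY_MS;
  events.push({ at: lastOpen + SETTLE_MS, action: "done" });
  return events;
}
```

With 150 pages, the last page opens at roughly 3 minutes 44 seconds and the "done" box appears 90 seconds later, which matches the wall-clock numbers below being dominated by page loading.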

For a current nightly build of Firefox I get the following:

real     6m14.406s
user    3m55.302s
sys      0m49.366s

I also tried it with a Canary build of Chrome:

real    28m55.573s
user    21m58.383s
sys     14m40.860s

Huh, that's a big difference! I realized that Chrome has a hard time opening new sites after about 70 open pages. With 150 sites I can't even scroll on a normal page. Firefox, by contrast, is still pretty snappy, and scrolling feels like there are no other open tabs.

So what's the reason? Firefox uses a single-process model with multiple compartments, and needs 27 threads and 2.02GB of RAM for all 150 tabs. You can find a short or long description of our compartment model.

Chrome has a multi-process model where functionality is separated into different processes. The Google Chrome Renderer process is using 100% of the CPU the whole time and grows to 1.5GB for 150 tabs. The main Google Chrome process uses about 212 threads and 1.3GB. There is also an additional Helper process with 200MB. Wait, isn't Google Chrome multi-process? Where are the processes? It turns out that opening a new page with JavaScript doesn't automatically create a new process. I looked around and found the following workaround:

Chromium allows pages to fork a new rendering process via JavaScript, in a way that is compatible with the links that appear in Gmail messages. A page can open a new tab to about:blank, set the new tab’s window.opener variable to null, and then redirect the new tab to a cross-site URL. In that case, a renderer-initiated navigation will cause Chromium to switch renderer processes in the new tab. Thus, opening a link from Gmail (or pages with similar logic) will not adversely affect the Gmail process.
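The quoted workaround can be sketched directly in JavaScript. In the browser `open` is simply window.open; it is injected here (my choice, for illustration) so the three steps can be shown on their own.

```javascript
// Force a cross-site URL into its own renderer process, per the
// Chromium workaround quoted above.
function openInNewRendererProcess(open, url) {
  var tab = open("about:blank"); // 1. open a new tab to about:blank
  tab.opener = null;             // 2. sever the opener relationship
  tab.location = url;            // 3. cross-site redirect forks a new renderer
  return tab;
}
```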

I wrote a new script with this workaround and get the following results:

real     27m58.560s
user    41m5.719s
sys      20m35.248s

Now I see 43 Google Chrome Renderer processes, the main Google Chrome process, and a Helper process. The resident size in about:memory is a little above 5GB and the browser becomes unresponsive. I have to close the browser without closing individual sites because the close-windows button in my script doesn't work with the multi-process model. I also notice an uneven mapping between sites and processes: some processes host only 2-3 sites, while one process hosts about 50% of all sites. Maybe a bug? The main Google Chrome process has 368 threads with 150 open sites, and up to 420 during browser shutdown. A regular renderer process has 6 threads. Well, all that complexity and the system still doesn't scale. It even got worse: towards the end of the test the browser's performance stagnates and opening a new site takes forever.

My ultimate test is running the V8 benchmark after all 150 pages are fully loaded.

Firefox Score: 3954
Richards: 8014
DeltaBlue: 4149
Crypto: 8781
RayTrace: 3007
EarleyBoyer: 3112
RegExp: 959
Splay: 5764

In comparison our scores with a single tab:

Firefox Score: 5125
Richards: 7925
DeltaBlue: 5005
Crypto: 8791
RayTrace: 3976
EarleyBoyer: 5003
RegExp: 2188
Splay: 6120

I also tried running the V8 benchmarks with 150 tabs in Chrome, but the browser stopped rendering and the main Google Chrome process stayed at 100% CPU usage.

My conclusion: If you have many open tabs, use Firefox!

Posted in Firefox GC | 168 Comments

It works!

The Firefox 7 Aurora build is now available and users are already measuring the impact of our memory efforts. One example can be found here! Sweet!

Posted in Firefox GC | Leave a comment

What’s new in Firefox 6 from the GC side?

Firefox 7 comes with awesome memory fixes, but there are also major garbage collection (GC) improvements in Firefox 6. Firefox has a stop-the-world mark-and-sweep garbage collector with conservative stack scanning. In prior work we reduced the scope of an average GC to the single compartment (or tab) that allocates the most memory and is therefore the best target for reclaiming memory.

For huge workloads within a single tab we still end up with long GC pauses, because every reachable object has to be marked and every unreachable object has to be finalized. Finalization in JavaScript means something different than in Java, where the language supports an explicit finalization function. JavaScript doesn't expose such behavior, but the VM calls an internal finalization function on every reclaimed object that, for example, frees dynamically allocated memory.

Depending on the workload we can see GC pauses of up to 200 msec, or even more on low-end laptops. If most objects are reachable we have high marking cost; if most objects are unreachable we spend most of the time sweeping. We had talked about improving the sweeping phase for a long time, and a few months ago I landed a patch that moves most finalization from the main thread to a background thread. We can't do all finalization on the background thread, because the browser uses external finalizers that rely on running during the GC event. Still, our measurements show that we usually finalize more than 95% of all objects on the background thread.

The main JavaScript thread can still allocate new objects while the background thread is running: we simply allocate fresh memory if finalization isn't done yet. This gives us a 300-point (about 10%) improvement on the V8 benchmark scores, and regular users will see GC pause time reductions between 20% and 80%.

Bill McCloskey is also working on a generational and incremental GC model that will show further improvements.

Posted in Firefox GC | 45 Comments

Fragmentation

After fixing a very annoying GC-trigger problem I noticed that our heap doesn't shrink back to its original size. During startup we usually have 20-30MB in the JavaScript heap. After some random surfing I noticed that it doesn't go back down even if I close all tabs and trigger the GC many times; the heap size stays at about 100MB. There are two possible reasons for that: 1) leaks and 2) fragmentation.

We allocate 1MB chunks from the OS, and once they are completely empty we can return them to the OS. Our problem was that we allocated all sorts of objects in these chunks and didn't use obvious lifetime information during allocation. A quick profiling run showed that after random surfing, 30% of our 1MB chunks stayed alive because only one arena, or 4KB, was still in use. That's terrible!
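A back-of-the-envelope illustration of how bad that is, using the chunk and arena sizes from the post (the helper and the 100-chunk heap are my own example numbers):

```javascript
var CHUNK_SIZE = 1024 * 1024;  // we allocate 1MB chunks from the OS
var ARENA_SIZE = 4 * 1024;     // a chunk is carved into 4KB arenas

// Bytes wasted when `chunks` chunks each stay alive because only
// `liveArenas` arenas are still in use.
function wastedBytes(chunks, liveArenas) {
  return chunks * (CHUNK_SIZE - liveArenas * ARENA_SIZE);
}
```

On a hypothetical 100-chunk (100MB) heap where 30 chunks each hold a single live arena, almost 30MB is pinned by just 120KB of live data.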

I implemented a patch that separates long- and short-lived objects by simply placing system and user objects into separate chunks. Long-lived objects are all system objects that are created by the browser rather than by a web page, plus all immutable strings that are shared across the whole JavaScript VM. The main advantage of the patch is that a single long-lived object no longer keeps a whole 1MB chunk alive, so we can return empty chunks to the OS much sooner.

The outcome was amazing! Measurements show that we reduce the memory footprint of the JavaScript heap by 30% on average during regular surfing, and by a factor of five if we close all tabs after surfing. In numbers…

Closing all tabs after surfing:

  • 108,003,328 B — js-gc-heap (without patch)
  • 20,971,520 B — js-gc-heap (with patch)

Memory footprint of the whole browser:

  • 310,890,496 B — resident (without patch)
  • 219,856,896 B — resident (with patch)

Wow, a major improvement for all Firefox users, already in the nightly and Aurora builds. It will ship with the Firefox 7 release for everybody!

Posted in Firefox GC | 5 Comments

Firefox memory bloat fix

Firefox users have complained a lot about the memory situation since the Firefox 4 release: the browser's memory consumption keeps growing if it stays idle for a while, and after an allocation-heavy task it doesn't trigger a GC to shrink the memory footprint.

One of the reasons this happened was the landing of Bug 558451 - "Merge JSScope into JSScopeProperty, JSObject". JSScopes were allocated with malloc rather than on our JS heap, and these off-heap allocations triggered most of our GCs. Without them, our main GC trigger vanished overnight. I did a quick fix in Bug 592007 - "TM: New Scope patch changes GC behavior in browser", with the main goal of imitating our old GC behavior. The off-heap allocation trigger is pretty simple: once we hit 128MB of malloc'd memory since the last GC, we trigger a new one.
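That off-heap trigger can be sketched in a few lines. The names here are mine, not SpiderMonkey's; this is only the counting logic, not the real implementation.

```javascript
var MALLOC_TRIGGER_BYTES = 128 * 1024 * 1024; // 128MB of mallocs since last GC

// Returns a function the allocator calls on every off-heap allocation.
function makeMallocTrigger(runGC) {
  var bytesSinceGC = 0;
  return function noteMalloc(nbytes) {
    bytesSinceGC += nbytes;
    if (bytesSinceGC >= MALLOC_TRIGGER_BYTES) {
      runGC();          // off-heap pressure forces a collection
      bytesSinceGC = 0; // and the counter starts over
    }
  };
}
```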

The problem was that the new trigger has to monitor and predict heap growth rather than simply count off-heap allocations. I introduced a heap-growth factor that allows the JS heap to grow by 300% before we force another GC, so the additional trigger is based on the amount of memory that survives a GC. This number has to allow the heap to grow very fast, because we don't want to trigger a GC during a critical page load or benchmark. With these changes, the GC behavior was almost the same as before.

Now we can see that this was not good enough, because we only trigger the GC when we allocate objects. This means we don't perform a GC even with a huge heap, as long as the trigger limit is not reached. Running the V8 benchmark suite shows this bad behavior: right at the end of the suite, the splay benchmark allocates a huge splay tree. We have to grow the JS heap, and the amount of reachable memory after each GC is around 300MB, while our GC trigger allows the heap to grow 3x before the next GC is performed.
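The growth-factor trigger boils down to a comparison like this (a simplification of the real heuristic, with names of my own):

```javascript
var HEAP_GROWTH_FACTOR = 3; // the heap may grow to 3x the retained size

// Heap size at which the next allocation-driven GC fires.
function gcTriggerBytes(retainedAfterGC) {
  return retainedAfterGC * HEAP_GROWTH_FACTOR;
}

function shouldTriggerGC(heapBytes, retainedAfterGC) {
  return heapBytes >= gcTriggerBytes(retainedAfterGC);
}
```

With 300MB surviving each GC during splay, the trigger sits at 900MB, which is exactly the limit the next paragraph describes.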

So after the benchmark we end up with a heap size between 300MB and 900MB and don't perform a GC until the trigger limit (900MB) is reached. That can take forever if you just read a blog or surf pages that aren't JS-allocation heavy. I did most of my testing on my MacBook Pro with 4GB of RAM, so I never noticed this bad behavior. Recently I bought a netbook with 1GB of RAM, and running the V8 benchmark suite on it was painful: afterwards my browser used up all the memory, and no GC ever made it usable again.

In order to make Firefox work on my netbook I implemented Bug 656120 - "Increase GC frequency: GC occasionally based on a timer (except when completely idle)". The idea is that we now perform a GC after 20 seconds even if the trigger limit hasn't been reached. This shrinks the heap without hurting allocation-heavy pages like benchmarks or animations.
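The decision the timer makes can be sketched like this. The 20-second interval and the "except when completely idle" rule are from the bug title; the function names are mine.

```javascript
var GC_TIMER_MS = 20 * 1000; // fall-back GC at most every 20 seconds

// Called periodically: decides whether the timer should force a GC now.
function shouldTimerGC(nowMs, lastGCMs, completelyIdle) {
  if (completelyIdle) return false;       // a fully idle browser is left alone
  return nowMs - lastGCMs >= GC_TIMER_MS; // otherwise GC once 20s have passed
}
```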

Nick Nethercote found that this patch also fixed many other bugs users had filed. It didn't make it into the Firefox 6 release; maybe not enough people complained about the memory bloat problem, or we should just buy all release drivers a low-end netbook :) The good news is that the fix will be in Firefox 7! Everybody should see a reduced memory footprint, and it should definitely help users with limited devices like netbooks!

PS: This patch also reduces fragmentation but that’s for the next post!

Posted in Firefox GC | 14 Comments

Hello world!

Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!

Posted in Uncategorized | 38 Comments