Wednesday, December 21, 2016

How V8 measures real-world performance

Over the last year the V8 team has developed a new methodology to measure and understand real-world JavaScript performance. We’ve used the insights that we gleaned from it to change how the V8 team makes JavaScript faster. Our new real-world focus represents a significant shift from our traditional performance focus. We’re confident that as we continue to apply this methodology in 2017, it will significantly improve users’ and developers’ ability to rely on predictable performance from V8 for real-world JavaScript in both Chrome and Node.js.

The old adage “what gets measured gets improved” is particularly true in the world of JavaScript virtual machine (VM) development. Choosing the right metrics to guide performance optimization is one of the most important things a VM team can do over time. The following timeline roughly illustrates how JavaScript benchmarking has evolved since the initial release of V8:

Evolution of JavaScript benchmarks.

Historically, V8 and other JavaScript engines have measured performance using synthetic benchmarks. Initially, VM developers used microbenchmarks like SunSpider and Kraken. As the browser market matured a second benchmarking era began, during which they used larger but nevertheless synthetic test suites such as Octane and JetStream.

Microbenchmarks and static test suites have a few benefits: they’re easy to bootstrap, simple to understand, and able to run in any browser, making comparative analysis easy. But this convenience comes with a number of downsides. Because they include a limited number of test cases, it is difficult to design benchmarks which accurately reflect the characteristics of the web at large. Moreover, benchmarks are usually updated infrequently; thus, they tend to have a hard time keeping up with new trends and patterns of JavaScript development in the wild. Finally, over the years VM authors explored every nook and cranny of the traditional benchmarks, and in the process they discovered and took advantage of opportunities to improve benchmark scores by shuffling around or even skipping externally unobservable work during benchmark execution. This kind of benchmark-score-driven improvement and over-optimizing for benchmarks doesn’t always provide much user- or developer-facing benefit, and history has shown that over the long term it’s very difficult to make an “ungameable” synthetic benchmark.

Measuring real websites: WebPageReplay & Runtime Call Stats

Given an intuition that we were only seeing one part of the performance story with traditional static benchmarks, the V8 team set out to measure real-world performance by benchmarking the loading of actual websites. We wanted to measure use cases that reflected how end users actually browsed the web, so we decided to derive performance metrics from websites like Twitter, Facebook, and Google Maps. Using a piece of Chrome infrastructure called WebPageReplay we were able to record and replay page loads deterministically.

In tandem, we developed a tool called Runtime Call Stats which allowed us to profile how different JavaScript code stressed different V8 components. For the first time, we had the ability not only to test V8 changes easily against real websites, but to fully understand how and why V8 performed differently under different workloads.

We now monitor changes against a test suite of approximately 25 websites in order to guide V8 optimization. In addition to the aforementioned websites and others from the Alexa Top 100, we selected sites which were implemented using common frameworks (React, Polymer, Angular, Ember, and more), sites from a variety of different geographic locales, and sites or libraries whose development teams have collaborated with us, such as Wikipedia, Reddit, Twitter, and Webpack. We believe these 25 sites are representative of the web at large and that performance improvements to these sites will be directly reflected in similar speedups for sites being written today by JavaScript developers.

For an in-depth presentation about the development of our test suite of websites and Runtime Call Stats, see the BlinkOn 6 presentation on real-world performance. You can even run the Runtime Call Stats tool yourself.


Making a real difference

Analyzing these new, real-world performance metrics and comparing them to traditional benchmarks with Runtime Call Stats has also given us more insight into how various workloads stress V8 in different ways.

From these measurements, we discovered that Octane performance was actually a poor proxy for performance on the majority of our 25 tested websites. You can see this in the chart below: Octane’s color bar distribution is very different from that of any other workload, especially those for the real-world websites. When running Octane, V8’s bottleneck is often the execution of JavaScript code. However, most real-world websites instead stress V8’s parser and compiler. We realized that optimizations made for Octane often lacked impact on real-world web pages, and in some cases these optimizations made real-world websites slower.

Distribution of time running all of Octane, running the line-items of Speedometer and loading websites from our test suite on Chrome M57.
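
To make the distinction between parse/compile time and execution time more concrete, here is a rough sketch that times the two phases separately from user code. It is not how Runtime Call Stats works: `makeSource` and `measure` are hypothetical helpers, the generated workload is synthetic, and wall-clock timing from script is far coarser than V8's internal counters (V8 may also defer some parsing until a function is first called).

```javascript
// Rough illustration only; names and workload are assumptions, not V8 tooling.
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Build a large source text made of many small functions.
function makeSource(functionCount) {
  const parts = [];
  for (let i = 0; i < functionCount; i++) {
    parts.push(`function f${i}(x) { return x + ${i}; }`);
  }
  parts.push('return f0(1);');
  return parts.join('\n');
}

function measure(functionCount) {
  const source = makeSource(functionCount);

  const t0 = now();
  const compiled = new Function(source); // parse + compile the source text
  const t1 = now();
  const result = compiled();             // execute the compiled code
  const t2 = now();

  return { compileMs: t1 - t0, executeMs: t2 - t1, result };
}

const stats = measure(5000);
console.log(`compile: ${stats.compileMs.toFixed(2)} ms, execute: ${stats.executeMs.toFixed(2)} ms`);
```

On a workload like this, which defines a lot of code but runs very little of it, most of the measured time tends to fall in the first phase, which mirrors the page-load profile described above.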

We also discovered that another benchmark was actually a better proxy for real websites. Speedometer, a WebKit benchmark that includes applications written in React, Angular, Ember, and other frameworks, demonstrated a very similar runtime profile to the 25 sites. Although no benchmark matches the fidelity of real web pages, we believe Speedometer does a better job of approximating the real-world workloads of modern JavaScript on the web than Octane.

Bottom line: A faster V8 for all

Over the course of the past year, the real-world website test suite and our Runtime Call Stats tool have allowed us to deliver V8 performance optimizations that speed up page loads across the board by an average of 10-20%. Given the historical focus on optimizing page load across Chrome, a double-digit improvement to the metric in 2016 is a significant achievement. The same optimizations also improved our score on Speedometer by 20-30%.

These performance improvements should be reflected in other sites written by web developers using modern frameworks and similar patterns of JavaScript. Our improvements to builtins such as Object.create and Function.prototype.bind, optimizations around the object factory pattern, work on V8’s inline caches, and ongoing parser improvements are intended to be generally applicable improvements to underlooked areas of JavaScript used by all developers, not just the representative sites we track.
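
For readers less familiar with the patterns named above, the following is a minimal sketch of what they look like in everyday code. The `createPoint`, `createPointWithProto`, and `pointProto` names are made up for illustration; only `Object.create` and `Function.prototype.bind` come from the text.

```javascript
// Object factory pattern: a plain function that returns a fresh object.
function createPoint(x, y) {
  return {
    x,
    y,
    norm() { return Math.sqrt(this.x * this.x + this.y * this.y); },
  };
}

// Object.create: build an object with an explicit prototype.
const pointProto = {
  describe() { return `(${this.x}, ${this.y})`; },
};
function createPointWithProto(x, y) {
  const point = Object.create(pointProto);
  point.x = x;
  point.y = y;
  return point;
}

// Function.prototype.bind: fix `this` so a method can be passed as a callback.
const p = createPointWithProto(3, 4);
const describeP = p.describe.bind(p);

console.log(createPoint(3, 4).norm()); // 5
console.log(describeP());              // "(3, 4)"
```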

We plan to expand our usage of real websites to guide V8 performance work. Stay tuned for more insights about benchmarks and script performance.

Posted by the V8 team

Thursday, December 15, 2016

V8 ❤️ Node.js

Node's popularity has been growing steadily over the last few years, and we have been working to make Node better. This blog post highlights some of the recent efforts in V8 and DevTools.


Debug Node.js in DevTools

You can now debug Node applications using the Chrome developer tools. The Chrome DevTools Team moved the source code that implements the debugging protocol from Chromium to V8, thereby making it easier for Node Core to stay up to date with the debugger sources and dependencies. Other browser vendors and IDEs use the Chrome debugging protocol as well, collectively improving the developer experience when working with Node.

ES6 Speed-ups

We are working hard on making V8 faster than ever. A lot of our recent performance work centers around ES6 features, including promises, generators, destructuring, and rest/spread operators. Because the versions of V8 in Node 6.2 and onwards fully support ES6, Node developers can use new language features "natively", without polyfills. This means that Node developers are often the first to benefit from ES6 performance improvements. Similarly, they are often the first to recognize performance regressions. Thanks to an attentive Node community, we discovered and fixed a number of regressions, including performance issues with instanceof, buffer.length, long argument lists, and let/const.
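
As a quick reminder of what these features look like when used natively, here is a small sketch. The function and variable names (`squares`, `sum`, and so on) are illustrative only.

```javascript
// Generator: lazily yields the first `n` square numbers.
function* squares(n) {
  for (let i = 1; i <= n; i++) yield i * i;
}

// Rest parameter collects the arguments into an array.
function sum(...values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Spread expands the generator; destructuring picks the result apart.
const [first, second, ...others] = [...squares(5)]; // 1, 4, [9, 16, 25]
const { name, version = 'unknown' } = { name: 'node' };

// Promise wrapping the computed value.
const total = Promise.resolve(sum(first, second, ...others));
total.then((value) => console.log(name, version, value)); // node unknown 55
```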

Fixes for Node.js vm module and REPL coming

The vm module has had some long-standing limitations. In order to address these issues properly, we have extended the V8 API to implement more intuitive behavior. We are excited to announce that the vm module improvements are one of the projects we’re supporting as mentors in Outreachy for the Node Foundation. We hope to see additional progress on this project and others in the near future.

Async/await

With async functions, you can drastically simplify asynchronous code by rewriting program flow by awaiting promises sequentially. Async/await will land in Node with the next V8 update. Our recent work on improving the performance of promises and generators has helped make async functions fast. On a related note, we are also working on providing promise hooks, a set of introspection APIs needed for the Node AsyncHook API.
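
The simplification is easiest to see side by side. In the sketch below, `fetchUser` and `fetchPosts` are hypothetical helpers standing in for real I/O; the two `load…` functions do the same work, once as a promise chain and once as an async function. Running it requires a Node version whose V8 supports async functions.

```javascript
// Hypothetical asynchronous helpers that stand in for real I/O.
function fetchUser(id) {
  return Promise.resolve({ id, name: `user${id}` });
}
function fetchPosts(user) {
  return Promise.resolve([`${user.name}: first post`, `${user.name}: second post`]);
}

// Promise-chain version: later steps are nested so `user` stays in scope.
function loadWithPromises(id) {
  return fetchUser(id)
    .then((user) => fetchPosts(user).then((posts) => ({ user, posts })));
}

// Async-function version: each step is awaited in sequence.
async function loadWithAsync(id) {
  const user = await fetchUser(id);
  const posts = await fetchPosts(user);
  return { user, posts };
}

loadWithAsync(1).then(({ user, posts }) => console.log(user.name, posts.length)); // user1 2
```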

Want to try Bleeding Edge Node.js?

If you’re excited to test the newest V8 features in Node and don’t mind using bleeding edge, unstable software, you can try out our integration branch here. V8 is continuously integrated into Node before V8 hits Node master, so we can catch issues early. Be warned though, this is more experimental than Node master.

Posted by Franziska Hinkelmann, Node Monkey Patcher

Friday, December 2, 2016

V8 Release 5.6

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.6, which will be in beta until it is released in coordination with Chrome 56 Stable in several weeks. V8 5.6 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Ignition and TurboFan pipeline for ES.next (and more) shipped

Starting with 5.6, V8 can optimize the entirety of the JavaScript language. Moreover, many language features are sent through a new optimization pipeline in V8. This pipeline uses V8’s Ignition interpreter as a baseline and optimizes frequently executed methods with V8’s more powerful TurboFan optimizing compiler. The new pipeline activates for new language features (e.g. many of the new features from the ES2015 and ES2016 specifications) or whenever Crankshaft (V8’s “classic” optimizing compiler) cannot optimize a method (e.g. try-catch, with).
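
To illustrate the kind of code this affects, here is a small sketch with one function that contains try-catch and one ES2015 generator, each called often enough to become hot. The names are illustrative, and which pipeline actually handles a given function depends on the V8 version and flags in use; the script itself cannot observe that.

```javascript
// try-catch: named in the text as something Crankshaft cannot optimize.
function parseOrDefault(json, fallback) {
  try {
    return JSON.parse(json);
  } catch (err) {
    return fallback;
  }
}

// Generator: an ES2015 feature of the kind routed through the new pipeline.
function* range(start, end) {
  for (let i = start; i < end; i++) yield i;
}

// Call both repeatedly so that they become candidates for optimization.
let parsed = 0;
for (let i = 0; i < 10000; i++) {
  const value = parseOrDefault(i % 2 === 0 ? '{"ok":true}' : 'not json', null);
  if (value !== null) parsed++;
}

let total = 0;
for (const n of range(0, 10000)) total += n;

console.log(parsed, total); // 5000 49995000
```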

Why are we only routing some JavaScript language features through the new pipeline? 

The new pipeline is better-suited to optimizing the whole spectrum of the JS language (past and present). It's a healthier, more modern codebase, and it has been designed specifically for real-world use cases including running V8 on low-memory devices.

We've started using the Ignition/TurboFan pipeline with the newest ES.next features we've added to V8 (ES.next = JavaScript features as specified in ES2015 and later) and will route more features through it as we continue improving its performance. In the medium term, the V8 team is aiming to switch all JavaScript execution in V8 to the new pipeline. However, as long as there are still real-world use cases where Crankshaft runs JavaScript faster than the new Ignition/TurboFan pipeline, for the short term we'll support both pipelines to ensure that JavaScript code running in V8 is as fast as possible in all situations.

So, why does the new pipeline use both the new Ignition interpreter and the new TurboFan optimizing compiler?

Running JavaScript fast and efficiently requires having multiple mechanisms, or tiers, under the hood in a JavaScript virtual machine to do the low-level busywork of execution. For example, it’s useful to have a first tier that starts executing code quickly, and then a second optimizing tier that spends longer compiling hot functions in order to maximize performance for longer-running code.

Ignition and TurboFan are V8’s two new execution tiers, and they are most effective when used together. Due to efficiency, simplicity and size considerations, TurboFan is designed to optimize JavaScript methods starting from the bytecode produced by V8's Ignition interpreter. Because both components are designed to work closely together, each can apply optimizations that rely on the presence of the other. As a result, starting with 5.6 all functions which will be optimized by TurboFan first run through the Ignition interpreter. Using this unified Ignition/TurboFan pipeline enables the optimization of features that were not optimizable in the past, since they now can take advantage of TurboFan's optimization passes. For example, by routing generators through both Ignition and TurboFan, generator runtime performance has nearly tripled.

For more information on V8's journey to adopt Ignition and TurboFan please have a look at Benedikt's dedicated blog post.

Performance improvements

V8 5.6 delivers a number of key improvements in memory footprint and performance.

Memory-induced jank

Concurrent remembered set filtering was introduced: one more step towards Orinoco.

Greatly improved ES2015 performance

Developers typically start using new language features with the help of transpilers because of two challenges: backwards-compatibility and performance concerns.

V8's goal is to reduce the performance gap between transpilers and V8’s “native” ES.next performance in order to eliminate the latter challenge. We’ve made great progress in bringing the performance of new language features on par with their transpiled ES5 equivalents. In this release you will find that the performance of ES2015 features is significantly faster than in previous V8 releases, and in some cases ES2015 feature performance is approaching that of transpiled ES5 equivalents.

The spread operator in particular should now be ready to be used natively. Instead of writing ...
// Like Math.max, but returns 0 instead of -∞ for no arguments.
function specialMax(...args) {
    if (args.length === 0) return 0;
    return Math.max.apply(Math, args);
}
… you should now be able to write ...
function specialMax(...args) {
    if (args.length === 0) return 0;
    return Math.max(...args);
}
… and get similar performance results. V8 5.6 in particular includes speed-ups for a number of ES2015 micro-benchmarks; see the chart below for a comparison between V8 5.4 and 5.6.

Comparing the ES2015 feature performance of V8 5.4 and 5.6
Source: https://fhinkel.github.io/six-speed/ (cloned from http://kpdecker.github.io/six-speed/)
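
If you want a rough feel for the difference on your own V8 version, a quick timing sketch along the following lines can help. This is not the six-speed harness: `maxWithApply`, `maxWithSpread`, and `time` are hypothetical names, and simple loops like this are sensitive to warm-up and to other optimizations, so treat the numbers as indicative at best.

```javascript
// The two variants from the text, renamed so they can coexist.
function maxWithApply(...args) {
  if (args.length === 0) return 0;
  return Math.max.apply(Math, args);
}
function maxWithSpread(...args) {
  if (args.length === 0) return 0;
  return Math.max(...args);
}

// Call `fn` repeatedly and report elapsed wall-clock time.
// The checksum keeps the calls from being trivially removable.
function time(fn, iterations) {
  const start = Date.now();
  let checksum = 0;
  for (let i = 0; i < iterations; i++) checksum += fn(i, i + 1, i + 2);
  return { ms: Date.now() - start, checksum };
}

const viaApply = time(maxWithApply, 1e6);
const viaSpread = time(maxWithSpread, 1e6);
console.log(`apply: ${viaApply.ms} ms, spread: ${viaSpread.ms} ms`);
```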

This is just the beginning, a lot more to follow in upcoming releases!

Language features

String.prototype.padStart / String.prototype.padEnd

String.prototype.padStart and String.prototype.padEnd are the latest stage 4 additions to ECMAScript. These library functions are officially shipped in 5.6.
Note: Unshipped again.
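
For reference, a short sketch of how the two functions behave, assuming an engine in which they are enabled. The `rows` data is made up for illustration.

```javascript
// Pad to a target length; the second argument is the fill string.
console.log('5'.padStart(3, '0'));    // "005"
console.log('abc'.padStart(6, '12')); // "121abc" (fill string repeats, then truncates)
console.log('abc'.padStart(2));       // "abc" (already long enough)

// Typical use: aligning columns in plain-text output.
const rows = [['apples', 3], ['kiwis', 12], ['plums', 107]];
const lines = rows.map(
  ([name, count]) => name.padEnd(8, '.') + String(count).padStart(4, ' ')
);
console.log(lines.join('\n'));
// apples..   3
// kiwis...  12
// plums... 107
```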

WebAssembly browser preview

Chromium 56 (which includes 5.6) is going to ship the WebAssembly browser preview. Please refer to the dedicated blog post for further information.

V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.6 -t branch-heads/5.6' to experiment with the new features in V8 5.6. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team