Think of web pages as mashups -- collections of objects (text, markup, images, scripts) fetched from anywhere on the Internet and rendered as a single coherent entity. The web makes it easy to combine content from far-flung sites into a rich, constantly updating user experience. The mashup is one of the most wonderful things about the web and one of the things that has made it so successful. It is also why the web is so slow.
Consider this: when a user in Sydney, Australia browses to the ESPN.com home page, the browser fetches content from servers located in, among other places, the following cities:
- Manly Vale, Australia
- Lehi, Utah
- Atlanta, Georgia
- Ashburn, Virginia
- Schaumburg, Illinois
- Chicago, Illinois
- Burbank, California
- Mountain View, California
- San Bruno, California
For me this is astounding. And it happens millions of times a day. In the face of such a technical marvel, I must have a lot of nerve to complain that the web is too slow. But it is! To understand why, we need to look at a much simpler page. I've constructed one here for us to study.
It's not much to look at, but it will hopefully serve our purpose. The page consists of four objects: one HTML file named simple.html and three script files named blue.js, green.js, and yellow.js. The script files are aptly named, as each has the job of displaying the rectangle of its respective color. While all of the files are served by my caffeinatetheweb.com server, they could just as well be hosted by servers on four different continents.
In my test over a 20 Mbps FIOS connection, the page load completed in 320 milliseconds. This 65 KB page should have loaded in roughly 65 KB / 20 Mbps = 26 milliseconds, ignoring other factors like latency. So, while 320 ms is fast, it's much slower than we'd expect based on the speed of our connection alone. Looking at the display of the page in 100 millisecond intervals (thanks again to webpagetest.org), we begin to understand why it took so long:
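The back-of-the-envelope numbers above can be checked in a few lines. This is just a sketch of the arithmetic; the 65 KB size, 20 Mbps rate, and 320 ms measurement are the figures from my test:

```python
# Ideal transfer time for the page, ignoring latency and protocol overhead.
page_size_bits = 65 * 1000 * 8      # 65 KB expressed in bits
bandwidth_bps = 20 * 1_000_000      # 20 Mbps FIOS connection

ideal_ms = page_size_bits / bandwidth_bps * 1000
print(f"ideal load time: {ideal_ms:.0f} ms")          # 26 ms

measured_ms = 320
print(f"slowdown vs. ideal: {measured_ms / ideal_ms:.0f}x")  # ~12x
```

A 12x gap between the ideal and measured load time is what we need to explain.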
The web page objects, represented by the rectangles, appear to load one at a time. If we look at the network level using the standard tool for web site performance analysis, the "waterfall chart" below, we see that this is precisely what is happening:
The bars in the chart above represent the time taken to fetch each object. The page loads like a staircase, one step at a time. Why is this? Why don't we simply request all four objects as soon as we click the link to visit the page? The answer is that the references to the objects needed to render a web page are often found inside other, previously fetched objects -- web pages have a nested structure. The initial request to any web site is only for the object referred to by the address of the link we click (in this case http://caffeinatetheweb.com/simple.html). When we download that object we find references to other objects, and those objects in turn contain references to still more. We follow this chain of nested references to render the page. My three rectangles page is silly, but it accurately demonstrates one property of a typical mashup page like ESPN's: nested-ness kills page load performance by reducing concurrency. More importantly, it suggests that the most obvious fix for large, slow sites is to improve concurrency.
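The effect of nested-ness on concurrency can be sketched with a small simulation. The dependency graph and the 10 ms per-fetch cost below are invented for illustration -- they aren't measurements from the actual page:

```python
import asyncio

# Toy model of the nested page: each object's references are
# discovered only after that object has been fetched.
REFERENCES = {
    "simple.html": ["blue.js", "green.js", "yellow.js"],
    "blue.js": [],
    "green.js": [],
    "yellow.js": [],
}
FETCH_TIME = 0.01  # pretend every fetch takes 10 ms

async def fetch(name):
    await asyncio.sleep(FETCH_TIME)  # stand-in for one network round trip
    return REFERENCES[name]

async def load_serial(root):
    """Fetch one object at a time, as in the staircase waterfall."""
    queue, fetched = [root], 0
    while queue:
        queue.extend(await fetch(queue.pop(0)))
        fetched += 1
    return fetched

async def load_concurrent(root):
    """Still nested -- the root must come first -- but the objects
    it references are then fetched simultaneously."""
    refs = await fetch(root)
    await asyncio.gather(*(fetch(r) for r in refs))
    return 1 + len(refs)

async def main():
    loop = asyncio.get_running_loop()
    for label, loader in [("serial", load_serial),
                          ("concurrent", load_concurrent)]:
        start = loop.time()
        count = await loader("simple.html")
        elapsed = (loop.time() - start) * 1000
        print(f"{label}: {count} objects in ~{elapsed:.0f} ms")

asyncio.run(main())
```

The serial loader needs four round trips; the concurrent one needs only two, because once simple.html has been fetched nothing stops us from requesting the three scripts at once. That gap is the staircase in the waterfall chart.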
With this new insight in hand, let's revisit our 6-second ESPN page load from the previous post. Looking at the waterfall chart below, the basic staircase shape from the three rectangles page is still present, but it's interrupted by short periods of cascading bars (hence the name waterfall chart) when multiple objects are in fact being transferred simultaneously.
But despite these periods of multiple simultaneous transfers, the chart above depicts a site with very low overall concurrency. While it varies over the course of the load, the average concurrency for the entire ESPN page load is about two, which explains the 12% bandwidth utilization we're seeing. If we could double that and request four objects at a time, the page would load roughly twice as fast -- and, other things being equal, we'd still be using only about 24% of our available bandwidth.
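A toy model makes that arithmetic concrete. This is a sketch only; the page size is back-calculated from the 6-second load and 12% utilization figures above, not measured directly:

```python
bandwidth_bps = 20 * 1_000_000   # 20 Mbps link
load_time_s = 6.0                # measured ESPN page load
utilization = 0.12               # observed fraction of the link actually used

# Total bits transferred, implied by those three figures.
page_bits = bandwidth_bps * load_time_s * utilization
print(f"implied page size: {page_bits / 8 / 1e6:.1f} MB")     # 1.8 MB

# Doubling concurrency halves the load time; the same bits move
# in half the time, so utilization doubles.
new_load_time_s = load_time_s / 2
new_utilization = page_bits / (bandwidth_bps * new_load_time_s)
print(f"utilization at 2x concurrency: {new_utilization:.0%}")  # 24%
```

Even at double the concurrency, three quarters of the pipe sits idle -- which is why concurrency, not raw bandwidth, is the lever worth pulling.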
Nested-ness is not the only cause of low concurrency, but to me it seems the most intractable, as it represents a tradeoff between optimal site design and page load time. In general, do we want the creators of web content worrying about the degree to which the objects on their pages are fetched concurrently? The beauty of the web is that it frees page designers from having to think about the topology of where content is hosted. I don't know why ESPN visits 10 cities on two continents to gather the content needed to render its home page, but no doubt they have their reasons. Ideally, content producers can focus on making wonderful sites while web performance software engineers and protocol designers figure out how to make them render blazingly fast. Does there have to be a trade-off between performance and the mashup? Can't we have it both ways? I'll explain in the next post how I believe we can.