Title: Rigorous Performance Testing - Modern Testing Tools | Instart Logic
RIGOROUS PERFORMANCE TESTING - MODERN TESTING TOOLS
BY GRANT ELLIS
This is my second blog post in a series of three. If you haven't already read my prior post, Rigorous Performance Testing - How We Got Here, then mosey on over for some extra context.
A QUICK RECAP
We all know that the Internet has gone through some serious evolution over the past 25 years (really! 1989!). Data centers and hosting technologies have changed; media has changed (copper to fiber!); switches and peering points have changed; addressing and routing has changed (IP Anycast); devices have changed; content has changed. In the last five years alone, we have seen a transition to rich, interactive, and dynamic sites and applications. Clients are accessing those applications on handheld devices instead of computers. Connectivity is largely wireless instead of wired. These are great technologies, and our lives are better for them - but these same technologies do horrible things to web performance. Similarly, measuring performance has become quite complicated. The simple, venerable ping was sufficient while the web was in its infancy. Then, as bandwidth demands grew, we needed HTTP-aware testing tools like cURL. With the adoption of the commercial web, paradigms changed and it became important to measure whole pages with tools like Mercury LoadRunner (now HP). When CDNs started helping the middle mile with decentralized infrastructure, the testing tools themselves needed to decentralize in order to capture performance data with the CDNs in-line. Gomez (now Compuware) and Keynote stepped in with browser-based testing agents distributed all over the middle mile (backbone) of the Internet.
USER EXPERIENCE METRICS FOR THE MODERN WEB
Now, the web is filled with super-dynamic sites and applications. All of these applications are dynamic on the client side as well as the server side. The browser mechanics of a modern application are complicated in themselves, and so testing methodologies have become more sophisticated. One huge differentiator is which performance metrics are tracked.
FULLY LOADED
Prior testing tools would simply start a timer, initiate the page load, and then stop the timer after the underlying internet connection went idle. In the Web 1.0 world, this was a sufficient test: the browser needed all the content in order to actually render the page and get the user happily browsing. On the modern Web 2.0, pages don't need everything in order to be functional. Secondary and/or personalized content may be loaded asynchronously (for example, below-the-fold loading), but the page may be fully functional beforehand. Tertiary backend functions like analytics beacons have no bearing on function from the user's perspective. With these points in mind, internet connection idleness is no reflection of user experience, and Fully Loaded has become less relevant.
DOCUMENT COMPLETE
The Document Complete event is fired in the browser when, well, when the document is complete. Generally, this means that the page is visually complete and responsive to the user (the user can search, scroll, click links, etc.). The browser may still be loading asynchronous content or firing beacons - see Fully Loaded above. However, this metric is imperfect as well: some sites deliberately defer loading of prominent content until after Document Complete.
BEWARE
Some Front-End Optimization (FEO) packages can defer execution of JavaScript until after Document Complete. Script deferral can be hugely misleading. Visual completeness may occur sooner, and Document Complete may be significantly improved as well. Testers will even see evidence of the visual completeness in videos, filmstrips, and screenshots. However, despite visual completeness, the page may not be responsive until long after Document Complete: users may not be able to click links, scroll, or search. From a user's perspective, this is hugely frustrating and contributes to bounce rates. Imagine if someone swapped your browser window for a screenshot, and you kept trying to click links but nothing would happen!
Perhaps more importantly, this tactic improves
Document Complete, but only at the cost of making
the metric meaningless altogether! One of the
primary tenets of Document Complete is that the
page is ready for the user. With script deferral,
the page is not ready for the user even if it
looks ready.
VISUALLY COMPLETE
Visually Complete is the moment when all visual elements are painted on the screen and visible to the user. Note that visually complete is not the same as functional. See the beware block above!
START RENDER (OR RENDER START)
The Start Render event is fired in the browser when something (anything!) is first painted on the screen. The paint event may cover the whole page, but it could instead be a single word, a single image, or a single pixel. That may not sound significant - after all, if the content is not there and the user can't interact, then what is the value? Keep in mind that, before Start Render fires, the user is staring at a blank white browser screen or, worse, the prior page they just tried to navigate away from. From the user's perspective, Start Render is the moment that the web site is clearly working properly. There is significant evidence that abandonment (bounce rate) correlates very strongly with slow Start Render timings. Arguably, Start Render is the most important metric of all.
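In today's browsers, Start Render can be approximated with the Paint Timing API's first-paint entry. A minimal sketch, with the entry list passed in as a parameter so the helper can be exercised outside a browser (the helper name is illustrative):

```javascript
// Approximate Start Render from Paint Timing entries.
// In a live page you would call:
//   startRender(performance.getEntriesByType('paint'))
function startRender(paintEntries) {
  const entry = paintEntries.find((e) => e.name === 'first-paint');
  return entry ? entry.startTime : null; // ms since navigation start
}
```

The helper returns null when nothing has painted yet, which mirrors the "blank white screen" state described above.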
FIRST BYTE
When the browser requests the base page, that request must traverse the Internet (whether or not a CDN is in play), then the hosting facility must fetch (or assemble) the page, then the response must traverse the Internet again, back to the device requesting the page. First Byte is the time it takes for the first byte of the response to reach the browser. So, First Byte is a function of twice the network latency plus the server latency. Other factors, like packet loss, may also impact this metric. First Byte itself is transparent to your users; however, the metric is still important because it sits on the critical path for all browser functions.
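The arithmetic above can be sketched directly. Both helpers and the `timing` argument are illustrative; the second mirrors the browser's Navigation Timing fields, taken as a plain object so it runs anywhere:

```javascript
// First Byte per the model in the text: twice the one-way network
// latency plus the server's time to fetch or assemble the page.
function firstByteEstimate(oneWayLatencyMs, serverLatencyMs) {
  return 2 * oneWayLatencyMs + serverLatencyMs;
}

// Measured First Byte from Navigation Timing style fields.
// In a browser, `timing` would be window.performance.timing.
function measuredFirstByte(timing) {
  return timing.responseStart - timing.navigationStart;
}
```

For example, 50 ms of one-way latency plus 200 ms of server time predicts a 300 ms First Byte.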
SPEED INDEX
The Speed Index is a metric peculiar to WebPageTest (more on that below). Loosely speaking, the Speed Index is the average amount of time for visual components to be painted on the screen. More technically, if we plotted all the paint events as a visual completeness curve and measured the area above that curve, we would have the Speed Index. That is, the Speed Index is the integral of the area above the visual completeness curve. Pages with a faster Start Render and a faster Visually Complete have a greater percentage of the screen painted at any given time, so the area above the curve is smaller and the Speed Index is lower (lower is better). WebPageTest has excellent technical documentation on the Speed Index. Note again that a fast Speed Index is not the same as a functional page. See the beware block above!
TOOLS THAT SUPPORT USER EXPERIENCE METRICS
REAL USER MONITORING (RUM) TOOLS
Middle-mile (or backbone) testing tools are great for measuring availability from the broader Internet, but they never reflect the experience your users are actually seeing - especially those using wireless connectivity (even Wi-Fi!). RUM tools are the best way to fill this gap. Basically, performance data is collected from your end users as they browse your site. RUM tools track all of the above metrics (except Speed Index) and represent exactly what your users are seeing (with one or two exceptions - see below). RUM tools are really easy to install - just paste in a JavaScript tag.
- Pros
- True user experience.
- Easy set-up.
- Support for a broad range of browsers and devices.
- Collects data from various real-world connection types, including high-latency wireless and packet-loss scenarios.
- Open source tools are available (Boomerang.js).
- Cons
- Inserting a third-party tag hurts performance to a degree. The act of measuring performance with RUM also hurts performance.
- Safari doesn't support the browser APIs on which RUM tools depend. Data for Safari browsers will be a subset of the metrics above, and the remaining metrics are approximated using JavaScript timers rather than hyper-accurate native browser code.
- Outliers can be extreme and must be removed before interpreting aggregate data.
- RUM requires live traffic. It is not possible to use RUM to measure the performance of a site pre-launch.
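Under the hood, a RUM tag boils down to reading the browser's timing data after the load event and beaconing it home. A minimal sketch, assuming the legacy Navigation Timing API; the `/rum` endpoint and the extractor are illustrative, and the extractor takes a plain object so it can be exercised outside a browser:

```javascript
// Derive user-experience metrics from Navigation Timing fields.
function rumMetrics(timing) {
  const start = timing.navigationStart;
  return {
    firstByte: timing.responseStart - start,
    // The load event is what synthetic tools report as Document Complete.
    documentComplete: timing.loadEventEnd - start,
  };
}

// In a real page, after the load event has fired:
// navigator.sendBeacon('/rum', JSON.stringify(rumMetrics(performance.timing)));
```

This also makes the Safari caveat above concrete: where these native fields are missing, a RUM library has to fall back on its own JavaScript timers.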
SYNTHETIC TOOLS
RUM tools are excellent for measuring performance, but sometimes we really need synthetic measurements - especially for evaluating the performance of pre-production environments (code/stack).
WEBPAGETEST
WebPageTest is an open-source, community-supported, and widely endorsed tool for measuring and analyzing performance. The testing nodes are community-sponsored and freely available; however, it is possible to set up private testing nodes for your own dedicated use. Scripting capabilities are vastly improved on private nodes.
- Pros
- Measures user experience metrics, albeit from backbone locations.
- Supports traffic shaping, so testers can configure specific bandwidth, latency, or packet-loss scenarios. The traffic shaping is, of course, synthetic and thus less variable than true user connections - but still, this is an excellent feature and quite representative of real-world conditions.
- Supports a subset of mobile clients, and a wide array of browsers.
- Cons
- Limited testing agent geographies available.
- Great analysis overall, but very limited statistical support.
- Extremely difficult to monitor performance on an ongoing basis or at regular intervals for a fixed period. Testers must set up private instances and WebPageTest Monitor in order to monitor performance.
- Nodes are not centrally managed and therefore have inconsistent base bandwidth and hardware specs. Furthermore, they can sometimes be unstable or unavailable.
- Supports multi-step transactions only on private nodes.
CATCHPOINT
Catchpoint is a commercial synthetic testing
package. Catchpoint has a massive collection of
domestic and international testing nodes
available, and a powerful statistical analysis
package.
- Pros
- Tracks user experience metrics.
- Supports ongoing performance monitoring.
- Easy to provision complicated tests.
- Supports multi-step transactions.
- Captures waterfall diagrams for detailed analysis.
- Supports true mobile connection testing. The agents themselves are desktop machines, but they operate on wireless (Edge/3G/4G/LTE) modems.
- Excellent statistical package.
- Cons
- No traffic shaping available. All backbone tests have very high bandwidth and very low latency, so results are not necessarily representative of end-user performance.
- No support for mobile devices (note that mobile connections are supported).
KEYNOTE SYSTEMS
Keynote is also a commercial synthetic testing
package. Keynote has existed for a LONG time, and
formerly measured only the Fully Loaded metric.
However, they have recently revised their service
to measure user experience metrics like Document
Complete and Start Render.
- Pros
- Tracks user experience metrics.
- Supports ongoing performance monitoring.
- Easy to provision complicated tests.
- Supports multi-step transactions.
- Captures waterfall diagrams for detailed analysis.
- Cons
- No traffic shaping available. All backbone tests have very high bandwidth and very low latency, so results are not necessarily representative of end-user performance.
- No support for mobile devices.
PERFORMANCE DATA ANALYSIS
So, you've picked your performance metrics and your tool, and now you have plenty of data. What are the next steps? In the final installment of this series, we will discuss statistical analysis and interpretation of performance data sets.
www.instartlogic.com/blog/