|By Kelly Tetterton||
|April 13, 2004 12:00 AM EDT||
Load testing your applications is much like flossing your teeth or taking out the garbage: you know you should do it; you know (vaguely, perhaps) that there will be consequences if you don't do it - and yet somehow it always seems to slip to the last item on the to-do list. Until, of course, there's a crisis of some kind.
It's useful to reiterate why it's worthwhile to load-test before a crisis. In the first place, you want to be able to give yourself and/or your client confidence that the application you're building can handle the expected traffic - and the only way to do that is to actually simulate that traffic. It's a little too easy to assume that if an application performs well under "normal" testing conditions that it will behave the same way (if only, perhaps, a little slower) under higher-traffic conditions. In many cases, however, that's just not so.
Second, as a developer it's important to have confidence in your application. Are there any hidden problems in your code? Load testing will often find anything lurking in the corners that hasn't been brought to light by functional testing. This is obviously important for intangible reasons (who wouldn't like to brag about their application?), but it's also important for internal and external business reasons:
- Does the application need to be re-designed in any way?
- Can the current database schema support the application under heavy load?
- Do the queries need to be optimized?
- Can this code base be reused for other high-traffic applications?
- Does the application meet current expectations? Will this application scale well for future growth?
- If it will scale, then what are the budget considerations for you and/or your client? Do you need additional hardware? Load-balancing software?
- Will this application make you and/or your client happy?
And yet, despite all those very good reasons in favor of load-testing applications prior to launch, my own company, Duo Consulting, didn't do it. Budgets were always too tight, it seemed, to make this part of the project plan - and besides, doesn't everyone know that you have to be a very large corporation to be able to afford any load testing? As a small company, we felt that we just weren't equipped to do it.
The crisis inevitably came. One of our clients is a local park district that offers online registration four times a year. The number of online registrations had been steadily increasing each season, and in particular, the first few hours of each registration period were becoming increasingly problematic. At 9:00 a.m. on the first day of registration, parents all around the greater Chicago area were poised over their keyboards to try to get their children into a limited number of slots in the park district programs. Finally, this past year, the intense traffic became too much and our production server hung right in the middle of the heaviest registration period.
Load testing was now no longer optional; it was paramount. Could our application handle any kind of serious traffic, or was the recent registration experience somehow exceptional? Did we need to rewrite any major portions of the code? Did we need to throw more hardware at the problem? What were the limits of the application, and could we scale up in time for the next registration period? And how could we afford the actual load testing itself?
In desperation, we turned to Microsoft's Web Application Stress (WAS) Tool. It's a free tool that we could download immediately, and although it's not a high-end tool like SilkPerformer, it turned out to be just right for our needs. This shouldn't be taken as a final recommendation - there are many options out there, so there's likely to be something suited to your budget and/or platform. One quick place to look for a rundown of tool options is the Software QA/Test Resource Center at www.softwareqatest.com/qatweb1.html#LOAD. There are also services that will perform load testing for you, and we even briefly considered bringing in Macromedia to help us with analysis and load testing. The cost for such a service can be prohibitive (the Macromedia consult, for instance, would have cost almost $15,000), however, and we felt that ultimately we were in the best position to analyze the site in depth. We already knew the code and the database architecture; we just needed the tools to get started.
That didn't mean we could start immediately. Setting up the environment and conditions for load testing - especially if it's the first time you've done it - does take a certain amount of time. First, we needed to set up a Web server that could approximate the production environment as closely as possible.
It's not necessary to seek perfection in this regard - it may be difficult to find a machine to dedicate to testing that has comparable hardware and memory to what's in production - but it is important to get as close to the production environment as possible. This means, at a minimum, making sure the code base is the same, the version of ColdFusion is the same (including any updates and/or hotfixes), the IIS settings are the same, the OS is the same (including any patches), the ODBC drivers are the same, and the ColdFusion administrator settings are the same.
If you do have hardware, memory, etc., that mimics your production site, that's even better - the closer you can get to your production environment, the more accurate and useful your tests will be. But don't despair if you can't create an exact duplicate; for instance, we didn't have an available machine that was the exact hardware and memory equivalent of our production server, yet our testing was still very, very useful.
You will also need to be able to point the site to a test database server (i.e., repurpose your development database). As far as your test clients are concerned, you should plan to set up multiple clients with WAS - one client machine will probably not approximate the kind of traffic you'll want to test against, and beyond a certain point, you're testing load on the client machine rather than the Web server.
We initially tried setting up a single designated client machine that would run multiple clients, but we found that, at least with the WAS tool, we repeatedly risked hanging the client machine rather than the Web server we were trying to test against, which meant that we weren't really load-testing at all because the "load" wasn't always getting from the client machine to the Web server. Once we moved to a scenario where we had multiple machines running the WAS clients, we could see from the Web server activity that we were finally getting the kind of load testing we wanted.
Here are the general steps we followed after setting up the Web server itself and determining which machines would act as WAS client machines. Your mileage will almost certainly vary, but these are good starting points:
First, make sure your test site is set up so that it will be accessible by all your designated clients. What exactly this means will be largely determined by what you want to test. If you want to be able to load-test from both inside and outside your organization, then your setup will obviously be different than if you want to test solely from within your organization.
I'd suggest strictly internal testing at first; our experience shows that once you move to external testing, you're simultaneously testing your application and your clients' bandwidth, which makes it harder to narrow down what your actual problems might be. If you're testing internally - that is, from machines on your local network to machines on your local network - all the test machines are on a level playing field as far as bandwidth goes. Once you move to external testing you have additional variables to contend with, many of which may be outside your control: connection speed (dial-up versus broadband), service providers, etc. External testing - is certainly quite useful; it's just that it's probably not the first approach you should use.
It's also important to make sure that the URL for the test site is unique and doesn't conflict with any other versions of the site that you may have running, as you want to be certain you get clean data from your tests. In our case, we set up a domain that follows this convention: preprod.[site_name].duoconsulting.com.
Second, make sure all the machines involved in the testing process are time-synced. You need to be able to compare apples to apples, using the cleanest data possible - and that means getting log files from all the machines involved in which the timing of particular events and/or errors can be matched up easily.
Third, make sure the WAS clients are configured and set up properly on each machine. This is not necessarily as straightforward as it sounds. Although the WAS tool is very useful, the setup instructions are, as a colleague of mine put it, "written by developers, for developers."
In particular, I found two documents very helpful: Microsoft's "HOW TO: Install and Use the Web Application Stress (WAS) Tool" (http://support.microsoft.com/default.aspx?scid=kb;en-us;313559) and "HOW TO: Measure ASP.NET Responsiveness with the Web Application Stress Tool" (http://support.microsoft.com/default.aspx?scid=kb;en-us;815161). The installation article will walk you through the IE configuration (don't ignore the proxy setting information - and know that for these purposes, "localhost" works where "127.0.0.1" does not). The second article provides tips on script configuration - in particular, why it's important to build in a warmup period and enable random delay. None of these items are intuitively obvious from the WAS tool itself, so be sure to read these articles, both of which are available on the Microsoft Web site.
WAS saves its scripting and report data in an Access database, so we found it useful to designate one client machine as the "parent." The parent client was then used to create the scripts needed for testing, and that database was then copied to the other client machines. WAS runs as a Windows Service, so be sure to choose the File > Exit & Stop Service command from WAS after you're done constructing your scripts and before you attempt to copy the database.
Set Aside Some Time
Next, you need to reserve the time to do the testing. Again, this may not be as straightforward as it sounds. You first need to determine who will be involved, and if there are multiple people, make sure that they're available for the duration of your testing plan.
In our case, both I, as lead programmer (and the person most familiar with the application), and our systems administrator (who monitored the machines during the tests) really needed to be present. You should plan on the testing process itself taking longer than you really expect (especially on your first try). There will be stops and starts, unexpected results and delays, not to mention environment bugginess, so don't expect the process to be completed in a single day.
Another equally important time consideration is the availability of equipment and network resources - and the question of when you can abuse them. If the point of the testing exercise is to find the limits of your application, you'll need to be able to hang the machines involved in the testing (probably multiple times) without any major repercussions.
If you're sharing a development server with someone who has a project deadline to meet, don't plan on testing when it will interfere with that deadline. If you're using a staging server that other clients have access to, then make sure that they aren't caught off guard by your testing. Ideally, you'll have other options so you won't have to work around either of the scenarios outlined above, but you may not have that luxury. If you must work around other people using your test machines, be extraordinarily conservative about timing on these machines - don't schedule the usage back-to-back, because if a machine goes down because of testing it may take some time to get the machine back in shape for its other purposes.
You then need to prepare to capture data from your WAS client testing. This will depend on what you're looking to find, of course, but in our case we threw our net as wide as possible precisely because we weren't sure what we were looking for. So we gathered reports from the WAS tool, database traces (you can reach SQL Profiler from the Tools menu in SQL Enterprise Manager; see "HOW TO: Troubleshoot Application Performance with SQL Server," http://support.microsoft.com/default.aspx?scid=kb;en-us;224587, for instructions on how to enable this), and the Web server performance monitor. We found that gathering data from all three of these sources really gave us a full picture of what was going on with the application: the WAS reports from the client provided information on timeouts, socket errors, and hits/requests per second; the database traces allowed us to track longer-running queries; and the reports from the Web server performance monitor gave us insight on simultaneous users, queue request times, and average page response times.
Using these sources in conjunction with each other, we could really narrow down what was happening and when. You should plan on running far more tests than you might initially expect, so it's important to organize the data as well. We numbered each test, and all data captured corresponded to those numbered tests; we ran more than 60 tests over the course of several weeks, so this was crucial when it came time to prepare comparative reports.
It is equally crucial to prepare to record your anecdotal observations of each test as well. You may think that the data you capture will speak for itself, but that may not be true (or at least obvious).
For instance, we at first thought that our main data point would be the number of simultaneous users that the Web server could support, but it turned out that that particular statistic didn't really speak for itself; although according to the Web server performance monitor logs our early tests showed a high number of simultaneous users on the site (good), those same tests produced very high numbers of timeouts and socket errors from the WAS reports (bad), as well as very slow page response times from other portions of the performance monitor logs (very bad).
We would have had a far more difficult time figuring out what data we should be focusing on if we hadn't kept our own personal notes as well. We made notes as to whether the site was fast or slow or erratic, and at roughly what points during the test these things were happening. And again, this is especially important if you're doing lots of testing over a period of time: if you don't have your own notes about Test 4, you'll have a very difficult time comparing it to Test 61 three weeks later - or even finding good starting points for comparison.
Creating a Script
Finally, with the testing environment in place it's time to create a script. Again, what this means will vary depending on your needs. In our case, we recorded what we felt would be a fairly typical user session over the course of a couple of minutes, including what we thought might be the problem areas. Don't get too attached to the idea of the perfect script - it may be that you'll need several different scripts over the course of your testing process as you narrow down your problem areas.
You will probably also want to keep your initial scripts fairly short, and expand them only later. Our first scripts were only 10 minutes long (in other words, we were looping over our recorded script several times) - which was certainly more than enough to see where the weak points were, especially as we added more users to the mix. Longer, endurance scripts (ones you might run overnight or over the course of a weekend), should probably be employed only after you've squashed all the obvious bugs you've found in your short scripts; ideally, longer scripts can provide another, more realistic, benchmark for you, but only if you can run them without quickly hanging the servers involved.
After you've created the scripts you think you'll need, copy the Access database that holds those scripts from the parent client to all the child clients - that way, you're sure that everyone has the same script data, and that only the generated reports will be unique.
Once the clients are set and time is reserved for people and machines, coordinate your time and set the scripts running. You should plan on scaling your tests by adding increasing numbers of clients rather than heavier scripts. In other words, we found that it's better to progress your tests as follows:
- Test 1: 1 script on 1 client machine, 100 users x 1 thread
- Test 2: 1 script on 2 client machines, 100 users x 1 thread
- Test 3: 1 script on 3 client machines, 100 users x 1 thread
- Test 4: 1 script on 4 client machines, 100 users x 1 thread
- Test 5: 1 script on 5 client machines, 100 users x 1 thread
- Test 1: 1 script on 1 client machine, 100 users x 1 thread
- Test 2: 1 script on 1 client machine, 200 users x 1 thread (or worse, 100 users x 2 threads)
- Test 3: 1 script on 1 client machine, 300 users x 1 thread (or worse, 100 users x 3 threads)
- Test 4: 1 script on 1 client machine, 400 users x 1 thread (or worse, 100 users x 4 threads)
- Test 5: 1 script on 1 client machine, 500 users x 1 thread (or worse, 100 users x 5 threads)
The number of users multiplied by the number of threads equals the number of sockets being created, and we found that creating 500 sockets on a single client machine just bogged down that machine; even the WAS Help notes that you should "be careful not to increase the stress level on the clients such that these boxes spend more time context switching between threads than doing actual work." And the more threads you have, the more work your client machines will be doing simply switching between them. Obviously, if you have only a single client machine available to you, then your options are limited; just be aware that this will then be an additional factor in your testing.
With your first series of tests, you're really looking to get some initial benchmark scripts, conditions, and results for comparison purposes later. Those may come with the very first scripts you try, or it may take, as in our case, several attempts to get something useable for a baseline. When we began testing, for instance, we had lockups and crashes at alarmingly low user levels. We had to iteratively tweak the script until we eliminated some of our longer-running queries from the script. We were not ignoring those problematic queries - we returned to tune them as soon as we could eliminate some of the other underlying problems we were seeing - but it wasn't useful in the beginning to try to slay all our dragons all at once.
Refining the Testing Process
You will also, inevitably, be refining the set of data you're going to focus on as the most important. As I noted above, when we began our testing process we assumed that we would be using the "average number of users on site" as our first and most important measure of comparison, because we knew (roughly) the number of users we needed to be able to support. But as it turns out, that particular set of data was far less helpful in measuring the user experience we were after than "average page response time." Be flexible here: this is when you should carefully compare your results data with your anecdotal team notes.
So what exactly was our specific experience? As I mentioned above, we first spent some time tweaking our baseline test scripts, and we got some pretty horrible (if revealing) numbers. At 100 simultaneous users, the site performed just as expected - fairly fast page loads of just a few seconds. This was the "normal" mode for the site. However, at 500 simultaneous users from five separate client machines going against a single dedicated MX6.1 Enterprise server, we had:
- Average page response times well over 1 minute
- Average queue request times well over 1 minute
- Hundreds of timeouts and socket errors
The first thing we did was to try to track down the worst offender - and that was clearly the database. We found that even with our short, basic scripts, eventually we would get database locks because we were using database-stored client variables. Because we had separated out our client variable storage for this application into a discrete database, we could easily see that there was far more activity there than we would have expected. Even though we had disabled the global updates to our client variable storage for the site, the application was still making unnecessary trips to the database server with each page hit.
Further research showed that in our particular instance, we could very easily switch from database-stored client variables to cookie-only client variables. This may or may not be true for others: if you are storing a great deal of information in your client variables, then database storage is probably most appropriate. If you're not storing very much information (less than 4K) and cookies won't be a problem for the site - and you're prepared with a P3P policy - then using cookie storage for your client variables may be the way to go. Once we made the change to cookie storage, site performance increased considerably.
We could then revert our scripts to include some of the earlier problematic long-running queries; we had first excluded them by looking at our initial scripts and comparing the database traces with the lines in the script that seemed to correspond to those queries, and then simply deleting those page calls from the script. We reran the tests with the modified scripts (that is, with the page calls added back in), capturing the database trace as we did so. We could then easily identify those queries that ran most often, as well as those that grabbed the most database CPU.
This tracking really gave us bang for our buck - we were able to identify just a few problem queries and concentrate our efforts on those. We optimized those queries as much as we could, and even devised a new caching strategy to eke out more performance gains. By this time, we could see the following numbers for 500 simultaneous users on the same machine:
- Average page response times under 20 seconds
- Average queue request times under 20 seconds
- No timeouts, and only a few socket errors
The Nature of the Beast
As you can see from just the short summary above, our testing was a highly iterative process, run by art at least as much as by science. In part, this is the nature of the beast - it takes a certain amount of trial and error before you hit upon the right problems and their corresponding solutions. But this also happens in part because as you refine your application environment, the source of your problems will change.
For instance, in our first tests the database CPU was maxing out during most of the script, but the Web server CPU would hardly ever rise above 10%. Why? Because of the client variable problem - it was overloading the database so much (as well as frequently locking it up) that the Web server didn't have that much to work with. Once we eliminated the client variable problem we could see from the traces that the database usage had eased significantly, but that the Web server CPU usage then rose to over 70% during certain portions of the scripts. Fix one problem, and the application bottlenecks somewhere else.
Since the process is so iterative, you'll have to clarify with your team fairly quickly what your specific endpoint will be. Of course, it has to be realistic - our client initially wanted to be able to support an entire season's possible registrants all at once, potentially 75,000 simultaneous users, which, given the budget and the actual needs of the site, didn't make sense (it had never experienced more than 1500 simultaneous users). It should be noted that upon reflection our client agreed to more realistic goals.
Even with realistic goals, however, it would be very easy to load-test yourself out of existence if they're not specific enough, because there's always more testing and tweaking that you could do. At some point, you and your team will need to decide something along the lines of, "we will tune the application so that all pages respond within 2 seconds when there are 500 simultaneous users on the site." In our case, we ultimately wanted to reduce the average page-response time and reduce the average queue time so that we could reach that 2-second goal. But whatever your particular goal is, once you get there, stop the testing.
Recode, Retest, Relaunch
The testing, after all, is just the first part of what you need to do. Now you have a game plan for refining the application or the database, or both, but you still need to recode, retest, and relaunch (or launch) the site. And that, obviously, takes time. So again, be cognizant of any looming deadlines so that the initial load-testing phase doesn't take up so much time that you won't be able to improve your production application. Once we got reasonably close to our 2-second page-load goal with our internal testing, we stopped our testing and did the actual recoding and regression testing we needed to do before relaunching the application.
Once we had recoded and relaunched our application, we did one final set of load tests - first, to verify our expectations; and second, to allow the client to experience the site while we load-tested. This second reason may seem like an afterthought, but it's not.
Remember that one of the main goals of load testing is to establish client confidence. Although we had been reporting our progress to the client throughout the process, this would be the first time for them to actually experience the faster version of the site. There's nothing that will establish confidence like setting up a test scenario and having your client experience the site at the same time. Having said that, be prepared for slightly different results than you may have had with strictly internal testing - because again, you'll also be testing bandwidth limitations, which throws another set of variables into the mix.
We set up specific times for our external, preproduction load tests, and let our client know ahead of time when those would be. As a result, many members of the organization were able to use the site while we were load-testing. They knew what to expect, they could see where the weak points were, and they could clearly see that the site performed better. We got client buy-in - and that's invaluable.
The day of reckoning finally arrived - the next registration period. But this time things went smoothly. In fact, things went even better in production than in some of our final load tests, partly because we had constructed our tests so conservatively, and partly because of the low latency over the network during our internal tests (which created many more requests per unit of time). Not only were there no crashes in production, but the site performed without any slowdowns, even when we were processing nearly 300 orders a minute with well over 500 simultaneous users:
- The Web servers' CPU usage was consistently 10% or less.
- The database server used 15% or less of its CPU.
- Pages responded in well under 2 seconds, on average.
- There were 0 queued requests (and therefore, the average queue request time was 0!).
In the end, the load testing wasn't free, but the expense we incurred was worth it. There are many different options for testing, and I've discussed only a small number of the available tools and approaches here. You should review the tools and/or service options that seem best for your organization's needs and budget. For Duo Consulting, pursuing the load testing in-house with the necessary time, patience, and resources gave us client confidence, developer confidence, and a roadmap for scaling the application as usage increased.
- WebRTC Summit at Cloud Expo Agenda Announced
- Google’s Enterprise Problem
- Building Video Calling with PubNub and WebRTC
- DataStax Announces New Startup Programme Offering Free Software, As Well As Free Training Courses For Cassandra Users And New Developer Tool
- Get Ready to Think Out (C)loud With Cloud Sherpas’ Upcoming Webinar Series
- Evaluation Report on Virtual Backup Software
- Series: Exchange 2013 and Lync 2013 Integration with AsteriskNOW PBX Pt. 1
- New PubNub App Template for WebRTC
- Strategic Enough to Matter, Code Halos and Mobile Apps
- GAMA : Quatre acteurs clefs, quatre stratégies différentes !
- Box and NSI Partnership Brings the Cloud to Businesses in the Middle East
- 7 Christmas Gifts For Your Business
- WebRTC Summit at Cloud Expo Agenda Announced
- OneLogin Raises $13M to Power Expansion
- Cloud Security Alliance Releases Cloud Controls Matrix, Version 3.0
- Survey Finds Large Enterprises Adopting WebRTC
- WebRTC Summit | WebRTC: Test then Disrupt
- WebRTC Summit Speaker Submissions Open
- WSO2 Expands Identity Management Capabilities Across Cloud, Mobile and Web Applications With the Launch of WSO2 Identity Server 4.5
- BMC Software to Exhibit at Cloud Expo Silicon Valley
- Twilio and LiveOps to Deliver WebRTC Deployments
- Oracle Demonstrates WebRTC Solution with CounterPath's Bria
- OpenStack for the Enterprise – Showcasing the OpenStack Ecosystem
- XIRSYS Launches WebRTC Hosting Service
- Where Are RIA Technologies Headed in 2008?
- The Top 250 Players in the Cloud Computing Ecosystem
- Dolphin Announces Open API With Over 50 Add-ons Including Dropbox and Wikipedia
- Personal Branding Checklist
- AJAXWorld 2006 West Power Panel with Google's Adam Bosworth
- Why Microsoft Loves Google's Android
- Google's OpenSocial: A Technical Overview and Critique
- Cloud Expo New York Call for Papers Now Open
- Wal-Mart To Sell $399 Ubuntu Linux-based Laptop with Google Operating System
- i-Technology Blog: Google Trends on Java, McNealy, AJAX, and SOA Give Pause For Thought
- i-Technology Blog: Is There Life Beyond Google?
- Android: Who Hates Google Over the Phone?