Don't mistake user acceptance testing for acceptance testing

If you think software testing in general is badly misunderstood, acceptance testing (a subset of software testing) is even more wildly misunderstood. This misunderstanding is most common with commercially driven software as opposed to open source software and software being developed for academic or research and development reasons.

This misunderstanding baffles me because acceptance testing is one of the most consistently defined testing concepts I've encountered over my career both inside and outside of the software field.

First, let's look at what Wikipedia has to say about acceptance testing:

"In engineering and its various subdisciplines, acceptance testing is black-box testing performed on a system (e.g. software, lots of manufactured mechanical parts, or batches of chemical products) prior to its delivery…

"In most environments, acceptance testing by the system provider is distinguished from acceptance testing by the customer (the user or client) prior to accepting transfer of ownership…

"A principal purpose of acceptance testing is that, once completed successfully, and provided certain additional (contractually agreed) acceptance criteria are met, the sponsors will then sign off on the system as satisfying the contract (previously agreed between sponsor and manufacturer), and deliver final payment."

This is consistent with the definition Cem Kaner uses throughout his books, courses, articles and talks, which collectively are some of the most highly referenced software testing material in the industry. The following definition is from his Black Box Software Testing course:

"Acceptance testing is done to check whether the customer should pay for the product. Usually acceptance testing is done as black box testing."

Another extremely well-referenced source of software testing terms is the International Software Testing Qualifications Board (ISTQB) Standard glossary of terms. Below is the definition from Version 1.3 (dated May 31, 2007):

"Formal testing with respect to user needs, requirements, and business processes conducted to determine whether or not a system satisfies the acceptance criteria and to enable the user, customers or other authorized entity to determine whether or not to accept the system."

I've chosen those three references because I've found that if Wikipedia, Cem Kaner and the ISTQB are on the same page related to a term or the definition of a concept, then the testing community at large will tend to use those terms in a manner that is consistent with these resources. Acceptance testing, however, is an exception.

There are several key points on which these definitions/descriptions agree:

  1. In each case, acceptance testing is done to determine whether the application or product is acceptable to someone or some organization and/or if the person or organization should pay for the application or product AS IS.
  2. "Finding defects" is not so much as mentioned in any of those definitions/descriptions, but each implies that defects jeopardize whether the application or product becomes "accepted."
  3. Pre-determined, explicitly stated, mutually agreed-upon criteria (between the creator of the application or product and the person or group that is paying for or otherwise commissioned the software) are the basis for non-acceptance.

    Wikipedia refers to this agreement as a contract, and identifies that non-compliance with the terms of the contract as a reason for non-payment. And Kaner references the contract through the question of whether the customer should pay for the product.

    The ISTQB does not directly refer to either contract or payment but certainly implies the existence of a contract (an agreement between two or more parties for the doing or not doing of something specified).

With that kind of consistency, you'd think there wouldn't be any confusion.

The only explanation I can come up with is that it is related to the fact that many people involved with software development only have experience with "user acceptance testing," and as a result they develop the mistaken impression that "user acceptance testing" is synonymous with "acceptance testing."

If my experiences with software user acceptance testing are common, I can understand where the confusion comes from. All of the software user acceptance testing that I have firsthand experience with involves representative users being given a script to follow and then being asked if they were able to successfully complete the task they were assigned by the person who wrote, or at least tested, the script.

Since the software had always been rather strenuously tested against the script, the virtually inevitable feedback that all of the "users" found the software "acceptable" is given to the person or group paying for the software. The person or group then accepts the software -- and pays for it.

I have no idea how many dissatisfied end users, unhappy commissioners of software and unacceptable software products this flawed process is responsible for, but I suspect that it is no small number.

There are several flaws with that practice, at least as it relates to the definitions above.

  1. The "users" doing the acceptance testing are not the people who are paying for the development of the software.
  2. The person or group paying for the development of the software is not present during the user acceptance testing.
  3. The "users" are provided with all of the information, support, instructions, guidance and assistance they need to ultimately provide the desired "yes" response. Frequently this assistance is provided by a senior tester who has been charged with the task of coordinating and conducting user acceptance testing and believes he is doing the right thing by going out of his way to provide a positive experience for the "users."
  4. By the time user acceptance testing is conducted, the developers of the software are ready to be done with the project and the person or group paying for the development of the software is anxious to ship the software.

If that is the only experience a person has with acceptance testing, I can see why one may not realize that the goal of user acceptance testing is to answer whether the end users will be satisfied enough with the software -- which obviously ships without the scripts and the well-meaning senior tester to help out -- to want to use and/or purchase it.

In an attempt to avoid dissatisfied end users, unhappy commissioners of software and unacceptable software products, whenever someone asks me to be a part of any kind of acceptance testing -- whether qualified by additional terms like "user," "build," "system," "automated," "agile," "security," "continuous," "package," "customer," "business process," "market," or something else -- I pause to ask the following:

"For what purpose is who supposed to be deciding whether or not to accept what, on behalf of whom?"

My question often confuses people at first, but so far it has always led to some kind of acceptance testing that enables decisions about the software and its related contracts. And that is what acceptance testing is all about.

How to Define a Test Strategy

Q- I want to define one test strategy which is suitable for all the teams in my organization. What are the questions I need to ask the developers to define a test strategy?

Expert’s response: This is a broad question with several possible meanings. However, I'll take a stab at it. It sounds to me like the question is how do you drive a system test strategy, i.e., once each component of a project has been component-tested, how do you test the strategy? What do you need to know from developers to build the strategy?

When it comes to system testing, to be frank I want less information from the developers than I do from the customers. I want to approach my system testing from a scenario basis. That having been said, there are some important things to know about the components -- specifically, how they interact with each other. What are outbound and inbound dependencies (i.e., what data is transferred between components)? The key here is to ask questions that don't offload the burden of testing from you to the developers. You can't ask a developer, "How do I test this" because, if he answers you, he might as well do it himself. What you need to ask are questions such as "How does this component interact with that?" or (better yet) "I've been reading the technical specification for your component, and I have a couple of questions." Then ask your specific questions.

To put it into a real-world analogy, let's say you're testing a procurement and inventory control application for a small gas business. The application may consist of a procurement piece (code that automates ordering delivery of gas), a projection piece (code that projects short- and mid-term inventory needs), and a delivery tracking piece (code that verifies the gas ordered is delivered, even if it's split up among several deliveries). At this point in your test planning, your interview with developers will focus on the data being shared -- which portions of the database are common and which are specific to a given component. You'll also ask how components modify shared data. While this data modification may have been tested to specification during initial testing, it's very possible that the original specification overlooked some element of interaction and the spec is deficient.

Another key step is to examine the test strategy for each individual component. Here you are looking for the overlap: Which cases do two or more components have in common? Often, the team developing the integration test strategy will have spent time identifying system-level test cases, and you can leverage them here.

In our real-world example, interviewing the test leads from each team should result in them sharing with you cases that they felt were "system test cases" by nature -- cases that cover interaction, cases that cover dependencies, etc. The test lead for the procurement component, for instance, might have identified cases that cover order size and delivery date and will want the delivery tracking piece to check that it handles split orders appropriately. Through these interviews, you should build a list of the cases that the components have in common.

Finally, as I mentioned, you want to spend a lot of time in system-level testing thinking about scenarios. You want to define how a user will interact with your product and follow that interaction as it goes from component to component. You definitely want to speak with the customer and in two phases. First, sit down with the customer and ask them to work with you to identify key customer scenarios. Document everything! Then, from that meeting, develop any other scenarios or obvious variations on scenarios. Prioritize these scenarios and write up the steps that comprise the scenario. Finally, return to the customer and validate your final scenarios and your steps that cover them.

In our real-world example, you'd walk through the lifetime of an order -- from the moment the projection component identifies a new order is needed, through order placement and then fulfillment.

You definitely want to script scenarios that cover full order delivery as well as split deliveries. You want to run a scenario that probes how the projection component deals with fluctuating demand, and so on. Once you've identified a set of scenarios, script high-level steps for them. Circle back, refine your scenarios and steps, and then sit with the customer and have them validate your planning -- taking feedback and modifying appropriately.

Through this research, you can begin to identify a system-level test approach that results in the highest possibility of verifying customer functionality. You'll minimize test case overlap with your integration-level testing, as well. The key to good system-level work is focusing on higher-level testing (scenario-based testing) and minimizing your component-level testing (assuming component-level testing has been carried out successfully). If you involve developers in this stage, be sure to do so as a planning augmentation. Don't ask them to define your strategy for you. Bring them in as valued experts, but be sure to minimize the questions you ask them. You want them on your side when it comes time to advocate for fixes.

What software testers can learn from children

When I went back to consulting, I started my own company -- not because I wanted to run a company, but because I didn't want to have to answer to anyone else when I chose to not travel during baseball season so I could coach my son's team. In the same spirit, when I work from home, I frequently do so in a room with my boys, who are naturally curious about what I'm doing. Over the past few years of this, I've learned a lot of things about being a good tester from them. Some of the most significant are these:

Don't be afraid to ask "Why?" As testers, we know to embrace our natural curiosity about what applications or systems do and how they do them, but we have a tendency to forget to ask why someone would want an application or system to do it. We tend to assume that if the application or system does something, some stakeholder wants the application or system to do that thing.

My children make no such assumption. Every time I start explaining to them what the application or system I'm testing does, they ask why it does what it does. No matter how many times they ask, I'm always surprised when I find that I don't have a good answer. More important, I'm amazed by the number of times, after realizing that I don't have an answer and posing the question to a stakeholder, I find that they don't know either. In my experience, this has a strong positive correlation with rather significant changes being made to the design or functionality of the application or system.

Exploratory play is learning. Over the years, I have found that many testers seem to limit their testing to what they are instructed to test. My children have no such "testing filter." They are forever asking me what a particular button does or asking if they can type or use the mouse. Invariably, they find a behavior that is obviously a bug, but that I'd have been unlikely to uncover. The most intriguing part is they can almost always reproduce the bug, even days later. They may not have learned to do what the application was intended to do, but they have learned how to get the application to do something it wasn't intended to do -- which is exactly what we as testers ought to do.

Recording your testing is invaluable. When Taylor was younger, he couldn't reproduce the defects he found. All he knew was to call me when the things he saw on the screen stopped making sense. Recently, I found a solution to this (since we all have trouble reproducing bugs, at least sometimes). I now set up a screen and voice recorder so that after I'm done with a test session, I can play back and watch the narrated video of the session. I can even edit those videos and attach segments of them to defect reports. Besides, Taylor loves watching the videos and listening to his voice as we sit together and watch the video of him testing while I took a phone call or did whatever else called me away and left him alone at the keyboard.

"Intuitive" means different things to different people. The more we know about what a system or application is supposed to do, the more intuitive we believe it is. My boys not only don't know what the applications and systems I test are supposed to do, but things like personnel management, retirement planning and remote portal administration are still a bit beyond them. That said, showing them a screen and asking, "What do you think Daddy is supposed to do now?" can point out some fascinating design flaws. For example, even Nicholas, who now reads well, will always tell me that he thinks I'm supposed to click the biggest or most brightly colored button or that he thinks I'm supposed to click on some eye-catching graphic that isn't a button or link at all. In pointing this out, he is demonstrating to me that the actions I am performing are unlikely to be intuitive for an untrained user.

Fast enough depends on the user. I talk about how users will judge the overall goodness of response time based on their expectations. My children expect everything to react at video game speed. They have absolutely no tolerance for delay.

You can never tell what a user may try to do with your software. When pointing out a bug that results from an unanticipated sequence of activity, we are often faced with the objection of "No user would ever do that." (Which James Bach interprets to mean, "No user we like would ever do that intentionally.") Interestingly enough, that objection melts away when I explain that I found the defect because one of my boys threw a ball that fell on the keyboard, or sat down and started playing with the keyboard when I got up to get a snack.

Sometimes the most valuable thing you can do is take a break. Granted, my boys didn't teach me this directly, but I have learned that when I am sitting in front of my computer jealously listening to them playing while I am experiencing frustration due to my inability to make progress, taking a break to go play with them for a while almost always brings me back to the computer in a short while, refreshed and with new ideas.

Speaking of taking a break, my boys are waking up from their nap, so I think I'm going to go play for a while.

Ten software testing traps

Everyone at some point in their careers faces difficulties. The problem could be not having enough resources or time to complete projects. It could be working with people who don't think your job is important. It could be lack of consideration and respect from managers and those who report to you.

Software testers aren't exempt from this. But as Jon Bach pointed out in his session "Top 10 tendencies that trap testers," presented at StarEast a couple of weeks ago, software testers often do things that affect their work and how co-workers think about them.

Bach, manager for corporate intellect and technical solutions at Quardev Inc., reviewed 10 tendencies he's observed in software testers that often trap them and limit how well they do their job.

"If you want to avoid traps because you want to earn credibility, want others to be confident in you, and want respect, then you need to be cautious, be curious and think critically," he said.

Here's a look at what Bach considers the top 10 traps and how to remedy them:

10. Stakeholder trust: This is the tendency to search for or interpret information in a way that confirms your preconceptions. But what if a person's preconceptions are wrong? You can't automatically believe or trust people when they say, "Don't worry about it," "It's fixed," or "I'll take care of it."

Remedies include learning to trust but then verify that what the person says is correct. Testers should also think about the tradeoffs compared with opportunity costs, as well as consider what else might be broken.

9. Compartmental thinking: This means thinking only about what's in front of you. Remedies include thinking about opposite dimensions -- light vs. dark, small vs. big, fast vs. slow, etc. Testers can also exercise a brainstorm tactic called "brute cause analysis" in which one person thinks of an error and then another person thinks of a function.

8. Definition faith: Testers can't assume they know what is being asked of them. For example, if someone says, "Test this," what do you need to test for? The same goes for the term "state." There are many options.

What testers need to do is push back a little and make sure they understand what is expected of them. Is there another interpretation? What is their mission? What is the test meant to find?

7. Inattentional blindness: This is the inability to perceive features in a visual scene when the observer is not attending to them. An example of this is focusing on one thing or being distracted by something while other things go on around you, such as a magic trick.

To remedy this, testers need to increase their situational awareness and manage the scope and depth of their attention. They should look for different things and look at different things in different ways.

6. Dismissed confusion: If a tester is confused by what he's seeing, he may think, "It's probably working; it's just something I'm doing wrong." He needs to instead have confidence in his confusion. Fresh eyes find bugs, and a tester's confusion is more than likely picking up on something that's wrong.

5. Performance paralysis: This happens when testers are overwhelmed by the number of choices to begin testing. To help get over this, testers can look at the bug database, talk with other testers (paired testing), talk with programmers, look at the design documents, search the Web and review user documentation.

Bach also suggests trying a PIQ (Plunge In/Quit) cycle -- plunge in and just do anything. If it's too hard, then stop and go back to it. Do this several times -- plunge in, quit; plunge in, quit; plunge in, quit. Testers can also try using a test planning checklist and a test plan evaluation.

4. Function fanaticism: Don't get wrapped up in functional testing. Yes, those types of tests are important, but don't forget about structure tests, data tests, platform tests, operations tests and time tests. To get out of that trap, use or invest in your own heuristics.

3. Yourself, untested: Testers tend not to scrutinize their own work. They can become complacent about their testing knowledge, stop learning more about testing, and end up with malformed tests and misleading bug titles. Testers need to take a step back and test their testing.

2. Bad oracles: An oracle is a principle or mechanism used to recognize a problem. You could be following a bad one. For example, how do you know a bug is a bug? Testers should file issues as well as bugs, and they should mention in passing to people involved that things might be bugs.

1. Premature celebration: You may think you've found the culprit -- the show-stopping bug. However, another bug may be one step away. To avoid this, testers should "jump to conjecture, not conclusions." They should find the fault, not just the failure.

Testers can also follow the "rumble strip" heuristic. The rumble strip runs along most highways. It's a warning that your car is heading into danger if it continues on its current path. Bach says, "The rumble strip heuristic in testing says that when you're testing and you see the product do strange things (especially when it wasn't doing those strange things just before) that could indicate a big disaster is about to happen."

What to include in a performance test plan

Before performance testing can be performed effectively, a detailed plan should be formulated that specifies how performance testing will proceed from a business perspective and technical perspective. At a minimum, a performance testing plan needs to address the following:

  • Overall approach
  • Dependencies and baseline assumptions
  • Pre-performance testing actions
  • Performance testing approach
  • Performance testing activities
  • In-scope business processes
  • Out-of-scope business processes
  • Performance testing scenarios
  • Performance test execution
  • Performance test metrics

As in any testing plan, try to keep the amount of text to a minimum. Use tables and lists to articulate the information. This will reduce the incidence of miscommunication.

Overall approach
This section of the performance plan lays out the overall approach for this performance testing engagement in non-technical terms. The target audience is the management and the business. Example:

"The performance testing approach will focus on the business processes supported by the new system implementation. Within the context of the performance testing engagement, we will:

· Focus on mitigating the performance risks for this new implementation.

· Make basic working assumptions on which parts of the implementation need to be performance-tested.

· Reach consensus on these working assumptions and determine the appropriate level of performance and stress testing that shall be completed within this compressed time schedule.

This is a living document. As more information is brought to light, and as we reach consensus on the appropriate performance testing approach, this document will be updated."

Dependencies and baseline assumptions
This section of the performance test plan articulates the dependencies (tasks that must be completed) and baseline assumptions (conditions testing believes to be true) that must be met before effective performance testing can proceed. Example:

"To proceed with any performance testing engagement the following basic requirements should be met:

· Components to be performance tested shall be completely functional.

· Components to be performance tested shall be housed in hardware/firmware components that are representative of, or scalable to, the intended production systems.

· Data repositories shall be representative of, or scalable to, the intended production systems.

· Performance objectives shall be agreed upon, including working assumptions and testing scenarios.

· Performance testing tools and supporting technologies shall be installed and fully licensed."

Pre-performance testing actions
This section of the performance test plan articulates pre-testing activities that could be performed before formal performance testing begins to ensure the system is ready. It's the equivalent to smoke testing in the functional testing space. Example:

"Several pre-performance testing actions could be taken to mitigate any risks during performance testing:

· Create "stubs" or "utilities" to push transactions through the QA environment -- using projected peak loads.

· Create "stubs" or "utilities" to replace business-to-business transactions that are not going to be tested or will undergo limited performance testing. This would remove any dependencies on B2B transactions.

· Create "stubs" or "utilities" to replace internal components that will not be available during performance testing. This would remove any dependencies on these components.

· Implement appropriate performance monitors on all high-volume servers."
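
As a rough illustration of the stub idea in the example above, the Python sketch below stands in for a business-to-business service that will not be performance-tested. Every name in it (B2BServiceStub, submit, the canned response fields, the simulated latency) is hypothetical and not taken from any real system; it only shows the general shape of a stub that returns a fixed answer after a configurable delay.

```python
import time


class B2BServiceStub:
    """Hypothetical stand-in for an external B2B service excluded from the test."""

    def __init__(self, simulated_latency_s=0.0, canned_response=None):
        # Fixed delay that roughly approximates the real partner's response time.
        self.simulated_latency_s = simulated_latency_s
        # The same canned answer is returned for every request.
        self.canned_response = canned_response or {"status": "OK"}
        # Count calls so the test team can confirm the stub was exercised.
        self.calls = 0

    def submit(self, transaction):
        """Accept any transaction, wait the configured delay, return the canned answer."""
        self.calls += 1
        if self.simulated_latency_s > 0:
            time.sleep(self.simulated_latency_s)
        # Build a new dict so callers cannot mutate the stub's canned response.
        return dict(self.canned_response, transaction_id=transaction.get("id"))


# Illustrative usage with made-up values.
stub = B2BServiceStub(simulated_latency_s=0.01, canned_response={"status": "APPROVED"})
result = stub.submit({"id": "T-1", "amount": 100})
```

Because the stub always answers the same way, any slowdown observed under load can be attributed to the components being tested rather than to the partner system. The simulated latency is itself an assumption that should be agreed upon and recorded in the plan.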

Performance testing approach
This section of the performance plan expands on the overall approach, but this time the focus is on both the business and technical approach. As an example:

"The performance testing approach will focus on a logical view of the new system implementation. Within the context of the performance testing engagement, we will:

· Focus on mitigating the performance risks for this new implementation.

· Make basic working assumptions on which parts of the implementation need to be performance-tested.

· Reach consensus on these working assumptions and determine the appropriate level of performance that shall be completed.

· Use a tier 1 performance testing tool that can replicate the expected production volumes.

· Use an environment that replicates the components (as they will exist in production) that will be performance-tested -- noting all exceptions.

· Use both production and non-production (testing) monitors to measure the performance of the system during performance testing."

Performance testing activities
This section of the performance test plan specifies the activities that will occur during performance testing. Example:

"During performance testing the following activities shall occur:

· Performance test shall create appropriate loads against the system following agreed-upon scenarios that include:

o User actions (workflow)

o Agreed-upon loads (transactions per minute)

o Agreed-upon metrics (response times)

· Manual testing and automated functional tests shall be conducted during performance testing to ensure that user activities are not impacted by the current load.

· System monitors shall be used to observe the performance of all servers involved in the test to ensure they meet predefined performance requirements.

· Post-implementation support teams shall be represented during performance testing to observe and support the performance testing efforts."

In-scope business processes
This section of the performance test plan speaks to which aspects of the system are deemed to be in-scope (measured). Example:

"The following business processes are considered in-scope for the purposes of performance testing:

· User registration

· Logon/access

· Users browsing content

· Article sales & fulfillment

· Billing

Business process list formed in consultation with: Business Analysts, Marketing Analyst, Infrastructure, and Business Owner."

Out-of-scope business processes
This section of the performance testing plan speaks to which aspects of the system are deemed to be out-of-scope (not measured). Example:

"Business processes that are considered out-of-scope for the purposes of testing are as follows:

· Credit check

o Assumption: Credit check link shall be hosted by a third party -- therefore no significant performance impact.

· All other business functionality not previously listed as in-scope or out-of-scope

o Assumption: Any business activity not mentioned in the in-scope or out-of-scope sections of this document does not present a significant performance risk to the business."

Formulating performance testing scenarios
The existence of this section within the body of the performance testing plan depends on the maturity of the organization within the performance testing space. If the organization has little or no experience in this space, then include this section within the plan; otherwise, include it as an appendix. Example:

"Formulation of performance testing scenarios requires significant inputs from IT and the business:

· Business scenario

o The business scenario starts as a simple textual description of the business workflow being performance-tested.

o The business scenario expands to a sequence of specific steps with well-defined data requirements.

o The business scenario is complete once IT determines what (if any) additional data is required because of the behavior of the application/servers (e.g., caching).

· Expected throughput (peak)

o The expected throughput begins with the business stating how many users are expected to be performing this activity during peak and non-peak hours.

o The expected throughput expands to a sequence of distinguishable transactions that may (or may not) be discernible to the end user.

o The expected throughput is completed once IT determines what (if any) additional factors could impact the load (e.g., load balancing).

· Acceptance performance criteria (acceptable response times under various loads)

o Acceptance performance criteria are stated by the business in terms of acceptable response times under light, normal and heavy system load, where system load means day-in-the-life activity. These loads could be simulated by other performance scenarios.

o The performance testing team then restates the acceptance criteria in terms of measurable system events. These criteria are then presented to the business for acceptance.

o The acceptance criteria are completed once IT determines how to monitor system performance during the performance test. This will include metrics from the performance testing team.

· Data requirements (scenario and implementation specific)

o The business specifies the critical data elements that would influence the end-user experience.

o IT expands these data requirements to include factors that might not be visible to the end user, such as caching.

o The performance testing team working with IT and the business creates the necessary data stores to support performance testing."
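
The four inputs listed in the example above can be captured in a single record per scenario. The following is only a minimal sketch: the class name, field names and the completeness check are hypothetical illustrations, not part of any template from the original plan.

```python
from dataclasses import dataclass, field


@dataclass
class PerformanceScenario:
    """Hypothetical record for one performance testing scenario."""
    name: str
    steps: list                  # business workflow as a sequence of specific steps
    peak_users: int              # expected users performing this activity at peak
    off_peak_users: int          # expected users during non-peak hours
    max_response_seconds: dict   # acceptable response time per load level
    data_requirements: list = field(default_factory=list)

    def missing_inputs(self):
        """Return the inputs that still have to be supplied by IT or the business."""
        missing = []
        if not self.steps:
            missing.append("business scenario steps")
        if self.peak_users <= 0:
            missing.append("expected peak throughput")
        for level in ("light", "normal", "heavy"):
            if level not in self.max_response_seconds:
                missing.append(f"acceptance criterion for {level} load")
        if not self.data_requirements:
            missing.append("data requirements")
        return missing


# Example values are invented for illustration only.
checkout = PerformanceScenario(
    name="Checkout",
    steps=["Log in", "Add item to cart", "Pay"],
    peak_users=500,
    off_peak_users=50,
    max_response_seconds={"light": 2.0, "normal": 3.0},
)
print(checkout.missing_inputs())
```

A check like this makes it visible, before any test is scripted, which of the inputs the business or IT still owes the performance testing team.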

Performance test execution
Once again the existence of this section of the performance test plan is dependent upon the maturity of the organization within the performance testing space. If the organization has significant performance testing experience, then this section can become a supporting appendix. Example:

"Performance testing usually follows a linear path of events:

· Define performance-testing scenarios.

· Define day-in-the-life loads based on the defined scenarios.

· Execute performance tests as standalone tests to detect issues within a particular business workflow.

· Execute performance scenarios as a "package" to simulate day-in-the-life activities that are measured against performance success criteria.

· Report performance testing results.

· Tune the system.

· Repeat testing as required."

Performance test metrics
The performance test metrics need to track against acceptance performance criteria formulated as part of the performance testing scenarios. If the organization has the foresight to articulate these as performance requirements, then a performance requirements section should be published within the context of the performance test plan. The most basic performance test metrics consist of measuring response time and transaction failure rate against a given performance load -- as articulated in the performance test scenario. These metrics are then compared to the performance requirements to determine if the system is meeting the business need.
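
As a rough illustration of comparing the two basic metrics against a requirement, the sketch below summarizes a set of transaction samples and checks them against thresholds. Everything named here is an assumption made for the example: the `Sample` record, the choice of a 90th-percentile response time (the text only says "response time"), and the threshold values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measured transaction (hypothetical structure)."""
    response_seconds: float
    succeeded: bool


def summarize(samples):
    """Return (90th-percentile response time of successes, failure rate)."""
    if not samples:
        raise ValueError("no samples collected")
    times = sorted(s.response_seconds for s in samples if s.succeeded)
    failure_rate = (len(samples) - len(times)) / len(samples)
    if not times:
        return float("inf"), failure_rate
    # Nearest-rank percentile: rank = ceil(0.90 * n), using integer arithmetic.
    rank = -(-90 * len(times) // 100)
    return times[rank - 1], failure_rate


def meets_requirement(samples, max_p90_seconds, max_failure_rate):
    """Compare measured metrics with assumed performance requirements."""
    p90, failure_rate = summarize(samples)
    return p90 <= max_p90_seconds and failure_rate <= max_failure_rate


# Invented measurements for one load level.
measured = [Sample(t, True) for t in (0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 4.0)]
measured.append(Sample(30.0, False))  # one timed-out transaction
print(summarize(measured))
print(meets_requirement(measured, max_p90_seconds=2.0, max_failure_rate=0.05))
```

The same comparison would be repeated for each load defined in the performance test scenario.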

Investigation vs. Validation

This article investigates the relationship between investigation and validation during performance testing. It demonstrates that this relationship is fundamentally different from the relationship between investigation and validation in functional testing.

Differences in Investigation and Validation in Performance Testing Vs. Functional Testing

Have you ever had this experience? You're explaining something that you have gone over a million times before. Suddenly, you stop in the middle of explanation one million and one and say, "That's it! Why didn't I think of that years ago?"

This happened to me just the other week while I was working to help a client improve an approach to performance testing. It was almost as if I was listening to someone else speaking for a moment, as I heard my own words replay in my head:

"We know that there is an issue on the app server that they are working on now. That pretty much means that testing requirements compliance would be pointless, but we still have to see what the new Web server hardware has done for us, right? So let's forget the requirements for now and whip up some scripts to investigate how the..."

[Figure: Performance Testing]

I did not even realize that I hadn't finished my thought, until Tom asked "Investigate how the what?" and I realized that Kevin was looking at me as if I'd grown a second head. So I did what any good tester would do. "Hold on a second," I said, and moved briskly to the whiteboard to quickly sketch a picture that looked something like the figure shown here.

I'm sure some of you are thinking "OK, what's the big deal?" Neither investigation nor validation is a revolutionary concept for software testers. In fact, the Association for Software Testing (www specifically refers to software testing as "a technical investigation done to expose quality-related information about the product under test." And one can hardly read an article about software testing that doesn't discuss "validation" in one way or another.

What struck me in that moment was not the fact that most performance testing projects necessitate both investigation and validation; it was the relationship between investigation and validation during performance testing that became suddenly clear. For years I've been trying to explain to people that the relationship between investigation and validation in performance testing is fundamentally different from the relationship between investigation and validation in functional testing. But while I understood the distinction clearly in my head, it never seemed to come across very well verbally.

Before I make my case about how these relationships differ, I should clarify my working definitions of "validation" and "investigation" for the purposes of this discussion. Whether a project is agile, waterfall or somewhere in between, at some point it becomes important to determine whether or not the software does what it was intended to do in the first place. In other words, you have to test it.

Of course, if you follow a waterfall model, a V-model or some similar model, this happens near the end of the project and takes the form of executing lots of well-planned individual tests. Generally, each one of these tests will have been designed to determine whether or not one specific, predefined requirement has been met. If the test passes, the implementation of that requirement is said to be "validated."

If you take a more agile approach, you may instead be executing tests to determine whether or not the concept sketched on the bar napkin, now laminated and tacked to the wall in the lead developer's cube, has been implemented in accordance with the vision of the original artist. Although the criteria for determining whether one of these tests passes or fails are not nearly as well defined as the ones we discussed above, a passing test is nevertheless said to have "validated" the implementation of the feature or features being tested. So pretty much any way you look at it, "validation testing" can be thought of as an activity that compares the version of the software being tested to the expectations that have been set or presumed for that product.

That takes care of validation, but what about investigation?

Presumptions of Innocence

Let's start with a dictionary definition of investigation: "a detailed inquiry or systematic examination." The first and most obvious difference between our working definition of validation and the dictionary definition of investigation is that we have stated that validation requires the existence of expectations about the outcome of the testing, while the definition of investigation makes no reference to either outcomes or expectations. This distinction is why we talk about "investigating a crime scene" rather than "validating a crime scene"; validating a crime scene would violate the presumption of "innocent until proven guilty" by implying that the crime scene was being examined with a particular expectation as to what the collected data will mean.

The most well-known testing method that can be classified as "investigation" is exploratory testing (ET), which can be defined as simultaneous learning, test design and test execution. In a paper titled "Exploratory Testing Explained," James Bach writes: "An exploratory test session often begins with a charter, which states the mission and perhaps some of the tactics to be used." We can substitute for "mission" the phrase "reason for doing the investigation" without significantly changing the meaning of Bach's statement. If we then substitute "A crime scene investigation" for "An exploratory test session," we come up with "A crime scene investigation often begins with a charter, which states the reason for doing the investigation and perhaps some of the tactics to be used."

Other than the fact that I doubt crime scene investigators often refer to their instructions as a charter, I don't see any conceptual inaccuracies with the analogy, so let's agree on "investigation" being an activity based on collecting information about the version of the software being tested that may have value in determining or improving the quality of the product.

So what is it that makes the relationship between investigation and validation in performance testing fundamentally different from their relationship in functional testing?

In my experience, two factors stand out as causing this relationship to be different. The first is that, typically, some manner of requirement or expectation has been established prior to the start of functional testing, even when that testing is exploratory in nature. By contrast, as I have pointed out in other columns, performance requirements are rarely well defined, testable and/or in fact required for an application to go live. What this means is that, with rare exceptions, performance testing is by nature investigative due to the lack of predefined requirements or quantifiable expectations.

The second factor differentiating these activities is the frequency with which a performance test uncovers a single issue that makes any additional validation testing wasteful until that issue is resolved. In contrast to functional testing, where it is fairly rare for a single test failure to essentially disable continued validation testing of the entire system, it is almost the norm for a single performance issue to lead to a pause, or even a halt, in validation testing.

When taken together, these two factors clearly imply that the overwhelming majority of performance tests should be classified as "investigation," whether they are intended to be or not. Yet the general perception among many individuals and organizations seems to be that "Just like functional testing, performance testing is mostly validation."

Take a moment and think about the ramifications of this disconnect. How would you plan for a "mostly validation" performance testing effort? When would you conduct which types of tests? What types of defects would be uncovered by those tests? How would the tests be designed? What skills would you look for in your lead tester?

Think, too, about the chaos that ensues when a major project enters what is planned to be performance validation two weeks before go-live, and the first test uncovers the fact that at a 10-user load, the system response time increases by two orders of magnitude, meaning that a page that returned in 1 second with one user on the system returns in 100 seconds with 10 users on the system—on a system intended to support 2,500 simultaneous users!

And if you think that doesn't happen, guess again: That is exactly what happened to me the first time I came on board a project to do performance testing at the end of development rather than at the beginning. It took eight days to find and fix the underlying issue, leaving four business days to complete the performance validation. As you can imagine, the product did not go live on the advertised date.

Now think about how you would answer each of those questions if you imagined instead a "mostly investigation" performance testing effort. I suspect that your answers will be significantly different. Think about the projects you have worked on: How would those projects have been different if the project planners had planned to conduct performance investigation from the beginning? If they had planned to determine the actual capacity of the hardware selected for Web servers, planned to determine the actual available network bandwidth, and planned to shake out configuration errors in the load balancers when they first became available?

The chaos on the project I described above would have been avoided if there had been a plan (or a charter) in place to investigate the performance of the login functionality as soon as it became available. One test. One script. One tester. Four hours, tops, and the debilitating issue would have been detected, resolved and forgotten before anyone had even published a go-live date.
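
To give a sense of how small such an early investigation can be, here is a minimal sketch of a one-script probe. It is not the script from that project: `probe`, `timed_call`, `degradation_factor` and the stand-in `fake_login` are hypothetical names, and a real probe would call the actual login transaction instead of sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(action):
    """Run one action and return its elapsed time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def probe(action, users):
    """Submit `action` once per simulated user, with up to `users` calls
    in flight at once, and return the elapsed time of each call."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_call, action) for _ in range(users)]
        return [f.result() for f in futures]


def degradation_factor(baseline_seconds, loaded_seconds):
    """How many times slower the loaded response is than the baseline."""
    return loaded_seconds / baseline_seconds


def fake_login():
    """Stand-in for the real login transaction (assumption for this sketch)."""
    time.sleep(0.01)


if __name__ == "__main__":
    one_user = max(probe(fake_login, 1))
    ten_users = max(probe(fake_login, 10))
    print(f"slowdown at 10 users: {degradation_factor(one_user, ten_users):.1f}x")
```

With the numbers from the story above, a page that takes 1 second for one user and 100 seconds for 10 users gives `degradation_factor(1.0, 100.0)`, a factor of 100, which is the two-orders-of-magnitude slowdown that such a probe would have surfaced on day one.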


Simple, huh? With or without the drawing on the whiteboard, the entire concept that I have struggled to make managers and executives understand for years comes down to these six words:

"Investigate performance early; validate performance last."

Factors that Influence Test Estimation

In his article “Test Estimation,” the author describes the major elements of the test estimation process; for example, techniques for compromise between estimate extremes, project management tools to use, how to predict bug find and fix rates, and options to consider when the estimated test time exceeds management’s plan. The best estimation techniques fail, however, if no one considers the factors that influence the team’s effort, time, and resources. In this companion piece, the author explains those factors that affect the test effort for good or ill.

Sometimes, even expert project managers—especially those who are unfamiliar with software testing projects—have real trouble estimating the time and resources to allocate for testing. Careful application of project estimation best practices is a good start, but there’s more involved than following the rules of project management.

System engineering—including the testing—is a complex, high-risk human endeavor. As such, it’s important to combine good estimation techniques with an understanding of the factors that can influence effort, time, dependencies, and resources. Some of these factors can act to slow down or speed up the schedule, while others, when present, can only slow things down.

Some of these factors arise from the process by which work is done. These include

  • the extent to which testing activities pervade the project or are tacked on at the end
  • clearly defined hand-offs between testing and the rest of the organization
  • well-managed change control processes for project and test plans, product requirements, design, implementation, and testing
  • the chosen system development or maintenance lifecycle, including the maturity of testing and project processes within that lifecycle
  • timely and reliable bug fixes
  • realistic and actionable project and testing schedules and budgets
  • timely arrival of high-quality test deliverables
  • proper execution of early test phases (unit, component, and integration)

Some of these factors are material and arise from the nature of the project, the tools at hand, the resources available, and so forth, including

  • existing, assimilated, high-quality test and process automation and tools
  • the quality of the test system, by which I mean the test environment, test process, test cases, test tools, and so forth
  • an adequate, dedicated, and secure test environment
  • a separate, adequate development debugging environment
  • the availability of a reliable test oracle (so we can know a bug when we see one)
  • available, high-quality (clear, concise, accurate, etc.) project documentation like requirements, designs, plans, and so forth
  • reusable test systems and documentation from previous, similar projects
  • the similarity of the project and the testing to be performed to previous endeavors

Some factors arise from the people on the team—and these can be the most important of all—including

  • inspired and inspiring managers and technical leaders
  • an enlightened management team who are committed to appropriate levels of quality and sufficient testing
  • realistic expectations across all participants, including the individual contributors, the managers, and the project stakeholders
  • proper skills, experience, and attitudes in the project team, especially in the managers and key players
  • stability of the project team, especially the absence of turnover
  • established, positive project team relationships, again including the individual contributors, the managers, and the project stakeholders
  • competent, responsive test environment support
  • project-wide appreciation of testing, release engineering, system administration, and other unglamorous but essential roles (i.e., not an “individual heroics” culture)
  • use of skilled contractors and consultants to fill gaps
  • honesty, commitment, transparency, and open, shared agendas among the individual contributors, the managers, and the project stakeholders

Finally, some complicating factors, when present, always increase schedule and effort, including

  • high complexity of the process, project, technology, organization, or test environment
  • lots of stakeholders in the testing, the quality of the system, or the project itself
  • many subteams, especially when those teams are geographically separated
  • the need to ramp up, train, and orient a growing test or project team
  • the need to assimilate or develop new tools, techniques, or technologies at the testing or project levels
  • the presence of custom hardware
  • any requirement for new test systems, especially automated testware, as part of the testing effort
  • any requirement to develop highly detailed, unambiguous test cases, especially to an unfamiliar standard of documentation
  • tricky timing of component arrival, especially for integration testing and test development
  • fragile test data, for example, data that is time sensitive

Process, material, people, and complications... On each project, specific aspects of the project in each category influence the resources and time required for various activities. When preparing a test estimate, it’s important for the test manager and those on the test team who help with estimation to consider how each of these factors will affect the estimate.

Forgetting just one of these factors can turn a realistic estimate into an unrealistic one. Experience is often the ultimate teacher of these factors, but smart test managers can learn to ask smart questions—of themselves and the project team—about whether and how each factor will affect their project.