Friday, July 4, 2008

Defect Discovery

If technology cannot guarantee that defects will not be created, then defects should be found quickly, before the cost to fix them grows. For purposes of this model, a defect has been discovered when it has been formally brought to the attention of the developers and the developers acknowledge that it is valid. A defect has not necessarily been discovered when the user simply finds a problem with the software: the user must also report the defect, and the developers must acknowledge that it is valid. Since it is important to minimize the time between defect origination and defect discovery, strategies need to be implemented that uncover the defect and facilitate the reporting and acknowledgement of the defect.

To make it easier to recognize defects, organizations should predefine defects by category. This is a one-time event, or an event that could be performed annually. It would involve knowledgeable, respected individuals from all major areas of the IS organization, and the group should be run by a facilitator. The task is to identify the errors/problems that occur most frequently in the IS organization, get agreement that they are, in fact, defects, and attach a name to each category of defect. The objective of this activity is to minimize conflicts over the validity of defects. For example, developers may not want to acknowledge that a missing requirement is a defect, but if it has been previously defined as a defect category then that conflict can be avoided.

The steps involved in defect discovery are as follows:
1. Find Defect: Discover defects before they become major problems.
2. Report Defect: Report defects to developers so that they can be resolved.
3. Acknowledge Defect: Obtain development acknowledgement that the defect is valid and should be addressed.
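
The three steps above can be read as a small state machine: a defect only counts as discovered once it has moved from found, to reported, to acknowledged. The following is a minimal sketch of that idea, not a prescribed implementation; the class name `Defect`, the `Status` values, and the method names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    FOUND = "found"
    REPORTED = "reported"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class Defect:
    """Hypothetical record that tracks a defect through the discovery steps."""
    summary: str
    found_at: datetime
    status: Status = Status.FOUND
    reported_at: Optional[datetime] = None
    acknowledged_at: Optional[datetime] = None

    def report(self, when: datetime) -> None:
        # Step 2: the defect is formally brought to the developers' attention.
        if self.status is not Status.FOUND:
            raise ValueError("only a found defect can be reported")
        self.status = Status.REPORTED
        self.reported_at = when

    def acknowledge(self, when: datetime) -> None:
        # Step 3: the developers agree that the defect is valid.
        if self.status is not Status.REPORTED:
            raise ValueError("a defect must be reported before it is acknowledged")
        self.status = Status.ACKNOWLEDGED
        self.acknowledged_at = when

    @property
    def discovered(self) -> bool:
        # In this model, finding a problem is not enough to count as discovery.
        return self.status is Status.ACKNOWLEDGED

    def time_to_acknowledge(self) -> timedelta:
        # Elapsed time from the moment the problem was found to acknowledgement.
        if self.acknowledged_at is None:
            raise ValueError("defect has not been acknowledged yet")
        return self.acknowledged_at - self.found_at
```

Recording a timestamp at each step makes the delay between finding and acknowledging a defect measurable, which is the quantity the strategies in this section try to shorten.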

1. Find Defect:
Defects are found either by preplanned activities specifically intended to uncover defects (e.g., quality control activities such as inspections and testing) or by accident (e.g., by users in production).

Techniques to find defects can be divided into three categories:

Static techniques: Testing that is done without physically executing a program or system. A code review is an example of a static testing technique.

Dynamic techniques: Testing in which system components are physically executed to identify defects. Execution of test cases is an example of a dynamic testing technique.

Operational techniques: An operational system produces a deliverable containing a defect found by users, customers, or control personnel -- i.e., the defect is found as a result of a failure.
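
To make the distinction between the first two categories concrete, the sketch below applies one static check and one dynamic check to the same small piece of code. It is an illustration only: the sample source, the function names `find_bare_excepts` and `run_test_case`, and the choice of "bare except clause" as the thing to look for are all assumptions made for this example.

```python
import ast
from typing import List

# Hypothetical code under examination.
SOURCE = '''
def average(values):
    return sum(values) / len(values)

def safe_average(values):
    try:
        return average(values)
    except:
        return 0
'''


def find_bare_excepts(source: str) -> List[int]:
    """Static technique: inspect the syntax tree without running the code."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def run_test_case(source: str) -> bool:
    """Dynamic technique: execute the code and observe its behavior."""
    namespace = {}
    exec(source, namespace)
    try:
        namespace["average"]([])  # test case: empty input
    except ZeroDivisionError:
        return False  # the test case exposed a defect
    return True
```

The static check flags the bare `except` clause without executing anything, while the dynamic check finds the division by zero only because a test case actually runs `average` with an empty list.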

While it is beyond the scope of this study to compare and contrast the various static, dynamic, and operational techniques, the research did arrive at the following conclusions:

Both static and dynamic techniques are required for an effective defect management program. In each category, the more formally the techniques were integrated into the development process, the more effective they were.

Since static techniques will generally find defects earlier in the process, they are more efficient at finding defects.

When Shell Oil followed the inspection process, they recorded the following results:

For each staff-hour spent in the inspection process, ten hours were saved!

More informal (and less effective) reviews saved as much time as they cost. In other words, the worst case (informal reviews) was no extra cost, and the best case (formal inspections) was a 10-to-1 savings.

Their defect removal efficiency with inspections was 95-97% versus roughly 60% for systems that did not use inspections.
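
The figures above can be turned into simple arithmetic. The sketch below assumes the commonly used definition of defect removal efficiency (defects removed before release divided by all defects eventually found); the source does not state which definition Shell Oil used, and the function names and sample numbers are hypothetical.

```python
def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """Fraction of all known defects that were removed before release (assumed definition)."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded")
    return found_before_release / total


def escaped_defects(total_defects: int, efficiency: float) -> int:
    """Defects that reach production for a given removal efficiency."""
    return total_defects - round(total_defects * efficiency)


def hours_saved(inspection_hours: float, savings_ratio: float) -> float:
    """Hours saved for the hours invested, at a given saved-to-spent ratio."""
    return inspection_hours * savings_ratio
```

For a hypothetical system containing 1,000 defects, 96% efficiency lets 40 defects escape to production, while 60% lets 400 escape, a tenfold difference. At the reported 10-to-1 ratio, 100 staff-hours of formal inspection would correspond to 1,000 hours saved; at the 1-to-1 ratio reported for informal reviews, the same 100 hours would simply pay for themselves.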

Shell Oil also emphasized the more intangible, yet very significant, benefits of inspections. They found that if the standards for producing a deliverable were vague or ambiguous (or nonexistent), the group would attempt to define a best practice and develop a standard for the deliverable. Once the standard became well defined, checklists would be developed. (NASA also makes extensive use of checklists and cross-references defects to the checklist item that should have caught the defect.) Inspections were also a good way to train new staff in both best practices and the functioning of the system being inspected.

2. Report Defect:
Once discovered, defects must be brought to the developers' attention. Defects discovered by a technique specifically designed to find them can be reported by a simple written or electronic report. However, some defects are discovered more by accident -- i.e., by people who are not trying to find defects. These may be development personnel or users. In these cases, techniques that facilitate the reporting of the defect may significantly shorten the defect discovery time. As software becomes more complex and more widely used, these techniques become more valuable. These techniques include computer forums, email, help desks, etc.

It should also be noted that there are some human factors/cultural issues involved with the defect discovery process. When a defect is initially uncovered, it may be very unclear whether it is a defect, a change, user error, or a misunderstanding. Developers may resist calling something a defect because that implies "bad work" and may not reflect well on the development team. Users may resist calling something a "change" because that implies that the developers can charge them more money. Some organizations have skirted this issue by initially labeling everything by a different name -- e.g., "incidents" or "issues." From a defect management perspective, what they are called is not an important issue. What is important is that the defect be quickly brought to the developers' attention and formally controlled.

3. Acknowledge Defect:
Once a defect has been brought to the attention of the developer, the developer must decide whether or not the defect is valid. Delays in acknowledging defects can be very costly. The primary cause of delays in acknowledging a defect appears to be an inability to reproduce the defect. When the defect is not reproducible and appears to be an isolated event ("no one else has reported anything like that"), there will be an increased tendency for the developer to assume the defect is invalid -- that the defect is caused by user error or misunderstanding. Moreover, with very little information to go on, the developer may feel that there is nothing he or she can do anyway. Unfortunately, as technology becomes more complex, defects which are difficult to reproduce will become more and more common. Software developers must develop strategies to more quickly pinpoint the cause of a defect.

Strategies to pinpoint the cause of a defect:
One strategy to pinpoint the cause of a defect is to instrument code to trap the state of the environment when anomalous conditions occur. Microsoft's Dr. Watson concept would be an example of this technique. In the Beta release of Windows 3.1, Microsoft included features (i.e., Dr. Watson) to trap the state of the system when a significant problem occurred. This information was then available to Microsoft when the problem was reported and helped them analyze the problem.
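
As a rough illustration of this strategy (and not a description of how Dr. Watson itself works), the sketch below wraps a call so that, when an unexpected exception occurs, the state of the environment is captured in a report that could accompany the defect report. The function names `capture_state` and `run_with_trap`, and the fields recorded, are assumptions chosen for this example.

```python
import platform
import sys
import traceback
from datetime import datetime, timezone


def capture_state(exc: BaseException, context: dict) -> dict:
    """Collect details about the environment at the moment of failure."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "exception_type": type(exc).__name__,
        "message": str(exc),
        "traceback": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "context": context,  # application-specific details supplied by the caller
    }


def run_with_trap(func, *args, **context):
    """Run func(*args); on failure return (None, report) instead of losing the state."""
    try:
        return func(*args), None
    except Exception as exc:
        return None, capture_state(exc, context)


# Example: a failing call produces a report that can be attached to the defect.
result, report = run_with_trap(lambda x: 1 / x, 0, screen="order_entry")
```

Because the report contains only strings, lists, and dictionaries, it can be written out with `json.dumps` and sent along with the user's description, so the developer has more to go on than "it crashed."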

Writing code to check the validity of the system is another way to pinpoint the cause of a defect. This is actually a very common technique for hardware manufacturers, and virus checkers would be another example of this strategy. Unfortunately, diagnostics may give a false sense of security -- they can find defects, but they cannot show the absence of defects.
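
A diagnostic of this kind can be as simple as a set of named checks run against the current state of the system. The sketch below is one possible shape for such a routine; the function name `run_diagnostics`, the inventory data, and the two checks are hypothetical.

```python
from typing import Callable, Dict, List


def run_diagnostics(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named validity check and return the names of those that fail."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that cannot run is treated as a failure
        if not ok:
            failures.append(name)
    return failures


# Hypothetical system state and the invariants it is expected to satisfy.
inventory = {"on_hand": 5, "reserved": 7}
checks = {
    "on_hand is non-negative": lambda: inventory["on_hand"] >= 0,
    "reserved does not exceed on_hand": lambda: inventory["reserved"] <= inventory["on_hand"],
}
failures = run_diagnostics(checks)
```

Here the second check fails, which points directly at the inconsistent data. An empty result, however, would only mean that none of the listed checks failed, not that the system is free of defects.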

Finally, analyzing reported defects to discover the cause of a defect is very effective. While a given defect may not be reproducible, quite often it will appear again (and again), perhaps in different guises. Eventually patterns may be noticed which will help in resolving the defect. If the defect is not logged, or if it is closed prematurely, then valuable information can be lost. In one instance reported to the research team, a development team was having difficulty reproducing a problem. Only during a visit to the location did they discover how to reproduce it: the problem was caused when one of the users fell asleep with her finger on the enter key. In order to protect the user, the circumstances surrounding the problem had not been reported to the developers until the on-site visit.
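
One simple way to look for such patterns is to group logged reports by a shared signature and see which signatures recur. The sketch below assumes each report records a module and a symptom; those field names, the sample data, and the function name `recurring_patterns` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical defect log; every report is kept, even if it could not be reproduced.
reports = [
    {"id": 1, "module": "order_entry", "symptom": "duplicate record"},
    {"id": 2, "module": "billing", "symptom": "rounding error"},
    {"id": 3, "module": "order_entry", "symptom": "duplicate record"},
    {"id": 4, "module": "order_entry", "symptom": "screen freeze"},
    {"id": 5, "module": "order_entry", "symptom": "duplicate record"},
]


def recurring_patterns(
    log: List[Dict], min_count: int = 2
) -> List[Tuple[Tuple[str, str], int]]:
    """Return (module, symptom) signatures seen at least min_count times."""
    counts = Counter((r["module"], r["symptom"]) for r in log)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


patterns = recurring_patterns(reports)
```

In this sample log, the duplicate-record problem in order entry appears three times. Each report might have looked like an isolated event on its own, which is why closing unreproducible reports prematurely throws away the information needed to see the pattern.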

A resolution process needs to be established for use in the event there is a dispute regarding a defect. For example, if the group uncovering the defect believes it is a defect but the developers do not, a quick-resolution process must be in place. While many approaches can address this situation, the two most effective are:

Arbitration by the software owner -- the customer using the software determines whether or not the problem is a defect.

Arbitration by a software development manager -- a senior manager of the software development department will be selected to resolve the dispute.
