This paper will explain a scientific approach to problem solving. Although it is written to address Information Technology related problems, the concepts might also be applicable in other disciplines. The methods, concepts, and techniques described here is nothing new, but it is shocking how many “problem solvers” fail to use them. In between I will include some real-life examples.
Why do problem solvers guess in stead of following a scientific approach to problem solving? Maybe because it feels quicker? Maybe a lack of experience in efficient problem solving? Or maybe because it feels like hard work to do it scientifically? Maybe while you keep on guessing and not really solving, you generate more income and add some job security? Or maybe because you violate the first principle of problem solving: understand the problem.
Principle #1. Understand the *real* problem.
Isn’t it obvious that before you can solve, you need to understand the problem? Maybe. But, most of the time the solver will start solving without knowing the real problem. What the client or user describe as “The Problem” is normally only the symptom! “My computer does not want to switch on” is the symptom. The real problem could be that the whole building is without power. “Every time I try to add a new product, I get an error message” is the symptom. Here the real problem could be “Only the last 2 products I tried to add gave a ‘Product already exists’ error”. Another classic example: “Nothing is working”…
You start your investigation by defining the “real problem”. This will entail asking questions (and sometimes verify them), and doing some basic testing. Ask the user questions like “when was the last time it worked successfully?”, “How long have you been using the system?”, “Does it work on another PC or another user?”, “What is the exact error message?” etc. Ask for a screen-print of the error if possible. Your basic testing will be to ensure the end-to-end equipment is up and running. Check the user’s PC, the network, the Web Server, Firewalls, the File Server, the Database back-end, etc. Best-case you will pint-point the problem already. Worst-case you can eliminate a lot of areas for the cause of the problem.
A real life example. The symptom according to the user: “The system hangs up at random times when I place orders”. The environment: The user enters the order detail on a form in a mainframe application. When all the detail is completed, the user will tab off the form. The mainframe then sends this detail via communication software to an Oracle Client/Server system at the plant. The Oracle system will do capacity planning and either returns an error or an expected order date back to the mainframe system. This problem is quite serious, because you can loose clients if they try to place orders and the system does not accept them! To attempt to solve this problem, people started by investigating: 1) The load and capacity of the mainframe hardware 2) Monitoring the network load between the mainframe and the Oracle system 3) Hiring consultants to debug the communication software 4) Debugging the Oracle capacity planning system After spending a couple of months they could not solve the problem.