Your better team players are desperate to get to the bottom of an end user problem, but it becomes like a scene from the latest Hercule Poirot novel, trying to detect just which part of your infrastructure or application suite is at fault. A lot of companies are going through this right now in their upgrades to Microsoft’s XP Service Pack 2, and application complexity is only going up in a web services world.
That’s when Oren Modai, VP operations for Identify, says that everyone wishes they had his product, which he describes as a “black box”, like a flight recorder for .Net and J2EE applications.
“At the moment finding out what caused a software problem is an unstructured process, which usually involves someone coming out to the end user department and saying, ‘Can I have your server,’ and taking the entire application, along with its hardware, offline.”
“They just want to recreate the fault and sometimes the attempts to recreate the problem are more intrusive than the problem itself. It is communication intensive, involves multiple locations, it’s error prone and it usually involves some formal trouble tickets process that goes through the motions.”
“What people want is a factual basis for application problem resolution,” says Modai.
The problem (and therefore Identify’s opportunity) comes from the complexity of today’s distributed applications. The more hardware and software that any given application travels across, the greater the number of things that need to be checked when something goes wrong. This involves a large number of separate participants in the problem process. Each group has its own set of priorities, time pressures and skill constraints, and the result can be a huge escalation in costs.
Modai keeps coming back to one statistic. What if you could just go straight to the root cause of a problem and set to work solving it? That would be a saving of 80% of the time and cost of each problem, he insists.
The Identify AppSight Application Support System monitors and records applications and can be applied in development, in testing or when a program is already in production. It reports at three levels: at the user level, offering a replay of all the user screens; at the level where the software system is interacting with the operating system; and finally at the code level.
It can be switched on and off at a moment’s notice and can run 24/7 or in reactive mode after an unplanned outage has happened, when the IT department knows that a problem is likely to recur.
Modai says, “AppSight does carry a CPU overhead, but it is about 2% to 3% CPU time. And you can set it so that the overhead is capped on mission critical systems. It can be issued from a single PC and multiple black boxes can report back to a web-based portal for constant monitoring.”
He describes what he sees as the traditional problem solving mechanism whereby an end user gets an error page mid-transaction.
The call to the helpdesk results in a helpdesk technician visiting. They talk about the problem, but it won’t recur. The support database has nothing similar in it, and the problem is classified as likely to be a server-side problem.
An operations engineer checks the server infrastructure and cannot recreate the problem and escalates it to the development team who wrote the module that was being used at the time of the error.
Development tries to recreate it and fails. There is more discussion between the user and the helpdesk and the development team to determine exact settings at the time the problem occurred. A day goes by.
Finally they recreate the problem, but it doesn’t show why the problem occurred and they have to go in search of the root cause. They check the database, the application server and finally locate the problem in some of their own business logic code and take just an hour or so to come up with, and test, a patch.
When this kind of problem occurs, argues Modai, “It doesn’t just slow down the end user and the support team, but it slows down the developer too, who should be working on future developments.”
There are labor costs, downtime, and customer dissatisfaction. And in the case of the independent software vendor (ISV), the problems are compounded by the fact that they are so removed from the user environment, and so when problems happen in their code, they have even less chance of recreating the circumstances under which they occurred.
Because of this, many of Identify’s recent successes are among the ISV community, with Microsoft selecting it for use in Microsoft Technology Centers, and with financial software house Patsystems, Legato, NCR and Cerner, a healthcare systems specialist, all choosing to integrate the product into their suites this year.
The AppSight product creates a set of separate analysis views. The support technician may only want to view user actions; a second tier support person may want to drill down into application configuration and a developer would most likely operate at the code level, looking at argument values and exceptions.
Now Identify is pushing the idea that AppSight can ease the upgrade to Microsoft’s all-important, but not so welcome, XP Service Pack 2, by accelerating the testing and rollout process.
Identify’s CEO Yochi Slonim said recently, “Identify’s AppSight Black Box software can be a tremendous help as organizations plan their rollout effort on XP Service Pack 2, because it provides deep visibility into the application execution, from the user level, to system interactions, down to the code, so if an application fails after the upgrade, as it very well may, AppSight will help IT staff and application developers quickly pinpoint why, so they can resolve the issue.”
The statement goes on to give an example. “SP2’s new security features may block certain components from talking to others, causing an application to break. Or, an application may not run well with Windows XP’s new firewall defaults. A migration team can use AppSight to capture ‘gold logs’ of the application as it runs well in production under XP, then record it again running under XP SP2 to see the differences at all levels of execution and quickly pinpoint problem areas. Using AppSight, you would be able to see the exact list of configuration parameters that the application accessed that were different. This immediately reduces the time for root cause analysis and problem resolution.”
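The “gold log” comparison in that example boils down to a before-and-after diff of what the application read. The following Python sketch is purely illustrative and is not AppSight’s format or API: it assumes, hypothetically, that each recording can be reduced to a mapping of configuration parameter names to the values the application accessed, and the parameter names and values are invented for the example.

```python
# Illustrative only: a recording is assumed to be a dict mapping each
# configuration parameter the application accessed to the value it read.
# This is NOT AppSight's data model; all names below are invented.

NOT_ACCESSED = "<not accessed>"


def diff_config(gold, candidate):
    """Return {parameter: (gold_value, candidate_value)} for every
    parameter whose value differs between the two recordings."""
    changes = {}
    for name in sorted(set(gold) | set(candidate)):
        before = gold.get(name, NOT_ACCESSED)
        after = candidate.get(name, NOT_ACCESSED)
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical recording taken while the application runs well under XP.
gold_log = {
    "firewall.enabled": "off",
    "remote_calls.policy": "allowed",
    "app.data_dir": "C:\\AppData",
}

# Hypothetical recording of the same scenario under XP SP2.
sp2_log = {
    "firewall.enabled": "on",
    "remote_calls.policy": "blocked",
    "app.data_dir": "C:\\AppData",
}

for name, (before, after) in diff_config(gold_log, sp2_log).items():
    print(f"{name}: {before} -> {after}")
```

Parameters that read the same in both runs drop out of the result, which is what would let a migration team skip straight to the settings that changed.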
It looks like Service Pack 2 is going to stimulate sales, but the company has not always been this bullish. Founded in 1996, it was only when Slonim, one of the founders of Mercury Interactive, joined the company in 2000 that its fortunes began to turn. It was only in February that it secured its most recent round of funding, taking on a further $15m after a year in which it doubled in size. The company is already profitable and this money will take it into new markets, and some of it is for strengthening the European operation. Identify is now 130 strong and Rethink estimates it has revenues of around $30m, although it doesn’t give out those numbers.
The company was in fact a Mercury spin-off, and it is from Mercury that competition is likely to come in the future.
The secret of the turnaround seems to be not in any startling new way that its software works, but more in how it goes to market.
Slonim has helped the company target ISVs and has also driven a new approach to its sales. “We go into companies and go over problems they have already solved. We install our software, which takes a matter of hours, and then solve an historical problem that they’ve already solved.
“All the customer then has to do is compare the time it took without our product to the time it took with the product,” said Modai.
“Customers can start with a very light profile and gradually extend it, just monitoring one application at first. We are able to charge 100 times what a simple debugger costs, but we don’t charge based on the number of black boxes you install. From one license you can install and take down as many copies of the software as you like.”
And perhaps it’s that rather unusual element of the licensing process that is making it successful. The black box can be in place just for a day, or permanently, across a huge organization or on a single server. That’s the nature of application root cause determination, so Identify needed some innovation in licensing and now it has it.