from the files of
Networking Unlimited, Inc.
14 Dogwood Lane, Tenafly, NJ 07670
Phone: +1 201 568-7810
Detecting a Timing Race Between Cooperating Processors in a Modular Controller
A modular controller with two internal processors was failing intermittently in a critical control application. Networking Unlimited, Inc. was called in to work with the manufacturer's engineers to track down the cause of the system lockup and develop a solution.
A computer system built around two interconnected processors was failing intermittently while running one particular customer's application. The manufacturer's engineers could reproduce the failures, but only at seemingly random intervals, and could not find their cause.
Networking Unlimited, Inc. analyzed the protocol used between the processors and identified a timing race in the processor firmware, one that should have been impossible to trigger. By writing diagnostic code to instrument the operating system, we confirmed the location of the failure and formed a hypothesis about how to provoke it.
Bottom Line Results
Knowing exactly what to look for, the manufacturer's engineers were able to capture the failure on a logic analyzer. The capture confirmed that once every 25 ms, one processor took over 25 µs, rather than the normal 5 µs, to execute its polling loop. That processor fell out of sync with the other whenever the slow iteration coincided exactly with a 15 µs window that occurred once every 120 ms on the other processor.
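The timing figures above suggest why the failure was so rare. The following Python fragment is only an illustrative sketch, not the diagnostic code used in the investigation. It assumes that both events are strictly periodic, that they are separated by a fixed phase offset, that the slow iteration lasts exactly 25 µs (the capture showed "over 25 µs"), and that any overlap between the slow iteration and the window counts as a collision. The function name collisions and all of the constants' names are hypothetical.

```python
from math import gcd

# Figures taken from the logic analyzer capture (all times in microseconds).
SLOW_PERIOD_US = 25_000     # a slow polling iteration recurs every 25 ms
SLOW_LENGTH_US = 25         # ASSUMPTION: the source only says "over 25 µs"
WINDOW_PERIOD_US = 120_000  # the sensitive window recurs every 120 ms
WINDOW_LENGTH_US = 15       # the window is 15 µs wide


def collisions(phase_us):
    """Return the start times of slow iterations that overlap a window.

    phase_us is the (assumed fixed) offset of the first slow iteration
    relative to the first window. One full repeat of the combined
    pattern (the least common multiple of the two periods) is scanned.
    """
    hyper = SLOW_PERIOD_US * WINDOW_PERIOD_US // gcd(SLOW_PERIOD_US, WINDOW_PERIOD_US)
    hits = []
    for k in range(hyper // SLOW_PERIOD_US):
        slow_start = k * SLOW_PERIOD_US + phase_us
        # The +1 includes the window that opens the next repeat, so a slow
        # iteration straddling the end of the pattern is not missed.
        for j in range(hyper // WINDOW_PERIOD_US + 1):
            window_start = j * WINDOW_PERIOD_US
            # ASSUMPTION: any overlap of the two half-open intervals collides.
            if (slow_start < window_start + WINDOW_LENGTH_US
                    and window_start < slow_start + SLOW_LENGTH_US):
                hits.append(slow_start)
    return hits


if __name__ == "__main__":
    # Offsets repeat every gcd(25 ms, 120 ms) = 5 ms, so 0..4999 µs covers all cases.
    vulnerable = sum(1 for phase in range(5_000) if collisions(phase))
    print(f"{vulnerable} of 5000 whole-microsecond offsets lead to a collision")
```

Under these assumptions the combined pattern repeats every 600 ms, and only 39 of the 5000 possible whole-microsecond offsets (under 1%) ever produce a collision. In a real system the two processors' clocks would presumably drift relative to each other, so the offset would wander in and out of that narrow band, which is consistent with failures that appear at random.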
Armed with a solid understanding of the nature of the failure, the manufacturer was able to determine the alternatives for implementing a repair and negotiate a suitable workaround with the customer.
Copyright 1999-2000 © Networking Unlimited Inc. All rights reserved.