COTS Journal

Case Study: Benchmarking Real-Time Determinism in Windows CE

By: Chris Tacke and Lawrence Ricci, Applied Data Systems

Detailed examinations and benchmark data show that for many real-time military systems, Windows CE 3.0 and Microsoft’s newest CE .NET offer ample determinism, predictable latency and excellent real-world saturation results.

The real-time deterministic performance of Windows CE has been extensively investigated for application in the command and control systems of unmanned aerial vehicles and battlefield robots. With the release of CE .NET, engineers are asking if the new OS is more or less agile than its widely used predecessor 3.0. This case study first establishes the real-time performance of CE 3.0 on the industry standard StrongARM platform, and then compares it in detail to CE .NET, the newest member of the CE family.

Real-time performance was tested by using a standard function generator to create a hardware interrupt to the device. A signal was sent to a COTS single board computer (the “controller”) running each version of Windows CE, and the time the controller took to respond was measured. The latency and jitter of this response are the quantified measures of determinism. We measured two responses: one linked directly to the interrupt, and one triggered by a Windows event.
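As a minimal sketch of how these two measures can be derived from recorded edge times, the following Python computes mean latency and peak-to-peak jitter. The function name `latency_stats` and the timestamps are hypothetical, invented for illustration; they are not data from the tests described here.

```python
from statistics import mean

def latency_stats(stimulus_us, response_us):
    """Return (mean latency, peak-to-peak jitter) in microseconds.

    Each response timestamp is paired with the stimulus edge that caused it.
    """
    latencies = [r - s for s, r in zip(stimulus_us, response_us)]
    return mean(latencies), max(latencies) - min(latencies)

# Hypothetical timestamps in microseconds, invented for illustration only.
stimulus = [0.0, 200.0, 400.0, 600.0]
response = [2.0, 203.0, 402.0, 605.0]

avg, jitter = latency_stats(stimulus, response)
print(f"mean latency {avg:.2f} us, jitter {jitter:.2f} us")
```

Jitter is taken here as the spread between the fastest and slowest response, which matches the way the scope measurements are read later in this study.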

We like these tests because of their reliance on full loop testing—beginning with the stimulus of a hardware input and measuring the system output. The tests make no assumptions about the internals of the device. The tests make no references to interrupt latencies or context switch times, and there are no semantic games on the nature of hard and soft real time. The test is a basic measurement of the full system—both hardware and software performance.

Our test setup used count up/count down counters and a high-speed memory scope to let us measure latency, jitter and thread runtime (Figure 1). The results, measured with high-speed, high-precision instruments, were surprising, suggesting that specifications for CE real-time performance were too conservative. CE as an RTOS performs far better than generally discussed.

Graphics Master Test Platform

The Graphics Master is a single board embedded computer furnished by Applied Data Systems (ADS) in an application-ready format, inclusive of the CE BSP used in this test. The system is based on an SA 1110/1111 StrongARM running at 206 MHz, with 32 Mbytes of Flash and 32 Mbytes of RAM (Figure 2). While the system includes a high-speed 8-bit microcontroller often used as a real-time front-end I/O processor, in this case only general-purpose I/O lines were used for input and output.

The CE builds used for this test were CE 3.0 “Max All” and a similarly configured CE .NET 4.0. These were far from minimum builds and included many, indeed most, OS features. Apart from the driver/hardware level code, this was the standard out-of-the-box OS software with full graphics, desktop, control panels, networks, browsers and so on.

The hardware here was not completely standard. The interrupt line we used for the test is typically used for a “power on/power off” push button to signal transition in and out of power saving sleep mode. In this service, it is useful to have some small filters on the board to eliminate contact bounce from the typical membrane push button. Once the test started running, it became clear that the OS real-time performance was much quicker than we expected, and was occurring at time intervals inside the contact bounce of a button, so we removed the filters from the board.

Benchmark Software

The benchmark contained two tests. One routine, IST_TEST, was a few lines of code linked directly to the IST (Interrupt Service Thread). When the interrupt occurs, the routine runs with the IST and sets an output on the Graphics Master. This would be the type of code used for the most demanding real-time applications and runs at full interrupt priority.

The second routine, APP_TEST, was analogous to most application software that communicates only with the OS API and messaging system. For this routine, the IST signals a Windows event using the “Set Event” command (in this case the Power Off trigger normally assigned to the particular GPIO line). When the event is recognized by the APP_TEST routine, it runs, turns on a different output, and then shuts it off as soon as it finishes running. This routine requires no knowledge of the driver structure to implement, so it is the better choice for most applications. It also keeps complex application code running within a separate thread, insulating application and system errors from each other. Code for both tests can be found at http://www.applieddata.net/support.
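The division of labor between the two routines can be sketched as a desktop analogy. This is not the Windows CE benchmark code: Python's `threading.Event` stands in for the Windows event, a list stands in for the GPIO output lines, and the names `ist_handler`, `app_test` and `power_event` are hypothetical.

```python
import threading

outputs = []                      # stands in for the GPIO output lines
power_event = threading.Event()   # stands in for the Windows event

def ist_handler():
    """Interrupt-linked path: respond directly, then signal the event."""
    outputs.append("IST_TEST output on")
    power_event.set()             # wake the waiting application thread

def app_test():
    """Event-linked path: block on the event, then toggle a second output."""
    power_event.wait()
    outputs.append("APP_TEST output on")
    outputs.append("APP_TEST output off")

app = threading.Thread(target=app_test)
app.start()
ist_handler()                     # simulate a single hardware interrupt
app.join()
print(outputs)
```

The application thread knows nothing about the interrupt source; it only waits on the event, which is the property that makes this style attractive for most application code.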

Detailed CE 3.0 Tests

As shown in Figure 3, we set up the apparatus with the controller comfortably monitoring and passing on square wave inputs at a frequency of 5000 Hz (a 200-microsecond period from pulse to pulse), with the pulse generator set at 20% duty cycle (each pulse 40 microseconds wide). The input pulse signal was fed to channel 1 (bottom trace) of the memory scope, acting as the trigger. The interrupt-linked IST_TEST was channel 2 (middle trace), and the Windows event-linked APP_TEST was channel 3 (upper trace).
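The period and pulse width quoted above follow directly from the generator settings. A short sketch of the arithmetic, with `pulse_timing` as a hypothetical helper name:

```python
def pulse_timing(frequency_hz, duty_cycle):
    """Return (period, pulse width) in microseconds for a pulse train."""
    period_us = 1e6 / frequency_hz
    return period_us, period_us * duty_cycle

period_us, width_us = pulse_timing(5000, 0.20)
print(f"period {period_us:.0f} us, pulse width {width_us:.0f} us")
```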

As you can see, the interrupt-linked IST_TEST output lagged the input signal by about 2.5 microseconds. The Windows event-linked APP_TEST output lagged the interrupt input by 16 microseconds.

Most readers will note these numbers are far inside any specification limits typically discussed by Microsoft or most professionals. To help quantify this performance in meaningful terms, consider that a projectile with a muzzle velocity of 1100 m/s moves all of 2.75 millimeters during the 2.5-microsecond interrupt-linked latency. CE 3.0 performance is clearly, in most environments, well inside the time window used to discuss determinism.
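The projectile figure is simple arithmetic, and the same calculation can be applied to the 16-microsecond event-linked latency. A small sketch, with `distance_mm` as a hypothetical helper name:

```python
def distance_mm(velocity_m_per_s, latency_us):
    """Distance travelled during a latency, in millimeters.

    Meters per second times microseconds gives micrometers; divide by 1000 for mm.
    """
    return velocity_m_per_s * latency_us / 1000.0

print(f"{distance_mm(1100, 2.5):.2f} mm during the IST_TEST latency")
print(f"{distance_mm(1100, 16):.2f} mm during the APP_TEST latency")
```

By the same reasoning, the event-linked response corresponds to under 18 millimeters of travel.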

Since determinism is a “statistical” property, we needed to measure this latency over an extended number of samples to quantify jitter, or the variation in latency. The scope used for testing has a feature to average samples, giving us a good way to measure this jitter. The rise time of the controller output is sharp, only a few nanoseconds, so as the pulse jitters back and forth, the scope builds an average over the last group of samples. The average of two pulses, jittered back and forth, is a one-step stairway to the top of the pulse; of three staggered pulses, two steps; and so on. For the 128 samples offered by the scope, this gave the averaged output a smooth slope up to a flat-topped peak. The time duration of the slope is taken as the jitter.
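The averaging argument above can be checked numerically. The sketch below models each pulse as an ideal step, averages 128 of them, and measures how long the averaged trace takes to climb from 0 to 1. The edge times are randomly generated and purely hypothetical (a nominal 16-microsecond latency with up to 5 microseconds of jitter); the function names are likewise illustrative.

```python
import random

def averaged_trace(edge_times_us, grid_us):
    """Average ideal step responses (0 before the edge, 1 after) at each grid time."""
    n = len(edge_times_us)
    return [sum(1 for e in edge_times_us if e <= t) / n for t in grid_us]

def slope_duration_us(trace, grid_us):
    """Time from where the average leaves 0 to where it first reaches 1."""
    start = next(t for t, v in zip(grid_us, trace) if v > 0.0)
    end = next(t for t, v in zip(grid_us, trace) if v >= 1.0)
    return end - start

# Two staggered pulses give a single intermediate step at half height.
print(averaged_trace([1.0, 2.0], [0.5, 1.5, 2.5]))

random.seed(1)
# Hypothetical rising edges, invented for illustration only.
edges = [16.0 + random.uniform(0.0, 5.0) for _ in range(128)]
grid = [i * 0.1 for i in range(400)]      # 0 to 39.9 us in 0.1 us steps

trace = averaged_trace(edges, grid)
measured = slope_duration_us(trace, grid)
true_spread = max(edges) - min(edges)
print(f"slope duration {measured:.1f} us, actual spread {true_spread:.2f} us")
```

Within the resolution of the time grid, the duration of the averaged slope equals the spread between the earliest and latest edges, which is why the slope can be read as the jitter.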

We also need to simulate and measure the system under load. To simulate load, we use the Polygons test program shipped with Windows CE and running at high priority. Looking at this averaged response for Windows CE 3.0 under load we obtained the results shown in Figure 4.

Notice in this figure, the horizontal axis is changed to 25 microseconds per cm. Here we can see the IST_TEST, linked to the interrupt, is still very deterministic with respect to when it starts, even under load. Jitter is about 5 microseconds. Its completion time, however, becomes more varied, as expected. Both IST_TEST and APP_TEST programs are deterministic as to when they signal output “on”, but they are dependent on code runtime as to when they shut output “off”.

The response to the Windows event-scheduled APP_TEST is in some ways the opposite. The OS is very deterministic in ensuring the APP_TEST will run within the 133-microsecond window even contending with Polygons, but a measurable number of executions occur much earlier, at peaks of about 15 microseconds and 30 microseconds. So, we should understand that for Windows CE 3.0, determinism of an interrupt-linked process is best interpreted with respect to how quickly it starts. Determinism of Windows event-scheduled processes is best interpreted with respect to when they must start, with the time interval understood more as a “back wall”.

Statistical Performance Under Load

Finally, any analysis of determinism needs to understand the behavior under load of “outliers”, the small number of responses (early or late) that are far from the mean. To measure these outliers, we use the persistence feature of the scope, which records and overwrites all traces for a total of many hundreds of thousands of traces. The result does show outlier behavior (Figure 5).

Here we can see the deterministic behavior for Windows CE 3.0 measured over many cycles. The dark line is the average recorded response. The grey outlined line is the last trace. All interrupt-linked processes were completed within about 50 microseconds and all Windows event-linked processes were completed within about 125 microseconds.

We repeated on CE .NET all the tests shown for CE 3.0 in Figures 3-5. In the interest of brevity, only the final oscilloscope image for CE .NET is shown in Figure 6, loaded and sampled for thousands of traces.

In Figure 6 we see CE .NET under load. Notice that both the interrupt-linked IST_TEST and the Windows event-linked APP_TEST are more tightly defined, more deterministic in response. While almost as quick as CE 3.0, CE .NET is a little bit more predictable, especially for the APP_TEST under load.

Saturation Tests

All the aforementioned charts were made with the system running at 5000 Hz. We also wanted to stress the system to failure. This was accomplished by running loaded and unloaded systems at higher and higher interrupt frequencies from the pulse generator until the particular outputs no longer occurred deterministically. For both Windows CE 3.0 and CE .NET the results are shown in Figure 7, condensed on the same table with latency and jitter information.

We were not expecting anything like the interrupt-linked performance for CE 3.0. We were watching a full graphic system servicing interrupts comfortably above 100 kHz, and even running background graphics at 80 kHz. If we had shut off the APP_TEST routine, it doubtless would have reached even higher frequencies.

CE .NET is still very agile, even though it carries a lot more baggage in terms of network support and graphics, the perennial bugbear of real-time systems. CE .NET serviced interrupts right up to 47 kHz. While not exactly a threat to the AM band on your radio, this would have been a useful frequency for Marconi. For the range of events that are associated with electrical or mechanical devices, a frequency of 47 kHz, corresponding to a period of about 21 microseconds, should be quite sufficient.

What is notable, and actually very positive with respect to the real-time performance of CE .NET, is the way the maximum frequency of the Windows event-linked APP_TEST (MF_APP) changed under background load. We can see that for CE .NET, the maximum frequency changed only about 2% when Polygons was running. In CE 3.0, MF_APP changed more than 50% as a result of background load. This means that for the form of real-time programming most often used (Windows event-linked tasks) CE .NET is more deterministic than CE 3.0.

Also, we can see the CE .NET situation with jitter is even better than that of CE 3.0. In fact, under load, the CE .NET system actually becomes more deterministic. The jitter on the APP_TEST actually reduces, not only to less than a loaded 3.0 system, but less than a CE .NET system under no load! The actual latency numbers are also quite good, with an interrupt-linked response starting 6.3 microseconds after interrupt, and a Windows event-linked response starting only 53 microseconds after interrupt.

Recommendations

While these tests give strong endorsement of CE as an RTOS, the engineer should understand that this was a test of a particular platform, with a particular BSP. The real-time performance of the OS is completely limited by the quality and efficiency of the low-level code that connects the OS to the actual chip registers, and this varies with the particular system tested.

However, we were very pleased with the real-time response of CE 3.0 and CE .NET on this platform. While for certain tests CE 3.0 seemed to have an edge, we think that for the realistic cases (Windows event-linked tasks on loaded systems) CE .NET is the clear choice.

Finally, this series of tests gives a good general guideline for applying CE to real-time processes. If the time intervals are measured in milliseconds, don’t analyze too deeply; you are probably OK. If the time intervals are measured in tens to hundreds of microseconds, think about it and perhaps test your target system and BSP. If time intervals are single-digit microseconds, test carefully and consider hardware-based interrupt handling. So, for the majority of battlefield systems, Windows CE should be an excellent choice for real-time applications.
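As a rough sketch, the guideline can be written as a simple rule of thumb. The function name `ce_guideline` and the exact boundaries (10 microseconds and 1 millisecond) are assumptions chosen to match the wording above; they are not thresholds measured in these tests.

```python
def ce_guideline(interval_us):
    """Map a required response interval (microseconds) to the advice above.

    The 10 us and 1000 us boundaries are assumed cut-offs, not measured limits.
    """
    if interval_us >= 1000.0:
        return "probably OK without deep analysis"
    if interval_us >= 10.0:
        return "test the target system and BSP"
    return "test carefully and consider hardware-based interrupt handling"

for interval in (5000.0, 150.0, 4.0):
    print(f"{interval:>7.1f} us: {ce_guideline(interval)}")
```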

Applied Data Systems
Columbia, MD.
(301) 490-4007.
[www.applieddata.net].

© 2009 RTC Group, Inc., 905 Calle Amanecer, Suite 250, San Clemente, CA 92673