11:00 AM
Beware of Microbursts, Cisco Warns Banks
You may be measuring data latency in your data center or trading environment, but you may not be measuring it in small enough increments to detect very short spikes in traffic. Such spikes could be causing data packets to be dropped, in turn causing orders to be dropped or market data to fall out of sync with trading strategies, Cisco executives said yesterday. High-frequency trading is the most logical place for bankers to worry about microbursts; however, anywhere a firm needs to analyze fast-moving data (for example, real-time risk management and performance management), microbursts could be an issue.
According to Wikipedia, a microburst is a small column of sinking air that produces winds strong enough to knock over fully grown trees and that usually lasts a couple of seconds. In Cisco's definition, "It is a traffic pattern that's like a spike except microbursts are short in time, such as 100 microseconds," says Pramod Srivatsa, manager, server access virtualization at Cisco, who spoke to Bank Systems & Technology in an interview yesterday afternoon. "If people don't measure this appropriately, they won't be aware that they're having microbursts." Customers such as Nasdaq have shown Cisco evidence of the small bursts of traffic referred to as microbursts.
"If an infrastructure isn't designed to handle these microbursts, there is a very real probability that when you have them, especially during high volatility times - which are opportunity-rich times - your infrastructure won't be able to handle the short-lived congestion, which will result in packet loss," Srivatsa says. "You can literally miss market data that's critical to you or orders can get dropped, therefore you're out of the market."
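How a short burst turns into packet loss can be illustrated with a minimal Python sketch. It models a single FIFO buffer that drains at a fixed rate; the function name, tick-based timing, traffic pattern, drain rate, and buffer depths are all hypothetical values chosen for illustration and do not describe any vendor's equipment.

```python
def simulate_drops(arrivals_per_tick, drain_per_tick, buffer_frames):
    """Feed frames into a FIFO buffer that drains at a fixed rate.

    Returns (frames_forwarded, frames_dropped).
    """
    queued = forwarded = dropped = 0
    for arriving in arrivals_per_tick:
        # Frames that do not fit in the remaining buffer space are lost.
        space = buffer_frames - queued
        accepted = min(arriving, space)
        dropped += arriving - accepted
        queued += accepted
        # The egress link forwards up to drain_per_tick frames each tick.
        sent = min(queued, drain_per_tick)
        queued -= sent
        forwarded += sent
    # Drain whatever is still queued after the last arrival.
    forwarded += queued
    return forwarded, dropped


# Hypothetical traffic: 1,000 ticks, quiet except for a 10-tick burst.
traffic = [1] * 1000
for t in range(500, 510):
    traffic[t] = 50

# Average load is about 1.5 frames per tick against a drain rate of 10,
# so the link looks lightly used when averaged over the whole run.
small = simulate_drops(traffic, drain_per_tick=10, buffer_frames=60)
large = simulate_drops(traffic, drain_per_tick=10, buffer_frames=1500)
print("small buffer (forwarded, dropped):", small)
print("large buffer (forwarded, dropped):", large)
```

Under these assumptions the shallow buffer loses frames during the burst even though average utilization is low, while the deeper buffer absorbs the same burst without loss.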
"As these microbursts happen and you lose packets, not only do you lose revenue, but applications can get misaligned," adds Dave Malik, director, solution architecture at Cisco. For instance, feed handlers could be feeding information to algorithmic trading engines to guide their decisions about which venue to route a trade to or which trades to go short or long on. If one strategy is off because the data it receives is stale, other strategies can be affected. "They have to re-sync up and that re-syncing can take much longer than a microsecond," Malik notes. "The infrastructure needs to be solid at handling these spikes, but also be able to detect them. You can't fix what you can't measure. Once you measure it, then you know where the problem is and then you go after that problem."
The trouble is, people tend to measure nominal latency, the speed of data movement throughout the day, Srivatsa says. "When you have microburst traffic, the latency characteristics are really different," he says. He offers an analogy to highway traffic: Say you're trying to measure the performance of an onramp onto a highway. Say this onramp can support one car every minute. In an hour you can pass 60 cars through this onramp. You might look at the number of cars going through this onramp over a 10-hour period and find that 120 cars have passed through. "One would conclude from that that the performance is great, in 10 hours I could have supported 600 cars and only 120 cars went through, so the onramp must be functioning well," Srivatsa says. "But the problem is, you looked at it over a 10-hour period. If you had come between 8:30 and 9:30 a.m., you might have seen 100 of the 120 cars show up at that time. Then, because only one car could go through a minute, all the cars started backing up and some drivers got frustrated and went home."
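The arithmetic behind the onramp analogy can be worked through in a short Python sketch. The capacity and the totals (60 cars per hour, 120 cars in 10 hours, 100 of them in one hour) come from the analogy; the hour-by-hour spread of the remaining 20 cars and the function names are assumptions made for illustration, and the busy 8:30 to 9:30 hour is treated as a single hourly bucket.

```python
# Onramp analogy: capacity is one car per minute, or 60 cars per hour.
CAPACITY_PER_HOUR = 60

# Hypothetical hourly arrivals over a 10-hour day: 100 of the 120 cars
# arrive in the first hour, the other 20 are spread over the next four.
arrivals = [100, 5, 5, 5, 5, 0, 0, 0, 0, 0]


def coarse_utilization(arrivals, capacity_per_hour):
    """Utilization as seen when averaging over the whole period."""
    return sum(arrivals) / (capacity_per_hour * len(arrivals))


def peak_backlog(arrivals, capacity_per_hour):
    """Largest number of cars still waiting at the end of any hour."""
    backlog = 0
    worst = 0
    for cars in arrivals:
        backlog = max(0, backlog + cars - capacity_per_hour)
        worst = max(worst, backlog)
    return worst


avg = coarse_utilization(arrivals, CAPACITY_PER_HOUR)
worst = peak_backlog(arrivals, CAPACITY_PER_HOUR)
print(f"10-hour average utilization: {avg:.0%}")
print(f"Worst end-of-hour backlog: {worst} cars")
```

Measured over 10 hours the onramp appears to be running at 20 percent of capacity, yet 40 cars are left waiting at the end of the busy hour. A network link measured only as a daily average can hide a microburst in the same way.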
A number of vendors measure data latency and try to include microbursts in their calculations, including Corvil, cPacket Networks, and Netscout.
The reason Cisco is issuing warnings about microbursts is to promote its Nexus 10 Gigabit Ethernet network equipment, which it says was architected to handle microbursts. "We have a port ASIC crossbar architecture; this is a leap ahead in terms of internal switch architectures, versus what has been traditionally shared memory architecture," Srivatsa says. "If you have this new architecture, the ability to handle microbursts is much improved." The architecture still makes more memory available to buffer high traffic, he says. In some tests, Nexus equipment handles about 1,500 frames, versus competitors' equipment that can handle 50 or 60 frames in their network pipes, Cisco says.