Archive for the ‘Critical Infrastructure’ Category

New malware brings cyberwar one step closer


Additional Discussion of the April China BGP Hijack Incident


My blog post last week on the April 8th China BGP hijack incident generated significant discussion and raised additional questions in both the media and the research and engineering communities.

In particular, I agree with Dmitri Alperovitch’s recent McAfee blog post that “This topic is highly technical and very difficult to explain to people not fully immersed into the BGP routing jargon… this incident underscores the very serious problems that exist on the Internet due to the system of trust…” It is also clear from my reading of articles and Dmitri’s blog that much of the press mischaracterized his initial estimate of the hijack impacting “15% of the Internet” — Dmitri was referring to routes and not traffic.

In his blog, Dmitri goes on to argue that any analysis of the incident should focus on the upstream ISP (AS23724) responsible for the hijack.

Both at the time of the incident in April and prior to posting my China hijack blog, I had private conversations with operations staff at several of AS23724’s upstreams, network operators around the world, collaborators in other security companies, and Arbor’s own resident engineers in the region. All of these private discussions reflect the sentiment espoused in public engineering forums that the China hijack had modest to minimal impact on Internet traffic volumes, including this RIPE statement, a NANOG discussion thread, and even the BGPMon blog at the heart of the controversy.

In the below graph, I chart traffic from 80 ATLAS providers around the world that terminates or transits AS4134 (the primary upstream to the Chinese company responsible for the BGP hijack). Traffic is shown as a weighted average percentage of all inter-domain traffic using the peak five minute daily value for the month of April 2010. As in my

The main take-away from the above graph is that ATLAS data shows no statistically significant increase for either AS4134 or AS23724. While we did observe modest changes in traffic volumes for carriers within China, the BGP hijack had limited impact on traffic volumes to or from the rest of the world.
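The post does not spell out how the weighted average is computed. A minimal sketch, assuming each provider's percentage of traffic to the AS of interest is weighted by that provider's total inter-domain traffic at its daily peak five-minute sample; the `ProviderDay` type, provider names, and numbers are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderDay:
    """Peak five-minute values (bps) reported by one provider for one day (hypothetical)."""
    provider: str
    total_interdomain_bps: float  # all inter-domain traffic at the daily peak
    target_as_bps: float          # traffic terminating at or transiting the AS of interest


def weighted_share_percent(samples: list[ProviderDay]) -> float:
    """Average each provider's share of traffic to the target AS, weighted by its total traffic."""
    usable = [s for s in samples if s.total_interdomain_bps > 0]
    total_weight = sum(s.total_interdomain_bps for s in usable)
    if total_weight == 0:
        raise ValueError("no provider reported any inter-domain traffic")
    weighted_sum = sum(
        (s.target_as_bps / s.total_interdomain_bps) * s.total_interdomain_bps
        for s in usable
    )
    return 100.0 * weighted_sum / total_weight


# Hypothetical example: two providers sending 1% and 2% of their traffic to the target AS.
day = [
    ProviderDay("provider-a", 100e9, 1e9),
    ProviderDay("provider-b", 300e9, 6e9),
]
print(f"{weighted_share_percent(day):.2f}%")  # 1.75%
```

Weighting each percentage by the provider's total traffic is algebraically the same as dividing summed target-AS traffic by summed total traffic; the explicit form just keeps the weighting visible. Running it once per day of the month would yield one data point per day.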

As a couple of readers of my blog observed, traffic volumes provide an awkward measure of the security implications of a BGP hijack. In particular, how much hijacked traffic actually moves depends on:

  1. The scope of the hijack. Specifically, how far and how widely the routes propagated. In this case, the additional AS path length meant few peers and upstreams preferred the AS23724 routes. In other words, the leak had limited scope.
  2. Termination of the traffic. Did the China ISP (AS23724) drop hijacked packets or complete the connections? For example, in the former (drop) scenario, my laptop might just send 40-byte TCP SYN packets to an unresponsive destination. Since the TCP connection does not complete, my laptop will never send any significant volume of data traffic; China would only get lots and lots of SYNs (and the rest of the world ICMP unreachables in exchange). UDP and ICMP, of course, are slightly different stories. On the other hand, if the traffic transits China or Chinese computers / VMs otherwise respond to the TCP requests, then significantly larger volumes of hijacked data traffic would flow from the rest of the world to China.
  3. Objective of the hijack. Though some of the media have drawn cyber-war conclusions, we will likely never know if this was a misconfiguration, practice run, or intentional hijack. In any case, traffic volumes do not map well to the different possible security threats. For example, if the goal was to disrupt Internet communication and “blackhole” hijacked traffic, then we would expect to see a global decrease in Internet traffic and a large volume of SYNs directed at China. However, the technical particulars of the April hijack were not particularly well-suited for this type of large-scale Internet disruption (see this article or an earlier blog post for examples on how to do this correctly).

    Alternatively, the intent could be a trial run exploring worm-like attacks against the global routing infrastructure. In this scenario, a small set of well-crafted malformed routing messages (hidden in a hijack of thousands of other routes) quickly propagates across the Internet, crashing core routers and switches. Or something a little like this event in August (as a side note, Xiaowei did absolutely nothing wrong in her August experiment and is a really nice person to boot). Another possibility is a man-in-the-middle attack: relatively low volumes of diverted traffic, with thousands of bogus routes announced as a smokescreen (credit for this scenario to my colleague Danny McPherson in a NYTimes interview). In other words, something close to what we observed on April 8th.

    Or maybe, of course, this was just a typo in a configuration file.
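Point 1 depends on how BGP speakers choose among competing routes for the same prefix. The sketch below is a deliberately simplified model, not any vendor's decision process: it compares only LOCAL_PREF and then AS path length. The `Route` type, the documentation-range AS numbers, and the example paths (including the path through AS4134 and AS23724) are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]  # neighbor AS first, origin AS last
    local_pref: int = 100     # a common default LOCAL_PREF


def best_route(candidates: list[Route]) -> Route:
    """Simplified best-path choice: highest LOCAL_PREF, then shortest AS path.

    Real BGP speakers apply further tie-breakers (origin, MED, eBGP over iBGP,
    IGP cost, router ID) plus local policy; this model stops at path length.
    """
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


# Hypothetical view from one network: the legitimate route for a prefix arrives
# over a two-hop path, the leaked copy over a three-hop path ending at AS23724.
legit = Route("192.0.2.0/24", (64500, 64501))
leaked = Route("192.0.2.0/24", (64502, 4134, 23724))
print(best_route([legit, leaked]).as_path)  # (64500, 64501)
```

With equal LOCAL_PREF the shorter legitimate path wins, which is the effect described in point 1. A network that assigns the leaked route a higher LOCAL_PREF (for example, because it arrives from a customer) would pick it despite the longer path. The model also ignores longest-prefix match: a more-specific announcement attracts forwarding regardless of AS path length.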

As I observed in my earlier blog, inadvertent BGP route leaks and intentional hijacks have been part and parcel of Internet routing for the last twenty years. BGP hijacks happen all the time. The research and operations communities have written hundreds of papers on the topic (including my own small contributions).

If I have not been clear up to this point, we have a problem. We need to address BGP security (as well as DNSSEC, botnets, DDoS and other critical infrastructure threats) as quickly as possible. The Internet’s future may depend on it.

 
- Craig
 
 

China Hijacks 15% of Internet Traffic?


On Wednesday, the US-China Economic and Security Review Commission released a wide-ranging report on China trade, capital markets, human rights, WTO compliance, and other topics. If you have time to spare, here is a link to the 324-page report.

Tucked away in the hundreds of pages of China analysis is a section on the Chinese Internet, including the well-documented April 8, 2010 BGP hijack of tens of thousands of routes (starting on page 244).

To review, at around 4am GMT on April 8th a Chinese Internet provider announced 40,000 routes belonging to other ISPs and enterprises around the world (though many were for China-based companies). During a subsequent window of roughly 15 minutes, a small percentage of Internet providers around the world redirected traffic for a small percentage of these routes to Chinese address space. RIPE provides a link to a list of some of these prefixes (as well as indicating the impact on European carriers was minimal), and Andree Toonk and his colleagues at BGPmon have a nice synopsis at the BGPMon blog.

Following shortly on the heels of the China hijack of DNS addresses in March, the April BGP incident generated a significant amount of discussion in the Internet engineering community.



Any corruption of DNS or global routing data (whatever the motive) is a cause of significant concern and reiterates the need for routing and DNS security. But in an industry crowded with security marketing and hype, it is important we limit the hyperbole and keep the discussion focused around the legitimate long-term infrastructure security threats and technical realities.

So it was with some surprise that I watched an alarmed Wolf Blitzer report on prime-time CNN about the China hijack of “15% of the Internet” last night. Less diplomatically, a discussion thread on the North American Network Operators’ Group (NANOG) mailing list called media reports an exaggeration or “complete FUD”. Also on the NANOG mailing list, Bob Poortinga writes “This article … is full of false data. I assert that much less than 15%, probably on the order of 1% to 2% (much less in the US) was actually diverted.”

If you read the USCESRC report, the commission only claims China hijacked “massive volumes” of Internet traffic but never gets as specific as an exact percentage. The relevant excerpt from the report is below:



The USCESRC cites the BGPMon blog as the source of data on “massive traffic volumes”. But curiously, the BGPMon blog makes no reference to traffic — only the number of routes.

You have to go to a National Defense interview with Dmitri Alperovitch, vice president of threat research at McAfee, to find the first appearance of the 15% number. Several hundred media outlets, including CNN, the Wall Street Journal, Time Magazine and many more picked up this interview and eagerly reported on China’s hijack of “massive Internet traffic volumes of 15% or more”.

Now certainly, diverting 15% of the Internet even for just 15 minutes would be a major event. But as earlier analysis by Internet researchers suggested, this hijack had limited impact on the Internet routing infrastructure — most of the Internet ignored the hijack for various technical reasons.

And indeed, ATLAS data from 80 carriers around the world graphed below shows little statistically significant increase due to the hijack on April 8, 2010. I highlight April 8th in yellow and each bar shows the maximum five minute traffic volume observed each day in April going to the Chinese provider at the center of the route hijack.
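The post does not say which statistical test was applied. One simple screen, sketched here as an assumption, takes each day's maximum five-minute sample and measures how far the day of interest sits from the other days in standard deviations (a z-score). The function names and sample data are made up:

```python
from __future__ import annotations

import statistics


def daily_peaks(samples: dict[str, list[float]]) -> dict[str, float]:
    """Map each day to its maximum five-minute traffic sample."""
    return {day: max(values) for day, values in samples.items() if values}


def zscore_of_day(peaks: dict[str, float], day: str) -> float:
    """Standard deviations between `day`'s peak and the mean peak of all other days."""
    baseline = [value for d, value in peaks.items() if d != day]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline days")
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    if spread == 0:
        raise ValueError("baseline days are identical; z-score undefined")
    return (peaks[day] - mean) / spread


# Made-up five-minute samples (Gbps) for four days.
samples = {
    "04-06": [1.0, 2.0, 3.0],
    "04-07": [2.0, 4.0],
    "04-08": [3.0, 1.0],
    "04-09": [10.0, 2.0],
}
peaks = daily_peaks(samples)
z = zscore_of_day(peaks, "04-08")
print(f"z = {z:.2f}")
```

A common rule of thumb treats |z| above about 3 as notable. Daily traffic peaks are neither independent nor normally distributed, so this is a rough screen rather than a formal significance test.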



While traffic may have exhibited a modest increase to the Chinese Internet provider (AS23724), I’d estimate diverted traffic never topped a handful of Gbps. And in an Internet quickly approaching 80-100 Tbps, 1-3 Gbps of traffic is far from 15%; it works out to roughly 0.001% to 0.004%.

In fairness, I should note that I don’t know how Mr. Alperovitch obtained his 15% number (the article does not say), and a hijack of 40k routes out of a default-free table of ~340K is not far from fifteen percent. But of course, routes are different from traffic. I also note that China denied the hijack and that some Internet researchers suspect the incident was accidental.
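The route-versus-traffic distinction comes down to two different denominators. Using only the figures quoted in this post (40k of ~340K routes; 1-3 Gbps of diverted traffic against an Internet of 80-100 Tbps), the arithmetic works out as follows:

```python
def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


GBPS = 1e9   # bits per second
TBPS = 1e12

# Share of routes: hijacked routes versus a default-free table of ~340K.
route_share = percent(40_000, 340_000)

# Share of traffic: 1-3 Gbps diverted versus an Internet of 80-100 Tbps.
low_traffic_share = percent(1 * GBPS, 100 * TBPS)   # smallest numerator, largest denominator
high_traffic_share = percent(3 * GBPS, 80 * TBPS)   # largest numerator, smallest denominator

print(f"routes:  {route_share:.1f}%")                                      # routes:  11.8%
print(f"traffic: {low_traffic_share:.3f}% to {high_traffic_share:.3f}%")  # traffic: 0.001% to 0.004%
```

The route share lands in the low teens, while even the most generous traffic estimate sits more than three orders of magnitude lower.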

The global BGP Internet routing system is incredibly insecure. Fifteen years ago, I wrote a PhD thesis using experiments that in part capitalized on the lack of routing security. My research injected hundreds of thousands of fake (harmless!) routes into the Internet and redirected test traffic over the course of two years. A decade or more later, none of the many BGP security proposals have seen significant adoption due to a lack of market incentives, and non-legitimate routes still regularly get announced and propagated, by accident or otherwise. Overall, the Internet routing system still relies primarily on trust (or “routing by rumor” if you are more cynical).
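As an illustration of what one of these proposals checks, RPKI-based route origin validation (later standardized in RFC 6811) classifies an announcement by comparing its prefix and origin AS against signed Route Origin Authorizations (ROAs). The sketch below implements only that classification rule over an in-memory list; the `Roa` type and the ROA entries are hypothetical, and a real router would receive validated ROA payloads from an RPKI validator rather than hard-code them:

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """One validated ROA payload: prefix, maximum length, authorized origin AS (hypothetical)."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    origin_as: int


def validate(prefix: str, origin_as: int, roas: list[Roa]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' following RFC 6811."""
    route = ipaddress.ip_network(prefix)
    # A ROA "covers" the route if the route is equal to or more specific than it.
    covering = [
        roa for roa in roas
        if roa.prefix.version == route.version and route.subnet_of(roa.prefix)
    ]
    if not covering:
        return "not-found"
    for roa in covering:
        # AS 0 ROAs never match; the route must also not exceed maxLength.
        if roa.origin_as != 0 and roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


# Hypothetical ROA: 192.0.2.0/24 may be originated only by AS64500.
roas = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
print(validate("192.0.2.0/24", 64500, roas))     # valid
print(validate("192.0.2.0/24", 23724, roas))     # invalid: wrong origin
print(validate("192.0.2.0/25", 64500, roas))     # invalid: longer than maxLength
print(validate("198.51.100.0/24", 64500, roas))  # not-found: no covering ROA
```

Origin validation alone would not flag an announcement that keeps the legitimate origin AS at the end of a forged path; path-validation proposals such as BGPsec target that case.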

We need to fix Internet infrastructure security, but we also need to be precise in our analysis of the problems.

UPDATE: Additional discussion and statistics on the incident are now available in a follow-up blog at http://asert.arbornetworks.com/2010/11/additional-discussion-of-the-april-china-bgp-hijack-incident.

- Craig

 
 

Facebook Outage


Another Facebook outage, an outpouring of tweets, press articles and an obligatory ATLAS post below.

We used ATLAS data to graph Facebook (AS32934) traffic with 80 ISPs around the world between 5pm EDT September 22 and 5pm EDT today. You can see Facebook traffic plummet around 1:30pm and return shortly after 4pm. From a quick glance at the data, the outage appears to be global (impacting all 80 ISPs).
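The “appears to be global” judgment amounts to checking whether every monitored ISP, not just a few, saw Facebook’s traffic collapse. A minimal sketch, assuming per-ISP average traffic for a window just before and during the outage; the 50% drop threshold, the function name, and the numbers are hypothetical:

```python
from __future__ import annotations


def isps_with_drop(
    before: dict[str, float],
    during: dict[str, float],
    min_drop_fraction: float = 0.5,
) -> list[str]:
    """ISPs whose traffic to the target AS fell by at least `min_drop_fraction`.

    `before` and `during` map an ISP name to its average traffic toward the
    target AS just before and during the suspected outage window.
    """
    dropped = []
    for isp, baseline in before.items():
        if baseline <= 0 or isp not in during:
            continue  # no usable comparison for this ISP
        if (baseline - during[isp]) / baseline >= min_drop_fraction:
            dropped.append(isp)
    return dropped


# Made-up averages (Gbps) for three ISPs before and during the window.
before = {"isp-a": 10.0, "isp-b": 20.0, "isp-c": 5.0}
during = {"isp-a": 1.0, "isp-b": 2.0, "isp-c": 4.0}
hit = isps_with_drop(before, during)
print(f"{len(hit)} of {len(before)} ISPs saw a large drop: {hit}")
# 2 of 3 ISPs saw a large drop: ['isp-a', 'isp-b']
```

An outage is a candidate for “global” when the count equals the number of ISPs monitored; a partial count points instead toward a regional or path-specific problem.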


We have no information on the root cause (no sign of obvious BGP instability or DDoS).

Lots of speculation on Twitter.

UPDATE 8:30pm Sept 23: Facebook explains this was an internal configuration management problem.

- Craig

 

Toronto Subway Grinds To Halt Due To Computer Error


Computer glitch screws with the signals on the TTC in Toronto this morning. Apparently the glitch was severe enough to bring the entire system to a grinding halt.

From The Globe and Mail:

“Rarely do we ever have to shut down the entire system,” Mr. Ross said. “But when we’re unable to move trains or see the trains, signals and switches that is a very significant safety concern.”

Mr. Ross said the transit agency is still trying to pinpoint what caused the computers at transit control to fail.

He said a similar computer malfunction prompted a five-minute shut-down on the Yonge-University-Spadina line earlier this week.

Knowing what little I do of their network there, I can only imagine.

:)

Article Link

(Image used under CC from denmar)



Siemens: German SCADA Customer Hit By Worm


Sometimes I just stare at a point in space while I try to wrap my head around something like this.

From Techworld:

Siemens confirmed Tuesday that one of its customers has been hit by a new worm designed to steal secrets from industrial control systems.

To date, the company has been notified of one attack, on a German manufacturer that Siemens declined to identify. “We were informed by one of our system integrators, who developed a project for a customer in process industries,” said Siemens Industry spokesman Wieland Simon in an email message. The company is trying to determine whether the attack caused damage, he said.

So how, you might ask, does the worm get access to the Siemens SCADA systems?

With a DEFAULT PASSWORD

To quote Denis Leary, “Make sure to get your whole head in front of the shotgun. Thanks for calling!”

Article Link

(Image used under CC from hans.gerwitz)



A Brief Look at the Facebook Outage


Since we’ve written about Google’s multiple past outages (e.g., the GoogleLapes of May 2009 and the more recent Google Blip), it seems only fair to quickly cover Facebook’s problems last Friday.

The below graph shows coarse-grained Facebook (ASN 32934) traffic statistics from 60 randomly selected ISPs around the world. While most press and blog coverage (e.g., Gigaom’s “Facebook Sees Major Outage”) pegged the disruption at 5:30pm ET, the traffic data suggest Facebook’s problems began much earlier in the day.


Normally, Facebook’s diurnal traffic follows the same pattern as other social media and interactive consumer sites. Generally, Facebook traffic reaches an overnight low at 2am, grows to its daily peak at 5pm EDT, declines briefly, and then reaches a second, smaller peak at 9pm ET (the peaks likely matching the North American end of the work day and prime time across PDT and EDT).

But beginning Friday morning at 2am, Facebook saw dozens of modest traffic drops (a few Gbps each) before plummeting by 30 Gbps at 5pm EDT for roughly twenty minutes.
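Counting the “dozens of modest traffic drops” amounts to scanning consecutive samples for sudden falls. A minimal sketch, assuming a time-ordered list of (timestamp, Gbps) samples and an arbitrary 2 Gbps threshold; neither the sampling interval nor the threshold comes from the post, and the data is made up:

```python
from __future__ import annotations


def step_drops(
    samples: list[tuple[str, float]], threshold_gbps: float = 2.0
) -> list[tuple[str, float]]:
    """Return (timestamp, drop in Gbps) wherever traffic fell by at least
    `threshold_gbps` relative to the immediately preceding sample."""
    drops = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if previous - current >= threshold_gbps:
            drops.append((timestamp, previous - current))
    return drops


# Made-up, time-ordered samples (Gbps) around the late-afternoon collapse.
samples = [
    ("16:50", 50.0),
    ("16:55", 47.0),
    ("17:00", 46.5),
    ("17:05", 16.0),
    ("17:25", 45.0),
]
print(step_drops(samples))  # [('16:55', 3.0), ('17:05', 30.5)]
```

This flags any sharp step, including one that happens to coincide with the normal evening decline. Comparing each sample against the same time on previous days would separate outage drops from the diurnal pattern described above.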

What happened to Facebook?

While there is no shortage of speculation on Twitter and operations mailing lists, Facebook so far is not saying. I think a recent post to an engineering outage discussion list sums up the situation:

“Given Facebook’s complexity, who knows what the problem was. Load balancer or layer 7 filter/re-writer (think F5) issues? Back-end server problems? Software misconfiguration? … Some developer deciding to just roll something out in the middle of the day (as is quite common with social networking sites these days)? We’ll probably never know.”

Facebook has come a long way from a few hundred Harvard freshmen looking for dates. As Facebook accelerates past 400 million users and pursues goals of nothing short of taking over the web, the social media giant has become critical infrastructure, at least from the perspective of millions of consumers and ISP support desks.

In an upcoming series of blogs, we’ll explore the growing Internet infrastructure footprint of Facebook, Google and other dominant Internet content companies.