Archive for the ‘Arbor Networks’ Category

World Cup versus the Internet


As the World Cup enters the knockout phase, here is a quick look at the impact of the games on the Internet infrastructure.

In particular, will millions of office workers (lacking television access during office hours) drive an overwhelming flood of desktop video and disrupt Internet communication? Has the Internet finally met its match in the World Cup?

You might think so given the hundreds of press articles predicting cataclysmic World Cup Internet doom. My favorite (from this week’s Sunday Scotland Herald):


Sunday Herald Article

But so far, fears of overwhelmed backbones and Internet interruptions appear unfounded. We’ll look at some specific numbers below using anonymous traffic engineering statistics from ISPs participating in the ATLAS Internet Observatory.

Overall, we estimate a 30% increase in backbone traffic due to World Cup video: sizable, but not overwhelming. In a few instances, the World Cup even led to decreases in Internet traffic as millions of consumers paused their Web surfing to watch the post-business-day games on television.

Some secondary online services (e.g. Twitter) fared less well and suffered periodic outages. In particular, Twitter fell victim to massive “tweetstorms” topping 3,000 World Cup messages per second.

Despite some reports of slow access to sports web sites (e.g. ESPN), anecdotal discussions with providers suggest video quality has been high in the US and UK via the primary video distributors of ESPN3 and the BBC iPlayer (both using Akamai).

Akamai and BBC have reported record numbers topping 800,000 concurrent connections. The high-stakes bragging rights for the record for the world’s largest Internet video event have even led to a war of words between ESPN and CBS (ESPN claims the World Cup as the world’s largest Internet event and CBS argues for the 1.15 million visitors viewing the Brigham Young / Florida game).

While many providers restricted live World Cup Internet video to paying customers (e.g. ESPN) or geographic region (e.g. BBC and CBC), Univision (also using Akamai) provided a popular (and colorful) free global feed. Fans also had multiple other commercial options depending on their geographic region plus dozens of “underground” video streaming sites.

In the first graph below, we look at ATLAS data during the first week of the World Cup. In particular, we compare inter-domain Flash traffic between June 11 and 18 (in blue) with Flash traffic averaged over “normal” (i.e. not World Cup) weeks (in green). Both datasets use traffic from 55 randomly selected ISPs in Europe and the Americas. We note that these inter-domain measurements do not include local cache traffic.

World Cup Video Traffic

The largest increase in Flash traffic came on Tuesday, June 15, with video peaks more than doubling from an average of 400 Gbps to 1 Tbps. The jumps in June 15 traffic seem to correlate with interest in the Brazil and North Korea match (ending 2 – 1).
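The comparison behind this graph is simple enough to sketch. The snippet below is only an illustration: the function names are hypothetical and the sample values are made up, not ATLAS data. Only the 400 Gbps and 1 Tbps figures come from the text above.

```python
from statistics import mean


def baseline_by_slot(normal_weeks):
    """Average several 'normal' weeks slot by slot (e.g. hour by hour)."""
    return [mean(slot) for slot in zip(*normal_weeks)]


def percent_change(observed, baseline):
    """Percent change of an observed value relative to its baseline."""
    return (observed - baseline) / baseline * 100.0


# Illustrative Flash traffic samples in Gbps (made-up values, not ATLAS data).
normal_weeks = [
    [380, 400, 420],
    [400, 410, 390],
    [420, 390, 390],
]
world_cup_week = [450, 1000, 600]

baseline = baseline_by_slot(normal_weeks)
changes = [percent_change(o, b) for o, b in zip(world_cup_week, baseline)]

# The peak quoted in the text: a 400 Gbps average rising to 1 Tbps (1000 Gbps).
peak_change = percent_change(1000, 400)  # a 150% increase, i.e. 2.5x baseline
```

Averaging slot by slot keeps the normal daily rhythm in the baseline, so a game-time spike is compared against the same time of day in an ordinary week.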

The next graph looks at Flash traffic for a particular day, June 23. The ESPN3 schedule began at 9:30am EDT with Slovenia vs. England and USA vs. Algeria (both Group C), followed by Ghana vs. Germany and Australia vs. Serbia at 2pm EDT. All times in the graph are EDT.

June 23 Flash Traffic

Again, Flash more than doubled during each of the game periods.

But in the scheme of things, Flash comprises a small percentage of Internet traffic, and overall inter-domain bandwidth did not exhibit dramatic gains during the World Cup (unlike, for example, Internet traffic during the Obama inauguration).

The graph below shows both Flash (in purple) and Web (TCP port 80) traffic across 55 randomly selected ISPs on June 23. Web traffic possibly shows modest decreases during the peaks of World Cup coverage.

Comparing June 23 Web and Flash Traffic

In fairness, inter-domain traffic provides only a limited measure of World Cup video. For example, local caches serve most of Akamai’s CDN video traffic. While ATLAS anonymous statistics generally do not include this local traffic, many ISPs carefully monitor local Akamai server bandwidth. Three consumer providers graciously provided statistics on both their local Akamai cache and inter-domain Akamai traffic.

We graph the Akamai cache (in blue) and inter-domain (in red) traffic below for the three providers between June 11 and 18.

Akamai Cache and Inter-domain Traffic

Interestingly, the cache traffic remains mostly constant during the first World Cup week. Only inter-domain traffic (presumably HD streaming) exhibits a significant ~25% jump during the games.

So far the Internet has survived, but with the final games coming up we expect far greater consumer interest and even larger traffic volumes. As the Scotland Herald warns, the match up between the Internet and its World Cup nemesis is far from over…

 

 

 

The Battle of the Hyper Giants (Part I)


My blog post last month on the rapid growth of Google generated a bit of discussion around Google and its competitors. In particular, this Wired article (“Google’s Traffic Is Giant”) suggests Google’s infrastructure should “frighten the world’s current ISPs” and content distributors (i.e. CDNs like Akamai and Limelight). Going even further, a panicked “EatMoreBeef” Wired reader warned “I’m selling my Akamai stock!”

As Google grows towards 10% of all Internet traffic, will the multi-media search giant squash all competitors under its chrome-plated multi-terabit steamroller?

Or will the global zeitgeist tire of kitten videos and plow YouTube under the treads of hundreds of millions of virtual tractors tending to their farms and social networks on Facebook?

Or will Microsoft’s desktop OS juggernaut link with a growing Azure Cloud and pink phone to form an impenetrable competitive enterprise and consumer road block?

I have no idea.

Given my previous market predictions (“Google will never go above $200!”), I’m not going to try and predict the winners / losers in today’s Hyper Giant fight.

But I do know that the future of the Internet is being decided today by billions of dollars of investments in data centers, backbone infrastructure and alliances / contracts with other content owners and last-mile providers. And increasingly, Hyper Giant strategies are coalescing around similar infrastructure investments as the giants compete on content, capacity (bandwidth, storage, compute), cost and performance. In other words, Google is not unique in its infrastructure ambitions.

In the next couple of blog posts, I’ll look at several of the “Hyper Giants” to help put all of this in perspective.

The graphic below shows market data and Internet routing and traffic statistics for Google (Alexa #1), Facebook (Alexa #2) and Microsoft (Alexa #5). [To save you from having to go to the Alexa web site, #3 is Yahoo and #4 is Google's YouTube.] I have also included Akamai, one of the largest Internet infrastructure providers most consumers have never heard of, in the list below.

Hyper Giant Comparison

Famously started in a Harvard dorm room in 2004, Facebook has grown well beyond its Ivy League roots to become the daily required Internet stop for hundreds of millions of consumers. Facebook content has also evolved beyond short text updates bragging about last night’s party to include thousands of applications, games and petabytes of pictures and video.

And a lot of Internet traffic.

The graph below shows Facebook as a weighted average percentage of all Internet inter-domain traffic. As in previous blog posts, I’m using data from 110 Internet providers around the world anonymously sharing coarse-grained traffic engineering statistics. I have also included MySpace traffic as a point of reference.


Facebook and MySpace Share of Internet Traffic

Between March 2007 and April 2010, Facebook grew from zero to more than 0.5% of all Internet traffic globally, placing the dominant social media site well within the top 50 Internet Hyper Giants. And this number does not include the significant volumes of Facebook CDN traffic.
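For readers wondering what a “weighted average percentage” looks like in practice, here is a minimal sketch. The weighting scheme is an assumption on my part for illustration (the weight could be, say, each provider's total traffic volume); the function name is hypothetical and the sample values are made up.

```python
def weighted_share(samples):
    """Weighted average of per-provider traffic shares.

    samples: list of (share_percent, weight) pairs, one per provider.
    The weight is an assumption (for example, the provider's total
    traffic volume); this sketch does not claim to match ATLAS.
    """
    total_weight = sum(weight for _, weight in samples)
    if total_weight == 0:
        raise ValueError("at least one provider must have non-zero weight")
    return sum(share * weight for share, weight in samples) / total_weight


# Made-up example: three providers reporting one site's share of their traffic.
providers = [
    (0.25, 100.0),  # 0.25% of this provider's traffic, weight 100
    (0.5, 200.0),
    (1.0, 100.0),
]
site_share = weighted_share(providers)  # (25 + 100 + 100) / 400
```

Weighting keeps one small provider with an unusual traffic mix from skewing the global estimate as much as a plain average would.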

Given the expense and time required for new Internet-scale datacenter construction ($500 million or more), most Internet content companies begin life using colo (e.g. Twitter) or leased wholesale space (e.g. Facebook). Many nascent Internet companies (Facebook included) also start out leveraging third-party distribution infrastructure like Limelight or Akamai (currently the dominant CDN used by Facebook).

But as computing, storage and distribution demands increase, small differences in capital / operational expense become large competitive differentiators at Internet scale (this 2008 NANOG presentation provides a nice overview of datacenter / colo pricing pressures). As Facebook crosses the 30,000 server mark, the company’s strategy has increasingly shifted to focus on its own proprietary infrastructure. Earlier this year, Facebook began construction of its first datacenter to “deliver a faster, more reliable experience worldwide.” Facebook is rumored to have plans on the drawing board for another four mega-scale Internet data centers.

Also like Google, Facebook has aggressively pursued direct peering with last-mile / consumer networks. As of March 2010, Facebook uses direct peering for more than 25% of its traffic (up from 5% in 2009). Like other content heavy Hyper Giants, Facebook also offers a liberal peering policy with a presence at more than 15 public exchange points.

While Facebook may not yet have the same infrastructure footprint as Google or other larger Hyper Giants, the game is clearly afoot. Leveraging wholesale datacenters, third-party CDNs and a raft of partnerships and alliances, Facebook may yet outgrow competitors with an all-encompassing social media cum application platform. As of last month, Facebook reportedly surpassed Google as the most visited site on the Internet.

A Brief Look at Facebook Outage


Since we’ve written about Google’s multiple past outages (e.g., the GoogleLapse of May 2009 and the more recent Google Blip), it seems only fair to quickly cover Facebook’s problems last Friday.

The graph below shows coarse-grained Facebook (ASN 32934) traffic statistics from 60 randomly selected ISPs around the world. While most press / blog coverage (e.g. Gigaom’s “Facebook Sees Major Outage”) pegged the disruption at 5:30 pm ET, the traffic data suggest Facebook’s problems began much earlier in the day.

Facebook Traffic During the Outage

Normally, Facebook’s diurnal traffic follows the same pattern as other social media and interactive consumer sites. Generally, Facebook traffic reaches an overnight low at 2am, grows to its daily peak at 5pm EDT, and then declines briefly before a second, smaller peak at 9pm EDT (the peaks likely matching the North American end of the work day and prime time across PDT and EDT).

But beginning Friday morning at 2am, Facebook saw dozens of modest traffic drops (each of a few Gbps) before plummeting 30 Gbps at 5pm EDT for roughly twenty minutes.
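Spotting drops like these amounts to comparing observed traffic against the expected diurnal curve. The sketch below shows one simple way to do that. It is an assumption for illustration, not a description of how the graph above was produced; the function name, threshold and sample values are all hypothetical.

```python
def find_drops(observed, expected, threshold_gbps):
    """Return (index, shortfall) for every sample that falls below the
    expected diurnal baseline by at least threshold_gbps."""
    drops = []
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        shortfall = exp - obs
        if shortfall >= threshold_gbps:
            drops.append((i, shortfall))
    return drops


# Made-up samples in Gbps: an expected daily curve and what was measured.
expected = [40, 60, 80, 100, 90]
observed = [40, 57, 80, 70, 90]

modest_drops = find_drops(observed, expected, threshold_gbps=2)
major_drops = find_drops(observed, expected, threshold_gbps=20)
```

With these numbers the low threshold flags both the 3 Gbps dip and the 30 Gbps plunge, while the high threshold flags only the plunge.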

What happened to Facebook?

While there is no shortage of speculation on Twitter and operations mailing lists, Facebook so far is not saying. I think a recent post to an engineering outage discussion list sums up the situation:

“Given Facebook’s complexity, who knows what the problem was. Load balancer or layer 7 filter/re-writer (think F5) issues? Back-end server problems? Software misconfiguration? … Some developer deciding to just roll something out in the middle of the day (as is quite common with social networking sites these days)? We’ll probably never know.”

Facebook has come a long way from a few hundred Harvard freshmen looking for dates. As Facebook accelerates past 400 million users and pursues goals of nothing short of taking over the web, the social media giant has become critical infrastructure, at least from the perspective of millions of consumers and ISP support desks.

In an upcoming series of blogs, we’ll explore the growing Internet infrastructure footprint of Facebook, Google and other dominant Internet content companies.

 

LEET 2010 Coming Up


I again had the pleasure of serving on the LEET program committee this year, which let me see some excellent research from people around the world. The submissions were of very high quality, and the workshop looks to be a very valuable day for researchers in the field.

    Join us at the 3rd USENIX Workshop on Large-Scale Exploits and Emergent Threats, which will take place in San Jose, CA, on April 27, 2010. LEET ’10 will provide a unique forum for the discussion of threats to the confidentiality of our data, the integrity of digital transactions, and the dependability of the technologies we increasingly rely on.

    The program includes:

    • Keynote Address: “Why Don’t I (Still) Trust Anything?” by Jeff Moss, Founder, Black Hat and DEF CON
    • Invited Talk: “Naked Avatars and Other Cautionary Tales About MMORPG Password Stealers,” by Jeff Williams, Microsoft Malware Protection Center
    • Sessions on threat measurement and characterization, botnets, threat detection and mitigation, and more.

    Check out the full program at

    http://www.usenix.org/events/leet10/tech/

    Connect with the broad community of researchers and practitioners who focus on worms, bots, spam, spyware, phishing, DDoS, and the ever-increasing palette of large-scale Internet-based threats in fostering the development of preliminary work in this diverse area and stimulating discussion of thought-provoking ideas.

    Find out more and register today at

    http://www.usenix.org/leet10/progam

    On behalf of the LEET ’10 Program Committee,

    Michael Bailey, University of Michigan
    LEET ’10 Program Chair
    leet10chair_at_usenix.org

Mike used to work here at Arbor Networks many years ago.

Google Blip


While Google’s YouTube outage today generated a steady stream of tweets and blog posts, a quick look at traffic across 50 or so small / mid-size ISPs around the world suggests this was more of a “blip” than a global outage.


Tweets About the YouTube Outage

Certainly the outage was nowhere near as large or prolonged as the great “GoogleLapse” last year.

Below is a graph of traffic originating in Google (AS 15169) over the last 24 hours using data from 50 ISPs around the world selected at random. All times are EDT. It looks like a small outage overnight preceded the larger 8am EDT traffic drop-off.

Google Blip

And a quick aside, my intent is not to pick on Google (unless, of course, they do not pick Ann Arbor) — all providers have outages. I just find Google an especially interesting case study given their size and overall impact on the Internet.

How Big is Google?


Google’s recent FTTH announcement generated a wave of media coverage and industry discussion. Responses ranged from exuberant local communities racing to sign up to anti-competitive howls from incumbent carriers.

Industry pundits wondered: what is Google up to? What will the search giant do with 1 Gbps to the home? And more ominously, is Google getting too big?

While this blog post won’t explore the politics / strategy behind Google’s FTTH initiative (except to suggest Google should choose Ann Arbor), we will share some data on Google’s relative size and growth from a global Internet perspective.

Google is big.

And by “big”, I mean really big. If Google were an ISP, it would be the fastest growing and third largest global carrier. Only two other providers (both of whom carry significant volumes of Google transit) contribute more inter-domain traffic. But unlike most global carriers (i.e. the “tier1s”), Google’s backbone does not deliver traffic on behalf of millions of subscribers nor thousands of regional networks and large enterprises. Google’s infrastructure supports, well, only Google.

Based on anonymous data from 110 ISPs around the world, we estimate Google contributes somewhere between 6% and 10% of all Internet traffic globally as of the summer of 2009.

The graph below shows the weighted average percentage of all Internet traffic contributed by Google ASNs between June 2007 and July 2009. Most of Google’s rapid growth comes after the acquisition of YouTube in late 2006.


Google's Contribution to Global Internet Traffic

Before getting much further, a few words about what we’re measuring. Traffic volumes provide only the most indirect measure of a network’s size or popularity (for example, it takes tens of thousands of Tweets to match the bandwidth of a single HD video). Our anonymous data also does not include internal provider services (e.g. IPTV or VPN) nor data served from co-located caches within provider data centers. Rather, we’re measuring inter-domain traffic, i.e. the traffic between providers (the “inter” in “Internet”).

With all of the above said, inter-domain traffic volumes provide a key metric for understanding Internet topology and the evolution of Internet traffic patterns.

But even traffic volumes tell only part of the story.

The competition between Google, Microsoft, Yahoo and other large content players has long since moved beyond just who has the better videos or search. The competition for Internet dominance is now as much about infrastructure: raw data center computing power and how efficiently (i.e. quickly and cheaply) you can deliver content to the consumer.

And here again, Google is at the head of the pack.

In 2007, Google used transit providers (including Level(3)) for the majority of their Internet traffic. But over the last three years, Google has both built out their global data center and content distribution capability and aggressively pursued direct interconnection with most consumer networks.

The graph below shows an estimate of the average percentage of Google traffic per month using direct interconnection (i.e. not using a transit provider). As before, this estimate is based on anonymous statistics from 110 providers. In 2007, Google required transit for the majority of their traffic. Today, most Google traffic (more than 60%) flows directly between Google and consumer networks.


Google Traffic Using Direct Interconnection
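As a rough illustration of how a “direct interconnection” percentage could be derived from flow data, here is a minimal sketch. It assumes each flow record carries the AS path seen by the measuring provider, which is my simplification and not necessarily how the estimate above was produced. The function name and sample flows are hypothetical, and 64500 / 64501 are ASNs reserved for documentation.

```python
GOOGLE_ASN = 15169  # Google's AS number, as cited elsewhere in this archive


def direct_share(flows, content_asn=GOOGLE_ASN):
    """Percent of a content network's bytes that arrive without transit.

    flows: list of (as_path, byte_count), where as_path lists the ASNs
    between the measuring provider and the origin, nearest neighbor first.
    A flow counts as direct when the path is just the content ASN.
    """
    direct = total = 0
    for as_path, byte_count in flows:
        if not as_path or as_path[-1] != content_asn:
            continue  # not originated by the content network
        total += byte_count
        if len(as_path) == 1:
            direct += byte_count
    return 100.0 * direct / total if total else 0.0


# Made-up flow records as seen by one consumer network.
flows = [
    ([15169], 600),         # handed over directly by the content network
    ([64500, 15169], 400),  # arrives through a transit provider
    ([64500, 64501], 999),  # unrelated origin, ignored
]
share = direct_share(flows)  # 600 of 1000 relevant bytes are direct
```

A real measurement would also have to deal with details such as AS path prepending and multiple content ASNs, which this sketch ignores.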

But even building out millions of square feet of global data center space, turning up hundreds of peering sessions and co-locating at more than 60 public exchanges is not the end of the story.

Over the last year, Google deployed large numbers of Google Global Cache (GGC) servers within consumer networks around the world. Anecdotal discussions with providers suggest more than half of all large consumer networks in North America and Europe now have a rack or more of GGC servers.

So, after billions of dollars of data center construction, acquisitions, and creation of a global backbone to deliver content to consumer networks, what’s next for Google?

Well, I’m hoping for delivery of content directly to the consumer via a nice, fat 1 Gbps FTTH pipe.

Google, please choose Ann Arbor.