FYI: Recent Internet "Brownout"

  • To: apops at apnic dot net
  • Subject: FYI: Recent Internet "Brownout"
  • From: "David R. Conrad" <davidc at apnic dot net>
  • Date: Mon, 28 Apr 1997 14:27:09 +0900
  • Sender: owner-apops@apnic.net
    • Hi,
      
      Below is an account of the recent AS 7007 brownout you might have
      heard about.
      
      Regards,
      -drc
      
      ------- Forwarded Message
      
      Date:    Sat, 26 Apr 1997 19:41:35 -0500
      From:    "Vincent J. Bono" <vbono@MAI.NET>
      To:      nanog at merit dot edu, inet-access at earth dot com
      cc:      noc@MAI.NET
      Subject: 7007 Explanation and Apology
      
      Dear All,
      
          I would like to sincerely apologize to everyone everywhere who 
      experienced problems yesterday due to the 7007 AS announcements.
      
          If anyone cares to know, here is what happened:
      
      
          At 11:30AM, EST, on 25 Apr 1997, our border router, stamped with 
      AS 7007, received a full routing view from a downstream ISP (well, a 
      view containing 23,000 routes anyway).
      
          There was no distribute list imposed on the downstream since they 
      also advertise their customer AS's to us (they were also 
      experimenting with sending some routes out through us and some out 
      through the MAE).  We did filter out routes from them containing any 
      of our AS numbers but since they got the view from someone at 
      MAE-East none of our internal AS numbers showed up at all.  Not 
      having a filter imposed on the inbound side was our error.
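
      A minimal sketch of the kind of inbound filter described above,
      written in Python rather than router configuration.  The prefixes
      and AS numbers are hypothetical documentation/private-use values,
      not taken from this message, and accept_route is an illustrative
      name, not a router API.

```python
import ipaddress

# Hypothetical allow-list for one downstream peer. These prefixes and AS
# numbers are placeholders (documentation / private-use ranges).
ALLOWED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
ALLOWED_ORIGINS = {64512, 64513}


def accept_route(prefix, as_path):
    """Accept a route only if its origin AS and prefix are both expected."""
    net = ipaddress.ip_network(prefix)
    # The origin AS is the last element of the AS path.
    if not as_path or as_path[-1] not in ALLOWED_ORIGINS:
        return False
    # The prefix must fall inside one of the downstream's registered blocks.
    return any(net.subnet_of(allowed) for allowed in ALLOWED_PREFIXES)
```

      With a check like this on the inbound side, a full view learned
      from a third party would be rejected, because neither its prefixes
      nor its origin ASes match the downstream's own list.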
      
          In an as yet unexplained twist of bits, the 7007 router then 
      began to de-aggregate the 23K route view *and* strip the AS path out 
      of it.  I will emphasize that we were running no IGP at the time.  
      Not one.  Not OSPF, not RIP, nothing.  
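
      To illustrate why de-aggregated routes are so disruptive, here is
      a small Python sketch.  It assumes, purely for illustration, that
      an aggregate is split into /24s (this message does not state the
      prefix length) and uses a made-up 10.0.0.0/16 block.  Routers
      forward on the longest matching prefix, so a more specific route
      wins over the legitimate aggregate.

```python
import ipaddress


def deaggregate(prefix, new_len=24):
    """Split an aggregate into more-specific prefixes of length new_len."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= new_len:
        return [net]
    return list(net.subnets(new_prefix=new_len))


def best_match(table, address):
    """Longest-prefix match: pick the most specific covering prefix."""
    addr = ipaddress.ip_address(address)
    candidates = [net for net in table if addr in net]
    return max(candidates, key=lambda n: n.prefixlen) if candidates else None


# Hypothetical example block, not a prefix from the incident.
aggregate = ipaddress.ip_network("10.0.0.0/16")
specifics = deaggregate("10.0.0.0/16")
table = [aggregate] + specifics
```

      In this sketch one /16 becomes 256 /24s, and a lookup for any
      address in the block selects the /24 rather than the aggregate.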
      
          Our MAE-East border router, AS 6082, then got a feed of these 
      routes, at last count 73,000+, which set off our network monitor 
      system which watches for, among other things, route views over 45k 
      lines in size.  At 11:45AM we disabled the BGP peering session with 
      AS 1790 that was in place with the 7007 router and immediately 
      contacted Sprint (contrary to popular belief that they called *us* 
      first to let us know about the problem).  As we were trying to 
      determine what had happened, we began getting calls from other ISPs 
      saying that we were announcing their routes with specificity as well 
      as best AS path.  That really alarmed us since we saw no 
      announcements still going out.  When these calls persisted, we 
      rebooted our 7007 router (that was at 12:00PM).  When the router came 
      back up, it did begin to announce a full view to AS 1790 again, but 
      this time as a normal BGP advertisement, i.e. with AS paths and 
      aggregated addresses.  We then imposed a distribute filter on our 
      downstream and toward 1790, which stopped the announcement and, we 
      thought, solved the problem.  
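
      The route-count alarm mentioned above can be sketched as a simple
      threshold check.  This is an illustrative Python fragment, not the
      actual monitoring system; the function name and the feed labels
      are hypothetical, while the 45k limit and the two route counts
      are the figures quoted in this message.

```python
MAX_ROUTES = 45_000  # alarm threshold quoted in the message


def feeds_over_limit(route_counts, limit=MAX_ROUTES):
    """Return the names of feeds whose route count exceeds the limit."""
    return sorted(name for name, count in route_counts.items() if count > limit)


# Hypothetical feed labels; counts are the figures given in the message.
observed = {"feed-from-7007": 73_000, "downstream-view": 23_000}
alarms = feeds_over_limit(observed)
```

      A router-side equivalent would be a per-peer maximum-prefix limit
      that tears the session down instead of only raising an alarm.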
      
          Well, the phone *kept* ringing and we then started to see the 
      7007 paths coming into our other routers over the MAE's.  Okay, so 
      panic ensued, and we unplugged *everything* at 12:15PM almost to the 
      second.  Then, at 12:25 the Sprint NOC called us to say that they were 
      about to turn down the DS-3 connection to our 7007 router since they 
      were *still* seeing the routes.  We of course told them to go ahead 
      (since the router had no power to it at this point we were *very* 
      confused).  
      
          It seems that even after we stopped announcing the demon-view at 
      12:00, 1790 kept propagating the routes.  We continued to field 
      calls until about 4:45PM yesterday from ISP's all over the world.
      
          According to our conversations with the Sprint NOC at 2:14PM 
      yesterday, they simply could not clear the 7007 routes from their BGP 
      tables; they "just keep appearing again," as one of the techs told us.
      
          It also seems that some large, switch-based backbone provider 
      began distributing the routes to MSN on the west coast, where they 
      lingered until about 7:00PM EST.
      
      
          We had engineers from the router manufacturer in until about 
      1:00AM this morning crawling all over the equipment making sure that 
      we hadn't created an incorrect config set.  We also now impose full 
      distribute list filters on all peers.  
      
          All I can say in our defense is that I believe we did debug the 
      problem in the most expedient manner possible, and when it seemed 
      that even after disabling the BGP session, we were still endangering 
      other networks, we did completely disconnect ourselves from the Net.  
      
          We did *not* perform any of this maliciously; I'm not sure that I 
      could duplicate the event if I tried.  Anyone who called and got a 
      harsh voice on the phone, well, I sincerely apologize to them 
      individually, but some in particular should not have tried to 
      impersonate a company other than their own *and* should not have 
      started cursing out the NOC tech who answered.
      
          I would also like to take this time to thank AT&T WorldNet, NASA 
      Sciences Institute, and Net Access Corporation who called and did not 
      just ask for an explanation, but offered assistance.
      
      Sincerely,
      MANAGEMENT ANALYSIS, INCORPORATED
      
      Vincent J. Bono
      Director Network Services
      
      ------- End of Forwarded Message
      
      _________________________________________________________________________
      To unsubscribe: send "unsubscribe" to apops-request at apnic dot net
      ------------------------------------------------------------------------