By Sara Michalowicz

Amazon Server Crash Caused by Human Error

Photo courtesy of VentureBeat

Amazon said in a statement that a mistyped command caused a widespread outage of its cloud computing service on Tuesday, February 28. The failure disrupted websites across the Internet for hours.

Simple Storage Service (S3), part of Amazon Web Services (AWS), stores the data behind features ranging from file sharing to web feeds. Amazon said the S3 team was working on an issue with the billing system and intended to take a small number of servers offline, but an incorrectly entered input to the command removed a much larger set of servers than planned.

Amazon did not elaborate on exactly what the “authorized S3 team member” had mistyped, but said it took nearly three hours to bring part of the system back up and more than four hours before S3 was running normally again.

The Wall Street Journal reported that the outage "cost companies in the S&P 500 index $150 million, according to Cyence Inc., a startup that specializes in estimating cyber-risks. Apica Inc., a website-monitoring company, said 54 of the internet's top 100 retailers saw website performance slow by 20% or more."

According to Synergy Research Group, AWS holds 40% of the cloud services market, and large swaths of popular websites depend on it to operate. When AWS went down, it took a huge number of businesses, apps and publishers offline, and many other sites struggled with slow or reduced capacity during the outage.

Amazon is making numerous changes to its system to prevent a similar event from occurring in the future. The company explained that “the tool used allowed too much capacity to be removed too quickly.”

In an online statement, Amazon said, “[We] want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.”
