A couple of years ago, Dropbox shocked a lot of people when it decided to mostly drop the public cloud and build its own datacenters. More recently, Atlassian did the opposite, closing most of its datacenters and moving to the cloud. Companies make these choices for a variety of reasons. When Atlassian CTO Sri Viswanath came on board in 2016, he made the decision to move the company’s biggest applications to AWS.
In part, this is a story of technical debt: the idea that over time your applications become encumbered by layers of crusty code, making them harder to update and ever harder to maintain. For Atlassian, which was founded in 2002, that bill came due in 2016 when Viswanath came to work for the company.
Atlassian already knew they needed to update the code to move into the future. One of the reasons they brought Viswanath on board was to lead that charge, but the thinking was already in place even before he got there. A small team was formed back in 2015 to work out the vision and the architecture for the new cloud-based approach, but they wanted to have their first CTO in place to carry it through to fruition.
Shifting to microservices
Viswanath put the plan into motion, giving it the internal code name Vertigo, maybe because the thought of moving most of their software stack to the public cloud made the engineering team dizzy to even consider. The goal of the project was to rearchitect the software, starting with their biggest products, Jira and Confluence, in such a way that it would lay the foundation for the company for the next decade. No pressure or anything.
They spent a good part of 2016 rewriting the software and getting it set up on AWS. They concentrated on turning their 15-year-old code into microservices, which in the end resulted in a smaller code base. Viswanath said the technical debt issues were very real, but they had to be careful not to reinvent the wheel, changing only what needed to be changed whenever possible.
“The code base was pretty large and we had to go in and do two things. We wanted to build it for multi-tenant architecture and we wanted to create microservices,” he said. “If there was a service that could be pulled out and made self-contained we did that, but we also created new services as part of the process.”
Migrating customers on the fly
Last year was the migration year, and it was indeed a full year-long project to migrate every last customer over to the new system. It started in January and ended in December and involved moving tens of thousands of customers.
First of all, they automated whatever they could and they also were very deliberate in terms of the migration order, being conscious of migrations that might be more difficult. “We were thoughtful in what order to migrate. We didn’t want to do easiest first and hardest at the end. We didn’t want to do just the harder ones and not make progress. We had to blend [our approaches] to fix bugs and issues throughout the project,” he said.
Viswanath stated that the overarching goal was to move the customers without a major incident. “If you talk to anyone who does migration, that’s a big thing. Everyone has scars doing migrations. We were conscious to do this pretty carefully.” Surprisingly, although it wasn’t perfect, they did manage to complete the entire exercise without a major outage, a point of which the team is justifiably proud. That doesn’t mean that it was always smooth or easy.
“It sounds super easy: ‘we were thoughtful and we migrated,’ but there was warfare every day. When you migrate, you hit a wall and react. It was a daily thing for us throughout the year,” he explained. It took a total team effort involving engineering, product and support. That included having a customer support person involved in the daily scrum meetings so they could get a feel for any issues customers were having and fix them as quickly as possible.
What they gained
As with any cloud project, moving an application to the cloud brings general benefits in flexibility, agility and resource elasticity, but this specific project delivered more than that.
First of all, the move has allowed faster deployment, with multiple deployments happening at the same time, due in large part to the copious use of microservices. During the migration year, they held off on new features for the most part because they wanted to keep things as static as possible for the shift over, but with the new system in place they can add new features much more quickly.
They get much better performance, and if they hit a performance bottleneck, they can just add more resources because it’s the cloud. What’s more, they were able to establish a local presence in the EU, which improves performance by putting the applications closer to the end users located there.
Finally, they actually found the cloud to be a more economical option, something that not every company that moves to the cloud finds. By closing the datacenters and reducing the capital costs associated with buying hardware and hiring IT personnel to maintain it, they were able to reduce costs.
Managing the people parts
It was a long, drawn-out project, and as such, they really needed to think about the human aspect of it too. They would swap people in and out to make sure the engineers stayed fresh and didn’t burn out helping with the transition.
One thing that helped was the company culture in general, which Viswanath candidly describes as one with open communication and a general “no bullshit” policy. “We maintained open communication, even when things weren’t going well. People would raise their hand if they couldn’t keep up and we would get them help,” he said.
He admitted that there was some anxiety, both within the company and for him personally, about implementing a project of this scale, but they knew they needed to do it for the future of the organization. “There was definitely nervousness on what if this project doesn’t go well. It seemed the obvious right direction and we had to do it. The risk was what if we screwed up in execution and we didn’t realize benefits we set out to do.”
In the end, it was a lot of work, but it worked out just fine and they have the system in place for the future. “Now we are set up for the next 10 years,” he said.