As every software company knows, as code ages and workarounds pile on workarounds, the code base becomes bloated, and it grows ever more difficult to get around the technical debt that has built up. The phenomenon is all but impossible to avoid, but at some point companies realize that the debt is so great that it’s limiting their ability to build new functionality. That’s precisely what Trulia faced in 2017 when it began paying down that debt and modernizing its architecture.
Trulia is a real estate site founded way back in 2005, an eternity ago in terms of technology. The company went public in 2012 and was acquired by Zillow in 2014 for $3.5 billion, but has continued to operate as an independent brand under the Zillow umbrella. When engineering began thinking about an overhaul, it understood that a lot had changed technologically in the 12 years since the company’s inception. The team knew it had a huge, monolithic code base that was inhibiting its ability to update the site.
While they tried to pull out some of the newer functions as services, it didn’t really make the site any more nimble because these services always had to tie back into that monolithic central code base. The development team knew if it was to escape this coding trap, it would take a complete overhaul.
Brainstorming broad change
As you would expect, a process like this doesn’t happen overnight; it takes months to plan and implement. It all started back in 2017 when the company held what it called an “Innovation Week” with the entire engineering team. Groups of engineers came up with ideas about how to solve this problem, but the one that got the most attention was Project Islands, which involved breaking out the different pieces of the site as individual coding islands that could operate independently of one another.
It sounds simple, but in practice it involved breaking down the entire code base into services. They would use Next.js and React to rebuild the front end, and GraphQL, an open source query language for APIs, to rebuild the back end.
Deep Varma, Trulia’s VP of engineering, pointed out that as a company founded in 2005, the site was built on PHP and MySQL, two popular development technologies from that time. Varma says that whenever his engineers made a change to any part of the site, they needed to do a complete system release. This caused a major bottleneck.
What they really needed to do was move to a completely modern microservices architecture that allowed engineering teams to work independently in a continuous delivery approach without breaking any other team’s code. That’s where the concept of islands came into play.
Islands in the stream
The islands were actually microservices. Each one could communicate with a set of central common services, such as authentication, A/B testing, the navigation bar and the footer, which every mini code base would need. That allowed the teams building these islands to work independently, without requiring a huge rebuild every time they added a new element or changed something.
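To make the shape of that arrangement concrete, here is a minimal TypeScript sketch of an app shell that owns the common services and hands each request to whichever island claims the route. It is an illustration under stated assumptions, not Trulia’s code: the names AppShell, Island and CommonServices, the URL prefixes and the stubbed services are all hypothetical, and in a real deployment the islands would be separately deployed services rather than objects in one process.

```typescript
// Illustrative sketch only: every name here is hypothetical, not Trulia's code.

// Common services the shell offers to every island. The article lists
// authentication, A/B testing, the navigation bar and the footer.
interface CommonServices {
  currentUser(): string | null;
  abVariant(experiment: string): string;
  navBar(): string;
  footer(): string;
}

// An island owns one URL prefix and renders only its own page body.
interface Island {
  name: string;
  routePrefix: string;
  render(path: string, services: CommonServices): string;
}

class AppShell {
  private readonly islands: Island[] = [];
  private readonly services: CommonServices;

  constructor(services: CommonServices) {
    this.services = services;
  }

  // Adding an island does not touch any other island's code.
  register(island: Island): void {
    this.islands.push(island);
  }

  // Route the request to the owning island and wrap it in the shared chrome.
  handle(path: string): string {
    const island = this.islands.find((i) => path.startsWith(i.routePrefix));
    if (island === undefined) {
      return "404 Not Found";
    }
    return [
      this.services.navBar(),
      island.render(path, this.services),
      this.services.footer(),
    ].join("\n");
  }
}

// Stubbed services so the sketch runs on its own.
const services: CommonServices = {
  currentUser: () => null,
  abVariant: () => "control",
  navBar: () => "[nav]",
  footer: () => "[footer]",
};

const neighborhoods: Island = {
  name: "neighborhoods",
  routePrefix: "/neighborhoods/",
  render: (path, svc) => {
    const slug = path.slice("/neighborhoods/".length);
    const visitor = svc.currentUser() ?? "guest";
    return `Neighborhood ${slug} for ${visitor}`;
  },
};

const offMarket: Island = {
  name: "off-market",
  routePrefix: "/off-market/",
  render: (path) => `Off-market property ${path.slice("/off-market/".length)}`,
};

const shell = new AppShell(services);
shell.register(neighborhoods);
shell.register(offMarket);

console.log(shell.handle("/neighborhoods/mission"));
```

The point of the design is that registering or changing one island leaves the others untouched, while the navigation bar, footer and similar shared pieces live in exactly one place.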
The harsh reality of this kind of overhaul came into focus as the teams realized they had to be writing the new pieces while the old system was still in place and running. In a video the company made describing the effort, one engineer likened it to changing the engine of a 747 in the middle of a flight.
Varma says he didn’t try to do everything at once, as he needed to see if the islands approach would work in practice first. In November 2017, he pulled the first engineering team together, and by January it had built the app shell (the common services piece) and one microservice island. When the proof of concept succeeded, Varma knew they were in business.
Building out the archipelago
It’s one thing to build a single island, but it’s another matter to build a chain of them, and that would be the next step. By last April, engineering had shown enough progress that it was able to present the entire idea to senior management and get the go-ahead to move forward with a more complex project.
First, it took some work to get the Next.js framework to behave the way they wanted. Varma said he brought in the Next.js team to work with his engineers on how to stitch the various islands together and resolve dependencies among the different services. The Next.js team actually changed its development roadmap for Trulia, speeding up delivery of these requirements, understanding that other companies would have similar issues.
By last July, the company released Neighborhoods, the first fully independent island functionality on the site. Recently, it moved off-market properties to islands. Off-market properties, as the name implies, are pages with information about properties that are no longer on the market. Varma says that these pages actually make up a significant portion of the company’s traffic.
While Varma would not say just how much of the site has been moved to islands at this point, he said the goal is to move the majority to the new platform in 2019. All of this shows that a complete overhaul of a complex site doesn’t happen overnight, but Trulia is taking steps to move off the original system it created in 2005 and onto the more modern and flexible architecture it has built with islands. It may not have paid down its technical debt in full in 2018, but it went a long way toward laying the foundation to do so.