Replacing the engine while driving: planning safety measures into the project

When a project is created for changing out an existing dependency with another one, there are many factors that should be thought of. If there is a high risk involved if something goes wrong, then more caution would need to be taken. If a lot of effort is involved to add the new dependency, then it may make sense to incrementally ship the new dependency alongside the old dependency. Lastly, depending on the complexity, will a lot of work be done up front to integrate the new dependency, or will it be hacked in then cleaned up after the launch?

These are some of the questions that can be asked to help guide the planning for a project which changes out one dependency for another. The same pattern can be used for similar types of work besides dependencies such as service calls, database queries, integrations, and components, to name a few.

Generally, a project like this can have the following steps:

  • Investigate
  • Refactor
  • Add the new dependency
  • Launch
  • Remove the old dependency
  • Refactor
  • Celebrate

Not all of these steps need to be performed in this order or at all. For example, the launch, add the new dependency, and remove the old dependency can all be completed at once if the risk and simplicity allows for it.

Refactoring may be redundant to mention given developers should be refactoring as they normally make changes to the codebase. I am being explicit about mentioning it here so that the team can use deliberate refactoring to their advantage to make the new dependency easy to add, as well as removing any unnecessary abstractions or code leftover from the old dependency.

A special shoutout goes to the celebrate step, since I certainly know that teams can be eager to move onto the next project and forget to appreciate the hard work put into achieving its success.

Now lets get into the different concerns that can apply to projects like this, and the practices that can help it succeed.

Concerns and Practices

The most important step every project should have, the investigation step informs how the rest of the project should work. During this step as much information should be gathered to give a confident enough plan for the rest of the project to proceed. Some of the actions taken during this step are to understand how the current system works, how the dependency to be replaced is integrated, how critical this dependency is to the businesses operations, how the new dependency should be integrated, and any cleanup and refactorings that should be made before, during, and after the new dependency is added and launched.

A big topic to explore is what the system would look like if the system was originally built using this new dependency. This mental model forces the envisioning of a clean integration with the new dependency, ignoring any sort of legacy code, and free of any constraints within the system. This aligns the team with the ultimate design of the system that uses this new dependency. The team should strive and fight for reaching this state at the end of the project since it can result in the cleanest and most maintainable code. If this is not part of the end goal for the project then the system may carry forward unnecessary code and bad abstractions that result in tech debt piling up for future developers.

Another big consideration is what will the system look like when it is launched? Will the new and old dependency both have to coexist in the system so that there can be a gradual transition? Or maybe the change is small enough that the old dependency can be removed and the new one added in one change and no user interruption will occur? If there does exist a period where the two dependencies need to coexist then how can this behaviour be implemented to fit in with the ultimate design discussed earlier? This may involve doing a lot of refactoring up front to get the system into a state that fits the ultimate design at the end of the project. There also exists the tradeoff that the new dependency can be hastily integrated into the system alongside the other dependency, then all of the hacks can be undone with a sufficient cleanup and refactoring after the launch. Both methods have different tradeoffs: with the up front refactoring, the new dependency is integrated the right way into the system, but may require a lot of refactoring of the surrounding and old dependency’s code. On the other hand, hacking the new dependency into the same pattern that the old dependency uses gets the project faster to launch, but can lead to bugs and integration troubles if the interfaces or behaviour are different. Regardless, at the end of the project, the system should look as if the old dependency never existed.

How much risk is involved with switching over to use the new dependency? If it is very risky then more precautions should be taken to reduce the risk. If there is very little risk then little to no precautions can be taken, and the project can move much faster. Some methods I have used to help with reducing risk are collecting metrics on the different types of requests and responses between the two dependencies. These metrics can be early warning systems to the new dependency behaving incorrectly. Being able to rollback to the old dependency via a deploy or a feature flag provides the flexibility to quickly switch back to the old dependency in case things go wrong. Dark launching the new dependency to production has been a practice I often encourage my team to use, since it allows for testing out the new code in the production environment without affecting real users. Lastly, beta testing with a percentage of users can also reduce the impact since if something goes wrong with the new dependency, only a fraction of the users are affected. Many of these practices are complimentary and can be used together to achieve the best mix of handling risk.

How much effort is involved to add the new dependency? Effort could mean the amount of changes to make to the code or the time involved. If there is a significant amount of effort then it absolutely makes sense to incrementally ship small changes to production. Even if the changes are not live code paths, at least the team can review each change, provide feedback, and course correct if needed. I have seen in small effort projects all work was deployed to production in one pull request. On large effort projects the work was split across many pull requests written and deployed over time by a team. This latter case enabled the team to dark launch the new dependency to production and had the added safety to switch back to the old dependency if needed.

Given the considerations and practices discussed throughout this article, it is best to validate that they will actually work when it comes time to execute. If a team is experienced with these projects and practices then the investigation period can be quicker, otherwise if the team is less experienced with these projects the investigation should be more substantial. Building a prototype can help build confidence in the assumptions made about running the project and guide the team going forward. A good prototype proves or disproves assumptions in as minimal time as possible. Once the team is confident in their plan, start the project, and do not be afraid to reevaluate the choices already made as the project goes on.