What we learned from porting our web service to a new tech stack

Soma Erdélyi
Published in Emarsys Craftlab
8 min read · Jan 26, 2022

Imagine you are driving down a highway at full speed on a nice sunny day. Your favourite song is playing on the radio, traffic is light. Now, all of a sudden, you start to replace your engine with a new one. Without stopping. Next comes the steering wheel. You are not slowing down. Tires. Seats. Windshield, the body.

Everything inside and outside of the car is replaced around you.

Then finish off the job with repainting the entire vehicle. You never stopped for a moment.

That’s what porting a fully running web service to a new technology stack from the ground up feels like while it’s still serving customers in production. This is the story of how we did just that.

Looking back on our journey, we, the development team that owns the ported service, thought that sharing our key learnings and takeaways could help you be better prepared if a similar task lies ahead of you. We hope that by the end of this article you can put a better toolbox into the trunk of your car, should you ever need it on the highway.

Photo by Brian Snelson from Wikimedia

Before we dive any deeper…

We have to make it clear that a change this big is rarely justified in the lifecycle of a piece of software. In fact, we see just the opposite all around us: technologies we thought were long forgotten are still around. Complex systems at the heart of banking and government infrastructure were built decades ago and never fully moved to a new technology stack.

If it ain’t broke, don’t fix it, right?

Although this is right in a lot of cases, ours was different. Still, before you jump right into rewriting a working service,

make sure there is a real need for it.

This real need can be many things, but trying out the latest tech just out of curiosity is definitely not one of them. Security concerns can justify the move from one solution to another, since unsupported systems carry a real threat. Developers, or more precisely the lack of them, can also justify rewriting a system. A stack or programming language that was once popular can become nearly impossible to recruit for within reasonable limits.

Business logic comes first

If you really wish to rebuild your car, start with the engine. If the new one is not running as smoothly as you expected, you are not yet far into the build and haven’t spent much on new parts.

What this means for us is that we should

start with the core business logic.

Ask yourself the question: what is the core functionality of this service? Sending out formatted messages? Converting values in a database? Moving an entity through steps of a pipeline? Once you have the answer, start the port there. This usually doesn’t involve any UI, probably not even a web server.

Don’t bother with the fancy frontend or fine-tuned build system at first.

Implement what actually does the heavy lifting in your service, decoupled from web servers and other frameworks. Those will come later. This proves the main point: that the service will be able to function in the new environment.
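To make this a bit more tangible, here is a minimal sketch of what such a decoupled core might look like, using the "sending out formatted messages" example from above. The `Customer` type, the `format_greeting` function and the templates are all hypothetical and made up for illustration; they are not taken from our service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical input type; a real service would have its own model."""
    first_name: str
    locale: str


def format_greeting(customer: Customer) -> str:
    """Pure business logic: no HTTP, no database, no framework imports."""
    # Illustrative templates only; unknown locales fall back to English.
    templates = {"en": "Hello, {name}!", "hu": "Szia, {name}!"}
    template = templates.get(customer.locale, templates["en"])
    return template.format(name=customer.first_name)


# The function can be exercised directly, long before a web server
# exists around it.
print(format_greeting(Customer(first_name="Anna", locale="hu")))
```

Because nothing here depends on a framework, the same function can later be called from a request handler, a queue consumer or a test without changes.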

Extensive testing builds trust

Your engine is replaced and it runs smoothly. Now it’s time for the steering wheel. After you replaced that, you want to be absolutely sure that the car turns left when you turn it left, right? You trusted your old car since you drove thousands of kilometers together and it always worked. But how do you build trust in the new components?

Photo by ThisIsEngineering from Pexels

You can get the first results back from the freshly created business logic through testing. There are many forms of testing, from automated unit tests to manual end-to-end tests. This is definitely a bigger topic that I can strongly recommend diving into before starting any porting.

One form of testing that can be especially efficient and beneficial in this case is test-driven development (TDD), a development process in which requirements are converted into test cases. The software is developed by making the tests pass, so every requirement is met once all the tests are green. This also helps to break the code down into small individual units. Even if you later steer away from TDD, the core of the application is where you can benefit from it the most.

Later you can let the old application’s live data flow through the new business logic, hidden from the users. This technique, known as dark launching, allows you to observe whether the new code performs correctly and how it handles real load. The results should go to a disk or database close to the current system, where they don’t disturb the existing functionality but can be compared with the existing results.

This continuous, data-driven comparison makes any error surface quickly, and the correctness of the new system can be monitored well before anything starts to depend on it.
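The idea of a dark launch can be sketched in a few lines. This is only an illustration under assumptions, not our production setup: `dark_launch`, `old_slug`, `new_slug` and the in-memory `mismatches` list are hypothetical names, and a real system would write mismatches to a disk or database instead of a list.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("dark_launch")


def dark_launch(
    old_impl: Callable[[Any], Any],
    new_impl: Callable[[Any], Any],
    record_mismatch: Callable[[Dict[str, Any]], None],
) -> Callable[[Any], Any]:
    """Wrap the old implementation so the new one runs in its shadow."""

    def handler(request: Any) -> Any:
        old_result = old_impl(request)
        try:
            new_result = new_impl(request)
        except Exception as exc:
            # A crash in the new code must never reach the user.
            logger.warning("new implementation failed: %r", exc)
            record_mismatch(
                {"request": request, "old": old_result, "error": repr(exc)}
            )
        else:
            if new_result != old_result:
                record_mismatch(
                    {"request": request, "old": old_result, "new": new_result}
                )
        # Users only ever see the result of the old, trusted code path.
        return old_result

    return handler


# Hypothetical old and new versions of the same piece of business logic.
def old_slug(title: str) -> str:
    return title.strip().lower().replace(" ", "-")


def new_slug(title: str) -> str:
    return "-".join(title.lower().split())


mismatches = []  # stand-in for a file or database table
handler = dark_launch(old_slug, new_slug, mismatches.append)
handler("Hello World")   # both versions agree, nothing is recorded
handler("Hello  World")  # double space: the versions differ, one record
```

The recorded differences are then the input for the comparison: each one is either a bug in the new code or an old quirk that somebody has to decide whether to keep.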

Relevant, accurate, timely communication

Driving in traffic may be the greatest co-operative game we play. Can I expect the other drivers to turn? Are they braking? Did they give me the right of way? Communication is key to safety.

The same applies in software development and operations. It is crucial to communicate what you are working on and what can go wrong as a side effect.

One application going down is bad. Multiple teams independently searching for an unknown root cause is a disaster, especially when it could have been prevented with clear, upfront communication.

It can be a Slack message or a simple e-mail sent to everyone who can be affected. The key is to do it at the right time and include every detail that can be useful. A super long message sent a month ago that warned about a potential outage among many other things is not sufficient, but a simple “Expect UI outages on the user management page, come to us first if you notice anything” can save a lot of debugging time and frustration. The point is that the communication should happen at the right time, reach the right participants and include the right amount of detail.

Prepare for the unexpected

Wow, what is this? A little rust? Or maybe more? Do I suddenly need to replace parts I didn’t intend originally? Wow, I don’t have the time or budget for this.

Photo by standret from Freepik

With a change this big, something unexpected will almost certainly come up. Even if things are going great, integrating the new service into the existing ecosystem has its unique challenges. Replacing a live service without losing data or duplicating it has its problems as well.

It’s safe to assume that some things will always take more time than expected,

so the estimations should have a reasonable safety margin added to them.

It’s also probable that a big switch won’t succeed on the first try. That’s why it’s important to create a rollback plan and have a script ready, so the safe state can be restored before you fix the problem and move forward.
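What such a script does depends entirely on your infrastructure, but its shape is often the same: switch, check, and restore if the check fails. The sketch below only illustrates that shape. `switch_with_rollback` and its three callbacks are hypothetical, and the dictionary stands in for whatever really routes your traffic (a load balancer, a feature flag, a DNS record).

```python
from typing import Callable


def switch_with_rollback(
    activate_new: Callable[[], None],
    health_check: Callable[[], bool],
    restore_old: Callable[[], None],
) -> bool:
    """Try the switch; restore the safe state if anything looks wrong."""
    try:
        activate_new()
        healthy = health_check()
    except Exception:
        # A failing switch or a failing check both count as "not healthy".
        healthy = False
    if not healthy:
        restore_old()
    return healthy


# Stand-in for real routing state.
routing = {"active": "old"}


def activate_new() -> None:
    routing["active"] = "new"


def restore_old() -> None:
    routing["active"] = "old"


# A failing health check sends traffic back to the old service.
succeeded = switch_with_rollback(activate_new, lambda: False, restore_old)
print(succeeded, routing["active"])
```

Writing and rehearsing `restore_old` before the switch matters more than the wrapper itself: the rollback path is the one you will run under pressure.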

Port iteratively and incrementally

So, you started to rebuild your car. When do you first try to start the engine? Once everything is finished? And what if something goes wrong and half of the car has to be disassembled again to reach the faulty parts?

Iterative and incremental development has been part of most software development processes for a long time now. What we have to learn from it in this case is this:

don’t look at the porting as a one-way, straightforward waterfall process.

You are essentially building a fully featured new piece of software, and it has to be broken down into smaller tasks too. Also, not everything has to be perfect and fully featured the first time you build it. Integrating the different components takes time, and so does higher-level testing.

When you develop iteratively, corners can be cut at first and the code can be refactored later as the project grows and needs arise. It’s also easier to fine-tune decisions once the software is larger and you have more measurements to support them.

Another interesting aspect of porting the service in an iterative and incremental fashion is that

you may find out that parts of the old service are no longer relevant or necessary at all.

This happened to us as well: we realised that some pages and functionality were not used, even though we still had to support them in the old application. By not porting them to the new one we managed to reduce the overall scope while preserving the business value. And what if the parts we didn’t port become necessary later? We’ll just come back and port them then.

Plan decommissioning

Great, you now have your shiny new car and it runs perfectly. But what do you do with the used spare parts? Just leave them in the trunk? Used motor oil and tires should only be disposed of at the proper location.

Photo by Laker from Pexels

It is part of the migration to take good care of the existing service. Create a migration plan and figure out how you will switch the old one off. Will you leave some redirects in place? Maybe add monitoring to detect if traffic still finds its way there? Also, attached resources should be erased thoroughly.

An unused, outdated database costs money unnecessarily, and it can also be a security risk if it’s no longer updated, maybe not even monitored.

Also, chances are that customers’ personal data is stored in it, so take good care of any attached resource as well.

Summary

In summary these are the things we learned:

  • Start testing the application’s business logic as soon as possible.
  • Build trust in the new system with comparisons, logging and monitoring.
  • Prepare for the unexpected.
  • Plan way ahead, notify other teams about your move.
  • Work in small tasks, only do what’s necessary.
  • Don’t be afraid to come back and solve problems later; it doesn’t need to be perfect at first.
  • Create a migration plan and don’t forget to turn off the old service.

Do you have any similar experience? What were your take-aways from the journey?

When porting an existing application you are probably switching to the latest technology stack, which comes with its own challenges and possibilities. Speaking of challenges, sometimes porting is not an option and you just have to work with legacy code. If you are interested in what I learned while working with a huge legacy codebase and what challenges that journey had, I can recommend my previous article: https://medium.com/p/ca689fc63f9e

MSc software engineer with 8+ years of experience, working with JavaScript. Technical team leader for 2+ years, leading engineers towards success practicing XP.