• Rahul Lahiri

Debugging with API observability

A few days ago I was trying out a new capability of our regression testing product using our demo app, which we call MovieInfo. I walked through some user actions in the MovieInfo UI and used our product to record the API traces. We do this routinely to create a test suite quickly during a demo. The application is instrumented with our Envoy listeners, so we can capture the entire API trace from the API gateway for every user action.

The application felt sluggish. Some of the UI actions were taking exceptionally long, far too long to use in a demo during a sales call. I asked one of our engineers to join a screenshare and see whether the problem could be debugged quickly. We decided to look at the API traces for the captured requests in API Studio. API Studio includes an API catalog, built on our observability feature, where users can view every API request along with all the egress requests from the service. In addition, the entire trace for a request can be viewed for regression tests.

When the engineer looked in the API catalog, something unusual immediately jumped out. One of the API requests was making a very large number of egress requests to a downstream service that is essentially a thin layer on top of a database. This is what the API trace looked like in the API catalog:

A bug had crept in that was making many calls to the database for each minfo/genre-groups API request. The API request was succeeding – the return status was a 200, and the data returned was correct – but it was taking forever to run. Once identified, the issue was easy to fix.

In this particular case, we did not even have to dive into the details of the request to see the problem. The egress behavior alone, the sheer number of downstream calls made for a single request, was enough to reveal it. This visibility, provided by the API observability we have built into our product, let us debug the problem in minutes.
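The pattern in this incident, one API request fanning out into many calls to the same downstream service, can also be checked for mechanically once egress calls are recorded. The following is a minimal sketch of that idea, not the Mesh Dynamics implementation or data format: the `EgressCall` record, the `movie-db` service name, the `minfo/other` path, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import NamedTuple


class EgressCall(NamedTuple):
    """One recorded outbound call (hypothetical record shape)."""
    trace_id: str        # identifies the originating API request
    source_api: str      # API path that issued the call
    target_service: str  # downstream service that was called


def flag_fanout(calls, threshold=10):
    """Return {(trace_id, source_api, target_service): count} for every
    request that called the same downstream service more than `threshold` times."""
    counts = Counter(
        (c.trace_id, c.source_api, c.target_service) for c in calls
    )
    return {key: n for key, n in counts.items() if n > threshold}


# Illustrative data: one request makes 50 calls to the same service,
# another makes a single call.
calls = [EgressCall("t1", "minfo/genre-groups", "movie-db")] * 50
calls.append(EgressCall("t2", "minfo/other", "movie-db"))

flagged = flag_fanout(calls, threshold=10)
for (trace_id, api, target), n in flagged.items():
    print(f"{api} (trace {trace_id}) made {n} calls to {target}")
```

A check like this looks only at call counts, which matches what happened here: the response was a correct 200, so status and payload checks alone would not have caught the problem.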

How does Mesh Dynamics API observability work?

There are two ways Mesh Dynamics can capture this information:

  • Basic API egress observability is offered by API Studio simply by forwarding the egress requests from the service being developed through API Studio. We could have found this problem if we had set up the minfo service locally and run this request.

  • If the application is configured with Mesh Dynamics listeners, then we can capture this information from a test or staging setup. In fact, we can capture the entire trace from the gateway service. This is how our demo app is set up, which is why we could get full API trace observability.

The advantage of the first option is that it requires no listener setup. Every developer can get full observability into their own service. During API development this is adequate, since each developer needs to identify issues only with their own service. However, when a problem surfaces in the deployed application and it is not known which service is responsible, tracking down the source with this approach takes some effort.

With full application observability, we assemble the full trace for every request, so all the data is available for debugging with no additional effort. There is no need to recreate the problem and then manually trace its source in a test setup. The answer is already in the captured data; finding the root cause is a matter of traversing the service graph and reviewing the data systematically.
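To make the idea of traversing the service graph concrete, here is a rough sketch that walks an assembled trace and totals the number of calls and the time spent on each caller-to-callee edge. It assumes a trace is available as a simple tree of spans; the `Span` class, the service names, and the durations are hypothetical and do not reflect how Mesh Dynamics stores traces.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """One request in a trace, with the requests it triggered (hypothetical shape)."""
    service: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def summarize_edges(root):
    """Walk the trace tree and return (edge, call_count, total_ms) tuples,
    one per caller -> callee pair, sorted by total time spent, largest first."""
    totals = defaultdict(lambda: [0, 0.0])
    stack = [root]
    while stack:
        span = stack.pop()
        for child in span.children:
            edge = (span.service, child.service)
            totals[edge][0] += 1
            totals[edge][1] += child.duration_ms
            stack.append(child)
    rows = [(edge, n, ms) for edge, (n, ms) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Illustrative trace: the gateway calls minfo once, and minfo calls
# the database service 40 times.
db_calls = [Span("movie-db", 20.0) for _ in range(40)]
minfo = Span("minfo", 980.0, db_calls)
gateway = Span("gateway", 1000.0, [minfo])

ranked = summarize_edges(gateway)
for (caller, callee), count, total_ms in ranked:
    print(f"{caller} -> {callee}: {count} calls, {total_ms:.0f} ms")
```

Reading the summary from the gateway downward shows where the time goes at each hop; in this made-up trace, most of the time in minfo is accounted for by 40 calls to a single downstream service.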

In this particular case, we did not even have to review the details of the suspect API requests; the egress behavior alone showed us the problem. We spent just a couple of minutes identifying the problem instead of hours trying to narrow down its cause.

Are you running a microservices application? Have you run into similar problems and spent significant amounts of time trying to find the root cause? Try using the Mesh Dynamics API Studio, and experience the improvements in microservices development productivity.

Please contact us at solutions@meshdynamics.io to learn more about integrated contract and service testing with Mesh Dynamics.