Tuesday, January 17, 2017

Monitoring Node.js Applications

This article is the 13th part of the tutorial series called Node Hero - in these chapters, you can learn how to get started with Node.js and deliver software products using it.
In the last article of the series, I’m going to show you how to do Node.js monitoring and how to find advanced issues in production environments.

The Importance of Node.js Monitoring

Getting insights into production systems is critical when you are building Node.js applications! You have an obligation to constantly detect bottlenecks and figure out what slows your product down.
An even greater issue is to handle and preempt downtimes. You must be notified as soon as they happen, preferably before your customers start to complain. Based on these needs, proper monitoring should give you at least the following features and insights into your application's behavior:
  • Profiling on a code level: You have to understand how much time does it take to run each function in a production environment, not just locally.
  • Monitoring network connections: If you are building a microservices architecture, you have to monitor network connections and lower delays in the communication between your services.
  • Performance dashboard: Knowing and constantly seeing the most important performance metrics of your application is essential to have a fast, stable production system.
  • Real-time alerting: For obvious reasons, if anything goes down, you need to get notified immediately. This means that you need tools that can integrate with Pagerduty or Opsgenie - so your DevOps team won’t miss anything important.

Server Monitoring versus Application Monitoring

One concept developers usually apt to confuse is monitoring servers and monitoring the applications themselves. As we tend to do a lot of virtualization, these concepts should be treated separately, as a single server can host dozens of applications.
Let’s go trough the major differences!

Server Monitoring

Server monitoring is responsible for the host machine. It should be able to help you answer the following questions:
  • Does my server have enough disk space?
  • Does it have enough CPU time?
  • Does my server have enough memory?
  • Can it reach the network?
For server monitoring, you can use tools like zabbix.

Application Monitoring

Application monitoring, on the other hand, is responsible for the health of a given application instance. It should let you know the answers to the following questions:
  • Can an instance reach the database?
  • How much request does it handle?
  • What are the response times for the individual instances?
  • Can my application serve requests? Is it up?
For application monitoring, I recommend using our tool called Trace. What else? :)
We developed it to be an easy to use and efficient tool that you can use to monitor and debug applications from the moment you start building them, up to the point when you have a huge production app with hundreds of services.

How to Use Trace for Node.js Monitoring

To get started with Trace, head over to https://trace.risingstack.com and create your free account!
Once you registered, follow these steps to add Trace to your Node.js applications. It only takes up a minute - and these are the steps you should perform:
Start Node.js monitoring with these steps
Easy, right? If everything went well, you should see that the service you connected has just started sending data to Trace:
Reporting service in Trace for Node.js Monitoring

#1: Measure your performance

As the first step of monitoring your Node.js application, I recommend to head over to the metrics page and check out the performance of your services.
Basic Node.js performance metrics
  • You can use the response time panel to check out median and 95th percentile response data. It helps you to figure out when and why your application slows down and how it affects your users.
  • The throughput graph shows request per minutes (rpm) for status code categories (200-299 // 300-399 // 400-499 // >500 ). This way you can easily separate healthy and problematic HTTP requests within your application.
  • The memory usage graph shows how much memory your process uses. It’s quite useful for recognizing memory leaks and preempting crashes.
Advanced Node.js Monitoring Metrics
If you’d like to see special Node.js metrics, check out the garbage collection and event loop graphs. Those can help you to hunt down memory leaks. Read our metrics documentation.

#2: Set up alerts

As I mentioned earlier, you need a proper alerting system in action for your production application.
Go the alerting page of Trace and click on Create a new alert.
  • The most important thing to do here is to set up downtime and memory alerts. Trace will notify you on email / Slack / Pagerduty / Opsgenie, and you can use Webhooks as well.
  • I recommend setting up the alert we call Error rate by status code to know about HTTP requests with 4XX or 5XX status codes. These are errors you should definitely care about.
  • It can also be useful to create an alert for Response time - and get notified when your app starts to slow down.

#3: Investigate memory heapdumps

Go to the Profiler page and request a new memory heapdump, wait 5 minutes and request another. Download them and open them on Chrome DevTool’s Profiles page. Select the second one (the most recent one), and click Comparison.
chrome heap snapshot for finding a node.js memory leak
With this view, you can easily find memory leaks in your application. In a previous article I’ve written about this process in a detailed way, you can read it here: Hunting a Ghost - Finding a Memory Leak in Node.js

#4: CPU profiling

Profiling on the code level is essential to understand how much time does your function take to run in the actual production environment. Luckily, Trace has this area covered too.
All you have to do is to head over to the CPU Profiles tab on the Profiling page. Here you can request and download a profile which you can load into the Chrome DevTool as well.
CPU profiling in Trace
Once you loaded it, you'll be able to see the 10 second timeframe of your application and see all of your functions with times and URL's as well.
With this data, you'll be able to figure out what slows down your application and deal with it!

No comments:

Post a Comment