Building in logging into an application can sound easy, but once you reach a certain scale it will all become a bit much. Diving through millions of lines in order to understand what is going wrong is pretty painful. To help you with this, I wrote some of my ideas around this issue that will make you want to look at your logs again.
If you have more that one application logging to a centralized location like Datadog, Loggly, Logstash or any other centralized solution, a decent strategy becomes a lot more important.
Make sure whatever you put in the log message has a meaning to someone reading it. Seeing errors like Something went wrong, please try again
may be something you want to show your users but it will not help you if you encounter it in your log stream.
Instead explain in short what for example did not succeed. If let’s say you are calling an user service to fetch a users information, your message could show something like this: Failed to fetch user information from user-service (uid:123)
.
If you have a case where you are logging an error and the case is known, it will not hurt to provide some sort of reference to some documentation explaining the case.
Having a good strategy around how and what to log is important. Make sure you understand how to use log levels and make sure all your applications respect these levels.
Although log levels are something that can be interpreted any way you like. A common error is to log in a level that is higher than needed. Most likely, for your specific application a specific error is not a good thing, but make sure it’s actionable before logging it as an ERROR.
Here are some of my log level interpretations/guidelines.
With the log levels in mind, you most likely want to run your applications in INFO mode most of the time. If your need to investigate something more deeply, enabling DEBUG for a short while should be fine.
With an increase in traffic comes an increase in logs. This however increases the cost, both in storage price and in reading time. The more logs you have, the harder is becomes to find what you are looking for.
When developing a specific part of your application, keep in mind that your log entries may not cleanly follow each other. If for example two user registrations take place at the same time, these logs will be mixed within each other. Because of this it’s useful to add some sort of context to the message. It may not be clear in all cases what context to provide, in some cases you don’t even have a meaningful identifier. This can be solved by adding a request id
and/or correlation id
to every set of logs.
A request id is a unique identifier that is used for this http request only, any log generated in this request, will carry this information.
A correlation id is unique to the full request and will be provided to other services so they can use it. This helps troubleshoot issues across different services.
Equally important is to make sure you are able to identify the source of your log message when it ends up in a centralized system. Adding the name of the service, host, container, region, … may all help in understanding why something weird happened. Note that most centralized logging systems may already take care of this for you.
comments powered by Disqus