The Five Levels of Logging

(aib42.net)

104 points | by detay 13 days ago

19 comments

  • js8 11 days ago

    I increasingly feel that there should be two separate logs: one for the user with the Error/Warning/Info information (what the user wanted done and how well it went), and the other for the developer with the Debug/Trace information (capturing key information about what the program was doing, especially at its interfaces - what files it was looking for, what database calls it made, etc.).

    I always get complaints from users that there is too much useless stuff in the logs that they have to wade through. I also think that the developer log should be structured differently, possibly so it can be easily read by some automation.

    • exabrial 11 days ago

      An "audit log" or "branch log", what business decisions were made, and a "debug log", what params drove said decision

      • franciscop 11 days ago

        This is a very common pattern in Node.js... look into the debug package; it's structured to work as a dev aid and can be enabled/disabled per part, so you only enable it for the bit you are debugging and unrelated info gets filtered out.

      • testplzignore 11 days ago

        Info: What the program is up to. Occasional messages that indicate it is alive and doing what is expected.

        Warn: Problems that you know about and wish would not happen, but do anyway. A service call has failed, unexpected input that violates the contract of the method, etc. You may be handling these conditions gracefully and/or returning an error to the caller, but still want to know when they happen. Usually you should have a ticket open in your bug tracker for these.

        Error: Problems that you do not know about. When an error is logged, a developer should look at it. Some kind of action should be taken immediately. Put the fire out if it is an emergency. If not an emergency, create a ticket in your bug tracker, then change the level to warn.

        I keep warn and info separate so that I can easily grep/filter on one. I often want to just see info messages to know what the status of things are.

        • jandrese 11 days ago

          IMHO the difference between Warn and Error is that Warn is recoverable while Error is not.

          • hinkley 11 days ago

            In an enterprise setting, treating one level of logs as actionable, as GP mentions, is very important for clarifying responsibilities.

            If it takes the right pair of eyes on the logs to see a problem for what it is, the whole process slows down and doesn't scale as high. All errors and very chatty warnings should induce me to ask someone what's going on, to make sure it hasn't slipped through the cracks. And they should thank me for pointing it out. Otherwise you reinforce some bad patterns.

            • paulddraper 9 days ago

              What does recoverable mean though?

              If the server process handles it without crashing, is that a recovery?

              A more general distinction IMO is the one presented in the article: an Error is something that should not have happened. A Warning is something that maybe should not have happened.

          • ath0 11 days ago

            I support these 5 levels of logging, but given the discussion below I also would like to highlight a lesson from an SRE talk I once attended[citation needed] that really there are only three levels of alert:

            1. DEBUG - Look at this when debugging; otherwise no one will ever see it.

            2. TICKET - Open a ticket and have someone on your team look at this when it gets to the top of your queue.

            3. PAGER - Drop what you're doing and handle this, now.

            Of course, you want to track INFO and ERROR type messages because a sufficient number of them might cause someone to raise a ticket... but at scale, you probably should've built that monitoring already, and just drop INFO/ERROR down to DEBUG.

            • pfranz 11 days ago

              It's nice to have articles like this to give structure to the conversation, but there's no single solution. Python has default levels of debug, info, warning, error, and critical, but doesn't really give any guidance on what each means. At one company we tried to create some structure, but 3rd party libraries would flood messages at the "wrong" logging level.

              I like these 5 levels, and the 3 levels you talk about seem like a pared-down version that reinforces what each level means.

              To illustrate the "no single solution" point: we tried to use logging for command-line output, and neither of those systems really has a level for that. We probably should have created a new level between INFO and WARNING because of chatty libraries.

              In practice I've found that DEBUG and TRACE are usually best added to code while troubleshooting. When you're writing the code you don't have a good idea what info you'd need when troubleshooting, so the logging ends up overly verbose but still missing what you'd want. And when trying to stash all logging, either locally or centrally (because adding logging and re-running isn't always an option), it can affect performance.
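
              (A minimal sketch of one way to quiet a chatty third-party library with Python's stdlib logging; the library name noisy_lib is hypothetical:)

                  import logging

                  # Our own messages stay enabled down to DEBUG...
                  logging.basicConfig(level=logging.DEBUG)

                  # ...but the hypothetical chatty dependency is capped at WARNING,
                  # so its INFO/DEBUG spam never reaches the handlers.
                  logging.getLogger("noisy_lib").setLevel(logging.WARNING)

                  logging.getLogger(__name__).debug("visible")
                  logging.getLogger("noisy_lib").info("suppressed")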

              • jschwartzi 11 days ago

                I think a good log system should give you the ability to store everything but filter quickly. So for example I'll set up my systems to store everything down to LOG_DEBUG because I might want that information. But when I'm actually looking at the logs I like being able to exclude the DEBUG messages and look at the general behavior, then drill down when I think I understand what the sequence of events is. Good log systems should support this without making me filter as I'm storing the data.

                Also, in UNIX systems I wish there were a way to generically pipe the output of a command to the system log while preserving its return code when the command exits. In BASH you can configure the pipe behavior to propagate errors to the calling shell, but in normal shells the return code of the last command in the pipe is the one that gets kept, which is not usually what you want. It would be really cool if the logger utility captured return codes from upstream and returned them, so that I could use it in the antiquated SystemV init scripts I have to maintain.
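
                (A rough sketch of the "store everything, filter when reading" setup from the first paragraph, using Python's stdlib logging with two handlers; the file name and logger name are made up:)

                    import logging

                    logger = logging.getLogger("app")
                    logger.setLevel(logging.DEBUG)  # keep everything

                    # Everything, including DEBUG, goes to a file for later drill-down.
                    everything = logging.FileHandler("app.log")
                    everything.setLevel(logging.DEBUG)

                    # The console only shows the general behavior.
                    console = logging.StreamHandler()
                    console.setLevel(logging.INFO)

                    logger.addHandler(everything)
                    logger.addHandler(console)

                    logger.debug("stored in the file but not shown on the console")
                    logger.info("stored and shown")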

              • mixmastamyk 10 days ago

                That may work for an application, but there are many errors that occur while using a library that are expected/handled etc by the application. Perhaps separate logs can help.

                • yjftsjthsd-h 11 days ago

                  There are also messages which should not trigger a page themselves but will provide useful context when responding to one.

              • exabrial 11 days ago

                Talking about logging got me a rejection at Google. The very kind 22-year-old interviewing me asked what my thoughts on commenting code were, and I said that most of the time it's used as an excuse to show off a piece of overly clever code, so normally I discourage it in favor of logging, since log statements are code comments you can use in production. I realize now that was my first graybeard moment.

                • paulddraper 9 days ago

                  Without diving too far down the comment rabbit hole (complex but efficient algorithms, external references, legal requirements, backwards compatibility), that's not true though.

                  A comment is attached to code. A log is attached to an event.

                  Conflating those will either give you very bad logs or very bad comments.

                • tzs 11 days ago

                  The article is a bit confusing: the title says 5, the article gives 6 (Error, Warning, Info, Debug, Trace, and Fatal), and then it casts doubt on whether either of those is right when it says this:

                  > I would like to talk about the five (or six, or four) that I find are the most crucial, the fundamental set to base your own levels (and your thinking) on.

                  Which are the 2 from the 6 given whose status as levels of logging is in doubt?

                  • Jarwain 11 days ago

                    He orders them in terms of relevance too. Note that the last two levels he states are Trace and Fatal. He talks about how Trace may not always be enabled, or how Fatal doesn't necessarily make sense if such an error needs to be handled manually anyway.

                  • Dowwie 11 days ago

                    Debug: If you can read this, you are standing too close to the program.

                    • xtracto 11 days ago

                      This article reminded me of a writing that an ex-colleague of mine did at a previous company: https://nerds.kueski.com/better-logging/

                      Every developer knows that they should include logging in their systems, but how and what to log is not so well documented.

                      When you start working on a system (i.e. at a startup) you put the odd log here and there, without too much attention. However, as your system(s) start growing, you have to start paying attention to your logs' structure and semantics, so that they are more useful in ELK, Sumologic, and similar log-searching engines.

                      • pfranz 11 days ago

                        What's nice about something like Elasticsearch is that if you categorize your logging levels, you can more easily drop logs of a certain level after varying amounts of time. Debug/verbose/trace is a lot less useful after, say, 24 hours and takes up a lot of space, while Fatal errors should be rare and you probably want to track them year-over-year or between releases.

                      • blaisio 11 days ago

                        I really believe only two levels are necessary most of the time: error, and info. Error is for anything that a dev should be notified about (AKA if you send an error log message, assume someone will be notified about it immediately), and info is for any other useful debugging information, like "we got a request for /foo/bar at 10 pm". I also think structured logging is absolutely essential. Log levels are really just a way to get some of the benefits of structured logging.
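
                        (A minimal sketch of what structured logging can look like with Python's stdlib logging and a JSON formatter; the field names and the "path" extra are just examples:)

                            import json
                            import logging

                            class JsonFormatter(logging.Formatter):
                                def format(self, record):
                                    # Emit the record as a JSON object instead of a formatted string.
                                    return json.dumps({
                                        "level": record.levelname,
                                        "logger": record.name,
                                        "message": record.getMessage(),
                                        "path": getattr(record, "path", None),  # custom field via extra=
                                    })

                            handler = logging.StreamHandler()
                            handler.setFormatter(JsonFormatter())
                            logger = logging.getLogger("web")
                            logger.addHandler(handler)
                            logger.setLevel(logging.INFO)

                            logger.info("request received", extra={"path": "/foo/bar"})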

                        • grkvlt 11 days ago

                          I was really hoping this might continue with 'six, if you work with IoT devices' or something like that, as in the Larry Miller[0] routine.[1] Never mind, I'll leave the links here anyway...

                          0. https://en.wikipedia.org/wiki/Larry_Miller_(comedian)

                          1. https://youtu.be/eceSHKqwUPY?t=18

                          • kripke 11 days ago

                            > Connecting to server:4242 as "user":"hunter2"

                            as an example of DEBUG logging is very wrong. Passwords should not end up in your logs, ever.

                          • jdmichal 11 days ago

                            I have always used them as follows. Deciding between DEBUG and INFO is the most difficult. INFO, WARN, and ERROR are simple decisions.

                            DEBUG: Information which is relevant for tracing activity and data through the program.

                            INFO: Important workflow decisions made within the normal operation of the program. Allows you to tell what path a particular operation flowed down, but not with tracing every activity.

                            WARN: Non-fatal errors which do not prevent the operation from completing. Example: Failing to close a file after having read from it.

                              ERROR: Fatal errors which prevent the operation from completing. Example: Failing to open the file or to read from it.
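
                              (To make the file example concrete, a small Python sketch along those lines; read_config and its path argument are hypothetical:)

                                  import logging

                                  logger = logging.getLogger(__name__)

                                  def read_config(path):
                                      try:
                                          f = open(path)  # failure here prevents the operation
                                      except OSError:
                                          logger.error("could not open %s", path, exc_info=True)
                                          raise
                                      try:
                                          return f.read()
                                      finally:
                                          try:
                                              f.close()
                                          except OSError:
                                              # The data was already read; the operation still completed.
                                              logger.warning("could not close %s", path, exc_info=True)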

                            • delinka 11 days ago

                              What about FATAL errors which prevent the entire program from continuing?

                              • jdmichal 11 days ago

                                Error and fatal are the same thing. If an operation fails and cannot be recovered, that's an error. There is no meaningful difference between an operation failing and the entire program failing. Or, from another perspective, keeping the program running is just another operation that can have fatal or non-fatal errors.

                                • KayEss 11 days ago

                                    That depends on the nature of the program. We use error and critical for server software. If there is a protocol violation on a particular connection, we log an error and drop the connection, but the program otherwise continues. If the `accept` system call fails, that's critical and the program shuts down. Depending on what we're doing there may be other critical operations, but some software doesn't have any.

                                  • derefr 9 days ago

                                    Picture an Erlang web-application server.

                                      A request that causes a 500 error response to the client should also produce an Error log entry on the backend. The request crashed. But, because this is an actor-based system, that crash didn't affect anything other than the request. It was Fatal to that interaction, but not to the system.

                                      A request that causes the entire server process to spontaneously halt should generate a Fatal log message. That is never supposed to happen. The system is built, on a number of levels, to ensure that it doesn't. But it'd be really great, if it does happen, to have enough information to know why.

                                    Ideally, a Fatal log message is accompanied by a crash dump.

                                  • ryancouto 11 days ago

                                      I think OP is implying that the error level and the fatal level should be regarded as the same

                                • dionian 11 days ago

                                  trace: used for visualizing flow of program, extremely noisy and never on prod

                                  debug: low level flow of program, inputs/outputs

                                  info: what I want to see at runtime in prod, high level flow of program, details about operation so I know what's going on if I want to look

                                  warn: things that may indicate errors or other unexpected situations to be aware of that may be relevant to problems elsewhere

                                  error: something serious or unexpected that probably indicates a defect... an issue that needs to be looked into, an unhandled/unexpected state

                                  • ohazi 11 days ago

                                    trace: printf debugging

                                  • sagichmal 11 days ago

                                    Counterpoint: https://dave.cheney.net/2015/11/05/lets-talk-about-logging

                                    tl;dr: Only Info and Debug levels carry their weight.

                                    • philwelch 11 days ago

                                      This works at a very, very small scale where you are reading your logs by hand. At this point, you don't even need log levels.

                                      In the real world, you're not going to read your logs unless you're trying to investigate an issue. Instead, you're going to monitor your logs. And this kind of monitoring is where WARN and ERROR become helpful.

                                      WARNs are useful because if you run into a WARN-level issue every now and again, it's no big deal, but if you run into a ton of them, that might indicate a problem that's worth fixing. A single WARN log isn't that useful, but a statistically significant number of WARN logs is a monitorable metric. Being able to detect and address issues after they're serious enough to cause elevated WARNs but before they're serious enough to cause customer impact is worthwhile.

                                        ERRORs are useful because then you have a one-size-fits-all way of monitoring for error conditions if, for example, it's an offline job that doesn't result in a 5xx response. You should still monitor for 5xx responses, pipeline job failures, non-zero exit codes from processes, or whatever your application-level "this didn't end well" signal is. But ERROR logging is also a good metric to have if you want to keep multiple log-based metrics all in one place.
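
                                        (One very rough way to get a log-based metric out of this with Python's stdlib logging: a handler that just counts records per level, which some other monitoring piece would scrape and alert on. A sketch, not a real monitoring setup:)

                                            import logging
                                            from collections import Counter

                                            class LevelCounter(logging.Handler):
                                                """Counts records per level so WARN/ERROR rates can be monitored."""
                                                def __init__(self):
                                                    super().__init__()
                                                    self.counts = Counter()

                                                def emit(self, record):
                                                    self.counts[record.levelname] += 1

                                            counter = LevelCounter()
                                            logging.getLogger().addHandler(counter)

                                            logging.getLogger("app").warning("retrying flaky dependency")
                                            logging.getLogger("app").error("job failed")
                                            print(counter.counts)  # Counter({'WARNING': 1, 'ERROR': 1})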

                                      • sagichmal 11 days ago

                                        If you're parsing your logs with a machine for triage like you describe, you should be doing structured logging, not string logging.

                                        • philwelch 11 days ago

                                          And the structure will include a log level to enable this kind of classification.

                                          • sagichmal 11 days ago

                                            Sure, but then we're not talking logger.Info: it's a difference in kind.

                                          • Rapzid 11 days ago

                                            We should all be doing structured logging always in 2018 IMHO. The future is now.

                                        • nicolaslem 11 days ago

                                          The author argues that the error level should not exist because, according to him, an error can either be handled or bubble up, and logging an error is in a way handling it. If an error is handled, it is not an error anymore.

                                          I think he is overthinking it a bit. Yes, there should be very few places where an error is actually logged, but that does not mean the level should not exist.

                                          In web applications you very often have one instance of:

                                              try:
                                                  return handle_request()
                                              except Exception:
                                                  logger.exception('Something bad happened')
                                                  return 500
                                          
                                          Sure you can argue that as the client received a 500 it is not an error anymore, but this doesn't help you notice and fix the problem.

                                          • sagichmal 11 days ago

                                            Any exception that bubbles all the way up to a request-scoped catch block indicates a bug in your handler logic. It should probably crash your program, or at least be reported as a structured event to your exception tracking system. In either case, logger.Info is sufficient to record this event in the application logs.

                                            • nicolaslem 11 days ago

                                              Why should exception tracking and logging be two separate things? They perform the same job; an exception tracking system is just another way to present log records.

                                              The real problem is that people still associate logs with lines of text in a file, whereas most logging frameworks create full objects containing a ton of information[0]. Once you stop considering logs as text, exception tracking and logging really become the same thing.

                                              [0] https://docs.python.org/3/library/logging.html#logrecord-att...
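
                                              (A small sketch of the "log records are full objects" point: a custom handler pulls the original exception off the LogRecord, much as an exception tracker would; the tracker itself is imaginary, so this just prints:)

                                                  import logging

                                                  class TrackerHandler(logging.Handler):
                                                      def emit(self, record):
                                                          # record.exc_info is (type, value, traceback) when
                                                          # logger.exception() / exc_info=True was used.
                                                          if record.exc_info:
                                                              exc = record.exc_info[1]
                                                              print("would send to tracker:", type(exc).__name__, exc)

                                                  logging.getLogger("web").addHandler(TrackerHandler())

                                                  try:
                                                      1 / 0
                                                  except ZeroDivisionError:
                                                      logging.getLogger("web").exception("Something bad happened")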

                                              • sagichmal 11 days ago

                                                Conceptually they are very similar, in the same way that metric data and log data are also very similar. But we don't have good technological solutions to support muxing all streams of observability data together and dealing with them efficiently, either in transport or in storage/query.

                                                Good program architecture in 2018 demands some degree of compromise. In my experience it's best to separate console logging (as in this article) from structured logging, from metrics, from distributed tracing, from exception tracking.

                                                • Rapzid 11 days ago

                                                  Exactly. As a subset of monitoring, it can probably be thought of as eventing; like all monitoring. What's a metric? A measurement event. Etc.

                                                  Perhaps a name change is in order; Event Tracing for Windows anyone? Eventing.. Event Tracing. IDK.

                                            • helper 11 days ago

                                              We use 3 log levels:

                                              - Debug: should not show up in normal logs but you can turn it on down to a specific log message in a running process without restarting. We use structured logging so we have the additional ability to turn on debug messages down to a specific key=value pair (or set of key=value pairs).

                                              - Info: An interesting event occurred that someone might care about. This could be an error that we were able to handle or just some important event.

                                              - Alert: An error case that should not happen has occurred and someone needs to look into it. In addition to logging the event, an external system should be notified (e.g. create a ticket, page someone, send an email, etc).
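
                                              (Not their system, but a rough stdlib-Python sketch of the "turn on debug only for records carrying a given key=value" idea from the Debug bullet above, assuming the keys are attached via extra=:)

                                                  import logging

                                                  class TargetedDebug(logging.Filter):
                                                      """Drop DEBUG records unless they carry the targeted key=value pairs."""
                                                      def __init__(self, **targets):
                                                          super().__init__()
                                                          self.targets = targets

                                                      def filter(self, record):
                                                          if record.levelno > logging.DEBUG:
                                                              return True
                                                          return all(getattr(record, k, None) == v for k, v in self.targets.items())

                                                  logger = logging.getLogger("orders")
                                                  logger.setLevel(logging.DEBUG)
                                                  handler = logging.StreamHandler()
                                                  handler.addFilter(TargetedDebug(customer_id=42))
                                                  logger.addHandler(handler)

                                                  logger.debug("kept", extra={"customer_id": 42})
                                                  logger.debug("dropped", extra={"customer_id": 7})
                                                  logger.info("info always passes")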

                                              • maxxxxx 11 days ago

                                                "- Debug: should not show up in normal logs but you can turn it on down to a specific log message in a running process without restarting. We use structured logging so we have the additional ability to turn on debug messages down to a specific key=value pair (or set of key=value pairs)."

                                                I always turn on the highest log level unless there is a very strong performance reason against doing so. I have been able to resolve a lot of issues over the years because I had good logging when something went wrong.

                                                I don't really see alerts as logging. They are an integral part of the application's behavior.

                                            • hota_mazi 11 days ago

                                              I've always preferred levels of logging rather than these arbitrary names (what's the difference between trace and debug, and which one is lower level?).

                                              Give me three levels of logging, 1, 2, and 3, and then warning and error. Maybe allow mixing and matching them, if that makes sense (the typical nomenclature makes them all mutually exclusive, which is dumb).

                                              There are your five levels.

                                              • delinka 11 days ago

                                                Depending on your Coding Culture, you order them based on "priority" (FATAL would have priority over all others, and therefore be numbered 1) or "importance" (FATAL would be most important, and therefore be numbered 5).

                                                Alternatively, each is a flag in a number so that they can be combined, with FATAL being 1 or 16 depending on priority vs. importance.

                                                I would argue that the names do indeed suffice to tell you their "level" - TRACE is the noisiest, least important, lowest-priority level. DEBUG is next, followed by INFO, WARN, ERROR, FATAL.
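
                                                (A toy sketch of the "levels as combinable flags" alternative, using Python's enum.IntFlag; the numbering follows the priority-style ordering, with FATAL = 1:)

                                                    from enum import IntFlag

                                                    class Level(IntFlag):
                                                        FATAL = 1
                                                        ERROR = 2
                                                        WARN = 4
                                                        INFO = 8
                                                        DEBUG = 16
                                                        TRACE = 32

                                                    # Flags can be mixed and matched instead of being mutually exclusive.
                                                    enabled = Level.FATAL | Level.ERROR | Level.INFO

                                                    def should_log(level):
                                                        return bool(level & enabled)

                                                    print(should_log(Level.ERROR))  # True
                                                    print(should_log(Level.DEBUG))  # False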

                                                • zoomablemind 11 days ago

                                                  Curiously, in the days of OpenVMS this was called the 'severity' level and was a global system concept [0] related to the status code of execution.

                                                  Info, Warning, and Error are degrees of Success or Fatal/Severe.

                                                  The status would then be logged in a structured manner showing a facility code, severity, and message.

                                                  0. https://h41379.www4.hpe.com/wizard/wiz_3582.html

                                              • dstick 11 days ago

                                                Thanks for this! Was just about to start implementing a logging subsystem myself so the timing is perfect :)

                                                • cwingrav 11 days ago

                                                  I'm going to weigh in on my opinionated library for logging: [woveon-logger](https://www.npmjs.com/package/woveon-logger) (js). Sorry if that's a no-no, but nobody seems to use my much loved library anyway.

                                                  The multiple levels are nice, especially for logging errors in production and tracking general information about your software. But in development, logging is often about tracking a particular aspect of your code. You need something that logs to a particular aspect you are working on, and that lets you turn off that aspect once you move on. Later, you often find yourself back debugging and needing to flip that aspect back on again.

                                                  Summary: Log levels of info, warn and error all have a place. Logging to aspects is valuable during development, easy to turn off for production and useful when returning to debug.

                                                  • hinkley 11 days ago

                                                    Logging is one of the weirder cross-cutting concerns, for many reasons but especially including the one you mention.

                                                    We had to roll back an upgrade last month because one nonfatal (but still pretty bad) problem was generating a ridiculous amount of log entries. We had no way to turn it off or throttle it, so we rolled back.

                                                    I have fantasized for years about a programming language with a small runtime that supports instrumentation of the code, not just for debugging and profiling, but for logging as well. Logging becomes a set of conditional breakpoints stored with or at least near the code and you can tweak them individually.

                                                    Something akin to applying the 80/20 rule to Aspect Oriented Programming.

                                                    We fake this by implementing logging frameworks that try to short-circuit out cheaply. But it really only pays off if you can marshal the arguments to the logger very cheaply, and you still have problems with third-party code, which can't know that you've got workarounds for problems it considers fatal, and is therefore so noisy at the WARN or INFO log levels that you can't see your own log entries (Apache Foundation, I'm looking at you).

                                                    • Rapzid 11 days ago

                                                      • hinkley 11 days ago

                                                        It's buried but my IDE has a similar feature.

                                                        I meant support for running systems, not local development.

                                                        • Rapzid 11 days ago

                                                          "Logpoints are especially useful for injecting logging while debugging production servers that cannot be paused or stopped."

                                                          There is a lot more going on with them than you might realize. Not sure how coupled it is to the nodejs/kubernetes/Azure stuff they have built into VSCode, but it seems like a working example of the future. Sorta like time travel debugging in node-chakracore.

                                                          Windows has a facility called ETW (Event Tracing for Windows) that lets you set up all sorts of eventing/tracing where the calls are not actually present in the code unless the ETW facility injects the code necessary to enable them into the running process. This allows you to dynamically enable and disable event sources based on stuff like whether or not something is currently subscribing to them. Worth looking into just for the sheer amazement of it if you are not a Windows developer familiar with it.

                                                    • Karrot_Kream 11 days ago

                                                      Wouldn't this be the same as naming loggers? Many logging frameworks allow you to name a logger and just reference that. From there, you can enable or disable named loggers and get the same "aspect" that you talk about.
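
                                                      (For example, with Python's stdlib logging the "aspect" can simply be a logger name; the names here are made up:)

                                                          import logging

                                                          # One logger per aspect/subsystem.
                                                          payments_log = logging.getLogger("app.payments")
                                                          cache_log = logging.getLogger("app.cache")

                                                          logging.basicConfig(level=logging.INFO)

                                                          # Flip a single aspect on while debugging it...
                                                          logging.getLogger("app.payments").setLevel(logging.DEBUG)
                                                          # ...and silence another one entirely.
                                                          logging.getLogger("app.cache").setLevel(logging.CRITICAL)

                                                          payments_log.debug("shown while working on payments")
                                                          cache_log.info("suppressed")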

                                                      • 11 days ago
                                                        [deleted]
                                                      • stephengillie 11 days ago

                                                        Netscaler logging deserves its own level of hell. You're given 15 undocumented logging levels, with the suggestion not to leave it running longer than necessary, and that level 11 should be enough for anybody.

                                                        • nwmcsween 11 days ago

                                                          The issue with logging levels is that they only convey basic information through the level itself, with the rest pushed into the actual message, which results in the mess that is log processing.

                                                          • yacse 11 days ago

                                                            Logging is very important because it allows you to debug and resolve issues easily on production systems. But caring about the appropriate log levels, and the ability to configure them at runtime, is also super important. It's just insane and ridiculous to log everything at trace/debug level and stream it to the Expensive Innovative Big Data Machine Learning Smart Log Analytics Product. Yes, I'm looking at you, Application Insights.