Introduction to Debugging with Logging Library
Have you ever encountered an exception in your source code that caused it to halt immediately? If you've been coding for a while, the answer to this question is almost certainly yes. What about those moments when the code execution behavior isn't what you expected? Of course, it's happened to you. As soon as your code exceeds a certain number of lines, it becomes harder to ensure that no errors will occur in the future. Even if you're that 10x programmer that all HR departments talk about, after some time of coding, your focus isn't as sharp as when you started; you're on your way to making a mistake. For each function that you write, you can always create some tests to ensure that the execution is correct. However, this is not a guarantee that there are no bugs. Quite the opposite, if you have good test coverage and there's a bug in your code, it will show you that. Currently, you know it's time to find a bug.
Let's talk about debugging
The debugging process might seem like an easy and low-effort job at first glance, but the more complex the algorithms you implement or the logic of your program in general, the more you might want to just give up and watch some TV series, or maybe even change your current job. Burnout is no joke, after all. One of the reasons that can cause it is that you can't do what seems like a simple task. And why is that? Because you have bugs. You don't have a single lead to the cause of that bug. Everything seems hopeless. Fear not! This little article will show you some techniques and standard practices for dealing with them. I'll demonstrate some of them with our old friend — Python.
What exactly is debugging?
Debugging is the process of identifying, analyzing, and removing errors. The list of these errors is not short. Syntax errors: you're trying something new, something interesting to explore and take advantage of. You went too far and boom, a syntax error. Are you new to programming? You forgot to close parentheses, missed a semicolon, or just a colon. Depending on the language, all of these can lead to a syntax error. Runtime errors: the code is doing its job, but the implementation of your code is not idiot-proof. You're one click away from a crash. Logical errors: Did you forget to check if your division works properly when dividing a number by 0? Or maybe you have an off-by-one error. All of these and many other errors (like human errors, etc.) may not force your program to crash, but instead, they will show something entirely wrong. It's easy to miss logical errors because they might never occur in the coming years.
Importance of debugging
Without debugging, it will lead to poor performance, system crashes, and simply wrong results. Debugging allows you to step through your code, identify the path that the bug took, and fix it. What else can debugging do for you? It's no surprise that when you fix your bugs, the quality of your code increases as well. And after facing the challenge of fixing a bug, you might acquire a better technique or two for dealing with such types of tasks. Maybe you even developed a better and more thoughtful approach to your function design.
Don't forget about the learning aspect of that. When you debug, you're forced to better understand the program and the language. So, debugging has positive side effects as well. While debugging one thing, you might notice and future-proof some other aspects of your program that you didn't think about before. A simple example is that you used a similar approach in another part of your code, but now you know that it leads to some errors, so you can fix it right away.
Optimization is one of the overlooked side effects of debugging. You can identify bottlenecks in the program or maybe some loose code that may not work under certain circumstances.
Let's step away from theory and take a look at Python's 'logging' library, which will assist us with its functionality and accompany us on our debugging journey.
The 'logging' library
The 'logging' library is a comprehensive tool that was proposed by a fellow developer and later added to the standard Python library. Here's the corresponding PEP for it: https://peps.python.org/pep-0282/. As the author described, “Basically, the system involves the user creating one or more logger objects on which methods are called to log debugging notes, general information, warnings, error messages, etc.”. This quickly became a useful tool, and now most libraries use the 'logging' library or something else that's also built upon this one. One of the cool side effects is that you can use debugging messages from the third-party package you're using, as well as enhance/manipulate these by writing just a few lines of code.
As stated in the previous message, it's a simple library to start working with. Hence, it's suitable for both beginners and enterprise-level teams. In short, this is an adaptable event-logging library. It has hierarchical logging and lots of configuration options, making it expandable to any reasonable level you want. It has respectable and well-known standard five logging levels. The library also has additional subclasses such as Loggers, Handlers, Filters, and Formatters. All of them will make your solution for recording all kinds of events as flexible and manageable as you want them to be.
As I mentioned before, because of the flexibility and well-designed structure of the library, your logging can easily collaborate with the logging events from any third-party library. But what does hierarchical logging do exactly? As the official docs stated: logged messages to the module-level logger get forwarded to handlers of loggers in higher-level modules, all the way up to the highest-level logger known as the root logger; this approach is known as hierarchical logging. So no logs will be left behind. Even if it seems confusing, and it has lots of configuration options, the basic config will suffice at the beginner level. So, there's no need for hand-picking your configuration to start working. Run it with the basic config and see the result.
Here's a simple and idiomatic usage, we will go into some details a bit later:
You run it and after the execution, the file 'myapp.log' will contain the following three lines:
The idiomatic portion is that by providing __name__ to the getLogger() function, it creates a module-level logger.
What is next?
Let's briefly discuss the upcoming sections and what this article is all about (besides the obvious fact that it's about debugging and the 'logging' library). For starters, we have already discussed the concept of debugging and its importance in software development. We'll look at some best debugging practices, but before that, we will take a quick detour of some challenges that you might face during the debugging process. Next, can you answer if writing cleaner and more readable code is a debugging technique? We will answer that. What tools do we have at our disposal? We'll also discuss how to use these tools effectively and the correct approach for debugging.
We'll go a bit deeper with the 'logging' library, exploring its loggers, handlers, filters, and formatters. Furthermore, we'll also take a look at some other examples of how you can utilize the library. By playing with different severity levels, we'll learn how to write correct and suitable events in the place that you want. We'll touch on some advanced topics, such as hierarchical logging, configuration, and thread safety, and briefly touch on integration with the warning module. All of these will assist you and me in creating a more robust and flexible logging system.
By the end, you should have a good understanding of debugging, how to use Python's logging library, and how to implement effective debugging practices. As no article is complete and exhaustive, I'll provide some further resources from which you can continue your journey. Don't be afraid if you're a beginner; everything here is explained in the simplest way possible. So, suit up and get ready to expand your debugging skills and broaden your knowledge.
Challenges in Debugging
Debugging, in general, is a complex and challenging process. Let's consider an expression: a / b. Suppose we implemented this as our division function for a calculator. You run it, try some numbers, and everything goes smoothly. You're happy. But then, you lend your calculator to a friend, and they report that it crashes when they input certain numbers, seemingly at random. Now, you're faced with the task of finding this error.
The first challenge is reproducing the error, especially when the information comes from other people. Fortunately, you've written a random generator that tests multiple numbers, and it helps you find the ZeroDivisionError. But what if your system is large and complex, with 101 modules and 37 million lines of code? You might start by isolating the problem. Are you using a Version Control System (VCS)? Have you heard about git bisect? Give it a try! However, at some point in debugging, you'll always resort to a logging system.
After some time, you fix the error, but it still occurs intermittently, under different conditions. Some might say these are purely random errors. These are known as non-deterministic errors. Debugging them is another challenging task. You can try normal testing, regression testing, and even consider hardware issues. Yet, the most straightforward strategy is to log everything you can at all levels. But let's save these horror stories for another time.
Understanding the code can also be a challenge. Poorly documented code with bad practices can replace your debugging time with time spent understanding how the code works. This answers our previous question: Is clear code a debugging technique? Yes. As counterintuitive as it may sound, by writing clean and well-documented code, you help everyone identify whether the error might occur here or somewhere else. You never know how the code will be extended in the future and if it's ready for the upcoming challenges. And guess what, documentation is a type of logging! So, you just can't do anything without logging, huh? Want to learn more about how to comment on Python code? Take a look at this article.
Side effects can also pose a problem. Some bugs may cause side effects that make you think something is causing a difficulty when it isn't. That's why you need to keep an eye on everything else at the moment of code execution. How? Logging!
Performance issues can be challenging to spot with regular and intrusive debugging. Your program may be in a 'freeze' state, or something might be running in the background that you can't notice. Let's skip parallelization, manipulation of I/O processes, and optimization of your algorithms. What do you need to do in such cases? Monitoring, profiling, benchmarking. All of these are special types of logging!
Debugging multithreaded and asynchronous code presents unique challenges of its own. Without a strong foundation in understanding concurrency concepts such as race conditions, deadlocks, and thread synchronization, you can't even start debugging it. However, it's always a good idea to look at the stack traces and thread dumps. Can you answer what it is? It's logging!
We've just scratched the surface of the challenges, and they already sound exhausting! Let's explore one more challenge that you might face at any moment — limited logging information. The information you have might not be enough, or it might not cover some specific areas at all. The answer is simple. More logging!
Let's explore some top practices for debugging.
As a coder, you have many tools at your disposal. If you're using Python, you might be working with an IDE like PyCharm. It's well-known that PyCharm comes with robust built-in debugging tools. You can set a conditional breakpoint at a certain line for the debugger execution to pause. You can monitor multiple variables in a watch panel, and it even shows the current values of variables in the code! Please note the simple code highlighting that can prevent bugs because it highlights the structure and makes it easy to navigate. All of this is conveniently inside of IDE.
If you're more of an old-school coder, there's another tool that might not be as commonly used but would be a great addition for you — the pdb debug library. It's an interactive built-in debugger in Python. However, it does have a steep learning curve, so prepare yourself before trying it.
Next, we have static analysis tools. These handy tools can pinpoint compilation errors and even some runtime errors. They help you follow best practices and suggest some good Python idioms. Some of these tools have a vulnerability database and can alert you to security issues right at the coding stage.
Do not forget about profilers. These are used to create performance records. If you're looking to find a bottleneck or just eager to know the fastest and slowest functions in your code, Python profilers will come to the rescue.
And of course — logging! The logging library is your best friend for tracking the execution of the code. It's there, it's free, and it's a necessary tool at every stage of software development. But, is it better than simple print statements? Aren't they also a logging technique? Let's find out in the next section.
print() — logging comparison.
Let's talk about simple printing as a comparison to logging. We all know the advantages of printing. It's fast, it's simple. You can write whatever message you want. In one second, the job is done. Occasionally, that's all you need. There's no need to set up an entire logging system. A simple print statement can perform a basic sanity check. A message like “Am I even reaching this point?” can provide immediate insight, allowing you to narrow down the debugging area. However, with this simplicity comes a significant disadvantage: the lack of context. Where did this message come from? Is it a sanity check, just a normal flow of your program, or something that requires more attention? All these questions beg for an additional context, like 'INFO', 'WARNING', or maybe a timestamp. At the end of the day, you might just end up with your logging system. So, why not embrace the built-in library and finally fix all those bugs?
As it's already clear from the introduction, the logging system is built on the shoulders of previous coders. They all went through tougher times of debugging code and didn't have as many tools as we do now. So, rest assured that all your basic debugging needs are already covered. Different types of log messages? Done. Timestamps and other diagnostic information? Done. A simple way to save to a specific file instead of spamming the debug console? Done.
In conclusion, even though a print statement is good for a simple sanity check and some quick-and-dirty debugging in small programs, logging provides plenty of powerful tools for larger applications. As you might have guessed, using a logger is considered to be the best practice.
Alright, I believe it's time to discuss logging levels, what they represent, and how we apply them. Let's not reinvent the wheel and instead expand our knowledge from the official Python documentation.
Logging levels
Python has five standard levels and one additional level (NOTSET).
- NOTSET — When this is set, either all information is logged or the level is inherited from the ancestor. If you set it to a handler, all messages are handled.
- DEBUG — This level includes all details that can be gathered from other logging levels. As the name suggests, it's useful for debugging, diagnostics, and troubleshooting. Do you want a sanity check with confirmation that everything is running correctly? Set it to the debug level. Is something wrong? Set it to the debug level!
- INFO — Set this for casual events. Something happened, something stopped happening, something is connected to something else. Something switched from one state to another. Was the connection successful? This level is used for tracking a natural application flow. Don't include too much information in these logs.
- WARNING — Something unexpected happened. This could lead to some problems in the future. It may not require immediate attention because all systems are working just fine, but the information is still valuable for future investigation.
- ERROR — You guessed it, something isn't working as intended. This needs immediate attention. It could be a busy server, or your payment system isn't accepting payments (arguably a critical error in business, but not in software).
- CRITICAL -- This is the highest level of severity. It might be the last logging message you see because everything has stopped working. All major types of error are reported under this level.
As you can see, the simple structure for debugging messages allows you to easily control and navigate logs. Are you interested in warnings and above? Just change the setting to WARNING and all WARNING messages, including ERROR and CRITICAL, will be displayed (goes hierarchically from smaller number to higher). Want to see everything? Set it to the DEBUG level. This provides a flexible way of handling event messages.
After familiarizing ourselves with the logging levels, it's time to consider what kind of information we should include to make our logs as effective and informative as possible.
Tuning up our logger
To start, the severity level is a fundamental building block. It's impossible to ignore, as these are designed for specific situations and aid in filtering the events for your application. Fortunately, we've already had a brief introduction to these levels, so we can apply them confidently.
Another crucial component, which is included by default and requires no additional thought, is the module name. But what other information is relevant for debugging and plays a vital role? Undoubtedly, it's timestamps. The date and time of the event's occurrence can help you sort events from newest to oldest, or focus on a specific period.
Every millisecond matters, so it's recommended to include them. For example, a timestamp might look like this: 2024-01-01 00:00:00.237
.
Next, consider the log message itself. It should be clear, concise, and descriptive, providing enough information to understand the event without being overly verbose. Remember that content matters too. Don't hesitate to add information such as the IP address, User ID, User Group, or anything else that you think will help identify and fix bugs.
Do you think the precise location of the error matters? Include it! The relevance of certain information may depend on the situation, but understanding the current circumstances can guide you in deciding what information to provide. For instance, if an exception occurs, and you would rather not throw it and halt the program, it's logical to delegate it to the logger method. If you're inside an authentication module, provide some current information about the database connectivity and ongoing transactions.
Remember, don't overcomplicate your messages or make them too verbose. Be straight to the point!
To simplify things, the logging module provides another class — Formatter. You can choose a format that suits your needs. For example, '%(asctime)s - %(levelname)s - %(message)s' will log the time in a human-readable format, the severity of the message, and the contents of the message. This way, you can easily track and manage your application's events.
Let's look at the simple example:
This process creates a simple logger and adds two handlers to it: one for the debug console and another for writing to a file. We set the level for both handlers, add some formatters to them (including date format), and then add these handlers to the logger. Finally, we log our messages. This setup ensures that our logs are properly formatted and directed to the appropriate output channels.
I previously mentioned the concept of catching exceptions with our logger. Fortunately, we have a special method for this purpose: logger.exception. This method is highly recommended for handling exceptions, and its severity level is set to ERROR.
Let's first examine an example without logging:
This approach is straightforward, but if we have a logger that records our messages to a file, this exception won't be recorded.
Now, let's consider an example of logging:
In this case, instead of using the print function, we utilize the logger.exception function. At first glance, it might seem similar, but there's a significant difference: in addition to logging the “Exception: division by zero” message, we also log the entire exception traceback. Isn't that exciting?
Remember, when dealing with exceptions, we must decide whether to suppress them with custom try/except handling or allow our program to crash. In this instance, we appropriately catch our exception and use the handy logger.exception method, which allows us to record the traceback. This traceback is crucial for identifying and fixing bugs. In this case, the log message is as clear as a bright day's sky:
This detailed log message tells us about the exception and provides the exact location of the error in our code, making it easier to debug and fix. Remember, effective logging is an essential part of developing robust and maintainable software tool.
Alright, at this point, we've already learned a thing or two about debugging. We've covered some techniques, discussed the challenges along the way, and even explored some tools that can be used. We've introduced the concept of a logging system and even touched upon some examples. Now, it's time to dig a bit deeper into the library itself and discuss its different components.
logging components
So, for starters, when you write import logging, you're importing the logging module! To create a simple logger, you just need to write logger = logging.getLogger(__name__).
But what exactly is this logger object? The logger object is the foundation of everything. It's responsible for creating log records. Its wide range of functionalities makes it robust, easy to configure, and ultimately, a powerful tool to debug your code. Once you have a logger in your code, you can go ahead and set up the logging level (which we've talked about before). The logger follows a hierarchical naming structure (which we've also discussed). This structure allows you to organize your loggers in a meaningful way. The logger also has a name (known as a channel name) that is a dot-separated hierarchical name, such as 'a', 'a.b', 'a.b.c', or similar. As you already know, it's also possible to further configure the logger by setting a file handler, a logging level, a formatter, or creating a filter.
Logging Levels
We've already talked about this, but it's worth mentioning that these levels are nothing more than integer values. For example, 50 corresponds to the CRITICAL level, and 0 to NOTSET. Set the smallest to catch all events, and set the biggest to catch the most crucial ones.
Handler Objects
These are responsible for the final destination of our event records.
- StreamHandler - standard output. The console or any file-like object.
- FileHandler - similar to a StreamHandler, sends event logs to a file, for example, the file 'app.log'.
- SocketHandler/DatagramHandler - sends your messages via a TCP/UDP socket.
- SMTPHandler - sends your log messages straight to your boss via email.
- MemoryHandler - buffers your log messages until some condition is met and then sends them all.
- QueueHandler - sends log messages to a queue, allowing for multithreaded logging.
Not enough? You can always create a custom handler by subclassing the Handler class and implementing the emit method.
Quick quiz, what do you think the NullHandler does? How about HTTPHandler?
Formatter Objects
These are responsible for converting a LogRecord to an output string to be interpreted by a human or an external system.
Filter Objects
Filters are used by Handlers and Loggers for more sophisticated filtering than is provided by levels.
Here is an example of a filter that will log only WARNING messages:
LogRecord Objects
LogRecord instances are created automatically by the Logger every time something is logged. They can also be created manually via makeLogRecord().
LoggerAdapter Objects
LoggerAdapter instances are used to conveniently pass contextual information into logging calls.
Thread safety
One important question may arise: Is logging a thread-safe package? The answer is yes. You can handle calls from multiple threads without corrupting messages and avoiding such tricky things as race conditions. Do you remember we talked about this at the very beginning? Well, logging wins again. However, keep in mind that with the increased complexity of the application, you should still be aware of the limitations that some situations may cause.
Integration with the warnings module
Python has another useful module named warnings. The logging system has got you covered here as well. Just set logging.captureWarnings(True), and all warning messages will be redirected to your logging system.
More examples
In the end, let's look at a couple more examples of the logging library. For the first example, we will examine pure Python code that implements the Collatz conjecture. We can create an info record that shows the current step of the Collatz sequence. Here's how:
In the end, you have records similar to these:
Next, let's consider an example with a Flask application:
As you can see, we can now log our messages when someone accesses our web pages. This is the log:
If you're already familiar with Flask, you know that it already has its own log messages. But thanks to the magic of the logging module and its excellent integration with the Flask library, we can enjoy both logging messages from the third-party library and our own. This is precisely what I was talking about at the very beginning.
At this point, I think you've got the hang of it. It's time to wrap up.
Wrap-up
In this article, we learned about debugging, explored best practices, and opened ourselves to the challenges we might face in our programming careers while debugging the code of our colleagues. We also introduced a logging library that is a great companion and the best replacement for the print function. We dug a little deeper into our knowledge of the logging library. In the end, we looked at and embraced two fully working and completely legitimate examples where we applied our logging library knowledge.
If you still feel lost, it's a good idea to read the official documentation. There are some more things you can do, such as creating a completely flexible custom configuration for your debugging system that you can save in a file and load whenever you need it. Are you up for a challenge to do it yourself? Then let's go. This is homework for you. Good luck!
like this