Bug Busters
Unleash the code detective in you with Python unit testing
Happy Monday! It’s been quiet here at Data Bytes due to back-to-back travel, hackathons, big release cycles at work, and life in general. We’re back, and it’s time to make your week awesome! Today’s Data Bytes calories: 1,809 words … 9 minutes.
🚨Join the Google Developers Group Atlanta for our first Brews & Bytes happy hour and code jam! Our first session is a Google I/O Extended edition where we’ll create our own AI-generated text application with PaLM API and Flutter - Wednesday, June 28 from 6:30 - 8:30 PM ET at Red’s Beer Garden - RSVP HERE!
🔨What I’m working on - I collaborated with Dr. Claudia Igbrude during COVID lockdowns on AI ethics in medical applications and we got our paper published in AI and Ethics! Check it out HERE.
One Big Thing: Python Unit Testing Primer
Why did the Python programmer get a broom?
…
To sweep away all the bugs and keep their code clean!
Python Unit Testing with Unittest and Coverage.py
This guide aims to show you how to create and run unit tests for your Python code using the built-in `unittest` module, and how to measure code coverage with `coverage.py`.
This is intended for data scientists, data engineers, and analysts with little code testing experience.
1. Introduction to Unit Testing
In programming, unit testing is a method where individual units of source code (usually functions or methods) are tested to determine whether they behave as expected. Python’s unittest module is part of the standard library and provides a rich set of tools for constructing and running tests.
2. Setting Up Your Environment
Before we start, ensure that you have Python installed on your computer. You can verify this by running the following command in your terminal or command prompt:
python --version
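This command should print the version of Python installed on your machine. On many macOS and Linux systems, Python 3 is invoked as python3, so if python isn’t found, try python3 --version. If you haven’t installed Python, you can download it from the official Python website.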
3. Project Structure for Testing
To make testing easier, keep your source code and your tests in separate packages. Here is a simple project structure that works well for most cases:
/myproject
    /myproject
        __init__.py
        calculator.py
    /tests
        __init__.py
        test_calculator.py
    setup.py
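This structure separates the source code from the tests, making it easier to manage both. Note that both directories contain an __init__.py file. This file marks a directory as a regular Python package, which keeps imports like from myproject.calculator import add unambiguous; unittest’s automatic test discovery also only searches subdirectories that contain one.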
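4. Creating a Simple Function to Test
First we need something to test: a module, myproject/calculator.py, with an add function that takes two numbers and returns their sum.
5. Writing Your First Test Case
Now let’s create our first test case. We’ll create a new Python file named test_calculator.py in the tests directory. In this file, we’ll import the unittest module and the add function we want to test from calculator.py.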
# tests/test_calculator.py
import unittest

from myproject.calculator import add


class TestCalculator(unittest.TestCase):
    def test_add(self):
        result = add(10, 5)
        self.assertEqual(result, 15)


if __name__ == '__main__':
    unittest.main()
This code defines a test case class that inherits from unittest.TestCase. Within it, each individual test is a method whose name starts with test; this naming convention tells the test runner which methods represent tests. In our test_add method, we call the add function with the arguments 10 and 5 and check whether the result equals 15, our expected output. The self.assertEqual() method is a test assertion that compares the result with the expected output: if they are equal, the test passes; if not, the test fails.
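The test imports add from myproject/calculator.py. A minimal version of that module, assuming add simply returns the sum of its two arguments (consistent with the test expecting add(10, 5) to be 15), could look like this:

```python
# myproject/calculator.py
# Minimal sketch: the guide only requires that add(10, 5) == 15.
def add(a, b):
    # Return the sum of the two arguments.
    return a + b
```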
6. Running the Test
To run the test, navigate to the root of your project in your terminal or command prompt (where setup.py is located) and run the following command:
python -m unittest tests.test_calculator
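If you work in a Jupyter notebook, calling unittest.main() there usually errors out, because by default it reads the kernel’s command-line arguments and calls sys.exit() when it finishes. Here is a sketch of running the same test from Python using unittest’s TestLoader and TextTestRunner. The inline add is a stand-in for myproject.calculator.add so the example runs on its own:

```python
import unittest


def add(a, b):
    # Stand-in for myproject.calculator.add so this example is self-contained.
    return a + b


class TestCalculator(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(10, 5), 15)


# Collect the test methods from the TestCase class into a suite, then run it
# with the same text runner the command line uses; verbosity=2 prints one
# line per test instead of a single dot.
suite = unittest.TestLoader().loadTestsFromTestCase(TestCalculator)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All passed:", result.wasSuccessful())
```

The returned TestResult object is handy in notebooks and scripts: wasSuccessful(), testsRun, failures, and errors let you act on the outcome instead of reading the console output.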
7. Understanding the Results
The output of your test run will look something like this:
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
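Each character on the first line stands for one test: the dot means a single test has passed. A failing assertion is shown as F, and a test that raised an unexpected exception is shown as E. “Ran 1 test” reports how many tests ran and how long they took, and OK means all of them passed.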
Measuring Code Coverage with Coverage.py
Code coverage is a measurement of how many lines or blocks of your code are executed while your automated tests run. A code coverage tool shows how thoroughly your tests exercise your codebase and, just as importantly, which parts they never touch. coverage.py is a widely used third-party tool for measuring code coverage of Python programs.
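To make “lines executed” concrete, here is a toy sketch of the idea built on the standard library’s sys.settrace hook. It is not how you use coverage.py (the commands in the next sections do that for you), and classify and executed_lines are hypothetical functions written for illustration:

```python
import sys


def classify(n):
    if n < 0:                 # body line 1
        return "negative"     # body line 2
    return "non-negative"     # body line 3


def executed_lines(func, *args):
    """Call func(*args) and return the line offsets (relative to the def line)
    that ran inside func. A toy illustration, not a real coverage tool."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        # A "line" event fires just before a new source line runs in a traced frame.
        if event == "line" and frame.f_code is code:
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving line events inside newly called frames

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return hits


body_lines = {1, 2, 3}
ran = executed_lines(classify, 5)  # a "test" that only tries a positive number
missing = sorted(body_lines - ran)
coverage_pct = 100 * len(body_lines & ran) / len(body_lines)
print(f"ran {sorted(ran)}, missing {missing}, {coverage_pct:.0f}% covered")
```

Calling executed_lines(classify, -1) as well would cover body line 2. Pointing you to exactly that kind of gap is what the “Missing” column of a coverage report is for.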
8. Setting Up Coverage.py
First, install coverage.py with pip, Python’s package installer, by running the following command in your terminal or command prompt:
pip install coverage
9. Running Coverage with Your Unit Test
Once coverage.py is installed, you can use it to run your unit tests and collect coverage data:
coverage run -m unittest tests.test_calculator
This command runs your tests and collects coverage data during the test run.
10. Understanding the Coverage Report
After you’ve run your tests with coverage, you can then generate a report with coverage.py.
coverage report -m
This command will output a report in your terminal that might look something like this:
Name                  Stmts   Miss  Cover   Missing
--------------------------------------------------
calculator.py             2      0   100%
test_calculator.py        6      0   100%
--------------------------------------------------
TOTAL                     8      0   100%
The “Stmts” column shows the number of executable statements in each file. “Miss” shows how many of those statements were not executed by your tests, and “Cover” shows the percentage that were. The “Missing” column, added by the -m flag, lists the line numbers that were never executed; since coverage here is 100%, it’s empty.
11. Best Practices for Code Coverage
Aim for high coverage, but not necessarily 100%
• While it may be tempting to aim for 100% code coverage, this shouldn’t necessarily be your goal. Testing every single line of code can be time-consuming and may not add much value if those lines are unlikely to contain bugs. Instead, aim for a high level of coverage that allows you to confidently catch and prevent bugs.
Focus on testing behavior, not lines of code
• It’s more important to test the behavior of your code than it is to test every single line. When writing tests, think about what behavior your code is supposed to exhibit and write tests that check that behavior.
Beware of false security
• High code coverage can give you confidence in your code, but it doesn’t guarantee your code is bug-free. Your tests might cover all lines of code but miss bugs because they’re not checking the right things. It’s crucial to ensure your tests are effective and check the correct behavior of your code.
Combine coverage with other testing techniques
• Code coverage is a powerful tool, but it should be just one part of a comprehensive testing strategy. Other testing techniques, like integration testing, system testing, and manual testing, are also important and can catch bugs that unit tests with high coverage might miss.
Remember, the goal of testing is not to achieve high coverage, but to reduce bugs and ensure your code works as expected. While coverage can be a useful metric, it’s not the only thing that matters.
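The “false security” point is easy to demonstrate. In this sketch, average is a hypothetical one-line function. A test that merely calls it gives it 100% line coverage, while the behavior-focused tests check the actual result and pin down what happens with an empty list:

```python
import unittest


def average(values):
    # Hypothetical function: a single line, so any call yields 100% line coverage.
    return sum(values) / len(values)


class CoverageOnly(unittest.TestCase):
    def test_runs(self):
        # Executes every line of average() but asserts nothing, so it would
        # still pass if the formula were wrong.
        average([1, 2, 3])


class BehaviorTests(unittest.TestCase):
    def test_mean_of_values(self):
        self.assertEqual(average([1, 2, 3]), 2)

    def test_empty_list_raises(self):
        # An edge case that line coverage never flags: len([]) is 0.
        with self.assertRaises(ZeroDivisionError):
            average([])


loader = unittest.TestLoader()
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(CoverageOnly),
    loader.loadTestsFromTestCase(BehaviorTests),
])
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Both test classes produce the same coverage number, but only BehaviorTests would catch a broken formula or document how empty input is handled.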
12. Why and How to Use setup.py
setup.py is a file used by Python’s setuptools library to define a package’s metadata and contents.
In our case, setup.py declares the project’s name and the packages it contains. It also handles installation and distribution: running pip install -e . from the project root installs myproject in editable mode, so it can be imported from anywhere in that environment, including from your tests.
Here’s a basic setup.py file for our myproject package:
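from setuptools import setup, find_packages

# Exclude the tests package so it isn't installed as a top-level "tests" package
setup(name="myproject", packages=find_packages(exclude=["tests", "tests.*"]))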
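You don’t strictly need to install the package to run the tests in this guide. When you run python -m unittest tests.test_calculator from the project root (the directory containing setup.py), Python adds the current directory to sys.path, so both myproject and tests can be imported as packages. Running python tests/test_calculator.py directly instead puts the tests/ directory on sys.path, and the import of myproject fails unless the package has been installed.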
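You can check the difference between the two ways of running the test yourself. The sketch below uses only the standard library: it writes a throwaway copy of this guide’s layout to a temporary directory (the file contents, including the one-line add, are assumptions), then runs the test both ways with the current Python interpreter and compares the exit codes. It doesn’t write a setup.py or install anything.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed file contents mirroring the guide's layout.
FILES = {
    "myproject/__init__.py": "",
    "myproject/calculator.py": "def add(a, b):\n    return a + b\n",
    "tests/__init__.py": "",
    "tests/test_calculator.py": (
        "import unittest\n"
        "from myproject.calculator import add\n"
        "\n"
        "class TestCalculator(unittest.TestCase):\n"
        "    def test_add(self):\n"
        "        self.assertEqual(add(10, 5), 15)\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    unittest.main()\n"
    ),
}


def build_project(root: Path) -> None:
    """Write every file in FILES under root, creating directories as needed."""
    for relative, text in FILES.items():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


def run(args: list, cwd: Path) -> subprocess.CompletedProcess:
    """Run the current interpreter with args from cwd, capturing its output."""
    return subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_project(root)
    as_module = run(["-m", "unittest", "tests.test_calculator"], root)
    as_script = run(["tests/test_calculator.py"], root)

print("python -m unittest tests.test_calculator ->", as_module.returncode)
print("python tests/test_calculator.py ->", as_script.returncode)
print("ModuleNotFoundError when run as a script:",
      "ModuleNotFoundError" in as_script.stderr)
```

An exit code of 0 means the tests passed; the direct script run exits with an error because myproject isn’t on its import path.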
13. Mocking and Patches
In unit testing, we often need to isolate the function or method we’re testing from the things it depends on, such as databases, web APIs, or files. This is where mocking comes in: a mock replaces one of those parts with a stand-in whose behavior you control, so the function can be tested in isolation and in a predictable environment.
Python’s unittest.mock module is a powerful tool for creating mock objects and defining their behavior. The patch() function is often used to replace an object in the module under test with a mock.
Here’s an example where we create a mock for our add function:
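import unittest
from unittest.mock import patch

# Import the module (not the function) so the patched attribute is looked up at call time
from myproject import calculator


class TestCalculator(unittest.TestCase):
    @patch('myproject.calculator.add')
    def test_add(self, add_mock):
        add_mock.return_value = 15
        result = calculator.add(10, 5)  # resolves to the mock while the patch is active
        self.assertEqual(result, 15)
        add_mock.assert_called_with(10, 5)


if __name__ == '__main__':
    unittest.main()
In this example, @patch('myproject.calculator.add') replaces the add attribute of the myproject.calculator module with a mock object for the duration of the test and passes that mock to the test method as add_mock.
We can then define the behavior of the mock with add_mock.return_value = 15 and check that it was called with the right arguments with add_mock.assert_called_with(10, 5). Note that patch() replaces a name where it is looked up. That is why the test calls calculator.add(10, 5) through the module: a name bound earlier with from myproject.calculator import add would still point to the real function, the mock would never be called, and assert_called_with would fail.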
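Remember, mocks should be used sparingly and only when necessary, as they can make tests more complex and harder to understand. Also, because the example above mocks add itself, it demonstrates the mechanics of patch() but no longer tests the real add function.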
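In practice, patch() is most useful for replacing a slow or unpredictable dependency of the code you’re testing. In this sketch, RateService and total_with_tax are hypothetical: the service stands in for something like a remote tax-rate API, and patch.object() swaps its method for a mock so the test exercises total_with_tax’s real arithmetic without a network call.

```python
import unittest
from unittest.mock import patch


class RateService:
    """Hypothetical dependency, e.g. a client for a remote tax-rate API."""

    def fetch_tax_rate(self, region):
        raise RuntimeError("would make a network call")


def total_with_tax(price, region, service):
    # Code under test: asks the dependency for a rate, then does its own math.
    rate = service.fetch_tax_rate(region)
    return round(price * (1 + rate), 2)


class TestTotalWithTax(unittest.TestCase):
    # Replace RateService.fetch_tax_rate with a MagicMock returning 0.25
    # for the duration of this test; the mock is passed in as fetch_mock.
    @patch.object(RateService, "fetch_tax_rate", return_value=0.25)
    def test_applies_rate(self, fetch_mock):
        total = total_with_tax(100, "GA", RateService())
        self.assertEqual(total, 125.0)            # real logic, fake dependency
        fetch_mock.assert_called_once_with("GA")  # dependency queried exactly once


suite = unittest.TestLoader().loadTestsFromTestCase(TestTotalWithTax)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Because the class attribute is replaced by a plain mock rather than a function, it receives only the region argument, not self, which is why assert_called_once_with("GA") matches.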
That’s it! You now have a basic understanding of how to set up and run unit tests with Python’s unittest module, how to measure code coverage with coverage.py, how to structure your Python project for easier testing, and how to use mocks and patches. Happy testing!
Helpful Resources
unittest - Python standard unit testing framework
Getting Started with Testing in Python - primer from Real Python
🍬 Sweet & Sour Candy (this week’s good, bad, or weird of the tech world)
🤢 Generative AI data leaks are a serious problem, experts say - Generative AI tools like OpenAI’s ChatGPT are causing data leaks, with users unknowingly uploading sensitive data while querying them for answers. Samsung experienced several incidents involving employees pasting lines of proprietary code into ChatGPT, leading to a ban on the use of the AI chatbot. Despite warnings, two staffers reportedly uploaded segments of proprietary code, while a third uploaded a recording of a meeting via a personal assistant app. Cyberhaven data shows that the firm detected 6,352 attempts to paste corporate data into ChatGPT for every 100,000 employees of its customers. Organizations need to be clear on what they allow employees to do and integrate AI models into their business process with multiple layers of security and new types of policies, procedures, and audits.
😀 AI allows paralyzed person to ‘handwrite’ with his mind | Science | AAAS - Researchers have developed a new technique to allow paralyzed patients to communicate more quickly by imagining moving their arm to write each letter of the alphabet, which is then interpreted by a neural network to create letters. The computer could read out the volunteer's imagined sentences with roughly 95% accuracy at a speed of about 66 characters per minute, and the researchers expect the speed to increase with more practice.
🍫 One last bite
If you can't explain it simply you don't understand it well enough. ~Albert Einstein


