Python code quality with Ruff, one step at a time - Part 3

You can find the first part of this post here: https://cyber.airbus.com/en/newsroom/stories/2025-10-python-code-quality-with-ruff-one-step-at-a-time-part-1
And the second part here: https://cyber.airbus.com/en/newsroom/stories/2025-10-python-code-quality-with-ruff-one-step-at-a-time-part-2
Legacy Celery task results, refactoring. . .
This brings me to the last and most interesting violation of the lot. In app.blueprints.pages.dim_tasks.dim_tasks, field info of the Celery queue task result is retrieved and assigned to an unused variable. At first glance, this statement does not seem to achieve anything. Looking at the Celery sources, it turns out that retrieving field info does not translate into a simple field access. It calls a getter, defined thanks to the @property annotation.
As such, it could perform any kind of side effect. This is the reason the automatic fix provided by Ruff may be unsafe. And this is also the reason why defining getters and setters through @property is forbidden in our coding rules: they make the code difficult to reason about.
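To see why such an access is not a no-op, here is a minimal, self-contained sketch; the class and the error message are invented for illustration, they are not the actual Celery code:

```python
class TaskResult:
    """Toy stand-in illustrating why `_ = obj.info` is not a no-op."""

    @property
    def info(self):
        # A getter defined with @property runs arbitrary code on access;
        # here it raises, just like the legacy Celery results described above.
        raise AttributeError("result stored in a legacy format")


result = TaskResult()
try:
    result.info  # looks like a harmless field read, but executes the getter
except AttributeError as error:
    print(error)
```

This is exactly the trap: removing the apparently useless statement would also remove the exception it can trigger.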
In this specific case, the field access may raise the AttributeError exception. This seems to happen when the result was created and stored in the backend (here in the database through SQLAlchemy) with an older version of DFIR-IRIS and in a bygone format. I am going to open an issue, because I find it is not the responsibility of the application to check for the consistency of data persisted in database at runtime. It makes the source code unnecessarily complex. I would prefer for task results stored in an obsolete format to be removed from the backend or sanitized, ideally at migration time.
For the time being, I will still try to fix the Ruff warning, while doing some other cleanups along the way. If I take a look at the Jinja template modal_task_info.html, I notice variable task_id is unused, so I remove it at the two places render is called. Commit and push.
Then I divide the method in two: on one hand, the rendering; on the other hand, the retrieval of the task and its conversion into a dictionary. I leave the rendering here, in the presentation layer. The remainder is moved down into the business layer (module app.business.dim_tasks). The split is not perfect, as this new function returns a dictionary. I would have preferred to return a data object, which is more explicit and better typed.
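Such a data object could be a simple dataclass. This is only a sketch, with hypothetical field names, not the actual DFIR-IRIS code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimTaskInfo:
    """Hypothetical typed replacement for the returned dictionary."""
    task_id: str
    state: str
    engine: str
    module_name: Optional[str] = None
    hook_name: Optional[str] = None


# Compared to a dictionary, fields are named, typed and visible to tooling.
info = DimTaskInfo(task_id='abc123', state='success', engine='IrisCheck')
```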
Some of the keys in the dictionary have names with clear presentation layer overtones, such as Task ID. But that's a step in the right direction. Commit and push. I notice the code calls a private method of the Celery AsyncResult, _get_meta_task, in order to access kwargs, name and traceback.
However, looking at the Celery documentation, AsyncResult already seems to offer the properties kwargs, name and traceback. I replace the occurrences. I am having doubts. I know I am taking risks. This code portion is not covered by automatic tests, yet. So I fire DFIR-IRIS up and run through a manual test scenario: log in as administrator, activate the IrisCheck module, create a case, open the DIM Tasks page and click on the task. Billions and billions of blue blistering barnacles! I broke something. The request returned 500. I check the server logs to observe the error stack trace:
Traceback (most recent call last):
  ...
  File "/iriswebapp/app/blueprints/access_controls.py"
    return f(*args, **kwargs)
  File "/iriswebapp/app/blueprints/pages/dim_tasks/dim_tasks.py"
    task_info = dim_tasks_get(task_id)
  File "/iriswebapp/app/business/dim_tasks.py"
    if task_meta.traceback:
NameError: name 'task_meta' is not defined
Fortunately, that's just a silly typo: it should be task instead of task_meta. Before I fix it, I capitalize on the incident by adding a new Playwright test to the existing automatic end-to-end test suite. Then fix. Remember: red-green-refactor. Check one last time, manually. Just in case, as one bug may hide another. Commit and push.
Next, I spot some uses of the ternary operator. I tend to advise against this construct: it’s a false friend. Yes, it lets you quickly write terse code. But this is to the detriment of readability and the ability to spot refactoring opportunities. I prefer to extract methods. Their names will convey some information about the intent of the code. And they can be reused. I extract these two methods:
def _get_engine_name(task):
    if not task.name:
        return 'No engine. Unrecoverable shadow failure'
    return task.name

def _get_success(task_result: IIStatus):
    if task_result.is_success():
        return 'Success'
    else:
        return 'Failure'
Commit and push. Looking at the jinja template which displays tasks once more, I notice fields with the None value are not shown. So absent optional fields, or fields with the None value are handled in the same way. This allows me to do one final simplification: I can collect all the field values (some possibly None) and build the whole task object at once. Commit and push.
Time to fix that Ruff warning which brought me here. I will extract the code which checks whether a task result is in a legacy format into a specific function. I rename the assigned variable in conformity with the dummy-variable-rgx regular expression, so that Ruff ignores it when enforcing F841:
def dim_tasks_is_legacy(task):
    try:
        _ = task.info
        return False
    except AttributeError:
        return True
I now activate F841 in the Ruff configuration, commit and push.
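For reference, this is roughly what the corresponding settings could look like in pyproject.toml. The option names come from the Ruff documentation; the exact rule selection used by the project may differ:

```toml
[tool.ruff.lint]
# Enable the unused-variable rule on top of whatever is already selected.
extend-select = ["F841"]
# Variables matching this pattern (here: a leading underscore) are
# treated as intentionally unused and ignored by F841.
dummy-variable-rgx = "^_.*$"
```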
It turns out this new function can immediately be reused to eliminate some code duplication. By searching for exception AttributeError throughout the codebase, I found an almost identical code block in module app.blueprints.rest.dim_tasks_routes. Commit and push. Except it doesn't work.
Fortunately, this time I was warned by the end-to-end test I previously wrote. It failed during continuous integration. The practice of small commits makes it easy to pinpoint the error. It turns out that tasks handled in module app.blueprints.rest.dim_tasks_routes are retrieved directly from the database backend. They do not appear to have the same fields as the tasks wrapped by the Celery AsyncResult. For instance, property info on the AsyncResult is read from field result of the database object.
Let's find a property that's present on both objects: date_done will do. This certainly smells a bit like a hack, because method dim_tasks_is_legacy is applied to objects of different types. It would make the code more regular to always operate at the same level of abstraction. Since Celery does not seem to provide any public API to access past task results, maybe function dim_tasks_get should retrieve the task directly from the database instead of relying on the Celery AsyncResult object. But that would take us too far for today. So I just report another quality issue for future work.
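The adjusted helper then relies purely on duck typing: any object on which date_done can be read passes, whatever its type. A minimal runnable sketch, with invented stand-in classes instead of the real Celery and database objects:

```python
class CeleryResultStandIn:
    """Stand-in for a Celery AsyncResult exposing date_done."""
    date_done = '2024-01-01T00:00:00'


class DatabaseTaskStandIn:
    """Stand-in for a task row fetched directly from the database."""
    date_done = '2024-01-01T00:00:00'


class LegacyTaskStandIn:
    """Stand-in for a legacy result: reading date_done raises."""
    @property
    def date_done(self):
        raise AttributeError('legacy format')


def dim_tasks_is_legacy(task):
    # Duck typing: works on any object, as long as date_done is readable.
    try:
        _ = task.date_done
        return False
    except AttributeError:
        return True
```

The flexibility is also the smell: nothing in the code states which types are expected, which is why operating at a single level of abstraction would be cleaner.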
The final code of the business function looks like this:
def dim_tasks_get(task_identifier):
    task = celery.AsyncResult(task_identifier)
    if dim_tasks_is_legacy(task):
        return {
            'Danger': 'This task was executed in a previous version '
                      'of IRIS and the status cannot be read anymore.',
            'Note': 'All the data readable by the current IRIS '
                    'version is displayed in the table.',
            'Additional information': 'The results of this tasks were stored in a '
                                      'pickled Class which does not exists anymore '
                                      'in current IRIS version.'
        }
    engine_name = _get_engine_name(task)
    user = None
    module_name = None
    hook_name = None
    case_identifier = None
    if task.name and ('task_hook_wrapper' in task.name or 'pipeline_dispatcher' in task.name):
        module_name = task.kwargs.get('module_name')
        hook_name = task.kwargs.get('hook_name')
        user = task.kwargs.get('init_user')
        case_identifier = task.kwargs.get('caseid')
    if isinstance(task.info, IIStatus):
        success = _get_success(task.info)
        logs = task.info.get_logs()
    else:
        success = 'Failure'
        user = 'Shadow Iris'
        logs = ['Task did not returned a valid IIStatus object']
    return {
        'Task ID': task_identifier,
        'Task finished on': task.date_done,
        'Task state': task.state.lower(),
        'Engine': engine_name,
        'Module name': module_name,
        'Hook name': hook_name,
        'Case ID': case_identifier,
        'Success': success,
        'User': user,
        'Logs': logs,
        'Traceback': task.traceback
    }
Not perfect yet. But I like it better. I hope you do too.
Tidying up unused imports (F401) and wildcard imports (F403)
Let us now turn to F401. I first apply the automatic safe fixes, commit and push.
The remaining 6 errors are unused imports in three __init__ files:
source/app/__init__.py
source/app/flask_dropzone/__init__.py
source/app/models/__init__.py
I spot a usual suspect: the Flask Dropzone. I will start there. Importing a symbol in an __init__ file can be used to re-export it as part of the module's public interface. This is probably the case for random_filename. But it is not used anywhere. So I can simply remove the statement, commit and push.
In contrast, the symbols exported in the __init__ file of app.models seem to be heavily used throughout the application. But I don't really see the point. I am working with the source code of a fully fledged web application. This is different from a library, for which it would be valuable to mark the publicly exposed API.
In the same file, there is also a wildcard import, which defeats the whole point of trying to control the module's API. So I update all indirect imports to point directly to the original symbol. Now, the __init__ file of module app.models can simply be emptied.
Furthermore, this lets me immediately activate the F403 rule: there was only one wildcard import and I just removed it. Commit and push.
The work to perform in source/app/__init__.py is slightly more involved. This is the entry point of DFIR-IRIS. It creates and configures the Flask application and registers several Flask extensions, among others the Flask-Login login manager. Then, at the end of the file, views is imported.
Application initialisation continues there. First with a long sequence of global instructions to register all the blueprints. Then there is a post-init phase, during which the database is upgraded and filled with default objects. Finally, some callbacks are registered on the application login manager thanks to decorators user_loader and request_loader.
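The long run of global registration statements can be replaced by one explicit function. Here is a minimal runnable sketch of the idea, where App is an invented stand-in for a Flask application, not the real class:

```python
class App:
    """Minimal stand-in for a Flask application (illustration only)."""
    def __init__(self):
        self.blueprints = []

    def register_blueprint(self, blueprint):
        self.blueprints.append(blueprint)


def register_blueprints(app, blueprints):
    # One explicit, testable function instead of a run of global statements:
    # the caller decides what gets registered, and when.
    for blueprint in blueprints:
        app.register_blueprint(blueprint)


app = App()
register_blueprints(app, ['pages', 'rest'])
```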
As soon as a module is imported, its global instructions are executed; likewise for decorators. This explains why views needs to be imported at the end of the __init__.py file, after the Flask application creation. Personally, I prefer to have finer control over what is executed and when. So I create a method to register all blueprints. I move the call to post_init up, directly into the __init__.py file. For the login manager callbacks, remember that the @wrapper syntax is just syntactic sugar. So the decorated definition is equivalent to this explicit registration:
def load_user(user_id):
    return User.query.get(int(user_id))

load_user = lm.user_loader(load_user)
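To make the equivalence concrete, here is a runnable toy example. The register function below is an invented stand-in for lm.user_loader: it records the callback and returns it unchanged, which is what Flask-Login's decorator does.

```python
registered = []


def register(callback):
    # Invented stand-in for lm.user_loader: record the callback,
    # then return it unchanged.
    registered.append(callback)
    return callback


@register
def load_user(user_id):
    return int(user_id)


# The decorated definition above is strictly equivalent to:
def load_user_explicit(user_id):
    return int(user_id)


load_user_explicit = register(load_user_explicit)
```

Both forms end up registered; the explicit call simply makes the moment of registration visible and movable.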
I can leave the function here and move the callback registration up into the __init__.py file. Note that this also removes the import circularity we used to have: app imports views, which imports lm from app again. It's much better. I can activate rule F401 in the Ruff configuration. Commit and push.
Unfortunately, the imports of the functions defined in app.views and of run_post_init are still in the middle of the file, when they should preferably be located at the top. That's because they would otherwise build import cycles, in particular with the database and the logger. That's a classic and annoying problem with Flask: if left untamed, it has a tendency to behave like a global hub. This unnecessarily makes fine-grained unit testing difficult. Fortunately, there are known solutions. I will leave this work for another time. The last grains of the golden sand are creeping through my fingers to the deep.
Conclusion
Ruff, DFIR-IRIS, testing, continuous integration, methodology, refactoring techniques... We covered a lot of ground. One bead at a time on a rosary, one step at a time up a mountain, so it goes: methodically improving code, even while all around you, busy-ness rages at a hectic pace. Even if your codebase is huge and the debt astronomical, there is a path. Even if you only have a 10-minute window of opportunity, you can still do one commit in the right direction. As long as there is continuous integration, your progress will be locked in. All you really need is a sense of direction. Coding rules, which, hopefully, are shared within your organization, embody this aesthetic.
Walking you through this code refactoring session was an opportunity to showcase some of the principles that we try to apply at Airbus Defence and Space Cyber:
- low hanging fruits: go for the easiest tasks first
- perform small steps, small commits
- lock progress with automatic feedback in continuous integration
- practice the red/green/refactor philosophy (not only with tests: ensure any check serves its purpose by having it fail the first time)
- if you break some behaviour, add an automatic test at the adequate level: unit, component, end-to-end
- do not commit third party source code
- keep trace of all dependencies and versions
- apply third party security updates
- when short of time, take delayed actions: create issues
- use a modern IDE
- move complexity left, by doing things as early as possible: at build time rather than at installation/update time, rather than at startup time, rather than at runtime
- separate responsibilities (according to your architecture)
- DRY: avoid and eliminate code duplication
- remove dead code
- prefer composition over inheritance (strategy pattern)
- normalize data
- do not access symbols marked as private (prefixed with _) from outside their definition scope
- avoid global instructions
- avoid the ternary operator, prefer to extract methods
- avoid the definition of getters and setters via @property annotation
- unless for the public API of a library, prefer empty __init__ files
- avoid wildcard imports
- avoid import cycles
- avoid imports in the middle of a file
There is a lot of software out in the world, eagerly waiting to be exploited. I certainly hope more effort will go towards simplifying and removing code, rather than just adding features. That's why I am transparently sharing all the steps of this craft. Improve your code quality, 10 minutes at a time!
Thanks
I am extremely thankful to the DFIR-IRIS team for graciously providing the work material for this article.

