Logging as Storytelling

The log messages generated by a piece software tell a story: what, where, when, even why and how if you’re lucky. The readers of this story are more often than not other programs: monitoring systems, performance tools, or just filtering the messages down to something a human can actually comprehend. Unfortunately the output of most logging systems is ill-suited to being read by programs. Even worse, most logging systems omit critical information that both humans and their programs need. As part of my work at ClusterHQ I’ve been working on fixing that.

Problem #1: Text is hard to search

Let’s say you want to find all the log messages about a specific person. A first pass of log messages might look like this:

Sir James Chettam was going to dine at the Grange to-day with another gentleman whom the girls had never seen, and about whom Dorothea felt some venerating expectation.

If Miss Brooke ever attained perfect meekness, it would not be for lack of inward fire.

You could do a text search for log messages containing the text “Dorothea”, but this is likely to fail for some types of searches. You might want to searching for actions involving dinner, but then you would need to search for “dine” and “dinner” and perhaps other words well. A library like structlog that can generate structured log messages will solve this first problem. You could define a “person” field in your messages and then you can search for all messages where person == "Dorothea" as well as other structured queries.

Problem #2: Referring to Entities

Every time a log message is written out you need to decide how to refer to the objects being logged. In the messages we saw above “Dorothea” and “Miss Brooke” are in fact different identifiers for the same person. Having structured messages doesn’t help us find all messages about a specific entity if the object is referred to inconsistently. What you need is infrastructure for converting specific kinds of objects in your code to fields in your structured log messages. Then you can just say “log a message that refers to this Person” and that reusable code will make sure the correct identifier is generated.

Problem #3: Actions

Most log messages in your program are going to involve actions:

Not long after that dinner-party she had become Mrs. Casaubon, and was on her way to Rome.

A marriage has a beginning and eventually an end. The end may be successful, presuming “until death do us part” is a form of success, or a failure. The same is true of all actions, much like function calls in Python are started and eventually return a result or throw an exception. Actions may of course span multiple function calls or extended periods of time.

Actions also generate other actions: a marriage leads to a trip to Rome, the trip to Rome might lead to a visit to the Vatican Museum, and so on. Other unrelated actions are occurring at the same time, resulting in a forest of actions, with root actions that grow a tree of child actions.

You might want to trace an action from beginning to end, e.g. to measure how long it took to run. You might want to know what high-level action caused a particular unexpected low-level action. You might want to know what actions a specific entity was involved with. None of these are possible in most logging systems since they have no concept of actions to begin with.

Those are the problems. In my following posts I will talk about the solutions, being built as part of a Python library called Eliot. We hope to open source Eliot at some point in the future, but the general design ideas should be applicable anywhere.

Get Involved

Sign up for email updates about Flocker