PostgreSQL Transactions and SQLAlchemy

SQLAlchemy defaults to implicitly opening a new transaction on your first database query. If you prefer to start your transactions explicitly instead, I've documented here my explorations in getting that to work.

SQLAlchemy defaults to implicitly opening a new transaction on your first database query. If you prefer to start your transactions explicitly instead, I’ve documented here my explorations in getting that to work.

There are several different layers at play, so let’s review from the top:

Transactions in PostgreSQL

Since version 7.4, PostgreSQL itself always operates in “autocommit” mode. This means that if you haven’t started a transaction, and you run a query that changes data, it takes effect immediately; there is no need to commit it, and no option to roll it back.

If you want to group several queries together into an atomic unit of work, such that you can commit them all together or roll them back all together, you first issue a BEGIN statement, then your queries, and then either a COMMIT or ROLLBACK statement.

Transactions in PEP 249

PEP 249 specifies a common API for all Python database adapter libraries, making it easier to write cross-database-compatible code. Almost all the popular Python database adapters are (or claim or try to be) PEP 249 compliant.

The PEP 249 API specifies a different behavior, which I will call “implicit transactions.” There isn’t even a way to explicitly start a transaction in the PEP 249 API; no begin() or similar. Instead, you are always in a transaction; one is automatically started the first time you send a query to the database after opening a connection, and the first time you send a new query after ending the previous transaction with a rollback or commit.

Because Postgres doesn’t offer this behavior natively, the Postgres Python adapters (e.g. psycopg2) have to emulate it themselves in order to be PEP 249 compliant. So by default, the first time you send a query to the database, psycopg2 will prefix it with a BEGIN on your behalf.

In order to get psycopg2 to stop sending these automatic BEGIN statements and to behave like Postgres natively does, you set the autocommit property of your connection object to True:

import psycopg2
conn = psycopg2.connect('dbname=test')
conn.autocommit = True

Postgres Has it Right

I prefer the ‘autocommit’ model. It’s simple, explicit, and unsurprising. Queries are never grouped together into a transaction unless you ask for one. If you issue a COMMIT or ROLLBACK, there is never any doubt about which queries you are committing or rolling back, because you explicitly issued the BEGIN to start the transaction.

I can see some advantage to the implicit-transaction model for interactive-shell use (you can always roll back your changes if you screw something up, even if you forgot to BEGIN), but for general use I think it’s more error-prone. If you save a change to the database in one place but forget to commit, that change will automatically be wrapped up in the same transaction with later, possibly completely unrelated changes, and may get blindly rolled back along with them. It wraps read-only SELECT statements in useless transactions. And for long-running processes, since even a simple SELECT implicitly opens a transaction (and you wouldn’t intuitively think you’d need to commit or rollback after a SELECT), it’s very easy to unintentionally end up with connections in the “idle in transaction” state, where they are doing nothing but still may be holding locks and preventing Postgres from compacting tables.

The explicit-transactions (autocommit) model lends itself naturally to Python idioms like decorators or context managers for handling transactions. These clearly wrap a section of code (a function, or the block introduced by the with statement) as an atomic unit of work that will be committed or rolled back together. It’s more difficult to correctly implement such an idiom in the implicit-transactions model, because there is no explicit start point to a transaction, so prior queries outside the demarcated block might get wrapped up in the same transaction (or you have to check when entering the block whether there are uncommitted queries in an existing transaction, and decide what should be done with them — perhaps having to guess, or raise an error.)

Django

Before we talk about SQLAlchemy, let’s discuss the Django ORM briefly for comparison purposes.

Up until Django 1.6, Django’s transactions API used psycopg2 in its default (PEP 249) mode, but then by default emulated a sort of Python-level “autocommit” mode by automatically issuing commits after ORM-issued queries.

So in other words, if you were using Django pre-1.6 with Postgres, you had psycopg2 emulating implicit-transactions on top of Postgres’ native autocommit by automatically issuing BEGIN statements on your behalf, and then Django emulating autocommit atop psycopg2’s emulated non-autocommit atop Postgres’ native autocommit, by also automatically issuing COMMIT statements on your behalf.

Got all that?

Thankfully, in Django 1.6 Aymeric Augustin rewrote Django’s transaction support (based in part on prior work by Christophe Pettus) to use database-level autocommit natively and only open transactions when explicitly requested. The core API (really, the only API) is transactions.atomic, which can work as either a decorator or context manager to define an atomic unit of database work (it can also be nested, using a stack of savepoints to achieve the effect of nested transactions).

SQLAlchemy

SQLAlchemy follows the lead of PEP 249 and uses implicit-transactions as its primary mode. Whenever you query the database, a transaction automatically starts (SQLAlchemy doesn’t do anything special to make this happen, it’s just using the database adapter — psycopg2 in our case — in its default PEP-249-compliant mode), and you can commit that transaction with session.commit() or roll it back with session.rollback().

Although I don’t prefer this mode, it’s a reasonable choice for SQLAlchemy to rely on PEP 249 consistency across the board rather than implementing custom code for native-autocommit mode in all its many supported databases.

But I’m using Postgres, I know how its native autocommit mode works, and that’s the behavior I want with SQLAlchemy. Can I make that work?

SQLAlchemy’s Autocommit Mode — Not What You Think

I soon found autocommit mode in SQLAlchemy’s documentation, and thought I had my answer — but no such luck. SQLAlchemy’s autocommit mode is roughly parallel to the “autocommit” in Django pre-1.6 (albeit smarter): it emulates autocommit over top of non-autocommit database adapters by automatically committing an implicit transaction after you send queries that change the database. It doesn’t put the database connections into true autocommit mode, so it still wraps reads in unnecessary transactions.

Turning on Real Autocommit

Happily, setting all of SQLAlchemy’s psycopg2 connections into real autocommit became quite easy in SQLAlchemy 0.8.2: SQLAlchemy’s psycopg2 “dialect” now exposes an AUTOCOMMIT transaction isolation level, and selecting it sets autocommit=True on all the psycopg2 connections.

from sqlalchemy import create_engine
engine = create_engine('postgresql://test', isolation_level="AUTOCOMMIT")

We haven’t discussed transaction isolation levels yet (and I won’t in detail here). They control the visibility of changes between multiple concurrent transactions. The Postgres documentation summarizes the options it provides.

It’s a bit odd that SQLAlchemy (and psycopg2) provide AUTOCOMMIT as a transaction isolation level, since Postgres has no such isolation level, and autocommit-mode isn’t really an isolation level at all. (In fact, if you choose the AUTOCOMMIT “isolation level” in psycopg2, it turns on the connection’s autocommit property but actually resets the real Postgres isolation level to “read committed”, the default.)

But regardless, isolation_level='AUTOCOMMIT' is the incantation needed to get all of our SQLAlchemy psycopg2 connections into true autocommit mode.

Starting a Transaction

If we didn’t want to use transactions at all, this would be all we need. SQLAlchemy would happily hum along thinking it has a transaction but actually not having one at all (just like it does for databases that don’t support transactions).

But we do want to use transactions, so we need a way to start one. The natural API for this already exists in SQLAlchemy: session.begin(). Since SQLAlchemy assumes that its database adapter will automatically start transactions, session.begin() never actually issues a BEGIN to the database. But we don’t actually need to issue BEGIN ourselves either - we just need to turn off the autocommit property on our connection, and then psycopg2 will issue the BEGIN for us.

SQLAlchemy gives us a way to hook into the begin() call: the after_begin event, which sends along the relevant database connection. We have to dig through a few layers of connection-wrapping to get down to the actual psycopg2 connection object, but that’s not hard:

from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://test', isolation_level="AUTOCOMMIT")
Session = sessionmaker(bind=engine, autocommit=True)

@event.listens_for(Session, 'after_begin')
def receive_after_begin(session, transaction, connection):
    """When a (non-nested) transaction begins, turn autocommit off."""
    dbapi_connection = connection.connection.connection
    if transaction.nested:
        assert not dbapi_connection.autocommit
        return
    assert dbapi_connection.autocommit
    dbapi_connection.autocommit = False

The session.begin() API can also be used to initiate “nested transactions” using savepoints. In this case autocommit should already have been turned off on the connection by the outer “real” transaction, so we don’t need to do anything. We add in a couple asserts to validate our assumptions about what the autocommit state should be in each case, and in the non-nested case we turn autocommit off.

We also pass autocommit=True to the Session; this turns on SQLAlchemy’s autocommit mode (mentioned above). This is necessary to prevent SQLAlchemy from automatically starting a transaction (and thus triggering our after_begin listener) on the first query.

This is the piece that I’m least happy with, as it means we have to worry about what is meant by the vague warnings in the documentation that Session autocommit is a “legacy mode of usage” that “can in some cases lead to concurrent connection checkouts” and that we should turn off the Session’s autoflush and autoexpire features. So far I haven’t done the latter; waiting to see what (if any) problems ensue in practice.

Back to Autocommit When the Transaction Ends

One thing remains lacking from our implementation. When a transaction ends, we need to restore that connection to autocommit mode again.

This isn’t entirely straightforward. SQLAlchemy gives us three events corresponding to the end of a transaction: after_rollback, after_commit, and after_transaction_end. But these are all fired after the connection has been “closed” (that is, returned to the connection pool), and after_transaction_end is only fired once per SQLAlchemy SessionTransaction object, which can involve multiple connections. For both of these reasons, none of those events provide us with a connection object.

In order to get around this, I maintain a mapping of SessionTransaction objects to the connection(s) that have had autocommit turned off due to that transaction. Then I listen to after_transaction_end and restore autocommit on all the appropriate connections:

from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://test', isolation_level="AUTOCOMMIT")
Session = sessionmaker(bind=engine, autocommit=True)

dconns_by_trans = {}

@event.listens_for(Session, 'after_begin')
def receive_after_begin(session, transaction, connection):
    """When a transaction begins, turn autocommit off."""
    dbapi_connection = connection.connection.connection
    if transaction.nested:
        assert not dbapi_connection.autocommit
        return
    assert dbapi_connection.autocommit
    dbapi_connection.autocommit = False
    dconns_by_trans.setdefault(transaction, set()).add(
        dbapi_connection)

@event.listens_for(Session, 'after_transaction_end')
def receive_after_transaction_end(session, transaction):
    """Restore autocommit where this transaction turned it off."""
    if transaction in dconns_by_trans:
        for dbapi_connection in dconns_by_trans[transaction]:
            assert not dbapi_connection.autocommit
            dbapi_connection.autocommit = True
        del dconns_by_trans[transaction]

A Transaction Context Manager

Now that we have autocommit mode working, here’s an example of a rough equivalent to transaction.atomic for SQLAlchemy (unlike transaction.atomic this doesn’t work as a decorator, but adding that is just a matter of some boilerplate):

from contextlib import contextmanager
from sqlalchemy.orm import Session as BaseSession


class Session(BaseSession):
    def __init__(self, *a, **kw):
        super(Session, self).__init__(*a, **kw)
        self._in_atomic = False

    @contextmanager
    def atomic(self):
        """Transaction context manager.

        Will commit the transaction on successful completion
        of the block, or roll it back on error.

        Supports nested usage (via savepoints).

        """
        nested = self._in_atomic
        self.begin(nested=nested)
        self._in_atomic = True

        try:
            yield
        except:
            self.rollback()
            raise
        else:
            self.commit()
        finally:
            if not nested:
                self._in_atomic = False

It would be possible to implement this same context manager in SQLAlchemy’s default implicit-transactions mode: you just leave out the call to session.begin() in the non-nested case (because an implicit transaction will already have been created). But that implementation then suffers from the bug where database queries from prior to the context-managed block could be included within its transaction. With some further cleverness you might be able to figure out on entering the context manager whether the existing transaction is already “dirty” (though this is not trivial to determine), and then either raise an error or implicitly commit the existing transaction — but neither of these solutions are appealing compared to the conceptual simplicity of autocommit & explicit transactions.

Conclusion

Is this all worth it? Perhaps not; it’s possible to work around the problems with implicit transactions by being careful. And I’m not yet clear on the costs of this approach — just how bad are the problems SQLAlchemy’s docs warn about with its autocommit mode?

In any case, while I understand why SQLAlchemy is well-advised to generally follow PEP 249 for its default behavior, I would love if it had (scary-warning-free) support for an “autocommit and explicit transactions” mode on those databases/adapters with good support for it.

The code from this post is pulled together in a gist. I also have tests for it, but they are currently integrated with the project where I’m using this. If there’s enough interest (and it works well on this project) I might be convinced to package it up and release it properly.

Acknowledgments

Thanks to Mike Bayer for writing SQLAlchemy(!) and for pointing me towards the AUTOCOMMIT “isolation level” setting. Thanks to Christophe Pettus for my initial education in Postgres’ transaction behavior, and Aymeric Augustin for the excellent implementation in Django 1.6+.

Carl Meyer is an active owner, developer-emeritus, and core developer of the Django Python web framework.

Let’s Build Something Together!

We want to hear all about your software ideas. Fill out our contact form, join our public Slack chat, or tweet @oddbird to start the conversation.