Data Migrations with RunPython

1. Introduction

Schema migrations create, alter, or drop tables and columns. Data migrations do something different — they transform or backfill existing data as part of the migration process. You use them when adding a new field that needs values populated from existing data, or when renaming or restructuring how data is stored.

Django provides RunPython to run arbitrary Python code inside a migration. This guide covers how to write safe, reversible data migrations with real examples.

You should already understand how migrations work. If not, see Migrations Basics.
Your .venv must be active and Django 5.2 installed.

2. When to use data migrations

Common situations where you need a data migration:

You add a new required field and need to populate it with a value derived from existing data.
You split one field into two and need to move the data across.
You change how a value is stored — for example converting all status values from integers to strings.
You need to clean up bad data that was stored before a validator was added.

3. RunPython basics

RunPython takes two main arguments: a forwards function that runs when the migration is applied, and an optional backwards function that runs when the migration is reversed. Django calls both with the same two parameters.

from django.db import migrations


def forwards(apps, schema_editor):
    # Your data transformation logic here
    pass


def backwards(apps, schema_editor):
    # How to undo the transformation
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('pages', '0003_article_slug'),
    ]

    operations = [
        migrations.RunPython(forwards, backwards),
    ]

The two arguments passed to your functions are:

apps: a registry of historical model versions. Always use this to get models inside a data migration, not the real model class.
schema_editor: the schema editor Django uses to apply changes to the database. You rarely use it directly in data migrations; its connection attribute identifies the database the migration is running against.
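
One case where schema_editor is useful is a multi-database project. The sketch below reads the alias of the database being migrated from schema_editor.connection and routes the query to it. The status field and the 'draft' value are illustrative assumptions about the Article model.

```python
def forwards(apps, schema_editor):
    # Historical model from the migration's app registry
    Article = apps.get_model('pages', 'Article')
    # Alias of the database this migration is being applied to
    db_alias = schema_editor.connection.alias
    # Run the update against that database rather than the default one
    Article.objects.using(db_alias).filter(status='').update(status='draft')
```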

4. Real example — backfilling a slug field

Suppose you added a slug field to Article but existing records have no slug. You need to generate slugs from titles for all existing articles.

Step 1 — add the field in a schema migration

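# pages/models.py
# No unique=True yet: existing rows all start as '', which would violate it.
# Add unique=True in a later schema migration, after the backfill.
slug = models.SlugField(max_length=220, blank=True)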

python manage.py makemigrations pages
# Creates 0003_article_slug.py

Step 2 — create the data migration

python manage.py makemigrations pages --empty --name populate_article_slugs
# Creates 0004_populate_article_slugs.py

Step 3 — write the RunPython function

# pages/migrations/0004_populate_article_slugs.py

from django.db import migrations
from django.utils.text import slugify


def populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    for article in Article.objects.filter(slug=''):
        base_slug = slugify(article.title)
        slug = base_slug
        counter = 1
        # Handle duplicates
        while Article.objects.filter(slug=slug).exists():
            slug = f'{base_slug}-{counter}'
            counter += 1
        article.slug = slug
        article.save()


def reverse_populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    Article.objects.all().update(slug='')


class Migration(migrations.Migration):

    dependencies = [
        ('pages', '0003_article_slug'),
    ]

    operations = [
        migrations.RunPython(populate_slugs, reverse_populate_slugs),
    ]

python manage.py migrate pages
# Applies the schema migration then the data migration
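
The duplicate check in populate_slugs issues one query per candidate slug. As a rough alternative, the helper below keeps the slugs already in use in a Python set and trims the base so the suffixed result still fits max_length. make_unique_slug is a hypothetical name, not part of Django, and the 'article' fallback for an empty base is an assumption for illustration.

```python
def make_unique_slug(base_slug, taken, max_length=220):
    """Return a slug that is not in `taken`, appending -1, -2, ... on collision."""
    # Fallback for titles that slugify to an empty string (assumed value)
    base_slug = base_slug[:max_length] or 'article'
    slug = base_slug
    counter = 1
    while slug in taken:
        suffix = f'-{counter}'
        # Trim the base so base + suffix never exceeds max_length
        slug = f'{base_slug[:max_length - len(suffix)]}{suffix}'
        counter += 1
    # Remember the result so later calls do not reuse it
    taken.add(slug)
    return slug
```

Inside the migration, taken could be seeded with set(Article.objects.exclude(slug='').values_list('slug', flat=True)) and base_slug with slugify(article.title).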

5. Always use historical models

Inside a data migration, always get the model through apps.get_model() — never import it directly from your app:

# Correct — uses the historical model state at migration time
Article = apps.get_model('pages', 'Article')

# Wrong — imports the current model, which may have changed since this migration was written
from pages.models import Article  # never do this in migrations

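Django rebuilds each model's state from the migration history, so apps.get_model() returns the model as it looked at this point in that history. These historical models carry the fields but not your custom methods or save() overrides, so copy any helper logic you need into the migration itself. If you import the live model and it has changed since the migration was created, the migration will break or, worse, silently do the wrong thing.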

6. RunSQL — when raw SQL is cleaner

For simple transformations, raw SQL is sometimes cleaner and faster than iterating with Python. Use RunSQL for these cases:

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('pages', '0003_article_slug'),
    ]

    operations = [
        migrations.RunSQL(
            sql="UPDATE pages_article SET status = 'published' WHERE is_published = TRUE;",
            reverse_sql="UPDATE pages_article SET is_published = TRUE WHERE status = 'published';",
        )
    ]

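RunSQL is faster than row-by-row Python for bulk updates because it runs a single SQL statement instead of loading records into Python one by one. Use it when the transformation can be expressed cleanly in SQL. The trade-offs are that table and column names are hard-coded and the SQL may not be portable across database backends.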
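
When portability across database backends matters, the same kind of bulk change can stay in the ORM, since QuerySet.update() also compiles to a single UPDATE statement. A minimal sketch, assuming the status and is_published columns from the SQL above exist on the historical Article model; the function names are hypothetical:

```python
def mark_published(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    # One UPDATE statement; no rows are loaded into Python
    Article.objects.filter(is_published=True).update(status='published')


def unmark_published(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    # Mirrors the reverse_sql statement above
    Article.objects.filter(status='published').update(is_published=True)
```

These would be wired in with migrations.RunPython(mark_published, unmark_published).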

7. Making migrations reversible

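Always provide a backwards function if the transformation can be reversed. If you omit it, the migration is irreversible: trying to unapply it raises IrreversibleError. That is the right choice when the data cannot be restored, for example because you deleted it. When reversing requires no data changes at all, pass migrations.RunPython.noop so the migration can still be unapplied:

operations = [
    migrations.RunPython(populate_slugs, migrations.RunPython.noop),
]

Using noop makes your intent clear: you are saying "nothing needs to be undone here" rather than "I forgot to write the backwards function." For the slug backfill this is reasonable, because reversing the earlier schema migration drops the slug column anyway.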

8. Performance tips for large tables

Iterating over every record one by one is slow on large tables. Use update() for bulk changes and iterator() for large result sets to avoid loading everything into memory:

from django.utils.text import slugify


def populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')

    # Bulk update where possible: one UPDATE statement, no rows loaded into Python
    Article.objects.filter(status='').update(status='draft')

    # Use iterator() for large querysets to avoid memory issues
    for article in Article.objects.filter(slug='').iterator(chunk_size=500):
        article.slug = slugify(article.title)
        # Write only the changed column
        article.save(update_fields=['slug'])

Tip: Data migrations run inside a transaction by default. If the migration fails halfway through, the database rolls back to its state before the migration started. This is safe, but on very large tables the transaction can hold locks for a long time. For tables with millions of rows, consider processing the rows in batches with atomic = False set on the Migration class so each batch can commit separately, or running the backfill outside of the migration system as a management command.
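
The batching suggested above could look roughly like this. batched, compute_slug, and populate_slugs_in_batches are hypothetical names, and compute_slug is a simplified stand-in for slugify so the sketch needs nothing beyond the standard library. It does not handle duplicate slugs.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def compute_slug(article):
    # Simplified stand-in; a real migration would call slugify(article.title)
    return '-'.join(article.title.lower().split())


def populate_slugs_in_batches(apps, schema_editor, batch_size=500):
    Article = apps.get_model('pages', 'Article')
    rows = Article.objects.filter(slug='').iterator(chunk_size=batch_size)
    for batch in batched(rows, batch_size):
        for article in batch:
            article.slug = compute_slug(article)
        # One bulk_update call per batch instead of one save() per row
        Article.objects.bulk_update(batch, ['slug'])
```

Because RunPython calls the function with only apps and schema_editor, the batch size is set through the default argument.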

9. Next steps

That completes the Models and Database section. You now have a thorough understanding of how to define models, choose the right fields, set up relationships, enforce constraints, and manage schema and data changes through migrations. The next section covers the ORM and QuerySets — how to query and manipulate your data efficiently.
