Data Migrations With RunPython
Write data migrations to backfill or transform records using RunPython and RunSQL. Handle forwards and backwards operations safely.
1. Introduction
Schema migrations create, alter, or drop tables and columns. Data migrations do something different — they transform or backfill existing data as part of the migration process. You use them when adding a new field that needs values populated from existing data, or when renaming or restructuring how data is stored.
Django provides RunPython to run arbitrary Python code inside a migration. This guide covers how to write safe, reversible data migrations with real examples.
- You should already understand how migrations work. If not, see Migrations Basics.
- Your .venv must be active and Django 5.2 installed.
2. When to use data migrations
Common situations where you need a data migration:
- You add a new required field and need to populate it with a value derived from existing data.
- You split one field into two and need to move the data across.
- You change how a value is stored — for example converting all status values from integers to strings.
- You need to clean up bad data that was stored before a validator was added.
3. RunPython basics
RunPython takes two arguments: a forwards function that runs when the migration is applied, and an optional backwards function that runs when the migration is reversed.
from django.db import migrations

def forwards(apps, schema_editor):
    # Your data transformation logic here
    pass

def backwards(apps, schema_editor):
    # How to undo the transformation
    pass

class Migration(migrations.Migration):
    dependencies = [
        ('pages', '0002_article_slug'),
    ]
    operations = [
        migrations.RunPython(forwards, backwards),
    ]
The two arguments passed to your functions are:
- apps: a registry of historical model versions. Always use this to get models inside a data migration, not the real model class.
- schema_editor: the schema editor for the database the migration is running against (its connection is available as schema_editor.connection). You rarely use this directly in data migrations.
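One case where schema_editor is useful is a project with more than one database. The following is a minimal sketch, not a required pattern: it reads the connection alias from schema_editor and routes the query to the database being migrated. The status field and the 'draft' value are assumptions borrowed from the performance example later in this guide.

```python
def forwards(apps, schema_editor):
    # Historical model as of this migration, not the class in pages/models.py.
    Article = apps.get_model("pages", "Article")
    # Alias of the database this migration is being applied to, e.g. "default".
    db_alias = schema_editor.connection.alias
    # Route the query to that database explicitly (matters in multi-db setups).
    Article.objects.using(db_alias).filter(status="").update(status="draft")
```

On a single-database project the using() call changes nothing, so it is safe to include by default.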
4. Real example — backfilling a slug field
Suppose you added a slug field to Article but existing records have no slug. You need to generate slugs from titles for all existing articles.
Step 1 — add the field in a schema migration
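# pages/models.py (inside the Article model)
# Leave unique=True off for now: every existing row receives '' as its slug,
# which would violate a unique constraint on any table with more than one row.
# Add unique=True in a later schema migration, after the backfill has run.
slug = models.SlugField(max_length=220, blank=True)

python manage.py makemigrations pages
# Creates 0003_article_slug.py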
Step 2 — create the data migration
python manage.py makemigrations pages --empty --name populate_article_slugs
# Creates 0004_populate_article_slugs.py
Step 3 — write the RunPython function
# pages/migrations/0004_populate_article_slugs.py
from django.db import migrations
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    for article in Article.objects.filter(slug=''):
        base_slug = slugify(article.title)
        slug = base_slug
        counter = 1
        # Handle duplicates
        while Article.objects.filter(slug=slug).exists():
            slug = f'{base_slug}-{counter}'
            counter += 1
        article.slug = slug
        article.save()

def reverse_populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    Article.objects.all().update(slug='')

class Migration(migrations.Migration):
    dependencies = [
        ('pages', '0003_article_slug'),
    ]
    operations = [
        migrations.RunPython(populate_slugs, reverse_populate_slugs),
    ]
python manage.py migrate pages
# Applies the schema migration then the data migration
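The while loop in populate_slugs issues one exists() query per candidate slug. On larger tables, one possible alternative is to track the slugs already in use in a Python set. The helper below is a hypothetical sketch (the name dedupe_slug is not part of Django); in a real migration you might seed the set from Article.objects.exclude(slug='').values_list('slug', flat=True).

```python
def dedupe_slug(base_slug, taken):
    """Return base_slug, or base_slug-N for the smallest free N, and record it in `taken`."""
    slug = base_slug
    counter = 1
    # Same suffixing scheme as the migration above, but checked in memory.
    while slug in taken:
        slug = f"{base_slug}-{counter}"
        counter += 1
    taken.add(slug)
    return slug


taken = set()
print([dedupe_slug(s, taken) for s in ["hello-world", "hello-world", "about", "hello-world"]])
# → ['hello-world', 'hello-world-1', 'about', 'hello-world-2']
```

This trades a little memory for far fewer queries. It assumes no other process is writing slugs while the migration runs.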
5. Always use historical models
Inside a data migration, always get the model through apps.get_model() — never import it directly from your app:
# Correct — uses the historical model state at migration time
Article = apps.get_model('pages', 'Article')
# Wrong — imports the current model, which may have changed since this migration was written
from pages.models import Article # never do this in migrations
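Django does not run data migrations against your current models.py. It rebuilds each model from the migration history, so apps.get_model() returns the model as it existed at that point in the migration graph. If you import the live model and it has changed since the migration was written (a field renamed or removed, for example), the migration can break or, worse, silently do the wrong thing. Keep in mind that historical models carry fields but not custom methods or an overridden save(), so any such logic you need must live in the migration itself.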
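One practical consequence is that historical models expose fields but not the custom methods defined on the live class. A common workaround, sketched below, is to copy the logic you need into the migration module. The body and reading_minutes fields, the constant, and the helper name are all hypothetical and used only for illustration.

```python
WORDS_PER_MINUTE = 200  # hypothetical constant, frozen inside the migration


def estimate_minutes(text):
    """Local copy of the model logic, so the migration does not depend on pages.models."""
    return max(1, round(len(text.split()) / WORDS_PER_MINUTE))


def forwards(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    for article in Article.objects.all().iterator():
        # Calling a custom method such as article.reading_time() would raise
        # AttributeError here, because historical models only carry fields.
        article.reading_minutes = estimate_minutes(article.body)
        article.save(update_fields=["reading_minutes"])
```

Freezing the logic in the migration also means later changes to the model method cannot alter what this migration does.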
6. RunSQL — when raw SQL is cleaner
For simple transformations, raw SQL is sometimes cleaner and faster than iterating with Python. Use RunSQL for these cases:
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('pages', '0003_article_slug'),
    ]
    operations = [
        migrations.RunSQL(
            sql="UPDATE pages_article SET status = 'published' WHERE is_published = TRUE;",
            reverse_sql="UPDATE pages_article SET is_published = TRUE WHERE status = 'published';",
        )
    ]
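RunSQL is usually faster than a RunPython loop for bulk updates because the database executes a single SQL statement instead of Python loading and saving records one by one. Use it when the transformation can be expressed cleanly in SQL, and remember that raw SQL ties the migration to your database's dialect and to the current table and column names.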
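For comparison, the same transformation can be sketched with the ORM inside RunPython. QuerySet.update() also compiles to a single UPDATE statement, so it avoids a per-row loop while staying database-agnostic. The function names are illustrative; is_published and status are the columns from the SQL above.

```python
def mark_published(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    # One UPDATE ... WHERE is_published statement, no rows loaded into Python.
    Article.objects.filter(is_published=True).update(status="published")


def unmark_published(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    # Mirrors the reverse_sql above.
    Article.objects.filter(status="published").update(is_published=True)
```

These would be wired up as migrations.RunPython(mark_published, unmark_published) in the operations list.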
7. Making migrations reversible
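Always provide a backwards function if the transformation can be reversed. If there is nothing to undo (for example, the field you populated is dropped anyway when the preceding schema migration is reversed), pass migrations.RunPython.noop as the backwards function so the migration can still be unapplied:
operations = [
    migrations.RunPython(populate_slugs, migrations.RunPython.noop),
]
Note that noop and a missing backwards argument mean opposite things. With noop you are saying "reversing this step needs no data changes", and Django will let you migrate backwards past it. If you omit the backwards function, Django treats the migration as irreversible and raises an IrreversibleError when you try to unapply it. Omit it only when the transformation truly cannot be undone, for example because it deleted data.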
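When a transformation is a one-to-one mapping, the backwards function can apply the inverse mapping. The sketch below illustrates this for the integer-to-string status conversion mentioned earlier. It assumes a hypothetical old integer column status_code next to the new string column status, and the three status values are invented for the example.

```python
# Hypothetical mapping between the old integer codes and the new string values.
STATUS_INT_TO_STR = {0: "draft", 1: "published", 2: "archived"}
STATUS_STR_TO_INT = {label: code for code, label in STATUS_INT_TO_STR.items()}


def forwards(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    # One UPDATE per status value instead of one save() per row.
    for code, label in STATUS_INT_TO_STR.items():
        Article.objects.filter(status_code=code).update(status=label)


def backwards(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    # Exact inverse of forwards, so the migration round-trips cleanly.
    for label, code in STATUS_STR_TO_INT.items():
        Article.objects.filter(status=label).update(status_code=code)
```

Deriving the reverse dictionary from the forward one keeps the two directions from drifting apart.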
8. Performance tips for large tables
Iterating over every record one by one is slow on large tables. Use update() for bulk changes and iterator() for large result sets to avoid loading everything into memory:
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Article = apps.get_model('pages', 'Article')
    # Bulk update where possible
    Article.objects.filter(status='').update(status='draft')
    # Use iterator() for large querysets to avoid memory issues
    for article in Article.objects.filter(slug='').iterator(chunk_size=500):
        article.slug = slugify(article.title)
        article.save()
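When each row needs a value computed in Python, calling save() per row still costs one query per record. One option is to collect changed rows and write them in batches with bulk_update(). The sketch below applies this to a hypothetical cleanup that strips stray whitespace from titles; the function name and the batch size are assumptions, not recommendations from Django.

```python
BATCH_SIZE = 500  # assumed value; tune it for your database and row size


def strip_titles(apps, schema_editor):
    Article = apps.get_model("pages", "Article")
    batch = []
    for article in Article.objects.all().iterator(chunk_size=BATCH_SIZE):
        cleaned = article.title.strip()
        if cleaned != article.title:
            article.title = cleaned
            batch.append(article)
        # Flush a full batch with a single bulk_update() call.
        if len(batch) >= BATCH_SIZE:
            Article.objects.bulk_update(batch, ["title"])
            batch = []
    # Write whatever is left over after the loop ends.
    if batch:
        Article.objects.bulk_update(batch, ["title"])
```

Only rows that actually change are written, which keeps the migration cheap to re-run on data that is already clean.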
9. Next steps
That completes the Models and Database section. You now have a thorough understanding of how to define models, choose the right fields, set up relationships, enforce constraints, and manage schema and data changes through migrations. The next section covers the ORM and QuerySets — how to query and manipulate your data efficiently.