Migrating from MySQL to PostgreSQL is a significant undertaking that requires careful planning and execution. This comprehensive guide will walk you through the entire migration process, from initial preparation to final deployment, ensuring a smooth transition between these database systems.
Understanding the Migration Journey
Before diving into the technical details, it’s important to understand that migrating between database systems is more than just moving data. It involves transforming data types, adjusting SQL syntax, and often rethinking certain database design decisions. Think of it as translating between two languages that share some common words but have different grammar rules and idioms.
Preparation Phase
Assessment and Planning
The first step in any successful migration is a thorough assessment of your current MySQL database. This helps identify potential challenges and create a detailed migration strategy. Let’s create a simple shell script to gather information about your MySQL database:
#!/bin/bash
# Save MySQL database structure
mysqldump -u your_user -p --no-data your_database > schema.sql
# Get table sizes and row counts
mysql -u your_user -p -e "
SELECT
table_name,
table_rows,
data_length/1024/1024 as data_size_mb,
index_length/1024/1024 as index_size_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY data_length DESC;" > database_stats.txt
# List all triggers, views, and stored procedures
mysql -u your_user -p -e "
SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE
FROM information_schema.TRIGGERS
WHERE TRIGGER_SCHEMA = 'your_database';" > triggers.txt
mysql -u your_user -p your_database -e "
SHOW FULL TABLES
WHERE TABLE_TYPE LIKE 'VIEW';" > views.txt
mysql -u your_user -p -e "
SHOW PROCEDURE STATUS
WHERE Db = 'your_database';" > procedures.txt
Setting Up the PostgreSQL Environment
Before beginning the migration, ensure PostgreSQL is properly configured. Here’s a basic postgresql.conf configuration that works well for most migrations:
# Memory Configuration
shared_buffers = 2GB # ~25% of available RAM (figures here assume an 8GB server)
work_mem = 32MB # Helps with complex sorts
maintenance_work_mem = 256MB # Helps with index creation
# Write Ahead Log
wal_level = replica # Enables WAL archiving
max_wal_size = 1GB # Maximum WAL size before checkpoint
min_wal_size = 80MB # Minimum WAL size
# Query Planner
random_page_cost = 1.1 # Assumes SSD storage
effective_cache_size = 6GB # 75% of available RAM
# Parallel Query
max_parallel_workers_per_gather = 4 # Depends on CPU cores
max_parallel_workers = 8 # Maximum parallel workers
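After editing postgresql.conf, reload the server and confirm the settings took effect. A minimal check using psycopg2 might look like this (the connection parameters are placeholders for your environment):
import psycopg2

# Placeholder connection settings; adjust for your environment
conn = psycopg2.connect(host="localhost", dbname="your_database",
                        user="your_user", password="your_password")
cur = conn.cursor()

# Print the effective value of each setting we care about
for setting in ("shared_buffers", "work_mem", "wal_level", "effective_cache_size"):
    cur.execute(f"SHOW {setting}")
    print(setting, "=", cur.fetchone()[0])

cur.close()
conn.close()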
Data Type Mapping
One of the most crucial aspects of migration is mapping MySQL data types to their PostgreSQL equivalents. Here's a Python helper that covers the most common types:
import re

def get_postgresql_type(mysql_type, column_name, is_primary_key=False):
"""
Maps MySQL data types to PostgreSQL equivalents.
"""
# Remove size specifications from integer types
mysql_type = re.sub(r'int\(\d+\)', 'int', mysql_type.lower())
type_mapping = {
# Numeric types
'tinyint': 'smallint',
'smallint': 'smallint',
'mediumint': 'integer',
'int': 'integer',
'bigint': 'bigint',
'float': 'real',
'double': 'double precision',
'decimal': 'decimal',
# String types
'char': 'char',
'varchar': 'varchar',
'tinytext': 'text',
'text': 'text',
'mediumtext': 'text',
'longtext': 'text',
# Binary types
'binary': 'bytea',
'varbinary': 'bytea',
'tinyblob': 'bytea',
'blob': 'bytea',
'mediumblob': 'bytea',
'longblob': 'bytea',
# Date/Time types
'date': 'date',
'datetime': 'timestamp',
'timestamp': 'timestamp',
'time': 'time',
'year': 'integer',
# Special types
'enum': 'text', # Consider creating a CHECK constraint
'set': 'text[]',
'json': 'jsonb',
}
    # Map integer primary keys (typically AUTO_INCREMENT in MySQL) to serial types
    if is_primary_key and 'int' in mysql_type:
        return 'BIGSERIAL' if 'bigint' in mysql_type else 'SERIAL'
# Extract base type without size specification
base_type = mysql_type.split('(')[0].lower()
return type_mapping.get(base_type, 'text')
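A quick sanity check helps confirm the conversions before generating any DDL. The column definitions below are purely illustrative:
# Illustrative column definitions, not taken from a real schema
print(get_postgresql_type('int(11)', 'id', is_primary_key=True))  # SERIAL
print(get_postgresql_type('varchar(255)', 'email'))               # varchar
print(get_postgresql_type('datetime', 'created_at'))              # timestamp
print(get_postgresql_type('longblob', 'avatar'))                  # bytea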
Migration Process
Step 1: Schema Migration
First, we’ll create a script to convert MySQL schema to PostgreSQL:
import re
def convert_mysql_schema(mysql_schema):
"""
Converts MySQL CREATE TABLE statements to PostgreSQL syntax.
"""
# Replace backticks with double quotes
schema = re.sub(r'`([^`]*)`', r'"\1"', mysql_schema)
    # Replace integer AUTO_INCREMENT columns with SERIAL / BIGSERIAL
    schema = re.sub(r'\bbigint(\(\d+\))?(\s+unsigned)?(\s+NOT NULL)?\s+AUTO_INCREMENT',
                    'BIGSERIAL', schema, flags=re.IGNORECASE)
    schema = re.sub(r'\b(?:tiny|small|medium)?int(\(\d+\))?(\s+unsigned)?(\s+NOT NULL)?\s+AUTO_INCREMENT',
                    'SERIAL', schema, flags=re.IGNORECASE)
    # Remove ENGINE=InnoDB and similar table options
    schema = re.sub(r'\s*ENGINE\s*=\s*\w+', '', schema, flags=re.IGNORECASE)
    # Replace UNSIGNED with a CHECK constraint, keeping the column type
    def replace_unsigned(match):
        column_name, column_type = match.group(1), match.group(2)
        return f'{column_name} {column_type} CHECK ({column_name} >= 0)'
    schema = re.sub(r'("[^"]+")\s+(\w+(?:\([\d,]+\))?)\s+UNSIGNED',
                    replace_unsigned, schema, flags=re.IGNORECASE)
    return schema
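Assuming the schema.sql dump produced in the assessment step, the converter can be applied end to end like this (the output file name is just a suggestion):
# Convert the dumped MySQL schema and write the PostgreSQL version
with open('schema.sql', 'r') as f:
    mysql_schema = f.read()

pg_schema = convert_mysql_schema(mysql_schema)

with open('schema_postgres.sql', 'w') as f:
    f.write(pg_schema)

# Review schema_postgres.sql by hand before loading it with psql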
Step 2: Data Migration
Here’s a robust Python script for migrating data in chunks:
import psycopg2
import mysql.connector
from datetime import datetime
def migrate_table(mysql_conn, pg_conn, table_name, batch_size=10000):
"""
Migrates data from MySQL to PostgreSQL in batches.
"""
mysql_cursor = mysql_conn.cursor(dictionary=True)
pg_cursor = pg_conn.cursor()
# Get total rows
mysql_cursor.execute(f"SELECT COUNT(*) as count FROM {table_name}")
total_rows = mysql_cursor.fetchone()['count']
    # Migrate in batches; add an ORDER BY on the primary key so the
    # LIMIT/OFFSET pages are stable across iterations
for offset in range(0, total_rows, batch_size):
mysql_cursor.execute(
f"SELECT * FROM {table_name} LIMIT {batch_size} OFFSET {offset}"
)
rows = mysql_cursor.fetchall()
if not rows:
break
# Prepare batch insert query
columns = rows[0].keys()
placeholders = ','.join(['%s'] * len(columns))
insert_query = f"""
INSERT INTO "{table_name}" ({','.join(f'"{col}"' for col in columns)})
VALUES ({placeholders})
"""
# Convert MySQL-specific values
converted_rows = []
for row in rows:
converted_row = []
for value in row.values():
if isinstance(value, datetime):
value = value.replace(microsecond=0)
converted_row.append(value)
converted_rows.append(converted_row)
# Execute batch insert
pg_cursor.executemany(insert_query, converted_rows)
pg_conn.commit()
print(f"Migrated {offset + len(rows)} of {total_rows} rows in {table_name}")
Step 3: Constraints and Indexes
After data migration, we need to recreate constraints and indexes:
-- First, disable triggers
ALTER TABLE your_table DISABLE TRIGGER ALL;
-- Add foreign key constraints
ALTER TABLE orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES customers(id);
-- Add unique constraints
ALTER TABLE users
ADD CONSTRAINT unique_email UNIQUE (email);
-- Create indexes
CREATE INDEX idx_order_date ON orders USING btree (order_date);
CREATE INDEX idx_customer_name ON customers USING btree (last_name, first_name);
-- Re-enable triggers
ALTER TABLE your_table ENABLE TRIGGER ALL;
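If you keep these statements in a file, a short script can apply them one at a time and report any statement that fails, which is easier to debug than one large psql run. This is only a sketch; constraints.sql stands for whatever file you saved the statements above into:
import psycopg2

conn = psycopg2.connect(dbname='your_database', user='your_user')
conn.autocommit = True  # let each DDL statement commit on its own
cur = conn.cursor()

with open('constraints.sql') as f:
    # Naive split on ';' is fine for simple DDL like the statements above
    statements = [s.strip() for s in f.read().split(';') if s.strip()]

for stmt in statements:
    try:
        cur.execute(stmt)
        print('OK:', stmt.splitlines()[0])
    except psycopg2.Error as e:
        print('FAILED:', stmt.splitlines()[0], '-', e)

conn.close()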
Handling Special Cases
Converting MySQL-specific Features
ENUM Types
Instead of MySQL’s ENUM, create a custom type in PostgreSQL:
-- Create a type for status
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');
-- Convert existing column
ALTER TABLE orders
ALTER COLUMN status TYPE order_status
USING status::order_status;
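If you have many ENUM columns, the CREATE TYPE statements can be generated from the MySQL column definitions rather than written by hand. A rough sketch, assuming the usual enum('a','b',...) syntax from SHOW CREATE TABLE or the schema dump (it does not handle values containing quotes):
import re

def enum_to_create_type(type_name, mysql_enum_definition):
    """Builds a CREATE TYPE statement from a simple MySQL enum(...) definition."""
    values = re.findall(r"'([^']*)'", mysql_enum_definition)
    quoted = ', '.join(f"'{v}'" for v in values)
    return f'CREATE TYPE {type_name} AS ENUM ({quoted});'

# Example using the order status column from above
print(enum_to_create_type(
    'order_status',
    "enum('pending','processing','completed','cancelled')"))
# CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');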
Group Concatenation
Replace MySQL’s GROUP_CONCAT with PostgreSQL’s STRING_AGG:
-- MySQL version:
SELECT customer_id, GROUP_CONCAT(product_name)
FROM orders
GROUP BY customer_id;
-- PostgreSQL version:
SELECT customer_id, STRING_AGG(product_name, ',')
FROM orders
GROUP BY customer_id;
Handling JSON Data
PostgreSQL's JSONB type stores documents in a decomposed binary format and supports GIN indexing, which generally makes it faster to query than MySQL's JSON type:
-- Convert JSON column to JSONB
ALTER TABLE products
ALTER COLUMN metadata TYPE jsonb
USING metadata::jsonb;
-- Update JSON querying syntax in your application queries
-- MySQL:      SELECT JSON_EXTRACT(metadata, '$.name') FROM products;
-- PostgreSQL: ->> returns text, -> returns jsonb
SELECT metadata->>'name'
FROM products;
Post-Migration Tasks
Performance Optimization
After migration, optimize your PostgreSQL database:
-- Update table statistics
ANALYZE VERBOSE your_table;
-- Rebuild indexes
REINDEX TABLE your_table;
-- Vacuum to reclaim space and update statistics
VACUUM ANALYZE your_table;
-- Create appropriate indexes based on query patterns
CREATE INDEX CONCURRENTLY idx_your_column
ON your_table (your_column);
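One caveat when scripting this step: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so from Python it needs an autocommit connection. A minimal sketch, reusing the placeholder index and column names from the statement above:
import psycopg2

conn = psycopg2.connect(dbname='your_database', user='your_user')
conn.autocommit = True  # CONCURRENTLY refuses to run inside a transaction
cur = conn.cursor()

cur.execute("""
    CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_your_column
    ON your_table (your_column)
""")

conn.close()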
Application Changes
Update your application code to use PostgreSQL-specific features:
# MySQL datetime handling
cursor.execute("SELECT DATE_FORMAT(created_at, '%Y-%m-%d') FROM orders")
# PostgreSQL datetime handling
cursor.execute("SELECT to_char(created_at, 'YYYY-MM-DD') FROM orders")
# MySQL limit with offset
cursor.execute("SELECT * FROM products LIMIT 10 OFFSET 20")
# PostgreSQL limit with offset (the LIMIT/OFFSET syntax is the same in both databases)
cursor.execute("SELECT * FROM products LIMIT 10 OFFSET 20")
Monitoring and Verification
Create verification queries to ensure data integrity:
-- Compare row counts (run each query on its own server; both columns are
-- estimates, so use SELECT COUNT(*) on critical tables for exact checks)
-- MySQL:
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database';
-- PostgreSQL:
SELECT relname AS table_name, n_live_tup
FROM pg_stat_user_tables;
-- Compare data checksums (order the rows deterministically, e.g. by primary
-- key, and use the same separator on both sides so the hashes are comparable)
-- MySQL
SELECT MD5(GROUP_CONCAT(column_list ORDER BY id SEPARATOR ','))
FROM your_table;
-- PostgreSQL
SELECT MD5(STRING_AGG(column_list::text, ',' ORDER BY id))
FROM your_table;
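Because the estimates and checksums only go so far, a small script that issues exact COUNT(*) queries on both servers and reports mismatches makes a practical final check. A sketch, with placeholder connection settings and table names:
import mysql.connector
import psycopg2

# Placeholder connection settings
mysql_conn = mysql.connector.connect(
    host='localhost', user='your_user', password='your_password',
    database='your_database')
pg_conn = psycopg2.connect(
    host='localhost', user='your_user', password='your_password',
    dbname='your_database')

mysql_cur = mysql_conn.cursor()
pg_cur = pg_conn.cursor()

# Tables to verify; list them explicitly or read them from the catalogs
for table in ['customers', 'orders', 'users', 'products']:
    mysql_cur.execute(f"SELECT COUNT(*) FROM {table}")
    pg_cur.execute(f'SELECT COUNT(*) FROM "{table}"')
    mysql_count = mysql_cur.fetchone()[0]
    pg_count = pg_cur.fetchone()[0]
    status = 'OK' if mysql_count == pg_count else 'MISMATCH'
    print(f'{table}: mysql={mysql_count} postgresql={pg_count} {status}')

mysql_conn.close()
pg_conn.close()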
Conclusion
Migrating from MySQL to PostgreSQL requires careful planning and execution, but the benefits often outweigh the challenges. This guide covered the essential aspects of migration, but remember that each database is unique and may require specific adjustments.
Keep these key points in mind:
- Always have a backup before starting the migration
- Test the migration process thoroughly in a staging environment
- Plan for downtime during the final migration
- Monitor performance after migration and optimize as needed
- Update application code to take advantage of PostgreSQL-specific features
The migration process might seem daunting, but by following this guide and testing thoroughly, you can ensure a successful transition to PostgreSQL.