Migrating from MySQL to PostgreSQL is a significant undertaking that requires careful planning and execution. This comprehensive guide will walk you through the entire migration process, from initial preparation to final deployment, ensuring a smooth transition between these database systems.
Understanding the Migration Journey
Before diving into the technical details, it’s important to understand that migrating between database systems is more than just moving data. It involves transforming data types, adjusting SQL syntax, and often rethinking certain database design decisions. Think of it as translating between two languages that share some common words but have different grammar rules and idioms.
Preparation Phase
Assessment and Planning
The first step in any successful migration is a thorough assessment of your current MySQL database. This helps identify potential challenges and create a detailed migration strategy. Let’s create a simple shell script to gather information about your MySQL database:
#!/bin/bash
# Save MySQL database structure
mysqldump -u your_user -p --no-data your_database > schema.sql
# Get table sizes and row counts
mysql -u your_user -p -e "
SELECT
table_name,
table_rows,
data_length/1024/1024 as data_size_mb,
index_length/1024/1024 as index_size_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY data_length DESC;" > database_stats.txt
# List all triggers, views, and stored procedures
mysql -u your_user -p -e "
SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE
FROM information_schema.TRIGGERS
WHERE TRIGGER_SCHEMA = 'your_database';" > triggers.txt
mysql -u your_user -p your_database -e "
SHOW FULL TABLES
WHERE TABLE_TYPE LIKE 'VIEW';" > views.txt
mysql -u your_user -p -e "
SHOW PROCEDURE STATUS
WHERE Db = 'your_database';" > procedures.txt
Setting Up the PostgreSQL Environment
Before beginning the migration, ensure PostgreSQL is properly configured. Here’s a basic postgresql.conf configuration that works well for most migrations:
# Memory Configuration
shared_buffers = 2GB # ~25% of available RAM (figures here assume an 8GB server)
work_mem = 32MB # Helps with complex sorts
maintenance_work_mem = 256MB # Helps with index creation
# Write Ahead Log
wal_level = replica # Enables WAL archiving
max_wal_size = 1GB # Maximum WAL size before checkpoint
min_wal_size = 80MB # Minimum WAL size
# Query Planner
random_page_cost = 1.1 # Assumes SSD storage
effective_cache_size = 6GB # 75% of available RAM
# Parallel Query
max_parallel_workers_per_gather = 4 # Depends on CPU cores
max_parallel_workers = 8 # Maximum parallel workers
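After editing postgresql.conf, reload the server and confirm the settings took effect. A minimal check using psycopg2 might look like this (the connection parameters are placeholders for your environment):
import psycopg2

# Placeholder connection settings; adjust for your environment
conn = psycopg2.connect(host="localhost", dbname="your_database",
                        user="your_user", password="your_password")
cur = conn.cursor()

# Print the effective value of each setting we care about
for setting in ("shared_buffers", "work_mem", "wal_level", "effective_cache_size"):
    cur.execute(f"SHOW {setting}")
    print(setting, "=", cur.fetchone()[0])

cur.close()
conn.close()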
Data Type Mapping
One of the most crucial aspects of migration is mapping MySQL data types to their PostgreSQL equivalents. Here's a Python helper that covers the most common types:
import re

def get_postgresql_type(mysql_type, column_name, is_primary_key=False):
"""
Maps MySQL data types to PostgreSQL equivalents.
"""
# Remove size specifications from integer types
mysql_type = re.sub(r'int\(\d+\)', 'int', mysql_type.lower())
type_mapping = {
# Numeric types
'tinyint': 'smallint',
'smallint': 'smallint',
'mediumint': 'integer',
'int': 'integer',
'bigint': 'bigint',
'float': 'real',
'double': 'double precision',
'decimal': 'decimal',
# String types
'char': 'char',
'varchar': 'varchar',
'tinytext': 'text',
'text': 'text',
'mediumtext': 'text',
'longtext': 'text',
# Binary types
'binary': 'bytea',
'varbinary': 'bytea',
'tinyblob': 'bytea',
'blob': 'bytea',
'mediumblob': 'bytea',
'longblob': 'bytea',
# Date/Time types
'date': 'date',
'datetime': 'timestamp',
'timestamp': 'timestamp',
'time': 'time',
'year': 'integer',
# Special types
'enum': 'text', # Consider creating a CHECK constraint
'set': 'text[]',
'json': 'jsonb',
}
    # Map integer primary keys (typically AUTO_INCREMENT in MySQL) to serial types
    if is_primary_key and 'int' in mysql_type:
        return 'BIGSERIAL' if 'bigint' in mysql_type else 'SERIAL'
# Extract base type without size specification
base_type = mysql_type.split('(')[0].lower()
return type_mapping.get(base_type, 'text')
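A quick sanity check helps confirm the conversions before generating any DDL. The column definitions below are purely illustrative:
# Illustrative column definitions, not taken from a real schema
print(get_postgresql_type('int(11)', 'id', is_primary_key=True))  # SERIAL
print(get_postgresql_type('varchar(255)', 'email'))               # varchar
print(get_postgresql_type('datetime', 'created_at'))              # timestamp
print(get_postgresql_type('longblob', 'avatar'))                  # bytea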
Migration Process
Step 1: Schema Migration
First, we’ll create a script to convert MySQL schema to PostgreSQL:
import re
def convert_mysql_schema(mysql_schema):
"""
Converts MySQL CREATE TABLE statements to PostgreSQL syntax.
"""
# Replace backticks with double quotes
schema = re.sub(r'`([^`]*)`', r'"\1"', mysql_schema)
    # Replace integer AUTO_INCREMENT columns with SERIAL / BIGSERIAL
    schema = re.sub(r'\bbigint(\(\d+\))?(\s+unsigned)?(\s+NOT NULL)?\s+AUTO_INCREMENT',
                    'BIGSERIAL', schema, flags=re.IGNORECASE)
    schema = re.sub(r'\b(?:tiny|small|medium)?int(\(\d+\))?(\s+unsigned)?(\s+NOT NULL)?\s+AUTO_INCREMENT',
                    'SERIAL', schema, flags=re.IGNORECASE)
    # Remove ENGINE=InnoDB and similar table options
    schema = re.sub(r'\s*ENGINE\s*=\s*\w+', '', schema, flags=re.IGNORECASE)
    # Replace UNSIGNED with a CHECK constraint, keeping the column type
    def replace_unsigned(match):
        column_name, column_type = match.group(1), match.group(2)
        return f'{column_name} {column_type} CHECK ({column_name} >= 0)'
    schema = re.sub(r'("[^"]+")\s+(\w+(?:\([\d,]+\))?)\s+UNSIGNED',
                    replace_unsigned, schema, flags=re.IGNORECASE)
    return schema
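Assuming the schema.sql dump produced in the assessment step, the converter can be applied end to end like this (the output file name is just a suggestion):
# Convert the dumped MySQL schema and write the PostgreSQL version
with open('schema.sql', 'r') as f:
    mysql_schema = f.read()

pg_schema = convert_mysql_schema(mysql_schema)

with open('schema_postgres.sql', 'w') as f:
    f.write(pg_schema)

# Review schema_postgres.sql by hand before loading it with psql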
Step 2: Data Migration
Here’s a robust Python script for migrating data in chunks:
import psycopg2
import mysql.connector
from datetime import datetime
def migrate_table(mysql_conn, pg_conn, table_name, batch_size=10000):
"""
Migrates data from MySQL to PostgreSQL in batches.
"""
mysql_cursor = mysql_conn.cursor(dictionary=True)
pg_cursor = pg_conn.cursor()
# Get total rows
mysql_cursor.execute(f"SELECT COUNT(*) as count FROM {table_name}")
total_rows = mysql_cursor.fetchone()['count']
    # Migrate in batches; add an ORDER BY on the primary key so the
    # LIMIT/OFFSET pages are stable across iterations
for offset in range(0, total_rows, batch_size):
mysql_cursor.execute(
f"SELECT * FROM {table_name} LIMIT {batch_size} OFFSET {offset}"
)
rows = mysql_cursor.fetchall()
if not rows:
break
# Prepare batch insert query
columns = rows[0].keys()
placeholders = ','.join(['%s'] * len(columns))
insert_query = f"""
INSERT INTO "{table_name}" ({','.join(f'"{col}"' for col in columns)})
VALUES ({placeholders})
"""
# Convert MySQL-specific values
converted_rows = []
for row in rows:
converted_row = []
for value in row.values():
if isinstance(value, datetime):
value = value.replace(microsecond=0)
converted_row.append(value)
converted_rows.append(converted_row)
# Execute batch insert
pg_cursor.executemany(insert_query, converted_rows)
pg_conn.commit()
print(f"Migrated {offset + len(rows)} of {total_rows} rows in {table_name}")
Step 3: Constraints and Indexes
After data migration, we need to recreate constraints and indexes:
-- First, disable triggers
ALTER TABLE your_table DISABLE TRIGGER ALL;
-- Add foreign key constraints
ALTER TABLE orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES customers(id);
-- Add unique constraints
ALTER TABLE users
ADD CONSTRAINT unique_email UNIQUE (email);
-- Create indexes
CREATE INDEX idx_order_date ON orders USING btree (order_date);
CREATE INDEX idx_customer_name ON customers USING btree (last_name, first_name);
-- Re-enable triggers
ALTER TABLE your_table ENABLE TRIGGER ALL;
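If you keep these statements in a file, a short script can apply them one at a time and report any statement that fails, which is easier to debug than one large psql run. This is only a sketch; constraints.sql stands for whatever file you saved the statements above into:
import psycopg2

conn = psycopg2.connect(dbname='your_database', user='your_user')
conn.autocommit = True  # let each DDL statement commit on its own
cur = conn.cursor()

with open('constraints.sql') as f:
    # Naive split on ';' is fine for simple DDL like the statements above
    statements = [s.strip() for s in f.read().split(';') if s.strip()]

for stmt in statements:
    try:
        cur.execute(stmt)
        print('OK:', stmt.splitlines()[0])
    except psycopg2.Error as e:
        print('FAILED:', stmt.splitlines()[0], '-', e)

conn.close()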
Handling Special Cases
Converting MySQL-specific Features
ENUM Types
Instead of MySQL’s ENUM, create a custom type in PostgreSQL:
-- Create a type for status
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');
-- Convert existing column
ALTER TABLE orders
ALTER COLUMN status TYPE order_status
USING status::order_status;
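If you have many ENUM columns, the CREATE TYPE statements can be generated from the MySQL column definitions rather than written by hand. A rough sketch, assuming the usual enum('a','b',...) syntax from SHOW CREATE TABLE or the schema dump (it does not handle values containing quotes):
import re

def enum_to_create_type(type_name, mysql_enum_definition):
    """Builds a CREATE TYPE statement from a simple MySQL enum(...) definition."""
    values = re.findall(r"'([^']*)'", mysql_enum_definition)
    quoted = ', '.join(f"'{v}'" for v in values)
    return f'CREATE TYPE {type_name} AS ENUM ({quoted});'

# Example using the order status column from above
print(enum_to_create_type(
    'order_status',
    "enum('pending','processing','completed','cancelled')"))
# CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');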
Group Concatenation
Replace MySQL’s GROUP_CONCAT with PostgreSQL’s STRING_AGG:
-- MySQL version:
SELECT customer_id, GROUP_CONCAT(product_name)
FROM orders
GROUP BY customer_id;
-- PostgreSQL version:
SELECT customer_id, STRING_AGG(product_name, ',')
FROM orders
GROUP BY customer_id;
Handling JSON Data
PostgreSQL's JSONB type stores documents in a decomposed binary format and supports GIN indexing, which generally makes it faster to query than MySQL's JSON type:
-- Convert JSON column to JSONB
ALTER TABLE products
ALTER COLUMN metadata TYPE jsonb
USING metadata::jsonb;
-- Update JSON querying syntax in your application queries
-- MySQL:      SELECT JSON_EXTRACT(metadata, '$.name') FROM products;
-- PostgreSQL: ->> returns text, -> returns jsonb
SELECT metadata->>'name'
FROM products;
Post-Migration Tasks
Performance Optimization
After migration, optimize your PostgreSQL database:
-- Update table statistics
ANALYZE VERBOSE your_table;
-- Rebuild indexes
REINDEX TABLE your_table;
-- Vacuum to reclaim space and update statistics
VACUUM ANALYZE your_table;
-- Create appropriate indexes based on query patterns
CREATE INDEX CONCURRENTLY idx_your_column
ON your_table (your_column);
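One caveat when scripting this step: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so from Python it needs an autocommit connection. A minimal sketch, reusing the placeholder index and column names from the statement above:
import psycopg2

conn = psycopg2.connect(dbname='your_database', user='your_user')
conn.autocommit = True  # CONCURRENTLY refuses to run inside a transaction
cur = conn.cursor()

cur.execute("""
    CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_your_column
    ON your_table (your_column)
""")

conn.close()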
Application Changes
Update your application code to use PostgreSQL-specific features:
# MySQL datetime handling
cursor.execute("SELECT DATE_FORMAT(created_at, '%Y-%m-%d') FROM orders")
# PostgreSQL datetime handling
cursor.execute("SELECT to_char(created_at, 'YYYY-MM-DD') FROM orders")
# MySQL limit with offset
cursor.execute("SELECT * FROM products LIMIT 10 OFFSET 20")
# PostgreSQL limit with offset (the LIMIT/OFFSET syntax is the same in both databases)
cursor.execute("SELECT * FROM products LIMIT 10 OFFSET 20")
Monitoring and Verification
Create verification queries to ensure data integrity:
-- Compare row counts (run each query on its own server; both columns are
-- estimates, so use SELECT COUNT(*) on critical tables for exact checks)
-- MySQL:
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database';
-- PostgreSQL:
SELECT relname AS table_name, n_live_tup
FROM pg_stat_user_tables;
-- Compare data checksums (order the rows deterministically, e.g. by primary
-- key, and use the same separator on both sides so the hashes are comparable)
-- MySQL
SELECT MD5(GROUP_CONCAT(column_list ORDER BY id SEPARATOR ','))
FROM your_table;
-- PostgreSQL
SELECT MD5(STRING_AGG(column_list::text, ',' ORDER BY id))
FROM your_table;
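Because the estimates and checksums only go so far, a small script that issues exact COUNT(*) queries on both servers and reports mismatches makes a practical final check. A sketch, with placeholder connection settings and table names:
import mysql.connector
import psycopg2

# Placeholder connection settings
mysql_conn = mysql.connector.connect(
    host='localhost', user='your_user', password='your_password',
    database='your_database')
pg_conn = psycopg2.connect(
    host='localhost', user='your_user', password='your_password',
    dbname='your_database')

mysql_cur = mysql_conn.cursor()
pg_cur = pg_conn.cursor()

# Tables to verify; list them explicitly or read them from the catalogs
for table in ['customers', 'orders', 'users', 'products']:
    mysql_cur.execute(f"SELECT COUNT(*) FROM {table}")
    pg_cur.execute(f'SELECT COUNT(*) FROM "{table}"')
    mysql_count = mysql_cur.fetchone()[0]
    pg_count = pg_cur.fetchone()[0]
    status = 'OK' if mysql_count == pg_count else 'MISMATCH'
    print(f'{table}: mysql={mysql_count} postgresql={pg_count} {status}')

mysql_conn.close()
pg_conn.close()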
Conclusion
Migrating from MySQL to PostgreSQL requires careful planning and execution, but the benefits often outweigh the challenges. This guide covered the essential aspects of migration, but remember that each database is unique and may require specific adjustments.
Keep these key points in mind:
- Always have a backup before starting the migration
- Test the migration process thoroughly in a staging environment
- Plan for downtime during the final migration
- Monitor performance after migration and optimize as needed
- Update application code to take advantage of PostgreSQL-specific features
The migration process might seem daunting, but by following this guide and testing thoroughly, you can ensure a successful transition to PostgreSQL.