Harith Zahid

Database Normalization & Design Fundamentals

A beginner-friendly guide to designing clean, efficient database schemas that scale.


Table of Contents

  1. What is Normalization?
  2. From Messy to Clean Design
  3. Many-to-Many Relationships
  4. Foreign Keys & Referential Integrity
  5. When to Break the Rules
  6. Real-World Example
  7. Quick Reference

1. What is Normalization?

Normalization is the process of organizing your database to reduce data redundancy and improve data integrity. Think of it as decluttering your database.

Why Normalize?

Without normalization:

  • 😱 Same data stored in multiple places (waste of space)
  • 🐛 Update one place, forget another (data inconsistency)
  • 🤔 Hard to add new data without existing data
  • 🗑️ Deleting one thing accidentally deletes other things

With normalization:

  • ✅ Each piece of data stored once
  • ✅ Easy to update (change in one place)
  • ✅ No weird data anomalies
  • ✅ Clear relationships between data

The Three Levels

We'll focus on getting to Third Normal Form (3NF) - the sweet spot for most applications:

  1. First Normal Form (1NF) - No repeating groups
  2. Second Normal Form (2NF) - Separate different types of data
  3. Third Normal Form (3NF) - Remove indirect relationships

2. From Messy to Clean Design

Let's design an order system step-by-step.

❌ LEVEL 0: Unnormalized (The Mess)

Everything crammed into one table:

// DON'T DO THIS!
exports.up = function(knex) {
  return knex.schema.createTable('orders_messy', (table) => {
    table.increments('id');
    table.string('customer_name');
    table.string('customer_email');
    table.string('customer_phone');
    table.string('customer_address');
    // Repeating groups - can only store 3 products!
    table.string('product1_name');
    table.decimal('product1_price');
    table.integer('product1_quantity');
    table.string('product2_name');
    table.decimal('product2_price');
    table.integer('product2_quantity');
    table.string('product3_name');
    table.decimal('product3_price');
    table.integer('product3_quantity');
  });
};

Problems:

  • Limited to 3 products per order
  • Lots of empty columns if order has fewer products
  • Customer info duplicated for every order
  • Can't add a 4th product without changing table structure

✅ LEVEL 1: First Normal Form (1NF)

Rule: Each column has one value, no repeating groups.

// Better: Unlimited products per order
exports.up = function(knex) {
  return knex.schema.createTable('orders_1nf', (table) => {
    table.increments('order_id');
    table.string('customer_name');
    table.string('customer_email');
    table.string('customer_phone');
    table.string('customer_address');
    table.string('product_name');
    table.decimal('product_price', 10, 2);
    table.integer('quantity');
    table.timestamp('order_date').defaultTo(knex.fn.now());
  });
};

// Example data:
// order_id | customer_name | customer_email    | product_name | price  | qty
// 1        | John Doe      | john@example.com  | Laptop       | 999.99 | 1
// 1        | John Doe      | john@example.com  | Mouse        | 25.99  | 2
//          ^ Same customer info repeated for each product - still wasteful!

Fixed: No more repeating columns
Problem: Customer info still duplicates

✅ LEVEL 2: Second Normal Form (2NF)

Rule: Separate different types of data into their own tables.

// Split customers and orders
exports.up = function(knex) {
  return knex.schema
    .createTable('customers', (table) => {
      table.increments('customer_id').primary();
      table.string('name').notNullable();
      table.string('email').notNullable().unique();
      table.string('phone');
      table.string('address');
      table.timestamps(true, true);
    })
    .createTable('orders_2nf', (table) => {
      table.increments('order_id').primary();
      table.integer('customer_id').unsigned().notNullable();
      table.string('product_name');
      table.decimal('product_price', 10, 2);
      table.integer('quantity');
      table.timestamp('order_date').defaultTo(knex.fn.now());
      
      // Link to customer table
      table.foreign('customer_id')
        .references('customer_id')
        .inTable('customers');
    });
};

// Example data:
// customers:
// customer_id | name     | email
// 1           | John Doe | john@example.com
//
// orders_2nf:
// order_id | customer_id | product_name | price  | qty
// 1        | 1           | Laptop       | 999.99 | 1
// 1        | 1           | Mouse        | 25.99  | 2
//          ^ Just references customer ID - much better!

Fixed: Customer info stored once
Problem: Product name and price are related (if price changes, should update everywhere)

✅ LEVEL 3: Third Normal Form (3NF) - The Goal!

Rule: Each table represents ONE thing, no indirect relationships.

// Perfect: Customers, Products, Orders, and Order Items
exports.up = function(knex) {
  return knex.schema
    // Table 1: Customers
    .createTable('customers', (table) => {
      table.increments('customer_id').primary();
      table.string('name').notNullable();
      table.string('email').notNullable().unique();
      table.string('phone');
      table.string('address');
      table.timestamps(true, true);
    })
    // Table 2: Products
    .createTable('products', (table) => {
      table.increments('product_id').primary();
      table.string('name').notNullable();
      table.text('description');
      table.decimal('price', 10, 2).notNullable();
      table.integer('stock_quantity').defaultTo(0);
      table.timestamps(true, true);
    })
    // Table 3: Orders
    .createTable('orders', (table) => {
      table.increments('order_id').primary();
      table.integer('customer_id').unsigned().notNullable();
      table.decimal('total_amount', 10, 2).notNullable();
      table.enum('status', ['pending', 'processing', 'shipped', 'delivered'])
        .defaultTo('pending');
      table.timestamp('order_date').defaultTo(knex.fn.now());
      
      table.foreign('customer_id')
        .references('customer_id')
        .inTable('customers');
    })
    // Table 4: Order Items (connects orders and products)
    .createTable('order_items', (table) => {
      table.increments('order_item_id').primary();
      table.integer('order_id').unsigned().notNullable();
      table.integer('product_id').unsigned().notNullable();
      table.integer('quantity').notNullable();
      table.decimal('price_at_purchase', 10, 2).notNullable(); // Historical price
      
      table.foreign('order_id')
        .references('order_id')
        .inTable('orders')
        .onDelete('CASCADE'); // Delete items when order is deleted
      
      table.foreign('product_id')
        .references('product_id')
        .inTable('products');
      
      // Prevent same product twice in one order
      table.unique(['order_id', 'product_id']);
    });
};

✅ Perfect! Each table represents one thing:

  • customers - Customer information
  • products - Product catalog
  • orders - Order headers
  • order_items - What's in each order

Key Design Decisions:

  1. price_at_purchase - Stores price snapshot when ordered
    • Why? If product price changes tomorrow, old orders stay correct
  2. total_amount in orders - Denormalized for convenience
    • Could calculate from order_items, but commonly needed
  3. CASCADE delete - When order is deleted, its items delete too
    • Makes sense: order items can't exist without an order

3. Many-to-Many Relationships

Sometimes things have many-to-many relationships. Example: Products can have multiple categories, and categories can have multiple products.

Junction Tables

Use a "junction table" (also called "join table" or "bridge table") to connect them:

exports.up = function(knex) {
  return knex.schema
    // Categories table
    .createTable('categories', (table) => {
      table.increments('category_id').primary();
      table.string('name').notNullable().unique();
      table.text('description');
    })
    // Junction table (connects products and categories)
    .createTable('product_categories', (table) => {
      table.integer('product_id').unsigned().notNullable();
      table.integer('category_id').unsigned().notNullable();
      
      // Composite primary key
      table.primary(['product_id', 'category_id']);
      
      table.foreign('product_id')
        .references('product_id')
        .inTable('products')
        .onDelete('CASCADE');
      
      table.foreign('category_id')
        .references('category_id')
        .inTable('categories')
        .onDelete('CASCADE');
    });
};

// Usage
async function assignCategories() {
  // Insert product
  const [laptopId] = await knex('products').insert({
    name: 'Gaming Laptop',
    price: 1499.99
  });
  
  // Insert categories
  const [electronicsId] = await knex('categories').insert({ name: 'Electronics' });
  const [gamingId] = await knex('categories').insert({ name: 'Gaming' });
  
  // Connect product to multiple categories
  await knex('product_categories').insert([
    { product_id: laptopId, category_id: electronicsId },
    { product_id: laptopId, category_id: gamingId }
  ]);
}

// Query products with their categories
async function getProductsWithCategories() {
  return knex('products as p')
    .select('p.name as product', 'c.name as category')
    .leftJoin('product_categories as pc', 'p.product_id', 'pc.product_id')
    .leftJoin('categories as c', 'pc.category_id', 'c.category_id');
}

Pattern: Product ↔ ProductCategories ↔ Category


4. Foreign Keys & Referential Integrity

Foreign keys enforce relationships between tables - they're your safety net.

Basic Foreign Key

table.foreign('customer_id')
  .references('customer_id')
  .inTable('customers');

This ensures every customer_id in orders actually exists in the customers table!

Cascade Options

What happens when you delete a customer who has orders?

// Option 1: RESTRICT (default) - Prevent deletion
table.foreign('customer_id')
  .references('customer_id')
  .inTable('customers')
  .onDelete('RESTRICT'); // Error if customer has orders

// Option 2: CASCADE - Delete related records too
table.foreign('order_id')
  .references('order_id')
  .inTable('orders')
  .onDelete('CASCADE'); // Delete order items when order is deleted

// Option 3: SET NULL - Set to null
table.foreign('manager_id')
  .references('employee_id')
  .inTable('employees')
  .onDelete('SET NULL'); // Remove manager reference when manager is deleted

When to use each:

Action When to Use
RESTRICT Can't delete customers with orders (preserves history)
CASCADE Delete order items when order is deleted (dependent data)
SET NULL Optional relationships (manager gets deleted, employee stays)

5. When to Break the Rules

Sometimes normalized design is TOO normalized. Here's when to denormalize:

Denormalization for Performance

// Add frequently accessed data to avoid JOINs
exports.up = function(knex) {
  return knex.schema.table('orders', (table) => {
    // Store customer name in orders (denormalized)
    table.string('customer_name');
    table.string('customer_email');
    // Still have customer_id foreign key for full details
  });
};

When to denormalize:

  • ✅ Read-heavy data (shown often, rarely changes)
  • ✅ Complex JOINs causing performance issues
  • ✅ Historical snapshots (like price_at_purchase)

When NOT to denormalize:

  • ❌ Data changes frequently
  • ❌ You're just starting out (normalize first!)
  • ❌ Queries are already fast enough

Common Denormalization Patterns

// Pattern 1: Store counts
table.integer('items_count').defaultTo(0); // Avoid COUNT(*) query

// Pattern 2: Store aggregates
table.decimal('total_spent', 12, 2).defaultTo(0); // Avoid SUM() query

// Pattern 3: Store historical snapshots
table.decimal('price_at_purchase', 10, 2); // Price when ordered, not current price

// Pattern 4: Store commonly JOINed data
table.string('customer_name'); // Avoid JOIN for display name

Remember: Denormalized data must be kept in sync! Use triggers or application logic.


6. Real-World Example

Complete e-commerce schema in 3NF:

exports.up = function(knex) {
  return knex.schema
    // Core entities
    .createTable('customers', (table) => {
      table.increments('customer_id').primary();
      table.string('name').notNullable();
      table.string('email').notNullable().unique();
      table.string('phone');
      table.string('address');
      table.timestamps(true, true);
    })
    .createTable('products', (table) => {
      table.increments('product_id').primary();
      table.string('name').notNullable();
      table.text('description');
      table.decimal('price', 10, 2).notNullable();
      table.integer('stock_quantity').defaultTo(0);
      table.timestamps(true, true);
    })
    .createTable('categories', (table) => {
      table.increments('category_id').primary();
      table.string('name').notNullable().unique();
      table.text('description');
    })
    // Relationships
    .createTable('product_categories', (table) => {
      table.integer('product_id').unsigned().notNullable();
      table.integer('category_id').unsigned().notNullable();
      table.primary(['product_id', 'category_id']);
      
      table.foreign('product_id')
        .references('product_id')
        .inTable('products')
        .onDelete('CASCADE');
      table.foreign('category_id')
        .references('category_id')
        .inTable('categories')
        .onDelete('CASCADE');
    })
    .createTable('orders', (table) => {
      table.increments('order_id').primary();
      table.integer('customer_id').unsigned().notNullable();
      table.decimal('total_amount', 10, 2).notNullable();
      table.enum('status', ['pending', 'processing', 'shipped', 'delivered'])
        .defaultTo('pending');
      table.timestamp('order_date').defaultTo(knex.fn.now());
      table.timestamps(true, true);
      
      table.foreign('customer_id')
        .references('customer_id')
        .inTable('customers')
        .onDelete('RESTRICT'); // Keep order history
    })
    .createTable('order_items', (table) => {
      table.increments('order_item_id').primary();
      table.integer('order_id').unsigned().notNullable();
      table.integer('product_id').unsigned().notNullable();
      table.integer('quantity').notNullable();
      table.decimal('price_at_purchase', 10, 2).notNullable();
      
      table.foreign('order_id')
        .references('order_id')
        .inTable('orders')
        .onDelete('CASCADE'); // Items belong to orders
      table.foreign('product_id')
        .references('product_id')
        .inTable('products')
        .onDelete('RESTRICT'); // Keep order history
      
      table.unique(['order_id', 'product_id']);
    });
};

// Create a complete order
async function createOrder(customerId, items) {
  return knex.transaction(async (trx) => {
    // Get products
    const products = await trx('products')
      .whereIn('id', items.map(i => i.product_id))
      .select('*');
    
    // Calculate total
    let total = 0;
    for (const item of items) {
      const product = products.find(p => p.product_id === item.product_id);
      total += product.price * item.quantity;
    }
    
    // Create order
    const [orderId] = await trx('orders').insert({
      customer_id: customerId,
      total_amount: total
    });
    
    // Add items
    for (const item of items) {
      const product = products.find(p => p.product_id === item.product_id);
      
      await trx('order_items').insert({
        order_id: orderId,
        product_id: item.product_id,
        quantity: item.quantity,
        price_at_purchase: product.price // Snapshot current price
      });
      
      // Update stock
      await trx('products')
        .where('product_id', item.product_id)
        .decrement('stock_quantity', item.quantity);
    }
    
    return orderId;
  });
}

7. Quick Reference

Normalization Checklist

Before finalizing your schema:

  • ✅ Each table represents ONE type of thing
  • ✅ No repeating columns (product1, product2, product3...)
  • ✅ No columns that depend on other non-key columns
  • ✅ Foreign keys defined with appropriate cascade rules
  • ✅ Unique constraints where needed (email, username...)
  • ✅ Indexes on foreign keys

Common Table Patterns

One-to-Many:

customers (1) → (many) orders
// orders.customer_id references customers.customer_id

Many-to-Many:

products (many) ↔ (many) categories
// Need junction table: product_categories

Self-Referencing:

categories (parent) → (many) categories (children)
// categories.parent_id references categories.category_id

Foreign Key Decision Tree

Should I use CASCADE, RESTRICT, or SET NULL?

Is the child data meaningful without parent?
├─ NO → Use CASCADE (delete children with parent)
│   Example: Delete order → delete order items
└─ YES → Use RESTRICT (keep history) or SET NULL
    Example: Delete customer → keep orders with NULL customer

Design Process

  1. List your entities - What "things" exist? (Customer, Product, Order)
  2. Define relationships - How do they connect? (Customer has Orders)
  3. Draw it out - Sketch tables and arrows showing relationships
  4. Normalize to 3NF - Remove redundancy and dependencies
  5. Add indexes - On foreign keys and search columns
  6. Test with queries - Make sure you can get the data you need
  7. Denormalize IF needed - Only after measuring performance

The Golden Rules

  1. Start with 3NF - Normalize first, denormalize later if needed
  2. Use foreign keys - Enforce data integrity at database level
  3. One thing per table - Each table represents a single entity
  4. Historical snapshots - Store point-in-time data (like prices)
  5. Junction tables for many-to-many - Never put comma-separated IDs in a column
  6. Meaningful primary keys - Use auto-incrementing IDs
  7. Document your denormalizations - Explain why you broke the rules

Common Mistakes to Avoid

Storing lists in a column

// BAD: Comma-separated categories
table.string('categories'); // "electronics,gaming,computers"

// GOOD: Junction table
// product_categories with foreign keys

Repeating columns

// BAD: Limited and inflexible
table.string('phone1');
table.string('phone2');
table.string('phone3');

// GOOD: Separate table for phone numbers

Not using foreign keys

// BAD: Just storing ID with no constraint
table.integer('customer_id');

// GOOD: Enforce the relationship
table.integer('customer_id').unsigned();
table.foreign('customer_id')
  .references('customer_id')
  .inTable('customers');

Next Steps

  1. Sketch out schemas for different domains (blog, social media, inventory)
  2. Identify relationships (one-to-many, many-to-many)
  3. Practice normalizing messy designs to 3NF
  4. Add foreign keys and test referential integrity
  5. Query your schema - if queries are too complex, reconsider design

Remember: Good database design is 80% planning, 20% coding. Take time to think through your relationships before writing migrations!