Database Normalization & Design Fundamentals
A beginner-friendly guide to designing clean, efficient database schemas that scale.
Table of Contents
- What is Normalization?
- From Messy to Clean Design
- Many-to-Many Relationships
- Foreign Keys & Referential Integrity
- When to Break the Rules
- Real-World Example
- Quick Reference
1. What is Normalization?
Normalization is the process of organizing your database to reduce data redundancy and improve data integrity. Think of it as decluttering your database.
Why Normalize?
Without normalization:
- 😱 Same data stored in multiple places (waste of space)
- 🐛 Update one place, forget another (data inconsistency)
- 🤔 Hard to add new data without existing data
- 🗑️ Deleting one thing accidentally deletes other things
With normalization:
- ✅ Each piece of data stored once
- ✅ Easy to update (change in one place)
- ✅ No weird data anomalies
- ✅ Clear relationships between data
The Three Levels
We'll focus on getting to Third Normal Form (3NF) - the sweet spot for most applications:
- First Normal Form (1NF) - No repeating groups
- Second Normal Form (2NF) - Separate different types of data
- Third Normal Form (3NF) - Remove indirect relationships
2. From Messy to Clean Design
Let's design an order system step-by-step.
❌ LEVEL 0: Unnormalized (The Mess)
Everything crammed into one table:
// DON'T DO THIS!
exports.up = function(knex) {
return knex.schema.createTable('orders_messy', (table) => {
table.increments('id');
table.string('customer_name');
table.string('customer_email');
table.string('customer_phone');
table.string('customer_address');
// Repeating groups - can only store 3 products!
table.string('product1_name');
table.decimal('product1_price');
table.integer('product1_quantity');
table.string('product2_name');
table.decimal('product2_price');
table.integer('product2_quantity');
table.string('product3_name');
table.decimal('product3_price');
table.integer('product3_quantity');
});
};
Problems:
- Limited to 3 products per order
- Lots of empty columns if order has fewer products
- Customer info duplicated for every order
- Can't add a 4th product without changing table structure
✅ LEVEL 1: First Normal Form (1NF)
Rule: Each column has one value, no repeating groups.
// Better: Unlimited products per order
exports.up = function(knex) {
return knex.schema.createTable('orders_1nf', (table) => {
table.increments('order_id');
table.string('customer_name');
table.string('customer_email');
table.string('customer_phone');
table.string('customer_address');
table.string('product_name');
table.decimal('product_price', 10, 2);
table.integer('quantity');
table.timestamp('order_date').defaultTo(knex.fn.now());
});
};
// Example data:
// order_id | customer_name | customer_email | product_name | price | qty
// 1 | John Doe | john@example.com | Laptop | 999.99 | 1
// 1 | John Doe | john@example.com | Mouse | 25.99 | 2
// ^ Same customer info repeated for each product - still wasteful!
Fixed: No more repeating columns
Problem: Customer info still duplicates
✅ LEVEL 2: Second Normal Form (2NF)
Rule: Separate different types of data into their own tables.
// Split customers and orders
exports.up = function(knex) {
return knex.schema
.createTable('customers', (table) => {
table.increments('customer_id').primary();
table.string('name').notNullable();
table.string('email').notNullable().unique();
table.string('phone');
table.string('address');
table.timestamps(true, true);
})
.createTable('orders_2nf', (table) => {
table.increments('order_id').primary();
table.integer('customer_id').unsigned().notNullable();
table.string('product_name');
table.decimal('product_price', 10, 2);
table.integer('quantity');
table.timestamp('order_date').defaultTo(knex.fn.now());
// Link to customer table
table.foreign('customer_id')
.references('customer_id')
.inTable('customers');
});
};
// Example data:
// customers:
// customer_id | name | email
// 1 | John Doe | john@example.com
//
// orders_2nf:
// order_id | customer_id | product_name | price | qty
// 1 | 1 | Laptop | 999.99 | 1
// 1 | 1 | Mouse | 25.99 | 2
// ^ Just references customer ID - much better!
Fixed: Customer info stored once
Problem: Product name and price are related (if price changes, should update everywhere)
✅ LEVEL 3: Third Normal Form (3NF) - The Goal!
Rule: Each table represents ONE thing, no indirect relationships.
// Perfect: Customers, Products, Orders, and Order Items
exports.up = function(knex) {
return knex.schema
// Table 1: Customers
.createTable('customers', (table) => {
table.increments('customer_id').primary();
table.string('name').notNullable();
table.string('email').notNullable().unique();
table.string('phone');
table.string('address');
table.timestamps(true, true);
})
// Table 2: Products
.createTable('products', (table) => {
table.increments('product_id').primary();
table.string('name').notNullable();
table.text('description');
table.decimal('price', 10, 2).notNullable();
table.integer('stock_quantity').defaultTo(0);
table.timestamps(true, true);
})
// Table 3: Orders
.createTable('orders', (table) => {
table.increments('order_id').primary();
table.integer('customer_id').unsigned().notNullable();
table.decimal('total_amount', 10, 2).notNullable();
table.enum('status', ['pending', 'processing', 'shipped', 'delivered'])
.defaultTo('pending');
table.timestamp('order_date').defaultTo(knex.fn.now());
table.foreign('customer_id')
.references('customer_id')
.inTable('customers');
})
// Table 4: Order Items (connects orders and products)
.createTable('order_items', (table) => {
table.increments('order_item_id').primary();
table.integer('order_id').unsigned().notNullable();
table.integer('product_id').unsigned().notNullable();
table.integer('quantity').notNullable();
table.decimal('price_at_purchase', 10, 2).notNullable(); // Historical price
table.foreign('order_id')
.references('order_id')
.inTable('orders')
.onDelete('CASCADE'); // Delete items when order is deleted
table.foreign('product_id')
.references('product_id')
.inTable('products');
// Prevent same product twice in one order
table.unique(['order_id', 'product_id']);
});
};
✅ Perfect! Each table represents one thing:
customers- Customer informationproducts- Product catalogorders- Order headersorder_items- What's in each order
Key Design Decisions:
price_at_purchase- Stores price snapshot when ordered- Why? If product price changes tomorrow, old orders stay correct
total_amountin orders - Denormalized for convenience- Could calculate from order_items, but commonly needed
CASCADEdelete - When order is deleted, its items delete too- Makes sense: order items can't exist without an order
3. Many-to-Many Relationships
Sometimes things have many-to-many relationships. Example: Products can have multiple categories, and categories can have multiple products.
Junction Tables
Use a "junction table" (also called "join table" or "bridge table") to connect them:
exports.up = function(knex) {
return knex.schema
// Categories table
.createTable('categories', (table) => {
table.increments('category_id').primary();
table.string('name').notNullable().unique();
table.text('description');
})
// Junction table (connects products and categories)
.createTable('product_categories', (table) => {
table.integer('product_id').unsigned().notNullable();
table.integer('category_id').unsigned().notNullable();
// Composite primary key
table.primary(['product_id', 'category_id']);
table.foreign('product_id')
.references('product_id')
.inTable('products')
.onDelete('CASCADE');
table.foreign('category_id')
.references('category_id')
.inTable('categories')
.onDelete('CASCADE');
});
};
// Usage
async function assignCategories() {
// Insert product
const [laptopId] = await knex('products').insert({
name: 'Gaming Laptop',
price: 1499.99
});
// Insert categories
const [electronicsId] = await knex('categories').insert({ name: 'Electronics' });
const [gamingId] = await knex('categories').insert({ name: 'Gaming' });
// Connect product to multiple categories
await knex('product_categories').insert([
{ product_id: laptopId, category_id: electronicsId },
{ product_id: laptopId, category_id: gamingId }
]);
}
// Query products with their categories
async function getProductsWithCategories() {
return knex('products as p')
.select('p.name as product', 'c.name as category')
.leftJoin('product_categories as pc', 'p.product_id', 'pc.product_id')
.leftJoin('categories as c', 'pc.category_id', 'c.category_id');
}
Pattern: Product ↔ ProductCategories ↔ Category
4. Foreign Keys & Referential Integrity
Foreign keys enforce relationships between tables - they're your safety net.
Basic Foreign Key
table.foreign('customer_id')
.references('customer_id')
.inTable('customers');
This ensures every customer_id in orders actually exists in the customers table!
Cascade Options
What happens when you delete a customer who has orders?
// Option 1: RESTRICT (default) - Prevent deletion
table.foreign('customer_id')
.references('customer_id')
.inTable('customers')
.onDelete('RESTRICT'); // Error if customer has orders
// Option 2: CASCADE - Delete related records too
table.foreign('order_id')
.references('order_id')
.inTable('orders')
.onDelete('CASCADE'); // Delete order items when order is deleted
// Option 3: SET NULL - Set to null
table.foreign('manager_id')
.references('employee_id')
.inTable('employees')
.onDelete('SET NULL'); // Remove manager reference when manager is deleted
When to use each:
| Action | When to Use |
|---|---|
RESTRICT |
Can't delete customers with orders (preserves history) |
CASCADE |
Delete order items when order is deleted (dependent data) |
SET NULL |
Optional relationships (manager gets deleted, employee stays) |
5. When to Break the Rules
Sometimes normalized design is TOO normalized. Here's when to denormalize:
Denormalization for Performance
// Add frequently accessed data to avoid JOINs
exports.up = function(knex) {
return knex.schema.table('orders', (table) => {
// Store customer name in orders (denormalized)
table.string('customer_name');
table.string('customer_email');
// Still have customer_id foreign key for full details
});
};
When to denormalize:
- ✅ Read-heavy data (shown often, rarely changes)
- ✅ Complex JOINs causing performance issues
- ✅ Historical snapshots (like
price_at_purchase)
When NOT to denormalize:
- ❌ Data changes frequently
- ❌ You're just starting out (normalize first!)
- ❌ Queries are already fast enough
Common Denormalization Patterns
// Pattern 1: Store counts
table.integer('items_count').defaultTo(0); // Avoid COUNT(*) query
// Pattern 2: Store aggregates
table.decimal('total_spent', 12, 2).defaultTo(0); // Avoid SUM() query
// Pattern 3: Store historical snapshots
table.decimal('price_at_purchase', 10, 2); // Price when ordered, not current price
// Pattern 4: Store commonly JOINed data
table.string('customer_name'); // Avoid JOIN for display name
Remember: Denormalized data must be kept in sync! Use triggers or application logic.
6. Real-World Example
Complete e-commerce schema in 3NF:
exports.up = function(knex) {
return knex.schema
// Core entities
.createTable('customers', (table) => {
table.increments('customer_id').primary();
table.string('name').notNullable();
table.string('email').notNullable().unique();
table.string('phone');
table.string('address');
table.timestamps(true, true);
})
.createTable('products', (table) => {
table.increments('product_id').primary();
table.string('name').notNullable();
table.text('description');
table.decimal('price', 10, 2).notNullable();
table.integer('stock_quantity').defaultTo(0);
table.timestamps(true, true);
})
.createTable('categories', (table) => {
table.increments('category_id').primary();
table.string('name').notNullable().unique();
table.text('description');
})
// Relationships
.createTable('product_categories', (table) => {
table.integer('product_id').unsigned().notNullable();
table.integer('category_id').unsigned().notNullable();
table.primary(['product_id', 'category_id']);
table.foreign('product_id')
.references('product_id')
.inTable('products')
.onDelete('CASCADE');
table.foreign('category_id')
.references('category_id')
.inTable('categories')
.onDelete('CASCADE');
})
.createTable('orders', (table) => {
table.increments('order_id').primary();
table.integer('customer_id').unsigned().notNullable();
table.decimal('total_amount', 10, 2).notNullable();
table.enum('status', ['pending', 'processing', 'shipped', 'delivered'])
.defaultTo('pending');
table.timestamp('order_date').defaultTo(knex.fn.now());
table.timestamps(true, true);
table.foreign('customer_id')
.references('customer_id')
.inTable('customers')
.onDelete('RESTRICT'); // Keep order history
})
.createTable('order_items', (table) => {
table.increments('order_item_id').primary();
table.integer('order_id').unsigned().notNullable();
table.integer('product_id').unsigned().notNullable();
table.integer('quantity').notNullable();
table.decimal('price_at_purchase', 10, 2).notNullable();
table.foreign('order_id')
.references('order_id')
.inTable('orders')
.onDelete('CASCADE'); // Items belong to orders
table.foreign('product_id')
.references('product_id')
.inTable('products')
.onDelete('RESTRICT'); // Keep order history
table.unique(['order_id', 'product_id']);
});
};
// Create a complete order
async function createOrder(customerId, items) {
return knex.transaction(async (trx) => {
// Get products
const products = await trx('products')
.whereIn('id', items.map(i => i.product_id))
.select('*');
// Calculate total
let total = 0;
for (const item of items) {
const product = products.find(p => p.product_id === item.product_id);
total += product.price * item.quantity;
}
// Create order
const [orderId] = await trx('orders').insert({
customer_id: customerId,
total_amount: total
});
// Add items
for (const item of items) {
const product = products.find(p => p.product_id === item.product_id);
await trx('order_items').insert({
order_id: orderId,
product_id: item.product_id,
quantity: item.quantity,
price_at_purchase: product.price // Snapshot current price
});
// Update stock
await trx('products')
.where('product_id', item.product_id)
.decrement('stock_quantity', item.quantity);
}
return orderId;
});
}
7. Quick Reference
Normalization Checklist
Before finalizing your schema:
- ✅ Each table represents ONE type of thing
- ✅ No repeating columns (product1, product2, product3...)
- ✅ No columns that depend on other non-key columns
- ✅ Foreign keys defined with appropriate cascade rules
- ✅ Unique constraints where needed (email, username...)
- ✅ Indexes on foreign keys
Common Table Patterns
One-to-Many:
customers (1) → (many) orders
// orders.customer_id references customers.customer_id
Many-to-Many:
products (many) ↔ (many) categories
// Need junction table: product_categories
Self-Referencing:
categories (parent) → (many) categories (children)
// categories.parent_id references categories.category_id
Foreign Key Decision Tree
Should I use CASCADE, RESTRICT, or SET NULL?
Is the child data meaningful without parent?
├─ NO → Use CASCADE (delete children with parent)
│ Example: Delete order → delete order items
└─ YES → Use RESTRICT (keep history) or SET NULL
Example: Delete customer → keep orders with NULL customer
Design Process
- List your entities - What "things" exist? (Customer, Product, Order)
- Define relationships - How do they connect? (Customer has Orders)
- Draw it out - Sketch tables and arrows showing relationships
- Normalize to 3NF - Remove redundancy and dependencies
- Add indexes - On foreign keys and search columns
- Test with queries - Make sure you can get the data you need
- Denormalize IF needed - Only after measuring performance
The Golden Rules
- Start with 3NF - Normalize first, denormalize later if needed
- Use foreign keys - Enforce data integrity at database level
- One thing per table - Each table represents a single entity
- Historical snapshots - Store point-in-time data (like prices)
- Junction tables for many-to-many - Never put comma-separated IDs in a column
- Meaningful primary keys - Use auto-incrementing IDs
- Document your denormalizations - Explain why you broke the rules
Common Mistakes to Avoid
❌ Storing lists in a column
// BAD: Comma-separated categories
table.string('categories'); // "electronics,gaming,computers"
// GOOD: Junction table
// product_categories with foreign keys
❌ Repeating columns
// BAD: Limited and inflexible
table.string('phone1');
table.string('phone2');
table.string('phone3');
// GOOD: Separate table for phone numbers
❌ Not using foreign keys
// BAD: Just storing ID with no constraint
table.integer('customer_id');
// GOOD: Enforce the relationship
table.integer('customer_id').unsigned();
table.foreign('customer_id')
.references('customer_id')
.inTable('customers');
Next Steps
- Sketch out schemas for different domains (blog, social media, inventory)
- Identify relationships (one-to-many, many-to-many)
- Practice normalizing messy designs to 3NF
- Add foreign keys and test referential integrity
- Query your schema - if queries are too complex, reconsider design
Remember: Good database design is 80% planning, 20% coding. Take time to think through your relationships before writing migrations!