There are several types of dimensional models, each suited to specific data warehouse and reporting needs. While the star schema is the most common implementation, other types of dimensional models exist to address varying requirements. Here are the main types:
1. Star Schema
- Description: Central fact table connected to denormalized dimension tables.
- Advantages:Simplified querying and high performance for analytical use cases.
- Easy for business users to understand.
- Limitations:Data redundancy in dimension tables can increase storage requirements.
- Use Case: Best for straightforward analytics with a focus on performance.
2. Snowflake Schema
- Description: A variation of the star schema where dimension tables are normalized into multiple related tables.
- Advantages:Reduces redundancy in dimension tables, saving storage space.
- Easier to maintain consistency across related data in dimensions.
- Limitations:More complex queries due to the need for additional joins.
- Slightly reduced query performance compared to a star schema.
- Use Case: Suitable for scenarios where storage efficiency and consistency are prioritized over query simplicity.
3. Galaxy Schema (Fact Constellation)
- Description: Combines multiple fact tables sharing common dimension tables.
- Advantages:Supports complex business processes involving multiple fact tables.
- Facilitates analysis across different data marts with shared dimensions.
- Limitations:Complexity increases with the number of fact tables.
- Requires careful design to avoid performance bottlenecks.
- Use Case: Ideal for enterprise-scale data warehouses with multiple related subject areas (e.g., sales, inventory, and finance).
4. Data Vault Model
- Description: A highly flexible and scalable approach that separates data into three components:Hubs: Represent core business entities (e.g., Customer, Product).
- Links: Capture relationships between hubs (e.g., Customer-Product transactions).
- Satellites: Store attributes and time-variant data for hubs and links.
- Advantages:Excellent for handling historical and evolving data structures.
- Scalable and suited for agile development.
- Limitations:Not as intuitive for end-users as star or snowflake schemas.
- Requires additional transformation steps for querying.
- Use Case: Best for large-scale systems requiring flexibility and historical tracking.
5. Starflake Schema
- Description: A hybrid approach combining elements of both star and snowflake schemas.
- Advantages:Balances query performance with storage efficiency.
- Combines denormalized and normalized tables for flexibility.
- Limitations:Adds complexity to the schema design.
- Use Case: Appropriate when some dimensions require normalization while others benefit from denormalization.
6. Core and Custom Dimensional Models
- Description: Divides the dimensional model into a core model (common dimensions and facts) and custom extensions (specific to business units or regions).
- Advantages:Supports global and localized reporting requirements.
- Reusability of core components reduces duplication.
- Limitations:Complexity in maintaining consistency between core and custom models.
- Use Case: Useful for global organizations with diverse reporting needs.
7. Virtual Star Schema
- Description: Logical schema created by views or virtualization, without physically materializing tables.
- Advantages:Reduces storage needs by avoiding data duplication.
- Flexible and easier to modify.
- Limitations:Performance may depend on the underlying database engine and optimization.
- Use Case: Suitable for scenarios where physical data warehouses are complemented by virtualized layers.
8. Bus Architecture
- Description: Focuses on shared dimensions across multiple star or snowflake schemas, with a "bus matrix" to map dimensions and facts.
- Advantages:Facilitates consistency across different subject areas.
- Promotes reuse of dimensions across data marts.
- Limitations:Requires strict governance to maintain shared dimensions.
- Use Case: Best for large organizations consolidating data across multiple business domains.
9. Hybrid Dimensional Models
- Description: Combines aspects of dimensional and relational models to meet specific business and technical requirements.
- Advantages:Flexibility to address unique data warehouse challenges.
- Tailored to balance performance and scalability.
- Limitations:Custom implementations can be more complex to design and maintain.
- Use Case: For organizations with highly customized data needs.
10. Aggregate Fact Tables
- Description: Uses pre-aggregated data to optimize performance for specific queries (e.g., monthly sales summaries).
- Advantages:Significantly faster query performance for aggregated metrics.
- Reduces computational load on raw fact tables.
- Limitations:Requires careful design to align with anticipated queries.
- Increases storage requirements for pre-aggregated data.
- Use Case: For dashboards and reports requiring quick insights on aggregated data.
Final Notes
The choice of dimensional model depends on:
- Data volume: Larger datasets may favor normalized models for efficiency.
- Query complexity: Simple star schemas are better for quick insights, while snowflake schemas handle more nuanced relationships.
- Business needs: Aligning the model with reporting and analytics goals ensures maximum utility.
Popular dimensional modeling
Dimensional data modeling is a design technique for structuring data in a way that supports easy querying and analysis, typically in a data warehouse or data mart. The three main types of dimensional data models are Star Schema, Snowflake Schema, and Galaxy Schema. Here’s an end-to-end explanation with use cases for each:
1. Star Schema
Definition:
The Star Schema organizes data into a central fact table and connected dimension tables in a denormalized structure. Each dimension table contains descriptive attributes related to the fact table.
Use Case Example: Retail Sales Analysis
• Objective: Analyze sales performance across stores, products, and time periods.
Steps:
1. Fact Table:
• Name: Sales_Fact
• Columns: Date_Key, Store_Key, Product_Key, Sales_Amount, Quantity_Sold.
2. Dimension Tables:
• Date_Dim: Contains attributes like Date, Month, Year.
• Store_Dim: Contains Store_ID, Location, Manager.
• Product_Dim: Contains Product_ID, Category, Brand.
3. Schema Design:
The Sales_Fact table connects to each dimension table using foreign keys (e.g., Store_Key connects to Store_Dim).
4. Query Example:
“What were the total sales by store in Q4 of 2023?”
• Joins the Sales_Fact with Date_Dim and Store_Dim.
Advantages:
• Simple and intuitive for business users.
• Optimized for read-heavy queries.
2. Snowflake Schema
Definition:
A Snowflake Schema is a normalized version of the Star Schema where dimension tables are further split into sub-dimensions.
Use Case Example: Customer Relationship Analysis
• Objective: Analyze customer purchases by geographic region and product category.
Steps:
1. Fact Table:
• Name: Purchases_Fact
• Columns: Date_Key, Customer_Key, Product_Key, Purchase_Amount.
2. Dimension Tables:
• Customer_Dim: Contains Customer_ID, Region_Key.
• Region_Dim: Contains Region_ID, Country, Continent.
• Product_Dim: Contains Product_ID, Category_Key.
• Category_Dim: Contains Category_ID, Category_Name.
3. Schema Design:
Region_Dim normalizes location details, and Category_Dim organizes products into categories.
4. Query Example:
“What are the total purchases by region for Electronics in 2024?”
• Joins Purchases_Fact with Customer_Dim, Region_Dim, and Product_Dim.
Advantages:
• Reduced data redundancy.
• More scalable for complex dimensions.
3. Galaxy Schema
Definition:
The Galaxy Schema (or Fact Constellation) contains multiple fact tables that share dimension tables.
Use Case Example: Enterprise Resource Planning (ERP) Analysis
• Objective: Analyze sales and inventory data across the organization.
Steps:
1. Fact Tables:
• Sales_Fact: Tracks sales transactions.
• Inventory_Fact: Tracks stock levels.
2. Dimension Tables:
Shared dimensions: Product_Dim, Store_Dim, Date_Dim.
3. Schema Design:
Both Sales_Fact and Inventory_Fact use shared dimensions like Date and Product.
4. Query Example:
“What is the correlation between inventory levels and sales trends in Q4 2023?”
• Joins both fact tables with Date_Dim and Product_Dim.
Advantages:
• Supports complex analyses across multiple business processes.
• Centralizes dimensions for consistent data reporting.
End-to-End Process
1. Requirements Gathering:
• Understand business questions and metrics (e.g., “What are the monthly sales trends?”).
2. Source Data Identification:
• Identify transactional data sources like ERP systems or CRM databases.
3. Data Modeling:
• Choose a schema type (Star, Snowflake, or Galaxy).
• Design fact and dimension tables using tools like ERwin or Lucidchart.
4. ETL Process:
• Extract: Pull data from source systems.
• Transform: Clean and structure data for the chosen schema.
• Load: Populate the fact and dimension tables in the data warehouse.
5. Reporting:
• Use tools like Power BI, Tableau, or Looker for visualization and querying.
6. Performance Tuning:
• Index fact tables.
• Optimize joins and aggregations.
Key Differences Between Schemas:
Feature Star Schema Snowflake Schema Galaxy Schema
Normalization Denormalized Normalized Combination
Query Complexity Simplified Complex Complex
Data Redundancy Higher Lower Moderate
Use Case Simple Reporting Complex Dimensions Multi-Process Analysis