Here’s a link to slides from a talk by Roger Magoulas (O’Reilly Media, Inc.) about building the open warehouse. The talk was presented at the O’Reilly Open Source Convention 2004.
Commodity hardware, faster disks, and open source software now make building a data warehouse more of a resource and design issue than a cost issue for many organizations. Now a robust analysis infrastructure can be built on an open source platform with no performance or functional compromises.
This talk will cover a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.
Why open source? Aside from the cost savings, open source lets you leverage what your staff already knows — tools like Perl, SQL and Apache — rather than having to procure and staff for the proprietary tools that dominate the commercial space.
Data Warehouse Architecture:
– Consolidated Data Store (CDS)
– Process to condition, correlate and transform data
– Multi-topic data marts
– dimensional models
– Multi-channel data access
Open Source Components
Database: MySQL
– fast, effective
Data Movement: Perl/DBI/SQL
– flexible data access
Data Access: Perl/Apache/SQL
– template toolkit for ad hoc SQL
– Perl hash for crosstabs/pivot
– Perl for reports
Dimensional Model
– organizes data for queries and navigation from detail to summary
– normalized fact table for quantitative data
– denormalized dimensions with descriptive data
– conforming dimensions available to multiple facts
Performance Considerations
– configuration
– indexing
– SQL-92 joins
– aggregate tables and aggregate navigation
The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.
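As a rough illustration of the dimensional model, aggregate table, and hash-based crosstab ideas listed above, here is a minimal sketch. It is not taken from the slides: Python and its built-in sqlite3 module stand in for the Perl/DBI and MySQL stack the talk describes, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical star schema: one fact table plus one denormalized dimension.
# SQLite in memory is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        title       TEXT,
        category    TEXT      -- descriptive attribute stored inline
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        sale_month  TEXT,
        units       INTEGER   -- quantitative measure
    );
    INSERT INTO dim_product VALUES (1, 'Book A', 'Perl'), (2, 'Book B', 'SQL');
    INSERT INTO fact_sales VALUES
        (1, '2004-06', 10), (1, '2004-07', 5),
        (2, '2004-06', 3),  (2, '2004-07', 7), (2, '2004-07', 1);
""")

# Aggregate table: units pre-summarized by category and month, so summary
# queries do not have to scan the detail rows. Uses an explicit SQL-92 join.
conn.execute("""
    CREATE TABLE agg_sales_category_month AS
    SELECT d.category, f.sale_month, SUM(f.units) AS units
    FROM fact_sales AS f
    INNER JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category, f.sale_month
""")

# Crosstab held in a nested dict (the Python analogue of a Perl hash):
# one row per category, one column per month.
crosstab = defaultdict(dict)
for category, month, units in conn.execute(
    "SELECT category, sale_month, units FROM agg_sales_category_month"
):
    crosstab[category][month] = units

months = sorted({m for row in crosstab.values() for m in row})
print("category", *months)
for category in sorted(crosstab):
    print(category, *(crosstab[category].get(m, 0) for m in months))
```

In this sketch the report reads from the aggregate table rather than the fact table. Choosing the smallest table that can answer a query is the idea behind aggregate navigation, although here that choice is hard-coded rather than automated.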