I am a solution architect for a global tech vendor.
I am looking for any general guidelines to allocate tablespace quota to different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer).
I have these 4 layers:
I understand space requirements may vary based on project requirements, however, any general guideline (if any such in data warehousing and ETL space) to estimate the space will be helpful.
Thanks! I appreciate your help.
Space requirement will be directly dependent on design considerations and data requirements. Any estimation without detailed information is bound to be faulty and should be used as reference only.
Here are my recommendation on how to go about it:
1. Staging
Estimating Staging will be dependent on the following:
1. Understanding of sources
- Each source, amount of data it's going to provide
- Regular full/ incremental data and special days data volume
Having said these, typically 5 % of space should do for incremental data loads.
2. ODS- Type 1 persistent tables
Estimating ODS will be dependent on the following:
1. Understanding of sources
- Each source, amount of data it's going to provide to ODS
- Regular full/ incremental data and special days data volume
2. data retention period is a critical as space requirement can increase twice if retention days is doubled
Having said these, typically 10 - 15 % of space with 14 days retention will be reasonable.
3. Transformation layer- similar to final DWH layer but truncate before loading newly arrived data
Estimating Transformation will be dependent on the following:
1. Modeling strategy used
2. Number of additional derived information is maintained
3. Data volatility
4. Audit information captured
Since it's a truncate and load, expect it to be the size of approximately 5 - 10 %
4. DWH layer- Final dimensional model layer
- Typically 70 - 80 % of space is allocated to DWH.
- Dimension, Facts, Aggregate tables, Indexes, etc. should be carefully considered for estimation
- Data retention criteria (if any) should also be considered
Other key considerations:
1. Special attention to space requirements for historical data load should be considered. Most projects miss this important step and struggle at the end.
2. The system should also be estimated for different project environments. Typically Dev (10 %), System Test(5 %), UAT(15 %) of entire space should be allocated
3. 10 % of data growth on a year on year basis for ODS/ Staging/ Transformation layer should be considered
4. The growth of DWH will be dependent on modeling techniques and data volatility. These must be considered for the next few years
5. Some buffer should also be retained for any unforeseen circumstance.
Thank you Gouri, it helps !!