When assessing Data Warehouse solutions, I believe scalability and flexibility are paramount considerations. Ensuring the system can adapt to evolving data needs and handle increasing volumes of information is crucial for long-term success. For more insights on this topic, you might find this resource helpful: https://www.cleveroad.com/blog...
Chief Data Architect at Lucid Technologies & Solutions
Consultant
Mar 4, 2017
I would look at how the solution addresses two types of requirements: call them Functional and Non-functional, or Business and Technical, requirements.
This classification becomes critical now that data warehouse solutions have evolved from mere high-power databases with enhanced storage and performance optimization features into a key information management solution in the enterprise.
From the business perspective, look at:
- Analytical capabilities it can support (the likes of SAP HANA have all the analytical layers built into the data warehouse, so no high-end analytical tool is needed!)
- Variety of data it supports (structured and unstructured, and how they can coexist with big data solutions)
- Capability to support or integrate with other data management solutions such as Master Data Management, Data Governance and Data Quality (an isolated DWH solution shows no business value and will die a slow death)
- Ability to provide or support accelerators such as out-of-the-box industry data models and Agile development methods (this keeps DWH projects from becoming multi-year efforts that deliver no ROI)
From the technical perspective, look at:
- Total cost of ownership: cloud-based vs. appliance vs. on-prem database platform
- Scalability and performance features: in-memory architecture, row-based vs. columnar storage, massively parallel processing (MPP) architecture, etc.
- Storage optimization: data compression, caching, etc.
- Security (critical for regulated industries), such as support for masking, advanced auditing and access control
The key point I want to stress again is that the technical perspective alone should not drive the selection.
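To make the row-based vs. columnar and data compression items above a bit more concrete, here is a minimal sketch, using a hypothetical toy table, of run-length encoding applied to values stored column by column. It only illustrates the general idea; it is not any vendor's actual compression scheme.

```python
from itertools import groupby

# Hypothetical toy fact table: each tuple is (region, product, units).
rows = [
    ("EU", "widget", 3),
    ("EU", "widget", 5),
    ("EU", "gadget", 2),
    ("US", "gadget", 7),
    ("US", "gadget", 1),
]


def to_columns(table):
    """Pivot a row-oriented table into one list per column (columnar layout)."""
    return [list(column) for column in zip(*table)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


region, product, units = to_columns(rows)
print(run_length_encode(region))   # [('EU', 3), ('US', 2)]
print(run_length_encode(product))  # [('widget', 2), ('gadget', 3)]
```

Low-cardinality columns stored contiguously tend to form long runs of identical values, which is one reason columnar layouts often compress well; in a row layout the same values are interleaved with other columns and do not form runs.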
Ease of use in finding and retrieving data. Storing the data without providing metadata to the user is of no value; it needs an interface to the data that offers more than keyword search.
Vertica Support Engineer at a media company with 10,001+ employees
Vendor
Feb 27, 2017
A very open-ended question.
When evaluating a DW initiative, the most important thing is to start with the right questions, many of which others here have already touched on directly or indirectly.
Know your data and what you want to do with it; how much do you already have, and for how long do you want to keep it in the DW?
How many people will be querying it at the same time?
What kind of response time do they expect?
Which DW platforms are compatible with your existing IT staff skillset? What skillsets would you have to hire?
Is the data ready to be loaded into the DW? It seldom is; will additional staff be needed for this?
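The first questions (how much data, and how long to keep it) can be turned into a rough storage estimate. This is a minimal sketch under stated assumptions: the function name, the steady-state formula and the example figures are hypothetical, and the compression ratio in particular should come from your own tests on the candidate platform.

```python
def estimate_stored_gb(daily_ingest_gb, retention_days, compression_ratio=1.0):
    """Steady-state storage estimate: data retained over the window, divided by compression.

    All inputs are assumptions you supply; the default implies no compression.
    """
    if daily_ingest_gb < 0 or retention_days < 0:
        raise ValueError("ingest and retention must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_ingest_gb * retention_days / compression_ratio


# Hypothetical example: 50 GB/day kept for three years, assuming 4:1 compression.
print(estimate_stored_gb(50, 3 * 365, compression_ratio=4))  # 13687.5
```

The estimate leaves out indexes, staging space, replicas and growth in ingest volume, so it is only a starting point for the sizing conversation.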
Hi
In my experience, and seeing where ICT is heading today, I think there are two principal factors to consider when investing in a DW platform: innovation and support. The first assures you of a platform that stays modern over the years, and the second assures you that the platform runs at the level you want.
It actually boils down to the amount of sequential IO that can be pushed and processed. Each physical core of a data warehousing host can consume about 1 Gbps of data, and to keep all the cores working, that data has to be moved from storage into RAM fast enough. You need to measure the maximum consumption rate and the maximum throughput rate for sequential IO, and determine whether there are any bottlenecks limiting the performance of queries against the sequential data.
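The bottleneck check described above is simple arithmetic: what all the cores could consume together versus what storage can actually deliver sequentially. Below is a minimal sketch; the function and field names are hypothetical, and the example figures (16 cores, 200 MB/s per core, 2400 MB/s of storage throughput) are made up for illustration. Use per-core and storage rates measured on your own hardware, in the same unit, rather than the rule of thumb quoted above.

```python
from dataclasses import dataclass


@dataclass
class IoAssessment:
    required_throughput: float   # what all cores could consume together
    available_throughput: float  # measured sequential read throughput of storage
    bottleneck: bool             # True if storage cannot keep every core busy
    core_feed_ratio: float       # share of core demand storage can supply (max 1.0)


def assess_sequential_io(physical_cores, per_core_rate, storage_throughput):
    """Compare aggregate per-core consumption with measured sequential storage throughput.

    All rates must use the same unit (e.g. MB/s).
    """
    if physical_cores <= 0 or per_core_rate <= 0:
        raise ValueError("physical_cores and per_core_rate must be positive")
    required = physical_cores * per_core_rate
    return IoAssessment(
        required_throughput=required,
        available_throughput=storage_throughput,
        bottleneck=storage_throughput < required,
        core_feed_ratio=min(1.0, storage_throughput / required),
    )


# Hypothetical host: 16 cores at 200 MB/s each, storage measured at 2400 MB/s.
print(assess_sequential_io(16, 200, 2400))
```

In this made-up case the cores could consume 3200 MB/s but storage delivers only 2400 MB/s, so storage feeds 75% of the demand and the check flags a bottleneck.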
Software Architect at a tech consulting company with 51-200 employees
Real User
Feb 23, 2017
IMHO, it is the wrong approach to let one single aspect drive the comparison of DW solutions from vendors.
Over the years, many aspects have become important for a DW solution:
- ETL tools included in the solution; in particular, the tools must include the transformations required for dimension and fact tables, such as Slowly Changing Dimensions, to name just one
- Data Quality tools included in the solution
- Master Data Services tools included in the solution
- Reporting tools included in the solution
- Level of integration between the included ETL, Data Quality and Master Data Services tools
- Proper extensibility support in the included ETL tools
- Proper table partitioning (data or horizontal partitioning) support
- Proper columnstore index support
These are the important aspects that a competitive DW solution should include.
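As an example of the Slowly Changing Dimension transformation mentioned above, here is a minimal in-memory sketch of one common variant, SCD Type 2: when a tracked attribute changes, the current row is closed and a new version is appended. The table, column names and function are hypothetical; real ETL tools implement this against database tables with their own options (surrogate keys, effective-date conventions, and so on).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CustomerDim:
    customer_id: str           # business (natural) key from the source system
    city: str                  # attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date]   # None while the version is still open
    is_current: bool


def apply_scd2(dimension: List[CustomerDim], incoming: Dict[str, str], load_date: date) -> List[CustomerDim]:
    """Return a new dimension list with Type 2 history applied for changed cities."""
    result = list(dimension)
    current_index = {row.customer_id: i for i, row in enumerate(result) if row.is_current}
    for customer_id, city in incoming.items():
        i = current_index.get(customer_id)
        if i is None:
            # New business key: insert its first version.
            result.append(CustomerDim(customer_id, city, load_date, None, True))
        elif result[i].city != city:
            # Tracked attribute changed: close the old version, open a new one.
            result[i] = replace(result[i], valid_to=load_date, is_current=False)
            result.append(CustomerDim(customer_id, city, load_date, None, True))
        # Unchanged rows are left alone.
    return result


dim = [CustomerDim("C1", "Paris", date(2016, 1, 1), None, True)]
dim = apply_scd2(dim, {"C1": "Lyon", "C2": "Oslo"}, date(2017, 2, 23))
for row in dim:
    print(row)
```

After the load, customer C1 has a closed Paris row and a current Lyon row, so reports can still attribute older facts to Paris.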
It 100% depends on your needs in terms of data volume, availability, number of potential concurrent users, number of data sources and complexity of the data sources. I'm assuming by platform you mean tools/software/hardware? Are you thinking of BI as well, or just the backend?
Cloud-based solutions should be part of the consideration, but that again depends on your organization's needs and abilities: data center capacity and support. Cost always matters no matter where you work, and we all have budgets, so saying that scalability (or anything else) is most important doesn't make sense if you don't need scalability or can't afford it.
In other words, if your need is just solving reporting problems for a single source system (not really a DW), your source is only, say, 100 GB and you have a dozen users in total, then buying a 10-node Teradata MPP system means you have lost your marbles. Scalability in that situation is irrelevant because almost anything will satisfy your needs.
On the other hand, if you have a petabyte of total data and a terabyte-scale DW database, then get out your wallet and buy top of the line: Informatica or DataStage are probably the best bets for ETL, and Teradata or maybe Azure SQL Data Warehouse would be at the top of my list for the platform.
Senior Architect at an agriculture company with 1,001-5,000 employees
Vendor
Feb 23, 2017
That very much depends on your requirements. What's the usage profile of your planned data warehouse solution? Your use cases drive the evaluation and the criteria used to weight and rate aspects of potential solutions.
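The weighted rating of solution aspects described above is often done as a simple scoring matrix. Here is a minimal sketch: the criteria, weights, 1-5 ratings and platform names are all hypothetical placeholders that your own use cases would replace.

```python
def weighted_scores(weights, ratings):
    """Return each candidate's weighted average score.

    weights: criterion -> relative weight (normalized here, so they need not sum to 1).
    ratings: candidate -> {criterion -> rating, e.g. on a 1-5 scale}.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = {}
    for candidate, rating in ratings.items():
        missing = set(weights) - set(rating)
        if missing:
            raise ValueError(f"{candidate} is missing ratings for {sorted(missing)}")
        scores[candidate] = sum(weights[c] * rating[c] for c in weights) / total_weight
    return scores


# Hypothetical criteria and ratings for two unnamed platforms.
weights = {"scalability": 3, "cost": 2, "ease_of_use": 1}
ratings = {
    "Platform A": {"scalability": 5, "cost": 2, "ease_of_use": 3},
    "Platform B": {"scalability": 3, "cost": 4, "ease_of_use": 4},
}
scores = weighted_scores(weights, ratings)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

With these made-up numbers Platform A scores 3.67 and Platform B 3.50; shifting weight toward cost would reverse the ranking, which is why the weights should come from the usage profile rather than from a generic checklist.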
Executive Vice President - Sales at a tech vendor with 51-200 employees
Vendor
Feb 23, 2017
Scalability, flexibility, performance, and ease of management are key. While a traditional RDBMS (originally designed for OLTP) can perform data warehouse tasks, you will always be better off with a purpose-built platform designed from the ground up specifically for data warehouse loading/ingest, report processing and, ideally, in-place analytics. Highly flexible, cloud-based solutions have reached a level of maturity and a price point that are extremely compelling; take a look at Snowflake.
What is a data warehouse? A data warehouse (DW or DWH), sometimes categorized as an Enterprise Data Warehouse, is a data analysis and reporting system. Data warehouses are central repositories of integrated data from one or more sources, storing current and historical data in one location, where it is used to create reports for designated enterprise users.
A DW is considered an integral component of business intelligence and describes a system used to analyze an...
It depends on the company's requirements. Database performance when loading and retrieving data is very important for any data warehouse.
That very much depends on your requirements. What’s the usage profile of your planned data warehouse solution? Your use cases drive the evaluation and the criteria used to weighted-rate aspects of potential solutions.
- Stefan
The type of data warehouse platform!
The scalability of the data warehouse and its ability to handle multiple queries at the same time without overloading the servers.
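One common way platforms handle many queries at the same time without overloading the servers is admission control: cap the number of queries that run concurrently and make the rest wait. This is a minimal sketch of that idea using a bounded semaphore; the class, the slot count and `fake_query` are hypothetical, and real DW workload managers add queues, priorities and memory limits on top of this.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class AdmissionController:
    """Caps how many queries run at once; extra callers block until a slot frees up."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._running = 0
        self.peak_running = 0  # highest number of queries observed running together

    def run(self, query_fn, *args):
        with self._slots:  # wait here for a free slot
            with self._lock:
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self._running -= 1


def fake_query(query_id: int) -> str:
    """Hypothetical stand-in for a warehouse query."""
    time.sleep(0.01)
    return f"query {query_id} done"


controller = AdmissionController(max_concurrent=4)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda q: controller.run(fake_query, q), range(50)))

print(len(results), controller.peak_running <= 4)  # 50 True
```

Twenty client threads submit fifty queries, but never more than four execute at once; the rest simply wait for a slot instead of competing for CPU and IO.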