Senior DBA and Architect at a tech services company with 501-1,000 employees
Consultant
2016-06-24T21:23:38Z
Jun 24, 2016
At the outset, I believe comparing these technologies/products is truly not an apples-to-apples comparison. However, one could attempt to look at their core functionality and what they were designed to accomplish. All of these have one goal in common: to manage and analyze very large volumes of data, including Big Data. To address this, all of them have a cluster-based architecture.
Ultimately, one has to look at one's business goals, current infrastructure, additional training needs if any, and more.
Oracle Exadata –
The Exadata product family (referred to as Engineered Systems by Oracle) includes Exadata Database Machine, SuperCluster, etc. A key component is the Exadata Storage Server. It employs a massively parallel architecture of multiple compute/DB nodes and storage/cell nodes, all communicating over the InfiniBand network. This results in a clustered shared-disk architecture. The Exadata software is divided between the database servers and the Exadata cells. The DB kernel and the storage cells communicate through iDB (the Intelligent Database protocol), which makes it possible to ship a portion of the DB kernel activity down to the storage layer. Exadata cells return only the rows and columns that satisfy the SQL query, thereby reducing network data movement as well as eliminating wasted CPU cycles on the DB nodes.
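To make the offloading idea above concrete, here is a minimal Python sketch, not Oracle's implementation. The `StorageCell` class, the `offloaded_query` function and the sample rows are all hypothetical names invented for illustration. It only shows the principle: filtering and column projection happen where the data lives, so the database node receives already-reduced results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class StorageCell:
    """Hypothetical stand-in for one storage cell holding a slice of a table."""
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool], columns: Iterable[str]) -> List[Row]:
        # Filter rows and project columns at the storage layer,
        # so only matching data would cross the network.
        wanted = list(columns)
        return [{c: r[c] for c in wanted} for r in self.rows if predicate(r)]


def offloaded_query(cells: List[StorageCell],
                    predicate: Callable[[Row], bool],
                    columns: Iterable[str]) -> List[Row]:
    # The database node only merges the reduced results from each cell.
    wanted = list(columns)
    result: List[Row] = []
    for cell in cells:
        result.extend(cell.scan(predicate, wanted))
    return result


# Illustrative data spread over two hypothetical cells.
cells = [
    StorageCell([
        {"id": 1, "region": "EU", "amount": 120},
        {"id": 2, "region": "US", "amount": 80},
    ]),
    StorageCell([
        {"id": 3, "region": "EU", "amount": 40},
        {"id": 4, "region": "APAC", "amount": 300},
    ]),
]

# Roughly: SELECT id, amount FROM sales WHERE region = 'EU'
eu_sales = offloaded_query(cells, lambda r: r["region"] == "EU", ["id", "amount"])
print(eu_sales)
```

In this toy run, four rows of three columns sit on the cells, but only two rows of two columns reach the caller. That reduction is the effect the paragraph above describes.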
Oracle ASM is the file system and volume manager on Exadata. Exadata can be designed and deployed to handle both OLTP and DSS workloads, unlike the others, which are geared only for DSS. It can handle both RAC and single-instance databases and helps with database consolidation. However, the storage servers cannot communicate with one another; instead, all communication is forced via the InfiniBand network to a DB node and then back to the storage tier. Exadata supports both row and columnar formats through Exadata Hybrid Columnar Compression (EHCC) as well as an in-memory store.
Oracle Exadata comes in multiple flavors (quarter, half, full rack, etc.; in the case of SuperCluster, half and full rack; storage expansion rack) and can be scaled as and when required. Exadata also provides connectors for Big Data, Hadoop, ODBC, JDBC and more. Since the Sun acquisition, Oracle owns the entire Exadata stack, both hardware and software.
EMC Greenplum –
Greenplum Database is an MPP (massively parallel processing) database server based on PostgreSQL open-source technology. MPP is inherently a shared-nothing architecture with two or more server nodes. Greenplum uses this architecture to distribute load and thereby achieve high performance for very large data warehouses. This translates to two or more PostgreSQL database instances acting together as a single unified DBMS. All nodes can scan and process in parallel. To this extent, it is similar to an Oracle RAC database with two or more nodes. The user's interaction is similar to that with a PostgreSQL DBMS. It supports heap, row/column-oriented and external table structures. This is also true of Oracle Exadata.
EMC acquired Greenplum in July 2010. EMC has built an advanced analytics appliance (Greenplum DCA) to develop Business Intelligence as a Service (BIaaS). This has been designed for predictive analytics for Big Data.
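The shared-nothing layout described for Greenplum above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Greenplum code. The segment count, the CRC32 hash and the function names `segment_for`, `distribute` and `parallel_sum` are all made up for the example. Rows are assigned to segments by hashing a distribution key. Each segment then aggregates only its own rows, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str) -> int:
    # Stable hash so the same key always lands on the same segment.
    # CRC32 is an assumption here, not the hash a real product uses.
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def distribute(rows, key_column):
    # Each segment owns a disjoint subset of the rows (shared nothing).
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_column]))].append(row)
    return segments


def parallel_sum(segments, column):
    # Every segment computes a partial sum over its own rows.
    # Here this runs sequentially; in a real cluster the segments
    # would work at the same time on separate servers.
    partials = [sum(r[column] for r in seg_rows) for seg_rows in segments.values()]
    # The coordinator combines the partial results.
    return sum(partials)


rows = [{"customer": f"c{i}", "amount": i * 10} for i in range(1, 9)]
segments = distribute(rows, "customer")
total = parallel_sum(segments, "amount")
print(total)  # 360
```

Because no segment needs another segment's rows to compute its partial sum, adding segments spreads both the storage and the scan work, which is where the scale-out behaviour comes from.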
HP Vertica –
Vertica is also built on an MPP architecture and can be scaled out. It is a column-oriented RDBMS that compresses its data. This enables Vertica to handle analytical workloads through a distributed (multi-node), compressed, columnar architecture. It provides support for open ANSI SQL, ODBC and JDBC, along with built-in predictive analysis via R and Python. Vertica can be used together with Hadoop. The Vertica cluster architecture stores data in the database in two containers: the Write Optimized Store (WOS) and the Read Optimized Store (ROS).
The WOS stores data in memory without compression or indexing, whereas the ROS stores data on disk, where it is segmented, sorted, and compressed for efficient reads. A Tuple Mover handles the data movement between the WOS and ROS.
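The WOS/ROS split described above can be sketched in a few lines of Python. This is a deliberately simplified, single-column toy, not Vertica's actual storage engine. The class `ColumnStoreSketch`, its `moveout` method and the use of run-length encoding as the compression scheme are assumptions made for illustration.

```python
from itertools import groupby


class ColumnStoreSketch:
    """Toy single-column store with a write buffer and read-optimized containers."""

    def __init__(self):
        self.wos = []  # in-memory inserts: unsorted, uncompressed
        self.ros = []  # containers of sorted, run-length-encoded (value, count) pairs

    def insert(self, value):
        # Writes are cheap: just append to the in-memory buffer.
        self.wos.append(value)

    def moveout(self):
        # Tuple-Mover-like step: sort the buffered values, compress them,
        # and move them into a new read-optimized container.
        if not self.wos:
            return
        ordered = sorted(self.wos)
        encoded = [(value, len(list(group))) for value, group in groupby(ordered)]
        self.ros.append(encoded)
        self.wos = []

    def count(self, value):
        # Reads must consult both stores to see all the data.
        in_ros = sum(n for container in self.ros for v, n in container if v == value)
        return in_ros + self.wos.count(value)


store = ColumnStoreSketch()
for state in ["NY", "CA", "NY", "TX", "CA", "NY"]:
    store.insert(state)

store.moveout()      # six buffered values become three (value, count) pairs
store.insert("TX")   # a later write lands in the WOS again

print(store.ros)
print(store.count("TX"))
```

The design trade-off is visible even at this scale: inserts stay fast because they skip sorting and compression, while the bulk of the data ends up in a compact, sorted form that suits scans.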
Netezza –
Netezza employs an asymmetric massively parallel processing (AMPP) architecture. IBM acquired Netezza in Sep 2010 and has now integrated this technology into its PureData System appliance. This appliance is tailored for data warehouse workloads, and the architecture makes use of IBM blade servers and disk storage together with FPGAs (Field Programmable Gate Arrays). The FPGA is a special chip with a large number of gates that can be programmed to implement most of the logical functions needed to manage streaming processing tasks. Like all of its competitors, it also provides integration with ODBC, JDBC, ETL and BI tools.
Principal Technical Consultant at Khwarizm Consulting
Consultant
2016-06-16T23:51:32Z
Jun 16, 2016
If you already have Oracle DBs, then Exadata will be the solution, as the training needed for the new machine will be almost nil, unless the team lacks RAC skills.
Adjusting an Oracle instance to work for DW is not much of a headache.
I have trained a large telco in Africa on Exadata. 50% of the attendees were doing DW on Teradata. They accepted the idea that they will be able to use Exadata for DW tasks in the future, alongside or instead of Teradata.
Oracle sees Teradata as its main competitor, not the others you mentioned here.
Exadata is shipped by a single vendor, with all components built to match each other.
Generally, when I look at other DB/DW systems, I find that they are multi-vendor inside, which is a drawback for me.
Why don't you also consider Teradata in the comparison?
If I were you, I would sit down with all the vendors and have each of them put forward a real offer.
Some have said that Exadata is very hard to adjust. This is very weird to me. The machine is very well balanced in configuration. With setup done by Oracle ACS or a certified partner, and after attending the one course, your team will have full control over the box.
Exadata is also the storage engine for Big Data shipped by Oracle.
Go with HPE Vertica: extremely fast, platform independent, very well integrated with Hadoop (all distributions) and easy to use! Definitely the most economical solution to purchase and maintain.
I can give more info if I know the business case.
Tomasz
Depends on what you want to do. Exadata is a hardware solution Oracle offers to increase performance for high-volume data processing, and it plays very well if you are an Oracle shop. If your tech stack is IBM based (DB2, WebSphere, etc.), then I would certainly look at Netezza.
Usually I think of Greenplum for use cases involving data analysis and presentation. It seems to me they changed owners recently.
@it_user429198 So here we are in 2023, with Oracle X9, and we did have HPE Vertica as a comparison. What is a good replacement for the X9, assuming we are replacing the infrastructure?
They are different products, so a comparison is not easy without knowing the scope of the project. The main difference is that Oracle is an RDBMS platform while the other three are columnar DBs or similar. This means that HPE Vertica, EMC Greenplum and IBM Netezza are much more storage-efficient (5-8x less storage) and faster (3-7x query speed), but they lack some RDBMS capabilities and have some limits on concurrent access (hundreds vs. thousands of sessions). This could be a problem for a large DWH with thousands of simultaneous accesses, but if you don't need that many, it's not an issue.
The other big difference is price and cost of ownership: Oracle's is much higher than that of the other three.
Regarding the difference between Vertica, Greenplum and Netezza: HP Vertica and Greenplum are more of a DB than Netezza. IBM Netezza is more of an appliance to use in conjunction with RDBMS technologies. Vertica and Greenplum have a lot of DWH/OLAP features, both for specialized queries (dimensional queries) and geolocation capabilities, and for DWH aspects (a resource manager to allocate more resources to specific queries, a query optimizer, etc.).
BI Manager, Vertica ASE Certified DBA at a marketing services firm with 1,001-5,000 employees
Real User
2016-06-15T13:04:45Z
Jun 15, 2016
I am strongly in favor of Vertica. Its columnar storage is fast and flexible, load times are lightning fast, and compression is extremely impressive. Their latest version allows for more personalized query hints. The included Database Designer helps you design your database for optimal performance. Training is top notch. Support can be a bit sketchy at times, but they are very friendly, knowledgeable and helpful. I've been using Vertica for three years and would not go back to using any other product.
EMEA HP Servers Category Manager at Hewlett Packard Enterprise
Vendor
2016-06-15T12:26:11Z
Jun 15, 2016
HPE Vertica is the best solution. Extremely low-latency query performance. Better compression capabilities. Lower TCO. Facebook and Uber run on Vertica.
EMC Greenplum: High TCO because of long and complex implementation cycles.
Not well suited for real-time workloads requiring continuous load and query. Proprietary HW.
IBM Netezza: Inferior workload management (allocating query resources among different groups of users).
Expensive "forklift" upgrades driven by appliance sizing rather than incremental customer need, which adds significant downtime. Proprietary HW.
Oracle Exadata: High admin and skills costs (RAC). Complex to set up. Expensive. To match Vertica, customers will require a much larger hardware footprint, thus significantly increasing TCO. Proprietary HW.
Netezza would have to be my answer as well. As mentioned before, Greenplum is on its way out, as EMC has recently released multiple new platforms and is beginning to focus more along those lines. Netezza excels with structured data, but if you are looking for more of a catch-all for data types, I might suggest an EMC Isilon solution.
Exadata - More generic, covering OLTP to OLAP/analytical use cases. But it is difficult to configure for OLAP use cases, with hundreds of parameters to deal with. The Oracle RDBMS optimizer is still maturing in its ability to easily and completely leverage Exadata configuration advantages such as those from full scans, storage indexes, etc.
Vertica - Very efficient for use cases where only specific or limited columns need to be processed, such as OLAP queries that aggregate a specific column's values. However, it is not the most efficient for the wider use cases supported by an EDW. Good for reporting data marts, not for EDWs.
Greenplum - Cheaper platform; however, ETL vendor support is not great, e.g. Informatica PDO is not fully supported. It uses PostgreSQL's SQL dialect, which is not very different from ANSI or Oracle SQL, but some differences are still there.
Netezza - Pretty good DW appliance configuration, with very little overhead for the tuning, indexing or other configuration required in other DW appliances like Exadata. Good from an ease of deployment and use, and performance, perspective. Capable of better supporting the wider use cases of an EDW compared to the others on the list.
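The remark above, that a columnar engine is efficient when a query touches only a few columns, can be shown with a short Python sketch. It is a toy comparison under stated assumptions, not a benchmark of any product. The sample table and the functions `sum_row_store` and `sum_column_store` are invented for illustration, and "values read" is just a counter standing in for I/O.

```python
# A small illustrative table stored row by row.
rows = [
    {"order_id": 1, "customer": "a", "region": "EU", "amount": 10},
    {"order_id": 2, "customer": "b", "region": "US", "amount": 20},
    {"order_id": 3, "customer": "c", "region": "EU", "amount": 5},
]

# The same table stored column by column: one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_store(table, column):
    # A row-oriented scan reads every field of every row,
    # even though only one column is needed.
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_column_store(table, column):
    # A column-oriented scan reads just the one column it needs.
    data = table[column]
    return sum(data), len(data)


row_total, row_reads = sum_row_store(rows, "amount")
col_total, col_reads = sum_column_store(columns, "amount")
print(row_total, row_reads)  # 35 12
print(col_total, col_reads)  # 35 3
```

Both layouts give the same answer, but the columnar one touches a quarter of the values here. The gap narrows as a query asks for more columns, which is the trade-off the Vertica entry above alludes to.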
This is a question that is often asked because of the overlap and unique differentiation of each product. First, it should be known that I am in favor of Vertica as the overall leading EDW based on multiple use cases ranging from structured, semi-structured and unstructured data sets. There is a reason why Facebook uses Vertica.
GreenPlum was a good concept but it is too storage-centric imho. Applications are no longer written from the storage layer, but rather top-down with a solid API stack to black box infrastructure altogether and this makes GreenPlum less than desirable as a critical infrastructure component.
Netezza is solid and has performed well over the years. Works very well with structured data use cases and process-oriented applications.
However, it is becoming harder to justify enterprise-grade EDWs due to the innovation and performance of in-memory database architectures with a high-performance storage backend (e.g., MapR, DDN). The integration of Hadoop (lakes or streams) makes the in-memory approach more effective, relevant and scalable.
Last note: the increasing demand for near real-time analytics is also driving more disruption in the traditional EDW/3-tier architecture approach. With the right storage backend and a high performing in-memory database, the ROI comparison begins to unfold rather quickly.
We’re in the process of comparing Oracle Exadata vs. HPE Vertica vs. EMC GreenPlum vs. IBM Netezza.
What would be your requirement: batch processing, split-second response, DW?
Greenplum is very good for batch processing and a good data warehouse. Netezza is similar to Greenplum but has better support.
Technical Team Lead, Business Intelligence at a tech services company with 51-200 employees
Consultant
2016-06-15T12:01:32Z
Jun 15, 2016
There are a number of factors that you'll need to consider. The biggest advantage of all of them is data compression over a traditional RDBMS. I'm not an expert, but I have worked with Oracle Exadata, HPE Vertica and IBM Netezza. If you're already using Oracle products, then Exadata seems like the logical choice. Same for IBM and Netezza. Both, I believe, are appliances. Not sure about EMC Greenplum, but HPE Vertica is the only pure columnar database that I've worked with, and it probably has the highest data compression. From my experience, you need to take time to figure out how to use the product. For example, IBM DB2 had materialized tables and IBM Netezza has no such function. The hints used in Oracle 11g didn't work once we switched over to Netezza. For HPE Vertica, if you're used to writing queries with a lot of columns, each additional column degrades performance a bit. All the issues raised are solvable, but it takes time to figure out the best approach. HPE Vertica was the cheapest to install and maintain. Support wasn't great, and that's something that would probably be better with Oracle and IBM. Hope this helps.
Consultant at a financial services firm with 5,001-10,000 employees
Real User
2016-06-15T11:36:57Z
Jun 15, 2016
This is very much dependent on the size of the organisation's data, alongside what you're trying to achieve. If you're attempting to have a Kimball-style data warehouse, perhaps using ETL routines against a Hadoop data lake, you might well consider one of these perfectly capable tools.
If you're intending to set up an Inmon-style data warehouse, these may also be a good fit. In my personal experience, I would opt for Greenplum out of the list of technologies. My organisation uses this and it is a very good MPP solution. That said, if speed of query processing is something you're keen on, you may wish to extend your list to include the next generation of MPP databases, such as Exasol.
I have no experience with HPE.
My vote is for netezza from this list.
For EDW, I would have to say Netezza. Its ease of data loading and parallelism on queries put it far above any competitor listed.
Barath
That depends on your use case. What are you trying to do?
My very short answer: IBM Netezza.
Exa and Ver are tooled together. GreenPlum may become history.
Regards,
MH