Team Lead Solutions Architect at IMEXPERTS DO BRASIL
Data analytics solution where data can be encoded and compressed and that does not require additional infrastructure
Pros and Cons
- "Vertica is a great product because customers can compress and encode data. The infrastructure that data warehouse solutions need is a commodity server, so customers don't have to invest in infrastructure."
- "In a future release, we would like to have artificial intelligence capabilities like neural networks. Customers are demanding this type of analytics."
What is our primary use case?
This solution is used as part of our data warehouse implementation. We have some customer indexing content in Vertica, which is a relational database.
What is most valuable?
Vertica is a great product because customers can compress and encode data. The infrastructure that data warehouse solutions need is a commodity server, so customers don't have to invest in infrastructure. Vertica is a column-oriented database, so query response is very fast.
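To make the compression and encoding point concrete, here is a minimal SQL sketch of the kind of thing Vertica allows, using hypothetical table and column names; exact syntax and encoding choices vary by version and workload, so treat this as illustrative only.

```sql
-- Hypothetical fact table.
CREATE TABLE sales (
    sale_id   INT,
    region    VARCHAR(32),
    sale_date DATE,
    amount    NUMERIC(12,2)
);

-- A projection that sorts on the low-cardinality columns and applies
-- run-length encoding (RLE) to them, so they compress to almost nothing.
CREATE PROJECTION sales_compressed (
    sale_id,
    region    ENCODING RLE,
    sale_date ENCODING RLE,
    amount
) AS
SELECT sale_id, region, sale_date, amount
FROM sales
ORDER BY region, sale_date
SEGMENTED BY HASH(sale_id) ALL NODES;
```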
What needs improvement?
In a future release, we would like to have artificial intelligence capabilities like neural networks. Customers are demanding this type of analytics.
For how long have I used the solution?
We have been using this solution for six years.
What do I think about the stability of the solution?
This is a stable solution.
What do I think about the scalability of the solution?
This is a scalable solution.
How are customer service and support?
The customer service for this solution is good.
How would you rate customer service and support?
Neutral
How was the initial setup?
The initial setup is straightforward.
What other advice do I have?
It is important for those considering this solution to have some experience with database management because Vertica is a relational database. Knowledge of SQL and other database-related skills is very important in order to implement or use this product correctly.
I would rate this solution a nine out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Sr DBA/ DBA Tech Lead at a non-profit with 1,001-5,000 employees
A scalable unified analytics platform with good performance
Pros and Cons
- "The feature I like best is performance. We use Red Tool and Red Job for the data warehouse and reporting. It's perfect. Performance is good, and it can return ad hoc queries very quickly. Of course, it's a cluster, so it's easy to scale."
- "It's hard to make it slow for a small data volume. For large volumes, it's hard to make it work. It's also hard to make it faster, and to make it scale."
What is our primary use case?
Our use case is a typical data warehouse. We use it for reporting and for storing data. Our users are the staff who do reporting and data analysis.
What is most valuable?
The feature I like best is performance. We use Red Tool and Red Job for the data warehouse and reporting. It's perfect. Performance is good, and it can return ad hoc queries very quickly. Of course, it's a cluster, so it's easy to scale.
What needs improvement?
It's hard to make it slow with a small data volume. With large volumes, it takes hard work to make it perform, to make it faster, and to make it scale.
For how long have I used the solution?
I have been using Vertica for about five years.
What do I think about the stability of the solution?
It's a stable solution.
What do I think about the scalability of the solution?
Vertica is a scalable solution.
How are customer service and technical support?
Technical support is good, and they react quickly.
How was the initial setup?
The initial setup is okay. You will need some knowledge and some training. I'd say learning takes a couple of months. We use one person to maintain the database side. With the DevOps team, everyone has a different role. But for our database, it's just one person.
What's my experience with pricing, setup cost, and licensing?
The price is reasonable. We use a pay-per-license model. First, you need to buy a license. After that, you mainly pay an annual support fee of around 20% to 25%. I think their prices are quite reasonable.
What other advice do I have?
We tried to use it for data lake kinds of things and for machine learning, but for the key functionality of a data warehouse, it's great. Personally, I feel they are over-marketing the machine learning features and things like semi-structured data support. But for the data warehouse, it's truly a good solution, and I highly recommend it.
I would tell potential users that it's hard to make it slow with small data volumes; with large volumes, it takes hard work to make it perform, make it faster, and make it scale. Depending on your workload and your use case, you need to first purchase the Red Tool. After that, you need to follow the best practices to have an efficient design.
On a scale from one to ten, I would give Vertica a nine.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Creator and Manager of Intelligent Water Loss Management Models at Qintess
Enhanced capabilities, good customer service, scalability for large data, and stability
Pros and Cons
- "The solution has great capabilities. The tool that instructs the internal database forward is easy to use and is very powerful."
- "They could improve on customer service."
What is our primary use case?
The solution is a BI solution that includes machine learning. Our company is involved in the distribution of water, and we use it to capture data from several points and to discover where there might have been a loss of drinkable water.
There is a problem with water distribution because the company I'm working for has a water-loss index of 32% during distribution. These losses can come from different sources: leakage, meter-reading errors, and other issues, and sometimes the problem occurs at different hours depending on the pressure in the water network. We need to use artificial intelligence to collect millions of data points and detect where the problem might be coming from.
What is most valuable?
The solution has great capabilities. The tool that instructs the internal database forward is easy to use and is very powerful.
What needs improvement?
The product could be less expensive and could benefit from a better marketing strategy.
In a future release, I would like to have one application to help create intelligent models.
For how long have I used the solution?
We have been testing and developing the solution for two years.
What do I think about the stability of the solution?
We did not have any technical problems with the solution.
What do I think about the scalability of the solution?
The solution has great scalability. We started with one terabyte of compressed data, which is a lot of data, and we never had problems with scalability. You can have hundreds of terabytes with the solution if you want; it all depends on your needs.
How are customer service and technical support?
The customer service is very good. They could improve their expertise in, and support for, bigger projects; I was not satisfied with their help on some questions about collecting data. They could improve customer service a bit.
I rate the technical support an eight out of ten.
Which solution did I use previously and why did I switch?
We currently use IDOL as well as Vertica.
How was the initial setup?
The product is not easy to set up because you need a lot of training. It has less to do with the product itself than with the knowledge of how to use it. For example, with a spreadsheet product like Excel, if you don't know mathematics, you will have difficulty building a big Excel model, and it's the same with Vertica. It's only a tool, and everything depends on your ability to design what you need.
What about the implementation team?
We are the developers of the solution, and it is a very sophisticated project that we spent two years developing. We have already tested it, and we are now waiting for the customer to try it and then purchase it. It has taken some time to implement the solution the way we wanted for our company.
What's my experience with pricing, setup cost, and licensing?
It's difficult today to compete with open-source solutions. There is a lot of competition in these areas, and this solution is a bit pricey.
Which other solutions did I evaluate?
We use another product called IDOL and use them both together as our solution. Sometimes you use both, and sometimes you use each one separately. The two products are machine learning products but with different uses. IDOL has a more developed application and is much bigger than Vertica.
What other advice do I have?
This solution is used by several big companies, such as Bank of America, Uber, and Facebook, where you need BI with intelligence. We use the solution because it is very good; you can make interconnections with anything to collect any type of data. I have tried other products and they did not fit as well as this one did, so I recommend Vertica.
I rate Vertica a nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Data Scientist at a media company with 501-1,000 employees
The fact that it is a columnar database is valuable. Columnar storage has its own benefit with a large amount of data.
What is most valuable?
The fact that it is a columnar database is valuable. Columnar storage has its own benefit with a large amount of data. It's superior to most traditional relational DBs when dealing with a large amount of data. We believe that Vertica is one of the best players in this realm.
How has it helped my organization?
Large-volume queries are executed in a relatively short amount of time, so we can develop reports that consume data in Vertica.
What needs improvement?
Speed: It's already doing what it is supposed to do in terms of speed but still, as a user, I hope it gets even faster.
Specific to our company, we store the data both in AWS S3 and in Vertica. For some batch jobs, we decided to create a Spark job rather than use Vertica operations, for speed and/or scalability reasons. Maybe this is just due to the computational efficiency of SQL operations vs. a programmatic approach. Even with some optimization (adding projections for merge joins and GROUP BY PIPELINED), it still takes longer than a Spark job in some cases.
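As a rough sketch of the optimizations mentioned above (projections that enable merge joins and GROUP BY PIPELINED), a projection sorted on the join/grouping key looks roughly like this; the table and column names are hypothetical, and the plan the optimizer actually picks should be confirmed with EXPLAIN.

```sql
-- Sorting the projection on the grouping/join key lets the optimizer pick
-- MERGE JOIN and GROUPBY PIPELINED instead of their hash-based variants.
CREATE PROJECTION events_by_device (
    device_id,
    event_time,
    metric_value
) AS
SELECT device_id, event_time, metric_value
FROM events
ORDER BY device_id, event_time
SEGMENTED BY HASH(device_id) ALL NODES;

-- Inspect the plan; look for "GROUPBY PIPELINED" rather than "GROUPBY HASH".
EXPLAIN
SELECT device_id, COUNT(*) AS event_count, AVG(metric_value) AS avg_value
FROM events
GROUP BY device_id;
```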
For how long have I used the solution?
I have personally used it for about 2.5 years.
What do I think about the stability of the solution?
I have not recently encountered any stability issues; we have good health checks/monitoring around Vertica now.
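For what it's worth, a basic health check of the kind mentioned here can be as simple as polling node state from the system catalog; this is a hedged sketch, and the exact catalog tables and columns may differ between Vertica versions.

```sql
-- Flag any node that is not UP (catalog and column names may vary by version).
SELECT node_name, node_state
FROM v_catalog.nodes
WHERE node_state <> 'UP';
```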
What do I think about the scalability of the solution?
I have not encountered any scalability issues; I think it's scalable.
How are customer service and technical support?
N/A; don't have much experience on this.
Which solution did I use previously and why did I switch?
We do have some pipelines that access the raw data directly and process it as a batch Spark job. Why? I guess it's because the types of operations we do there can be done more easily in code than in SQL.
What other advice do I have?
I would recommend using Vertica for people/teams that have large denormalized fact tables which need to be processed efficiently. I worked on optimizing query performance with projections, merge joins, and GROUP BY PIPELINED, and it paid off in the end.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Management Consultant at a computer software company with 51-200 employees
A SQL-based compute platform like Vertica enables far less human overhead in operations and analytics.
What is most valuable?
Scale-out, analytical functions, ML.
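As a small illustration of the analytical functions mentioned here, a typical in-database window-function query might look like the following; the table and columns are hypothetical.

```sql
-- Daily revenue per region plus a running total, computed inside the database.
SELECT region, sale_date, daily_revenue,
       SUM(daily_revenue) OVER (PARTITION BY region ORDER BY sale_date) AS running_revenue
FROM (
    SELECT region, sale_date, SUM(amount) AS daily_revenue
    FROM sales
    GROUP BY region, sale_date
) AS daily
ORDER BY region, sale_date;
```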
How has it helped my organization?
We are an HP partner. A SQL-based compute platform like Vertica enables far less human overhead in operations and analytics.
What needs improvement?
More ML: data prep, models, evaluation, and workflow. Improved support for deep analytics and predictive modelling with machine learning algorithms. This area of analytics needs a stack of functionality in order to support the scenario. The needed functionality includes:
- Data preparation: scaling, centering, removing skewness, gap filling, pivoting, feature selection, and feature generation.
- Algorithms/models: non-linear models in general; more specifically, penalized models, tree/rule-based models (incl. ensembles), SVM, MARS, neural networks, k-nearest neighbours, Naïve Bayes, etc.
- Support for the concept of a “data processing pipeline” with data prep + model. One would typically use “a pipeline” as the overall logical unit used to produce predictions/scoring.
- Automatic model evaluation/tuning. With algorithms requiring tuning, support for automated testing of different settings/tuning parameters is very useful. It should include (k-fold) cross-validation and bootstrap for model evaluation.
- Some sort of hooks to use external models in a pipeline i.e. data prep in Vertica + model from Spark/R.
- Parity functionality for the Java SDK compared to C++. Today the C++ SDK is the most feature rich. The request is to bring (and keep) the Java SDK up to feature parity with C++.
- Streaming data and notifications/alerts. Streaming data is starting to get well supported with the Kafka integration. Now we just need a hook to issue notifications on streaming data. That is, running some sort of evaluation on incoming records (as they arrive to the Vertica tables) and possibly raising a notification.
For how long have I used the solution?
Two years.
What was my experience with deployment of the solution?
No, not really.
What do I think about the stability of the solution?
No.
What do I think about the scalability of the solution?
No.
Which solution did I use previously and why did I switch?
PostgreSQL, MySQL, and SQL Server. We switched because of scalability, reliability, and analytics functionality, and because Vertica is a better-engineered product.
How was the initial setup?
Straightforward. Good docs helped a lot.
What's my experience with pricing, setup cost, and licensing?
It's reasonably priced for non-trivial data problems.
Which other solutions did I evaluate?
Yes, Hadoop / Spark, SQL Server.
What other advice do I have?
See additional functionality above.
Disclosure: My company has a business relationship with this vendor other than being a customer: We are a vendor partner.
CIO at a tech services company with 1,001-5,000 employees
It works well. When we ran into issues, there seemed to be a lot of different opinions for how to resolve them.
What is most valuable?
We use Vertica as our primary data warehouse. It works relatively well most of the time.
What needs improvement?
I just expect it to work and be serviceable. When we ran into issues, there seemed to be a lot of different opinions on how to resolve them, and that was the feedback I gave them. You'd talk to one tech, then talk to a different tech, and they'd have a much different approach. That was a big frustration point for us.
The same was true of the upgrade path and which way we should go. In the end, it created a lot of confusion for us, so I wouldn't upgrade again lightly. We're going to remain on it for the next year, but we'll probably re-evaluate at that point whether we want to continue with Vertica or move to something else.
What do I think about the stability of the solution?
It's been stable since November and before that, to be fair, it was stable for quite a while.
What do I think about the scalability of the solution?
The reason we like Hadoop and similar options is that they scale up without the pricing scaling up at the same rate. Vertica is a license-per-terabyte product. They do give you discounts as your volume grows, but it adds up fast over time. We could scale at a lower cost with other solutions.
Scaling was a pain point. Getting recommendations on how to set it up to ultimately provide the best performance, how many nodes, and other things was difficult; we got different answers from them.
Which solution did I use previously and why did I switch?
We use MongoDB for some of our other internal production apps. It's a lot more involved and more complex than we'd like for just a standard data warehouse, but we might look at Hadoop or something similar for that.
How was the initial setup?
There were a lot of complexities with the upgrade and costs from data failures. That was last year; it's kind of good that I've forgotten about those pain points.
What other advice do I have?
I would recommend that they thoroughly evaluate all their options. If they're just going to run a small data warehouse, it's probably not a bad solution. If it's something they know is going to grow dramatically and unpredictably, I don't know; I would evaluate hard.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Team Lead, Business Intelligence at a tech company with 501-1,000 employees
The most valuable feature is the merge function, which is essentially the upsert function. We've had issues with query time taking longer than expected for our volume of data.
What is most valuable?
The most valuable feature is the merge function, which is essentially the upsert function. It's become our ELT pattern. Previously, when we used the ETL tool to manage upserts, the load time was significantly longer. The merge function load time is pretty much flat relative to the volume of records processed.
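For readers unfamiliar with the pattern, the merge/upsert step described here typically looks something like the sketch below, where a staging table loaded by the ELT job is merged into the target; the table and column names are hypothetical.

```sql
-- Upsert: update matching rows and insert new ones in a single statement.
MERGE INTO dim_customer tgt
USING stage_customer src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET name = src.name,
               email = src.email,
               updated_at = src.loaded_at
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email, updated_at)
    VALUES (src.customer_id, src.name, src.email, src.loaded_at);
```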
How has it helped my organization?
HP Vertica has helped us democratize data, making it available to users across the organization.
What needs improvement?
We've had issues with query time taking longer than expected for our volume of data. However, this is due to not understanding the characteristics of the database and how to better tune its performance.
For how long have I used the solution?
We've been using HP Vertica for three years, but only in the last year have we really started to leverage it more. We're moving to a clustered environment to support the scale-out of our data warehouse.
We use it as the database for our data warehouse. In its current configuration, we use it as a single node, but we're moving to a clustered environment, which is what the vendor recommends.
What was my experience with deployment of the solution?
We had no issues with the deployment.
What do I think about the stability of the solution?
We've had no issues with the stability.
What do I think about the scalability of the solution?
We've had no issues scaling it.
How are customer service and technical support?
I'd rate technical support as low to average. The tech support provides the usual canned response. We've had to learn most of how to harness the tool on our own.
Which solution did I use previously and why did I switch?
I haven't used anything similar.
How was the initial setup?
HP Vertica was in place when I joined the company, but it wasn't used as extensively as it is now.
What about the implementation team?
We implemented it in-house, I believe.
What other advice do I have?
Loading into HP Vertica is straightforward, similar to other data warehouse appliance databases such as Netezza. However, tuning it for querying requires a lot more thought. It uses projections that are similar to indexes. Knowing how to properly use projections does take time. One thing to be mindful of with columnar databases is that the fewer the columns in your query, the faster the performance. The number of rows impacts query time less.
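To illustrate the column-count point, compare the two hypothetical queries below: on a columnar store, the second reads only three column files no matter how wide the table is, which is why trimming the select list matters more than trimming rows.

```sql
-- Reads every column of the table: expensive on a wide fact table.
SELECT *
FROM fact_orders
WHERE order_date >= '2016-01-01';

-- Reads only three columns: typically far faster on a columnar store.
SELECT order_date, store_id, SUM(revenue) AS revenue
FROM fact_orders
WHERE order_date >= '2016-01-01'
GROUP BY order_date, store_id;
```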
My advice would be to try out the database by connecting it to your ETL tools and performing time studies on the load and query times. It's a good database. It works similarly to Netezza in my experience, but it is a lot cheaper. Pricing is based on the size of the database.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Chief Data Scientist at a tech vendor with 10,001+ employees
We're using Vertica, just because of the performance benefits. On big queries, we're getting sub-10 second latencies.
My company recognized early, near the inception of the product, that if we were able to collect enough operational data about how our products are performing in the field, get it back home and analyze it, we'd be able to dramatically reduce support costs. Also, we can create a feedback loop that allows engineering to improve the product very quickly, according to the demands that are being placed on the product in the field.
Looking at it from that perspective, to get it right, you need to do it from the inception of the product. If you take a look at how much data we get back for every array we sell in the field, we could be receiving anywhere from 10,000 to 100,000 data points per minute from each array. Then, we bring those back home, we put them into a database, and we run a lot of intensive analytics on those data.
Once you're doing that, you realize that as soon as you do something, you have this data you're starting to leverage. You're making support recommendations and so on, but then you realize you could do a lot more with it. We can do dynamic cache sizing. We can figure out how much cache a customer needs based on an analysis of their real workloads.
We found that big data is really paying off for us. We want to continue to increase how much it's paying off for us, but to do that we need to be able to do bigger queries faster. We have a team of data scientists and we don't want them sitting here twiddling their thumbs. That's what brought us to Vertica.
We have a very tight feedback loop. In one release we put out, we may make some changes in the way certain things happen on the back end, for example, the way NVRAM is drained. There are some very particular details around that, and we can observe very quickly how that performs under different workloads. We can make tweaks and do a lot of tuning.
Without the kind of data we have, we might have to have multiple cases being opened on performance in the field and escalations, looking at cores, and then simulating things in the lab.
It's a very labor-intensive, slow process with very little data to base the decision on. When you bring home operational data from all your products in the field, you're now talking about being able to figure out in near real-time the distribution of workloads in the field and how people access their storage. I think we have a better understanding of the way storage works in the real world than any other storage vendor, simply because we have the data.
I don't remember the exact year, but it may have been roughly eight years ago that I became aware of Vertica. At some point, there was an announcement that Mike Stonebraker was involved in a group that was going to productize the C-Store database, which was sort of an academic experiment at MIT, to understand the benefits and capabilities of a real column store.
I was immediately interested and contacted them. I was working at another storage company at the time. I had a 20 terabyte (TB) data warehouse, which at the time was one of the largest Oracle on Linux data warehouses in the world.
They didn't want to touch that opportunity just yet, because they were just starting out in alpha mode. I hooked up with them again a few years later, when I was CTO at a different company, where we developed what's substantially an extract, transform, and load (ETL) platform.
By then, they were well along the road. They had a great product and it was solid. So we tried it out, and I have to tell you, I fell in love with Vertica because of the performance benefits that it provided.
When you start thinking about collecting as many different data points as we like to collect, you have to recognize that you're going to end up with a couple of choices on a row store. Either you're going to have very narrow tables and a lot of them, or else you're going to be wasting a lot of I/O overhead, retrieving entire rows where you just need a couple of fields.
That was what piqued my interest at first. But as I began to use it more and more, I realized that the performance benefits you could gain by using Vertica properly were another order of magnitude beyond what you would expect just with the column-store efficiency.
That's because of certain features that Vertica allows, such as something called pre-join projections. At a high level, it lets you maintain the normalized logical integrity of your schema while having, under the hood, optimized denormalized query performance physically on disk.
You can be efficient with a denormalized structure on disk because Vertica allows you to do some very efficient types of encoding on your data. All of the low-cardinality columns that would have been wasting space in a row store end up taking almost no space at all.
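For context, a pre-join projection of the kind described above looked roughly like the sketch below in the HP Vertica era; the tables are hypothetical, such projections required primary/foreign key constraints on the join, and the feature has since been deprecated in newer releases, so this is illustrative rather than a recommendation.

```sql
-- Logical schema stays normalized (sales + customers); the projection stores
-- a denormalized, join-ready layout on disk (requires a PK/FK join).
CREATE PROJECTION sales_prejoin (
    sale_id,
    sale_date,
    amount,
    customer_region
) AS
SELECT s.sale_id, s.sale_date, s.amount, c.region
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
ORDER BY c.region, s.sale_date;
```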
It's been my impression that Vertica is the data warehouse that you would have wanted to have built 10 or 20 years ago, but nobody had done it yet.
Nowadays, when I'm evaluating other big data platforms, I always have to look at it from the perspective of it's great, we can get some parallelism here, and there are certain operations that we can do that might be difficult on other platforms, but I always have to compare it to Vertica. Frankly, I always find that Vertica comes out on top in terms of features, performance, and usability.
I built the environment at my current company from the ground up. When I got here, there were roughly 30 people. It's a very small company. We started with Postgres. We started with something free. We didn't want to have a large budget dedicated to the backing infrastructure just yet. We weren't ready to monetize it yet.
So, we started on Postgres and we've scaled up now to the point where we have about 100 TBs on Postgres. We get decent performance out of the database for the things that we absolutely need to do, which are micro-batch updates and transactional activity. We get that performance because the database lives here.
I don't know what the largest unsharded Postgres instance in the world is, but I feel like I have one of them. It's a challenge to manage and leverage. Now, we've gotten to the point where we're really enjoying doing larger queries. We really want to understand the entire installed base, and we want to do analyses that extend across the entire base.
We want to understand the lifecycle of a volume. We want to understand how it grows, how it lives, what its performance characteristics are, and then how gradually it falls into senescence when people stop using it. It turns out there is a lot of really rich information that we now have access to for understanding storage lifecycles in a way I don't think was possible before.
But to do that, we need to take our infrastructure to the next level. So we've been doing that: we've loaded a large amount of our sensor data, the numerical data I have talked about, into Vertica, started to compare the queries, and then started to use Vertica more and more for all the analysis we're doing.
Internally, we're using Vertica just because of the performance benefits. I can give you an example. We had a particular query, a particularly large query. It was to look at certain aspects of latency over a month across the entire installed base to understand a little bit about the distribution, depending on different factors, and so on.
We ran that query in Postgres, and depending on how busy the server was, it took anywhere from 12 to 24 hours to run. On Vertica, running the same query on the same data takes anywhere from three to seven seconds.
I anticipated that because we were aware upfront of the benefits we'd be getting. I've seen it before. We knew how to structure our projections to get that kind of performance. We knew what kind of infrastructure we'd need under it. I'm really excited. We're getting exactly what we wanted and better.
This is only a three-node cluster. Look at the performance we're getting. On the smaller queries, we're getting sub-second latencies. On the big ones, we're getting sub-10-second latencies. It's absolutely amazing. It's game changing.
People can sit at their desktops now, manipulate data, come up with new ideas, and iterate without having to run a batch and go home. It's a dramatic productivity increase. Data scientists tend to be fairly impatient. They're highly paid people, and you don't want them sitting at their desks waiting to get an answer out of the database. It's not the best use of their time.
When it comes to the cloud model for deployment, there's the ease of adding nodes without downtime and the fact that you can create a K-safe cluster. If my cluster is 16 nodes wide now and I want two nodes of redundancy, it's very similar to RAID: you can specify that, and the database will take care of it for you. You don't have to worry about the database going down and losing data as a result of an occasional node failure.
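For reference, the K-safety setting described here maps to something like the following; MARK_DESIGN_KSAFE is the documented way to declare the designed fault tolerance, while the monitoring query is a hedged sketch whose table and column names may vary by Vertica version.

```sql
-- Declare that the physical design can tolerate one node failure (use 2 for K=2).
SELECT MARK_DESIGN_KSAFE(1);

-- Compare designed vs. current fault tolerance for the cluster
-- (column names may vary by version).
SELECT designed_fault_tolerance, current_fault_tolerance
FROM v_monitor.system;
```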
I love the fact that you don't have to pay extra for that. If I want to put more cores or nodes on it, or I want to put more redundancy into my design, I can do that without paying more for it. Wow! That's kind of revolutionary in itself.
It's great to see a database company incented to give you great performance. They're incented to help you work better with more nodes and more cores. They don't have to worry about people not being able to pay the additional license fees to deploy more resources. In that sense, it's great.
We have our own private cloud -- that's how I like to think of it -- at an offsite colocation facility. We do DR here. At the same time, we have a K-safe cluster. We had a hardware glitch on one of the nodes last week, and the other two nodes stayed up, served data, and everything was fine.
Those kinds of features are critical, and that ability to be flexible and expand is critical for someone who is trying to build a large cloud infrastructure, because you're never going to know in advance exactly how much you're going to need.
If you do your job right as a cloud provider, people just want more and more and more. You want to get them hooked and you want to get them enjoying the experience. Vertica lets you do that.
Disclosure: PeerSpot has made contact with the reviewer to validate that the person is a real user. The information in the posting is based upon a vendor-supplied case study, but the reviewer has confirmed the content's accuracy.