Abhik Ray - PeerSpot reviewer
Co-Founder at Quantic
Real User
Has good processing power and speed and is capable of handling large volumes of data and doing online analysis
Pros and Cons
  • "The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
  • "It has a steep learning curve. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities overlap, and for a newcomer or our clients, it is very difficult to decide which of them to adopt and which of them they don't really need. They require a consulting organization for it, which is good for organizations such as ours because that's what we do, but it is not easy for the end customers to gain so much knowledge and use it optimally."

What is our primary use case?

Its main use case is to create a data warehouse or data lake, which is a collection of data from multiple product processors used by a banking organization. They have core banking, which has savings accounts or deposits as one system, and they have a CRM or customer information system. They also have a credit card system. All of them are separate systems in most cases, but there is a linkage between the data. So, the main motivation is to consolidate all that data in one place and link it wherever required so that it acts as a single version of the truth, which is used for management reporting, regulatory reporting, and various forms of analyses.

We have done two or three projects with Hadoop, and we have taken the latest version available at that time. So far, it was deployed on-premises.

What is most valuable?

The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable.

Another feature that I like is online analysis. In some cases, data requires online analysis. We like using Hadoop for that.

What needs improvement?

It has a steep learning curve. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities overlap, and for a newcomer or our clients, it is very difficult to decide which of them to adopt and which of them they don't really need. They require a consulting organization for it, which is good for organizations such as ours because that's what we do, but it is not easy for the end customers to gain so much knowledge and use it optimally. However, when it comes to power, I have no complaints. It is really good.

For how long have I used the solution?

We have been working with this solution for two and a half to three years.

Buyer's Guide
Apache Hadoop
December 2024
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.

What do I think about the stability of the solution?

The core file system and the offline data ingestion are extremely stable. In my experience, there is a bit less stability during online data ingestion. When you have incremental online data, it sometimes stops or aborts before finishing. That is rare, but it happens. The offline data ingestion and the basic processing, however, are very stable.
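A common mitigation when an incremental load occasionally aborts is to wrap the job in a retry loop with exponential backoff. The sketch below is generic Python, not tied to any specific ingestion tool; `job` is a placeholder for whatever callable kicks off the incremental load:

```python
import time

def run_with_retries(job, max_attempts=3, base_delay_s=5.0):
    """Re-run a flaky ingestion job, backing off exponentially between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()                 # succeeded: hand back the result
        except Exception:
            if attempt == max_attempts:  # out of attempts: surface the error
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))
```

With three attempts and a 5-second base delay, a job that aborts twice still completes on the third try after waiting 5 and then 10 seconds.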

What do I think about the scalability of the solution?

Its scalability is very good. Most of our clients have used it on-prem. So, to a large extent, it is up to them to provide hardware for large data, which they have. Its scalability is linear. As long as the hardware is given to it, there are no complaints.

About 70% of its users are from a client's IT in terms of setting it up and providing support to make sure that the pipeline is there. Business users are about 30%. They are the people who use the analytics derived from the warehouse or data lake. Collectively, there are about 120 users. The size of the data is mostly in terms of the number of records it handles, which could be 30 or 40 million.

How are customer service and support?

We have not dealt with them too many times. I would rate them a four out of five. There are no complaints.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Some of our clients are using Teradata, and some of them are using Hadoop.

How was the initial setup?

After the hardware is available, getting the environment and software up and running has taken us a minimum of a week or 10 days. Sometimes, it has taken us longer, but usually, this is what it takes at the minimum to get everything up. It includes the downloads and also setting it up and making things work together to start using it.

For the original deployment, because there are so many components and not everyone knows everything pretty well, we have seen that we had to deploy four or five people in various areas at the initial deployment stage. However, once it is running, one or two people are required for maintenance.

What was our ROI?

Different clients derive different levels of return based on the sophistication of the analytics that they derive out of it and how they use it. I don't know how much ROI they have got, but I can say that some clients have not got a decent ROI, but some of our clients are happy with it. It is very much client-dependent.

What's my experience with pricing, setup cost, and licensing?

We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable.

What other advice do I have?

I would rate it a nine out of ten; it loses a point because of the complexity, but technically, it is fine.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Aria Amini - PeerSpot reviewer
Data Engineer at Behsazan Mellat
Real User
Top 5
A big-data engineering solution that integrates well into a variety of environments
Pros and Cons
  • "Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."
  • "It could be more user-friendly."

What is our primary use case?

We use the Apache Hadoop environment for use cases involving big data engineering. We have many applications, such as collecting, transforming, loading, and storing log event data for big organizations.

What is most valuable?

Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform. Hadoop can integrate all of these tools in various environments and support use cases that span all of the tools in the environment.

What needs improvement?

It could be more user-friendly. Other platforms, such as Cloudera, used for big data, are more user-friendly and presented in a more straightforward way. They are also more flexible than Hadoop. Hadoop's scrollback is not easy to use, either.

For how long have I used the solution?

I have used Apache Hadoop for three years, and I use Hadoop's open-source version.

What do I think about the stability of the solution?

Hadoop is stable because it runs on a cluster. If issues occur on some of the servers, Hadoop can continue operating.

What do I think about the scalability of the solution?

Apache Hadoop is very good for scalability because scalability is one of its main features. For all the big data infrastructure, we have about ten employees working in the Hadoop environment as engineers and developers. One of our clients is a bank, and the Hadoop environment can retrieve a lot of data, so we could have an unlimited number of end users.

How was the initial setup?

The initial setup is, to some extent, difficult because additional skills are required, specifically knowledge of the operating system at installation. We need someone with professional skills to install the Hadoop environment. With one engineer who has those skills, it takes ten days to two weeks to deploy the solution.

Two or three people are needed to maintain the solution. At least two people are required to maintain the Hadoop stack, in case of unexpected situations, like when something gets corrupted, and they need to solve the problem as fast as possible. Hadoop is easy to maintain because of its governance feature, which helps maintain all the Hadoop stacks.

Which other solutions did I evaluate?

Some competitors include Kibana from Elasticsearch, Splunk, and Cloudera. Each of them has some advantages and disadvantages, but Hadoop is more flexible when working in a big data environment. Compared to Splunk and Cloudera, Apache Hadoop is platform-independent and works on any platform. It is also open-source.

What other advice do I have?

We use Hadoop's open-source version and do not receive direct support from Apache. There are good resources on the web, though, so we have no problem getting help, but not directly from the company.

If you want to use big data on a larger scale, you should use Hadoop. But you could use alternatives if you're going to use big data to analyze data in the short term and don't need cybersecurity. You could use your cloud's features. For example, if you are on Google or Amazon Cloud, you could use in-built features instead of Apache Hadoop. If you are, like us, working with banks that don't want to use the cloud or some commercial clouds or have large-scale data, Hadoop is a good choice for you.

I rate Apache Hadoop an eight out of ten because it could be more user-friendly and easier to install. Also, Hadoop has changed some features in the commercial version.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
CEO at AM-BITS LLC
Real User
Top 5
A hybrid solution for managing enterprise data hubs, monitoring network quality, and implementing an AntiFraud system
Pros and Cons
  • "The most valuable feature is scalability and the possibility to work with major information and open source capability."
  • "The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."

What is our primary use case?

This solution is used for a variety of purposes, including managing enterprise data hubs, monitoring network quality, implementing an AntiFraud system, and building data pipeline (conveyor) systems.

What is most valuable?

The most valuable features are its scalability, the ability to work with large volumes of information, and its open-source capability.

What needs improvement?

The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data.

For how long have I used the solution?

I have been using Apache Hadoop for ten years. Initially, we worked with it directly, but now we use the Cloudera and Bigtop distributions. We are the solution provider.

What do I think about the stability of the solution?

The tool's stability is good. 

What do I think about the scalability of the solution?

We may have 15 people working on this solution.

I rate the solution’s scalability a ten out of ten.

How was the initial setup?

The setup is not easy for a financial or telecom company.

It takes around one month for basic development and around three to four months for an enterprise deployment. We require more than 50 engineers for the engineering work and more than 20 people for the data engineering team.

In terms of production, the most significant aspects are security and staging, with a focus on either a one-month or three-month timeframe for security considerations.

What other advice do I have?

The best advice is not to start a project based on Apache Hadoop alone. It is a technology platform, and it needs a skilled team.

Overall, I rate the solution an eight out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Infrastructure Engineer at Zirous, Inc.
Real User
Top 20
The Distributed File System stores video, pictures, JSON, XML, and plain text all in the same file system.

What is most valuable?

The Distributed File System, which is the base of Hadoop, has been the most valuable feature with its ability to store video, pictures, JSON, XML, and plain text all in the same file system.

How has it helped my organization?

We do use the Hadoop platform internally, but mostly it is for R&D purposes. However, many of the recent projects that our IT consulting firm has taken on have deployed Hadoop as a solution to store high-velocity and highly variable data sizes and structures, and be able to process that data together quickly and efficiently.

What needs improvement?

Hadoop in and of itself stores data with 3x redundancy, and our organization has come to the conclusion that the default 3x results in too much wasted disk space. The user has the ability to change the data replication standard, but I believe that the Hadoop platform could eventually become more efficient in its redundant data replication. It is an organizational preference and nothing that would impede our organization from using it again, but just a small thing I think could be improved.
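To make the disk-space point concrete: HDFS replicates each block `dfs.replication` times (3 by default), so raw capacity needs scale linearly with the factor. A quick back-of-the-envelope helper, with illustrative numbers:

```python
def raw_storage_tb(logical_tb, replication=3):
    """Raw HDFS disk needed for `logical_tb` of data at a given replication factor."""
    return logical_tb * replication

# 100 TB of data: default 3x replication vs. a reduced 2x setting
at_3x = raw_storage_tb(100)       # 300 TB of raw disk
at_2x = raw_storage_tb(100, 2)    # 200 TB of raw disk
reclaimed = at_3x - at_2x         # 100 TB saved, at the cost of less redundancy
```

Dropping from 3x to 2x reclaims a third of the raw disk, but tolerates only a single node or disk failure per block.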

For how long have I used the solution?

This version was released in January 2016, but I have been working with the Apache Hadoop platform for a few years now.

What was my experience with deployment of the solution?

The only issues we found during deployment were errors originating from between the keyboard and the chair. I have set up roughly 20 Hadoop Clusters and mostly all of them went off without a hitch, unless I configured something incorrectly on the pre-setup.

What do I think about the stability of the solution?

We have not encountered any stability problems with this platform.

What do I think about the scalability of the solution?

We have scaled two of the clusters that we have implemented; one in the cloud, one on-premise. Neither ran into any problems, but I can say with certainty that it is much, much easier to scale in a cloud environment than it is on-premise.

How are customer service and technical support?

Customer Service:

Apache Hadoop is open-source and thus customer service is not really a strong point, but the documentation provided is extremely helpful. More so than some of the Hadoop vendors such as MapR, Cloudera, or Hortonworks.

Technical Support:

Again, it's open source. There are no dedicated tech support teams that we've come across unless you look to vendors such as Hortonworks, Cloudera, or MapR.

Which solution did I use previously and why did I switch?

We started off using Apache Hadoop for our initial Big Data initiative and have stuck with it since.

How was the initial setup?

Initial setup was decently straightforward, especially when using Apache Ambari as a provisioning tool. (I highly recommend Ambari.)

What about the implementation team?

We are the implementers.

What's my experience with pricing, setup cost, and licensing?

It's open source.

Which other solutions did I evaluate?

We solely looked at Hadoop.

What other advice do I have?

Try, try, and try again. Experiment with MapReduce and YARN. Fine-tune your processes and you will see some insane processing power results.

I would also recommend that you have at least a 12-node cluster: two master nodes, eight compute/data nodes, one Hive node (SQL), and one dedicated Ambari node.

For the master nodes, I would recommend 4-8 cores, 32-64 GB RAM, and 8-10 TB HDD; for the data nodes, 4-8 cores, 64 GB RAM, and 16-20 TB RAID 10 HDD; the Hive node should be around 4 cores, 32-64 GB RAM, and 5-6 TB RAID 0 HDD; and the dedicated Ambari server should be 2-4 cores, 8-12 GB RAM, and 1-2 TB HDD storage.
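Taking rough midpoints of the ranges above (illustrative values, not a benchmark), the aggregate capacity of that 12-node layout can be tallied like this. Note that usable HDFS space is the data nodes' raw disk divided by the default 3x replication factor:

```python
# role: (node_count, cores, ram_gb, disk_tb) -- midpoints of the ranges above
nodes = {
    "master": (2, 6, 48, 9),
    "data":   (8, 6, 64, 18),
    "hive":   (1, 4, 48, 5.5),
    "ambari": (1, 3, 10, 1.5),
}

total_cores = sum(n * cores for n, cores, _, _ in nodes.values())   # 67 cores
total_ram_gb = sum(n * ram for n, _, ram, _ in nodes.values())      # 666 GB RAM
raw_data_tb = nodes["data"][0] * nodes["data"][3]                   # 144 TB raw on data nodes
usable_tb = raw_data_tb / 3                                         # 48 TB usable at 3x replication
```

The 3x divisor is why the cluster's "144 TB" of data-node disk really holds only about 48 TB of logical data.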

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user340983 - PeerSpot reviewer
Infrastructure Engineer at Zirous, Inc.
Top 20
Real User

We have since partnered with Hortonworks and are researching into the Cloudera and MapR spaces right now as well. Though our strong suit is Hortonworks, we do have a good implementation team for any of the distributions.

Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD
Real User
Top 5
Leaderboard
Good standard features, but a small local-machine version would be useful
Pros and Cons
  • "What comes with the standard setup is what we mostly use, but Ambari is the most important."
  • "In the next release, I would like to see Hive more responsive for smaller queries and to reduce the latency."

What is our primary use case?

The primary use case of this solution is data engineering and data files.

The deployment model we are using is private, on-premises.

What is most valuable?

We don't use many of the Hadoop features, like Pig or Sqoop, but what I like most is the Ambari feature. You have to use Ambari; otherwise, it is very difficult to configure.

What comes with the standard setup is what we mostly use, but Ambari is the most important.

What needs improvement?

Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission.

It seems that Hadoop is on its way out and Spark is the way to go. You can run Spark on a single machine, and it's easier to set up.

In the next release, I would like to see Hive more responsive for smaller queries, to reduce the latency. I don't think that this is viable, but if it is possible, then reduced latency on smaller queries for analysis and analytics would be welcome.

I would like a smaller version that can be run on a local machine. There are installations that do that but are quite difficult, so I would say a smaller version that is easy to install and explore would be an improvement.

For how long have I used the solution?

I have been using this solution for one year.

What do I think about the stability of the solution?

This solution is stable but sometimes starting up can be quite a mission. With a full proper setup, it's fine, but it's a lot of work to look after, and to startup and shutdown.

What do I think about the scalability of the solution?

This solution is scalable, and I can scale it almost indefinitely.

We have approximately two thousand users, half of the users are using it directly and another thousand using the products and systems running on it. Fifty are data engineers, fifteen direct appliances, and the rest are business users.

How are customer service and technical support?

There are several forums on the web, and Google search works fine. There is a lot of information available and it often works.

They also have good support in regards to the implementation.

I am satisfied with the support. Generally, there is good support.

Which solution did I use previously and why did I switch?

We used the more traditional database solutions, such as SAP IQ and data marts, but now it's changing more towards data science and big data.

We are a smaller infrastructure, so that's how we are set up.

How was the initial setup?

The initial setup is quite complex if you have to set it up yourself. Ambari makes it much easier, but on the cloud or local machines, it's quite a process.

It took at least a day to set it up.

What about the implementation team?

I did not use a vendor. I implemented it myself on the cloud with my local machine.

Which other solutions did I evaluate?

There was an evaluation, but the decision was to implement a data lake with the Hortonworks Data Platform.

What other advice do I have?

It's good for what it is meant to do, a lot of big data, but it's not as good for low-latency applications.

If you have to perform quick queries for analysis or analytics, it can be frustrating.

It can be useful for what it was intended to be used for.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Real User
We are able to ingest huge volumes/varieties of data, but it needs a data visualization tool and enhanced Ambari for management
Pros and Cons
  • "Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
  • "Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."
  • "Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
  • "General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."

What is our primary use case?

Big Data analytics, customer incubation. 

We host our Big Data analytics "lab" on Amazon EC2. Customers are new to Big Data analytics so we do proofs of concept for them in this lab. Customers bring historical, structured data, or IoT data, or a blend of both. We ingest data from these sources into the Hadoop environment, build the analytics solution on top, and prove the value and define the roadmap for customers.

How has it helped my organization?

Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges. 

We were using MySQL and PostgreSQL for these engagements, and scaling and processing were not as easy when compared to Hadoop. Also, customers who are embarking on a big journey with semi-structured information prefer to use Hadoop rather than an RDBMS stack. This gives them clarity on the requirements.

In addition, since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done.

Flexibility, ease of data processing, and reduced cost and effort are the three key improvements for us.

What is most valuable?

HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect.

What needs improvement?

Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them.

For how long have I used the solution?

Less than one year.

What do I think about the stability of the solution?

We have a three-node cluster running on cloud by default, and it has been stable so far without any stoppages due to Hadoop or other ecosystem components.

What do I think about the scalability of the solution?

Since this is primarily for customer incubation, there is a need to process huge volumes of data, based on the proof of value engagement. During these processes, we scale the number of instances on demand (using Amazon spot instances), use them for a defined period, and scale down when the PoC is done. This gives us good flexibility and we pay only for usage.
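The economics of that scale-up/scale-down pattern are easy to sketch. The rates below are purely illustrative placeholders, not actual AWS pricing; the point is only that paying spot rates for burst nodes during a PoC window can beat keeping a large cluster always on:

```python
def cost(node_count, hourly_rate, hours):
    """Simple linear cost model: nodes x rate x hours."""
    return node_count * hourly_rate * hours

HOURS_PER_MONTH = 24 * 30
POC_HOURS = 24 * 7  # a one-week proof-of-value window

# Hypothetical rates: $0.50/hr on-demand, $0.15/hr spot
always_on_30 = cost(30, 0.50, HOURS_PER_MONTH)                      # 30 nodes all month
burst_model = cost(10, 0.50, HOURS_PER_MONTH) + cost(20, 0.15, POC_HOURS)
# 10 steady nodes plus 20 spot nodes for one week
```

Under these made-up numbers, the burst model costs well under half of running 30 nodes continuously, which matches our experience of paying only for usage.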

How are customer service and technical support?

Since this is mostly community driven, we get a lot of input from the forums and our in-house staff who are skilled in doing the job. So far, most of the issues we have had during setup or scaling have primarily been on the infrastructure side and not on the stack. For most of the problems we get answers from the community forums.

How was the initial setup?

We didn't have any major issues except for knowledge, so we hired the right person who had hands-on experience with this stack, and worked with the cloud provider to get the right mechanism for handling the stack.

General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error. In addition, the old PoCs which were migrated had issues in directly connecting to Hive. We had to build some user functions to handle that.
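One concrete example of the kind of helper we are describing: MySQL exports often contain the zero datetime `0000-00-00 00:00:00`, which Hive cannot parse, so rows need normalizing on the way in. A small streaming script (usable via Hive's `TRANSFORM` clause) can map such values to Hive's text-file null marker `\N`. This is a hypothetical sketch; the column layout and exact cleanup rules depend on the table:

```python
import sys

NULL_MARKER = "\\N"  # Hive's default representation of NULL in text files

def normalize_row(line):
    """Replace MySQL zero-datetimes with Hive's null marker, column by column."""
    cols = line.rstrip("\n").split("\t")
    cols = [NULL_MARKER if c == "0000-00-00 00:00:00" else c for c in cols]
    return "\t".join(cols)

if __name__ == "__main__":
    for row in sys.stdin:  # Hive streams tab-separated rows on stdin
        print(normalize_row(row))
```

In Hive this would be wired up with something like `SELECT TRANSFORM (...) USING 'python normalize.py'`, where `normalize.py` is this hypothetical script name.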

What's my experience with pricing, setup cost, and licensing?

We normally do not suggest any specific distributions. When it comes to the cloud, our suggestion would be to choose among the different instance types offered by Amazon for cost savings, as we are Amazon technology partners. For all our PoCs, we stick to the default distribution.

Which other solutions did I evaluate?

None, as this stack is familiar to us and we were sure it could be used for such engagements without much hassle. Our primary criteria were the ability to migrate our existing RDBMS-based PoC and connectivity via our ETL and visualization tool. On top of that, support for semi-structured data for ELT. All three of these criteria were a fit with this stack.

What other advice do I have?

Our general suggestion to any customer is not to blindly look and compare different options. Rather, list the exact business needs - current and future - and then prepare a matrix to see product capabilities and evaluate costs and other compliance factors for that specific enterprise.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1384338 - PeerSpot reviewer
Vice President - Finance & IT at a consumer goods company with 1-10 employees
Real User
Great micro-partitions, helpful technical support and quite stable
Pros and Cons
  • "The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so."
  • "The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."

What is our primary use case?

As an example of a use case, when I was a contractor for Cisco, we were processing mobile network data and the volume was too big. RDBMS was not supporting anything. We started using the Hadoop framework to improve the process and get the results faster.

What is most valuable?

The data is stored in micro-partitions, which makes processing very fast compared to other RDBMS systems. Apache Spark is an in-memory processing engine, and it's much better than MapReduce.

Micro-partitions and the HDFS are both excellent features.
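For readers new to the comparison, the MapReduce model that Spark improves on can be sketched in plain Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is a toy illustration of the programming model (the classic word count), not Hadoop code:

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework's shuffle stage does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
# counts == {"big": 2, "data": 1, "cluster": 1}
```

MapReduce writes the intermediate results of each stage to disk; Spark keeps them in memory, which is where its speed advantage on iterative workloads comes from.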

What needs improvement?

I'm not sure if I have any ideas as to how to improve the product.

Every year, the solution comes out with new features. Spark is one new feature, for example. If they could continue to release new helpful features, it will continue to increase the value of the solution.

The solution could always improve performance. This is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance.

The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning.

We would prefer it if users didn't just get pushed through to certification-based learning, as certifications are expensive. Maybe they could arrange it so that the certification was at a lower cost. The certification currently costs around $2,500 or thereabouts.

For how long have I used the solution?

I've been using the solution for four years.

What do I think about the stability of the solution?

We haven't had too many problems with stability. For the POC, we used a small amount of data and started with 10 nodes. We have gradually increased to 40 nodes now. We haven't seen any issues after the small teething period in the beginning; the configuration issues and the performance issues have subsided. Once we learned how to stack everything, it has been much better.

What do I think about the scalability of the solution?

The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so.

We are supporting a multitenancy model and we get the data on supporting the users. I would say, per organization, we have eight to 10 users and probably have a total of around 40 users across the board.

How are customer service and technical support?

We started on the solution as a POC. Once we got into production, we had some minor issues. We get great support. They share advice and helped us tweak some things in terms of the configurations. We've been satisfied with the level of service we've been provided.

Which solution did I use previously and why did I switch?

We have only ever used Apache Hadoop, or a version of it. When we looked for the commercial tier, there was Cloudera and Hortonworks. We started with Hortonworks because, at that time, we felt it was cost-effective. However, Cloudera and Hortonworks have since merged, so now it's all basically the same solution.

How was the initial setup?

The initial setup was a little complex the first time around. We were new to the system, and we didn't have any expertise at that time. Once we got some support and insights into how to work everything properly, it went more smoothly.

First, we started with a POC - proof of concept. It takes a couple of days in terms of understanding and configuring everything, etc. When we went to production, it was a couple of hours for deployment and we put into practice everything we learned from the POC.

There's definitely a learning curve. It's stable for us now. 

We have a team of developers doing multiple tasks on the solution, and a few of them take care of Hadoop, so we do have a few people handling maintenance.

What about the implementation team?

As we were new to the solution, we found we needed some outside assistance to guide us. However, that was for the POC. In the end, I did it myself. 

What other advice do I have?

We're just a customer. We don't have a business relationship with Hadoop. 

My day-to-day job is data modeling and architecting.

Originally we used it as an open-source solution. We downloaded it, then we went for a commercial version of it.

In terms of advice, I'd tell other potential users that whether the solution is right for them depends on a few items. If the data volume is too big, it's IoT data, or the stream of data is too much, this solution can handle it and I would definitely recommend Apache Hadoop. 

Recently, in the last 18 months, I've been working with Snowflake on a data lake project, and I am really impressed with it. I got a certification, and we started using Snowflake for our data lake environment.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Real User
Reduces cost, saves time, and provides insight into our unstructured data
Pros and Cons
  • "The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
  • "We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it."

What is our primary use case?

We use this solution for our Enterprise Data Lake.

How has it helped my organization?

Using this solution has reduced the overall TCO. It has also improved processing time for the machine data and provides greater insight into our unstructured data.

What is most valuable?

The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics.

What needs improvement?

We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it.

For how long have I used the solution?

More than four years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Apache Hadoop Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2024