it_user1093134 - PeerSpot reviewer
Technical Architect at RBSG Internet Operations
Real User
Good database and highly scalable, with good plug and play analytics tools
Pros and Cons
  • "The most valuable feature is the database."
  • "It would be good to have more advanced analytics tools."

What is our primary use case?

We primarily dump all of our prior payment transaction data into a Hadoop system, and then we use some of the plug-and-play analytics tools to translate it.

What is most valuable?

The most valuable feature is the database.

What needs improvement?

We're finding vulnerabilities in running it 24/7. We're experiencing some downtime that affects the data.

It would be good to have more advanced analytics tools.

For how long have I used the solution?

I've been using the solution for five years.

Buyer's Guide
Apache Hadoop
November 2024
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.

What do I think about the scalability of the solution?

The solution is scalable. From a payments perspective, we're using the solution on a large scale.

How are customer service and support?

We've never contacted technical support.

Which solution did I use previously and why did I switch?

We didn't previously use a different solution.

How was the initial setup?

The initial setup was complex. There was a lot of data that we had to bring over from various sources and it was quite a long process.

What about the implementation team?

We did have some assistance with the implementation.

What other advice do I have?

We use the on-premises deployment model.

We're more inclined towards an operational data source to fill our customer's needs. Hadoop is good for analytics and some reporting requirements. 

It's a good solution for those needing something for the purposes of management reporting.

I'd rate the solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1040328 - PeerSpot reviewer
IT Expert at a tech services company with 1,001-5,000 employees
Real User
An inexpensive and flexible suite that helps users integrate varied legacy systems
Pros and Cons
  • "The best thing about this solution is that it is very powerful and very cheap."
  • "The upgrade path should be improved because it is not as easy as it should be."

What is our primary use case?

We primarily use this product to integrate legacy systems.

How has it helped my organization?

It helps us work with older products and more easily create solutions. 

What is most valuable?

The most valuable thing about this program for us is that it is very powerful and very cheap. We're using a lot of the program's modules and features because we're using software and hardware that can be difficult to integrate. For example, we're using supersets and a lot of old products from difficult systems. We love having the various options and features that allow us to work with flexibility.

What needs improvement?

We are using HDTM circuit boards, and I worry about the future of this product and compatibility with future releases. It's a concern because, for now, we do not have a clear path to upgrade. Hadoop is now at version three, and we'd like to upgrade to it. But as far as I know, it's not a simple thing.

There are a lot of features in this product that are open-source. If something isn't included with the distribution we are not limited. We can take things from the internet and integrate them. As far as I know, we are using Presto which isn't included in HDP (Hortonworks Data Platform) and it works fine. Not everything has to be included in the release. If something is outside of HDP and it works, that is good enough for me. We have the flexibility to incorporate it ourselves.

For how long have I used the solution?

We have been using the product for about five years.

What do I think about the stability of the solution?

The product is well tested and very stable. We have no problems with the stability of it at all. Really we just install it and forget about fussing with it. We just use the features it offers to be productive.

What do I think about the scalability of the solution?

This is a scalable solution and we like what it does. It is currently serving about 100 users at our organization, and it seems like it can easily handle more.

How are customer service and technical support?

We actually have not used technical support. For everything we needed to solve, we just used Google, and that was enough for us. Sometimes we do have issues, but not often. The issues mainly have to do with the terminals because it's a bit complicated to integrate these other systems. We have managed to solve all the problems up till now.

Which solution did I use previously and why did I switch?

We had a very old version of Hadoop, which had already been installed by another company, and we upgraded it. We didn't really switch; we just upgraded what was here.

How was the initial setup?

The initial setup wasn't very easy because of the incredible security, but we managed to work through that. It's fairly simple, in my opinion, once you get past that part. I think, in all, it took about half a year. But it wasn't a new deployment; it was an upgrade, and the bigger challenge was moving the data. We pretty much just supported the existing product and moved to HDP.

What about the implementation team?

We have everything on-premises and we did the deployment and maintenance. It took four people. We want to increase usage of Hadoop and we are thinking about it very heavily. We're actually in the process of doing it. At the same time, we are integrating things from other systems to Hadoop.

What other advice do I have?

I would give this product a rating of eight out of ten. It would not be a ten out of ten because of some problems we are having with the upgrade to the newer version. It would have been better for us if these problems were not holding us back. I think eight is good enough.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user693231 - PeerSpot reviewer
Big Data Engineer at a tech vendor with 5,001-10,000 employees
Vendor
HDFS allows you to store large data sets optimally. After switching to big data pipelines, our query performance improved a hundred times.

What is most valuable?

HDFS allows you to store large data sets optimally.
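As a rough illustration of what storing large data sets in HDFS involves, the sketch below estimates how many blocks a file is split into and how much raw disk it consumes. It assumes the stock defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the reviewer does not say which values their cluster used, and the function name `hdfs_footprint` is made up for this example.

```python
# Assumed defaults; a real cluster may override both in hdfs-site.xml.
BLOCK_SIZE_BYTES = 128 * 1024 ** 2  # dfs.blocksize default (128 MiB)
REPLICATION = 3                     # dfs.replication default


def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE_BYTES,
                   replication=REPLICATION):
    """Return (logical blocks, block replicas stored, raw bytes consumed)."""
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = -(-file_size_bytes // block_size)
    replicas = blocks * replication
    # A partially filled last block only uses the space it holds, so raw
    # usage is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    one_tib = 1024 ** 4
    blocks, replicas, raw = hdfs_footprint(one_tib)
    print(f"{blocks} blocks, {replicas} replicas, {raw / one_tib:.0f} TiB raw")
```

Under these assumptions, every tebibyte of data takes about three tebibytes of raw disk across the cluster, which is worth keeping in mind when sizing hardware.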

How has it helped my organization?

After switching to big data pipelines, our query performance improved a hundred times.

What needs improvement?

Rolling restarts of data nodes need to be done in a way that can be further optimized. Also, I/O operations can be optimized for more performance.

For how long have I used the solution?

I have used Hadoop for over three years.

What do I think about the stability of the solution?

We once had a stability issue caused by a complete shutdown of a cluster. Bringing the cluster back up took a lot of time because its components had to be started in a particular order.

What do I think about the scalability of the solution?

We have not had scalability issues.

How are customer service and technical support?

The community is very supportive and provided prompt replies and suggestions to JIRA tickets.

Which solution did I use previously and why did I switch?

We didn’t have a previous solution. It was a move from RDBMS to big data.

How was the initial setup?

Initial setup of a few nodes was simple, but as we increased the node count it became complex, as we need to maintain rack topology, etc.
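For readers unfamiliar with rack topology, Hadoop's rack awareness is commonly driven by a small executable named in the `net.topology.script.file.name` property: Hadoop passes it one or more node addresses as arguments and reads back one rack path per address. The following is a minimal sketch of such a script, not the reviewer's actual setup. The subnet-to-rack table and the `resolve_rack` helper are hypothetical.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from the first three octets of a node's IPv4 address
# to a rack path. A real cluster would load this from its own inventory.
RACKS_BY_SUBNET = {
    "10.0.1": "/dc1/rack1",
    "10.0.2": "/dc1/rack2",
}
# Rack path Hadoop itself uses for nodes with no known location.
DEFAULT_RACK = "/default-rack"


def resolve_rack(address):
    """Map an address to a rack path; unknown addresses get the default."""
    subnet = address.rsplit(".", 1)[0]
    return RACKS_BY_SUBNET.get(subnet, DEFAULT_RACK)


def main(argv):
    # Print one rack path per address argument, in the same order.
    for address in argv[1:]:
        print(resolve_rack(address))


if __name__ == "__main__":
    main(sys.argv)
```

The maintenance burden the reviewer describes comes from keeping a table like this accurate as nodes are added or moved between racks.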

What's my experience with pricing, setup cost, and licensing?

It’s free and it is open source.

What other advice do I have?

I would suggest using this product. We were able to use this for petabytes of data.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1040328 - PeerSpot reviewer
IT Expert at a tech services company with 1,001-5,000 employees
Real User
A robust open source software library and framework with many useful tools
Pros and Cons
  • "I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."
  • "The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."

What is our primary use case?

We used Apache Hadoop mainly for ETL and data analysis.

What is most valuable?

I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed. 

What needs improvement?

The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop.

For how long have I used the solution?

I worked with Apache Hadoop for about five years.

What do I think about the scalability of the solution?

Apache Hadoop is scalable. We had about 150 people using it at the organization. Some were data scientists, others were from the engineering side, and some were from management, because Apache Hadoop provided some reports.

How was the initial setup?

The initial setup was straightforward. However, it was challenging to make it secure. We managed to do that and implement Kerberos because it's the only way to make Hadoop safe. But it was easy and worked for a few years without any problems. Three people implemented this solution over three months.

What about the implementation team?

We implemented this solution.

What's my experience with pricing, setup cost, and licensing?

The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop.

What other advice do I have?

On a scale from one to ten, I would give Apache Hadoop a nine.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Abhik Ray - PeerSpot reviewer
Co-Founder at Quantic
Real User
Powerful data ingestion and consolidation tools prepare the data for predictive analytics
Pros and Cons
  • "The most valuable features are powerful tools for ingestion, as data is in multiple systems."
  • "It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."

What is our primary use case?

The primary use is as a data lake. 

How has it helped my organization?

Using this solution has allowed us to consolidate the data. It has made it such that data science-based algorithms can be written for predictive analytics.

What is most valuable?

The most valuable features are powerful tools for ingestion, as data is in multiple systems.

What needs improvement?

It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake.

For how long have I used the solution?

I have been using Apache Hadoop for two years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user1208307 - PeerSpot reviewer
Practice Lead (BI/ Data Science) at a tech services company with 11-50 employees
Real User
Good for managing and replication of big data but needs a better user interface
Pros and Cons
  • "It's good for storing historical data and handling analytics on a huge amount of data."
  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."

What is most valuable?

The solution is perfect for when you have big data. It's good for managing and replication.

It's good for storing historical data and handling analytics on a huge amount of data.

What needs improvement?

It could be because the solution is open source, and therefore not funded like bigger companies, but we find the solution runs slow.

The solution isn't as mature as SQL or Oracle and therefore lacks many features.

The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment.

For how long have I used the solution?

I've been using the solution for seven years.

What do I think about the stability of the solution?

The solution is stable.

What other advice do I have?

I've used the solution under cloud, hybrid and on-premises deployment models.

I'd recommend the solution, but it depends on the company's requirements. If you don't have huge amounts of data, you probably don't need Hadoop. If you need a completely private environment, and you have lots of big data, consider Hadoop. You don't even need to invest in the infrastructure as you can just use a cloud deployment.

I'd rate the solution seven out of ten. I'd rate it higher if it had a better user interface.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Architect at a tech services company with 10,001+ employees
Real User
Gives us high throughput and low latency for KPI visualization
Pros and Cons
  • "High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."

What is our primary use case?

Data aggregation for KPIs. The sources of data come in all forms so the data is unstructured. We needed high storage and aggregation of data, in the background.

How has it helped my organization?

We start with data mashing on Hive and finally use this for KPI visualization. This intermediate step not only mashes data in the form that we want through data Cube slicing, but also helps us save states as snapshots for multiple time frames.

Without this, we would have had to plan another data source for only this purpose. Moving this step closer to processing worked better than keeping it at visualization. Although we can't completely avoid using data stores/snapshots at visualization, this step proved to be promising for getting data ready for better analytics and insights.
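To make the idea of cube slicing with per-time-frame snapshots more concrete, here is a small sketch in plain Python. It sums a measure over every combination of dimensions, similar in spirit to what Hive's `GROUP BY ... WITH CUBE` produces, and builds one such cube per time frame. This is an illustration only, not the reviewer's pipeline. The sample rows, field names, and the `cube` and `snapshots` helpers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical KPI records; every field name here is an assumption.
ROWS = [
    {"month": "2024-01", "region": "EU", "product": "A", "sales": 10},
    {"month": "2024-01", "region": "EU", "product": "B", "sales": 5},
    {"month": "2024-01", "region": "US", "product": "A", "sales": 7},
    {"month": "2024-02", "region": "EU", "product": "A", "sales": 3},
]


def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`.

    Returns {grouped_dims: {dim_values: total}}. The empty-tuple key holds
    the grand total.
    """
    result = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            result[dims] = dict(totals)
    return result


def snapshots(rows, time_field, dimensions, measure):
    """Build one cube per time frame so each state is kept separately."""
    by_frame = defaultdict(list)
    for row in rows:
        by_frame[row[time_field]].append(row)
    return {frame: cube(frame_rows, dimensions, measure)
            for frame, frame_rows in by_frame.items()}


if __name__ == "__main__":
    monthly = snapshots(ROWS, "month", ("region", "product"), "sales")
    for month, month_cube in sorted(monthly.items()):
        print(month, month_cube[("region",)])
```

Precomputing the slices this way means the visualization layer only has to look up a finished aggregate, which matches the reviewer's point about moving the step closer to processing.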

What is most valuable?

High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization.

What needs improvement?

At the beginning, MRs on Hive made me think we should get down to Hadoop MRs to have better control of the data. But later, Hive as a platform upgraded very well. I still think a Spark-type layer on top gives you an edge over having only Hive.

For how long have I used the solution?

Less than one year.

What other advice do I have?

I rate it eight out of ten. It's huge, complex, and slow, but it does what it is meant to do.

Disclosure: I am a real user, and this review is based on my own experience and opinions.