Teodor Muraru - PeerSpot reviewer
Developer at Emag
Real User
Top 5
Helps to store and retrieve information
Pros and Cons
  • "Apache Hadoop is crucial in projects that save and retrieve data daily. Its valuable features are scalability and stability. It is easy to integrate with the existing infrastructure."

    What is our primary use case?

    The solution helps to store and retrieve information.

    What is most valuable?

    Apache Hadoop is crucial in projects that save and retrieve data daily. Its valuable features are scalability and stability. It is easy to integrate with the existing infrastructure. 

    For how long have I used the solution?

    I have been using the tool for a few years. 

    What do I think about the stability of the solution?

    I rate the tool's stability a nine out of ten. 

    Buyer's Guide
    Apache Hadoop
    November 2024
    Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
    816,406 professionals have used our research since 2012.

    How are customer service and support?

    I take support from the DevOps team. 

    What other advice do I have?

    I recommend the tool to others since it is good. 

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    R&D Head, Big Data Adjunct Professor at SK Communications Co., Ltd.
    Real User
    Not dependent on third-party vendors
    Pros and Cons
    • "We selected Apache Hadoop because it is not dependent on third-party vendors."
    • "Real-time data processing is weak. This solution is very difficult to run and implement."

    What needs improvement?

    Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and other vendors.

    Another weak point of this solution, technically speaking, is that it's very difficult to run and difficult to smoothly implement. Preparation and integration are important.

    The integration of this solution with other data-related products and solutions, and having other functions, e.g. API connectivity, are what I want to see in the next release.

    For how long have I used the solution?

    We have been using Apache Hadoop since 2011.

    Which solution did I use previously and why did I switch?

    We selected Apache Hadoop because it is not dependent on third-party vendors. Previously, our main business unit was related to big vendors like IBM, Oracle, and EMC, etc. We wanted to have a competitive advantage in technology, so we selected the Apache project and used Apache open source.

    What about the implementation team?

    The solution was implemented through a local vendor team here in Korea.

    Which other solutions did I evaluate?

    We evaluated IBM, Oracle, and EMC solutions.

    What other advice do I have?

    My position in the company falls under the research and development of new technologies and solutions. I investigate, research, download, and read information and reports as part of my job.

    Our company has a big data business division, and we propose, develop, and implement things which are related to big data projects. We are using Cloud Hadoop open source versions, distributed versions, and commercial Hadoop distributed versions. We propose all these versions to our customers from any industry.

    Our focus is on the public sector. Big data is our strong point in Korea. Our company is the leader in big data technology, including infrastructure and visualization. This is a solution we provide to our customers. We are also in partnership with IBM. Our main focus is on Apache Hadoop.

    We provide Apache Hadoop to our customers. I work for a systems integrator and technical consulting company.

    Overall, our satisfaction with this solution is so-so. We continuously investigate new technologies and other solutions.

    The Hadoop open source version was implemented in 95% of our company's customer base. Our remaining customers had the local vendor's Hadoop platform package implemented for them.

    Our company is in the big data business. Before the big data business back in 1976, we implemented BI (business intelligence), DW (data warehouse), EIS, and DSS (decision support system), so we are in partnership with IBM.

    I don't have advice for people looking into implementing this solution because I'm not in the business unit. I'm in the research field. My role is to plan new technology and provide consultation to our customers for big data projects in the early stages.

    My rating for Apache Hadoop from a technical standpoint is eight out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Anand Viswanath - PeerSpot reviewer
    Project Manager at Unimity Solutions
    Real User
    Top 5 Leaderboard
    Offers reasonable integration features but needs to improve the setup process
    Pros and Cons
    • "The tool's stability is good."
    • "The load optimization capabilities of the product are an area of concern where improvements are required."

    What is our primary use case?

    I use the solution in my company for security purposes.

    In my company, we have intranet portals that must not be accessible to outsiders. All the data within the internal applications is accessible only with valid credentials within the domain. In general, my company uses Apache Hadoop to secure our internal applications.

    What needs improvement?

    Tools like Apache Hadoop are knowledge-intensive in nature. Unlike other tools in the market currently, we cannot understand knowledge-intensive products straight away. To use Apache Hadoop, a person needs intensive knowledge, which is something that not everybody can get familiarized with in a straightforward manner. It would be beneficial if navigating through tools like Apache Hadoop is made user-friendly. For non-technical users, if the tool is made easy to navigate, it will be easier to use, and one may not have to depend on experts.

    The load optimization capabilities of the product are an area of concern where improvements are required.

    The complex setup phase can be made easier in the future.

    For how long have I used the solution?

    I have four years of experience with Apache Hadoop.

    What do I think about the stability of the solution?

    The tool's stability is good.

    What do I think about the scalability of the solution?

    I am not sure about the scalability features of the product.

    There are around 500 users of the product in my company.

    When there is a huge load or a huge number of people accessing the product simultaneously, there is a visible delay in the loading of pages.

    How was the initial setup?

    The product's initial setup phase is complex.

    I have not dealt with the setup phase straight away. I always like to rely on the infra person in my company who knows Apache Hadoop.

    The solution is deployed on the cloud.

    What about the implementation team?

    The product can be deployed with the help of the in-house infra team at my company.

    What other advice do I have?

    There was a scenario when the product was essential for my company's data analytics needs. Before my company makes any web solution available in production, we have prototypes and replicas of the application in lower environments. My company uses Apache Hadoop to ensure that the lower environments in which we operate are secure and accessible only by those people in our company with valid credentials.

    I suggest that those planning to use the product first understand the tool's features and capabilities and then choose the right configuration to avoid misconfigurations.

    The product's integration capabilities are good: we have not faced any timeouts or downtime in our company when using the tool.

    My company uses the tool to have security and the right availability, which means availability to the right people at the right time. So I think our expectation was met. The value we got from the tool was what we wanted in our company.

    My company started to use the tool expecting that it would offer security and ensure its availability to the right people at the right time. I believe that the tool was able to meet our company's expectations, so we got the value that we expected the product to deliver.

    I rate the tool a seven out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    reviewer1976262 - PeerSpot reviewer
    Credit & Fraud Risk Analyst at a financial services firm with 10,001+ employees
    Real User
    Has the ability to take a large amount of data and deliver the necessary splices and summary charts
    Pros and Cons
    • "Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
    • "I mentioned it definitely, and this is probably the only feature we can improve a little bit because the terminal and coding screen on Hadoop is a little outdated, and it looks like the old C++ bio screen. If the UI and UX can be improved slightly, I believe it will go a long way toward increasing adoption and effectiveness."

    What is our primary use case?

    We use Apache Hadoop for analytics purposes.

    What is most valuable?

    The ability to take a lot of data and attempt to basically deliver the appropriate splices and summary chart is the most crucial function that I have discovered. 

    This stands in contrast to some of the other tools that are available, such as SQL and SAS, which are likely incapable of handling such a large volume of data. Even R, for instance, is unable to handle such data volumes. 

    Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial.

    What needs improvement?

    In terms of processing speed, I believe that some of this software as well as the Hadoop-linked software can be better. While analyzing massive amounts of data, you also want it to happen quickly. Faster processing speed is definitely an area for improvement.

    I am not sure about the cloud's technical aspects, whether there are things that happen in the cloud architecture that essentially make it a little slow, but speed could be one. And, second, the Hadoop-linked programs and Hadoop-linked software that are available could do much more and much better in terms of UI and UX.

    I mentioned it definitely, and this is probably the only feature we can improve a little bit because the terminal and coding screen on Hadoop is a little outdated, and it looks like the old C++ bio screen. 

    If the UI and UX can be improved slightly, I believe it will go a long way toward increasing adoption and effectiveness.

    For how long have I used the solution?

    I have been using Apache Hadoop for six months.

    What do I think about the stability of the solution?

    It is far more stable than some of the other software that I have tried. It's also the current version of Hadoop software and is becoming increasingly more stable.

    When a new version is released, the subsequent ones are always more stable and easier to use.

    What do I think about the scalability of the solution?

    According to what I have seen in my current enterprise, once I joined the organization, it was fairly simple to have it for an employee, and this is true for everyone who's been onboarded in my own designation. I would imagine that it is fairly scalable across an enterprise.

    I am fairly certain that we have between 10,000 and 15,000 employees who use it.

    How are customer service and support?

    I have not had any direct experience with technical support.

    We have an in-house technical support team that handles it.

    Which solution did I use previously and why did I switch?

    I have since changed careers. I no longer use any automation tools, nor does my job require me to compare the capabilities of other tools.

    I am working with Risk Analytic tools. I work with data these days, therefore I use technologies like Hive, Shiny R, and other data-intensive programs.

    Shiny is a plugin that you can have on R. As a result of changing my profiles, I am now working in a position that is more data-centric and less focused on process automation.

    We currently have proprietary tools, a proprietary cloud software, therefore I don't really need to employ any external cloud vendors. Aside from that, I only use the third-party technologies I've already indicated, primarily Hadoop and R.

    This is one of the cornerstone software tools that we use. I have never been in a position to make a like-for-like comparison with other software.

    How was the initial setup?

    As it is proprietary software for the enterprise that I am currently working on, I had no trouble setting it up.

    What's my experience with pricing, setup cost, and licensing?

    I am not sure about the price, but in terms of usability and utility of the software as a whole, I would rate it a three and a half to four out of five.

    Which other solutions did I evaluate?

    When I was a digital transformation consultant for my prior employer, I downloaded and read the reviews.

    It involved learning about workflow automation tools as well as process automation. I looked at a number of these platforms as part of that, but I have never actually used them.

    What other advice do I have?

    I would recommend this solution for data professionals who have to work hands-on with big data.

    For instance, if you work with smaller or more finite data sets, that is, data sets that do not keep updating themselves, I would most likely recommend R or even Excel, where you can do a lot of analysis. However, for data professionals who work with large amounts of data, I would strongly recommend Hadoop. It's a little more technical, but it does the job.

    I would rate Apache Hadoop an eight out of ten. I would like to see some improvements, but I appreciate the utility it provides.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Senior Associate at a financial services firm with 10,001+ employees
    Real User
    Relatively fast when reading data into other platforms but can't handle queries with insufficient memory
    Pros and Cons
    • "As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R."
    • "The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."

    What is most valuable?

    Impala. As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R (for further data analysis) or QlikView (for data visualisation).

    How has it helped my organization?

    The quick access to data enabled more frequent data-backed decisions.

    What needs improvement?

    The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks.
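
The chunking workaround mentioned above can be sketched generically. This is only an illustration of the idea, not Impala's API: the function names, the `(key, amount)` row shape, and the chunk size are all assumptions made for the example. The point is that each chunk produces a small partial result, and only the partial results are merged, so the full data set never has to sit in memory at once.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]  # hypothetical row shape: (key, amount)


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most chunk_size rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def total_by_key(rows: Iterable[Row], chunk_size: int = 1000) -> Dict[str, float]:
    """Sum amounts per key while holding only one chunk of rows at a time."""
    totals: Dict[str, float] = {}
    for chunk in iter_chunks(rows, chunk_size):
        # Aggregate within the chunk first, then merge into the running totals.
        partial: Dict[str, float] = {}
        for key, amount in chunk:
            partial[key] = partial.get(key, 0.0) + amount
        for key, subtotal in partial.items():
            totals[key] = totals.get(key, 0.0) + subtotal
    return totals
```

In practice the chunks would more likely be separate queries over a date range or partition rather than slices of an in-memory list; the merge step stays the same.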

    For how long have I used the solution?

    Two-plus years.

    What do I think about the stability of the solution?

    Typically instability is experienced due to insufficient memory, either due to a large job being triggered or multiple concurrent small requests.

    What do I think about the scalability of the solution?

    No. This is by default a cluster-based setup and hence scaling is just a matter of adding on new data nodes.

    How are customer service and technical support?

    Not applicable to Cloudera. We have a separate onsite vendor to manage the cluster.

    Which solution did I use previously and why did I switch?

    No. Two years ago this was a new team and hence there were no legacy systems to speak of.

    How was the initial setup?

    Complex. Cloudera stack itself was insufficient. Integration with other tools like R and QlikView was required and in-house programs had to be built to create an automated data pipeline.

    What's my experience with pricing, setup cost, and licensing?

    Not much advice as pricing and licensing is handled at an enterprise level.

    However, do take into consideration that data storage and compute capacity scale differently, and hence purchasing a "boxed" / "all-in-one" solution (software and hardware) might not be the best idea.

    Which other solutions did I evaluate?

    Yes. Oracle Exadata and Teradata.

    What other advice do I have?

    Try open-source Hadoop first but be aware of greater implementation complexity. If open-source Hadoop is "too" complex, then consider a vendor packaged Hadoop solution like HortonWorks, Cloudera, etc.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    reviewer2150616 - PeerSpot reviewer
    Lead Data Scientist at a transportation company with 51-200 employees
    Real User
    Top 5
    Distributes data processing tasks across multiple nodes and has a straightforward setup process
    Pros and Cons
    • "The platform's quick data processing capabilities have been instrumental in supporting our AI-driven projects."
    • "The product's availability of comprehensive training materials could be improved for faster onboarding and skill development among team members."

    What is most valuable?

    The product's distributed computing capability is the most effective. It allows us to distribute data processing tasks across multiple nodes, significantly speeding up processing time.

    What needs improvement?

    The product's availability of comprehensive training materials could be improved for faster onboarding and skill development among team members.

    For how long have I used the solution?

    I've been working with Apache Hadoop for about four years now.

    What do I think about the stability of the solution?

    We encounter occasional issues like memory constraints during extensive data processing. They have been manageable through scaling adjustments. 

    I rate the stability an eight or nine. 

    What do I think about the scalability of the solution?

    Hadoop's scalability can be rated a nine out of ten due to its exceptional flexibility, allowing horizontal scaling by adding or removing nodes as required.

    How are customer service and support?

    Technical support has been decent, although there can be delays in resolving new or complex issues.

    How would you rate customer service and support?

    Neutral

    How was the initial setup?

    The product deployment process was quite straightforward, taking only a few minutes for installation.

    What was our ROI?

    The platform has resulted in significant cost savings on compute resources by optimizing usage based on workload demands.

    What's my experience with pricing, setup cost, and licensing?

    I would rate the product's subscription-based pricing a six out of ten. It's reasonable, but there's room for improvement in cost-effectiveness.

    What other advice do I have?

    The platform's quick data processing capabilities have been instrumental in supporting our AI-driven projects. I would recommend it, especially for organizations dealing with large-scale data processing and needing robust distributed computing capabilities. 

    I would rate Apache Hadoop an eight out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Satya Raju - PeerSpot reviewer
    Architect - software engineering at Innominds
    Real User
    Top 5 Leaderboard
    Robust data processing and analytics with potential improvements for streaming capabilities
    Pros and Cons
    • "Hadoop can store any kind of data—structured, unstructured, and semi-structured—and presents it using the relational model through Hive."
    • "Hadoop lacks OLAP capabilities."

    What is our primary use case?

    I use Hadoop as a data lake in an AIML solution, where it connects to various data sources and ingests data into Hadoop. It is utilized for processing large data volumes with various data sources such as RDBMS, file systems, Kafka for real-time streaming data, IoT, web sockets, and API metadata.

    How has it helped my organization?

    Hadoop provides a robust data lake functionality, allowing for the ingestion and processing of varied data types. It ensures no data loss through data replication and efficient transformation jobs handled in parallel, enhancing data analytics.
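
As a rough illustration of what replication costs and buys, the arithmetic can be sketched as below. The default of 3 matches the stock HDFS replication factor; the function names and the simplified model (it ignores non-HDFS overhead and rack placement) are assumptions for this example only.

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Approximate usable storage when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


def tolerated_replica_losses(replication_factor: int = 3) -> int:
    """Replicas of a single block that can be lost while one copy still survives."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor - 1
```

Under these assumptions, 300 TB of raw disk gives roughly 100 TB of usable space, and a block stays readable after losing two of its three replicas.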

    What is most valuable?

    Hadoop can store any kind of data—structured, unstructured, and semi-structured—and presents it using the relational model through Hive. The combination with Spark enhances data analytics capabilities.

    What needs improvement?

    Hadoop lacks OLAP capabilities. I recommend adding a Delta Lake feature to make the data compatible with ACID properties. Also, video and audio streaming import issues could be improved to ensure proper data validation.

    For how long have I used the solution?

    I have been working with Apache Hadoop for the last ten years.

    What do I think about the stability of the solution?

    The stability of Hadoop is good. I have been working with it for the last ten years and have not encountered significant stability issues.

    What do I think about the scalability of the solution?

    Hadoop offers good scalability with horizontal scaling, especially when deployed on cloud platforms like Cloudera, which takes care of scaling the infrastructure. On-premises requires the maintenance of data nodes.

    How are customer service and support?

    I rate the customer support at eight out of ten. It is satisfactory.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    I have worked with traditional RDBMS which do not support analytics, querying, and data replication as efficiently as Hadoop.

    How was the initial setup?

    Setting up Apache Hadoop requires some learning curve and is of medium complexity. I rate the setup a seven out of ten.

    What about the implementation team?

    We need to provision the cluster deployment, including a master node and a coordinator node, and set up Spark and Hive for development.

    Which other solutions did I evaluate?

    I have also worked with HDFS, Hive, and Apache Spark with Scala and Java.

    What other advice do I have?

    As cloud technology is emerging, it is advisable to transition from traditional Hadoop to cloud-based solutions like AWS EMR and Azure, which offer better maintenance-free infrastructure management.

    I'd rate the solution seven out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Analytics Platform Manager at a consultancy with 10,001+ employees
    Real User
    Parallel processing allows us to get jobs done, but the platform needs more direct integration of visualization applications
    Pros and Cons
    • "Two valuable features are its scalability and parallel processing. There are jobs that cannot be done unless you have massively parallel processing."
    • "I would like to see more direct integration of visualization applications."

    What is our primary use case?

    We use it as a data lake for streaming analytical dashboards.

    How has it helped my organization?

    There is a lot of difference. I think the best case is that we are able to drill down to transactional records and really build a root-cause analysis for various issues that might arise, on demand. Because we're able to process in parallel, we don't have to wait for the big data warehouse engine. We process down what the data is and then build it up to an answer, and we can have an answer in an hour rather than 10 hours.

    What is most valuable?

    • Scalability
    • Parallel processing

    There are jobs that cannot be done unless you have massively parallel processing; for instance, processing call-detail records for telecom.
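
The shape of such a job can be sketched as a split, map, and merge over call-detail records. This is a toy illustration, not Hadoop code: the `(caller, seconds)` record layout and all function names are hypothetical, and a local thread pool stands in for cluster workers, so it shows the pattern rather than real speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

CallRecord = Tuple[str, int]  # hypothetical layout: (caller_id, duration_seconds)


def partition(records: List[CallRecord], n_parts: int) -> List[List[CallRecord]]:
    """Split records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_partition(records: List[CallRecord]) -> Dict[str, int]:
    """Map step: total call seconds per caller within one partition."""
    totals: Dict[str, int] = defaultdict(int)
    for caller, seconds in records:
        totals[caller] += seconds
    return dict(totals)


def reduce_partials(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Reduce step: merge the per-partition totals into one result."""
    merged: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for caller, seconds in partial.items():
            merged[caller] += seconds
    return dict(merged)


def total_call_seconds(records: Iterable[CallRecord], n_parts: int = 4) -> Dict[str, int]:
    """Aggregate call durations per caller by processing partitions independently."""
    parts = partition(list(records), n_parts)
    # The thread pool is a stand-in for worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce_partials(partials)
```

Because each partition is aggregated without reference to the others, the map step can be spread over as many workers as there are partitions, which is what makes this kind of workload a natural fit for a cluster.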

    What needs improvement?

    In general, Hadoop has a lot of different component parts to the platform, things like Hive and HBase, and they're all moving somewhat independently and somewhat in parallel. I think as you look to platforms in the cloud or into walled-garden concepts, like Cloudera or Azure, you see that the third party can make sure all the components work together before they are used for business purposes. That reduces a layer of administration, configuration, and technical support.

    I would like to see more direct integration of visualization applications.

    For how long have I used the solution?

    More than five years.

    What do I think about the stability of the solution?

    In general, stability can be a challenge. It's hard to say what stability means. You're in an environment that's before production-line manufacturing, where none of the parts relate together exactly as they should. So that can create some instability.

    To realize the benefit of these kinds of open-source, big-data environments, you want to use as many different tools as you can get. That brings with it all this overhead of making them work together. It's kind of a blessing and a curse, at the same time: There's a tool for everything.

    How are customer service and technical support?

    Apache is the open-source foundation that Cloudera and Hortonworks contribute code and some work to. I don't know that there is actually support and structure, per se, for Apache.

    We have had premium support at various times with various companies. The three dominant companies I've worked with (Cloudera, Hortonworks, and MapR) each offer a premium support package, but that still only covers their base distribution, not necessarily all the add-ons on top of it. That is the big challenge: getting everything to work together.

    Which solution did I use previously and why did I switch?

    There are the older relational database technologies: Netezza, SQL Server, MySQL, Oracle, Teradata. All have some advantages and some disadvantages. Most notably, they are all significantly more expensive in terms of the capital expense, rather than the operational expense. They are "walled-garden," so to speak, that are curated and have a distinct set of tools that work with them, and not the bleeding-edge ingenuity that comes with an open-source platform.

    Data warehousing is 30 years old, at least. Big data, in its current form, has only been around for four or five years.

    How was the initial setup?

    There are capacities in which I have been responsible for setup, administration, and building the applications on those environments. Each of the components is relatively straightforward. The complexity comes from all the different components.

    What other advice do I have?

    Implement for defined use cases. Don't expect it to all just work very easily.

    I would rate this platform a seven out of 10. On the one hand, it's the only place you can use certain functions, and on the other hand, it's not going to put any of the other ones out of business. It's really more of a complement. There is no fundamental battle between relational databases and Hadoop.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user