It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database.
What I like about Apache Hadoop is that it's for big data, in particular big data analysis, and it's the easier solution. I like the data processing feature for AI/ML use cases the most because some solutions allow me to collect data from relational databases, while Hadoop provides me with more options for newer technologies.
Vice President - Finance & IT at a consumer goods company with 1-10 employees
Real User
2020-07-14T08:15:56Z
Jul 14, 2020
The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so.
The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect...
It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
The tool's stability is good.
Hadoop File System is compatible with almost all the query engines.
It's open-source, so it's very cost-effective.
It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database.
The most valuable feature is scalability and the possibility to work with major information and open source capability.
Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform.
Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial.
One valuable feature is that we can download data.
I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed.
What I like about Apache Hadoop is that it's for big data, in particular big data analysis, and it's the easier solution. I like the data processing feature for AI/ML use cases the most because some solutions allow me to collect data from relational databases, while Hadoop provides me with more options for newer technologies.
The scalability of Apache Hadoop is very good.
We selected Apache Hadoop because it is not dependent on third-party vendors.
Hadoop is extensible — it's elastic.
Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability.
The performance is pretty good.
The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so.
The most valuable features are powerful tools for ingestion, as data is in multiple systems.
The most valuable feature is the database.
It's good for storing historical data and handling analytics on a huge amount of data.
The ability to add multiple nodes without any restriction is the solution's most valuable aspect.
What comes with the standard setup is what we mostly use, but Ambari is the most important.
The best thing about this solution is that it is very powerful and very cheap.
The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics.
Two valuable features are its scalability and parallel processing. There are jobs that cannot be done unless you have massively parallel processing.