We use the solution for workflow distribution. It's an ETL platform for both real-time and batch-mode processing. We mainly use it for data warehousing and related workloads.
The tool's most interesting features are the distributed file system and its ability to process unstructured data. Because we have a lot of unstructured data, such as XML and social media logs, these features make it more valuable to us than the usual data warehousing solutions.
Data warehouse solutions mainly work with structured, regular, and well-formatted data, but Cloudera Distribution for Hadoop can handle unstructured data as well. This is the most interesting part. Also, huge amounts of data can be stored in HDFS rather than in relational databases. Cloudera Distribution for Hadoop can be a promising solution for distributed file systems, real-time processing, batch-mode processing, AI, and machine learning use cases.
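As a rough illustration of what landing raw, unstructured data in HDFS looks like (a minimal sketch, not our production code; the NameNode host and target path are hypothetical), a Java client can write log or XML payloads straight into the cluster without defining a schema first:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsLogIngest {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS would normally come from the core-site.xml that
        // Cloudera Manager deploys to client hosts; set here for clarity.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // hypothetical host

        try (FileSystem fs = FileSystem.get(conf)) {
            // Unstructured payloads (XML fragments, social media logs) are
            // written as-is; no table schema is required up front.
            Path target = new Path("/data/raw/social/2024-05-01.log"); // hypothetical path
            try (FSDataOutputStream out = fs.create(target, true)) {
                out.write("<post user=\"alice\">unstructured text...</post>\n"
                        .getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

Structure can then be applied later, at read time, by whichever engine queries the files.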
We are using several security features in the solution. These include Linux's security mechanisms and its built-in firewall. We also rely on single sign-on and encryption, both at rest and in transit, for sensitive data. It has access controls, ensuring that not everyone can use every service; for example, some users can access Hive, others Impala, and others HBase, depending on their privileges.
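To give a sense of how that per-user access works from a client's point of view, here is a minimal Java sketch of authenticating with a Kerberos keytab and then touching HDFS as that user; the principal, keytab path, and warehouse directory are hypothetical, and the actual authorization rules live in the cluster's security configuration rather than in this code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

import java.security.PrivilegedExceptionAction;

public class SecureHdfsAccess {
    public static void main(String[] args) throws Exception {
        // These settings normally come from the core-site.xml / hdfs-site.xml
        // that Cloudera Manager ships to gateway hosts.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical principal and keytab for a user who is allowed to read Hive data.
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // Every call runs as the authenticated user, so HDFS permissions and
        // service-level authorization decide what this user can actually see.
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            try (FileSystem fs = FileSystem.get(conf)) {
                boolean visible = fs.exists(new Path("/warehouse/tablespace/managed/hive"));
                System.out.println("Hive warehouse directory visible: " + visible);
            }
            return null;
        });
    }
}
```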
We also use LDAP to track who registers with or logs into the cluster. Additionally, we use key nodes to manage the firewalls between Cloudera Manager or the Cloudera cluster and other data sources.
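For context, the LDAP check that cluster services perform at login time is essentially a bind against the directory; the following Java sketch shows that idea with a hypothetical LDAP host and user DN, not our actual directory layout:

```java
import javax.naming.Context;
import javax.naming.directory.InitialDirContext;
import java.util.Hashtable;

public class LdapBindCheck {
    // Returns true if the given DN/password can bind to the directory.
    public static boolean canBind(String userDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // hypothetical host
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, userDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        try {
            new InitialDirContext(env).close(); // successful bind means valid credentials
            return true;
        } catch (Exception e) {
            return false;                       // failed bind is rejected (and can be logged)
        }
    }

    public static void main(String[] args) {
        System.out.println(canBind("uid=jdoe,ou=people,dc=example,dc=com", "secret"));
    }
}
```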