We are considering Datameer as a BI tool on top of our Hadoop ecosystem. I have been researching it and have a few questions:
1. How fast do complex SQL queries run on data stored in HDFS? Is performance similar to Hive, or faster? How does it compare against a distributed columnar database?
2. Is it suited for real-time querying?
3. Is there a limit on the number of records that can be brought into the Datameer spreadsheet? For example, can I visualize 5 million data points?
4. Datameer currently seems to support only a few machine learning methods but does seem to have R integration. Can we use any R package for building machine learning models?
1. Datameer describes itself as the first data analytics solution built on Hadoop that helps business users access, analyze and use massive amounts of data. So, Datameer's take is that Hive-based approaches.
2. If you have real-time requirements, you should not pull data directly out of Datameer or Hive. Columnar stores make some processing faster but are still not low latency. You can, however, use those tools together with a low-latency database: use Datameer or Hive for the heavy lifting, filtering and pre-aggregating the big data into a much smaller data set, then export that result into the database and serve your real-time queries from there.
3. Datameer Analysis Solution has this capability.
4. You could consider plyrmr, which is one of the most downloaded packages for data manipulation in R.
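
To make the pattern in answer 2 more concrete, here is a minimal Python sketch of "pre-aggregate, then export to a low-latency store". It is an illustration only and does not use any Datameer or Hive API: the in-memory `raw_events` list stands in for the large data set on HDFS, SQLite stands in for whatever low-latency database you would actually use, and the table and function names (`daily_sales`, `pre_aggregate`, `export_to_db`, `revenue_for`) are made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw events (day, country, amount). In practice this would be
# the large data set that Datameer or Hive processes on the cluster.
raw_events = [
    ("2015-01-01", "US", 120.0),
    ("2015-01-01", "US", 80.0),
    ("2015-01-01", "DE", 50.0),
    ("2015-01-02", "US", 30.0),
]


def pre_aggregate(events):
    """Heavy-lifting step: collapse raw events into one row per (day, country)."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [order count, revenue]
    for day, country, amount in events:
        entry = totals[(day, country)]
        entry[0] += 1
        entry[1] += amount
    return [
        (day, country, count, revenue)
        for (day, country), (count, revenue) in sorted(totals.items())
    ]


def export_to_db(rows, conn):
    """Export step: load the small aggregated result into the serving database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "day TEXT, country TEXT, orders INTEGER, revenue REAL, "
        "PRIMARY KEY (day, country))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def revenue_for(conn, day, country):
    """Low-latency lookup against the pre-aggregated table (primary-key hit)."""
    row = conn.execute(
        "SELECT revenue FROM daily_sales WHERE day = ? AND country = ?",
        (day, country),
    ).fetchone()
    return row[0] if row else None


# SQLite in memory is only a stand-in for a real low-latency database.
conn = sqlite3.connect(":memory:")
export_to_db(pre_aggregate(raw_events), conn)
print(revenue_for(conn, "2015-01-01", "US"))
```

The point of the split is that the expensive scan happens once, in batch, while interactive queries only ever touch the small indexed table.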