I am currently working as a Special Strategic Adviser. I am involved in strategic risk management analysis and mitigation actions.
We are currently evaluating SQream Technologies' SQream DB. Does anybody have experience with it, and can you attest to SQream being the best RDBMS vendor for big data of 30 TB+?
Are there any other RDBMS solutions for big data that I should be evaluating?
Thanks! I appreciate the help.
I haven't used SQream personally. However, if you are only considering GPU-based RDBMSs, please check the following:
hackernoon.com
SQream DB is a GPU database. It is, of course, not suitable for real-time OLTP.
Cassandra is best suited to OLTP use cases where you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited for OLAP purposes. It is best suited to a very large data warehouse with very large queries that need massively parallel execution, since GPUs are great at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card the better, since we will have more CPU activity. It is only for a very big data warehouse, not for small ones.
Your best DB for 40+ TB is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic store (S3, Azure Blob Storage, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob Storage, Databricks, PySpark), you may need around 500 32 GB nodes to read 40 TB of data.
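To make those sizing figures concrete, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1000 GB) and that the full 40 TB has to be read inside the 20-minute window; both are my assumptions for illustration, and the function name is made up. Nothing here is measured.

```python
# Back-of-the-envelope sizing for the figures quoted above.
# Assumptions (not from the thread): decimal units, and the whole
# data set is scanned once within the time budget.

def cluster_sizing(data_tb, nodes, node_mem_gb, minutes):
    """Return rough per-node and aggregate numbers for a full scan."""
    data_gb = data_tb * 1000
    seconds = minutes * 60
    return {
        "total_memory_tb": nodes * node_mem_gb / 1000,
        "data_per_node_gb": data_gb / nodes,
        "aggregate_read_gb_per_s": data_gb / seconds,
        "per_node_read_mb_per_s": data_gb * 1000 / nodes / seconds,
    }

sizing = cluster_sizing(data_tb=40, nodes=500, node_mem_gb=32, minutes=20)
for name, value in sizing.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the cluster holds 16 TB of memory in total, so the 40 TB data set does not fit in memory at once, and each node has to pull roughly 80 GB from the object store during the run.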
Costs can be contained by running your own clusters, but Databricks manages clusters for you.
I would recommend converting your 40 TB data store to the Databricks Delta format after an initial parse.
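A minimal PySpark sketch of that conversion step might look like the following. It assumes an existing SparkSession on a cluster where Delta Lake is available (as on Databricks); the function name, the default source format, and the paths are placeholders I made up for illustration, not details from this thread.

```python
def convert_to_delta(spark, source_path, delta_path, source_format="parquet"):
    """Read the raw files once and rewrite them as a Delta table.

    `spark` is an existing SparkSession. The paths and `source_format`
    are hypothetical placeholders; adjust them to your own store.
    """
    raw = spark.read.format(source_format).load(source_path)
    # Overwrite mode replaces anything already at delta_path.
    raw.write.format("delta").mode("overwrite").save(delta_path)
    return delta_path

# Hypothetical usage on a cluster where Delta Lake is available:
# convert_to_delta(
#     spark,
#     "wasbs://<container>@<account>.blob.core.windows.net/raw/",
#     "wasbs://<container>@<account>.blob.core.windows.net/delta/",
# )
```

Later queries would then read the Delta path instead of re-parsing the raw files each time.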
Morten, the most popular comparisons of SQream can be found here: www.itcentralstation.com
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.