Professor of Health Services Research at a university with 1,001-5,000 employees
Real User
Aug 24, 2021
I suspect that I cannot answer this. I have used KNIME and RapidMiner with data sets of up to about 80,000 rows and 1,500 columns, and both have performed well. However, I doubt whether the questioner would classify my usage as "large amounts of data". If my usage is like theirs, then both packages can be recommended.
Both KNIME and RapidMiner can link with Python or R, and those languages have modules or methods that offer better performance on large data sets (multi-processing, GPUs, etc.), so those combinations might serve their purpose. They might use, say, KNIME for ease of use and R for the extra power, or RapidMiner and Python.
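To make the multi-processing point a bit more concrete, here is a minimal Python sketch of the general idea, using only the standard library. It splits the rows into chunks, sums each chunk in a separate worker, and combines the partial results into column means. The function names (split_into_chunks, partial_sums, column_means) are my own hypothetical helpers, not part of KNIME or RapidMiner, and a real workflow would more likely read the data from a file than hold it in a list.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence, Tuple

Row = Sequence[float]


def split_into_chunks(rows: Sequence[Row], n_chunks: int) -> List[Sequence[Row]]:
    """Split rows into at most n_chunks contiguous, roughly equal slices."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sums(chunk: Sequence[Row]) -> Tuple[List[float], int]:
    """Return the per-column sums and the row count for one chunk."""
    sums = [0.0] * len(chunk[0])
    for row in chunk:
        for j, value in enumerate(row):
            sums[j] += value
    return sums, len(chunk)


def column_means(rows: Sequence[Row], n_chunks: int = 4,
                 use_processes: bool = True) -> List[float]:
    """Compute per-column means by summing chunks in parallel workers.

    use_processes=True runs the chunks in separate processes (true
    multi-processing); False falls back to threads, which is handy for
    quick experiments or interactive sessions.
    """
    chunks = split_into_chunks(rows, n_chunks)
    if not chunks:
        return []
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=n_chunks) as pool:
        results = list(pool.map(partial_sums, chunks))
    total_rows = sum(count for _, count in results)
    n_cols = len(results[0][0])
    # Combine the partial sums from every chunk into one mean per column.
    return [sum(sums[j] for sums, _ in results) / total_rows
            for j in range(n_cols)]
```

Calling column_means(rows, n_chunks=8) from a script would spread the work over eight processes. Whether that actually beats a single process depends on the data size and on how expensive the per-row work is, so it is worth measuring before relying on it.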
Community Manager at a tech services company with 51-200 employees
Real User
Aug 19, 2020
@Yogesh PARTE Good point - this is a more general question, but I do agree that it's easier to make recommendations with more details. Would you mind sharing more about why H2O.ai Sparkling Water is your preferred choice in this instance?
Data Science Platforms are designed to support the end-to-end data science process, enabling data professionals to develop, deploy, and manage data-driven applications. These platforms integrate a wide range of tools for data preparation, model building, testing, and deployment, streamlining workflows for data scientists, engineers, and business analysts.
Dataiku is a great general-purpose data science platform for both supervised and unsupervised learning. It handles Big Data very well.
@Ziad Chaudhry I'd also vote for Dataiku. Take a look at their customer cases: https://www.dataiku.com/storie...
SparkCognition's Darwin product can handle very large data sets.
Thanks for your input @AaronCooke :)
Data science platform is a vague term.
It all depends on what you wish to accomplish. Are you talking about fast databases, ETL, a machine learning tool, integration with R or Python, a self-service data visualization tool, collaboration? No one size fits all.
Dataiku, Domino, and RapidMiner are notable candidates for your purpose, I presume.
It has been two years since I evaluated several vendors and shortlisted these as candidates. They all support large-scale data manipulation for data analysis and machine learning development, as platforms that many people can use collaboratively.
If you want to handle computer vision data, I recommend the Superb AI Suite.
https://www.superb-ai.com/
The question also needs to specify the domain, the kind of data, and whether public or private platforms are meant.
For structured/tabular data, Driverless AI / H2O.ai Sparkling Water is my preferred platform.
My experience has not been on large-scale systems. Not even multi-terabytes. My multi-megabytes would not help. Sorry!
IBM SPSS Modeler
@EzzAbdelfattah IMHO it's pretty limited and too outdated to handle the features of the latest frameworks.