I have experience working as a senior integration architect for AI/ML enablement for a manufacturing company with 10,000+ employees.
We are evaluating data science platforms. Which vendor offers an end-to-end solution that really works, from feature management to model deployment?
Thanks! I appreciate the help.
Many vendors offer data science platforms, but the answer depends on what you mean by end-to-end, and your use of the word "really" makes me think you have already tested several of them. Data science platforms come from a variety of vendors, including IBM, SAP, Microsoft, Domino Data Lab, and RapidMiner, among others. First, I suggest having a person or team ready to test these solutions; if you do not have one, plan to bring in people with programming and process design skills.
My recommendation: if you already work with IBM, ask about their Data Science Experience. Otherwise, try RapidMiner, which seems very useful and has a fluid interface for model deployment. You could also try SAS Enterprise Miner, which is strong for model building and model deployment and appears as one of the leaders among these platforms.
I hope this is useful. Regards.
KNIME or Alteryx is a good choice for a company deploying AI applications.
Both offer:
1. light data processing such as ETL,
2. AI model development and deployment,
3. output as simple charts, or to databases for further use (API, BI, etc.).
If you deploy in the cloud, you can also use Amazon SageMaker or other cloud tools.
There are many vendors offering end-to-end deployment, each with pros and cons. You can evaluate them based on:
- On-prem vs cloud requirement
- Data volume that you want to process
- Do you already have ETL processes in place to extract the relevant data from different sources?
- How are you planning to consume your ML output (API/dashboard/reports, etc)?
- Lastly, the ML algorithms you intend to use, and whether you are analyzing structured data, unstructured data, or both.
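One way to make the criteria above comparable across vendors is a weighted scoring matrix. The sketch below is only an illustration: the criterion names, the weights, the 1-5 scores, and the labels "Platform A" and "Platform B" are all hypothetical placeholders, not assessments of any real product.

```python
# Hypothetical weights per criterion; adjust them to your own priorities.
WEIGHTS = {
    "deployment_fit": 0.25,      # on-prem vs cloud requirement
    "data_volume": 0.20,         # handles the volume you need to process
    "etl_integration": 0.20,     # works with existing ETL processes
    "output_consumption": 0.15,  # API / dashboard / reports
    "algorithm_coverage": 0.20,  # algorithms, structured and unstructured data
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-criterion scores (e.g. on a 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


def rank(candidates, weights=WEIGHTS):
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Placeholder scores from a hypothetical evaluation workshop.
candidates = {
    "Platform A": {"deployment_fit": 4, "data_volume": 3, "etl_integration": 5,
                   "output_consumption": 4, "algorithm_coverage": 3},
    "Platform B": {"deployment_fit": 5, "data_volume": 4, "etl_integration": 2,
                   "output_consumption": 3, "algorithm_coverage": 4},
}
ranking = rank(candidates)
```

Having the team agree on the weights before any demos helps keep the comparison from drifting toward whichever vendor presented last.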
If you need further details, I will ask my presales team to get in touch with you. Please provide me with your contact information.
DataRobot for OnPrem
SageMaker for AWS
Another thing to be cognizant of: end-to-end platforms allow you to build and deploy models to production, but that is ML 101. Where the market is moving is toward building and scaling predictive applications for numerous business processes and use cases. Also, many end-to-end platforms lack the capabilities to deal with data drift and model retraining once a model is in production and, for more advanced use cases, human-in-the-loop feedback to help retrain the model. A final thought: explainability and interpretability are paramount today. You can build your models in open source and use these other tools to put them into production, but you are going to have a gaping hole when someone asks how you built the model, what weights you put on your features, how you are dealing with bias, and so on. The majority of platforms out there today help you stitch together disparate open-source solutions, but they don't work when you actually get into productionizing and scaling multiple business processes that are operationalized with machine learning.
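To make the data drift point concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), comparing a feature's training values against its production values. This is not how any particular platform implements it; the function name `psi`, the equal-width binning, and the floor value for empty bins are assumptions chosen for illustration.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when all training values are identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the first or last bin.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data: production values shifted upward relative to training values.
train = [float(v) for v in range(100)]
prod = [v + 50.0 for v in train]
drift = psi(train, prod)
```

A widely quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift that may justify retraining, but that threshold is a convention rather than a guarantee, so it is worth asking each vendor which drift metrics and alert thresholds their platform actually supports.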
The current issue with the majority of DS platforms is that they are based on disparate open-source libraries, or you need 5-6 different tools to build your end-to-end ML workflow. Most have never seen production either.
At BigML we've been around for 10+ years and were the first to market with an MLaaS platform. We can help you and your team accomplish true end-to-end ML (source > dataset > model > predictions > production), all in a single platform. We work with many clients in your space and would be happy to talk with you. You can even sign up for our platform for free and take it for a spin.
One potential solution might be the SAS platform www.sas.com
As others have said, there are many options, but add Dataiku, H2O.ai, Alteryx, and Databricks to your list.
Check out our system at Novi.Systems. It's an entirely integrated platform that includes hardware and software that performs what you require and much more. We'd be glad to set up a demo for you that allows you to load your data and "test drive" all the capabilities for up to four weeks. Contact me at mike@novi.systems
Please check out H2O.ai, Azure ML, and TensorFlow.
For an "end-to-end" data science platform, I would prefer KNIME.
I think KNIME is especially good at working with various data sources and preprocessing, and it is easy to modify, add, or remove flows as situations change.
For analytics, about 50% of the time I use KNIME nodes, and the other 50% I code in a Python node. Either way, it gives you the flexibility to write your own code (I don't write R). And things are much simpler when the data is well preprocessed.
It also provides data visualisation nodes, which are good enough, but for fancy presentations you will want to try other tools like Tableau.
Therefore it is easy to scale up, as KNIME can nicely simplify the preprocessing stage of the process.
I would suggest having working sessions with DataRobot (if your implementation is on-prem).
SageMaker is what I would recommend if you plan for AWS.
Data Science and Advanced Analytics adoption has become more tempting for organizations across almost all fields and business domains. Corporate leaders rush to enforce an analytics-driven decision-making culture, hoping to accelerate business performance. They invest heavily in technology, data collection, and data storage as fundamental business priorities, but this is more of a "knee-jerk" reaction; although completely understandable, it is not enough for an effective analytics strategy.
Unfortunately, most organizations fail to get the best value possible from employing such a practice.
Although technology, tools, data storage capabilities, and the right talent pool are essential pillars, they do not guarantee actionable intelligence that generates substantial value for the business.
Business individuals and stakeholders often believe in the importance of making analytics-driven decisions. In practice, however, they rarely develop actionable use cases on top of analytics-driven recommendations.
The primary reasons likely include the conventional separation between data and business and the gap between insight and impact.
Wouldn't it be incredible to let business individuals manage data independently and build predictive analysis models when needed, without writing a single line of code?
We are introducing the AI Surge, a no-code AI platform that helps businesses predict without writing a single line of code.
It's like data science without data scientists.
We want to offer you a free beta trial:
Zero cost for data engineering
Zero cost for data science
Zero cost for your scalable cloud infrastructure
In return, we want your honest feedback, and you can enjoy the product for free for 365 days (*limited number of free users).
Work on your personal AI project for free.
If you want to perform some ETL along with feature management and model deployment, then I would recommend Alteryx + DataRobot.
The best data science platform is the one that best fits your requirements: the goal you want to reach, the data you have to feed into the platform, and the results you want in line with your goals. There are a lot of tools available, but if you are not tied to one specific vendor, my suggestion is to try the most widely accepted ones. So try RapidMiner, SAS Enterprise Miner, KNIME, or Alteryx.