I haven't found it particularly useful. It lacks state-of-the-art algorithms and impressive outcomes. While it might offer insights for basic warehouse tasks, it falls short of deeper understanding and results. Moreover, a new user interface would be great, especially for beginners. Something that guides them through the available tools and helps them achieve their goals. I haven't seen anything like that myself, though maybe it's there and I missed it.
I am not sure, but I think that machine-learning solutions built in Weka cannot be uploaded to a server for production. Nowadays, solutions are not as simple as toy regressions that can be manipulated in worksheets or rewritten in other languages. The solution, frequently a black-box one, must be uploaded to a server directly from the software that produced it, and I don't think Weka does that.
Weka's visualization is subpar and could be improved. Machine learning and visualization do not work well together in it. For example, we want to know how we can delete empty cells, or fill them in, without having to clean the data separately and put it back together. A data-cleaning feature in a future release would be beneficial.
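To make the request above concrete, here is a minimal sketch of the two cleaning steps the reviewer mentions: deleting rows with empty cells, or filling the empty cells in. It is plain Python, not Weka, and the sample data, column names, and function names are all hypothetical. It assumes every column is numeric and uses the column mean as the fill value, which is only one of several reasonable choices.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: one empty cell in each numeric column.
RAW = """age,income
34,52000
,48000
41,
29,39000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, turning empty cells into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (float(value) if value.strip() else None) for key, value in row.items()}
        for row in reader
    ]

def drop_incomplete(rows):
    """Keep only the rows that have no empty cells."""
    return [row for row in rows if all(v is not None for v in row.values())]

def fill_with_mean(rows):
    """Replace each empty cell with the mean of the known values in its column."""
    columns = list(rows[0].keys())
    means = {
        col: mean(row[col] for row in rows if row[col] is not None)
        for col in columns
    }
    return [
        {col: (row[col] if row[col] is not None else means[col]) for col in columns}
        for row in rows
    ]

rows = load_rows(RAW)
print(len(drop_incomplete(rows)))      # number of rows without any empty cell
print(fill_with_mean(rows)[1]["age"])  # the filled-in age for the second row
```

Note that the column stays numeric even though one of its cells is empty; the empty cell is simply held as `None` until it is dropped or filled.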
I haven't really utilized other automation learning tools. I'm just starting to use Weka. I'm not going to be able to say, "Oh, this is the area they should improve on." I'm mostly learning how to use it. I have heard people say that they didn't like it. Since it was built by a university for machine learning purposes, they don't have good support. So I know that's an area for improvement. A few people said it became slow after a while.
The product is good, but I would like it to work better with big data. I know it has a Spark integration that can be used for analysis on clusters, but it's not clear how to use it. The issue here is mostly how to handle large amounts of data. My thesis project was not that big, less than 100 gigabytes, but these tools could certainly be useful. They should integrate it better with Spark and improve cluster processing.
The Weka company should publish more accurate documentation; that would be really helpful. As for data visualization, there are many ways the data could be visualized, pie charts among them, but in the basic Weka tool I don't see many options for analyzing and visualizing the data well. They should focus more on data visualization, as I have run into many issues in this area.
The help documentation could be more user-friendly. For instance, all ordinary manuals in R follow the same structure, with examples ready to be run and often with an interpretation of the outputs. For some packages, R has so-called “Vignettes”, with plenty of explanations and pictures, like in a book. I don’t think Weka has such examples. In Weka packages, the documentation is not as “uniform”: it does not follow a common structure, since it is written by different authors, each in their own style.
When you open the software, there's a section labeled Filter, where you choose your filtering. The filter section lacks some specific transformation tools. If you want to change a variable from numeric to categorical, there is no feature that lets you do that. This needs to be improved. Also, in the classification section, there are cases where you cannot use test data alone or training data alone. Under classification, and clustering as well, there should be an option to use the training data first and then supply the test data later. If they went fully open-source, like Python and R, it would help the growth of the solution.
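The two requests in this review, turning a numeric variable into a categorical one and fitting on training data before supplying test data later, can be illustrated together. The following is a minimal sketch in plain Python, not Weka. It assumes equal-width binning as the conversion rule, and the function names, category labels, and sample values are hypothetical.

```python
def fit_bins(values, n_bins):
    """Learn equal-width bin edges from the training values."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # Interior edges only; values outside the training range fall in the end bins.
    return [low + width * i for i in range(1, n_bins)]

def to_category(value, edges, labels):
    """Map a numeric value to the label of the bin it falls into."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

LABELS = ["low", "medium", "high"]     # hypothetical category names

train_ages = [18, 25, 33, 47, 60, 78]  # hypothetical training column
test_ages = [21, 40, 90]               # supplied later, binned with the same edges

edges = fit_bins(train_ages, n_bins=len(LABELS))
print(edges)                                               # [38.0, 58.0]
print([to_category(a, edges, LABELS) for a in test_ages])  # ['low', 'medium', 'high']
```

The bin edges are learned once from the training column and then reused unchanged on the test column, so both sets are converted with the same rule.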
Suppose your dataset has a single missing value, and it belongs to a numeric attribute. When you import this data, Weka cannot tell that the field is numeric. It converts everything into a string, and there is no way to convert the string back into a numeric type. It's really very complicated. You are lucky if you get clean data. Every time we get this kind of data with missing values, we check how many values are missing, and if there are very few, we simply remove those rows from the dataset before importing it. There is also no support for algorithm pipelines. In Python, we create a pipeline: first we run a clustering algorithm, say k-means, and pick one of the resulting clusters; then we apply another algorithm to that cluster. This kind of pipeline is missing in Weka. There is also a problem with the visualization: it can only do two or three types of visualizations.
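The cluster-then-model pipeline described above can be sketched as follows. This is an illustration only, not the reviewer's actual code. It uses plain Python rather than a machine-learning library so that it stays self-contained, and it assumes a single feature, two clusters, and a least-squares line as the second-stage algorithm. All names and data are hypothetical.

```python
from statistics import mean

def assign(values, centers):
    """Give each value the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]

def kmeans_1d(values, initial_centers, iterations=10):
    """Plain k-means on one feature; returns final centers and a cluster index per value."""
    centers = list(initial_centers)
    for _ in range(iterations):
        labels = assign(values, centers)
        for c in range(len(centers)):
            members = [v for v, a in zip(values, labels) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = mean(members)
    return centers, assign(values, centers)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def cluster_then_fit(xs, ys, initial_centers, chosen):
    """Pipeline: cluster on x, keep one cluster, fit a line to that cluster only."""
    _, labels = kmeans_1d(xs, initial_centers)
    subset = [(x, y) for x, y, a in zip(xs, ys, labels) if a == chosen]
    sub_x, sub_y = zip(*subset)
    return fit_line(sub_x, sub_y)

# Hypothetical data: two well-separated groups with different trends.
X = [1, 2, 3, 10, 11, 12]
Y = [2, 4, 6, 20, 19, 18]

slope, intercept = cluster_then_fit(X, Y, initial_centers=[1, 12], chosen=1)
print(slope, intercept)  # line fitted to the second cluster only
```

Chaining the steps in one function is the point of the sketch: the output of the clustering stage decides which rows the second stage ever sees.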
Solution Architect / Data Scientist (upwork) at Freelancer
Real User
Nov 10, 2020
I believe there are a few newer algorithms that are not present in the Weka libraries. If I want a solution that involves deep learning, I don't think Weka has that capability. In that case, I have to use Python for any predictions based on deep learning.
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Weka could be more stable.
Weka is a little complicated and not necessarily suited for users who aren't skilled and experienced in data science.
I think there is a little bit of space for improvement.