Head of Data Science center of excellence at Ameriabank CJSC
Real User
Top 5
Sep 23, 2024
Compared to other solutions like Doc DB, Spark is more costly because it requires a significant investment in infrastructure. While cloud solutions like Databricks can simplify the process, they may also be less cost-efficient.
Lead Data Scientist at a transportation company with 51-200 employees
Real User
Top 5
Aug 5, 2024
I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten.
They provide an open-source license for the on-premises version. However, we have to pay for the cloud version, including data centers and virtual machines.
It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project. If I propose using Spark for a project, one of the first questions I get from management is about the cost of Databricks Spark on the cloud platform we're using, whether it's Azure, GCP, or AWS. If we could reduce the data collection, system conversion, and transformation network costs by even just 2% to 3%, it would be a significant benefit for us.
Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is an open-source option without Cloudera, but in that case you don't have any support: if you face a problem, you might find an answer in the community, but you cannot ask Cloudera about it. Cloudera has different packages, which are licensed versions of products like Apache Spark, and with those you can ask Cloudera for everything.
Since we are using the Apache Spark version, not the Databricks version, it is under the Apache license, and support and bug resolution are often late or delayed. The Apache license is free.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Aug 18, 2021
We use the open-source version. It is free to use. However, you do need to have servers; we have three or four. They can be on-premises or in the cloud.
Apache Spark is available in cloud services like AWS and Azure. You have to use the specific service that fits your use case: for example, AWS Glue, which runs Spark for ETL processes, or AWS EMR and Azure Databricks for on-demand data processing in the cloud. Basically, it depends on how much data you will be processing. It is recommended to start with a minimal configuration and stop the services when not in use.
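As a rough sketch of that advice, the following Python snippet (using boto3) launches a transient AWS EMR cluster that runs a single Spark step and terminates itself when the step finishes, so you pay for compute only while the job runs. The bucket name, script path, release label, and instance sizing are illustrative assumptions, not recommendations.

# Sketch: a transient EMR cluster that runs one Spark step and shuts down.
# Bucket, script path, release label, and sizing are illustrative assumptions.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="spark-etl-minimal",
    ReleaseLabel="emr-6.15.0",                # any recent EMR release with Spark
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                   # start small, grow only if needed
        "KeepJobFlowAliveWhenNoSteps": False, # terminate when the step ends
    },
    Steps=[{
        "Name": "run-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl_job.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster started:", response["JobFlowId"])

Because KeepJobFlowAliveWhenNoSteps is False, the cluster shuts itself down as soon as the step completes, which matches the reviewer's advice to stop services when not in use.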
Managing Consultant at a computer software company with 501-1,000 employees
Real User
Feb 2, 2020
The initial setup is straightforward. It took us around one week to set up, and then the requirements and the creation of the project flow and design needed to be done. The design stage took three to four weeks, so in total it required between four and five weeks.
Technical Consultant at a tech services company with 1-10 employees
Consultant
Dec 23, 2019
I would suggest not trying to do everything at once. Identify the area where you want to solve a problem, start small, and expand incrementally, slowly broadening your vision. For example, if I have a problem where I need to do streaming, I focus just on the streaming and not on the machine learning that Spark offers. It offers a lot of things, but you need to focus on one thing so that you can learn; that is what I have learned from the little experience I have with Spark. You need to focus on your objective and let the tools help you, rather than letting the tools drive the work. That is my advice.
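To illustrate that "start small" advice, here is a minimal sketch of a Spark Structured Streaming job that does nothing but read lines from a socket and count them: no machine learning, no extra features. The socket source on localhost:9999 is an illustrative assumption.

# Minimal sketch: streaming only, nothing else.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-only").getOrCreate()

# Read a stream of lines from a local socket (illustrative source).
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Count occurrences of each distinct line.
counts = lines.groupBy("value").count()

# Print the running counts to the console until stopped.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()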
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function...
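As a minimal sketch of the RDD abstraction described above (assuming a local PySpark installation): the dataset is split into read-only partitions, transformations are recorded lazily as lineage, and an action triggers the actual computation. That lineage is what lets Spark rebuild lost partitions in a fault-tolerant way.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

rdd = sc.parallelize(range(10), numSlices=4)  # read-only dataset in 4 partitions
squares = rdd.map(lambda x: x * x)            # transformation: recorded in lineage, not yet run
total = squares.reduce(lambda a, b: a + b)    # action: triggers the computation
print(total)                                  # 285

sc.stop()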
Apache Spark is an open-source tool. It is not an expensive product.
The solution is moderately priced.
It is an open-source solution, it is free of charge.
Apache Spark is an expensive solution.
We are using the free version of the solution.
Licensing costs depend on where you source the solution.
It's an open-source product. I don't know much about the licensing aspect.
Spark is an open-source solution, so there are no licensing costs.
This is an open-source tool, so it can be used free of charge. There is no cost involved.
Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera.
I'm unsure as to how much the licensing is for the solution. It's not an aspect of the product I deal with directly.