AWS Presales Solutions Architect at Escala 24x7 Inc.
Real User
Top 5
Sep 26, 2024
Actually, there have been many improvements with the query editor (version two) and the serverless type of cloud cluster, which is great. The only minor issue I faced was that it took a bit longer than expected to change the cluster to have more space or storage. Otherwise, everything is great.
Solutions Architect at a hospitality company with 501-1,000 employees
Real User
Top 20
Sep 26, 2024
There are no significant issues preventing us from doing our tasks. However, there might be some limitations from a business intelligence perspective, but nothing we can't find a workaround for.
It is not easy to work with queries in Amazon Redshift and view the resulting data. In our company, we can easily create a query, but the problem is that when we are doing extractions, we need to do them for analytics purposes, and that is when Amazon Redshift gets a little bit slow. If any tools were available in the product for analytics purposes, it would be good. If we were able to integrate the tool into Excel or something similar, it would be good. MySQL and some other databases can be directly integrated into Excel for analytics purposes, and if Amazon Redshift had such functionality, analytics would be much easier.
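A common workaround for the Excel integration gap described above is to export query results to CSV, which Excel opens directly. A minimal sketch of the pattern, using an in-memory SQLite database as a stand-in for a Redshift connection (with a real cluster you would use a DB-API driver such as psycopg2 against the cluster endpoint; the table and data are invented for illustration):

```python
import csv
import io
import sqlite3

# Stand-in for a Redshift connection; any DB-API connection works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("APAC", 75.5)])

cur = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")

# Write the result set to CSV text, which Excel opens directly.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([d[0] for d in cur.description])  # header row from the cursor
writer.writerows(cur.fetchall())
csv_text = buf.getvalue()
print(csv_text)
```

The same three steps (run query, take `cur.description` for headers, write rows) apply unchanged when the connection points at Redshift.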
Owner at a comms service provider with 1,001-5,000 employees
Real User
Top 20
Jun 11, 2024
I do not like the product. It is difficult to work with unstructured data and JSON documents. It would be great if the solution could handle unstructured data more easily. It would also be useful if the tool provided support for vector operations.
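Until JSON handling improves, a common workaround is to flatten nested documents into tabular columns before loading them into the warehouse. A minimal sketch (the `flatten` helper and the sample document are illustrative, not part of any Redshift API):

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested JSON object into dot-separated columns."""
    out = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{name}."))
        else:
            out[name] = value
    return out

raw = '{"id": 1, "user": {"name": "Ana", "geo": {"country": "BE"}}}'
row = flatten(json.loads(raw))
print(row)  # {'id': 1, 'user.name': 'Ana', 'user.geo.country': 'BE'}
```

Each flattened key then maps to an ordinary table column, sidestepping the unstructured-data limitation the reviewer describes.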
Senior Data Engineer at a computer software company with 201-500 employees
Real User
Top 5
May 9, 2024
Sometimes, it's difficult to get the metadata from Redshift. The product has too many layers to get simple information. Redshift does not have primary-key tools. The vendor must consider adding them.
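Because Redshift treats declared primary keys as informational only (it does not enforce them), duplicates must be removed explicitly. The `ROW_NUMBER` pattern below is the usual fix; SQLite stands in for Redshift here since the SQL is portable (in Redshift you would typically order by a load timestamp rather than `rowid`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No enforced primary key on id, mirroring Redshift's informational constraints.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "old"), (1, "new"), (2, "only")])

# Keep exactly one row per id, preferring the most recently loaded version.
rows = conn.execute("""
    SELECT id, name FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rowid DESC) AS rn
        FROM users
    ) WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'new'), (2, 'only')]
```

The subquery ranks rows within each key; filtering on `rn = 1` leaves one survivor per key.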
Senior Data Platform Manager at a manufacturing company with 10,001+ employees
Real User
Top 5
Apr 10, 2024
In terms of improvement, I believe Amazon Redshift could work on reducing its costs, as they tend to increase significantly. Additionally, there are occasional issues with nodes going down, which can be problematic. We often encounter issues like someone dropping a column or changing the order of columns, which can cause synchronization problems when pushing data through our pipeline. It's a minor issue, but it can be annoying.
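One way to catch the dropped or reordered columns mentioned above before they corrupt a load is a cheap schema check at the start of the pipeline. A sketch, where the expected column list and function name are hypothetical:

```python
# Columns the pipeline was built against (hypothetical example).
EXPECTED = ["order_id", "customer_id", "amount", "created_at"]

def schema_drift(actual):
    """Report dropped, added, or reordered source columns before loading."""
    missing = [c for c in EXPECTED if c not in actual]
    extra = [c for c in actual if c not in EXPECTED]
    reordered = not missing and not extra and actual != EXPECTED
    return {"missing": missing, "extra": extra, "reordered": reordered}

print(schema_drift(EXPECTED))  # no drift on a clean run
drift = schema_drift(["customer_id", "order_id", "amount", "created_at"])
print(drift)  # flags the reordered columns
```

Failing the pipeline on any non-empty drift report turns a silent synchronization problem into a loud, early error.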
It would be good to see Redshift as a serverless offering. The proposition may be unclear, but at the time, there were certain limitations with the pay-as-you-go offering. A serverless offering would provide more flexible on-demand pricing, which would be good to see. Redshift is not expensive, but I always have to buy a new server if I need more computing than I have. Setting up a new server is an easy task, but it would be better if I could scale my Redshift cluster up or down as needed; as it stands, manual control is still required. For example, my analyst team may be working on a job that requires a lot of computing and is only needed for this month, week, or even today. The job should scale up and down automatically, but that capability is not yet fully developed.
Senior Director Data Architecture at Managed Markets Insight & Technology, LLC
Real User
Top 5
Jul 27, 2023
As our scalability requirements and data growth exceeded expectations, Redshift didn't scale up to meet our business needs. So, at that point, we made a switch to Snowflake, which provided the scalability we needed. Scalability, then, is one area of improvement. Another area where Amazon Redshift could improve is in adopting the separated compute and storage architecture that Delta Lake, Snowflake, and a few others in the cloud data warehouse spectrum are adopting. Although Redshift introduced AQUA to achieve some level of scalability, I still feel that when it comes to scaling up, whether vertical or horizontal, there is a noticeable amount of downtime for end consumers. If I need to switch from DC1 to DC2, or from one compute/storage configuration to another, I have to bring down the entire cluster and then bring it back up. That's a pain point.
When compared to Snowflake, Amazon Redshift does not have the capability to dynamically resize compute. Snowflake provides a virtual warehouse ('VW') that allows you to increase the size of the warehouse so queries run faster, without changing anything else. This feature is not available in Redshift, so it is a limitation: it is not possible to immediately increase the virtual warehouse size in Amazon Redshift.
Specifically, with Redshift Spectrum, some SQL commands have to run on Redshift instead of Redshift Spectrum, which slows down a few things in the tool. Some commands are run on Redshift itself and some are completed by Spectrum, which is problematic; however, queries are faster when computed by Spectrum. In the solution, user-based access is quite hard, and certain permissions are difficult to manage in general. The solution is also expensive compared to Snowflake; I have used Snowflake because it is cheaper than Amazon Redshift.
Redshift's serverless technology needs to improve because not everyone is technically inclined. Organizations want to quickly access and import data into their data warehouse without hassle. Glue, the ETL tool used with Redshift, is not seamlessly integrated with it. I've encountered many instances where it couldn't infer the correct data type from the source, which should be intuitive. Snowflake's ETL tooling, on the other hand, is more intuitive and seamless.
During our last project, Redshift couldn't perform well even for a data size of 6 TB. Compared to Teradata and Snowflake, the solution needs to work faster. They should also improve query optimization and the readability of execution plans to the level we get while using Teradata. In addition, they should provide zero-copy cloning and data sharing facilities.
Senior Data Scientist at a tech services company with 51-200 employees
Real User
Top 10
Mar 9, 2023
They should provide a structured way to work with interim data rather than storing it in Parquet files locally. Also, Redshift is unwieldy. There should be better integration between Python and Redshift; it could be more accessible without writing so much SQL. They should make writing and reading data frames into and from Redshift easier. The performance could also be better. I have used Redshift for extensive queries. For large tables, it's easier to unload the data from Redshift, but subqueries that run complex aggregations are slower to configure. I have to use the UNLOAD command to unload the whole table, read the table into a server with extensive memory in Python, and process the data there. It's not optimal.
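Rather than unloading an entire table and reading it into a large-memory server, results can often be streamed in fixed-size chunks through the driver cursor and processed incrementally. A sketch of that pattern, with SQLite standing in for a Redshift connection (the `stream` helper and sample table are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (n INTEGER)")
conn.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(10)])

def stream(cursor, size=4):
    """Yield rows in fixed-size chunks instead of fetchall()."""
    while True:
        chunk = cursor.fetchmany(size)
        if not chunk:
            return
        yield chunk

total = 0
for chunk in stream(conn.execute("SELECT n FROM big")):
    total += sum(n for (n,) in chunk)  # process each chunk, then discard it
print(total)  # 45
```

Peak memory is bounded by the chunk size rather than the table size, which avoids the "server with extensive memory" the reviewer resorts to.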
Snowflake has a very good feature for cloning databases. It makes it easy to clone a data warehouse, which is useful. I would like to see this feature in Redshift.
I would like Amazon Redshift to improve its performance, analytics, scalability, and stability. Other than these points, I am not aware of any other areas to address since Amazon provides a variety of independent services for their customers to choose from, and if one were to express dissatisfaction with Amazon Redshift, Amazon would likely suggest AWS Glue as an alternative. Similarly, if another issue arose, Amazon might recommend Amazon RDS. There are a lot of things they try to upsell to you, each with its own pros and cons and in different packages offering different perks. So, it all depends on your business needs and what you choose for your business. I wouldn't criticize Amazon for this because they have created packages tailored to their customer's needs, which helps to prevent customers from looking elsewhere.
IoT Consultant at a computer software company with 1,001-5,000 employees
Consultant
Top 20
Dec 13, 2022
Pricing sometimes depends on the setup (key, etc.) which makes it hard for somebody new to AWS. Detailed research has to be conducted to end up with a competitive solution in terms of pricing and performance.
Data Analyst at a tech vendor with 51-200 employees
MSP
Dec 27, 2021
I cannot state which features of the solution are in need of improvement, since those we make use of have not changed. The solution has four maintenance windows, so when it comes to stability, I think it would be better to decrease their number. I rate the solution as an eight out of ten because it is not 100 percent; there is much service-related maintenance required. It would be nice to see support for the usual OLTP features, such as those involving traditional transaction processing.
Senior Solutions Architect at a retailer with 10,001+ employees
Real User
Aug 7, 2021
Primary-key enforcement should be improved, but there is probably a reason they don't follow the reference architecture; it means they are creating clones of the data for sharing. Cost control measures could be improved, along with added transparency.
Data Engineer at a tech services company with 51-200 employees
Real User
Jul 2, 2021
We recently moved from the DC2 cluster to the RA3 cluster, which is a different node type, and we are finding some issues with the RA3 cluster regarding connection and processing. There is room for improvement in this area. We are in talks with AWS regarding the connection issues. In an upcoming release, I would like to have a Snowflake-like feature where we can create another cluster in the same data warehouse, with the same data. You could create a different cluster and compute nodes for each of your use cases, for retail and for your data analysts, all while keeping your underlying data safe. Additionally, the cluster resize process takes down the cluster for too long, approximately 15 minutes. There are also limitations on size: you can resize only by a multiplier of two. For example, if you have four nodes, you can either go to eight nodes or come down to two nodes. There should be fewer limitations.
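The "multiplier of two" constraint the reviewer describes can be made concrete with a small helper that enumerates the valid targets from a given node count. This is an illustration of the limitation as described, not an AWS API (`resize_targets` and the 128-node cap are assumptions):

```python
def resize_targets(nodes, max_nodes=128):
    """Valid cluster sizes under a 'halve or double only' resize rule."""
    targets = []
    if nodes % 2 == 0 and nodes // 2 >= 1:
        targets.append(nodes // 2)   # scale in: half the nodes
    if nodes * 2 <= max_nodes:
        targets.append(nodes * 2)    # scale out: double the nodes
    return targets

print(resize_targets(4))    # [2, 8] — matches the reviewer's example
print(resize_targets(128))  # [64] — already at the assumed ceiling
```

From four nodes, the only reachable sizes are two and eight, exactly the restriction the review complains about.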
Improvement could be made in the area of streaming data; the capability can definitely be improved. There are other products, like Kinesis, a separate service we use for streaming data ingestion. Whatever features are missing in Redshift are covered by separate services, but if it were feasible to ingest real-time streaming data directly into Redshift, that would be very useful.
Redshift is a multi-tier engine that works like a calculator. There is some missing functionality, and sometimes it's difficult to work in. We need to work around these gaps using VACUUM inside Amazon Redshift, and that causes some complexity. Sometimes I'd like them to support special features or special installations, because we need automatic population. I would like to see more programming outside of the cloud. I would also like to see more functionality for JSON files; the only JSON functionality they have now is reports. I would also like to see other data sources, like MongoDB, supported.
We have had some challenges with respect to high-availability architecture for production. We don't find many issues now, but initially, we had some challenges. This is an older product, so when it comes to usability, it requires a technical person to work with it. It requires a specialist and a good business case. It has to be a little more user-friendly than it is today. In our experiments, the handling of unstructured data was not very smooth.
Cloud & Data - practice leader at Micropole Belgium
Real User
Jul 19, 2020
I would like a better way to ingest data in real time because there is a bit too much latency. There are too many limitations with respect to concurrency. It is now possible to auto-scale, although that is still slow. It could offer smaller nodes with decoupling of storage and processing because, for the moment, the only nodes available to work that way are huge and aimed at large companies.
The OLAP slice-and-dice features need to be improved. For example, if a business wants to bring in a general ledger from an ERP, they want to slice and dice the data. What we have found is that they have a lot of formulas used to calculate metrics, so what we do is use SQL Server Analysis Services. The question then becomes one of adopting a single vendor and transitioning to Azure. If Redshift had similar capabilities, it would be very good.
Service Manager & Solution Architect at a logistics company with 10,001+ employees
Real User
Jun 17, 2020
The performance of managing updates, deletes, and row-level changes is very low. For example, while you are doing inserts, updates, deletes, and merges, the performance is very, very poor. If you want to query the database once you have many terabytes of data, load performance is very low. Looking at the performance of querying the database, especially with merges while data is being updated, it is really poor. We like this solution and have tried all of the native services; they were working quite well. The only concern about Redshift was managing the cluster, especially the EMR cluster. Our company policy was not to use EMR clusters, especially with the nodes failing; there were many instances of downtime, essentially because there was too much data traffic. The other drawback was CDC, as we do not have any tools that can support it. Creating the structure is easy on the DDL side, but after you create the table and want to transform the data to store it in the database, the performance is poor. It takes a lot of time to ingest and update the data, and after you ingest the data, when someone wants to fetch it from the table, it takes a lot of time to return the results.
It would be useful to have an option where all of the data can be queried at once and then have the result shown. As it is now, when we run a query and we are looking at the results, part of the data remains to be processed at the back end. That works very well, but in some cases, we require the whole data to be queried at once and then have the results shown. We have not faced many use cases where it would have been useful, but in one or two, we used other methods to achieve this goal. When our clients contact customer support, they don't want to speak with a machine. Instead, they want to chat with a real person who can provide a solution. Customer service bots can provide solutions but they cannot understand our problems.
Senior System Engineer at Infosys Technologies Ltd
Real User
Jan 12, 2020
Pricing is one of the concerns that I have because if you compare Snowflake with Redshift, it provides some of the same services, but at a much cheaper rate. So pricing is one of the things that it could improve. It should be more competitive. Otherwise, everything else looks good, especially the data storage and analytical processes.
From my perspective, the product could be improved by making it more flexible. There are now more flexible products on the market that allow for expandability and dynamic expansion as the data warehouse market changes. Although the product is simple to use, there can be problems. If you declare a unique key on a column and then store data, the database assumes the data really is unique, because the constraint is not enforced, and results will be distorted. It's fine if the query is simple, but if it's complex or you have too many queries per hour, it can create a bottleneck for Redshift, and then you can't return and recover. It requires some fine-tuning. For additional features, I would like to see support for partitions; it doesn't exist yet as a feature, and it's quite an important issue when you're dealing with large databases. Also, I believe the product needs improvement in parallel threading to support more database users without jeopardizing performance.
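The distorted-results failure mode described above is easy to demonstrate: when a declared key is not enforced, a duplicate key row fans out through joins and silently inflates aggregates. SQLite stands in for Redshift here (the tables and figures are invented; the point is the join fan-out, not the schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is treated as a unique key by the pipeline but, as in Redshift,
# nothing in the database enforces that uniqueness.
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ana"), (1, "Ana")])       # accidental duplicate key row
conn.execute("INSERT INTO orders VALUES (1, 100.0)")

# The single order joins against both duplicate rows and doubles the total.
(total,) = conn.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchone()
print(total)  # 200.0 instead of the true 100.0
```

This is why loads into Redshift typically deduplicate on the declared key before any downstream joins run.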
Compatibility with other products, for example, Microsoft and Google, is a bit difficult because each one of them wants to be isolated with their solutions. That's a big problem now.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale cloud-based data warehouse service. Users can begin with a few gigabytes of data and easily scale up to a petabyte or more as needed, enabling them to use their own data to gain new insights into how to improve business processes and client relations.
Initially, users start to develop a data warehouse by initiating what is called an Amazon Redshift cluster or a set of...
The high price of the product is an area of concern where improvements are required.
The product must provide new indexes that support special data structures or data types like TEXT.
The product must become a bit more serverless. Users should have to pay only for the resources they consume.
The customer support could be more responsive.
Infinite storage is available in Snowflake and is not available in Redshift. Analytical tools for integration would be helpful in the future.
I would like to see improvement in the pricing and the simplicity of using this solution.
Redshift's GUI could be more user-friendly. It's easier to perform queries and all that stuff in Azure Synapse Analytics.
The technical support should be better in terms of their knowledge, and they should be more customer-friendly.
Amazon should provide more cloud-native tools that can integrate with Redshift like Microsoft's development tools for Azure.
Running parallel queries results in poor performance and this needs to be improved.
The speed of the solution and its portability needs improvement.
In the next release, a pivot function would be a big help. It could save a lot of time creating a query or process to handle operations.
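Until a native pivot function exists, the usual SQL workaround is conditional aggregation with CASE expressions. A sketch of the pattern, with SQLite standing in for Redshift since the SQL is portable (the sales table and quarter columns are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(2024, "Q1", 10.0), (2024, "Q2", 20.0), (2025, "Q1", 5.0)])

# Pivot quarters into columns: one CASE expression per output column.
rows = conn.execute("""
    SELECT year,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0.0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0.0 END) AS q2
    FROM sales
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)  # [(2024, 10.0, 20.0), (2025, 5.0, 0.0)]
```

The drawback, which motivates the feature request, is that every pivoted value needs its own hand-written CASE column; a pivot function would generate these automatically.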