Column store and distributed processing are optimized for read access. We grew to 3,000+ users with no impact.
Column store is a storage and compression technique for relational data; I'm now using it in SQL Server 2016 as well. We configured a 16-core VM for handling requests on the database. The recommendation was to split the inbound data files into chunks, each roughly 1/16th of the original size, so that every core had a piece to work on.
This way, the import process could make full use of parallelization, and it worked. We imported 20 million rows of sales facts in less than 15 seconds, and the content was queryable immediately. I had never seen that before; it was impressive. It meant we could completely rebuild the data warehouse to "current" from scratch within minutes, assuming the data was already in S3.
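As a rough sketch of that kind of load (the table, bucket, and IAM role names here are illustrative placeholders, not what we actually used), pre-splitting the source file and pointing a single COPY at the common S3 prefix lets Redshift pull the pieces in parallel across the cores:

    -- Placeholders only: bucket, prefix, and IAM role are made up for illustration.
    -- The source data was pre-split into parts (sales_facts.part00 ... part15),
    -- one per core, so a single COPY spreads the load work across the cluster.
    COPY sales_facts
    FROM 's3://example-bucket/loads/sales_facts.part'
    IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole'
    DELIMITER '|'
    GZIP;

Because COPY treats the FROM value as a key prefix, all of the parts are matched and loaded concurrently, which is the parallel behaviour described above.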
Tables that would typically be 2 GB in size are now about 250 MB, which means more data fits in memory. You can also configure tables to be replicated into the memory of all of the available cores, which is good for small dimension tables, or fragmented across all cores for the larger fact tables, which allows for distributed query processing. Once you set it up, it just worked. It was all specified in the PostgreSQL-style table statements.
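To illustrate how that is declared (the table and column names below are placeholders, not our actual schema), the distribution style is part of the CREATE TABLE statement itself:

    -- Placeholder schema for illustration only.
    -- Small dimension table: replicated into the memory of every node (DISTSTYLE ALL).
    CREATE TABLE dim_store (
        store_id   INTEGER      NOT NULL,
        store_name VARCHAR(100)
    )
    DISTSTYLE ALL;

    -- Large fact table: fragmented across the nodes on the join key (DISTSTYLE KEY)
    -- and sorted by date, so queries are processed in a distributed fashion.
    CREATE TABLE fact_sales (
        sale_date  DATE          NOT NULL,
        store_id   INTEGER       NOT NULL,
        amount     DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (store_id)
    SORTKEY (sale_date);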
Two data centers in Sydney guaranteed us a distributed solution. We really didn't notice this; it was more of a check-box situation. At one point there was an outage at AWS, but it didn't impact our operations directly.
This has given us the ability to provide performance metrics to a large number of company staff without impacting core systems.
I'd like to see these Redshift features arrive in other database products, as SQL Server's ColumnStore index has.
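For comparison (the table name is a placeholder), the equivalent capability in SQL Server 2016 is declared as a columnstore index:

    -- Placeholder table name for illustration.
    -- A clustered columnstore index stores the entire table column-wise,
    -- giving compression and scan behaviour comparable to Redshift's storage.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_sales
        ON dbo.fact_sales;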
I have used this solution for three years.
There have been no stability issues.
Technical support always met my expectations.
I was on a team using AWS tools for Dick Smith Electronics (now liquidated). We stopped using the tools in February 2016.
Prior to that, we had been using them fully for about three years. We loaded data into Redshift according to the best practices in the online docs and through consultation with AWS staff. The combination of S3 and Redshift for this purpose performed very well. Redshift provided the data model to an instance of MicroStrategy for BI reporting.
We were using MicroStrategy, which generated all the SQL that our reporting services needed.
As such, I can only comment on the data engineering phase. Technically, it was so impressive that I don't know what to add; I don't recall feeling that it missed anything. If anything, I was not using all of the available features. The AWS documentation is great in this regard; you can tell they have put a lot of thought into it.
A lot of the future direction in database technology has to do with memory optimization and concurrency (e.g., VoltDB). This is targeted more towards transactional processing than data warehousing.
Memory-only data warehousing solves a lot of access issues without having to think too hard about the problem from the consumers' point of view. I am sure that you can already configure this.