(First posted October 4, 2012 in the Breakthrough Analysis blog.)
Visual analytics leader TIBCO, with its September 25 launch of Spotfire 5.0 and announcement of a new Teradata alliance, wants analytics customers to have it both ways, promoting both in-memory analytics and, for the largest or deepest problems, push-down of calculations into the Teradata engine via a technique called in-database processing. Flexibility is good. In-memory processing speeds interactive-analysis response times, while in-database analytics reduces data-access delays and calculation time, taking advantage of parallel processing by the database management system (DBMS). For each of a large variety of analytical processing challenges, which approach works best?
On the one hand, the new analytics partnership — “Spotfire harnesses Teradata for executing complex calculations and predictive analytics in-database” – delivers “extreme data discovery and analytics.” On the other, newly launched Spotfire 5.0 is a “re-architected in-memory engine specifically built to enable users from across the enterprise to visualize and interact with massive amounts of data” that doubles down on the long-standing positioning of TIBCO Spotfire as “in-memory analytics software for next generation business intelligence.”
Next Generation BI and Database Systems
The next generation of business intelligence is indeed interactive and visual, typically involving iterative, exploratory analyses. Tableau and QlikTech, notably, compete with TIBCO on this front. And business is increasingly demanding real-time analysis of high-velocity data, without the latency involved in writing data streams to a database. These scenarios are tailor-made for in-memory processing. But here’s the rub. Next-gen BI also calls on huge volumes of diverse data and, often, the application of sophisticated computational algorithms, necessary to make sense of time series, geolocation, connection-network, and textual data. If you’re crunching high-volume data from social/mobile sources, and from certain species of machine/sensor-generated data, you may be working beyond the responsiveness bounds of in-memory analytics… or you may not.
QlikView, for instance, can directly import social-sourced data via connectors from QlikTech partner company QVSource, and all three of the companies I’ve mentioned work with ‘unified information access’ vendor Attivio to ingest data from a variety of unstructured sources that Attivio handles. And Teradata’s partner ecosystem includes text-analytics providers Attensity and Clarabridge, whose software runs outside Teradata systems rather than in-database, while SAP, in a tighter coupling, provides text capabilities via a data-services framework.
It’s a complex analytical-software world out there! We see that in-memory and in-database analytics occupy overlapping territory. How to choose the right approach in a bi-BI world? In what circumstances should you pull data from the external DBMS for those fast in-memory analyses — TIBCO touts the “two-second advantage” — and when should you push down complex calculations and predictive analytics into an external database system?
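Mechanically, the pull-versus-push-down choice looks something like the minimal sketch below, written in Python against SQLite from the standard library as a stand-in for an analytical DBMS such as Teradata (the sales table and its columns are hypothetical). The first function pulls every row across the wire and aggregates client-side, where the interactive front end lives; the second pushes the aggregation down as SQL, so only a small summary is transferred.

```python
# A minimal sketch, assuming a hypothetical "sales" table; SQLite stands in
# for an analytical DBMS such as Teradata.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 250.0), ("west", 75.0)])

def totals_in_memory(conn):
    """Pull every row across the wire, then aggregate client-side: flexible
    for interactive analysis, until the data no longer fits in memory."""
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)

def totals_in_database(conn):
    """Push the aggregation down as SQL: the DBMS does the heavy lifting
    (in parallel, on a real MPP system) and returns only a small summary."""
    cursor = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cursor.fetchall())

print(totals_in_memory(conn))    # {'east': 350.0, 'west': 75.0}
print(totals_in_database(conn))  # same result, far less data moved
```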
TIBCO-partner Teradata provides a very worthy analytical DBMS, with parallel query processing, high reliability, low-latency data availability, broad data-management capabilities, and rich in-database analytics. So happens I wrote a paper for the company a couple of years back, Frequently Asked Questions about In-Database Analytics. (Teradata has paid me to write other papers, QlikTech is a consulting client, Attensity is a sponsor of my upcoming Sentiment Analysis Symposium – if you’re into social-media analytics, market research, or customer experience, check it out, October 30 in San Francisco – and I have done paid work for SAP and Sybase in the last year.) Teradata is not the only player in the game. Oracle and Microsoft SQL Server Analysis Services are additional external in-database analytics options with Spotfire, and DBMS options including EMC Greenplum, IBM Netezza, and SAP’s Sybase IQ all support in-database analytics unallied with TIBCO.
In-Memory vs. In-Database Guidance
A TIBCO-provided customer testimonial hints at one decision criterion. TIBCO references MGM Resorts International, a Spotfire and Teradata customer. “‘Being able to work with Spotfire directly connected to billions of data records through Teradata will greatly improve our ability to manage the Big Data dilemma,’ said Becky Wanta, Senior Vice President and Global Chief Technology & Innovation Officer,” as quoted by TIBCO. Focus on “billions of data records” and the word “manage.” In-database analytics in a centralized data store provides for shared-but-controlled access and robust administration, not only for data but also for database-embedded analytical routines, whether instantiated in SQL, custom code, or code libraries.
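As a hedged illustration of a database-embedded analytical routine (not TIBCO's or Teradata's actual mechanism), here is a simple least-squares fit expressed entirely as SQL aggregates, so the raw rows never leave the database and only two fitted coefficients come back. SQLite again stands in for a parallel analytical DBMS, and the visits table with its spend and revenue columns is hypothetical.

```python
# A sketch of a calculation pushed fully into the database: least-squares
# slope and intercept computed with nothing but SQL aggregates. The "visits"
# table and its columns are hypothetical; SQLite stands in for a parallel
# analytical DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (spend REAL, revenue REAL)")
conn.executemany("INSERT INTO visits VALUES (?, ?)",
                 [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)])

# slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx); intercept = mean(y) - slope*mean(x)
IN_DATABASE_FIT = """
SELECT
  (COUNT(*) * SUM(spend * revenue) - SUM(spend) * SUM(revenue)) /
  (COUNT(*) * SUM(spend * spend)   - SUM(spend) * SUM(spend))   AS slope,
  AVG(revenue) -
  (COUNT(*) * SUM(spend * revenue) - SUM(spend) * SUM(revenue)) /
  (COUNT(*) * SUM(spend * spend)   - SUM(spend) * SUM(spend)) * AVG(spend)
                                                                 AS intercept
FROM visits
"""

slope, intercept = conn.execute(IN_DATABASE_FIT).fetchone()
print(f"revenue ~= {slope:.2f} * spend + {intercept:.2f}")  # only two numbers returned
```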
Consider another customer testimonial: Alan Falkingham, director of business intelligence at Procter & Gamble, again quoted in the TIBCO release, says, “We are excited by the prospect of Spotfire 5.0 being able to efficiently analyze and visualize extreme data volumes by executing analytics directly within our database architecture.” Key in here on “visualize” — visual analysis happens in the user-facing front end, often as part of an iterative, exploratory process, where in-memory excels — while the efficiency comes from handling the heavy computations that touch those “extreme data volumes” close to the data, in the DBMS.
TIBCO Spotfire Vice President of Marketing Mark Lorion explained, “Being in-memory, there will still be limits based on specific machines configured and deployed… Our approach enables organizations to bridge between/distribute across the two architectural approaches to best fit the use case. This allows companies to leverage the benefits of in-memory freedom with the existing investments in data managed elsewhere.”
Still, how to decide what’s done in-database and what’s done in-memory? Lorion didn’t respond to a question I posed about the limits of the in-memory approach. I asked Gartner analyst Merv Adrian his take. It was, simply, the following: “I haven’t thought about it much. Anything that fits in memory ought to be done there, in my opinion. But the DBMS in-memory doing processing in-database would be ideal.” Ideal indeed, though not currently doable so far as I know. SAP HANA is the most prominent in-memory database system, and while the HANA database has a column-store option and handles multidimensional (OLAP) analyses, I don’t believe you can embed serious analytics in-database. Similarly, Kognitio’s in-memory analytical platform doesn’t support in-database analytics.
Putting aside capacity questions, analytical-routine availability is a key factor. You may not have a choice if implementations of the algorithms you need are available, or can be programmed, in-memory or in-database but not both.
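A small sketch of that availability constraint, under the same stand-in assumptions as above: if the routine you need (here, a median) is not implemented in the target DBMS, you fall back to pulling the column and computing client-side. Many analytical DBMSs do ship median or percentile aggregates; SQLite simply happens not to.

```python
# A sketch of the availability constraint: fall back to in-memory computation
# when the aggregate you need does not exist in the target DBMS. The "sales"
# table is hypothetical; SQLite is the stand-in and has no MEDIAN aggregate.
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10,), (40,), (25,), (5,)])

def median_amount(conn):
    try:
        # Preferred: push the calculation into the database.
        return conn.execute("SELECT MEDIAN(amount) FROM sales").fetchone()[0]
    except sqlite3.OperationalError:
        # No such function here, so pull the column and compute client-side.
        values = [row[0] for row in conn.execute("SELECT amount FROM sales")]
        return statistics.median(values)

print(median_amount(conn))  # 17.5, computed in memory on this stand-in
```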
Heterogeneous Environments
Looking ahead, I will state two premises: that, as tempting as it is to try to handle all work in-memory, disk-reliant analytical database systems — relational like Teradata and the others I’ve cited, or NoSQL — will remain a corporate reality for a while to come, and that neither DBMS-embedded analytics nor in-memory analytics will completely meet enterprise BI needs anytime soon. Software companies such as TIBCO and Teradata, with duplicative predictive-analytics capabilities, would not partner if they were able to go it alone.
So you’re left with what we used to call systems analysis, and with experimenting, to see what works best given your own mix of data (volume, type, and pace), analytical routines, user workload, and management and administration requirements.
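One way to run that experiment is a rough timing harness that computes the same summary both ways against a sample of your own data. The sketch below uses a synthetic SQLite table as a stand-in; on a real parallel warehouse the gap will depend on network transfer, data volume, and how well the pushed-down query parallelizes.

```python
# A rough timing harness: the same summary computed by pulling rows into
# memory versus pushing the GROUP BY into the database. The synthetic
# "events" table and SQLite are stand-ins for your own data and DBMS.
import random
import sqlite3
import time
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (segment INTEGER, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(random.randrange(100), random.random())
                  for _ in range(500_000)])

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

def pull_then_aggregate():
    totals = defaultdict(float)
    for segment, value in conn.execute("SELECT segment, value FROM events"):
        totals[segment] += value

def push_down_aggregate():
    conn.execute("SELECT segment, SUM(value) FROM events GROUP BY segment").fetchall()

timed("pull rows, aggregate in memory", pull_then_aggregate)
timed("push aggregation into the DBMS", push_down_aggregate)
```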
Enterprise IT environments are heterogeneous, meeting diverse demands. Flexibility and systems interoperability are musts for partial-solution vendors, including both in-memory and DBMS-centered analytics providers. They are must-haves for the earlier generation of analytics, for the next, and for the generations after next, as far as the eye can see.