“Big Data” is generally defined as:
A large, fast-changing, multi-structured dataset that is generally distributed in nature and non-EDW compliant (a mix of RDBMS and non-RDBMS sources).
Due to the nature and flow of this data, it brings associated issues and complexities around performance, governance and analytics.
In general discussions, the term “Big Data” is often used as a synonym for Hadoop infrastructure, but in this discussion we will treat Big Data as a principle rather than a particular current technology or solution.
“Qlikview”, on the other hand, is an in-memory Business Intelligence solution that can handle very large datasets and promotes its user friendliness and intuitiveness, shorter development cycles and analytical capabilities, among other things.
Now that the introduction to Big Data and Qlikview is clear, let’s look at the two a bit more closely, and together.
The following chart lists three aspects of Qlikview and Big Data:
| | | Big Data | Qlikview |
|---|---|---|---|
| Configuration | Distributed | It is distributed in nature… b) Multiple filesystems c) Multiple OS | Traditionally, a single instance of a Qlikview application runs on a) a single server b) a single OS |
| Properties | Volume | Very large volume (terabytes or more) | Large volume (20-100 gigabytes is not unheard of) |
| | Velocity | Data speed varies by operation… Clickstream and RFID data are real time | Data frequency (speed) is generally mini-batch with small delays |
| | | | Data generally has pre-defined datatypes; the most common are dates, text, numbers etc. |
| Implication | System | Data is generally too large for a single DB/filesystem to handle, so it is distributed across a number of systems. FYI: the system hides this complexity from the end user. Data size is unlimited in this case | Data generally comes from multiple datasets; since the solution is memory based, it is limited by the maximum memory available on the server |
| | Data Relationship | Dataset is traditionally non-EDW (images, videos, blogs, web logs etc.) | Generally a dimensional model |
| | | With big data come problems like performance, management and analytics. There are tools that are improving in these areas | An in-memory solution allows fast, performant reporting and analytics responses for users |
Clearly, the two systems are designed differently in a number of ways, and surely for different purposes. Does that make Qlikview and Big Data incompatible solutions, or are they two applications with their own strengths, perfect partners that can deliver the solution the business needs when used in conjunction?
The answer to this question lies in answering a bigger question: “What does the business want, and what does IT have?”
With mountains of information flowing in, new nomenclature is coming into regular use: Mega/Giga/Tera are being replaced by Peta/Exa/Zetta. New technologies and more processing power are coming to the fore, underlining the validity of Moore’s law.
What matters is not how much this data holds, but what useful or relevant information it contains that can be used efficiently. Information is gold, and it has to be treated that way. But when looking at information, we should consider another analogy, which comes from the mining of precious metals. When searching for gold, miners do not simply find strips of gold lying in plain sight. More likely the gold is not visible to the naked eye, and sophisticated processes are needed to find and extract it. The mining term strip ratio (1 ppm for gold) describes the amount of waste that needs to be processed to reach the material of interest.
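To make the analogy concrete, here is a minimal arithmetic sketch. It assumes the 1 ppm figure above is read as a concentration (one part gold per million parts of material); the function name `material_to_process` is hypothetical and only serves the illustration.

```python
def material_to_process(target_amount: float, concentration_ppm: float) -> float:
    """Amount of raw material needed to yield `target_amount` of the
    valuable part, when it occurs at `concentration_ppm` parts per million.

    The result is in the same unit as `target_amount`.
    """
    if concentration_ppm <= 0:
        raise ValueError("concentration_ppm must be positive")
    return target_amount * 1_000_000 / concentration_ppm


# Assumption: 1 ppm, the figure quoted in the text for gold.
grams_of_material = material_to_process(target_amount=1.0, concentration_ppm=1.0)

# One gram of gold at 1 ppm means a million grams (one tonne) of material.
print(f"{grams_of_material / 1_000_000:.0f} tonne(s) of material per gram of gold")
```

The same arithmetic applies to data: if only a few records per million are relevant to a given decision, almost everything that is stored has to be processed and discarded before the user sees anything useful.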
So new technologies expose us to more and more data with a different “strip ratio”, and that data still has to be turned into information, as John mentioned in this article Qlikview and Big Data: It’s all about Relevance. Information is about relevance: having more and more information on a user’s desk or computer will not make that user any more capable, knowledgeable or geared toward making the right decisions. What is needed is an organized, structured view of that information set, in which the information relevant to the user is presented in a way the user can utilize.
Major critics of in-memory solutions generally come out with this argument: “Memory/Hardware/OS limitation”, and hence Qlikview’s inability to handle Big Data in a single application.
Taking the comparison from Big Data itself, technical reasons were the governing reason for the distributed nature and architecture of Big Data.
I would not be surprised if the same principle were applied across BI and analytics, with distributed apps as the answer and with the business driving the rationale for how the data is split.
Ideally, the business doesn’t need a single application showing it the universe of data; it wants to see the data of its own universe in a single forum. So on top of a well-distributed dataset, there can be an array of Qlikview apps, each serving a specific business need or driver.
So, revamping some of the definitions:
Hadoop (HDFS) – Data being global, distributed, varied and large, it is split into multiple subsets, locations and copies for multiple uses…
Qlikview (BI) – Information being global, distributed, varied and large, it is split into multiple apps, locations and copies for multiple uses…
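The idea of splitting information into business-driven apps, each small enough for one server's memory, can be sketched in plain Python. This is an illustration only and does not use any Qlikview API; the function `plan_apps`, the `region` key, the per-row size and the memory budget are all hypothetical assumptions.

```python
from collections import defaultdict


def plan_apps(records, business_key, row_bytes, memory_budget_bytes):
    """Group records by a business-chosen key and estimate, per group,
    whether a dedicated in-memory app would fit the memory budget.

    `row_bytes` is an assumed average in-memory size per record.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[business_key]].append(record)

    plan = {}
    for name, rows in groups.items():
        estimated = len(rows) * row_bytes
        plan[name] = {
            "rows": len(rows),
            "estimated_bytes": estimated,
            "fits": estimated <= memory_budget_bytes,
        }
    return plan


# Hypothetical sample data: the business decides to split by region.
records = [
    {"region": "EMEA", "sales": 120},
    {"region": "EMEA", "sales": 80},
    {"region": "APAC", "sales": 200},
]

plan = plan_apps(records, "region", row_bytes=100, memory_budget_bytes=150)
for app_name, info in plan.items():
    print(app_name, info)
```

A group that does not fit the budget is a signal to split further along another business dimension (for example by year or product line), which mirrors the way a distributed store splits data for technical reasons.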
The new world order
Disclosure: I am a real user, and this review is based on my own experience and opinions.