Mar 28, 2022
Let's review the concept of data virtualization in modern data architecture and the major products available in the market.
What Data Virtualization Is
What Data Virtualization Is Not
Why do we need it? Are existing data integration and data federation technologies not enough?
Do Data Virtualization and Data Fabric refer to the same thing?
How does it fit in a Data Mesh architecture?
Major drivers
The major drivers are breaking down data silos in the organisation, promoting a self-service culture and enforcing data governance. There are six essential capabilities that a data virtualization product should offer [Reference: Denodo]:
- Data abstraction through a business-driven semantic layer
- No data replication or relocation
- Real-time information
- Self-service data services
- Centralized metadata, security and governance
- Location-agnostic architecture (multi-cloud, hybrid)
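To make the "no replication" and real-time capabilities more concrete, here is a minimal Python sketch. It assumes two hypothetical in-memory tables that stand in for separate CRM and ERP systems. `VirtualLayer`, `register` and `customer_orders` are illustrative names and do not belong to any vendor's API. Real products push queries down to the sources rather than pulling whole tables, but the idea is the same: the layer stores connectors, not copies of the data.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]


class VirtualLayer:
    """Toy virtualization layer (illustrative only): holds connectors, never data copies."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        # Only a reference to the connector is stored, not the rows it returns.
        self._sources[name] = source

    def customer_orders(self) -> List[Row]:
        # The join runs on every call against whatever the sources hold now,
        # so callers always see current data without a replication step.
        customers = {c["id"]: c["name"] for c in self._sources["crm"]()}
        return [
            {"customer": customers[o["customer_id"]], "amount": o["amount"]}
            for o in self._sources["erp"]()
            if o["customer_id"] in customers
        ]


# Hypothetical stand-ins for two separate operational systems.
crm_table = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
erp_table = [{"customer_id": 1, "amount": 250}]

layer = VirtualLayer()
layer.register("crm", lambda: crm_table)
layer.register("erp", lambda: erp_table)

print(layer.customer_orders())  # [{'customer': 'Acme', 'amount': 250}]
erp_table.append({"customer_id": 2, "amount": 90})  # a new order lands in the ERP
print(layer.customer_orders())  # the Globex order appears without any reload
```

Because `customer_orders` reads from the connectors on each call, the second query picks up the new ERP row with no ETL job in between.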
Data Sources
Relational databases (RDBMS), NoSQL databases, unstructured or semi-structured data (delimited files, XML, JSON, Parquet and ORC formats), web services, ESB message queues or Kafka topics, REST APIs and MDX cubes.
These data sources can be hosted on-premises or on any public or private cloud infrastructure, in any location within a hybrid or multi-cloud architecture.
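Because these sources differ so much in format, a virtualization layer typically hides each one behind a connector that returns records in a common shape. The following is a minimal sketch that assumes only two hypothetical source kinds, a delimited file and a JSON document, both held in strings. The connector functions and the `CONNECTORS` registry are illustrative names.

```python
import csv
import io
import json
from typing import Dict, Iterator, List

Row = Dict[str, str]


def rows_from_csv(text: str) -> Iterator[Row]:
    # Delimited file -> one dict per record, keyed by the header row.
    yield from csv.DictReader(io.StringIO(text))


def rows_from_json(text: str) -> Iterator[Row]:
    # JSON array of objects -> one dict per record. Values are coerced to str
    # so both connectors expose exactly the same shape.
    for record in json.loads(text):
        yield {key: str(value) for key, value in record.items()}


# Hypothetical registry: one connector per source kind.
CONNECTORS = {"csv": rows_from_csv, "json": rows_from_json}


def read_source(kind: str, payload: str) -> List[Row]:
    """Read any registered source kind through one uniform interface."""
    return list(CONNECTORS[kind](payload))


orders_csv = "order_id,region\n101,EU\n102,US\n"
orders_json = '[{"order_id": 103, "region": "APAC"}]'

combined = read_source("csv", orders_csv) + read_source("json", orders_json)
print(combined)
```

With this design, adding another source type, such as a REST endpoint or a Kafka topic, only means registering one more connector. Consumers of `read_source` do not change.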
How are other associated aspects managed in these patterns?
- Data governance and security
- Self-service
- Performance
- Flexibility
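For governance and security, one option is to enforce access rules once, in the virtual layer, rather than separately in every consuming tool. This ties back to the "centralized metadata, security and governance" capability. Below is a minimal sketch that assumes a hypothetical role-based column-masking policy. `POLICY`, `apply_policy` and the role names are illustrative and do not describe a specific product's feature.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical central policy: the roles allowed to see each sensitive column.
POLICY = {"email": {"steward"}, "salary": {"hr"}}


def apply_policy(rows: List[Row], role: str) -> List[Row]:
    """Mask every sensitive column that the caller's role is not entitled to see."""
    return [
        {
            col: (val if col not in POLICY or role in POLICY[col] else "***")
            for col, val in row.items()
        }
        for row in rows
    ]


employees = [{"name": "Ana", "email": "ana@example.com", "salary": 5000}]
print(apply_policy(employees, "analyst"))  # email and salary masked
print(apply_policy(employees, "hr"))       # salary visible, email still masked
```

Since every query passes through the same layer, a change to `POLICY` applies to all consumers at once.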
Semantic Layer capability in BI products
A semantic layer presents business-friendly names and definitions on top of the physical database tables and views, so that the whole business community understands the data in the same way. BI products such as Power BI, Tableau and MicroStrategy support building one.
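A semantic layer can be approximated with plain SQL views that rename physical columns into business terms and convert units. The sketch below uses Python's built-in `sqlite3` module. The `t_ord` table, its columns, the cents-to-dollars conversion and the `sales` view are all hypothetical. BI tools define their semantic models through their own modelling features, not through code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical table with terse, system-oriented names (hypothetical schema).
conn.execute("CREATE TABLE t_ord (o_id INTEGER, c_nm TEXT, amt_cents INTEGER)")
conn.executemany(
    "INSERT INTO t_ord VALUES (?, ?, ?)",
    [(1, "Acme", 25000), (2, "Globex", 9000), (3, "Acme", 1000)],
)

# Semantic layer: a view that exposes business names and units.
conn.execute(
    """
    CREATE VIEW sales AS
    SELECT o_id              AS order_id,
           c_nm              AS customer_name,
           amt_cents / 100.0 AS revenue_usd
    FROM t_ord
    """
)

# Business users query the view, never the physical table.
rows = conn.execute(
    "SELECT customer_name, SUM(revenue_usd) FROM sales "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()
print(rows)  # [('Acme', 260.0), ('Globex', 90.0)]
```

If the physical table is later renamed or restructured, only the view definition has to change. Reports written against `sales` keep working.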
Denodo
IBM Cloud Pak for Data
TIBCO Data Virtualization
Starburst
Uses Trino (formerly PrestoSQL), a distributed SQL query engine, under the hood.
SAP HANA
Refer to this link
Oracle Data Service Integrator
Informatica Data Virtualization
Refer to the documentation
Data Virtuality
The YouTube video by Denodo
Data Virtualization for Business Intelligence Systems By Rick van der Lans