Data virtualization for enabling the data driven org

Santhoshkumar P
3 min read · Mar 15, 2021

More than ever, companies are leaning towards data-driven decision making. This post looks at how data virtualization can help an organisation become data driven.

Data virtualization

“Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data.

Unlike the traditional extract, transform, load (“ETL”) process, the data remains in place, and real-time access is given to the source system for the data. This reduces the risk of data errors, of the workload moving data around that may never be used, and it does not attempt to impose a single data model on the data” - Wikipedia

https://www.clearpeaks.com/data-virtualization/

What is data driven decision making?

Data-driven decision-making (sometimes abbreviated as DDDM) is the process of using data to inform your decision-making process and validate a course of action before committing to it.

Steps to effectively make data-driven decisions

These steps can help you find the “who, what, where, when, and why” to make the most of data — for you, for colleagues, and the business.

  1. Identify business objectives
  2. Survey business teams for key sources of data
  3. Collect and prepare the data you need
  4. View and explore data
  5. Develop insight
  6. Act on and share your insight

https://www.tableau.com/learn/articles/data-driven-decision-making

Problem

In this process, step 3 becomes complex when you have a diverse set of tools, systems, processes and domains. That is where a data platform or data lake comes into the picture, and building and maintaining one is not easy.

Challenges with data lakes

If the company is small to medium sized and has relatively few tools, domains and data sets in its ecosystem, does it really need to spend a lot of resources building a data lake?

Solution

Data virtualization comes to the rescue and can keep you progressing on the data-driven journey. With virtualization tools you can build a virtual data platform, then explore the data and build insights on top of it, as described in steps 4, 5 and 6.
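To make the idea of a virtual data platform concrete, here is a minimal sketch in Python. It is not how any particular virtualization product works. The `VirtualCatalog` class, the two fetch functions and the sample records are all hypothetical, and the sources are simulated in memory where a real setup would call an API or read an exported file. The point it illustrates is that the data stays at the source and is read only when a consumer asks for the combined view.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def fetch_crm_customers() -> List[Row]:
    """Hypothetical stand-in for a call to a CRM API."""
    return [
        {"customer_id": "1", "name": "Asha"},
        {"customer_id": "2", "name": "Ravi"},
    ]


# Hypothetical stand-in for a billing system's CSV export.
BILLING_CSV = "customer_id,outstanding\n1,120.50\n2,0.00\n"


def fetch_billing_rows() -> List[Row]:
    """Parses the simulated CSV export into rows."""
    return list(csv.DictReader(io.StringIO(BILLING_CSV)))


class VirtualCatalog:
    """Keeps a registry of sources; nothing is copied ahead of time."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._sources[name] = fetch

    def read(self, name: str) -> List[Row]:
        # The source is queried at read time, so consumers see current data.
        return list(self._sources[name]())

    def customer_view(self) -> List[Row]:
        """Joins CRM and billing rows into a single view per customer."""
        billing = {row["customer_id"]: row for row in self.read("billing")}
        view = []
        for customer in self.read("crm"):
            bill = billing.get(customer["customer_id"], {})
            view.append(
                {
                    "customer_id": customer["customer_id"],
                    "name": customer["name"],
                    "outstanding": bill.get("outstanding", ""),
                }
            )
        return view


catalog = VirtualCatalog()
catalog.register("crm", fetch_crm_customers)
catalog.register("billing", fetch_billing_rows)

for row in catalog.customer_view():
    print(row)
```

Consumers only call `customer_view()`, so they do not need to know where each field lives or how it is formatted at the source.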

Benefits

  • Hides the complexity of multiple endpoints
  • Saves the resources that building a Data Platform/Data Lake would need
  • Delivers business results quickly
  • Delivers data in real time
  • Centralizes data governance and security

Caveats

  1. Make sure the virtualization tools get their data from APIs/files, not directly from the source database. Otherwise, evolving the data model or changing the database will affect the consumers.
  2. If you want to build insights on historical data, make sure it is available in the API/files. Usually an API exposes only the current data.
  3. Support for streaming data in virtualization tools is limited: they can access only the latest events, so it is better to persist the data in storage or expose it through a different interface such as an API or files.
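Caveats 2 and 3 both come down to keeping history somewhere the virtualization tool can reach. One possible approach, sketched below under stated assumptions, is to capture timestamped snapshots of whatever the API currently returns. `SnapshotStore`, `fetch_current_accounts` and the account data are hypothetical names invented for this illustration, and an in-memory list stands in for real file or object storage.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Row = Dict[str, str]


class SnapshotStore:
    """Append-only store; a list stands in for files or object storage."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def capture(self, fetch_current: Callable[[], List[Row]], taken_at: datetime) -> int:
        """Stores the current rows, each tagged with the snapshot time."""
        rows = fetch_current()
        stamp = taken_at.isoformat()
        for row in rows:
            self._rows.append({**row, "snapshot_at": stamp})
        return len(rows)

    def history(self, key: str, value: str) -> List[Row]:
        """Returns every stored version of the matching record, oldest first."""
        return [row for row in self._rows if row.get(key) == value]


# Hypothetical API that only ever exposes the current state.
state = {"status": "trial"}


def fetch_current_accounts() -> List[Row]:
    return [{"account_id": "42", "status": state["status"]}]


store = SnapshotStore()
store.capture(fetch_current_accounts, datetime(2021, 3, 1, tzinfo=timezone.utc))
state["status"] = "paid"  # the source changes; the API now hides the old value
store.capture(fetch_current_accounts, datetime(2021, 3, 15, tzinfo=timezone.utc))

for row in store.history("account_id", "42"):
    print(row["snapshot_at"], row["status"])
```

The stored snapshots can then be exposed as files or through another API, which is the kind of interface the caveats recommend pointing the virtualization tool at.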

Thanks to Rathinakumar P

I enjoy working at ThoughtWorks, and so would you. If you are interested in working on challenging problems, you may consider applying to ThoughtWorks.
