Data has always been an important asset in financial services. Financial services institutions (FSIs) are constantly looking to extract more value from their data, but they struggle to capture, store, and analyze all of it as data volumes grow.

Data from digital customer touch points, transactions, and regulatory reporting is increasing at an unprecedented rate, exploding from terabytes to petabytes and sometimes exabytes.

In this article, Fred Groen from Amazon Web Services covers some big data trends in the financial services industry and introduces the data lakehouse, an architecture that can connect all of your data into a coherent whole.


Traditional analytics approaches can’t handle huge data volumes because they don’t scale well enough and are too expensive. A growing variety of data sources is another challenge. Data is increasingly diverse: to personalize user engagement, for example, banks now capture unstructured clickstream data from mobile applications, in addition to more traditional sources like transaction records, to drive offers.

Finance professionals use data in their day-to-day roles across a huge variety of personas, from CEOs monitoring the performance of the business to customer support personnel investigating a failed transaction. In short, many people in many different roles need access to data to make decisions.

With the growth of personalization, fraud detection, and other modern financial services applications that rely on data as their foundation, the ability to move data seamlessly is critical.

Against this backdrop of increasing data volumes and the desire to get the right data to the right people, an architectural pattern known as the data lakehouse has emerged.

What is a data lakehouse?

A data lakehouse is an environment designed to combine the data structure and data management features of a data warehouse with the low-cost storage of a data lake.

A data lakehouse will help you quickly get insights from all of your data to all of your users and will allow you to build for the future so you can easily add new analytic approaches and technologies as they become available. Its foundation is a modern data architecture that enables teams to rapidly build a scalable data lake around a cloud storage solution.
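As a simplified illustration of that foundation, the Python sketch below (using pandas and pyarrow) lands a handful of transaction records in the lake as partitioned Parquet files, an open columnar format that downstream analytics services can query in place. The records, paths, and column names are hypothetical.

```python
import pandas as pd  # requires pandas and pyarrow

# A few illustrative transaction records; in practice these would arrive
# from transactional systems, clickstreams, or regulatory feeds.
transactions = pd.DataFrame(
    {
        "transaction_id": ["t-1001", "t-1002", "t-1003"],
        "account_id": ["a-17", "a-42", "a-17"],
        "amount": [250.00, 74.99, 1200.50],
        "status": ["settled", "failed", "settled"],
        "trade_date": ["2023-06-01", "2023-06-01", "2023-06-02"],
    }
)

# Land the data in the lake as partitioned Parquet. Swapping the local path
# for an object-store URI (for example "s3://my-data-lake/transactions",
# which additionally requires the s3fs package) writes to cloud storage instead.
transactions.to_parquet(
    "data-lake/transactions",       # hypothetical lake location
    partition_cols=["trade_date"],  # partitioning keeps scans cheap as volumes grow
    index=False,
)
```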

Builders then have access to a broad and deep collection of purpose-built data services that provide the performance required for use cases like interactive dashboards for executives or personalization for data scientists.

Enabling teams to easily move data between the data lake and purpose-built data services connects transactional systems and analytics. For example, if a customer service engineer wants to investigate a failed transaction, they should be able to combine data from the data warehouse and the transactional system.
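In practice that combination might run as a federated query in one of the purpose-built services; the Python sketch below, built on entirely hypothetical data, simply shows the shape of such an investigation by joining a failed-payment record from the transactional system with customer context from the warehouse.

```python
import pandas as pd

# Hypothetical extract from the transactional system: the failed payment itself.
payments = pd.DataFrame(
    {
        "transaction_id": ["t-1002"],
        "account_id": ["a-42"],
        "amount": [74.99],
        "status": ["failed"],
        "error_code": ["INSUFFICIENT_FUNDS"],
    }
)

# Hypothetical extract from the data warehouse: customer context aggregated
# over time (balance history, prior failures, segment).
customer_history = pd.DataFrame(
    {
        "account_id": ["a-42"],
        "avg_monthly_balance": [63.10],
        "failed_payments_90d": [3],
        "customer_segment": ["new-to-bank"],
    }
)

# Joining the two views gives the support engineer the full picture:
# the failed event alongside the historical context that explains it.
investigation = payments.merge(customer_history, on="account_id", how="left")
print(investigation)
```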


With more people accessing data, it is important to set up unified governance and compliance processes to secure, monitor, and manage that access.
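One way to do this is to define permissions once in a central service and let every analytics engine that reads from the lake enforce them. The sketch below assumes AWS Lake Formation is used for that central role; the role ARN, database, table, and column names are placeholders, and valid AWS credentials would be needed to run it.

```python
import boto3

# Centralized, column-level access control: the grant is defined once here
# and enforced by the services that query the lake on the analyst's behalf.
lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        # The analyst role being granted access (placeholder ARN).
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/fraud-analyst"
    },
    Resource={
        # Column-level scope: the role can query the transactions table,
        # but only the non-sensitive columns listed here.
        "TableWithColumns": {
            "DatabaseName": "lakehouse",
            "Name": "transactions",
            "ColumnNames": ["transaction_id", "amount", "status", "trade_date"],
        }
    },
    Permissions=["SELECT"],
)
```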

This architecture also allows teams to choose cloud services that keep costs low without compromising performance or scale, future-proofing the platform as data volumes grow.

We call this modern, cloud-based analytics architecture the lakehouse architecture. It’s not just about integrating your data lake and your data warehouse; it’s about connecting your lake, your warehouse, and all your other purpose-built services into a coherent whole.

New data-intensive applications like data analytics, artificial intelligence, and the Internet of Things are driving huge growth in enterprise data. With this growth comes a new set of IT architectural considerations that revolve around the concept of data gravity.

Data gravity describes the effect whereby, as data accumulates, additional services and applications are increasingly likely to be attracted to it.

Contributed by:
Fred Groen, Amazon Web Services