Data Architect for Financial Risk Systems: Part 1 - Overview - Reducing data volume

Overview
An existing market risk system is processing 800-900 million rows daily. This number is made up of leaf node level data (pnl vector and sensitivities). Data is then aggregated up the hierarchy, as well as along dimensions, aggregated data is then persisted in the database.

The same leaf node data can belong to multiple hierarchies.
As users add more hierarchies, or dimensions, even with the same amount of raw data, increase in data volume is becoming insupportable.

Quite often, users do not get their VaR numbers until late afternoon, and when they get it, reporting performance is not very good.

Analysis

Daily leaf node data for both pnl and sensitivities amount to ~10 million rows
ETL is spending ages loading these rows and aggregating the data up the hierarchy and along dimensions
As a result daily partition size of the FACT tables are enormous
Development effort has been focused on reducing ETL time. System is ETL driven, rather than report driven.
Data model was originally designed as a data warehouse, however quickly turned into a hybrid system (denormalised FACT tables with a lot of normalised supporting tables)

Current thoughts

10 million rows seem to be a "tiny" amount
Rather than persisting the aggregated data, how about we perform aggregation on-the-fly.
Can the database cope with 20 concurrent requests to produce these aggregated data?
How about 100 requests?
What will the speed be like for a global report, where most of the leaf node data will have to be read?
With analytical functions required to produce VaR ranking, can the database cope with aggregation plus ranking analytics?

Watch this space...

Data Architect for Financial Risk Systems

Monday, 28 July 2008

Part 1 - Overview - Reducing data volume - aggregation on the fly

No comments:

Contributors

Blog Archive

Growing an IT company

IT works, the way you do