Data Architect for Financial Risk Systems: August 2008

Current model
I have spent a few days now looking at the existing data model. The model mainly consists of ETL related objects (normalised - 3NF), and reporting tables (wide denormalised). Access to reporting tables are done by joining various ETL tables to the reporting tables, resulting in very complex expensive joins.

Where we want to be
We want to reintroduce the basic fundamentals of data warehousing here, without introducing too many changes, as this is a live system, where a lot of investment has been made. Try to protect existing investment by reusing as many components as possible.

The key things that I would like to re-emphasise are:

ETL tables should be off-limits to the business. No users or reports should query these ETL tables.
we need to move from existing model to a star schema.

OK, so let's now braindump what's important here.

FACT tables

candidate for FACT tables (pnl vector, sensitivities, limits)
asset class covered will be IR, FX, CS, AB (asset backed) and EQ. Will all the data received be of the same grain? i.e. will they all be additive?
Make sure FACT table design is REPORT driven, not ETL driven. While there is pressure to be able to load data as fast as possible, remember that the success of this risk system will not be judged by how fast the data can be pumped in, but how quickly users can pull the data out in a timely fashion, as well as whether data can be presented in a useful format for users
Currently I have identified about 30 attributes used to store these data in its current format
Remembering that we will try to perform aggregation on the fly, it would be preferable to ensure that FACT tables stay as small as possible. One way to achieve this is by moving the common attributes out to common dimension tables, and introduce use of surrogate keys.

DIMENSION tables

Dr Kimball suggests 15-25 dimension tables max. I think for this purpose I want to aim for no more than 10.

I have a feeling that we will not be able to create a pure star schema, as there are some complex data structure which will force me to snowflake the design. OK, wll try to minimise this

Common dimensions can be identified as follows

Date
Hierarchy
Asset Class/Accounting Type and grouping
Currency and currency group
Bucket detail information
Product details
IR/FX detailed attributes
CS/AB detailed attributes
EQ detailed attributes
Limit details

Data Architect for Financial Risk Systems

Monday, 4 August 2008

DW 2.0: The Architecture for the Next Generation of Data Warehousing

Saturday, 2 August 2008

Part 2 - data model analysis

Contributors

Blog Archive

Growing an IT company

IT works, the way you do