Lumpers and splitters in data modeling

Starting a new thread in response to Ax's question in another thread...

This is a good summary of what went on in our data modeling sessions:

https://www.dbta.com/Columns/D...d-Conquer-83689.aspx

What I participated in was, in theory, high-level logical data modeling. Others were supposed to take the logical design to a physical implementation. But the logical modeling group consisted of a bunch of former programmers and DBAs, so our discussions often drifted into "how would you implement this?"

We were working on a Product model for financial instruments. We were using a model from Seer Technology as a starting point. As I recall, Seer's products were spun off from work originally done at Credit Suisse. Jon might be familiar with details.

Seer's model was moderately lumped and quite elegant. But it had been developed from the perspective of a firm, and we were an exchange, so we were tweaking it.

The lumpers among us liked the elegance and future flexibility of a lumped model. The splitters sided with the people who had to program and implement actual systems, namely a bunch of COBOL programmers who didn't necessarily have the broad and deep understanding of the business needed to see why the lumped model made a lot of sense.

In one meeting, one of our lumpers went to the whiteboard and put up the ultimate lumper's data model. It had two entities, Object and Object Type, in a many-to-many relationship.

I thought it was perfect, but that's not the model we ultimately ended up with.
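For anyone who wants to see what that whiteboard model might look like as tables, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table and column names, the associative table used to resolve the many-to-many relationship, the helper `types_of`, and the sample instruments. It is not the model we actually discussed, just one way the two-entity idea could be written down.

```python
import sqlite3


def build_lumped_model():
    """Create the two-entity model in an in-memory database (names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE object (
            object_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE object_type (
            type_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        -- Associative table that resolves the many-to-many relationship.
        CREATE TABLE object_object_type (
            object_id INTEGER NOT NULL REFERENCES object(object_id),
            type_id   INTEGER NOT NULL REFERENCES object_type(type_id),
            PRIMARY KEY (object_id, type_id)
        );
    """)
    return conn


def types_of(conn, object_name):
    """Return the sorted type names attached to one object."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM object o
        JOIN object_object_type ot ON ot.object_id = o.object_id
        JOIN object_type t ON t.type_id = ot.type_id
        WHERE o.name = ?
        ORDER BY t.name
        """,
        (object_name,),
    ).fetchall()
    return [row[0] for row in rows]


# Made-up sample data: two instruments that share one type.
conn = build_lumped_model()
conn.executemany("INSERT INTO object VALUES (?, ?)",
                 [(1, "ACME 5% 2030"), (2, "ACME common")])
conn.executemany("INSERT INTO object_type VALUES (?, ?)",
                 [(1, "Bond"), (2, "Equity"), (3, "Listed instrument")])
conn.executemany("INSERT INTO object_object_type VALUES (?, ?)",
                 [(1, 1), (1, 3), (2, 2), (2, 3)])

print(types_of(conn, "ACME 5% 2030"))
```

The appeal to a lumper is that a new kind of instrument is just a new row in object_type. The cost, which is what the splitters worried about, is that nothing in the schema itself tells a programmer what a bond or an equity actually is.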


Nina


The urge to lump or split seems to exist in many situations. One that pops immediately to mind for me is classification statistics, such as cluster analysis. The goal of a cluster analysis is to look at multiple measures (data fields) for a group of observations and see which observations appear to be similar or dissimilar, based on the measures you've analyzed. There are two broad-stroke ways to approach this. The lumper approach puts all the observations into a single cluster, then breaks the single cluster into two, then three, etc. (the textbooks call this top-down, or divisive, clustering). The choice of how many clusters to keep is based on the variance explained at each stage. At some point, the improvement in explained variance is so small that it's not worth adding another cluster to your model, as it really doesn't provide any additional insight. (FYI, if the number of clusters equals the number of observations, then there is 100% explained variance.)
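A minimal sketch of that top-down idea in plain Python, for a single measure only. The data values, the function names (`sse`, `best_split`, `divisive_explained_variance`), and the rule of always splitting the cluster with the largest within-cluster sum of squares are assumptions made for illustration, not any particular statistics package's algorithm.

```python
def sse(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(cluster):
    """Cut a sorted 1-D cluster in two where the combined within-cluster SSE is smallest."""
    cut = min(range(1, len(cluster)),
              key=lambda c: sse(cluster[:c]) + sse(cluster[c:]))
    return cluster[:cut], cluster[cut:]


def divisive_explained_variance(data):
    """Return [(number of clusters, explained-variance fraction)] from 1 cluster up to n.

    Assumes the data are not all identical (total variance must be non-zero).
    """
    points = sorted(data)
    total = sse(points)
    clusters = [points]
    history = []
    while True:
        within = sum(sse(c) for c in clusters)
        history.append((len(clusters), 1 - within / total))
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Assumed rule: split whichever cluster is currently the most spread out.
        worst = max(splittable, key=sse)
        clusters.remove(worst)
        clusters.extend(best_split(worst))
    return history


# Made-up observations with three visually obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4]
history = divisive_explained_variance(data)
for k, explained in history:
    print(k, round(explained, 3))
```

With data like these, the explained variance climbs steeply for the first couple of splits and then barely moves, which is the "not worth adding another cluster" point. The last row, with one cluster per observation, reaches 100%.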

The splitters would start with a large number of clusters (at the extreme, one per observation) and work pretty much backward from the lumper approach: reduce the number of clusters by one, check the change in explained variance, rinse, repeat. The textbooks call this bottom-up, or agglomerative, clustering.
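A minimal sketch of the bottom-up direction in plain Python, again for a single measure. The data values and function names are assumptions for illustration, and the merge rule used here (join the pair of clusters whose merger adds the least within-cluster sum of squares, which is Ward's criterion) is just one of several rules a real package might offer.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def merge_cost(a, b):
    """Increase in within-cluster SSE caused by merging clusters a and b."""
    return sse(a + b) - sse(a) - sse(b)


def agglomerative_explained_variance(data):
    """Return [(number of clusters, explained-variance fraction)] from n clusters down to 1.

    Assumes the data are not all identical (total variance must be non-zero).
    """
    total = sse(data)
    clusters = [[v] for v in data]  # start fully split: one cluster per observation
    history = []
    while True:
        within = sum(sse(c) for c in clusters)
        history.append((len(clusters), 1 - within / total))
        if len(clusters) == 1:
            break
        # Ward-style choice (an assumption): merge the cheapest pair.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return history


# Made-up observations with three visually obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4]
history = agglomerative_explained_variance(data)
for k, explained in history:
    print(k, round(explained, 3))
```

Here the explained variance starts at 100% and drops very little while near neighbors are being merged. The first merge that costs a lot is the signal that you have lumped one step too far.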

This is a very broad-stroke explanation. There are also countless hybrids.

