well-temperedforum.groupee.net    The Well-Tempered Forum  Hop To Forum Categories  Off Key    Lumpers and splitters in data modeling

Lumpers and splitters in data modeling
wtg (Has Achieved Nirvana) posted:
Starting a new thread in response to Ax's question in another thread...

This is a good summary of what went on in our data modeling sessions:

https://www.dbta.com/Columns/D...d-Conquer-83689.aspx

In theory, what I participated in was high-level logical data modeling. Others were supposed to take the logical design to a physical implementation. But the logical modeling group consisted of a bunch of former programmers and DBAs, so our discussions often drifted into "how would you implement this?".

We were working on a Product model for financial instruments, using a model from Seer Technology as a starting point. As I recall, Seer's products were spun off from work originally done at Credit Suisse. Jon might be familiar with the details.

Seer's model was moderately lumped and quite elegant. However, it had been developed from the perspective of a firm, and we were an exchange, so we were adapting it.

The lumpers among us liked the elegance and future flexibility of a lumped model, but the splitters sided with the people who had to program and implement actual systems: a bunch of COBOL programmers who didn't necessarily have the broad and deep understanding of the business needed to see why the lumped model made a lot of sense.
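To make the trade-off concrete, here is a minimal sketch in Python. It is not Seer's model or ours; the class names, instrument types, and attribute names are all hypothetical. The split model gets one class (think: one entity/table) per kind of instrument, so the structure itself documents the business. The lumped model has one generic Instrument whose type is just data, so a new kind of instrument needs no schema change, but the rules the split model encodes in its structure have to live somewhere else and be checked at run time.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Set


# --- Split model: one entity (class) per kind of instrument (hypothetical) ---
@dataclass
class Bond:
    symbol: str
    coupon_rate: float
    maturity: date


@dataclass
class EquityOption:
    symbol: str
    underlying: str
    strike: float
    expiry: date


# --- Lumped model: one generic entity; the kind of instrument is data ---
@dataclass
class Instrument:
    symbol: str
    instrument_type: str
    attributes: Dict[str, Any] = field(default_factory=dict)


# In the lumped model, the rules the split model gets "for free" from its
# class definitions are kept as data (hypothetical names) and checked at run time.
REQUIRED_ATTRIBUTES: Dict[str, Set[str]] = {
    "bond": {"coupon_rate", "maturity"},
    "equity_option": {"underlying", "strike", "expiry"},
}


def missing_attributes(inst: Instrument) -> Set[str]:
    """Return the required attribute names that `inst` does not supply."""
    required = REQUIRED_ATTRIBUTES.get(inst.instrument_type, set())
    return required - set(inst.attributes)
```

Adding, say, a futures contract to the lumped version means adding an entry to REQUIRED_ATTRIBUTES; in the split version it means a new class (a new table, new programs). That is roughly the flexibility-versus-explicitness argument we kept having.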

In one meeting, one of our lumpers went to the whiteboard and put up the ultimate lumpers' data model. It had two entities, Object and Object Type, in a many-to-many relationship.
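For anyone curious what that whiteboard model looks like once someone has to implement it, here is a rough sketch using Python's built-in sqlite3. The table and column names are my own hypothetical choices. Physically, the many-to-many relationship becomes a junction table, and everything else (what an equity is, what an option is) is rows rather than schema, which is exactly why it is maximally flexible and why the people writing the programs hated it.

```python
import sqlite3

# Hypothetical physical schema for the two-entity "Object / Object Type" model.
SCHEMA = """
CREATE TABLE object (
    object_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE object_type (
    object_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
-- Junction table that resolves the many-to-many relationship
CREATE TABLE object_object_type (
    object_id      INTEGER NOT NULL REFERENCES object(object_id),
    object_type_id INTEGER NOT NULL REFERENCES object_type(object_type_id),
    PRIMARY KEY (object_id, object_type_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def classify(conn: sqlite3.Connection, object_name: str, *type_names: str) -> int:
    """Insert an object and link it to one or more (possibly new) types."""
    object_id = conn.execute("INSERT INTO object (name) VALUES (?)", (object_name,)).lastrowid
    for type_name in type_names:
        conn.execute("INSERT OR IGNORE INTO object_type (name) VALUES (?)", (type_name,))
        (type_id,) = conn.execute(
            "SELECT object_type_id FROM object_type WHERE name = ?", (type_name,)
        ).fetchone()
        conn.execute("INSERT INTO object_object_type VALUES (?, ?)", (object_id, type_id))
    return object_id


def types_of(conn: sqlite3.Connection, object_name: str) -> list:
    rows = conn.execute(
        """SELECT t.name FROM object o
           JOIN object_object_type ot ON ot.object_id = o.object_id
           JOIN object_type t ON t.object_type_id = ot.object_type_id
           WHERE o.name = ? ORDER BY t.name""",
        (object_name,),
    )
    return [name for (name,) in rows]


def objects_of_type(conn: sqlite3.Connection, type_name: str) -> list:
    rows = conn.execute(
        """SELECT o.name FROM object_type t
           JOIN object_object_type ot ON ot.object_type_id = t.object_type_id
           JOIN object o ON o.object_id = ot.object_id
           WHERE t.name = ? ORDER BY o.name""",
        (type_name,),
    )
    return [name for (name,) in rows]
```

Usage would be something like `classify(conn, "XYZ common stock", "Equity", "Listed Security")`. Note that nothing in the schema stops you from classifying a bond as an option; all of that business knowledge has to be enforced somewhere else.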

I thought it was perfect, but that's not the model we ultimately ended up with. :D


--------------------------------
When the world wearies and society ceases to satisfy, there is always the garden - Minnie Aumônier

 
Posts: 38223 | Location: Somewhere in the middle | Registered: 19 January 2010
Nina (Pinta & the Santa Maria; Has Achieved Nirvana) posted:
The urge to lump or split seems to exist in many situations. One that immediately comes to mind for me is classification statistics, such as cluster analysis. The goal of a cluster analysis is to look at multiple measures (data fields) for a group of observations and determine which observations are similar or dissimilar, based on the measures you've analyzed. There are two broad-stroke ways to approach this.

The lumper approach puts all the observations into a single cluster, then breaks that cluster into two, then three, and so on. The number of clusters is chosen based on the variance explained at each stage. At some point, the improvement in explained variance is so small that it's not worth adding another cluster to your model, because it doesn't provide any additional insight. (FYI, if the number of clusters equals the number of observations, the explained variance is 100%.)
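In the textbooks the lumper approach is usually called divisive (top-down) hierarchical clustering. Here is a minimal sketch in plain Python, assuming Euclidean distance and taking "explained variance" to mean 1 - (within-cluster sum of squares / total sum of squares). Splitting the worst cluster in two with a few 2-means steps is just one common way to do it (sometimes called bisecting k-means), not necessarily what any particular stats package does; all function names are mine.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(fmean(coord) for coord in zip(*points))


def sse(points: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def explained_variance(clusters: List[List[Point]]) -> float:
    """1 - (within-cluster SS / total SS); 1.0 when every point is identical."""
    all_points = [p for c in clusters for p in c]
    total = sse(all_points)
    if total == 0:
        return 1.0
    return 1 - sum(sse(c) for c in clusters) / total


def dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_in_two(points: List[Point], iters: int = 10):
    """Split one cluster in two with a few 2-means (Lloyd) iterations."""
    c = centroid(points)
    a = max(points, key=lambda p: dist2(p, c))  # seed 1: farthest from the centroid
    b = max(points, key=lambda p: dist2(p, a))  # seed 2: farthest from seed 1
    left, right = list(points), []
    for _ in range(iters):
        new_left = [p for p in points if dist2(p, a) <= dist2(p, b)]
        new_right = [p for p in points if dist2(p, a) > dist2(p, b)]
        if not new_right:  # degenerate case: the two centroids coincide
            break
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def divisive(points: Sequence[Point], max_k: int):
    """Start with one lump; repeatedly split the cluster with the largest SS.

    Returns the final clusters and a history of (k, explained variance).
    """
    clusters = [list(points)]
    history = [(1, explained_variance(clusters))]
    while len(clusters) < max_k:
        splittable = [c for c in clusters if sse(c) > 0]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        left, right = split_in_two(worst)
        if not right:
            break
        clusters.remove(worst)
        clusters += [left, right]
        history.append((len(clusters), explained_variance(clusters)))
    return clusters, history
```

You would look at `history` and stop where the gain from one more cluster gets small. With four points forming two obvious pairs, two clusters already explain about 99.5% of the variance, and four clusters (one per observation) explain 100%.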

The splitters would start with a large number of clusters and work pretty much backward from the lumper approach: reduce the number of clusters by one by merging two of them, check the change in explained variance, rinse, repeat.
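The splitter approach corresponds to agglomerative (bottom-up) hierarchical clustering. A minimal sketch, under the same assumptions as before (Euclidean distance, explained variance = 1 - within SS / total SS), starting from one cluster per observation, which is one possible choice of "a large number of clusters." At each step it merges the pair whose merge increases the within-cluster sum of squares the least; that is Ward's criterion, one of several linkage rules in use. Names are mine.

```python
from itertools import combinations
from statistics import fmean
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(fmean(coord) for coord in zip(*points))


def sse(points: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def explained_variance(clusters: List[List[Point]]) -> float:
    """1 - (within-cluster SS / total SS); 1.0 when every point is identical."""
    all_points = [p for c in clusters for p in c]
    total = sse(all_points)
    if total == 0:
        return 1.0
    return 1 - sum(sse(c) for c in clusters) / total


def merge_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in within-cluster SS caused by merging a and b (Ward's criterion)."""
    return sse(a + b) - sse(a) - sse(b)


def agglomerative(points: Sequence[Point], min_k: int = 1):
    """Start with one cluster per point; repeatedly merge the cheapest pair.

    Returns the final clusters and a history of (k, explained variance).
    """
    clusters = [[p] for p in points]
    history = [(len(clusters), explained_variance(clusters))]
    while len(clusters) > max(min_k, 1):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((len(clusters), explained_variance(clusters)))
    return clusters, history
```

Reading `history` from the top, explained variance starts at 100% and drops a little with each merge; you stop merging where the next merge would cost a big chunk.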

This is a very broad-stroke explanation. There are also countless hybrids.
 
Posts: 35428 | Location: West: North and South! | Registered: 20 April 2005