Tuesday, July 21, 2009

Data Modeling

We're in our 9th unit now, and to get a better grasp of what it takes to put together a solid repository we've been going over data models and entity-relationship models. On the surface it makes a lot of sense, but in practice it is quite challenging. When you have to stop and think about how attributes relate to a particular object or entity, and how best to break that down for database management, you can quickly get frustrated.

The areas I'm really going to need to work at understanding are normalization and Entity Relationship Diagrams. I just need to continue reading and seeing examples of how these are put together. I also need to find some more tutorials on how to decipher crow's foot notation. Unfortunately the Wikipedia entry is rather vague, with a diagram that doesn't go into much detail. I'm still not altogether sure what the notations mean and how to use them. I wonder if having a solid background in calculus, physics, or logic would help.

I think another problem with most of the tutorials is that they each tend to stick with one example. Seeing how different types of repository data can be modeled would be helpful. For example, let's have some basic examples of manuscript, photograph, and other document collections modeled, and see what others have already done to model, relate, and normalize their data streams. I'm sure with a little searching I'll be able to uncover some other examples, but it seems that either institutions keep this information close to the chest or they haven't actually done this yet.
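As a rough illustration of the kind of example I have in mind, here is a minimal sketch of a mixed collection modeled as entities and relationships, using Python's built-in sqlite3 module. Everything in it is an assumption made up for illustration: the table names (collection, item, creator, item_creator), the columns, and the sample records do not come from any real repository. The one design point it tries to show is that a many-to-many relationship (an item can have several creators, and a creator several items) gets its own linking table.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    title         TEXT NOT NULL
);
CREATE TABLE creator (
    creator_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
-- Each item belongs to exactly one collection (one-to-many).
CREATE TABLE item (
    item_id       INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collection(collection_id),
    item_type     TEXT NOT NULL
                  CHECK (item_type IN ('manuscript', 'photograph', 'document')),
    title         TEXT NOT NULL
);
-- Linking table for the many-to-many relationship between items and creators.
CREATE TABLE item_creator (
    item_id    INTEGER NOT NULL REFERENCES item(item_id),
    creator_id INTEGER NOT NULL REFERENCES creator(creator_id),
    PRIMARY KEY (item_id, creator_id)
);
"""


def build_example():
    """Create an in-memory database and load a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO collection VALUES (1, 'Example Family Papers')")
    conn.executemany(
        "INSERT INTO creator VALUES (?, ?)",
        [(1, "A. Example"), (2, "B. Sample")],
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        [
            (1, 1, "manuscript", "Letter to a friend"),
            (2, 1, "photograph", "Family portrait"),
        ],
    )
    # The portrait has two creators; the letter has one.
    conn.executemany(
        "INSERT INTO item_creator VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 2)],
    )
    conn.commit()
    return conn


def items_by_creator(conn, name):
    """Return the titles of all items linked to the named creator."""
    rows = conn.execute(
        """
        SELECT item.title
        FROM item
        JOIN item_creator ON item_creator.item_id = item.item_id
        JOIN creator ON creator.creator_id = item_creator.creator_id
        WHERE creator.name = ?
        ORDER BY item.item_id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling items_by_creator(build_example(), "B. Sample") walks the linking table to find the one photograph credited to that made-up creator. A real repository would need many more attributes (dates, formats, rights), but the entities and the relationships between them are the part the diagrams are trying to capture.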

The good news is that as I work through these concepts (with a little more time than these past few days) I'll have a much better understanding of what the programmers are having to tackle behind the scenes. If I've learned anything from this, it's that a working digital repository is far more complicated than just placing objects on a server and filling in metadata fields. You have to understand how items relate to each other and clear up ambiguity, redundancy, and non-essential information to improve the speed and storage capacity of your database.
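To show what clearing up redundancy can look like, here is a small sketch in plain Python, under stated assumptions: the flat records, the field names (title, photographer, studio_city), and the normalize function are all invented for illustration. In the flat version a photographer's city is repeated on every photograph; after splitting the data, that fact is stored once and each photograph just points to it by id.

```python
# Made-up flat records: the photographer's city is repeated on every row.
flat_records = [
    {"title": "Harbor at dawn", "photographer": "J. Doe", "studio_city": "Portland"},
    {"title": "Market street", "photographer": "J. Doe", "studio_city": "Portland"},
    {"title": "Old mill", "photographer": "R. Roe", "studio_city": "Salem"},
]


def normalize(records):
    """Split flat records into a photographer table and a photo table.

    Each photographer is stored once and given an id; each photo keeps
    only its own title plus a reference to that id.
    """
    photographers = {}  # name -> {"id": ..., "studio_city": ...}
    photos = []
    for rec in records:
        name = rec["photographer"]
        if name not in photographers:
            photographers[name] = {
                "id": len(photographers) + 1,
                "studio_city": rec["studio_city"],
            }
        photos.append(
            {"title": rec["title"], "photographer_id": photographers[name]["id"]}
        )
    return photographers, photos


photographers, photos = normalize(flat_records)
```

If J. Doe's studio moved, the flat version would need every one of that photographer's rows corrected, while the split version needs a single change. That, as far as I understand it, is the practical payoff of normalization.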

1 comment:

CynW said...

I think perhaps the good news may be that although careful thought will go into the process, one need not really 'reinvent the wheel' for every project; and that while understanding the foundation of the command line is useful, as in many other areas, there is helpful software available.