Surrogate or natural key?


Author
Message
ChanKK
ChanKK
Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)Advanced StrataFrame User (622 reputation)
Group: Forum Members
Posts: 190, Visits: 1.3K
Hi,

Shall anyone share his experience on using surrogate key in database design? I know that surrogate key is not extensible, but I found that is is very troublesome to do debugging. Beside, we always required to join multiple tables in order get some "meaningful" result. It could cause performance issue as well. Also, maintenance by technical support also not ease.



However, I see most of the vendor using surrogate key.



Anyone can share?

Thank you
Reply
Greg McGuffey
Greg McGuffey
Strategic Support Team Member (4.8K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
Chan,



One of the primary reasons for surrogate keys is performance. 15 FKs in a table isn't uncommon. A few million records isn't small, but it isn't huge either. 300 tables also isn't uncommon. The design of tables within a relational db is designed to allow relations. BigGrin I.e. FKs all over the place. One of the big reasons to use surrogate keys is to increase performance. It is faster to join one column between tables than to join multiple columns. It is faster to join an int column than any other datatype. Often using an int surrogate key (assuming you don't have a reason not use them, like replication) is a performance optimization.



If you having trouble with performance, then views and stored procedures are the ticket. Yes, they are slower than an indexed query on a single table, but they are potentially the fastest way to pull data from multiple tables. You likely need to look at our indexes and maybe even get good at checking out the query plan and providing hints. I've seen a many table view on a primary table with 30-50 million records return 1000s of results in seconds....using SQL Server 97 after some optimizing (the same query took several hours before optimization). Having a large database means managing indexes, optimizing views and sprocs.



Now, you might actually need to ask another question, which is how normalized should the database be. Whether you use surrogate or natural keys, the use of keys means joins, with int surrogate keys typically being the fastest (I believe SQL Server is optimized for them). However, if you denormalize the database a bit, then you won't have to do as many joins, independent of the type of keys you use. Of course as you denormalize the database, you also have the potential to increase issues if the values need to be updated (i.e. a category is renamed). Another potential way to denormalize is to include keys of ancestors beyond the parent. I.e. if you have a record in Table A with a parent in Table B, whose parent is in Table C, you could include the key of the grand-parent from Table C in for the record in Table A to avoid the extra join (hoped that made sense).



So, I'd say that the question of surrogate vs. natural keys is one that first assumes that your will have normalized (to some degree) data and that you'll be doing joins to get data. Natural keys are promoted as the way to go to keep the design more logical (the keys all means something to humans), while surrogate keys are used to reduce maintenance and to increase performance (they keys are easy for the machine to use).



The degree of normalization can have a big impact on how complex you're data is to work with and maintain. Too normalized and you will have more views and sprocs to deal with and the data schema will be harder to understand, you'll have to do more optimizations to keep performance good. Non-normalized data is a nightmare to maintain and keep accurate. Often slightly denormalized data is the way to go for the best of both worlds. How denormalized really depends on your app.



Finally, performance issues really have more to do with optimizations. Setting up the correct indexes, maintaining the indexes, managing views and sprocs. This can be significant work on large (data set wise), complex databases.



I hope that provided some more information for you or provided some new directions to investigate!
GO

Merge Selected

Merge into selected topic...



Merge into merge target...



Merge into a specific topic ID...




Threaded View
Threaded View
ChanKK - 16 Years Ago
Michel Levy - 16 Years Ago
Edhy Rijo - 16 Years Ago
Greg McGuffey - 16 Years Ago
ChanKK - 16 Years Ago
Edhy Rijo - 16 Years Ago
                     Chan,

One of the primary reasons for surrogate keys is...
Greg McGuffey - 16 Years Ago
Peter Jones - 16 Years Ago
Peter Jones - 16 Years Ago
Trent L. Taylor - 16 Years Ago
ChanKK - 16 Years Ago
Peter Jones - 16 Years Ago
Charles R Hankey - 16 Years Ago
Charles R Hankey - 16 Years Ago
Alex Luyando - 16 Years Ago
             Good comments, Charles! :)
Trent L. Taylor - 16 Years Ago
Charles R Hankey - 16 Years Ago
ChanKK - 16 Years Ago
Michel Levy - 16 Years Ago
ChanKK - 16 Years Ago
Michel Levy - 16 Years Ago
Charles R Hankey - 16 Years Ago

Similar Topics

Reading This Topic

Login

Explore
Messages
Mentions
Search