Surrogate or natural key?


ChanKK
StrataFrame User (248 reputation)
Group: Forum Members
Posts: 190, Visits: 1.3K
Hi,

Would anyone share their experience with using surrogate keys in database design? I know that surrogate keys are not extensible, but I find them very troublesome to debug. Besides, we are always required to join multiple tables just to get a "meaningful" result, which could cause performance issues as well. Also, maintenance by technical support is not easy.



However, I see most vendors using surrogate keys.



Can anyone share?

Thank you
Michel Levy
StrataFrame User (319 reputation)
Group: StrataFrame Users
Posts: 193, Visits: 9K
Hi,

I use surrogate keys in all tables (except a few static metadata tables) in SQL Server, and in VFP since we got the autoincrement field.

Performance is dramatically increased, especially if you build a clustered index on it and add an integer column for the FK in the related tables:

  • the data is stored physically in that index, so you read the rows of the child table directly when the JOIN runs, and your query involves less disk I/O;
  • the index size decreases, both on the PK and the FK;
  • you don't need any cascade on update (declarative or by trigger), as these keys are never updated!

I don't see any trouble in debugging; on the contrary, it seems easier to me:
RI errors involve only the PK and FK, and business rules errors never include the PK or FK. The same rule applies when writing SPs or UDFs.
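For illustration, a minimal T-SQL sketch of the layout Michel describes; the table and column names here are hypothetical, not from this thread:

    -- Parent table: int identity surrogate key as a clustered primary key
    CREATE TABLE dbo.Customer
    (
        CustomerID   INT IDENTITY(1,1) NOT NULL,
        CustomerName NVARCHAR(100)     NOT NULL,
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
    );

    -- Child table: plain int FK column pointing at the surrogate key
    CREATE TABLE dbo.[Order]
    (
        OrderID    INT IDENTITY(1,1) NOT NULL,
        CustomerID INT               NOT NULL,
        OrderDate  DATETIME          NOT NULL,
        CONSTRAINT PK_Order PRIMARY KEY CLUSTERED (OrderID),
        CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerID)
            REFERENCES dbo.Customer (CustomerID)   -- key is never updated, so no ON UPDATE CASCADE needed
    );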

Edhy Rijo
StrataFrame VIP (3.7K reputation)
Group: StrataFrame Users
Posts: 2.4K, Visits: 23K
Michel Levy (05/06/2009)
I don't see any trouble in debugging; on the contrary, it seems easier to me:

RI errors involve only the PK and FK, and business rules errors never include the PK or FK. The same rule applies when writing SPs or UDFs.




Chankk,

I agree with Michel in all cases. In VFP I used to use GUIDs for my PK/FK since I did not trust its autoincrement feature, but in the end using surrogate keys made my life easier with my designs. I also use xCase for data modeling, which helps enforce the PK/FK relationships. Now, using SF/MS-SQL, with the freedom provided by the SF design you can use Custom Field Properties to overcome the need to join tables just to display a description field, although in some cases, when dealing with many records, you may need to use a view or scalar function to grab the data faster.
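For the reporting side, a rough sketch of the kind of lookup view Edhy mentions; the table and column names are hypothetical:

    -- A view that carries the description along, so reporting queries need no extra join
    CREATE VIEW dbo.vw_ProductList
    AS
    SELECT p.ProductID,
           p.ProductName,
           c.CategoryDescription            -- looked up once here from the category table
    FROM   dbo.Product  p
           JOIN dbo.Category c ON c.CategoryID = p.CategoryID;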



As for making it easier for tech support to deal with the data, I would say that is not really an issue; these days you simply create a join query to debug the data record by record if needed, and there is no need to look at the PK/FK in that case. Natural keys are still useful in some cases, but I try to avoid them unless I absolutely MUST use them.

Edhy Rijo

Greg McGuffey
Strategic Support Team Member (2.7K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
Chan,



I tend to agree with Michel and Edhy. However, I found this interesting discussion on stackoverflow. It presents both sides, with links to other articles and some good thoughts.



http://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables



Something not mentioned here is that you should put either a unique index or a unique constraint on any natural keys.
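As a minimal sketch of that arrangement (hypothetical names): the surrogate column is the PK, and a UNIQUE constraint still guards the natural key.

    CREATE TABLE dbo.Product
    (
        ProductID   INT IDENTITY(1,1) NOT NULL,    -- surrogate key used for joins
        ProductCode VARCHAR(20)       NOT NULL,    -- natural key from the business
        ProductName NVARCHAR(100)     NOT NULL,
        CONSTRAINT PK_Product      PRIMARY KEY CLUSTERED (ProductID),
        CONSTRAINT UQ_Product_Code UNIQUE (ProductCode)   -- the natural key is still enforced
    );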



In my own experience, I've never been really happy when I've used a multi-column natural key... I've always ended up going back and refactoring the db (which isn't really all that fun) and the app to use a single-column surrogate key.
Peter Jones
StrataFrame User (450 reputation)
Group: Forum Members
Posts: 386, Visits: 2.1K
Hi,



Another vote for surrogate keys from me too. I always use GUIDs - they simply make life much easier in general, and without them database replication becomes a real problem.
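A one-table sketch of that approach, with hypothetical names and NEWID() assumed as the GUID default:

    CREATE TABLE dbo.Site
    (
        SiteID   UNIQUEIDENTIFIER NOT NULL
                 CONSTRAINT DF_Site_SiteID DEFAULT NEWID(),   -- GUID surrogate key, safe to generate on any replica
        SiteName NVARCHAR(100)    NOT NULL,
        CONSTRAINT PK_Site PRIMARY KEY NONCLUSTERED (SiteID)  -- nonclustered, since random GUIDs fragment a clustered index
    );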



Cheers, Peter
ChanKK
StrataFrame User (248 reputation)
Group: Forum Members
Posts: 190, Visits: 1.3K
Michel Levy (05/06/2009)


Performance is dramatically increased, especially if you build a clustered index on it and add an integer column for the FK in the related tables:
  • the data is stored physically in that index, so you read the rows of the child table directly when the JOIN runs, and your query involves less disk I/O.




But more table joins are required; doesn't that affect performance as well? I have a table that contains about 15 FKs. In order to get them all, doesn't that kill performance?



Edhy Rijo (05/06/2009)


Now, using SF/MS-SQL, with the freedom provided by the SF design you can use Custom Field Properties to overcome the need to join tables just to display a description field, although in some cases, when dealing with many records, you may need to use a view or scalar function to grab the data faster.





Custom field properties only apply to the application. What about queries for a reporting tool? We might use a flexible reporting tool such as FoxFire! or StoneFieldQuery, which doesn't support SF BOs. Also, as mentioned above, more JOINs are required just to get a code/description field. How is the performance?



Edhy Rijo (05/06/2009)


As for making it easier for tech support to deal with the data, I would say that is not really an issue; these days you simply create a join query to debug the data record by record if needed, and there is no need to look at the PK/FK in that case.




We have more than 300 tables; it would definitely require much more effort just to maintain these views. Besides, a view is usually slower than the underlying table AFAIK, unless it is an indexed view. Also, as mentioned, some tables contain > 10 FKs, and one such table actually contains up to a few million records. The JOINs will kill performance.



What's more, using surrogate keys makes data import via the back end very difficult. We have to do extra manipulation of the end-user data (usually Excel), add a new column just for the GUID, and then use some macro or function to "link" the child FK to the master PK.
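(One common workaround, sketched here with hypothetical table and column names, is to load the spreadsheet into a staging table keyed by the natural codes and let a set-based join resolve the surrogate FKs rather than hand-linking them.)

    -- Staging table holds the raw spreadsheet rows, natural codes only
    CREATE TABLE dbo.Staging_OrderLine
    (
        OrderNumber VARCHAR(20),
        ProductCode VARCHAR(20),
        Quantity    INT
    );

    -- One set-based insert resolves the surrogate FKs by joining on the natural keys
    INSERT INTO dbo.OrderLine (OrderID, ProductID, Quantity)
    SELECT o.OrderID,
           p.ProductID,
           s.Quantity
    FROM   dbo.Staging_OrderLine s
           JOIN dbo.Product p ON p.ProductCode = s.ProductCode
           JOIN dbo.[Order] o ON o.OrderNumber = s.OrderNumber;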



Please advise on the issues that are "fighting" in my mind, as above.



Thank you
Edhy Rijo
StrataFrame VIP (3.7K reputation)
Group: StrataFrame Users
Posts: 2.4K, Visits: 23K
ChanKK (05/06/2009)
Please advise on the issues that are "fighting" in my mind, as above




Sorry ChanKK, I have nothing more to add at this time. You pretty much have your pros and cons of surrogate keys, so it is up to you and your application's needs to decide what will work best for the application and the support team.

Edhy Rijo

Greg McGuffey
Strategic Support Team Member (2.7K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
Chan,



One of the primary reasons for surrogate keys is performance. 15 FKs in a table isn't uncommon. A few million records isn't small, but it isn't huge either. 300 tables also isn't uncommon. Tables within a relational db are designed to allow relations BigGrin i.e. FKs all over the place. It is faster to join on one column between tables than on multiple columns, and it is faster to join on an int column than on any other datatype. So using an int surrogate key (assuming you don't have a reason not to use one, like replication) is often a performance optimization.



If you're having trouble with performance, then views and stored procedures are the ticket. Yes, they are slower than an indexed query on a single table, but they are potentially the fastest way to pull data from multiple tables. You likely need to look at your indexes and maybe even get good at checking out the query plan and providing hints. I've seen a many-table view on a primary table with 30-50 million records return 1000s of results in seconds... using SQL Server 97, after some optimizing (the same query took several hours before optimization). Having a large database means managing indexes and optimizing views and sprocs.
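A minimal sketch of the sort of tuning Greg describes, again with hypothetical table and column names:

    -- A nonclustered index on the int FK column lets the join seek instead of scan
    CREATE NONCLUSTERED INDEX IX_OrderLine_ProductID
        ON dbo.OrderLine (ProductID);

    -- Then check what the optimizer actually does for a typical report query
    SET STATISTICS IO ON;
    SELECT p.ProductCode, SUM(ol.Quantity) AS TotalQty
    FROM   dbo.OrderLine ol
           JOIN dbo.Product p ON p.ProductID = ol.ProductID
    GROUP  BY p.ProductCode;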



Now, you might actually need to ask another question, which is how normalized the database should be. Whether you use surrogate or natural keys, the use of keys means joins, with int surrogate keys typically being the fastest (I believe SQL Server is optimized for them). However, if you denormalize the database a bit, then you won't have to do as many joins, independent of the type of keys you use. Of course, as you denormalize the database, you also increase the potential for issues if the values need to be updated (e.g. a category is renamed). Another way to denormalize is to include keys of ancestors beyond the parent, i.e. if you have a record in Table A with a parent in Table B, whose parent is in Table C, you could include the key of the grandparent from Table C in the record in Table A to avoid the extra join (hope that made sense).
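A rough sketch of that grandparent-key idea, with hypothetical tables standing in for A, B and C (an OrderHeader whose parent is a Customer, whose parent in turn is a Region):

    -- The child (Table A) carries the key of its grandparent (Table C)
    -- in addition to the key of its parent (Table B)
    CREATE TABLE dbo.OrderHeader
    (
        OrderID    INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerID INT NOT NULL REFERENCES dbo.Customer (CustomerID),  -- parent (Table B)
        RegionID   INT NOT NULL REFERENCES dbo.Region   (RegionID),    -- denormalized grandparent key (Table C)
        OrderDate  DATETIME NOT NULL
    );

    -- Reports grouped by region can now skip the hop through Customer
    SELECT r.RegionName, COUNT(*) AS OrderCount
    FROM   dbo.OrderHeader o
           JOIN dbo.Region r ON r.RegionID = o.RegionID
    GROUP  BY r.RegionName;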



So, I'd say the question of surrogate vs. natural keys is one that first assumes you will have normalized (to some degree) data and that you'll be doing joins to get data. Natural keys are promoted as the way to keep the design more logical (the keys all mean something to humans), while surrogate keys are used to reduce maintenance and to increase performance (the keys are easy for the machine to use).



The degree of normalization can have a big impact on how complex your data is to work with and maintain. Too normalized, and you will have more views and sprocs to deal with, the data schema will be harder to understand, and you'll have to do more optimization to keep performance good. Non-normalized data is a nightmare to maintain and keep accurate. Often slightly denormalized data is the way to go for the best of both worlds. How denormalized really depends on your app.



Finally, performance issues really have more to do with optimization: setting up the correct indexes, maintaining those indexes, and managing views and sprocs. This can be significant work on large (data-set-wise), complex databases.



I hope that provided some more information for you or provided some new directions to investigate!
Peter Jones
StrataFrame User (450 reputation)
Group: Forum Members
Posts: 386, Visits: 2.1K
Hi Chan,

On balance, surrogate keys are likely to give better performance than complex natural PKs, and they are always easier to maintain and use.

Our main transaction table at one site has about 9 million rows and 17 FKs. I've never wanted to return transaction details and data from all 17 FKs in one query, but I will often extract data from 8 to 10 FK relationships for reporting purposes.

I've never seen any performance problems (SQL Server 2000 / W2K3 Standard / 4GB memory / dual Xeon processors; the machine is about 3 years old). I would expect a query that returns 10,000 records to run in a few seconds. That being said, we always return data within a date range, and the transaction table has a clustered index on date, so getting at the transaction information itself is always very quick. This is where your performance effort needs to go - getting at the transaction data quickly - SQL will always do a good job of associating the related FK info.
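A sketch of that layout under hypothetical names: the clustered index sits on the transaction date, so the date-range predicate narrows the rows first and the FK joins ride along.

    -- Hypothetical transaction table, clustered on the date the reports filter by
    CREATE TABLE dbo.SalesTran
    (
        TranID     INT IDENTITY(1,1) NOT NULL,
        TranDate   DATETIME NOT NULL,
        CustomerID INT      NOT NULL,     -- one of many int surrogate FKs
        ProductID  INT      NOT NULL,
        Amount     MONEY    NOT NULL,
        CONSTRAINT PK_SalesTran PRIMARY KEY NONCLUSTERED (TranID)
    );
    CREATE CLUSTERED INDEX IX_SalesTran_TranDate ON dbo.SalesTran (TranDate);

    -- Typical report: the date range drives the seek, then the FK joins run
    SELECT t.TranDate, c.CustomerName, p.ProductName, t.Amount
    FROM   dbo.SalesTran t
           JOIN dbo.Customer c ON c.CustomerID = t.CustomerID
           JOIN dbo.Product  p ON p.ProductID  = t.ProductID
    WHERE  t.TranDate >= '20090101' AND t.TranDate < '20090201';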

Cheers, Peter

Trent Taylor
StrataFrame Developer (8.5K reputation)
Group: StrataFrame Developers
Posts: 6.6K, Visits: 6.9K
Thanks for the posts, guys!  Good stuff out here!