8 users responded in this post

Subscribe to this post comment rss or trackback url
User Gravatar
Michael Vanner said in August 31st, 2006 at 3:21 pm

Great practical advice, even for veterans of DB2 and Oracle development and even moreso maintenance.

User Gravatar
mandhro said in August 31st, 2006 at 7:03 pm

Thanks Michael, I am glad you found it to be useful.

User Gravatar
Greg Underwood said in September 26th, 2006 at 12:49 pm

I’m wondering if it is acceptable to use concatenated keys for intersection tables?

User Gravatar
mandhro said in September 26th, 2006 at 10:32 pm

Greg,

There are two types of tables that people refer to as intersection tables. (1) join tables used to implement many-to-many relationships, and (2) logical intersection entities with actual data.

In case of #1, you would only have two columns, one for each foreign key. In this case, there is no need to define a primary key. If you want to enforce uniqueness of the combination, you should use a unique constraint. I would also recommend creating the join index. However, there is no real disadvantage of declaring both foreign keys as part of a concatenated primary key. The latter would have the same effect in most databases, i.e. index plus unique constraint.

In case of #2, you would have columns holding real business data. In that case, it is a first class entity. Therefore, it should have its own single numeric primary key. The 2 foreign keys should be treated as just that. Again, using join index and unique constraint as applicable.

Hope this helps.

- Zak

User Gravatar
Tom Nourse said in July 19th, 2007 at 9:13 am

Zak,
I would like to offer an additional thought to this discussion, related to performance. With a Primary Key constraint, you get NOT an index PLUS a unique constraint, you get a unique index plus NOT NULL. Note, also when you declare a column to be UNIQUE, either in the CREATE TABLE statement, or by applying a UNIQUE constraint post-creation, it is enforced through a UNIQUE INDEX. This can have serious performance impact.
Let’s consider an example.
In the traditional SALES schema with Customers, Orders, Line_Items, and Products, Line_Items is the intersection table that expresses the many-to-many relationship (case #2 from your discussion). If you go with a surrogate integer key as your PK and create single-column FK indexes on the Orders_ID and Products_ID columns (we ALL know that ALL FK columns should be indexed), you’ll require storage for 3 indexes PLUS the table, and there is no guarantee that a query involving EVEN both an Orders and Product reference in the predicate will utilize either or both indexes.
I propose the following alternatives, one for Oracle particularly, the other for the rest:

1) For intersection only tables (2 FK columns): Use a multi-part Primary Key with the column MOST OFTEN searched on as the leading column. Futhur, create the table as an Index Organized Table (IOT in Oracle). This yields two important performance aspects. First, with an IOT, the table will be stored in an index structure, not a heap-table. All records for a given customer will be stored in the same B-Tree node (disk block) and will require one unique-index search/read to derive all of the ROWIDs to gain access into the second table (again, PK based access usually via Nested Loops which will be very fast). I would also add a non-unique (obviously) index on the trailing column of the multi-part PK, this will allow fast index-based searches of the IOT to generate ROWIDs for the first table (again, very fast). This solution only involves 2 indexes, no table.

2) For intersection tables with attribute data (Line_Items) : You may consider the IOT approach if the table is not very wide (small number of attributes) and you are using Oracle, OR use a multi-part PK, again, put the most oft queried column in the leading position, and a single, non-unique index on the trailing PK column to allow reverse traversing of the relationship. This solution involves a table and two indexes.

Now neither of these may fit the Active Record convention specifically, but yield the best performance in an OLTP system which should be one of the goals of any database supported application.

Tom

User Gravatar
Zak Mandhro said in July 20th, 2007 at 10:16 am

Tom,
Thanks for the input. Rails developers, per Tom’s comments, if you are concerned about extra storage cost of going an all single PK approach, it is possible to do composite PKs in ActiveRecord. See http://www.pjhyett.com/posts/208-composite-primary-keys-are-good

User Gravatar
Dan Herrup said in August 3rd, 2007 at 10:02 am

Great advice Zak, well thought out and well written. An additional thought would be when using a sequence to increment a numeric key, use one sequence application wide to increment the key. Doing so allows you to more easily move data from schema A (production schema) to schema B (test) without duplicating ID values and invalidating uniqueness. Also, it is far easier to maintain than 30 sequences for 30 tables. After all the ID is meaningless to everyone except the database.

Cheers,

Dan

User Gravatar
Zak Mandhro said in August 6th, 2007 at 1:36 pm

Dan,

Thanks for the comment. I agree. The actual value for ID doesn’t matter, it’s there to let the relational database do it’s internal data structure. In the case of Oracle, a single sequence should work in most cases, as long as it’s set to capacity. For MySQL, SQL Server (and DB2?) you don’t have to worry about managing sequences objects.

- Zak

Leave A Reply

 Username (Required)

 Email Address (Remains Private)

 Website (Optional)