Designing Databases with ActiveRecord Conventions

Based on Ruby on Rails Philosophy and ActiveRecord Default Naming Standards

Underlying Philosophy

The goal of IT systems is to provide support the business objectives. IT departments have been guilty of focusing too much on their individual function, such as database administration or Java development, digressing into engineering perfect and losing focus of the bigger IT objectives. New conventions are emerging in the software development industry that are breaking old monolithic standards and promoting a more holistic view of the software development functions, with the goal of efficiently delivering business value.

Databases serve a specific purpose in the enterprise: to store and retrieve data so that user-facing services, such as application and business intelligence tools, can meet the business needs efficiently. In other words, if a database design reduces the development time for building a specific application or report while meeting the business objectives, than it should be designed as such, even if it doesn’t follow old (and often unnecessary) conventions. Ruby on Rail’s ActiveRecord framework supports such a convention; it promotes a common-sense approach to building databases that accelerate application and report development while meeting long-term business objectives.

A note on Oracle limitations

Oracle identifiers (table/column names) are limited to 30 characters, which creates unwarranted tendency to abbreviate identifiers, and create artificial and often cryptic naming conventions. If you are concerned about Oracle’s column name length, you’ll be surprised how much you can fit into 30 characters by using good logical names. In many cases, a very long column name is a sign of normalization problem:

  • FIRST_INVOICE_CREATED_ON
  • CONTRACTOR_POINT_OF_CONTACT

What about Internationalization (I18N)?

The objective of these conventionss is not to eliminate the need for translation of labels. These naming conventions will help you standardize around names that are logical, readable and consistent. Depending on your development environment, it may even help your tool translate your database objects names into your application or report development namespace.

1. Table Names

Convention #1.1 – Naming tables

A database row represents one instance of an entity, while a table is a collection of the same entities. Use plural nouns, e.g. PEOPLE, CONTRACTS, etc for database tables - this follows the common database convention. Use underscore to separate words, e.g. LINE_ITEMS. Do not abbreviate. Do not start table names with a number – this causes errors in certain application platforms. Do not use prefixes like REF_ or LU_ for reference tables, entity names should represent the information being stored, and not try to represent a usage scenario for the information.

Alternatively, you can use the singular form of entity names for tables, thereby following the Object Oriented Programming naming style and avoiding the need for translation from singular to plural and vice versa. ActiveRecord defaults to plural table names, this can be turned off as a global environment setting.

Recommended naming:

  • PEOPLE
  • PERSON
  • LINE_ITEMS
  • ORDER_TYPES, TYPES_OF_ORDERS

Not recommended:

  • LINEITEMS
  • LN_ITEMS
  • LI
  • REF_ORDER_TYPE
  • ORDER_TP_CDS

Convention #1.2 – Naming association tables

For association tables (also known as intersection tables) that resolve many-to-many relationships, use the following naming convention: [name of first entity]_[name of second entity]. This may not always be possible, given Oracle’s limitation. Use it whenever possible, it makes it easier to identify the physical translation of a logical relationship.

Recommended naming:

  • PRODUCTS_CATEGORIES

Not recommended:

  • PROD_CATEGORY_INT
  • PRODUCT_CATEGORY_XREF

Convention #1.3 – Proper use of schemas

Do not put application prefixes on table names for business data – this tightly couples business data to applications. Business data is a function of the business, not the application. There is always a logical name for business entities. Use schemas for the intended purpose of creating namespaces, to logically group relates entities and avoid naming conflicts.

Do not use a master schema that holds all tables, it is an anti-pattern and defeats the purpose of schemas.

Recommended naming:

  • ACCOUNTING.ACCOUNTS
  • PHONE_BOOK.POINTS_OF_CONTACTS

Not recommended:

  • ACCOUNTING_ACCOUNT
  • ACT_ACCOUNTS
  • PB_POCS

There are other obvious advantages of using schemas for logical grouping. For examples, schema level security policies simplify access control, storage settings can be defined at the schema level. In general, schemas and all objects within schema can be administered using schema-level operations.

Use a separate schema for development and testing, this way your test database can be wiped out or stages for automated unit tests and for testing reports. Use “_DEV” and “_TEST” suffix with the schema name to differentiate. Production schema name should match the logical name; it should not have a suffix.

Primary Keys and Foreign Keys

Convention #2.1 – Using numeric primary keys

Always use auto incremented numeric primary key. For consistency and simplicity, I recommend ‘ID’. It is a noun as defined in English language as a synonym to identity . Also, some frameworks, such as ActiveRecord, use ID as the default name for primary key.

Use auto increment (sequences) as a general rule, unless you want the application tier to manage the keys. Even the most obvious business keys can change, for example, DUNS number for identifying partners. If and when the business key changes, all the relationships will break. Business keys are pieces of business data that should have unique constraints.

Using numeric IDs makes database indexes more efficient. Strings, for instance, take more space to store than numbers. This space is saved in index storage, the space saving is propagated to all child tables’ foreign keys.

Recommended naming:

  • ID (INT) AUTO INCREMENT

Not recommended:

  • CUSTOMER_ID
  • TYPE_CODE
  • EMAIL_ADDRESS
  • SSN
  • ISBN
  • DUNS

Convention #2.2 – Using single column keys

This convention is an extension of 2.1. Avoid concatenated keys at all costs. Concatenated keys complicate indexes and all DML statements. They take more space to store and index. The storage need and SQL complexity multiplies with every child table (the entire multi-column key has to be stored in every child table). Moreover, concatenated keys make object relational mapping complicated and error-prone. Follow this convention in conjunction with 2.1, use single integer primary keys – auto incremented whenever possible.

Recommended naming:

  • ID (INT) AUTO INCREMENT

Not recommended:

  • ORDER_DATE + CUSTOMER_ID
  • FIRST_NAME + LAST_4_SSN

Convention #2.3 – Naming foreign keys

This may sound like a violation of convention #2.2 and general principle of using logical names. The logic behind this convention is that all relationships can be named as one of the three types of associations:

  1. Entity A belongs to Entity B (child to parent)
  2. Entity A has zero, one or many Entity Bs (parent to child)
  3. Entity A has many and belongs to many Entity Bs (many to many)

If you use the above listed logical names for foreign key naming, the column name will not tell you which entity it is associating the entity with. This becomes a bigger problem when you have a child table with multiple parents, e.g. an order_items table could have a foreign key to an order and discount type.

ActiveRecord recommends using the following convention: [parent entity name]_ID. The advantage of using this convention is that you can tell the parent entity, direction and type of the association.

There is a downside to this naming convention; the column name does not tell you the nature of the association. For example, an employee can have two circular relationships, “reports to employee” and “mentored by employee.” In these kinds of scenarios, you can either extract the foreign key/keys into an association table, or use the following convention: [nature of association]_[name of parent entity]_ID.

I don’t feel strongly about this convention, feel free to adopt any one standard and use it consistently; keep in mind that several Object Relational Mapping can infer the relationship if you follow the ActiveRecord convention of [parent entity name]_ID.

Recommended naming:

  • ORDER_ID
  • REPORTS_TO_EMPLOYEE_ID

Not recommended:

  • FK_ORDER
  • ORDER
  • REPORTS_TO_EMPLOYEE

Column names

Column names should be the logical name of the attribute. A good rule to follow is that the column name should be obvious to the end users; there should be no need to translate the name before it is presented in a report of application. This simple principal will not only add clarity to your database schemas, it will alleviate the need for maintaining application labels (unless your application supports multiple languages). You will also alleviate the need for meta-data for ad hoc reporting tools.

The remaining conventions are an extension and elaboration of this general principal.

Convention #3.1 – Naming columns

Underscores serve as a friendly visual delimiter for words. Some databases allow spaces and mixed case in table names but most popular databases suppress this capability by default. For a standard approach that can be applied to all databases, use upper-case table names with underscores to separate words.

Frameworks like ActiveRecord can translate logical column names into object attributes and/or user interface labels using the following rules:

  1. First letter of each word will be capitalized
  2. Underscore is translated to a space and used as a delimiter for words

As an example, a column named “FIRST_NAME” will translate to “First Name” on the user interface.

Recommended naming:

  • FIRST_NAME

Not recommended:

  • FIRSTNAME
  • F_NAME
  • FIRST_NM

Convention #3.2 – Avoid table names in column names

The table name specifies the context for the columns. Prefixing column names with table name is redundant. If you find a column name to be ambiguous without the table name, you may want to review your data model for normalization.

Recommended naming:

  • CUSTOMERS.NAME (Table CUSTOMERS Column NAME)

Not recommended:

  • CUSTOMERS.CUSTOMER_NAME
    (Table CUSTOMERS Column CUSTOMER_NAME)

Convention #3.3 – Avoid data type names in column names

Column data type is specified by the column definition in the table. Using type identifiers in the column name are redundant. If you are referring to a code, use convention 2.3. For date/time columns, ActiveRecord recommends using the logical proposition form.

Recommended naming:

  • CREATED_ON (Date)
  • CREATED_AT (Date + time)
  • STATUS_ID

Not recommended:

  • DATE_CREATED
  • CREATE_DATE
  • STATUS_CD

Convention #3.4 – Avoid abbreviations in column names

Do not use abbreviations that are not universal accepted. A good standard is to avoid all abbreviations except those that are defined in one of the official English Language Dictionaries. For example, abbreviations like SSN and ID are acceptable but DT, NUM and CD are not.

Recommended naming:

  • ID (for Primary Key Only)
  • URL
  • FIRST_NAME
  • LAST_4_DIGITS_OF_SSN

Not recommended:

  • IDENT
  • FIRST_NM
  • LST_4_SSN
Jun 16 2006 02:33 pm | rails |

8 Responses to “Designing Databases with ActiveRecord Conventions”

  1. on 31 Aug 2006 at 3:21 pm Michael Vanner

    Great practical advice, even for veterans of DB2 and Oracle development and even moreso maintenance.

  2. on 31 Aug 2006 at 7:03 pm mandhro

    Thanks Michael, I am glad you found it to be useful.

  3. on 26 Sep 2006 at 12:49 pm Greg Underwood

    I’m wondering if it is acceptable to use concatenated keys for intersection tables?

  4. on 26 Sep 2006 at 10:32 pm mandhro

    Greg,

    There are two types of tables that people refer to as intersection tables. (1) join tables used to implement many-to-many relationships, and (2) logical intersection entities with actual data.

    In case of #1, you would only have two columns, one for each foreign key. In this case, there is no need to define a primary key. If you want to enforce uniqueness of the combination, you should use a unique constraint. I would also recommend creating the join index. However, there is no real disadvantage of declaring both foreign keys as part of a concatenated primary key. The latter would have the same effect in most databases, i.e. index plus unique constraint.

    In case of #2, you would have columns holding real business data. In that case, it is a first class entity. Therefore, it should have its own single numeric primary key. The 2 foreign keys should be treated as just that. Again, using join index and unique constraint as applicable.

    Hope this helps.

    - Zak

  5. on 19 Jul 2007 at 9:13 am Tom Nourse

    Zak,
    I would like to offer an additional thought to this discussion, related to performance. With a Primary Key constraint, you get NOT an index PLUS a unique constraint, you get a unique index plus NOT NULL. Note, also when you declare a column to be UNIQUE, either in the CREATE TABLE statement, or by applying a UNIQUE constraint post-creation, it is enforced through a UNIQUE INDEX. This can have serious performance impact.
    Let’s consider an example.
    In the traditional SALES schema with Customers, Orders, Line_Items, and Products, Line_Items is the intersection table that expresses the many-to-many relationship (case #2 from your discussion). If you go with a surrogate integer key as your PK and create single-column FK indexes on the Orders_ID and Products_ID columns (we ALL know that ALL FK columns should be indexed), you’ll require storage for 3 indexes PLUS the table, and there is no guarantee that a query involving EVEN both an Orders and Product reference in the predicate will utilize either or both indexes.
    I propose the following alternatives, one for Oracle particularly, the other for the rest:

    1) For intersection only tables (2 FK columns): Use a multi-part Primary Key with the column MOST OFTEN searched on as the leading column. Futhur, create the table as an Index Organized Table (IOT in Oracle). This yields two important performance aspects. First, with an IOT, the table will be stored in an index structure, not a heap-table. All records for a given customer will be stored in the same B-Tree node (disk block) and will require one unique-index search/read to derive all of the ROWIDs to gain access into the second table (again, PK based access usually via Nested Loops which will be very fast). I would also add a non-unique (obviously) index on the trailing column of the multi-part PK, this will allow fast index-based searches of the IOT to generate ROWIDs for the first table (again, very fast). This solution only involves 2 indexes, no table.

    2) For intersection tables with attribute data (Line_Items) : You may consider the IOT approach if the table is not very wide (small number of attributes) and you are using Oracle, OR use a multi-part PK, again, put the most oft queried column in the leading position, and a single, non-unique index on the trailing PK column to allow reverse traversing of the relationship. This solution involves a table and two indexes.

    Now neither of these may fit the Active Record convention specifically, but yield the best performance in an OLTP system which should be one of the goals of any database supported application.

    Tom

  6. on 20 Jul 2007 at 10:16 am Zak Mandhro

    Tom,
    Thanks for the input. Rails developers, per Tom’s comments, if you are concerned about extra storage cost of going an all single PK approach, it is possible to do composite PKs in ActiveRecord. See http://www.pjhyett.com/posts/208-composite-primary-keys-are-good

  7. on 03 Aug 2007 at 10:02 am Dan Herrup

    Great advice Zak, well thought out and well written. An additional thought would be when using a sequence to increment a numeric key, use one sequence application wide to increment the key. Doing so allows you to more easily move data from schema A (production schema) to schema B (test) without duplicating ID values and invalidating uniqueness. Also, it is far easier to maintain than 30 sequences for 30 tables. After all the ID is meaningless to everyone except the database.

    Cheers,

    Dan

  8. on 06 Aug 2007 at 1:36 pm Zak Mandhro

    Dan,

    Thanks for the comment. I agree. The actual value for ID doesn’t matter, it’s there to let the relational database do it’s internal data structure. In the case of Oracle, a single sequence should work in most cases, as long as it’s set to capacity. For MySQL, SQL Server (and DB2?) you don’t have to worry about managing sequences objects.

    - Zak

Leave a Reply