Designing Databases with ActiveRecord Conventions
Based on Ruby on Rails Philosophy and ActiveRecord Default Naming Standards
Underlying Philosophy
The goal of IT systems is to provide support the business objectives. IT departments have been guilty of focusing too much on their individual function, such as database administration or Java development, digressing into engineering perfect and losing focus of the bigger IT objectives. New conventions are emerging in the software development industry that are breaking old monolithic standards and promoting a more holistic view of the software development functions, with the goal of efficiently delivering business value.
Databases serve a specific purpose in the enterprise: to store and retrieve data so that user-facing services, such as application and business intelligence tools, can meet the business needs efficiently. In other words, if a database design reduces the development time for building a specific application or report while meeting the business objectives, than it should be designed as such, even if it doesn’t follow old (and often unnecessary) conventions. Ruby on Rail’s ActiveRecord framework supports such a convention; it promotes a common-sense approach to building databases that accelerate application and report development while meeting long-term business objectives.
A note on Oracle limitations
Oracle identifiers (table/column names) are limited to 30 characters, which creates unwarranted tendency to abbreviate identifiers, and create artificial and often cryptic naming conventions. If you are concerned about Oracle’s column name length, you’ll be surprised how much you can fit into 30 characters by using good logical names. In many cases, a very long column name is a sign of normalization problem:
- FIRST_INVOICE_CREATED_ON
- CONTRACTOR_POINT_OF_CONTACT
What about Internationalization (I18N)?
The objective of these conventionss is not to eliminate the need for translation of labels. These naming conventions will help you standardize around names that are logical, readable and consistent. Depending on your development environment, it may even help your tool translate your database objects names into your application or report development namespace.
1. Table Names
Convention #1.1 – Naming tables
A database row represents one instance of an entity, while a table is a collection of the same entities. Use plural nouns, e.g. PEOPLE, CONTRACTS, etc for database tables – this follows the common database convention. Use underscore to separate words, e.g. LINE_ITEMS. Do not abbreviate. Do not start table names with a number – this causes errors in certain application platforms. Do not use prefixes like REF_ or LU_ for reference tables, entity names should represent the information being stored, and not try to represent a usage scenario for the information.
Alternatively, you can use the singular form of entity names for tables, thereby following the Object Oriented Programming naming style and avoiding the need for translation from singular to plural and vice versa. ActiveRecord defaults to plural table names, this can be turned off as a global environment setting.
Recommended naming:
- PEOPLE
- PERSON
- LINE_ITEMS
- ORDER_TYPES, TYPES_OF_ORDERS
Not recommended:
- LINEITEMS
- LN_ITEMS
- LI
- REF_ORDER_TYPE
- ORDER_TP_CDS
Convention #1.2 – Naming association tables
For association tables (also known as intersection tables) that resolve many-to-many relationships, use the following naming convention: [name of first entity]_[name of second entity]. This may not always be possible, given Oracle’s limitation. Use it whenever possible, it makes it easier to identify the physical translation of a logical relationship.
Recommended naming:
- PRODUCTS_CATEGORIES
Not recommended:
- PROD_CATEGORY_INT
- PRODUCT_CATEGORY_XREF
Convention #1.3 – Proper use of schemas
Do not put application prefixes on table names for business data – this tightly couples business data to applications. Business data is a function of the business, not the application. There is always a logical name for business entities. Use schemas for the intended purpose of creating namespaces, to logically group relates entities and avoid naming conflicts.
Do not use a master schema that holds all tables, it is an anti-pattern and defeats the purpose of schemas.
Recommended naming:
- ACCOUNTING.ACCOUNTS
- PHONE_BOOK.POINTS_OF_CONTACTS
Not recommended:
- ACCOUNTING_ACCOUNT
- ACT_ACCOUNTS
- PB_POCS
There are other obvious advantages of using schemas for logical grouping. For examples, schema level security policies simplify access control, storage settings can be defined at the schema level. In general, schemas and all objects within schema can be administered using schema-level operations.
Use a separate schema for development and testing, this way your test database can be wiped out or stages for automated unit tests and for testing reports. Use “_DEV” and “_TEST” suffix with the schema name to differentiate. Production schema name should match the logical name; it should not have a suffix.
Primary Keys and Foreign Keys
Convention #2.1 – Using numeric primary keys
Always use auto incremented numeric primary key. For consistency and simplicity, I recommend ‘ID’. It is a noun as defined in English language as a synonym to identity . Also, some frameworks, such as ActiveRecord, use ID as the default name for primary key.
Use auto increment (sequences) as a general rule, unless you want the application tier to manage the keys. Even the most obvious business keys can change, for example, DUNS number for identifying partners. If and when the business key changes, all the relationships will break. Business keys are pieces of business data that should have unique constraints.
Using numeric IDs makes database indexes more efficient. Strings, for instance, take more space to store than numbers. This space is saved in index storage, the space saving is propagated to all child tables’ foreign keys.
Recommended naming:
- ID (INT) AUTO INCREMENT
Not recommended:
- CUSTOMER_ID
- TYPE_CODE
- EMAIL_ADDRESS
- SSN
- ISBN
- DUNS
Convention #2.2 – Using single column keys
This convention is an extension of 2.1. Avoid concatenated keys at all costs. Concatenated keys complicate indexes and all DML statements. They take more space to store and index. The storage need and SQL complexity multiplies with every child table (the entire multi-column key has to be stored in every child table). Moreover, concatenated keys make object relational mapping complicated and error-prone. Follow this convention in conjunction with 2.1, use single integer primary keys – auto incremented whenever possible.
Recommended naming:
- ID (INT) AUTO INCREMENT
Not recommended:
- ORDER_DATE + CUSTOMER_ID
- FIRST_NAME + LAST_4_SSN
Convention #2.3 – Naming foreign keys
This may sound like a violation of convention #2.2 and general principle of using logical names. The logic behind this convention is that all relationships can be named as one of the three types of associations:
- Entity A belongs to Entity B (child to parent)
- Entity A has zero, one or many Entity Bs (parent to child)
- Entity A has many and belongs to many Entity Bs (many to many)
If you use the above listed logical names for foreign key naming, the column name will not tell you which entity it is associating the entity with. This becomes a bigger problem when you have a child table with multiple parents, e.g. an order_items table could have a foreign key to an order and discount type.
ActiveRecord recommends using the following convention: [parent entity name]_ID. The advantage of using this convention is that you can tell the parent entity, direction and type of the association.
There is a downside to this naming convention; the column name does not tell you the nature of the association. For example, an employee can have two circular relationships, “reports to employee” and “mentored by employee.” In these kinds of scenarios, you can either extract the foreign key/keys into an association table, or use the following convention: [nature of association]_[name of parent entity]_ID.
I don’t feel strongly about this convention, feel free to adopt any one standard and use it consistently; keep in mind that several Object Relational Mapping can infer the relationship if you follow the ActiveRecord convention of [parent entity name]_ID.
Recommended naming:
- ORDER_ID
- REPORTS_TO_EMPLOYEE_ID
Not recommended:
- FK_ORDER
- ORDER
- REPORTS_TO_EMPLOYEE
Column names
Column names should be the logical name of the attribute. A good rule to follow is that the column name should be obvious to the end users; there should be no need to translate the name before it is presented in a report of application. This simple principal will not only add clarity to your database schemas, it will alleviate the need for maintaining application labels (unless your application supports multiple languages). You will also alleviate the need for meta-data for ad hoc reporting tools.
The remaining conventions are an extension and elaboration of this general principal.
Convention #3.1 – Naming columns
Underscores serve as a friendly visual delimiter for words. Some databases allow spaces and mixed case in table names but most popular databases suppress this capability by default. For a standard approach that can be applied to all databases, use upper-case table names with underscores to separate words.
Frameworks like ActiveRecord can translate logical column names into object attributes and/or user interface labels using the following rules:
- First letter of each word will be capitalized
- Underscore is translated to a space and used as a delimiter for words
As an example, a column named “FIRST_NAME” will translate to “First Name” on the user interface.
Recommended naming:
- FIRST_NAME
Not recommended:
- FIRSTNAME
- F_NAME
- FIRST_NM
Convention #3.2 – Avoid table names in column names
The table name specifies the context for the columns. Prefixing column names with table name is redundant. If you find a column name to be ambiguous without the table name, you may want to review your data model for normalization.
Recommended naming:
- CUSTOMERS.NAME (Table CUSTOMERS Column NAME)
Not recommended:
- CUSTOMERS.CUSTOMER_NAME
(Table CUSTOMERS Column CUSTOMER_NAME)
Convention #3.3 – Avoid data type names in column names
Column data type is specified by the column definition in the table. Using type identifiers in the column name are redundant. If you are referring to a code, use convention 2.3. For date/time columns, ActiveRecord recommends using the logical proposition form.
Recommended naming:
- CREATED_ON (Date)
- CREATED_AT (Date + time)
- STATUS_ID
Not recommended:
- DATE_CREATED
- CREATE_DATE
- STATUS_CD
Convention #3.4 – Avoid abbreviations in column names
Do not use abbreviations that are not universal accepted. A good standard is to avoid all abbreviations except those that are defined in one of the official English Language Dictionaries. For example, abbreviations like SSN and ID are acceptable but DT, NUM and CD are not.
Recommended naming:
- ID (for Primary Key Only)
- URL
- FIRST_NAME
- LAST_4_DIGITS_OF_SSN
Not recommended:
- IDENT
- FIRST_NM
- LST_4_SSN
Filed by Zak Mandhro at June 16th, 2006 under rails
Great practical advice, even for veterans of DB2 and Oracle development and even moreso maintenance.
Comment by Michael Vanner — August 31, 2006 @ 3:21 pm