Sat, 04 Dec 10

Keeping Rails Migrations Rolling

Migrations allow Rails developers to easily incorporate schema changes as they develop and enhance their web application or service. Rails comes with a set of default rake tasks for migrations, which look for and load a set of time-stamp ordered files. By convention, these file names map to a set of class names, and on these, the migrator invokes the “up” or “down” class method as appropriate.
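The file-name convention can be sketched in plain Ruby. This is a simplified stand-in, not the Rails migrator itself: Rails uses its own inflection helpers, while the split/capitalize below only handles lowercase, underscore-separated names. The file names and the helper names are made up for illustration.

```ruby
# Hypothetical helper: split a migration file name into its version number
# and the class name the file is expected to define.
MIGRATION_FILE = /\A(\d+)_([a-z0-9_]+)\.rb\z/

def parse_migration(file_name)
  match = MIGRATION_FILE.match(file_name)
  return nil unless match # not a migration file

  version = match[1].to_i
  class_name = match[2].split("_").map(&:capitalize).join
  [version, class_name]
end

# Hypothetical helper: keep only migration files, ordered by version.
def ordered_migrations(file_names)
  file_names.map { |name| parse_migration(name) }.compact.sort_by { |version, _| version }
end

files = [
  "20101204093000_add_role_to_users.rb",
  "20101130120000_create_users.rb",
  "README"
]

ordered_migrations(files).each do |version, class_name|
  puts "#{version} #{class_name}"
end
# prints:
# 20101130120000 CreateUsers
# 20101204093000 AddRoleToUsers
```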

A migration is thus Ruby code, which gives the developer a lot of flexibility in adjusting the schema and fixing up existing data as requirements evolve. For non-trivial applications, it is common to have many hundreds of migrations, added over time by different developers.

Migrating Cleanly

As a general guideline, a new developer or a developer cloning the app on a new machine should always be able to migrate cleanly. Thus, either of the two below should work without a hitch.

rake db:drop; rake db:create; rake db:migrate;
rake db:reset

Both leave you with the same database, except that db:reset loads the pre-existing schema.rb, while the first sequence runs every migration individually and generates a new schema.rb file.

In my experience there are some creeping code and environment problems which can break migrations, especially for those that come after you to work on the project. Here is a list of common mistakes or gotchas and how to avoid them to keep your migrations rolling:

0) Ignoring db/schema.rb from source control

In many projects, db/schema.rb is kept out of the source repository. The rationale is that it can always be recreated by running migrations. The perceived danger is if someone down-migrates for testing, and then checks in the stale schema.rb. However, this is less of a worry, since you are trusting your developers to have better sense and run the latest unit tests before committing. (You do have tests, right?!)

The advantage of having a schema.rb is that a new developer can quickly view the database schema, and just run db:reset to get started.

1) Assumptions about the DB and existing columns

This is most likely in the case of projects that deal with a pre-existing database, and migrations start from the point where Rails was used to enhance the product with new features or a semi-independent sub-system. This bifurcates schema knowledge needlessly.

When dealing with legacy databases, take the time to use rake db:schema:dump or some manual process to create a 00001_initial_db_schema.rb migration. You will thank yourself for taking an afternoon to do it.

You should also use rescues judiciously to deal with legacy databases, especially if there are multiple copies which may have drifted from each other (e.g. no indexes on staging/test legacy server) over time.

2) Migrations used to populate data

Yes, migrations can have any Ruby code, and it is possible to include some nifty programmatic ways to pre-populate admin users, basic settings and other fun stuff.

But migrations are not the place for it. Instead, seed data using db/seeds.rb and run rake db:seed. Consider gems such as populator, or create rake tasks for common or periodically imported data. Some apps might have an admin interface which can be used for introducing necessary data.

3) Using Model functionality in migrations

Models are most commonly used in migrations to “fix-up” data when modifying column semantics or changing table relationships. This is necessary, and allowed. In fact, Rails even provides the #reset_column_information method on ActiveRecord models to reflect the latest migration.

However, models can also introduce circular or irreconcilable dependencies amongst your migrations, especially in conjunction with other mistakes.

Let’s say you add a column to the database in a later migration, and change the model to add a validation on that field. You might still be able to run all migrations, unless an earlier migration creates dummy model objects or does fix-ups that involve saves or updates: the current model then validates a column that does not exist yet at that point in the history. More commonly, you rename a field, and the validations on the renamed field fail in every migration that precedes the rename.

You mitigate this either by using SQL statements with “execute” in the case of data fixups, or by using stripped down “mock-models” as nested classes inside your migration.

4) Using ActiveRecord models in environment.rb, or in initializers

It might be tempting to use your nifty model (e.g. Setting, Tag) as part of the environment, configuration or initialization. This is a terrible mistake, and usually indicative of teams which never run tests, continuous integration or other deployment mechanisms. Why? Because the Rails environment is initialized not just by the app server, but also by rake tasks, including migrations. Any model queried in an initializer needs a table that does not exist yet on an empty database, so the migrations that would create it can never run.
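One way to sidestep this, sketched here in plain Ruby, is to defer the lookup until the value is first needed instead of reading it at boot. LazySetting and the site name are hypothetical and not part of Rails; in a real app the block would wrap the model query, and a counter stands in for the database so the deferral is visible.

```ruby
# Hypothetical holder: the loader block runs on first read, not at boot.
class LazySetting
  def initialize(&loader)
    @loader = loader
    @loaded = false
  end

  # Runs the loader once, then returns the cached result.
  def value
    unless @loaded
      @value = @loader.call
      @loaded = true
    end
    @value
  end

  def loaded?
    @loaded
  end
end

# Stand-in for a model query; counts how often the "database" is hit.
calls = 0
site_name = LazySetting.new { calls += 1; "My App" }

puts calls           # prints 0: nothing was loaded at "boot"
puts site_name.value # prints My App
puts site_name.value # prints My App (cached)
puts calls           # prints 1: the loader ran exactly once
```

With this shape, a rake task that never reads the setting never touches the table, so migrations on an empty database are not blocked by it.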

5) Renaming database tables

If you have to rename database tables, you might want to ensure that you are not using any models (with the old name) in your earlier migrations. If you are, and have to rename, consider 3) above. Or you might want to consolidate earlier migrations and do some cleanups.

6) Inserting migrations or adjusting order

Sometimes, you need to insert a migration other than at the end. Other times, you may want to ensure certain migrations are always run last. For older Rails projects, you might have migration files with simple numerical order, e.g. 022_migration_name.rb. Newer migrations would have a time-stamp.

Be consistent with migration numerals, especially the number of digits (e.g. 0001_early_fixup, or 9999_must_be_last_migration). If using timestamps, create a timestamp pattern that stands out and is used consistently to indicate a fix-up. (e.g. ending in 99 or 77)
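A small plain-Ruby sketch shows why the number of digits matters. It assumes, as Rails does, that the leading digits of a file name are compared as integers; the helper names and file names are hypothetical.

```ruby
# Hypothetical helper: the leading digits of the file name, as an integer.
def version_of(file_name)
  file_name[/\A\d+/].to_i
end

# Order in which the files would run, assuming integer comparison.
def run_order(file_names)
  file_names.sort_by { |name| version_of(name) }
end

mixed  = ["10_add_orders.rb", "9_create_users.rb", "022_add_index.rb"]
padded = ["010_add_orders.rb", "009_create_users.rb", "022_add_index.rb"]

# With inconsistent digits, the run order and the alphabetical directory
# listing disagree, which makes the history hard to read.
puts run_order(mixed).inspect
# prints ["9_create_users.rb", "10_add_orders.rb", "022_add_index.rb"]
puts mixed.sort.inspect
# prints ["022_add_index.rb", "10_add_orders.rb", "9_create_users.rb"]

# With consistent zero padding, both orders agree.
puts run_order(padded) == padded.sort
# prints true

# A short sentinel is NOT last once timestamped files exist.
puts version_of("9999_must_be_last_migration.rb") < version_of("20101204093000_add_role_to_users.rb")
# prints true
```

The last line is worth noting: in a project that has switched to timestamps, a four-digit "must be last" number sorts before every timestamped migration, so the sentinel would need as many digits as a timestamp.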

7) Dividing migrations into multiple directories

Ideally, all migrations should be in the db/migrate directory. A well-written plugin or gem would have generate scripts for including its required schema into the application. If your app was componentized and those modules developed in parallel, consider re-syncing schema by adding and checkpointing new migrations.

8) Short-circuiting old or unused migrations

Let’s say a particular migration is no longer necessary or required. Take the time to delete it, including other dependent migrations. Don’t just “return” or - even worse - raise “Error” from migrations you don’t want. This introduces cognitive and processing overload for a new developer, while possibly hiding bugs.

By taking these steps and avoiding the mistakes above, you can enjoy all the benefits of Rails migrations and make life easier for you, your team and future contributors.
