June 2010 Archives

Diagnosing Production Problems: First Law

June 11, 2010

Stephen's first law of diagnosing problems in production: try to replicate in test. (Assumptions: you have a test environment, you use it regularly, and it is reasonably close to production). Sometimes you just can't replicate the problem – for instance, it might be due to an oddity in a customer data file that you're not allowed to run outside of production. In those cases, see if you can use a proxy. For instance, try copying the file and masking the sensitive data, then running it in the test environment (of course, the masking process might cover up the error that is causing all the problems).

Production needs to stay clean, and as developers we need to keep our hands out of it as much as possible. This is particularly true in a highly secure environment with strong separation of duties, wherein you might have to drag a sys admin into the picture just to get to obscure log files, for instance. Replicate the issue, solve it, document it, and make sure everyone else in the company is able to share in the lessons learned.

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

August 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.



OpenID accepted here Learn more about OpenID
Powered by Movable Type 5.12