Extensive RTv2 Migration Postmortem

Well, we've been up and running on our new server with our new club code and forums for 2 weeks now, and I figured it was time to take stock of how things went and how things are going. We had planned the move for one of the quietest weekends of the year, the weekend right before Christmas. This was chosen for obvious reasons, but after going through the migration, I probably should have made sure I had all my shopping done first. As I've stated before, this was a huge step for us to take, as we were changing and moving the workings of an entire business as well as a very active user community. We had been working on the code and design for many months, and the last 2 months were focused almost entirely on testing and bugfixing, yet it never seems to be enough.

We had tested our separate bits on local servers, then migrated to a test server that had the exact same hardware and software as our final production server. The forum moderators and the core RocketTheme team found lots of issues during this integration testing, but we were able to squash all of those bugs before our deadline. I just want to publicly thank all those guys who helped in this process, as they did an outstanding job and we simply would not be live today if it were not for them.

On the day of the move we switched the forums to read-only mode and disabled signups on the old RTv1 site. This enabled us to get a good snapshot of the database and filesystem without having to worry about syncing up data in real time. We also temporarily shut down the mailserver and let the mailhop server we use spool up any incoming emails. Luckily we were running CPanel on our old server as well as our new server, so it was a fairly simple, yet time-consuming, job to back up our account on the old box. After the account was fully backed up we scp'ed the 4GB archive over to the new server and started the process of 'restoring' the account on the new box. Again this was quite a painless process and CPanel performed the task with ease. We were then able to update the RocketTheme MX records to point to the new server, and bam, we had mail working with no loss of accounts, emails or anything. At this point we had to tweak some Exim settings to ensure our mail server was functioning 100% correctly, but again this went fairly smoothly.

Now that the account had been moved over, we updated our local /etc/hosts entries to convince our local computers that www.rockettheme.com was already on the new server, while everyone else in the rest of the world still pointed to the old server. We backed up the old site into a backup folder, and copied the entire web folder from our integration testing server onto the new www.rockettheme.com server. After that we restored the database to the new server as well. At this point we had a fully functional RTv2, but without the existing users or forums. Before any migration could be done I had to run some custom scripts to convert old SMF bridged URLs into the new phpBB3 bridged URL format. The next step was to migrate the prior SMF forum to the new phpBB3 based forum. I had worked with the phpBB3 team on this migrator well over a year ago, and during the final testing of RTv2 I had made some minor edits and fixes that were relevant to our fairly heavily modified instance of phpBB3. The migrator was configured to point to the old SMF files (for attachments, avatars, etc.) as well as the old SMF database (which had been migrated as part of the CPanel account restoration). After some fiddling to remove a few accounts that caused duplicate row errors, the forum migration ran for an hour or so and resulted in a good migration of users and posts.
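To give a rough idea of what a URL conversion step like the one mentioned above can look like, here is a minimal Python sketch. It is not the actual script we ran, and the URL shapes are assumptions: it assumes old links in the stock SMF form index.php?topic=<id>.<offset>, rewrites them to the stock phpBB3 form viewtopic.php?t=<id>&start=<offset>, and assumes topic IDs survive the forum migration unchanged. The function name convert_smf_url is made up for illustration.

```python
import re

# Assumed URL shapes (illustrative only, not taken from the real site):
#   old SMF:    index.php?topic=123.45
#   new phpBB3: viewtopic.php?t=123&start=45
# The lookahead makes sure we only match the plain numeric form and leave
# anything else (for example topic=123.msg456) untouched.
SMF_TOPIC = re.compile(r"index\.php\?topic=(\d+)(?:\.(\d+))?(?=$|[#&;\s\"'])")

def convert_smf_url(url: str) -> str:
    """Rewrite an SMF topic link into a phpBB3-style link; other URLs pass through."""
    def repl(match):
        topic_id, offset = match.group(1), match.group(2)
        new = f"viewtopic.php?t={topic_id}"
        # Only carry the offset over when it points past the first post.
        if offset and offset != "0":
            new += f"&start={offset}"
        return new
    return SMF_TOPIC.sub(repl, url)
```

Anything the pattern does not recognise is returned as-is, so it is safe to run over a whole table of stored links and review the leftovers by hand.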

I had written a custom migrator component that took the old RTv1 user and expiration data and ported those users over to the new RTv2 format. This script ran very slowly as it had to read the old data, do some error checking on the new site, then create the new user and all the appropriate subscription information plus the mandatory Joomla ACL information, one user at a time. An unexpected hiccup occurred at this stage, as the old users on RTv1 were auto-generated based on the payment information and completely ignored the Joomla requirements for what constituted a valid and unique username, full name, email, etc. When creating these users on the new site, these issues raised their ugly heads and a few (not many) accounts failed to get migrated over. If you run across an account that doesn't appear to be in our new system, let us know and we'll migrate you by hand. After this we migrated the old transactions from a log format into our new database format. Matching the transactions up to the right user proved very problematic. A few days after the migration I removed this linkage completely, which retained the data for statistics while removing the large number of invalid references that had been created.
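For anyone facing the same kind of username trouble, here is a minimal Python sketch of the cleanup this implies. It is not the actual migrator: the set of rejected characters is an assumption modelled loosely on what Joomla refuses in usernames, and the helper name make_valid_username and the "user" fallback are made up for illustration.

```python
import re

# Characters assumed to be invalid in a username, plus whitespace.
# This rule set is an assumption for the sketch, not Joomla's exact check.
INVALID_CHARS = re.compile(r"""[<>"'%;()&\s]+""")

def make_valid_username(raw: str, taken: set) -> str:
    """Return a cleaned username that is not already in `taken`, and record it there."""
    # Strip rejected characters; fall back to a placeholder if nothing is left.
    base = INVALID_CHARS.sub("", raw).lower() or "user"
    candidate = base
    suffix = 1
    # Append an increasing number until the name is unique.
    while candidate in taken:
        suffix += 1
        candidate = f"{base}{suffix}"
    taken.add(candidate)
    return candidate
```

Seeding `taken` with the usernames that already exist on the new site before the loop starts means collisions are resolved up front instead of surfacing as failed inserts halfway through the run.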

The process continues! By now we were into our second day of the migration. I had worked about 24 hours straight the previous day, and after a refreshing 3 hours of sleep I was back at it. Another part of the migration was to move to a new affiliate script. The one we had been using was so poorly written that it was choking under the load of so many affiliate users. The task of fixing the old script was so huge that we had decided to move to something simpler and more editable for our needs. The new script we found was very, very basic, especially on the admin side, and we had spent a good week adding features and functionality to suit our needs. The good thing about this new script is that it's very lightweight, very fast, and very flexible. We had written another migrator to move all the existing affiliates over to the new script, and that went very fast and very painlessly. I set up a script to transform any old affiliate URLs into our new format, and even created aliases for old banners so affiliates would have as little hassle as possible with the move.

By this point we should have been able to hit the switch, but alas, we were tripped up by some very strange and very annoying issues. First, something related to the Joomla 1.5 cache, our use of certain pages sitting under SSL, and the way sessions were expiring caused the Joomla 1.5 mod_mainmenu module to randomly change its URLs and lose the base URL portion. When this occurred it would render whole sections of the site unreachable. After a few minutes the issue would resolve itself automagically, and the menu would work again for a few minutes until it went belly up once more. We worked on this issue for hours, digging down through our modules into the belly of Joomla itself. We narrowed it down to a corruption in the Joomla cache, and it seems specific to SSL based URLs. We were forced to turn off all Joomla caching to get the menu to work properly. Obviously caching is key for a Joomla 1.5 site, due to the number of objects used and the complexity of the 1.5 framework. This lack of caching is undoubtedly costing us some performance, but luckily our servers are beefy enough to cover for it, at least until we find a solution or this bug gets fixed.

Some other miscellaneous issues we had during the migration involved mundane Unix things like hitting the ulimit max process ceiling, and some issues with forms not processing due to Suhosin PHP hardening. These took a little while to isolate, but eventually we found them and tweaked the settings on the server to resolve the issues. All in all, it took us about 36 hours to get to a point where we could toggle the DNS and start receiving traffic on the new site. I had lowered the TTL on the DNS entries a few days before, so the switch was almost instantaneous when finally made, and it was a real thrill to watch as the online user count rocketed up and the forums started to show signs of activity again.
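If you suspect you are bumping into the same max process ceiling, a quick check like the following Python sketch can help confirm it. It only reads the limits for the user it runs as, using the standard resource module (Unix only). The helper names and the 80% warning threshold are arbitrary choices for illustration, not settings from our server.

```python
import resource

def nproc_limits():
    """Return the (soft, hard) max-process limits for the current user; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def to_optional(value):
        return None if value == resource.RLIM_INFINITY else value

    return to_optional(soft), to_optional(hard)

def near_limit(running: int, soft_limit, threshold: float = 0.8) -> bool:
    """True when `running` processes is at or above `threshold` of the soft limit."""
    if soft_limit is None:
        # No limit configured, so there is no ceiling to hit.
        return False
    return running >= soft_limit * threshold

if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={soft} hard={hard}")
```

Run it as the same user the web server runs under, since the limit that matters is the one applied to that account rather than to your own login shell.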

Things ran pretty smoothly, with the exception of some account issues due to the independent migration of the forum and the Joomla databases. Those issues all stem from the fact that phpBB3 doesn't have native support for separate display names and usernames; it only has a display name type field. We had 'hacked' in a login_name field, but whereas on Joomla the display name doesn't have to be unique, on the phpBB3 side both the login_name and the display name have to be. This will be an ongoing issue for a very small minority of users: as we have such a huge userbase, there are a good few users with the same display name. If you have issues or errors when logging in, please contact us. We can easily fix this by modifying your display name so that it's unique. Another issue that has occurred recently is related to user sessions and the fact that the xcache session management we are using sometimes either runs out of space or gets corrupted. When this happens, the session should be restarted and repopulated, but a bug in Joomla 1.5 is causing the registry part of the session to not get restored properly. We have instituted a temporary fix for this, but we'll probably be moving the sessions to the database rather than xcache, at least until the issue gets fixed in a future Joomla version.
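The display name clash described above is easy to spot ahead of time. Here is a minimal Python sketch that groups users by display name and reports the names used more than once. It assumes names are compared case-insensitively with surrounding whitespace ignored, which is an assumption for the sketch rather than phpBB3's exact rule, and the function name is made up for illustration.

```python
from collections import defaultdict

def find_display_name_collisions(users):
    """Group (user_id, display_name) pairs and return only the names shared by several users.

    The result maps each normalised name to the list of user ids that use it.
    """
    groups = defaultdict(list)
    for user_id, display_name in users:
        # Assumed normalisation: trim whitespace and ignore case.
        groups[display_name.strip().lower()].append(user_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}
```

For example, feeding it [(1, "Andy"), (2, "andy "), (3, "Bob")] reports that users 1 and 2 share the name "andy", which gives you a short list of accounts to rename before the forum import rather than after users start hitting login errors.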

Since launch day we've refactored a few areas of the RokClub code that powers the RTv2 site, and it is now running quite solidly. The forums have had a few minor fixes, as well as some UI updates to add some requested capabilities that we took for granted in SMF and missed dearly when we moved to phpBB3. Overall phpBB3 has performed fantastically, and there are lots of little additional features that we can now enjoy that were just not available in SMF.

That about wraps it up. Sorry this post was so long, but hopefully some of the experiences we've had will help others that are trying to make a similar big move.