I’m not sure how I missed this one, given that I do quite a bit of reading online about best practices. My intentions were noble. I wanted to give my SQL Server the perfect environment so that it could flourish and thrive. I wanted to allow the latest and greatest Microsoft OS (Windows Server 2008 R2) to shine on its pretty new C: drive. What I learned in my attempts at greatness is that I had dropped my guard. I didn’t have the ironclad checklist that I thought I had. So now it’s back to the drawing board to come up with my ideal checklist and installation guide for the operating system and SQL Server 2008 (post date TBD). And if you are really curious about the fatal mistake that cost me a day and a half before I gave up trying to bring the server back up, read on.
For the third time, I’ve had a Dell R900 running SQL Server 2008 on Windows Server 2008 R2 start dropping iSCSI drives. When you believe you’ve properly laid out your separation of duties for SQL (E:\SQLData, F:\SQLLogs, T:\TempDB, and X:\SQLBackup), you would hope there wouldn’t be more problems. But for some reason, these ingredients mixed together have caused me quite the headache. I’m not giving up on Windows Server 2008 R2 just yet, as I’m trying to learn from my mistakes and kick this problem’s tail. I made two crucial mistakes this time, which caused more downtime and gave me more work to do to get back to an operational state. The four drives listed above hold everything except the OS, the SQL install files and shared features, and the OS paging file(s). For those, my practice has been the following:
- The OS on the C: drive.
- The SQL binaries on the D: drive (minus the shared resources that have to go on the C: drive).
- Remove the paging file from the C: drive, and place a paging file 1.5 times the RAM on the D: drive.
Maybe some seasoned veteran is reading this and laughing at me, and I’m sure I deserve that after the blunder I made. Because I didn’t have a page file of any size on the C: drive, I had no usable .dmp file to analyze and couldn’t find out what exactly was sending Windows into a reboot cycle. I had lost one of the iSCSI connections, so SQL was dead because it kind of needs that Tempdb thing. So I panicked and rebooted, but this time the Logs drive didn’t come up. That gave me déjà vu from my previous experience with Windows Server 2008 R2, iSCSI, and SQL Server 2008, and I remembered thinking that a patch might have caused this, since the server had been running fine for a few days. So I uninstalled the patch that had been applied earlier that day (there’s a slight possibility that it was SP1, but I don’t believe so), and the server went to reboot. Now I was in an endless cycle of reboots, and there was no turning back. Oh, but let’s count the positives that came out of this. I now have a few future posts in the works (when I can find the time with two young children): the Microsoft Disaster Recovery Toolset, an Operating System Installation Guide and Checklist, a SQL Server Installation Guide and Checklist, and recovery options for when you have hit a wall.
Let’s get back to the biggest lesson learned here. You really do need a paging file on your boot volume. Coming from the Windows side of things, the best document I could find was a TechNet post by CC Hameed, which suggests RAM + 1 MB for a complete dump. For something specific to Windows Server 2008 R2, you can look at these Memory Dump Options. You can also learn how to generate a dump file in Windows Server 2008.
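To put rough numbers on the two sizing rules mentioned in this post (RAM + 1 MB for a complete dump, and my old habit of 1.5 times the RAM), here is a minimal Python sketch. The function names and the RAM sizes in the loop are made up for illustration; this is just arithmetic, not a sizing recommendation for your system.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def complete_dump_pagefile_bytes(ram_bytes: int) -> int:
    """Page file size for a complete memory dump using the RAM + 1 MB rule."""
    return ram_bytes + MB


def one_and_a_half_times_ram_bytes(ram_bytes: int) -> int:
    """Page file size using the 1.5 x RAM rule of thumb."""
    return ram_bytes * 3 // 2


if __name__ == "__main__":
    # Hypothetical RAM sizes, chosen only to show how the numbers grow.
    for ram_gb in (16, 64, 128):
        ram = ram_gb * GB
        dump_mb = complete_dump_pagefile_bytes(ram) // MB
        habit_mb = one_and_a_half_times_ram_bytes(ram) // MB
        print(f"{ram_gb} GB RAM: RAM + 1 MB = {dump_mb} MB, 1.5 x RAM = {habit_mb} MB")
```

Running the numbers this way makes it obvious how quickly either rule eats into a boot volume once a server has a lot of memory.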
From the SQL Server side, a boot-volume page file that large is probably not realistic, and you’ve got to look at what your physical resources are as well. I would point you to a great article by Buck Woody (blog|twitter) entitled “The Windows Page File and SQL Server”. I hope to write a follow-up post with an example of this process from one of my servers.
So the takeaway from today: don’t leave yourself unable to diagnose a problem because there’s no paging file on your boot drive. How you size that paging file is going to be up to you and your system. Use the resources I’ve listed, and please share new ones that you find. Feel free to add any mistakes that you can’t believe you made or gotchas that others can learn from.