Author |
Topic |
ITMgr
Starting Member
2 Posts |
Posted - 2010-02-20 : 01:22:29
|
OK, so I have been working on this issue for half the day yesterday and 15 hours today. Let me explain where we have been and where we are. I am the IT Mgr and I have a staff of 3, so I don't normally do the maintenance on the servers.

We have a Windows 2003 Server with MS SQL 2005 SP2 on an HP ML350 with three 72GB 15K drives configured in a RAID 5.

It all started when the drive in slot 1 turned amber. We called HP and they said they were sending over a drive overnight. Since this is a mission critical app, we took a drive out of an unused server and put it in slot 1. We didn't notice that it was a 72GB 10K drive.

We ran the array utility and it ran to 100%. When we rebooted the server, it said the array needed to be rebuilt. The server started locking up when we were logged in and the SQL Server would not connect to the database. It showed the database in SUSPECT state. I ran the commands with allow data loss and turned it to emergency mode. It spit out some errors and I was never able to get it back online.

I called HP again to get assistance with the array and they showed that the drive in slot 2 is showing errors, which is why it was still trying to rebuild the RAID array. They said we have to back up the server before we can work on the array issue.

For some reason the tape backup stopped backing up the server in December, which I will be having a meeting with my system admin about after we solve this. So, we have no good backup.

These are the things we have tried so far.

1. Copied the MDF file to another server using Explorer. It failed and would just freeze in the copying process.
2. Ran the Acronis boot disk and tried to back up the MDF file. It will back up the LDF file, but not the MDF file. It just freezes when trying to back up the MDF file.
3. Tried to copy the MDF file in Safe Mode, and it locks up the copy process at about 20%.
4. Ran Kernel Recovery for DBF in Safe Mode, and it freezes while determining the tables and other information.

At this point, I am out of ideas. If I try to rebuild the RAID without the backup, it might not turn out well. Microsoft won't assist because they will only work on MS SQL 2005 SP3.

I have talked to some data recovery services and they want anywhere from $6K to $14K to rebuild the RAID, and that doesn't guarantee that the database will not be corrupt.

I would appreciate any ideas that someone has, software that someone recommends, or a service/company that I could reach out to for assistance. This really needs to be working for Monday morning, so I have this weekend to get it solved.

Thanks,
Troy |
|
Kristen
Test
22859 Posts |
Posted - 2010-02-20 : 02:07:09
|
Well ... an honest description, thanks for that.

"I ran the commands with allow data loss"

DBCC CHECKDB, or something else? Pretty much all bets are off after that though ...

"the tape backup stopped backing up the server in December"

Not good, but I know you know that. When you've got the stable door closed again, put in place a procedure to do a test restore (to a separate, temporary, database). No backup should be regarded as "safe" until that has happened - how often you do it is up to you - once a week minimum I would suggest for a mission critical app.

"At this point, I am out of ideas."

Do you have any SQL backup files on disk? Maybe they will copy off the RAID OK (they will most probably be smaller than the MDF file, and will not have been undergoing change after they were initially written, which may increase the chance).

If your only backup process is to back up direct to tape then it will only be by chance if you have a BAK file.

(If you do back up direct to tape, ask again when you have got this fixed; I am happy to give my opinion on why it is rarely a good approach.)

Good luck! |
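The tape job in this thread had been failing silently since December. As a minimal sketch of how that could be caught early, assuming backups are written as `.bak` files into a single folder (the folder layout, the 26-hour threshold, and the function names are assumptions for illustration, not anything from the thread), a scheduled script could check the age of the newest backup file:

```python
import time
from pathlib import Path


def newest_backup(folder, pattern="*.bak"):
    """Return (path, age_in_hours) of the most recently modified backup, or None."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not files:
        return None
    latest = max(files, key=lambda p: p.stat().st_mtime)
    age_hours = (time.time() - latest.stat().st_mtime) / 3600
    return latest, age_hours


def backup_is_stale(folder, max_age_hours=26, pattern="*.bak"):
    """True when there is no backup at all or the newest one is too old."""
    found = newest_backup(folder, pattern)
    return found is None or found[1] > max_age_hours
```

A fresh file only shows that the backup job ran. It says nothing about whether the file restores, so this complements a test restore rather than replacing it.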
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-02-20 : 03:00:38
|
quote: Originally posted by ITMgr
It showed the database in SUSPECT state. I ran the commands with allow data loss and turned it to emergency mode. It spit out some errors and I was never able to get it back online.

If Emergency Mode repair failed (as you seem to indicate here), there is no fix. Emergency Mode repair is the absolute last resort; if it fails, there's nothing else that can be done.

I would suggest one of the following:

1) Find your latest backups, restore and accept the loss of a couple of months of data.
2) While the DB is in Emergency mode, script out all the objects (some may fail), export all the data (some will almost certainly fail) and recreate the database, accepting the loss of data.

Bottom line, this database is not coming back and there's no solution that doesn't include data loss somewhere.

A mission critical database with no backups since Dec? Sounds like someone hasn't been doing his job properly. Let me guess, you have no DBA there?

btw, if by tape backup you mean copying all the files on the server to tape, any backups that you have of the SQL files will likely be useless. File-system backup is not the way that a SQL Server should be backed up.

--
Gail Shaw
SQL Server MVP |
|
|
ITMgr
Starting Member
2 Posts |
Posted - 2010-02-22 : 18:19:14
|
Forgive me for this being so long, but it was quite the experience and it has a happy ending, no, make that a miraculous ending.

I wanted to give everyone an update since I think some of the information I learned over the last 2-4 days would help anyone else who is facing the problem I was facing: not being able to get to my data because of a corrupt database and no backups.

First, thanks for replying above. I appreciate it, and that is why I posted to this forum: I was hoping to get responses from the more experienced DBAs.

I was able to recover my data, but I didn't do it myself; I got in touch with a company named Bravesoft. They worked with me late into last night on the recovery. I think the biggest thing for non-DBAs is the lack of knowledge that we have in this area, because it really is a unique specialty in IT that requires understanding the system admin and developer roles together.

So, I spent all weekend in my data room. I got about 20 minutes of sleep on my office floor on Sunday morning, when I finally started to crash. It was only 20 minutes because my phone was ringing and vibrating from people trying to get statuses as well as people finally calling me back to help me.

The .mdf for the database was never able to be pulled off of the server because of read/write issues on the 2nd disk. I was able to get into safe mode and disable all of the SQL services, which stopped the server from freezing up (I learned that the RAID rebuild and SQL don't play nice together on corrupt drives).

I had access to every other file on the server, including other databases, except for the corrupt database which had all of my main data. I went and bought a USB drive and was able to copy everything onto it except the main.mdf file and any .ldf files, because my USB drive was FAT32. I copied my .ldf files to my test server, including the .ldf file for my main database.

I searched and found my last .bak file for my main database, which was done on December 11th.
I then saw that my main.ldf file was 14 Gig in size, which pretty much meant that it had not been backed up or deleted since the last time a .bak file was created. When I found these I didn't realize exactly what a life/job saving set of files they were.

For 2 straight days I searched the internet for any solution or tool that could possibly help me to get my data back (I will list them below). I have never searched Google so much in my life. I consulted with anybody I could find that knew about SQL 2005.

On Sunday morning I got a call from Bravesoft after searching for Emergency Remote DBA people in Google. In the course of this event I must have talked to 10 different people/companies, and when I told them my issue everyone's conversation always ended with "sorry, you're screwed".

Bravesoft pretty much told me in the first call that it sounded like they could help me and that they wanted to get their team together to go over the options. I spoke to 2 DBAs and 1 hardware admin guy; they all just asked questions, trying to figure out the best plan of action. I wasn't thrilled with their first suggestion of having to fix the hardware before they could realistically start trying to recover the database, but in the back of my head I knew it was the best answer.

They spent about 3 hours connected in and trying to repair/recover/copy the database, and they updated me about every hour on my status. About 1 hour before they called me to say it wasn't looking good, I got the call from my boss, where he sounded very worried, upset, and concerned about where we stood in our recovery. I told him that I was getting worried too, but that I had a database company looking at the data and a hardware company looking at the hardware, and that if they both came back with bad news, we were back to square one and we couldn't be online Monday morning.
I told him that if we could get any of the things I had been working on to work, then the recovery would be quick, which was exactly the case.

When Bravesoft called the last time, they started out with "it's not looking good". I then asked them if it was possible to recover the data from the .bak file and the .ldf file. I had been researching this and my hardware guy had told me it was possible.

At 10:15 Sunday night, they started trying to rebuild off of the .bak and .ldf files. I can honestly say that, for the first time since I was told that we had no backup of the database, I was able to breathe. At 12:35pm all of my data was restored to my test server, and with ZERO data loss... let me repeat that: "MY DATA WAS RESTORED TO MY TEST SERVER WITH ZERO DATA LOSS." We are talking a frickin miracle here. I had emailed Paul Randal and even he said that I was going to have data loss (very cool talking with someone that created a command in MS SQL; that shows you how much of an IT geek I am).

So, my moral is: don't give up until you have to. On the last call with my boss before the data restore I was close to calling my efforts a failure, but it took me all of that time and research to get to the point I was at. Keep chugging away at ideas; your biggest issues are time and money. I downloaded any trial software I could find on data recovery, database recovery, RAID recovery, and transaction log recovery, just so I could see what my options were for getting any of the data I could off of that drive.

So, here are the tools that I researched and my findings.

Kernel Recovery for mdf - I actually let this run all night and when I came in, my data was showing as recoverable and I could see the tables in their window.
The downside is that even if you buy the full version for $399, you have to rerun the full version and not just enter a key to retrieve your data from the trial version.

Raid Reconstructor - I wasn't able to run this because I kept trying other options as well, and they would freeze the system and I would have to reboot. Looks good though.

LivePerson - I went to the MS SQL section of databases and looked for anyone who sounded like they had system admin experience with SQL. For $.50/minute this was money well spent to ask them questions that I didn't know about SQL. When you don't have any SQL resources, you feel very alone on an island; these people provided that sounding board for my thoughts, ideas, and questions.

Progent - They provide remote IT people and charge by the minute; it's like the perfect combination of being a geek/prostitute. They weren't able to help me fix my problem, but it was good to get them to look at my issue and answer some questions I had.

Apex SQL Log - This is what got me thinking about the transaction log as a solution. I wasn't able to run the software because it took so long and I was still in the testing phase of different ideas. But it's awesome to think you can retrieve your data from only the transaction log.

BRAVESOFT.com - These guys saved my @ss. I don't want to seem like I am going all commercial on these guys, but they were just awesome. I even asked at one point if they were available all of Sunday night to work until we figured out a solution and they replied with something similar to "of course, we will do what it takes".

So, sorry for the long post, but after spending so many hours surfing the net looking for solutions to my problems, including a corrupt hard drive, a corrupt database, and a lack of DBA skills, and searching Google like an addict, my data is recovered, my system is up, and I am very grateful for everyone's assistance.

Troy

BTW... 1st step after my data was restored? You guessed it, I backed up my databases, made multiple copies on different machines, and ran a tape backup of the server. |
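On the USB drive problem described above: FAT32 cannot store a single file larger than 4 GiB minus one byte, which is presumably why the 14 GB .ldf would not go onto the FAT32 stick. A small sketch, with hypothetical function names, that lists which files in a folder would be rejected before a copy is attempted:

```python
from pathlib import Path

# FAT32 stores file sizes in 32 bits, so the largest file is 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def fits_on_fat32(size_bytes):
    """True when a file of this size can be stored on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE_BYTES


def files_too_big_for_fat32(folder):
    """Return the sorted names of files in `folder` that exceed the FAT32 limit."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not fits_on_fat32(p.stat().st_size)
    )
```

For files over the limit, the usual options are a drive formatted with NTFS or a copy over the network to another machine.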
|
|
Kristen
Test
22859 Posts |
Posted - 2010-02-23 : 02:35:15
|
Glad to hear it worked out OK. The huge LDF full of all your transactions since the last backup is indeed doing what it is designed to do (albeit not normally in that way!).

Time to get the stable door properly bolted now; come back if you have any questions on that. |
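The thread does not say which commands Bravesoft actually ran, so the following is only a sketch of the generic sequence that a full backup plus log backups is restored with in SQL Server: `RESTORE DATABASE ... WITH NORECOVERY`, then each `RESTORE LOG` in order, with `RECOVERY` on the last one. The helper names, database name, and file paths are hypothetical. Getting a usable log backup out of an orphaned .ldf is a separate step that is not shown here, and a restore onto a different server may also need `MOVE` clauses.

```python
def quote_literal(value):
    """Quote a value as a T-SQL Unicode string literal (doubles single quotes)."""
    return "N'" + value.replace("'", "''") + "'"


def quote_name(name):
    """Bracket-quote a T-SQL identifier (doubles closing brackets)."""
    return "[" + name.replace("]", "]]") + "]"


def build_restore_script(database, full_backup, log_backups=()):
    """Return T-SQL that restores one full backup, then log backups in order."""
    db = quote_name(database)
    logs = list(log_backups)
    # The full backup stays in NORECOVERY so that log backups can still be applied.
    script = [
        f"RESTORE DATABASE {db} FROM DISK = {quote_literal(full_backup)} WITH NORECOVERY;"
    ]
    for index, log in enumerate(logs):
        # Only the final log restore brings the database online.
        mode = "RECOVERY" if index == len(logs) - 1 else "NORECOVERY"
        script.append(
            f"RESTORE LOG {db} FROM DISK = {quote_literal(log)} WITH {mode};"
        )
    if not logs:
        # No log backups: recover the database straight after the full restore.
        script.append(f"RESTORE DATABASE {db} WITH RECOVERY;")
    return "\n".join(script)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_restore_script(
        "Main", r"D:\backup\main_full.bak", [r"D:\backup\main_tail.trn"]
    ))
```

Generating the script rather than typing it by hand keeps the order of the log files and the single final `RECOVERY` consistent, which is where manual restores tend to go wrong.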
|
|
David Singleton
Starting Member
35 Posts |
Posted - 2010-02-23 : 05:10:35
|
quote: Originally posted by ITMgr
...
Troy

BTW... 1st step after my data was restored? You guessed it, I backed up my databases, made multiple copies on different machines, and ran a tape backup of the server.

Congratulations on recovering the data, but just one point. It was mentioned in an earlier post, but maybe you missed it.

A backup is not a backup till you have done a FULL RESTORE on a different machine and proven that the restored database is fully working. If you back up to tape, then you MUST have a different tape drive to restore to; the machine you restore to needs to be PHYSICALLY, not VIRTUALLY, a different box.

I see all too often a customer that says "Our DB crashed, but no problems, we have a backup", only to find that they don't really have a backup.

An extreme case was a client that had meticulously made tape backups every night. One day the server seriously died from a power surge that burnt everything. They got a new machine, with a new tape drive. When they tried to restore the DB (from a tape they had tested the day before), the tape was blank. Turns out the tape drive was out of alignment and the tapes created could only be read on the old tape drive.

In the end they got a company to pull the guts out of the tape drive and put it in a new box with new electronics, and recovered the tape. But it was an interesting lesson. Even if it's just a cheap desktop, get a machine dedicated to testing backups.

David Singleton
Microsoft MVP Dynamics NAV |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-02-23 : 05:20:28
|
I have known of a tape drive with a stepper motor (to move the heads between the tracks) that had failed.

It successfully backed up, all on track one, and verified ... but only the last part of the data was there ...

A trial restore is the only way to be sure that the data on tape is "known-good".

I think it is also worth keeping database BAK files online for a while, and copied to a.n.other machine (as soon as they are made).

The chances are that 99% of the time you will want to recover from today/yesterday's backup. And it will be URGENT! LIKE NOW!!! So getting the tape (which might be offsite, or the tape drive busy recovering someone else's stuff) is second-place to being able to just restore from recent BAK files on-disk.

If you want to restore old data to analyse it looking for fraud ... then you need your tapes ... |
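For the "copied to another machine as soon as they are made" step, here is a minimal sketch that copies a backup file and compares SHA-256 checksums of source and copy. The function names and the idea of a destination folder (for example a mounted network share) are assumptions for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, dest_dir):
    """Copy `source` into `dest_dir` and fail loudly if the copy differs."""
    dest = Path(dest_dir) / Path(source).name
    shutil.copy2(source, dest)
    if sha256_of(source) != sha256_of(dest):
        raise IOError(f"checksum mismatch after copying {source}")
    return dest
```

A matching checksum only proves that the copy is identical to the original. Whether the backup itself is good is still only known after a trial restore.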
|
|
David Singleton
Starting Member
35 Posts |
Posted - 2010-02-23 : 05:50:39
|
quote: Originally posted by Kristen
I have known of a tape drive with a stepper motor (to move the heads between the tracks) that had failed.
It successfully backed up, all on track one, and verified ... but only the last part of the data was there ...
Trial Restore only way to be sure that the data on tape is "known-good".
I think it is also worth keeping database BAK files online for a while, and copied to a.n.other machine (as soon as they are made).
The chances are that 99% of the time you will want to recover from today/yesterday's backup. And it will be URGENT! LIKE NOW!!! So getting the tape (which might be offsite, or tape drive busy recovering someone else's stuff) is second-place to being able to just restore from recent BAK files on-disk.
If you want to restore old data to analyse looking for fraud ... then you need your tapes ...

Yes, very good point. I have a lot of problems explaining to clients the difference between backups (which are used for recovery purposes) and archives (which are used to go back and look at history).

Both are important, both have a role, BUT both need to be managed differently.

David Singleton
Microsoft MVP Dynamics NAV |
|
|