Friday 6 June 2014

Recovering from a complete domain-level Active Directory crash


Having an entire Active Directory domain fail is almost unthinkable. Almost. Ignore the odds and prepare for the worst, as Brien Posey shows you how to recover from this unlikely but devastating event.

What could possibly be worse than having a server on your network crash? How about having every server on your network become inaccessible because of a massive, domain-wide Active Directory failure? As horrifying as a total domain failure may sound, the solution could be just as scary. You may be able to salvage the domain, but, if you can’t, you may have to strip the domain bare, delete the domain, re-create the domain, and finally, add the servers back into the domain. I’ll show you how to recover from a domain-level Active Directory failure and avoid this worst-case scenario.

Back up now before it's too late
Recovering from a domain-level Active Directory failure is a major undertaking. Make absolutely certain that you have a good backup of your entire Active Directory before you try any of these techniques.
You should also make sure that you’ve tried every other possible diagnostic and repair technique before attempting to work through any of the techniques that I’m about to discuss. These techniques may solve your problem, but they stand an equal chance of destroying Active Directory completely. If you’re sure you’re ready, let’s roll the dice.


Trying to save the domain
As you may recall, various operations master roles are associated with some of the critical Active Directory functions. Some of these roles are forest-based and others are domain-based. A domain can’t function if the domain-level operations master roles aren’t being performed. Therefore, there’s a chance that, rather than having a corrupt Active Directory database, you may simply have a domain controller that’s failing to perform its designated roles.

The first step to recovering from such a problem is to note which servers are performing which roles. You need to make note of the forest-level roles as well as the domain-level roles. If the forest-level roles are being handled by a domain controller within the failing domain, you certainly don’t want to do anything to disrupt the servers that are performing the forest-level roles. If the forest-level roles are handled by a domain controller within the failing domain and the rest of your domains seem to be functioning correctly, it’s a good indication that you may simply have a domain-level role failure rather than an entire domain failure.

When an entire domain fails, the other domains usually can’t communicate with it and also stop functioning. For detailed instructions on locating the various server roles, see “Understanding Windows 2000 domain controller operations master roles.”
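If it helps to keep that inventory straight, the bookkeeping can be sketched in a few lines of Python. This is only an illustration: the server names and the holders mapping are hypothetical and would be filled in by hand from what you find on your own network. The grouping of the five operations master roles into forest-level and domain-level roles is the standard one.

```python
# The five operations master roles, grouped by scope.
FOREST_ROLES = {"schema master", "domain naming master"}
DOMAIN_ROLES = {"rid master", "pdc emulator", "infrastructure master"}


def roles_in_failing_domain(role_holders, failing_dcs):
    """Return the roles held by the failing domain's controllers, by scope.

    role_holders: dict mapping a role name to the server that holds it.
    failing_dcs: names of the domain controllers in the failing domain.
    """
    failing = {name.lower() for name in failing_dcs}
    result = {"forest": [], "domain": []}
    for role, server in sorted(role_holders.items()):
        if server.lower() not in failing:
            continue  # held by a server outside the failing domain
        if role.lower() in FOREST_ROLES:
            result["forest"].append(role)
        elif role.lower() in DOMAIN_ROLES:
            result["domain"].append(role)
    return result


# Hypothetical inventory, recorded by hand.
holders = {
    "Schema master": "DC1",
    "Domain naming master": "DC1",
    "RID master": "DC2",
    "PDC emulator": "DC2",
    "Infrastructure master": "DC3",
}
report = roles_in_failing_domain(holders, ["dc2", "dc3"])
print(report)
```

An empty forest list in the report means the failing domain holds no forest-level roles, so the servers to handle with special care are all elsewhere.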

Once you’ve determined which server holds the domain-level operations master roles, try transferring those roles to another domain controller. If no other domain controllers exist within the domain, try temporarily installing a copy of Windows 2000 Server onto a spare PC and making the spare server a domain controller. Then, attempt to transfer the roles to it.

To transfer a role, open Active Directory Users And Computers. Next, right-click on the domain name and select the Connect To Domain Controller command from the resulting context menu. Now, select the domain controller that you want to transfer the role to from the list of domain controllers and click OK. Finally, right-click on the domain name and select the Operations Master command from the context menu.

You’ll see the Operations Master properties sheet. As you go through each tab on the properties sheet, you’ll see the name of the server that presently holds the designated role and the name of the server that you selected to transfer the role to. To transfer the role, click the Change button. After you’ve transferred each role, click OK to close the properties sheet.

If the domain’s problems are serious enough, you may not be able to transfer the roles. If this is the case, you’ll have to seize the roles. The stipulation for doing so is that in order to seize a role, you must take the server that presently holds the roles offline. Once it’s offline, you must never reattach the server to the network or you’ll risk destroying the entire Active Directory. Of course, you can always format the server and bring it back online with a new copy of Windows.

To seize a role, open a command prompt window and enter the following commands, replacing servername with the name of the server that you’re going to move the role to:
NTDSUTIL
ROLES
CONNECTIONS
CONNECT TO SERVER servername
QUIT


Now, enter one of the following commands to seize the role (the exact command depends on which role you want to seize):
SEIZE INFRASTRUCTURE MASTER
SEIZE RID MASTER
SEIZE PDC
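NTDSUTIL also accepts its commands as command-line arguments, with each multiword command passed as a single quoted argument, so the same sequence can be assembled ahead of time. The Python sketch below only builds the argument list and does not run anything. The function name, the short role keys, and the server name DC2 are hypothetical, chosen for illustration.

```python
# Map a short, hypothetical key to the NTDSUTIL seize command shown above.
SEIZE_COMMANDS = {
    "infrastructure": "seize infrastructure master",
    "rid": "seize RID master",
    "pdc": "seize PDC",
}


def build_seize_args(target_server, roles):
    """Build the NTDSUTIL argument list that seizes the given roles.

    target_server: the server that will take over the roles.
    roles: keys from SEIZE_COMMANDS, in the order they should be seized.
    """
    args = [
        "ntdsutil",
        "roles",
        "connections",
        f"connect to server {target_server}",
        "quit",  # leave the connections menu, back to the roles menu
    ]
    for role in roles:
        args.append(SEIZE_COMMANDS[role])  # KeyError on an unknown role
    args += ["quit", "quit"]  # leave the roles menu, then NTDSUTIL itself
    return args


seize_args = build_seize_args("DC2", ["rid", "pdc"])
print(seize_args)
```

On a machine where NTDSUTIL is installed, a list like this could be handed to something such as Python's subprocess.run. Given how destructive a seizure is, typing the commands interactively and reading each response is the safer habit.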


Removing an orphaned domain
Normally, when you use the DCPROMO command to demote a domain controller, the DCPROMO utility will ask you if the domain controller that you’re demoting is the last domain controller in the domain. If it is, after the demotion is complete, the DCPROMO utility will automatically remove all of the metadata related to the domain from Active Directory.

In other words, DCPROMO should erase the domain from Active Directory automatically after the last domain controller has been demoted. Unfortunately, things aren’t always this easy. It’s possible for Active Directory corruption or a catastrophic hardware failure to shut down domain controllers without you ever having the chance to demote them. If Active Directory thinks that domain controllers still exist within the domain, you won’t be able to delete the domain through the usual method. When Active Directory can no longer recognize or interact with a particular domain because of corruption within Active Directory, the domain is said to be “orphaned.”

To remove an orphaned domain, start NTDSUTIL by opening a command prompt, typing NTDSUTIL, and pressing [Enter]. When the NTDSUTIL prompt appears, type METADATA CLEANUP and press [Enter].

Next, connect to the server you’ll be cleaning up by typing CONNECTIONS and pressing [Enter]. Doing so will display the Server Connections prompt. Now type CONNECT TO SERVER servername, where servername is the name of the server you’re connecting to, and press [Enter]. After you enter this command, you should see two messages. The first states that NTDSUTIL is binding itself to the specified server using the supplied credentials. The second confirms the connection.

If you don’t receive these messages, try reentering your credentials using the SET CREDS command and then try the CONNECT TO SERVER command once again. If the command still doesn’t work, check your ability to communicate with the target server.

After you connect to the target server, type QUIT and press [Enter]. This will return you to the METADATA CLEANUP prompt. Next, type SELECT OPERATION TARGET and press [Enter]. This will take you to the SELECT OPERATION TARGET prompt. Next, enter the LIST DOMAINS command. The NTDSUTIL command will inform you of how many domains it is aware of in the forest and will display each domain and a corresponding number.

Locate the domain that the failing domain controller belongs to and make note of the number that corresponds to it. Now, type SELECT DOMAIN number and press [Enter]. In this command, you should replace the word number with the number that corresponds to the domain with which you want to work.
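If the forest contains many domains, picking the number out of the listing by eye is error-prone. The Python sketch below shows one way to look it up from text pasted out of the console. It assumes that each domain appears on its own line as a number, a hyphen, and the domain's distinguished name. That layout and the sample domain names are assumptions, so check them against what your own NTDSUTIL session prints.

```python
import re

# Assumed line layout: "<number> - <distinguished name>".
DOMAIN_LINE = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def find_domain_number(list_domains_output, domain_dn):
    """Return the number listed next to domain_dn, or None if it is absent."""
    wanted = domain_dn.lower()
    for line in list_domains_output.splitlines():
        match = DOMAIN_LINE.match(line)
        if match and match.group(2).lower() == wanted:
            return int(match.group(1))
    return None


# Hypothetical console output, pasted in as text.
sample_output = """Found 2 domain(s)
0 - DC=example,DC=com
1 - DC=sales,DC=example,DC=com
"""
number = find_domain_number(sample_output, "dc=sales,dc=example,dc=com")
print(number)
```

A result of None means the name wasn't found in the listing, which is a cue to stop and recheck rather than guess at a number.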

Once you’ve selected the domain, you’ll see a confirmation message that tells you what domain you’re attached to and also informs you that you aren’t presently connected to any specific site or server. This is fine because you’re working with a domain-level issue rather than a site-level or a server-level problem. When you get this confirmation message, double- and triple-check to make sure that you’ve selected the correct domain. Otherwise, you could destroy the wrong domain.

When you’re sure that you’re working with the correct domain, enter the QUIT command to return to the METADATA CLEANUP prompt. Finally, type the command REMOVE SELECTED DOMAIN, take a deep breath, and press [Enter]. You should see a confirmation message stating that the domain was deleted.

If you receive an error message, it could be because the domain has been deleted already through the natural process or by another administrator. It’s also possible that you could get an error message if Active Directory still contains computer accounts or domain controller accounts. Simply having computer or domain controller accounts present won’t always make the process fail, but I have seen it happen. If you receive such an error, use the technique that I’ll present in the next section to fix the problem.

You must now complete the process by entering the QUIT command twice, followed by the EXIT command. Remember that you’ve only deleted the reference to the domain from a single domain controller. You must wait for the next replication cycle to complete before the domain is removed from the other domain controllers. Once the replication cycle is complete, you’ll be free to begin rebuilding the domain.

Removing orphaned domains that have computer or domain controller accounts present
If my previous technique fails, it could be because computer or domain controller accounts still exist within Active Directory. Normally, you’d use a domain controller within the domain to remove the computer accounts and then do a DCPROMO to demote all of the domain controllers. However, if all of your domain controllers are failing, this may not be an option.

If this is the case, take all of the domain controllers in the failing domain offline. Next, use a domain controller in a different domain to open Active Directory Users And Computers. Remove all of the computer accounts from the failing domain. Remember that since the domain controllers are gone, the computers will never be able to attach to the old domain again once they’re removed from Active Directory. You’ll have to reconstruct the domain and then manually reattach the computers to the new version of the domain.

Once you’ve removed all of the computer accounts, you can use the technique in the Daily Drill Down “Picking up the pieces after a failed domain controller demotion” to remove all of the domain controller accounts from the domain. Once you’ve completed this step, you should have no trouble removing the orphaned domain by using the technique that I discussed earlier.

Conclusion
You may be familiar with the saying, “We had to destroy the village in order to save it,” the point of which applies to failed domains as well as doomed villages. Recovering from a complete domain failure can be a study in futile rescue efforts. You may be able to save the failed domain, but, if you can’t, you may have to completely destroy and then rebuild the domain that’s damaged. By going ahead and destroying the malfunctioning domain, you’re cutting out the corruption in Active Directory. Once you’ve removed the corruption, Active Directory will be healthy once again. You’ll then be free to rebuild the domain.
