Understanding journal wrap errors
Journal wrap errors occur when enough changes happen while FRS is turned off that the last USN (update sequence number) change FRS recorded during shutdown no longer exists in the USN journal at startup. The risk is that files and folders in FRS (File Replication Service) replicated trees may have changed while the service was turned off, with no record of the change left in the USN journal. To guard against data inconsistency, FRS asserts into a journal wrap state.
To put it more simply: FRS has an internal database that lists all the files and folders it is replicating, each identified by a globally unique ID (GUID). The database also contains a pointer to the last NTFS disk operation (in the USN journal, also called the NTFS journal) that the FRS service processed.
If a user changes a file or folder on a disk, the following happens:
- The operation is picked up by NTFS and an entry is made in the NTFS Journal
- FRS monitors the NTFS Journal for changes and notes that a change has been made to that file
- FRS compares the entry with its record of the last NTFS Journal event that it processed, to check whether it has already processed this entry
- If it hasn’t processed it already, it looks at whether it is a file that it should replicate
- If it should be replicated, the file goes into the normal process of staging, replicating, etc.
- FRS increments the entry in its database about the NTFS Journal event that it has processed so it won’t consider it again
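The steps above can be sketched as a small simulation. This is only an illustrative model, not FRS code: the `JournalEntry` and `ReplicaState` classes and the `process_journal` function are hypothetical names, and the journal is reduced to a list of numbered entries.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    usn: int   # update sequence number assigned by NTFS
    path: str  # file or folder that changed


@dataclass
class ReplicaState:
    # Hypothetical stand-in for the FRS database: the replicated root and
    # the USN of the last journal entry that was processed.
    replicated_root: str
    last_processed_usn: int = 0
    staged: list = field(default_factory=list)


def process_journal(state, journal):
    """Walk the journal in USN order and stage changes under the replicated root."""
    for entry in sorted(journal, key=lambda e: e.usn):
        if entry.usn <= state.last_processed_usn:
            continue  # already processed, so it is not considered again
        if entry.path.startswith(state.replicated_root):
            state.staged.append(entry.path)  # would go on to staging and replication
        # Advance the pointer even for files that are not replicated.
        state.last_processed_usn = entry.usn
    return state
```

Calling `process_journal` twice with the same journal stages nothing new the second time, because the stored pointer already covers every entry.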
Suppose the replicated files have changed but the DCs have not been communicating with each other, because a replication partner was shut down for a long time, FRS was not running, or there was a communication failure in the network. When communication is reestablished, FRS still knows the last NTFS Journal entry that it processed, and it compares this with the current NTFS Journal the next time it starts.
The next time the FRS service starts, it sees that it has missed NTFS operations on the disk (it compares its last processed NTFS operation with the current NTFS Journal). This is when FRS reports that it has reached a journal wrap state: the NTFS Journal has wrapped around, and FRS no longer knows the current state of things on the disk.
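A minimal sketch of the wrap check follows, assuming a simplified journal that holds a fixed number of entries and numbers them consecutively (real USNs are not consecutive). `BoundedJournal` and `is_journal_wrapped` are hypothetical names used only for illustration.

```python
from collections import deque


class BoundedJournal:
    """Simplified change journal that keeps only the newest `capacity` entries."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)  # oldest entries fall off the front
        self.next_usn = 1

    def record(self, path):
        self.entries.append((self.next_usn, path))
        self.next_usn += 1

    def oldest_usn(self):
        return self.entries[0][0] if self.entries else None


def is_journal_wrapped(journal, last_processed_usn):
    """True when the entry the service last processed is no longer in the journal."""
    oldest = journal.oldest_usn()
    return oldest is not None and last_processed_usn < oldest
```

If the service stops after processing entry 2 and many more changes are recorded while it is down, entry 2 is overwritten and the check returns True on the next start.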
Identifying the replication errors
1) As discussed above, when there is a replication failure you will see journal wrap errors in the event logs.
2) You can also confirm that replication has failed by using the Windows tool 'replmon' (Replication Monitor). To check this, follow the steps below. (Remember that 'replmon' is not included in Windows Server 2008; however, you can install it separately.)
- Click Start, click Run, type 'replmon', and then press ENTER. This opens the Replication Monitor window.
- Add the servers one by one to observe the status of replication. Once a server is added, right-click it and select the option 'Show Group Policy Object Status'.
- A window opens that shows the status of GPO replication for that server.
- Here you can see which servers have a good copy of SysVol and which have a corrupted one. When replication has succeeded on a server, the 'Synch Status' column is blank and the 'Version' and 'SysVol Version' columns have identical numerical values.
- When there are replication issues, the group policy object status looks like the case described below.
In my case, three GPOs failed to replicate to the second server (Server-2): 'Clients', 'serverlabs test' and 'new group policy object'. A good copy of SysVol has a blank 'Synch Status' column, and its 'Version' and 'SysVol Version' columns have identical numerical values. For the corrupted policies, the 'Synch Status' column shows a cross mark, and the 'Version' and 'SysVol Version' values are different or show ERROR.
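The rule for reading the status window can be written down as a short check. This sketch assumes the column values have been copied by hand into dictionaries; the field names and the `failed_gpos` helper are hypothetical, and replmon itself offers no such API as far as this article is concerned.

```python
def is_healthy(row):
    """Apply the rule above: blank 'Synch Status' and matching numeric versions."""
    versions = (row["version"], row["sysvol_version"])
    if any(not isinstance(v, int) for v in versions):
        return False  # for example, the column shows ERROR instead of a number
    return row["synch_status"].strip() == "" and versions[0] == versions[1]


def failed_gpos(rows):
    """Return the names of GPOs whose status indicates a replication problem."""
    return [row["name"] for row in rows if not is_healthy(row)]
```

Each row is a dictionary with the keys `name`, `synch_status`, `version` and `sysvol_version`; any row with a non-blank status, mismatched versions, or a non-numeric version is reported as failed.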
Restoring FRS replicas
Please make sure that you have a full backup of all your domain controllers before continuing with this process. This lets you restore from backup if none of the servers has a good copy of SysVol. Generally, there are two methods you can use to resolve the replication errors.
- Non-authoritative mode restore
- Authoritative mode restore
Let us consider these one by one.
Non-authoritative mode restore
This method is used when individual members of an FRS replica set are having difficulty, such as assertions in the FRS service, corruption of the local jet database, journal wrap errors, or FRS replication failures. It is recommended to perform a non-authoritative restore before you consider an authoritative restore.
Note: Performing the steps below reinitiates replication from the replication partner, so make sure that the replication partner of the affected server has a good copy of SysVol.
Perform the steps below on the server that is affected by replication issues.
- Click Start, and then click Run.
- In the Open box, type cmd and then press ENTER.
- At the command prompt, type 'net stop ntfrs' and then press ENTER.
- Click Start, and then click Run.
- In the Open box, type regedit and then press ENTER.
- Locate the registry subkey 'HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup'.
- In the right pane, double-click 'BurFlags'.
- In the Edit DWORD Value dialog box, type D2 and then click OK.
- Quit Registry Editor, switch back to the command prompt, type 'net start ntfrs', press ENTER, and then close the command prompt.
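For servers where clicking through Registry Editor is inconvenient, the same sequence can be expressed as command lines. The sketch below only builds the commands and does not run them. Using `reg add` instead of Registry Editor is an assumption of this example rather than part of the procedure above, and `burflags_commands` is a hypothetical helper name. The value is passed in decimal (hex D2 is 210).

```python
# Registry path from the steps above, written with the HKLM abbreviation.
BURFLAGS_KEY = (
    r"HKLM\System\CurrentControlSet\Services\NtFrs"
    r"\Parameters\Backup/Restore\Process at Startup"
)


def burflags_commands(value=0xD2):
    """Build (but do not execute) the stop, set BurFlags, start command sequence."""
    return [
        ["net", "stop", "ntfrs"],
        # /v names the value, /t its type, /d its data, /f overwrites silently.
        ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
         "/t", "REG_DWORD", "/d", str(value), "/f"],
        ["net", "start", "ntfrs"],
    ]
```

On the affected server, each command could then be passed to `subprocess.run(cmd, check=True)` from an elevated prompt, after confirming that the replication partner holds a good copy of SysVol.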
The above steps trigger the following actions:
- The value of the BurFlags registry entry returns to 0.
- An event 13565 is logged to signal that a nonauthoritative restore is started.
- The FRS database is rebuilt.
- The member performs an initial join of the replica set from an upstream partner or from the computer that is specified in the Replica Set Parent registry key if a parent has been specified for SYSVOL replica sets.
- The reinitialized computer runs a full replication of the affected replica sets when the relevant replication schedule begins.
- When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
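The event sequence above lends itself to a simple progress check. The sketch assumes you have already collected the FRS event IDs in chronological order by some other means (for example, by reading them off Event Viewer); `nonauthoritative_restore_status` is a hypothetical helper, not an FRS tool.

```python
RESTORE_STARTED = 13565   # nonauthoritative restore has started
FRS_OPERATIONAL = 13516   # FRS is operational again


def nonauthoritative_restore_status(event_ids):
    """Classify restore progress from FRS event IDs listed oldest first."""
    starts = [i for i, event in enumerate(event_ids) if event == RESTORE_STARTED]
    if not starts:
        return "not started"
    # Only a 13516 logged after the most recent 13565 counts as completion.
    if FRS_OPERATIONAL in event_ids[starts[-1] + 1:]:
        return "complete"
    return "in progress"
```

A result of "in progress" that never changes suggests the problem with the FRS configuration mentioned in the last step.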
Authoritative FRS restore
This is recommended only if the above steps do not resolve the issue. The following requirements must be met before you perform an authoritative FRS restore:
- The FRS service must be disabled on all downstream partners (direct and transitive) for the reinitialized replica sets before you restart the FRS service when the authoritative restore has been configured to occur. So, on every affected server, stop the FRS service: open a command prompt and type 'net stop ntfrs'.
- Events 13553 and 13516 have been logged in the FRS event log. These events indicate that the membership to the replica set has been established on the computer that is configured for the authoritative restore.
- The computer that is configured for the authoritative restore is configured to be authoritative for all the data that you want to replicate to replica set members.
Perform these steps to start an authoritative restore on the server that has a good copy of SysVol:
- Click Start, and then click Run.
- In the Open box, type cmd and then press ENTER.
- At the command prompt, type 'net stop ntfrs' and then press ENTER.
- Click Start, and then click Run.
- In the Open box, type regedit and then press ENTER.
- Locate the registry subkey 'HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup'.
- In the right pane, double-click 'BurFlags'.
- In the Edit DWORD Value dialog box, type D4 and then click OK.
- Quit Registry Editor, switch back to the command prompt, type 'net start ntfrs', press ENTER, and then close the command prompt.
- Start the FRS service on all the other servers that had bad copies of SysVol, and replication will begin.
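Because the order of operations across servers matters here, it can help to write the plan down before touching any machine. The sketch below just produces an ordered checklist. The function name and the server names in the usage note are hypothetical, and the action strings are labels for a human operator rather than commands that the code executes.

```python
def authoritative_restore_plan(authoritative, others):
    """Return (server, action) pairs in the order described in this section."""
    # 1. Stop FRS on every other member first.
    plan = [(server, "net stop ntfrs") for server in others]
    # 2. Reinitialize the server that holds the good copy of SysVol.
    plan += [
        (authoritative, "net stop ntfrs"),
        (authoritative, "set BurFlags to D4"),
        (authoritative, "net start ntfrs"),
    ]
    # 3. Only then restart FRS on the members that had bad copies.
    plan += [(server, "net start ntfrs") for server in others]
    return plan
```

For example, `authoritative_restore_plan("DC1", ["DC2", "DC3"])` lists the two stop actions for DC2 and DC3, then the three actions on DC1, then the two start actions.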
Performing the above steps makes the following changes:
- The value for the BurFlags registry key is set back to 0.
- An event 13566 is logged to signal that an authoritative restore is started.
- Files in the reinitialized FRS replicated directories remain unchanged and become authoritative on direct replication partners. Additionally, the files become authoritative on indirect replication partners through transitive replication.
- The FRS database is rebuilt based on current file inventory.
- When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
Now add the servers to Replication Monitor and make sure that there are no replication issues.