
Thursday, December 23, 2010

MSExchangeRepl 2104 / LogCopy Failed: Exchange 2007 on Windows 2008 / 2008 R2 with CCR

Are you struggling to resolve multiple SCOM alerts about "Replication Health transaction failures" when running Exchange 2007 with Cluster Continuous Replication (CCR) on Windows 2008 / 2008 R2? The alerts look like this:

[MP] Microsoft.Exchange.2007 [DN] Domain Name [ADN] Server Name [AN] Replication Health transaction failures. [AD] Some of the Replication Health transactions failed. The initial event reported: Storage group copy health checks failed. The local server is part of a Windows failover cluster and it has the Mailbox server role installed, but no clustered mailbox server is configured. The local server is a Standby server for one or more storage groups with Standby Continuous Replication enabled. SGCopyFailed: Standby Continuous Replication for storage group 'XXXX\ZZZ' is in a 'Failed' state on server 'Server Name'. The error message is: Log file action LogCopy failed for storage group XXXX\ZZZ.

You might also notice the following event logged on your Exchange server:

Log Name: Application
Source: MSExchangeRepl
Event ID: 2104
Task Category: Service
Level: Error
Keywords: Classic
User: N/A
Computer: MACHINE
Description: Log file action LogCopy failed for storage group XXXX\ZZZ.

Reason: CreateFile(\\Server\StorageGroupGUID$\LogFile.log) = 2
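The "= 2" at the end of the Reason line is a Win32 error code: 2 is ERROR_FILE_NOT_FOUND, which matches the stale "file not found" status explained in this post. If you are sorting through many of these events or alerts, a small helper can pull out the UNC path and the code. This is only a hypothetical triage sketch: the function name and the short error table are my own, and the regex assumes the Reason text has exactly the `CreateFile(<path>) = <code>` shape seen here.

```python
import re

# A few well-known Win32 error codes (illustrative subset, not exhaustive).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Hypothetical pattern for the "Reason:" line of MSExchangeRepl event 2104.
# Assumes the path itself contains no closing parenthesis.
_REASON_RE = re.compile(r"CreateFile\((?P<path>[^)]*)\)\s*=\s*(?P<code>\d+)")


def parse_reason(text):
    """Return (unc_path, code, symbolic_name) or None if the line does not match."""
    match = _REASON_RE.search(text)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("path"), code, WIN32_ERRORS.get(code, "UNKNOWN")


if __name__ == "__main__":
    line = r"Reason: CreateFile(\\Server\StorageGroupGUID$\LogFile.log) = 2"
    print(parse_reason(line))
```

Events whose code maps to ERROR_FILE_NOT_FOUND are the ones worth checking against the caching behaviour described next; other codes (for example access denied) point to a different problem.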

The replication service is extremely aggressive in its attempts to copy log files. It always knows which log file in the series must be copied next to the passive node, and as part of normal processing it may repeatedly check for that file and attempt to copy it. Some of these checks may therefore query for a log file that is not yet fully available. Under Windows 2003 this was not necessarily an issue, but Windows 2008 introduces a component in SMBv2 that can make it one.

SMBv2 introduces status caching in the LanManWorkstation (SMB client) service. When an application requests information from a file share, the workstation service caches the response from the server hosting the share, and subsequent requests for the same information are answered from that cache instead of contacting the server again. The cache eventually expires (in our case, by the time replication has failed and been resumed, or a switch between replication host names occurs). In the meantime, the replication service has been told that the log file in question should now be available for copy, attempts to copy it, and receives an older, cached status saying the file is not there, even though the file does exist on the source at the time of the attempt. The replication service treats this as an error condition and takes action.
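To make the race concrete, here is a toy model of a client-side "file not found" cache. It is not SMB2 code and does not reproduce the real redirector; the class, the 5-second lifetime and the timings are arbitrary assumptions chosen only to show why a negative result cached just before the log file appears can be returned to the replication service after the file already exists on the source.

```python
class NegativeStatusCache:
    """Toy model of a client cache that remembers 'file not found' answers."""

    def __init__(self, server_files, ttl_seconds):
        self.server_files = server_files  # names that currently exist on the source share
        self.ttl = ttl_seconds            # lifetime of a cached "not found" entry
        self._not_found_until = {}        # name -> time when the cached entry expires
        self.server_queries = 0           # how often the "server" was really asked

    def exists(self, name, now):
        expiry = self._not_found_until.get(name)
        if expiry is not None and now < expiry:
            return False  # answered from cache: possibly stale "not found"
        self.server_queries += 1
        if name in self.server_files:
            self._not_found_until.pop(name, None)
            return True
        if self.ttl > 0:
            self._not_found_until[name] = now + self.ttl
        return False


def simulate(ttl):
    """Poll for the next log before and after the source finishes writing it."""
    server = set()
    client = NegativeStatusCache(server, ttl)
    results = []
    # t=0: the replication service checks for the next log; it is not there yet.
    results.append(client.exists("LogFile.log", now=0.0))
    # t=1: the source finishes the log file.
    server.add("LogFile.log")
    # t=2: the replication service tries again.
    results.append(client.exists("LogFile.log", now=2.0))
    return results


if __name__ == "__main__":
    print(simulate(ttl=5))  # [False, False]  stale "not found" at t=2
    print(simulate(ttl=0))  # [False, True]   no caching, file is seen
```

With a non-zero lifetime the second check is answered from the cache and wrongly reports the file as missing; with the lifetime set to 0 every check goes to the "server", which is the effect the registry change in this post is meant to achieve.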

From a Windows 2008 / Windows 2008 R2 perspective this is by design.

To correct these errors on an Exchange 2007 / Windows 2008 or Exchange 2007 / Windows 2008 R2 implementation, set the following registry values to zero (0) and reboot the nodes:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters
FileInfoCacheLifetime [DWORD]
FileNotFoundCacheLifetime [DWORD]
DirectoryCacheLifetime [DWORD]


If these DWORD values are not present, create them. The recommended value is 0 (the same in hexadecimal and decimal).
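If you have several cluster nodes to change, a script keeps the three values consistent. The sketch below is an assumption-laden example, not part of the referenced solution: it uses Python's standard `winreg` module (Windows only, run elevated) to create or overwrite the DWORDs and read them back, and on other platforms it just prints an equivalent .reg file you can review before importing with regedit. The function names are hypothetical; the key path and value names are the ones listed in this post.

```python
import sys

# Key path (under HKEY_LOCAL_MACHINE) and value names as listed in this post.
PARAMS_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
CACHE_VALUES = ("FileInfoCacheLifetime", "FileNotFoundCacheLifetime", "DirectoryCacheLifetime")


def build_reg_file(data=0):
    """Return the text of a .reg file that sets all three DWORDs to `data`."""
    lines = ["Windows Registry Editor Version 5.00", "",
             "[HKEY_LOCAL_MACHINE\\" + PARAMS_PATH + "]"]
    lines += [f'"{name}"=dword:{data:08x}' for name in CACHE_VALUES]
    return "\r\n".join(lines) + "\r\n"


def apply_with_winreg(data=0):
    """Create/overwrite the DWORDs directly (Windows only, needs admin rights)."""
    import winreg  # standard library module that exists only on Windows

    access = winreg.KEY_SET_VALUE | winreg.KEY_QUERY_VALUE
    # CreateKeyEx opens the key if it already exists, otherwise creates it.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, PARAMS_PATH, 0, access) as key:
        for name in CACHE_VALUES:
            winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
        # Read the values back so the change can be confirmed before rebooting.
        return {name: winreg.QueryValueEx(key, name)[0] for name in CACHE_VALUES}


if __name__ == "__main__":
    if sys.platform == "win32":
        print(apply_with_winreg())
        print("Reboot this node for the change to take effect.")
    else:
        print(build_reg_file())
```

Whichever route you take, the reboot is still required on each node, as noted above.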

The above solution is taken from, and documented in, the following blog post:
http://blogs.technet.com/b/timmcmic/archive/2010/07/11/msexchangerepl-2147-msexchangerepl-2104-msexchangerepl-2127-occurring-on-windows-2008-or-windows-2008-r2-with-exchange-2007-cluster-continuous-replication-ccr.aspx

The SMB2 client caches are explained in detail here:
http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx

This might not apply to you right now, but it is worth keeping in mind when designing an infrastructure with Exchange 2007 running on Windows 2008 and using SCOM for monitoring.

After all, who wants to receive false alerts?
