I have observed a similar situation a couple of times in the past year. Everything looks normal in Event Viewer, server component state, service status and database copy health, but database activation fails every time I try to activate a database on one of the nodes in a 4-node DAG. All database operations work on the other 3 nodes, but not on this one. The issue appears after a reboot and does not resolve even if we wait longer or do another round of reboots.
The errors received are listed below:
An Active Manager operation failed. Error: The database action failed. Error: An error occurred while trying to validate the specified database copy for possible activation. Error:
SERVER:
A server-side administrative operation has failed. The Microsoft Exchange Replication service may not be running on server server.domain.local. Specific RPC error message: Error 0x6ba (The RPC server is unavailable) from cli_RpcsGetCopyStatusWithHealthState [Server: server.domain.local]
[Database: DB01, Server: PAM Server]
Failed to mount database " DB01". Error: An Active Manager operation failed. Error: The database action failed. Error: The Microsoft Exchange Replication service may not be running on server server.domain.local. Specific RPC error message: Error 0x6ba (The RPC server is unavailable) from cli_AmMountDatabaseDirect3 [Database: DB01, Server: PAM Server]
The error is very common, and you will be flooded with possible resolutions; however, all of them involve longer procedures, such as (but not limited to):
- Verify the health of AD
- Look at the event log for AD replication errors
- Check the integrity of database
- Verify the cluster quorum status
- And so on
Solution that worked for me (on both occasions):
Move the PAM (Primary Active Manager) role to a different healthy node in the DAG (not to the server where activation failed), and then repeat the operations. If you only have a 2-node DAG, you may have no choice other than trying the other solutions mentioned earlier in this post. However, it is worth checking whether more than one IP address is assigned to the DAG members in the production network range.
Move-ClusterGroup "Cluster Group" -Node NODENAME
After running the above command, verify that the PAM role has moved:
Get-DatabaseAvailabilityGroup -Status -Identity DAGNAME | fl name,primaryActiveManager
Once you confirm that the PAM role is now held by another healthy node, try to mount/activate the database on the server where we observed the issue. It should just work!
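The move-and-verify steps above can be wrapped into a small helper. This is a minimal sketch, assuming it runs in the Exchange Management Shell on a DAG member with the FailoverClusters module available; `Move-DagPam` and `Test-PamOnNode` are hypothetical names, and `Test-PamOnNode` compares host names only so that a short name matches an FQDN.

```powershell
# Hypothetical helper: $true when the PAM reported by Get-DatabaseAvailabilityGroup
# is the given node. Only the host part is compared, case-insensitively, so a short
# name such as 'EX16-02' matches an FQDN such as 'EX16-02.lab.ed.com'.
function Test-PamOnNode {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $PrimaryActiveManager,
        [Parameter(Mandatory)] [string] $Node
    )
    $pamHost  = ($PrimaryActiveManager -split '\.')[0]
    $nodeHost = ($Node -split '\.')[0]
    return $pamHost -ieq $nodeHost
}

# Hypothetical wrapper: move the core "Cluster Group" (its owner node holds the
# PAM role) to a healthy node, then report whether the PAM is now on that node.
function Move-DagPam {
    param(
        [Parameter(Mandatory)] [string] $DagName,     # e.g. DAGNAME
        [Parameter(Mandatory)] [string] $TargetNode   # a healthy node, NOT the failing server
    )
    Move-ClusterGroup -Name 'Cluster Group' -Node $TargetNode | Out-Null
    # PrimaryActiveManager is only populated when -Status is specified
    $dag = Get-DatabaseAvailabilityGroup -Identity $DagName -Status
    return Test-PamOnNode -PrimaryActiveManager "$($dag.PrimaryActiveManager)" -Node $TargetNode
}
```

For example, `Move-DagPam -DagName DAGNAME -TargetNode NODENAME` should return `True`; after that, retry the activation with `Move-ActiveMailboxDatabase DB01 -ActivateOnServer <server>` or `Mount-Database DB01`.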
Additional info: I found this happens when more than one IP address is assigned to a single network card on Exchange for relay/HA purposes. If those IPs are reassigned to other nodes of the cluster, a communication gap between the PAM and the node can occur. Either way, there is no harm in moving the PAM before you take major steps to restore cluster communication.
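To look for the multiple-IP condition described above, here is a minimal sketch, assuming Windows Server 2012 or later (where `Get-NetIPAddress` is available); `Get-MultiAddressInterface` is a hypothetical name. It groups IPv4 addresses by adapter and reports adapters that carry more than one, skipping loopback and APIPA addresses. Note that the node currently owning a DAG IP address (if your DAG has one) will also show that address on its adapter, so review the output rather than treating every hit as a fault.

```powershell
# Hypothetical helper: report adapters that carry more than one IPv4 address.
# -Address defaults to the live configuration; pass objects with InterfaceAlias
# and IPAddress properties to evaluate saved or sample data instead.
function Get-MultiAddressInterface {
    param(
        [object[]] $Address = (Get-NetIPAddress -AddressFamily IPv4)
    )
    # Ignore loopback and APIPA (169.254.x.x) addresses, then group per adapter
    $Address |
        Where-Object { $_.IPAddress -notlike '127.*' -and $_.IPAddress -notlike '169.254.*' } |
        Group-Object -Property InterfaceAlias |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                InterfaceAlias = $_.Name
                IPAddresses    = $_.Group.IPAddress -join ', '
            }
        }
}
```

Run `Get-MultiAddressInterface` without parameters on each DAG member to check its live configuration.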
-Praveen
Recently I observed this issue during a migration from Exchange 2010 to Exchange 2016. The following symptoms were noticed, not for all users but for very few (less than 1%).
And other similar symptoms that occur due to inadequate Autodiscover information.
As you may be aware, the mailbox move disconnects the Exchange 2010 mailbox and connects the Exchange 2016 mailbox during the “Completing” stage of the migration/move process. This means the following stale information is present in the infrastructure:
The issue happens (I assume) because of incomplete replication of information, and possibly due to conflicts between connected and disconnected mailboxes. The solution in my case was fairly simple:
The above helped me restore service for the impacted users. Share your experience in the comments.
-Praveen
Since the introduction of the DAG into the Exchange Server architecture, DR situations have become easier and more comfortable to deal with. In this article I have tried to write down a simple and straightforward process for dealing with a DR scenario.
My LAB Servers:
Exchange Servers (4 nos): EX16-01, EX16-02, EX16-03, EX16-Dr1
AD Servers: 2010adc1 & 2010adc2 (working as the Witness and Alternate Witness servers, respectively)
DAG Name: DAG02 (Please note I have configured the Exchange Server 2016 DAG without a cluster administrative access point, i.e. no cluster name object)
Witness Server Information:
WitnessServer : 2010adc1.lab.ed.com
WitnessDirectory : c:\DAG02
Alt Witness Server : 2010adc2.lab.ed.com
Alt Witness Dir : c:\DAG02
Scenario: 2 servers in the primary datacenter and 2 servers in the secondary datacenter. I plan to bring down 2 servers, EX16-02 and EX16-03, and restore the DAG with the EX16-01 and EX16-DR1 servers.
Assumptions:
Step1: Bring down the servers as planned (or assume both servers crashed due to some unforeseen issue at the primary datacenter, and we expect them to come back after services are restored at the primary datacenter).
Shutdown EX16-02
Shutdown EX16-03
Check the cluster node status:
[PS]>Get-ClusterNode
Step2: Terminate the failed or partially failed servers (in our case, the servers that are OFF):
[PS]>Stop-DatabaseAvailabilityGroup -Identity DAG02 -MailboxServer EX16-02 -ConfigurationOnly
[PS]>Stop-DatabaseAvailabilityGroup -Identity DAG02 -MailboxServer EX16-03 -ConfigurationOnly
You can verify the DAG status by running the following command (-Status is needed to populate the PrimaryActiveManager and OperationalServers values):
[PS]>Get-DatabaseAvailabilityGroup dag02 -Status | fl Name,st*,prima*,oper*,*wit*
Step3: Stop the Cluster service on all the nodes that are currently running and will be part of the cluster during the DR operation. It is important to terminate/switch off all failed or partially failed servers in the DAG before proceeding to the next step.
[PS]>Stop-Service clussvc
Note: Run the above command on all the running members, in our case EX16-01 & EX16-dr1.
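Instead of logging on to each surviving member, the Cluster service can also be stopped remotely. A minimal sketch, assuming PowerShell remoting (WinRM) is enabled on EX16-01 and EX16-DR1; `Stop-DagClusterService` is a hypothetical name, and `ClusSvc` is the Windows service name of the Cluster service.

```powershell
# Hypothetical wrapper: stop the Cluster service (service name ClusSvc) on each
# surviving DAG member over PowerShell remoting and return its resulting status.
# Invoke-Command adds a PSComputerName property so results can be told apart.
function Stop-DagClusterService {
    param(
        [Parameter(Mandatory)] [string[]] $ComputerName   # e.g. 'EX16-01', 'EX16-DR1'
    )
    Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        Stop-Service -Name ClusSvc
        Get-Service -Name ClusSvc | Select-Object Name, Status
    }
}
```

For example, `Stop-DagClusterService -ComputerName EX16-01, EX16-DR1` should report `Stopped` for both nodes before you continue with Step4.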
Step4: Restore the DAG using the Restore-DatabaseAvailabilityGroup command:
[PS]>Restore-DatabaseAvailabilityGroup -Identity DAG02
Note: If you did not configure an alternate witness server for the DAG before this operation, you can provide the alternate witness server values along with the Restore-DatabaseAvailabilityGroup command.
The restore operation may take an extended period of time, depending on server resources and network latency. Please be patient until it returns the results.
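If the alternate witness was not configured beforehand, the values can be supplied on the same command. A hedged sketch that uses this lab's witness values as defaults; `Restore-DagWithWitness` is a hypothetical wrapper, while `-AlternateWitnessServer` and `-AlternateWitnessDirectory` are parameters of `Restore-DatabaseAvailabilityGroup`.

```powershell
# Hypothetical wrapper around Restore-DatabaseAvailabilityGroup that also supplies the
# alternate witness (defaults are this lab's values; replace them with your own).
function Restore-DagWithWitness {
    param(
        [string] $DagName = 'DAG02',
        [string] $AlternateWitnessServer = '2010adc2.lab.ed.com',
        [string] $AlternateWitnessDirectory = 'C:\DAG02'
    )
    # Evicts the stopped members from the cluster and brings the DAG up with the
    # started members, using the alternate witness where a witness is needed
    Restore-DatabaseAvailabilityGroup -Identity $DagName `
        -AlternateWitnessServer $AlternateWitnessServer `
        -AlternateWitnessDirectory $AlternateWitnessDirectory
    # -Status populates OperationalServers so the result can be checked immediately
    Get-DatabaseAvailabilityGroup -Identity $DagName -Status |
        Format-List Name, StartedMailboxServers, StoppedMailboxServers, OperationalServers, AlternateWitnessServer
}
```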
Step5: Remove the restriction on DB activation
With the above command, we have finished activating the DR Mailbox servers. Now we can remove the activation restriction on the databases so that they start mounting automatically.
[PS]>Get-MailboxDatabaseCopyStatus | Resume-MailboxDatabaseCopy
Validate the status; you should find that the mailbox databases get mounted automatically after a while.
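While waiting for the databases to mount, it helps to list only the copies that are not yet Mounted or Healthy. A minimal sketch; `Get-PendingDatabaseCopy` is a hypothetical name, and it only relies on the `Name` and `Status` properties of the objects returned by `Get-MailboxDatabaseCopyStatus`.

```powershell
# Hypothetical filter: pass through copies whose Status is neither Mounted nor Healthy.
function Get-PendingDatabaseCopy {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object] $Copy
    )
    process {
        # Status is an enum in the Exchange shell, so compare its string form
        if ("$($Copy.Status)" -notin 'Mounted', 'Healthy') {
            [pscustomobject]@{ Name = $Copy.Name; Status = "$($Copy.Status)" }
        }
    }
}
```

For example, `Get-MailboxDatabaseCopyStatus -Server EX16-DR1 | Get-PendingDatabaseCopy` returns nothing once every copy on that server is mounted or healthy.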
You can also verify the cluster node status; by now, only the active nodes will be listed.
Let’s assume both failed servers are now back online. Verify that the Cluster service is disabled and stopped on both of these servers. By the way, restoring services is easier and quicker than the failover/switchover procedure.
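The Cluster service check on the returning servers can be scripted as well. A minimal sketch, assuming CIM over WinRM is reachable on EX16-02 and EX16-03; `Get-ClusterServiceState` and `Test-ClusterServiceDormant` are hypothetical helpers built on the standard `Win32_Service` class (`State` and `StartMode` properties).

```powershell
# Hypothetical helper: query the Cluster service (ClusSvc) via CIM on each server.
function Get-ClusterServiceState {
    param([Parameter(Mandatory)] [string[]] $ComputerName)
    foreach ($computer in $ComputerName) {
        $svc = Get-CimInstance -ClassName Win32_Service -Filter "Name='ClusSvc'" -ComputerName $computer
        [pscustomobject]@{ ComputerName = $computer; State = $svc.State; StartMode = $svc.StartMode }
    }
}

# Hypothetical check: $true only when the service is both stopped and disabled,
# which is the state the returning servers should be in before they rejoin.
function Test-ClusterServiceDormant {
    param([Parameter(Mandatory, ValueFromPipeline)] [object] $ServiceState)
    process { $ServiceState.State -eq 'Stopped' -and $ServiceState.StartMode -eq 'Disabled' }
}
```

For example, `Get-ClusterServiceState -ComputerName EX16-02, EX16-03 | Test-ClusterServiceDormant` should return `True` for both servers.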
Step1: Put the Mailbox servers in the restored primary datacenter into a started state.
[PS]>Start-DatabaseAvailabilityGroup -Identity DAG02 -MailboxServer EX16-02
[PS]>Start-DatabaseAvailabilityGroup -Identity DAG02 -MailboxServer EX16-03
The warnings can be ignored safely.
Step2: Run Set-DatabaseAvailabilityGroup to commit the members back to the cluster.
[PS]>Set-DatabaseAvailabilityGroup -Identity DAG02
During this operation, the original cluster settings, such as the witness server, witness directory and any other site settings, will be brought back. After the Mailbox servers in the primary datacenter have been incorporated into the DAG, it will take some time to synchronize the database copies.
That completes the failback procedure. The databases will be mounted automatically in the primary datacenter, as we have not changed their activation preferences.
You can also force a database to mount by activating it through the EMC or EAC.
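The same activation can be done from the shell: `Move-ActiveMailboxDatabase` activates a single database, and Exchange ships a `RedistributeActiveDatabases.ps1` script in its Scripts folder (`$exscripts`) that rebalances databases by activation preference. A hedged sketch; `Invoke-DagFailbackRebalance` is a hypothetical wrapper and must be run from the Exchange Management Shell, where `$exscripts` is defined.

```powershell
# Hypothetical wrapper: activate one database on a chosen server, or rebalance the
# whole DAG back to activation preference 1 using the script shipped with Exchange.
function Invoke-DagFailbackRebalance {
    param(
        [Parameter(Mandatory)] [string] $DagName,   # e.g. DAG02
        [string] $Database,                          # optional: activate one database only
        [string] $ActivateOnServer                   # required with -Database, e.g. EX16-02
    )
    if ($Database) {
        if (-not $ActivateOnServer) { throw 'Specify -ActivateOnServer together with -Database.' }
        # Activates (and mounts) the chosen copy of a single database
        Move-ActiveMailboxDatabase -Identity $Database -ActivateOnServer $ActivateOnServer -Confirm:$false
    }
    else {
        # $exscripts is set by the Exchange Management Shell and points to the Scripts folder
        & (Join-Path $exscripts 'RedistributeActiveDatabases.ps1') -DagName $DagName `
            -BalanceDbsByActivationPreference -Confirm:$false
    }
}
```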
Please share you experience in the form of comments, if any.
-Praveen
We often wonder how to move an Exchange Server 2016 server to a different Active Directory site. This situation arises when you plan to extend the messaging infrastructure to a DR site for site resilience. Replicating small databases is possible by installing Exchange Server directly in the DR site; however, replicating bigger databases over a narrow bandwidth link may not be a viable option. Since Exchange Server 2010, we can easily move an Exchange server from one Active Directory site to a different AD site by following the simple steps outlined below.
1. Ensure that you have created and replicated the required AD site and subnets for the DR/secondary datacenter.
2. Ensure that the Exchange Server infrastructure is healthy and all the databases are replicated properly.
3. It is also recommended to check AD and Exchange replication health using:
Test-ReplicationHealth from the DR Exchange server, and nltest from the DR domain controller
4. Verify the current AD site of the Exchange server.
5. Now we are ready to initiate the site migration. Before we shut down and physically move the server to the new site, it is recommended to put the Exchange server into maintenance mode.
.\StartDagServerMaintenance.ps1 -serverName EX16-DR1 -MoveComment SiteChange
Then check the cluster status:
Get-ClusterNode | ft name, dynamicweight, nodeweight, state -AutoSize
6. Shut down and move the server to the new site (DR/secondary datacenter), then bring it up with the new IP configuration (the subnet should belong to the DR site).
7. Restart the server once again to register the required DNS entries and re-establish other connectivity.
8. Run the commands below and ensure that the site move of the DAG member succeeded:
Get-ClusterNode | ft name, dynamicweight, nodeweight, state -AutoSize
If the state is Paused, everything is fine. If the state is Down, there are connectivity issues; continue troubleshooting.
Get-ExchangeServer | select Name,Site
Ensure that the site has changed. If you see the value you expect, proceed to take the server out of maintenance mode.
.\StopDagServerMaintenance.ps1 -serverName EX16-DR1
After this step, the node state will change from Paused to Up, the server will begin replicating the delta, and the database copies will eventually become healthy.
That’s it: you have completed the AD site change for the Exchange DAG member/node.
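The checks in steps 3, 4 and 8 can be scripted. A hedged sketch: `Get-FailedReplicationCheck`, `Get-SiteShortName` and `Get-SiteMoveReport` are hypothetical helpers. The first keeps only `Test-ReplicationHealth` results whose `Result` is not `Passed`; the second trims a `Site` value such as `lab.ed.com/Configuration/Sites/<SiteName>` (as returned by `Get-ExchangeServer`) down to the site name. On the domain controller side, `nltest /dsgetsite` is one way to print the AD site the machine belongs to.

```powershell
# Hypothetical filter: keep only Test-ReplicationHealth results that did not pass.
# Result is compared in its string form; anything other than 'Passed' is kept.
function Get-FailedReplicationCheck {
    param([Parameter(Mandatory, ValueFromPipeline)] [object] $CheckResult)
    process {
        if ("$($CheckResult.Result)" -ne 'Passed') {
            [pscustomobject]@{
                Check  = $CheckResult.Check
                Result = "$($CheckResult.Result)"
                Error  = $CheckResult.Error
            }
        }
    }
}

# Hypothetical helper: reduce a Site value such as
# 'lab.ed.com/Configuration/Sites/Default-First-Site-Name' to the site name.
function Get-SiteShortName {
    param([Parameter(Mandatory)] [string] $Site)
    return ($Site -split '/')[-1]
}

# Hypothetical report for the moved server (run in the Exchange Management Shell)
function Get-SiteMoveReport {
    param([string] $Server = 'EX16-DR1')
    [pscustomobject]@{
        Server       = $Server
        Site         = Get-SiteShortName -Site "$((Get-ExchangeServer -Identity $Server).Site)"
        FailedChecks = @(Test-ReplicationHealth -Identity $Server | Get-FailedReplicationCheck)
    }
}
```

Run `Get-SiteMoveReport` before step 5 and again at step 8: the `Site` value should change to the DR site, and `FailedChecks` should be empty once the copies have caught up.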
Share your experience in the comments, and we will address whatever we can.
-Praveen
You may experience similar problems in versions from Exchange 2010 onwards; here I describe an issue I faced in an Exchange Server 2016 infrastructure.
You will see one or more of the following events/states on the server:
This issue can be caused by many factors, and in most cases it can be mitigated by the solutions in the next section.
The reasons for this issue could be one or more of the following (but are not limited to these):
How to mitigate this issue: follow the suggestions below.
Generally, this issue can be mitigated with one of the two options below:
Suppose you do not have any database copy with a healthy content index state, or this is the only server that hosts the database.
Stop the following services:
Delete the folder similar to the following from the database path/location:
Start the services in order:
Exchange Server will then start rebuilding the whole index. Depending on the data size, the process may take an extended period of time. Wait for it to finish; in 90% of cases it will fix the issue.
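The stop/delete/start sequence above can be scripted. This is a hedged sketch: the service names `MSExchangeFastSearch` (Microsoft Exchange Search) and `HostControllerService` (Microsoft Exchange Search Host Controller) are my assumption of the services meant here, consistent with the services named in the error quoted further below; `-CatalogPath` is the catalog folder you identified in the database path; and `Reset-ContentIndexCatalog` is a hypothetical name. It supports `-WhatIf`, so preview it first.

```powershell
# Hypothetical helper: stop the search services, delete the content index catalog
# folder, then start the services so Exchange rebuilds the index from scratch.
function Reset-ContentIndexCatalog {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Catalog folder inside the database directory (the folder you identified above)
        [Parameter(Mandatory)] [string] $CatalogPath,
        # ASSUMED Exchange 2016 search service names; adjust to your environment
        [string[]] $ServiceName = @('MSExchangeFastSearch', 'HostControllerService')
    )
    if (-not (Test-Path -LiteralPath $CatalogPath -PathType Container)) {
        throw "Catalog folder not found: $CatalogPath"
    }
    # Stop the services so the catalog files are released
    foreach ($name in $ServiceName) { Stop-Service -Name $name }
    if ($PSCmdlet.ShouldProcess($CatalogPath, 'Delete content index catalog folder')) {
        Remove-Item -LiteralPath $CatalogPath -Recurse -Force
    }
    # Start the services again in the same order
    foreach ($name in $ServiceName) { Start-Service -Name $name }
}
```

For example, run `Reset-ContentIndexCatalog -CatalogPath '<catalog folder>' -WhatIf` first, then without `-WhatIf`.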
If you have at least one database copy with a “Healthy” content index state, reseed the database copy that has the unhealthy content index. In some cases, when you add a database copy in the DAG, the content index can go into an unhealthy state (such as Failed, Unknown, FailedAndSuspended, etc.). To bring the content index back to healthy, you can reseed it by running the “Update-MailboxDatabaseCopy” command with the “-CatalogOnly” switch.
Update-MailboxDatabaseCopy LAB-DB03\EX16-01 -CatalogOnly
If you have already tried this command, you may get the warning “Seeding cannot be requested for the same database copy until the failed request has been cleaned up by the server …”. Answer “Y” to proceed with the reseed operation.
Verify the ContentIndexState after some time; you should see a “Healthy” state.
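To spot copies that still need attention before or after the reseed, here is a minimal filter over `Get-MailboxDatabaseCopyStatus` output; `Get-UnhealthyContentIndex` is a hypothetical name, and it relies only on the `Name` and `ContentIndexState` properties.

```powershell
# Hypothetical filter: pass through copies whose ContentIndexState is not Healthy.
function Get-UnhealthyContentIndex {
    param([Parameter(Mandatory, ValueFromPipeline)] [object] $Copy)
    process {
        # ContentIndexState is an enum in the Exchange shell, so compare its string form
        if ("$($Copy.ContentIndexState)" -ne 'Healthy') {
            [pscustomobject]@{ Name = $Copy.Name; ContentIndexState = "$($Copy.ContentIndexState)" }
        }
    }
}
```

For example, `Get-MailboxDatabaseCopyStatus -Identity LAB-DB03 | Get-UnhealthyContentIndex` lists the copies (such as `LAB-DB03\EX16-01`) that are candidates for `Update-MailboxDatabaseCopy <copy> -CatalogOnly`.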
If the above steps do not succeed, and you receive errors similar to the one below even though all these services are healthy, it will not be an easy task to bring your ContentIndexState back to healthy.
WARNING: Seeding of content index catalog for database 'LAB-DB03' failed. Please verify that the Microsoft Search (Exchange) and the Host Controller service for Exchange services are running and try the operation again. Error: An error occurred while processing a request on server 'EX16-03'. Error: An Exception was received during a FAST operation..
In some cases there are disk errors, but in my case the server had recovered from an unexpected failure, which required running recovery steps on the database. After the recovery, the ContentIndexState was healthy on some database copies but not on others.
Please note that the content index state only impacts the end-user experience; you can still mount the database on a server where the content index state is not healthy. Hence, the recommended approach is to create an additional database and move the mailboxes to it gradually to get rid of the problematic database.
In my case, I also had the following event (Event ID 40025) logged on the server where this database was active.
Please share your comments or issues.
-Praveen
You may receive the following error when you try to re-add a recovered Exchange 2016 Mailbox server to a Database Availability Group (DAG).
-
A server-side database availability group administrative operation failed. Error The operation failed. CreateCluster
errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster
operation. Error: Node EX16-01 is already joined to a cluster..
-
You may also see the following error when running the "Set-DatabaseAvailabilityGroup" command. Please note I ran the command from the cluster node that I was trying to add to the DAG.
-
The following servers in the Windows Failover Cluster are not in Active Directory: ex16-01. This is usually the result
of an incomplete membership change (add or remove) of the database availabilty group.
-
The above error is usually the result of a disabled Cluster service or AD replication errors.
It is evident that stale entries or a communication error prevent the recovered Exchange server from being added to the DAG cluster. Before you start any process to forcefully remove or add it, try forcing Active Directory replication and repeating the usual procedure. If the results are the same, follow the process below.
Run the following commands from a healthy DAG member.
1. Forcefully remove the cluster node that failed to join the DAG cluster. In my case, I was trying to re-add node ex16-01 to the DAG02 Database Availability Group.
Get-ClusterNode ex16-01 | Remove-ClusterNode -Force
Wait 10-15 minutes so that replication completes.
2. Run Add-DatabaseAvailabilityGroupServer for the node you want to add to the cluster.
Add-DatabaseAvailabilityGroupServer -Identity DAG02 -MailboxServer EX16-01
3. Verify the Database Availability Group status:
[PS] C:\Windows\system32>Get-DatabaseAvailabilityGroup DAG02 | fl Name,Started*
Name : DAG02
StartedMailboxServers : {ex16-01.lab.ed.com, EX16-03.lab.ed.com, EX16-02.lab.ed.com}
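The verification in step 3 can be turned into a yes/no check. A minimal sketch; `Test-DagMemberStarted` is a hypothetical name, and it compares host names only, because `StartedMailboxServers` lists FQDNs (as in the output above) while the node is usually typed as a short name.

```powershell
# Hypothetical check: $true when the node appears in the DAG's started servers.
function Test-DagMemberStarted {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [string[]] $StartedMailboxServers,
        [Parameter(Mandatory)] [string] $Node
    )
    $nodeHost = ($Node -split '\.')[0]
    foreach ($server in $StartedMailboxServers) {
        # Compare the host part only, case-insensitively
        if (($server -split '\.')[0] -ieq $nodeHost) { return $true }
    }
    return $false
}
```

For example: `$dag = Get-DatabaseAvailabilityGroup DAG02`, then `Test-DagMemberStarted -StartedMailboxServers @($dag.StartedMailboxServers | ForEach-Object { "$_" }) -Node ex16-01`.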
The following screenshot shows the sequence for better understanding.
After this, you can proceed with adding database copies or other cluster operations.
-Praveen
This video helps you understand the process of introducing a new version of Exchange, Exchange Server 2016, into an existing environment. My lab has Exchange 2010, not Exchange 2013, but the process for introducing the new Exchange server is the same.
Ensure you have gone through the prerequisites, such as the minimum patch level and AD requirements. This video walks you through only the installation procedure, and the time shown is not actual.
The actual process may take anywhere between 40 and 90 minutes; do not interrupt it even if it seems to make no progress. You can simply drag the command window to refresh the displayed progress.
I preferred the command-line installation, which is much faster and easier.
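For reference, a command-line install is run with `Setup.exe` from the Exchange 2016 media. This sketch is hedged: `/Mode:Install`, `/Role:<role>` and `/IAcceptExchangeServerLicenseTerms` are the switches I am aware of for the Exchange 2016 releases discussed here (later cumulative updates introduced variants of the license switch), and `Get-ExchangeSetupArgument` and `Invoke-ExchangeSetup` are hypothetical helpers. It assumes the prerequisites above are already in place.

```powershell
# Hypothetical helper: build the setup switches, one per array element so each is
# passed to Setup.exe as a separate argument.
function Get-ExchangeSetupArgument {
    param(
        [ValidateSet('Mailbox', 'EdgeTransport', 'ManagementTools')] [string] $Role = 'Mailbox'
    )
    @('/Mode:Install', "/Role:$Role", '/IAcceptExchangeServerLicenseTerms')
}

# Hypothetical wrapper: run Setup.exe from the mounted media in the current console,
# so its progress output stays visible.
function Invoke-ExchangeSetup {
    param(
        [Parameter(Mandatory)] [string] $MediaPath,   # root of the mounted media, e.g. 'E:\'
        [string] $Role = 'Mailbox'
    )
    $setup = Join-Path $MediaPath 'Setup.exe'
    $argList = Get-ExchangeSetupArgument -Role $Role
    & $setup $argList
}
```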
-Praveen
Updates for all versions of Exchange Server are now available on the Microsoft Download Center. The first-ever update for Exchange Server 2016, Cumulative Update 12 for Exchange Server 2013, and Update Rollups for Exchange Server 2007 and Exchange Server 2010 are available for customers to download and install.
More details and download links:
Cumulative Update 1 for Exchange Server 2016 - Download CU1 ver 2016
Exchange Server 2013 Cumulative Update 12 - Download CU12 ver 2013
Exchange Server 2010 Service Pack 3 Update Rollup 13 - Download RU13 ver 2010
Update Rollup 19 for Exchange Server 2007 Service Pack 3 (KB3141352) - Download RU19 ver 2007
As always, remember to test these updates to validate compatibility with your existing environment and any integrations.
-Praveen
A quick note on the Exchange Team Blog (EHLO Blog) mentioned that Exchange on-premises servers will run into some known issues if .NET Framework 4.6.1 is installed. Considering the impact, the Microsoft Exchange team has advised delaying the installation of .NET Framework 4.6.1, which is currently available through Windows Update.
It is worth asking why Microsoft allowed it on Windows Update without confirmation from the Exchange on-premises team's testing. We have yet to hear more updates on this; however, it is highly recommended not to install the 4.6.1 update for now.
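To check whether a server already has .NET Framework 4.6.1, you can read the `Release` value under `HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full`; per Microsoft's .NET version documentation, 4.6.1 reports 394254 (Windows 10 November Update) or 394271 (other Windows versions). A minimal sketch; `Test-DotNet461` is a hypothetical name, and it flags exactly 4.6.1 rather than any later version.

```powershell
# Hypothetical check: $true when the .NET Framework 4.x Release value is 4.6.1.
function Test-DotNet461 {
    param(
        # Defaults to the local registry value; pass a number to test a known value
        [int] $Release = (Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -Name Release).Release
    )
    # 394254 = 4.6.1 on Windows 10 November Update; 394271 = 4.6.1 on other Windows versions
    return $Release -in 394254, 394271
}
```

Running `Test-DotNet461` without arguments on an Exchange server reads the local registry.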
Some reference articles for you to go through:
Exchange says no to .NET Framework 4.6.1
On .NET Framework 4.6.1 and Exchange compatibility
-Praveen
By now we are familiar with the concept of a DAG. A database availability group is simply a set of up to 16 Exchange Server 2016 Mailbox servers that provides high availability and automatic database-level recovery. It is the improved clustering model used from Exchange Server 2010 onwards. I recommend reading more about DAGs if you are not familiar with them, as this article only gives an insight into DAG creation and database copy configuration.
1. Open the Exchange admin center (EAC), and navigate to Servers.
2. Click Database Availability Groups in the right pane of the console.
3. Click the + sign to add a DAG.
4. Enter the required information, such as the DAG name, witness server, witness directory and IP address, and click Save.
Note: If you plan to use a domain controller as the witness server, add it to the Exchange Trusted Subsystem security group. In addition, add the Exchange Trusted Subsystem group to the Administrators group.
This creates a Database Availability Group for your Exchange Server 2016 servers.
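The same DAG can be created from the Exchange Management Shell. A hedged sketch that uses the lab names from the DR post (DAG02, 2010adc1, C:\DAG02) as placeholder defaults; passing `[System.Net.IPAddress]::None` to `-DatabaseAvailabilityGroupIpAddresses` creates a DAG without an administrative access point, which requires Windows Server 2012 R2 or later. `New-LabDag` is a hypothetical wrapper.

```powershell
# Hypothetical wrapper around New-DatabaseAvailabilityGroup (defaults are lab values).
function New-LabDag {
    param(
        [string] $Name = 'DAG02',
        [string] $WitnessServer = '2010adc1.lab.ed.com',
        [string] $WitnessDirectory = 'C:\DAG02',
        # Leave empty for a DAG without an administrative access point (IP-less DAG)
        [System.Net.IPAddress[]] $IpAddress = @()
    )
    $ips = if ($IpAddress.Count -gt 0) { $IpAddress } else { [System.Net.IPAddress]::None }
    New-DatabaseAvailabilityGroup -Name $Name -WitnessServer $WitnessServer `
        -WitnessDirectory $WitnessDirectory -DatabaseAvailabilityGroupIpAddresses $ips
}
```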
1. Select the DAG name and click Manage.
2. Click Save once you have added all the required servers.
3. Now you can add databases for high availability.
Note: The process will take some time, as the Failover Clustering feature is installed during the member addition process. Please wait patiently until the operation finishes.
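Adding members from the shell is a loop over `Add-DatabaseAvailabilityGroupServer`. A sketch with the DR lab's server names as placeholders; `Add-LabDagMember` is a hypothetical wrapper. Servers are added one at a time, since each addition may also install the Failover Clustering feature on the server being added.

```powershell
# Hypothetical wrapper: add Mailbox servers to a DAG one at a time, then show membership.
function Add-LabDagMember {
    param(
        [string] $DagName = 'DAG02',
        [string[]] $MailboxServer = @('EX16-01', 'EX16-02', 'EX16-03')
    )
    foreach ($server in $MailboxServer) {
        # The cmdlet installs Failover Clustering on the server if it is missing
        Add-DatabaseAvailabilityGroupServer -Identity $DagName -MailboxServer $server
    }
    # Confirm the membership afterwards
    Get-DatabaseAvailabilityGroup -Identity $DagName | Format-List Name, Servers
}
```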
1. Create a database on one of the DAG members (follow the usual steps).
2. Select the database that needs high availability, and click Add database copy from the More options menu.
[Note: You might need to log in to the EAC again if you do not find the Add database copy option]
3. Follow the wizard and select the server you want to replicate the database to.
4. Save the changes; all the required folders and files will be created automatically, as in versions 2010 and 2013.
After the database copy is added successfully, you will see that the copy on server X1 is Active Mounted and the copy on X2 is Passive Healthy.
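The database copy step has a shell equivalent, `Add-MailboxDatabaseCopy`. A hedged sketch: `DB01`, `X2` and the activation preference are placeholders, and `Add-LabDatabaseCopy` is a hypothetical wrapper that also shows the resulting copy status.

```powershell
# Hypothetical wrapper: add a passive copy of a database and show the copy status.
function Add-LabDatabaseCopy {
    param(
        [string] $Database = 'DB01',
        [string] $MailboxServer = 'X2',
        [int] $ActivationPreference = 2
    )
    # Creates the passive copy on the target server and seeds it
    Add-MailboxDatabaseCopy -Identity $Database -MailboxServer $MailboxServer `
        -ActivationPreference $ActivationPreference
    # Expect one Mounted (active) and one Healthy (passive) copy once seeding completes
    Get-MailboxDatabaseCopyStatus -Identity $Database |
        Format-Table Name, Status, CopyQueueLength, ContentIndexState
}
```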
Creating the DAG and adding members to it are much faster with Shell commands; however, the GUI makes the process easier to understand.
I will cover more operations on databases and DAGs in future posts. Share your suggestions to improve the quality of the information.
-Praveen