Troubleshooting Database Mail Failures

Description

Database Mail is a convenient and easy way to send alerts, reports, or data from SQL Server. Failures are not obvious to the us though, and developing a process to monitor these failures alongside other failures will save immense headaches if anything ever goes wrong.

Database Mail: a (very) brief overview

Database Mail is a component of SQL Server that is available in every edition, except for Express. This feature is designed to be as simple as possible to enable, configure, and use.

Database Mail relies on SMTP to send emails via a specified email server to any number of recipients. When configuring, you provide a mail server, credentials (if needed), and then the service is ready to use. We’ll be focusing here on failure reporting and not configuration. If you need help setting up or configuring this feature, check out some of the references at the end of this article.

Once configured, database mail is used as the default method of email delivery by SQL Server unless specified otherwise. This includes emails that originate from SQL Server Agent via job failures or alerts, as well as any emails that we send using sp_send_dbmail. Sp_send_dbmail is a system stored procedure that resides in MSDB, and provides the ability to send emails using a wide variety of features. For example, we can send an email with a simple configuration like this:

EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'Default Public Profile',

@recipients = 'ed@mydomain.com',

@subject = 'Test',

@body = 'This is a test email. Nothing to see here.';

When executed, we get a rather unceremonious (though desired) response:

There are another 20 parameters that can be used as well, ranging from file attachments, queries, importance, BCC, etc…You can also format an email as HTML and include tables and tags to add style or to organize a complex message. We will demo more involved usage of sp_send_dbmail below as we attack failure reporting.

Given all these features, it is possible for things to break along the way, either within a query, reading a profile, connecting to a mail server, or sending the email. We’ll dive into different ways in which Database Mail can fail, how to alert on these failures, and how to respond effectively to emails that have failed to send.

Viewing failure data

To manage Database Mail failures correctly, we need to set boundaries for our research. An email is generated and sent from SQL Server, but once it has been handed off to the mail server successfully, it is no longer in the scope of SQL Server. An email that is delivered to a mail server successfully is seen as a success by SQL Server, even if a failure occurs later on that mail server.

For our purposes, we will limit our research and alerting to features and components within SQL Server. If issues arise in which an email has been sent successfully by SQL Server, but not received by its recipients, then it will be necessary to check the mail server logs to ensure that something else did not go wrong. Invalid recipients, getting caught in spam filters, and server connectivity issues are not uncommon and can result in emails that appear to have sent correctly, but never reached their destination.

An important note to begin with: Database Mail failures reside in a separate log from the SQL Server and SQL Server Agent logs. This is a big deal as it means that most standard error log reports/searches will NOT include these errors by default! As a result, without explicitly logging and alerting on Database Mail failures, you may not know that emails are failing to send until someone downstream complains. As always, it is in our best interest to catch failures of any kind quickly and (if possible) resolve them before they negatively impact others that rely on those processes.

The image above shows how error logs are separated by type, and that the Database Mail log is distinct from other SQL Server and Windows logs. While viewing errors in the GUI is useful when we have a known problem to chase down, it is not an effective form of monitoring as a human staring at a screen waiting for errors is not terribly efficient (or fun!)

All of the work that we accomplish here will be to report on this log, including as much detail as is possible about both the failure, as well as the email that failed to send.

To start, let’s introduce the system views that contain Database Mail data and take a look at some of the date contained within them:

Msdb.dbo.sysmail_profile: For each Database Mail profile that exists, a row will be in this view that provides the profile name, description, last modified time, and last modified user. When using sp_send_dbmail, you must specify a profile that was previously defined in the Database Mail configuration. If you have yet to define any, then check out a guide on configuring Database Mail in the links at the end of this article. The key piece of information in this view is the profile name, as this is what you reference when using Database Mail.

Msdb.dbo.sysmail_event_log: Contains a row per informational or error message returned by Database Mail. You may configure the sensitivity of this collection via the Logging Level setting in the Database Mail configuration. In general, I prefer extended logging as we can very easily filter out informational messages later. The extra details can be useful when troubleshooting a server problem. For example, knowing when Database Mail starts and shuts down can be helpful when diagnosing email, network, or service problems.

If anything unusual happens, it will be logged here. With details of the error message, as well as the process are logged here. More importantly, the mail item ID is also included, allowing us to tie a Database Mail error directly to a specific email that failed to send.

Msdb.dbo.sysmail_faileditems: If a message fails to send, this contains all of the parameters that were passed to sp_send_dbmail, as well as the time of the failure. This is extremely useful for troubleshooting a failure and determining possible causes for the message not sending.

Another great use for this data is that with it, you can reconstruct a failed message and resend it! While automating a resend process could be a bit risky, the ability to aggregate a set of failed messages and resend them en masse is an immense time-saver and avoids the need to manually hack through the alerts, reports, and queries that were included in those messages.

Msdb.dbo.sysmail_help_queue_sp: This system stored procedure returns the status of the Database Mail queue:

1	EXEC msdb.dbo.sysmail_help_queue_sp @queue_type = 'Mail';

The key takeaway from these results is the length of the queue. If this number is zero, then the queue is empty and Database Mail is likely idle. If the queue is greater than zero, then that indicates that there are more messages to process and the service has yet to catch up. If this number is growing larger over an extended period of time, then this could be indicative of a problem with the service or an excessively large flood of messages to Database Mail.

With these building blocks, we can put together a solution that will check for Database Mail Failures, log them, and alert us when detected.

Automating failure alerting

Before diving into code, let’s put together an outline of how we will build our solution. In order to collect data on, log, and alert on failures, we’ll want to follow a process similar to this:

Create a table to store error and failed message details.
Create a stored procedure that will check for and log Database Mail failures
1. Log all failed items to the table above.
2. If any failed items were logged above, the email an operator about them.
3. Flag that failed item as sent, so we do not resend repeatedly.
Create a job that periodically calls the stored procedure above.
Create a view that allows easy reconstruction of the original Database Mail command, using the various components collected via the alerting process.

Create a table to store database mail failure data

We want to include two distinct sets of data within this table:

The details of the failure, including time and error message.
The details of the email itself, including contents, attachments, queries, etc…

By having both of these components, we can not only fix the cause of the failure, but we can also validate the email that was to be sent and resend it, if needed.

CREATE TABLE dbo.database_mail_failure

( database_mail_failure_id INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_database_mail_failure PRIMARY KEY CLUSTERED,

error_time_utc DATETIME NOT NULL,

error_time_local DATETIME NOT NULL,

error_description VARCHAR(MAX) NULL,

mail_item_id INT NOT NULL,

mail_profile_id INT NOT NULL,

mail_recipients VARCHAR(MAX) NOT NULL,

mail_recipients_cc VARCHAR(MAX) NULL,

mail_recipients_bcc VARCHAR(MAX) NULL,

mail_subject VARCHAR(MAX) NULL,

mail_body VARCHAR(MAX) NULL,

mail_body_format VARCHAR(20) NULL,

mail_importance VARCHAR(6) NULL,

mail_sensitivity VARCHAR(12) NULL,

file_attachments VARCHAR(MAX) NULL,

attachment_encoding VARCHAR(20) NULL,

query VARCHAR(MAX) NULL,

query_database VARCHAR(100) NULL,

attach_query_result_as_file BIT NULL,

query_result_header BIT NOT NULL,

query_result_width INT NULL,

query_result_separator VARCHAR(1) NULL,

exclude_query_output BIT NULL,

append_query_error BIT NULL,

mail_send_request_added_to_queue_time_utc DATETIME NOT NULL,

mail_send_request_user VARCHAR(100) NOT NULL,

mail_send_request_removed_from_queue_time_utc DATETIME NULL,

has_email_been_sent_to_operator BIT NOT NULL);

CREATE NONCLUSTERED INDEX IX_database_mail_failure_error_time_utc ON dbo.database_mail_failure (error_time_utc);

CREATE NONCLUSTERED INDEX IX_database_mail_failure_error_time_local ON dbo.database_mail_failure (error_time_local);

This table contains every column from sysmail_faileditems, as well as some error details from sysmail_event_log. The indexes on error times allows us to more efficiently search for failures in the future, if this table gets large.

This amount of detail may seem excessive, but only with all message details are we capable of fully understanding what email was to be sent, to whom, and all of its details. Oftentimes, many of these columns will be NULL as the messages may be simple emails with a subject, body, and a few recipients.

Create a stored procedure to log failures

With a place to save failure information, we can now create a stored procedure that will populate this table with details whenever a Database Mail failure occurs. Functionally, our tasks are simple and will not take that much code to complete.

Our proc definition will include a single parameter:

CREATE PROCEDURE dbo.monitor_database_mail_failures

@minutes_to_monitor SMALLINT = 1440

BEGIN

SET NOCOUNT ON;

This parameter allows us to determine how far back in time to look for failures. We should choose a time frame large enough to account for timeouts, delays, and maintenance, but not so long that it might pull noise from old/stale items, or noise that may result from any MSDB cleanup of old items you may perform. We’ll choose 1 day (1440 minutes), but feel free to adjust higher or lower as needed.

First, we’ll determine the UTC offset from local time. Since times stored in the system views are in local server time, we need to convert to UTC in order to store our times in UTC. We’re choosing UTC over local time to make the code and data more portable:

DECLARE @utc_offset INT;

SELECT

@utc_offset = -1 * DATEDIFF(HOUR, GETUTCDATE(), GETDATE());

Whenever accessing data, remember that times are in UTC and need to be converted if you’re looking to view them in your local time zone. The benefit will be that data can be read from anywhere without question as to its time zone or locale. That being said, we’ll include local time as well, for convenience.

INSERT INTO dbo.database_mail_failure

(error_time_utc, error_time_local, error_description, mail_item_id, mail_profile_id, mail_recipients, mail_recipients_cc, mail_recipients_bcc,

mail_subject, mail_body, mail_body_format, mail_importance, mail_sensitivity, file_attachments, attachment_encoding,

query, query_database, attach_query_result_as_file, query_result_header, query_result_width, query_result_separator,

exclude_query_output, append_query_error, mail_send_request_added_to_queue_time_utc, mail_send_request_user,

mail_send_request_removed_from_queue_time_utc, has_email_been_sent_to_operator)

SELECT DISTINCT

DATEADD(HOUR, @utc_offset, sysmail_faileditems.last_mod_date) AS error_time_utc,

sysmail_faileditems.last_mod_date AS error_time_local,

REPLACE(REPLACE(sysmail_event_log.description, CHAR(10), ' '), CHAR(13), ' ') AS error_description,

sysmail_faileditems.mailitem_id AS mail_item_id,

sysmail_faileditems.profile_id AS mail_profile_id,

sysmail_faileditems.recipients AS mail_recipients,

sysmail_faileditems.copy_recipients AS mail_recipients_cc,

sysmail_faileditems.blind_copy_recipients AS mail_recipients_bcc,

sysmail_faileditems.subject AS mail_subject,

sysmail_faileditems.body AS mail_body,

sysmail_faileditems.body_format AS mail_body_format,

sysmail_faileditems.importance AS mail_importance,

sysmail_faileditems.sensitivity AS mail_sensitivity,

sysmail_faileditems.file_attachments,

sysmail_faileditems.Attachment_encoding AS attachment_encoding,

sysmail_faileditems.Query AS query,

sysmail_faileditems.execute_query_database AS query_database,

sysmail_faileditems.attach_query_result_as_file,

sysmail_faileditems.query_result_header,

sysmail_faileditems.query_result_width,

sysmail_faileditems.query_result_separator,

sysmail_faileditems.exclude_query_output,

sysmail_faileditems.append_query_error,

DATEADD(HOUR, @utc_offset, sysmail_faileditems.send_request_date) AS mail_send_request_added_to_queue_time_utc,

sysmail_faileditems.send_request_user AS mail_send_request_user,

DATEADD(HOUR, @utc_offset, sysmail_faileditems.sent_date) AS mail_send_request_removed_from_queue_time_utc,

0 AS has_email_been_sent_to_operator

FROM msdb.dbo.sysmail_faileditems

LEFT JOIN msdb.dbo.sysmail_event_log

ON sysmail_faileditems.mailitem_id = sysmail_event_log.mailitem_id

LEFT JOIN msdb.dbo.sysmail_profile

ON sysmail_profile.profile_id = sysmail_faileditems.profile_id

WHERE DATEADD(HOUR, @utc_offset, sysmail_faileditems.send_request_date) > DATEADD(MINUTE, -1 * @minutes_to_monitor, GETUTCDATE())

AND sysmail_faileditems.mailitem_id NOT IN (

SELECT database_mail_failure.mail_item_id FROM dbo.database_mail_failure WHERE database_mail_failure.mail_send_request_added_to_queue_time_utc > DATEADD(MINUTE, -1 * @minutes_to_monitor, GETUTCDATE()))

This query will pull data from the database mail views in MSDB that we discussed earlier and deposit it into the database_mail_failure table. Note that the bit has_email_been_sent_to_operator is set to 0. We will change this to 1 later on, after an alert has been sent.

Now that we have logged the failure, we can compose an email that will alert us of it. You may be asking, “What if email is down, how will we get that alert?” We’ll address that shortly as it is a valid question, and one that should be asked for any alerting system.

DECLARE @profile_name VARCHAR(MAX) = 'Default Public Profile';

DECLARE @email_to_address VARCHAR(MAX) = 'ed@test.com';

DECLARE @email_subject VARCHAR(MAX);

DECLARE @email_body VARCHAR(MAX);

IF EXISTS (SELECT * FROM dbo.database_mail_failure WHERE database_mail_failure.has_email_been_sent_to_operator = 0)

BEGIN

SELECT @email_subject = 'Failed Database Mail Alert: ' + ISNULL(@@SERVERNAME, CAST(SERVERPROPERTY('ServerName') AS VARCHAR(MAX)));

SELECT @email_body = 'At least one database mail failure has occurred on ' + ISNULL(@@SERVERNAME, CAST(SERVERPROPERTY('ServerName') AS VARCHAR(MAX))) + ':

<tr>

<th bgcolor="#F29C89">Server Error Time</th>

<th bgcolor="#F29C89">Error Description</th>

<th bgcolor="#F29C89">Mail Recipients</th>

<th bgcolor="#F29C89">Mail Subject</th>

<th bgcolor="#F29C89">Mail Body Format</th>

<th bgcolor="#F29C89">Mail Attachments</th>

<th bgcolor="#F29C89">Query</th>

<th bgcolor="#F29C89">Query Database</th>

</tr>';

SELECT @email_body = @email_body + CAST((SELECT CAST(DATEADD(HOUR, -1 * @utc_offset, database_mail_failure.error_time_utc) AS VARCHAR(MAX)) AS 'td', '',

database_mail_failure.error_description AS 'td', '',

database_mail_failure.mail_recipients AS 'td', '',

database_mail_failure.mail_subject AS 'td',

database_mail_failure.mail_body_format AS 'td', '',

database_mail_failure.file_attachments AS 'td', '',

database_mail_failure.query AS 'td',

database_mail_failure.query_database AS 'td'

FROM dbo.database_mail_failure

WHERE database_mail_failure.has_email_been_sent_to_operator = 0

ORDER BY database_mail_failure.error_time_utc ASC

FOR XML PATH('tr'), ELEMENTS) AS VARCHAR(MAX));

SELECT @email_body = @email_body + '</table></body></html>';

SELECT @email_body = REPLACE(@email_body, '<td>', '<td valign="top">');

EXEC msdb.dbo.sp_send_dbmail

@profile_name = @profile_name,

@recipients = @email_to_address,

@subject = @email_subject,

@body_format = 'html',

@body = @email_body;

This alert checks to see if any unsent failures exist, and if so will compose an email with the most pertinent information from the table and send it over to me. Not that the mail profile and recipients are hard-coded here. Feel free to add these as stored proc parameters if you have need to change them often.

Lastly, we will set the has_email_been_sent_to_operator to 1 to signify that these alerts have been passed on to an operator:

UPDATE database_mail_failure

SET has_email_been_sent_to_operator = 1

FROM dbo.database_mail_failure

WHERE database_mail_failure.has_email_been_sent_to_operator = 0;

Create a SQL Server Agent job

With an alerting stored procedure available for use, we’ll create a job that runs every 15 minutes and checks for failed Database Mail messages:

Create a view to reconstruct messages

Within our database_mail_failure table are all of the parameters from a typical use of Database Mail. As a result, we can use that information to reconstitute a command that could then be used to resend the original message. This provides us with a great deal of convenience in the event of an emergency. We can use this to resend failed emails, or at least review messages before resending.

In the event that a large number of failures occur at once, this allows us to avoid a laborious manual process in order to accomplish a this task. The following view returns every column from database_mail_failure, as well as one additional column called database_mail_query, which contains an sp_send_dbmail command using the details from the failed message:

CREATE VIEW dbo.v_failed_database_mail_message_detail

WITH CTE_DATABASE_MAIL_QUERY AS (

SELECT DISTINCT

CAST('EXEC msdb.dbo.Sp_send_dbmail

@profile_name = ''' + CAST(database_mail_failure.mail_profile_name AS NVARCHAR(MAX)) + ''',' +

CASE WHEN database_mail_failure.mail_profile_name IS NOT NULL THEN '

@recipients = ''' + CAST(database_mail_failure.mail_recipients AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_recipients_cc IS NOT NULL THEN '

@copy_recipients = ''' + CAST(database_mail_failure.mail_recipients_cc AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_recipients_bcc IS NOT NULL THEN '

@blind_copy_recipients = ''' + CAST(database_mail_failure.mail_recipients_bcc AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_body IS NOT NULL THEN '

@body = ''' + CAST(REPLACE(database_mail_failure.mail_body, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_subject IS NOT NULL THEN '

@subject = ''' + CAST(REPLACE(database_mail_failure.mail_subject, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_body_format IS NOT NULL THEN '

@body_format = ''' + CAST(database_mail_failure.mail_body_format AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_importance IS NOT NULL THEN '

@importance = ''' + CAST(database_mail_failure.mail_importance AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.mail_sensitivity IS NOT NULL THEN '

@sensitivity = ''' + CAST(database_mail_failure.mail_sensitivity AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.file_attachments IS NOT NULL THEN '

@file_attachments = ''' + CAST(REPLACE(database_mail_failure.file_attachments, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.query IS NOT NULL THEN '

@query = ''' + CAST(REPLACE(database_mail_failure.query, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.query_database IS NOT NULL THEN '

@execute_query_database = ''' + CAST(database_mail_failure.query_database AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.attach_query_result_as_file IS NOT NULL THEN '

@attach_query_result_as_file = ' + CAST(database_mail_failure.attach_query_result_as_file AS NVARCHAR(MAX)) + ',' ELSE '' END +

CASE WHEN database_mail_failure.query_result_header IS NOT NULL THEN '

@query_result_header = ' + CAST(database_mail_failure.query_result_header AS NVARCHAR(MAX)) + ',' ELSE '' END +

CASE WHEN database_mail_failure.query_result_width IS NOT NULL THEN '

@query_result_width = ' + CAST(database_mail_failure.query_result_width AS NVARCHAR(MAX)) + ',' ELSE '' END +

CASE WHEN database_mail_failure.query_result_separator IS NOT NULL THEN '

@query_result_separator = ''' + CAST(database_mail_failure.query_result_separator AS NVARCHAR(MAX)) + ''',' ELSE '' END +

CASE WHEN database_mail_failure.exclude_query_output IS NOT NULL THEN '

@exclude_query_output = ' + CAST(database_mail_failure.exclude_query_output AS NVARCHAR(MAX)) + ',' ELSE '' END +

CASE WHEN database_mail_failure.append_query_error IS NOT NULL THEN '

@append_query_error = ' + CAST(database_mail_failure.append_query_error AS NVARCHAR(MAX)) + ',' ELSE '' END AS NVARCHAR(MAX)) AS database_mail_query,

database_mail_failure.*

FROM dbo.database_mail_failure)

SELECT

CASE

WHEN RIGHT(CTE_DATABASE_MAIL_QUERY.Database_Mail_Query, 1) = ','

THEN SUBSTRING(CTE_DATABASE_MAIL_QUERY.Database_Mail_Query, 1, LEN(CTE_DATABASE_MAIL_QUERY.Database_Mail_Query) - 1)

ELSE CTE_DATABASE_MAIL_QUERY.Database_Mail_Query

END AS database_mail_query,

CTE_DATABASE_MAIL_QUERY.mail_item_id,

CTE_DATABASE_MAIL_QUERY.mail_profile_name,

CTE_DATABASE_MAIL_QUERY.mail_recipients,

CTE_DATABASE_MAIL_QUERY.mail_recipients_cc,

CTE_DATABASE_MAIL_QUERY.mail_recipients_bcc,

CTE_DATABASE_MAIL_QUERY.mail_subject,

CTE_DATABASE_MAIL_QUERY.mail_body,

CTE_DATABASE_MAIL_QUERY.mail_body_format,

CTE_DATABASE_MAIL_QUERY.mail_importance,

CTE_DATABASE_MAIL_QUERY.mail_sensitivity,

CTE_DATABASE_MAIL_QUERY.file_attachments,

CTE_DATABASE_MAIL_QUERY.attachment_encoding,

CTE_DATABASE_MAIL_QUERY.query,

CTE_DATABASE_MAIL_QUERY.query_database,

CTE_DATABASE_MAIL_QUERY.attach_query_result_as_file,

CTE_DATABASE_MAIL_QUERY.query_result_header,

CTE_DATABASE_MAIL_QUERY.query_result_width,

CTE_DATABASE_MAIL_QUERY.query_result_separator,

CTE_DATABASE_MAIL_QUERY.exclude_query_output,

CTE_DATABASE_MAIL_QUERY.append_query_error,

CTE_DATABASE_MAIL_QUERY.mail_send_request_added_to_queue_time_utc,

CTE_DATABASE_MAIL_QUERY.mail_send_request_user,

CTE_DATABASE_MAIL_QUERY.mail_send_request_removed_from_queue_time_utc,

CTE_DATABASE_MAIL_QUERY.error_description,

CTE_DATABASE_MAIL_QUERY.error_time_local,

CTE_DATABASE_MAIL_QUERY.error_time_utc

FROM CTE_DATABASE_MAIL_QUERY;

While that is quite a bit of TSQL, it’s mostly formatting, NULL handling, and cleanup of the original details in order to produce a valid sp_send_dbmail command. We can test the view with a query that checks based on error date:

SELECT

FROM dbo.v_failed_database_mail_message_detail

WHERE error_time_local BETWEEN '1/1/2018' AND '5/1/2018';

The result shows our new column added to the beginning of the column list:

We can pull out the first column for a query in the list and view the entirety of its text here:

EXEC msdb.dbo.Sp_send_dbmail

@profile_name = 'Default Public Profile',

@recipients = 'not_a_real_email_address',

@body = 'testbody',

@subject = 'testsubject',

@body_format = 'TEXT',

@importance = 'NORMAL',

@sensitivity = 'NORMAL',

@attach_query_result_as_file = 0,

@query_result_header = 1,

@query_result_width = 256,

@query_result_separator = ' ',

@exclude_query_output = 0,

@append_query_error = 0

There it is! The original email message that failed. We can tell now that it failed due to the @recipients parameter receiving an (intentionally) invalid email address. From here, we can take the necessary troubleshooting steps and choose to resend the email, if necessary.

Test the process

One last step for us is to fail an email and validate that the process works correctly. How does the alert look and does it contain all of the information we are looking for?

The above email was the result of my sending two emails using sp_send_dbmail and providing invalid email addresses for the @recipients parameter. The resulting alert is formatted into an HTML table using data from database_mail_failure. A convenience here is that we can reconstruct the email or any related data from the failed message at any time from that table. This provides some level of insurance against additional alerting problems that make receiving the message above problematic.

But, what happens if email is down?!

The question that you have been thinking about since the start of this article is about to be answered! If email is unavailable, and we are alerting on failed messages via email, then how will this work!? There are several ways to attack this problem, and ideally, we would address all of them:

Monitor & alert the mail server effectively

First and foremost, we should have monitoring and alerting configured for our mail server. If email is an important channel of communication for production database server events then it must be monitored. If the mail server becomes unavailable or ceases to send/receive messages for any significant amount of time, then the appropriate people should be notified in order to fix it. Odds are if the email is down that many other important alerts are also not being received.

Monitor SQL Server Agent

SQL Server Agent jobs are all executed via the SQL Server Agent service. This is a Windows process that should be monitored and alerted on if it stops or becomes unresponsive. If SQL Server Agent is used for any monitoring, alerting, or data processing, then it should receive a similarly high level of priority with regards to ensuring that it is up and running at all times. If it goes down for any reason, then we’d benefit from immediate alerts to prevent downstream processing from failing, or worse, never running.

Consider other communication methods

While SQL Server relies heavily on email for alerting, you can integrate other monitoring tools that your organization uses into alert data in order to trigger texts, phone calls, or other types of communications.

For example, the database_mail_failure table could be monitored for new entries, and in addition to email, send a text to an on-call resource. Oftentimes, repeated failed messages will be indicative of a bigger problem that may be affecting a larger cross-section of an organization, whereas a single failure may simply be the result of a bad email address or email parameters.

Implement a secondary mail server

If email is a critical alerting component, then adding high availability to it can be a great way to avoid havoc if the mail server fails. Once a secondary server is available, you can create a second mail account and profile on your SQL Servers and automatically swap the default profile to it when needed.

Aggregation and summarization of alerts

If an organization manages many SQL Servers, there can be a benefit to concatenating messages across all servers and sending out a summary periodically. This strategy is more involved but has a beneficial side effect of offloading an alert email to another server. If the mail server used by this server is different from the failure’s originating server, then we have a somewhat roundabout way to avoid losing insight into failures.

That is a lot of work!

While these are all valid approaches, we realistically do not need to consider all of them. If email is monitored and maintained sufficiently, then the odds of a prolonged outage are low and most organizations are OK with that risk with the understanding that an operator can respond to and resolve a mail server issue quickly and efficiently if needed.

Still, addressing single-points-of-failure is important, and accepting that email is often a bottleneck for alerting and monitoring helps in avoiding long-term outages due to lack of knowledge of a mail server problem. We often consider no news as good news in the world of alerting, but no news can also mean that alerting processes are broken, and silence is the result.

Retrieving failed messages later

Our process leaves failed message details in the dbo.database_mail_failure table indefinitely. As a result, it’s possible to return to it at any time and review what is in there. If we wanted more flexibility in alerting, we could separate the alerting portion of failed message data collection into a separate process that reruns until successful.

Alternatively, if a failed message email fails, then another failed message alert will be generated. While not a fun situation, this would at least ensure that we do not completely lose insight into mail server problems.

Generally speaking, failed Database Mail messages should be rare, but if they happen very frequently on your servers, you may wish to add some cleanup into the database_mail_failure table. Simply add a DELETE statement at the end of the process that removes rows that have been alerted on, and where the error date is older than some acceptable amount of time.

Conclusion

Database Mail is a useful tool that allows emails to be generated from SQL Server quickly and efficiently. It also is a separate component of SQL Server, complete with its own error log and configuration.

This feature can fail for many reasons, including mail server outages, network configuration changes, invalid emails or parameters, or downstream problems in the mail server.

Setting up alerting against Database Mail, storing failed message data, and being able to quickly resend messages can save immense time and remove the chance that a process failure is resulting in an extended outage of email from SQL Server.

Downloads

Failed database mail process

Author
Recent Posts

Ed Pollack

Ed has 20 years of experience in database and systems administration, developing a passion for performance optimization, database design, and making things go faster.He has spoken at many SQL Saturdays, 24 Hours of PASS, and PASS Summit.This lead him to organize SQL Saturday Albany, which has become an annual event for New York’s Capital Region.

In his free time, Ed enjoys video games, sci-fi & fantasy, traveling, and being as big of a geek as his friends will tolerate.

View all posts by Ed Pollack

Description

Database Mail: a (very) brief overview

Viewing failure data

Automating failure alerting

But, what happens if email is down?!

Conclusion

Downloads

Related posts: