Mail Gateway Troubleshooting
From Halon Security
Mail systems are somewhat complex. It's not always straightforward to design and configure a mail system, and there are a lot of things that can go wrong. This guide will help you debug and solve some of the most common problems.
License Problems
The SPG/VSP requires a subscription license (called "commercial") for Commtouch (real-time spam and virus defense) and Kaspersky (proactive and signature-based virus defense) to work. The license also sets a maximum number of users (that is, e-mail recipients), which may be a specific number or unlimited.
Exceeding the number of users
Many mail servers are configured to accept any e-mail address (such as xyz123@your-company.com, even if no such user exists). This behavior is sometimes referred to as catch-all, and it has to be disabled to keep the user count from exceeding the license. Also make sure that the e-mail server does not "trust" the SPG/VSP: some servers accept any e-mail from computers with internal addresses. The user count is reset whenever the appliance is restarted.
Disabling catch-all in Exchange 2007
Go to your receive connector, disable anonymous users, and restart the Exchange server. Then restart the VSP/SPG appliance. Unknown users should no longer be accepted, and the recipient count should be reset to a sane value.
Network Problems
These kinds of problems are often related to network issues or misconfiguration. Just like any other device in a network, the SPG/VSP must have network access with default routes and DNS configured. If it does not, you cannot expect the unit to work as intended. To verify that the unit has sufficient network access, go to Diagnostics → Troubleshooter. For the SPG to even try to accept an incoming connection, it must have an Incoming Listener configured for that IP address (see the Mail Gateway documentation).
It's also a good idea to verify that the interface has a green light next to it in the Web Interface (Network → Addresses), indicating an active network link. The first thing you should always check when debugging a network is that the Ethernet cable is connected, since it's normally the last thing you would expect to be missing.
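The same basic checks can also be made from another machine on the network. The following is a minimal Python sketch, not part of the SPG/VSP itself; the function names are made up for illustration, and it only tests name resolution and whether a TCP port (25 by default) accepts connections.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts
        return False
```

For example, can_connect("192.0.2.10") (a placeholder address) returning False while the link light is green suggests that no Incoming Listener is configured for that IP address, or that a firewall sits in between.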
DNSQuery failed for domain ....
If the unit fails to resolve a domain name from within the HSL scripting language, it reports an "error" (logged at the INFO level). Not all of these "errors" are actual errors; some of them may even be a sign that things are working as they should. For example, if you're using DNS blacklists you may see something like this in the log:
DNSQuery failed for domain 123.123.23.45.list.dsbl.org: hostname nor servname provided, or not known
DNS blacklists work in such a way that if the "hostname" cannot be found, the IP address is not considered blocked. So for every "good" IP there should be a "DNSQuery" failure in the log.
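To make the mechanism concrete, here is a minimal Python sketch of a DNS blacklist lookup. It is an illustration only, not how the SPG implements it: the function names are hypothetical, and it assumes the usual DNSBL convention of reversing the IPv4 octets and appending the list's zone, which is consistent with the log line above if the connecting IP was 45.23.123.123.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name looked up in a DNSBL: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist has a record for ip, False if the lookup fails."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # "hostname nor servname provided, or not known": no record, so not listed
        return False
```

Here dnsbl_query_name("45.23.123.123", "list.dsbl.org") gives the name seen in the log. Note that this sketch treats every resolver failure as "not listed", so a broken DNS setup would look the same as a clean IP.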
Configuration Errors
Invalid Configuration
If a configuration is considered invalid during the start-up procedure (for whatever reason), the SPG will load a default configuration in its place. When this happens, try to revert to your latest configuration revision and examine why it's no longer valid. If the error makes sense to you, export the revision, correct the error and try to import it again.
Inaccessible Storage Device
If the user-defined storage could not be initialized during start-up (for example because of a broken disk or inaccessible network storage), the SPG will commit a new configuration revision where the storage is memory:// and all incoming mail listeners are disabled. When this happens, contact support, replace the disk, or resolve the problem with the network storage. Then set Mail Gateway → Storage to your preferred storage location and reboot. Verify that the unit is up and running again, and start all your incoming mail listeners to return to normal operation.
Scripting Errors
If a runtime scripting error occurs, the current flow process will fall back to the default action for that flow. The Mail Flow will use "Deliver()" and the Access Flow will use "Allow()". These errors appear in the system log and describe the reason why they were triggered. When this happens, examine your script at the point where it fails and read the latest documentation about the function in the HSL manual.
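The fallback behavior can be pictured with a small conceptual model. This is Python, not HSL, and it is not the SPG's actual implementation; run_flow and DEFAULT_ACTIONS are hypothetical names, and only the two default actions are taken from the text above.

```python
import logging

log = logging.getLogger("flow")

# Default actions named in the text above
DEFAULT_ACTIONS = {"mail": "Deliver()", "access": "Allow()"}


def run_flow(flow, script, message):
    """Run a flow script; on a runtime error, log it and use the flow's default."""
    try:
        return script(message)
    except Exception as exc:
        # The error goes to the log, and the message is handled by the default action
        log.error("%s flow script failed: %s", flow, exc)
        return DEFAULT_ACTIONS[flow]
```

The practical consequence is that a script which fails at runtime does not stop mail: in this model a failing Mail Flow script simply results in delivery, so filtering steps after the failing line are never applied.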
Receiving Mail
If you can connect to the SPG but it refuses to accept the mail, it's a very good idea to connect to the unit using telnet to see the exact error message given. The two most common errors are Recipient and DATA failures.
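If telnet is not at hand, the same conversation can be scripted. The sketch below uses Python's standard smtplib to walk through EHLO, MAIL FROM and RCPT TO and returns the server's reply at each step, stopping at the first refusal. The function name and the smtp_factory parameter are assumptions made for this example, and the sketch stops before DATA so no message is actually sent.

```python
import smtplib


def probe_recipient(host, sender, recipient, port=25, smtp_factory=smtplib.SMTP):
    """Return a list of (step, code, text) replies up to the first refusal."""
    replies = []
    client = smtp_factory(host, port, timeout=10)
    try:
        steps = (
            ("EHLO", client.ehlo),
            ("MAIL FROM", lambda: client.mail(sender)),
            ("RCPT TO", lambda: client.rcpt(recipient)),
        )
        for step, call in steps:
            code, text = call()
            if isinstance(text, bytes):
                text = text.decode("utf-8", errors="replace")
            replies.append((step, code, text))
            if code >= 400:
                break  # 4xx and 5xx replies mean the server refused this step
    finally:
        try:
            client.quit()
        except smtplib.SMTPException:
            pass  # the server may already have closed the connection
    return replies
```

The last entry in the returned list shows at which stage the mail was refused and the exact text the server gave.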
The SPG does not get any Mail
If you have updated the MX record, it might take hours or days before mail starts to arrive, since DNS records are cached for varying amounts of time. If you have multiple MX records at different priorities, you cannot always be sure that the primary MX record is used. Some spammers actually use the last MX record, since it's very common to have a backup MX which points directly at the mail server.
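A quick way to reason about this is to list every MX host for the domain and check that none of them bypasses the gateway. The Python sketch below works on (preference, host) pairs that you have already looked up, for instance with dig or nslookup; the function names and host names are hypothetical.

```python
def delivery_order(mx_records):
    """Sort (preference, host) pairs; the lowest preference number is tried first."""
    return sorted(mx_records)


def bypassing_gateway(mx_records, gateway_hosts):
    """Return the MX hosts, in delivery order, that do not point at the gateway."""
    allowed = {host.lower().rstrip(".") for host in gateway_hosts}
    return [
        host
        for _, host in delivery_order(mx_records)
        if host.lower().rstrip(".") not in allowed
    ]
```

With records [(20, "mail.example.com."), (10, "spg.example.com.")] and the gateway known as spg.example.com, bypassing_gateway reports mail.example.com. as a host that senders can reach without passing the SPG.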
"Relay access denied"
If the recipient address is denied, there are two likely reasons: the domain is not defined, or the user is not in an active recipient database. To solve these problems, read the Mail Gateway documentation.
If the SPG complains about insufficient storage, it has run out of storage. This is very uncommon and should almost never be encountered. There is really no solution except waiting for the unit to work through the queues and deliver mail. As a temporary fix you can delete mail manually from the queues, but that's not a permanent solution. To minimize the amount of spam taking up storage, you may want to consider a stricter anti-spam policy (for example using GlobalView or RBLs) or a bigger or external storage device. This failure may also occur for an individual user if their quarantine is full.
DATA Command failure
These failures can occur when the SPG runs out of storage. This is very uncommon and should almost never be encountered. There is really no solution except waiting for the unit to work through the queues and deliver mail. As a temporary fix you can delete mail manually from the queues, but that's not a permanent solution. To minimize the amount of spam taking up storage, you may want to consider a stricter anti-spam policy or a bigger or external storage device. This failure may also occur for an individual user if their quarantine is full.
Quarantine
I cannot submit the form in the Quarantine Report (but the link works)
Mail clients take different approaches to what is considered dangerous, and some of them do not allow forms. That's why we also provide a link in the mail.
Mail Processing
Mail is stuck in the Incoming Queue
If a mail gets stuck in the Incoming Queue and cannot be processed, it may be because the SPG cannot handle the mail for some reason. There is not much you can do except delete it and possibly notify the sender.
The flow does not work
For the flow to work, the script has to be correct. It's not possible to save or use a script that is completely wrong, but we cannot foresee or prevent runtime errors. These errors are reported to the system log and the mail action falls back to deliver, which may explain why the flow does not work.
Delivering Mail
Mail is stuck in the Outgoing Queue
This may or may not be a problem in the SPG configuration. It's recommended to force a new delivery attempt (using the retry button) and look for errors in the System Log and, if possible, in the mail server log.
