Making CAVA work with SMB2 on your VNX

As more and more people deploy a new VNX and switch to newer Windows Server operating systems, I am seeing higher utilization of the SMB2 protocol for CIFS.  With this increase come new problems.  Recently I noticed a rather peculiar message in the server logs regarding CAVA: it was reporting the error “FILE_NOT_FOUND” on scans of files that existed.  It would present itself as something like this:

 

2012-04-29 08:49:47: 81878122528: VC: 3: 32: Server '192.168.1.156' returned error 'FILE_NOT_FOUND' when checking file '\root_vdm_2\CIFS\Test\1234.exe'

 

Standard troubleshooting confirmed that the file did exist.  I even traced it back from the CAVA server through the “check$” share and had no problems accessing the file.  So why was CAVA reporting errors like this so often?  It turns out the problem was not with CAVA itself, but with an “enhancement” introduced as part of SMB2.

 

As part of its SMB2 implementation, the Microsoft redirector (the Windows SMB client) keeps a local cache of directory metadata.  By default, entries in this cache are cleared after 10 seconds.  On file systems with a high rate of change, this can cause an inconsistency between what is actually on the share and what the CAVA server sees when it goes to scan a file: the CAVA server reads the directory information from its cache and errors out when the newly written file is not found there.  This is what produces the error I pasted above.
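To make the timing problem concrete, here is a toy Python model of a directory cache with a fixed lifetime. It is a simplified sketch under my own assumptions (one cached listing per directory, refreshed only once it reaches its lifetime), not how the Windows redirector is actually implemented; the class name `DirectoryCacheModel` and its parameters are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Set


class DirectoryCacheModel:
    """Toy model of a client-side directory cache with a fixed lifetime.

    Not the Windows redirector's real implementation: it only shows how a
    cached listing can hide a file created after the listing was taken.
    """

    def __init__(self, lifetime_s: float, list_dir: Callable[[], Iterable[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_s = lifetime_s
        self.list_dir = list_dir      # returns the file names currently on the share
        self.clock = clock
        self._snapshot: Optional[Set[str]] = None
        self._taken_at = 0.0

    def lookup(self, name: str) -> str:
        now = self.clock()
        # Re-read the directory only if there is no snapshot yet or it has
        # reached its lifetime; a lifetime of 0 means every lookup re-reads it.
        if self._snapshot is None or now - self._taken_at >= self.lifetime_s:
            self._snapshot = set(self.list_dir())
            self._taken_at = now
        return "SUCCESS" if name in self._snapshot else "FILE_NOT_FOUND"


if __name__ == "__main__":
    files = {"old.doc"}
    fake_now = [0.0]
    cache = DirectoryCacheModel(10, lambda: files, clock=lambda: fake_now[0])
    cache.lookup("old.doc")          # t=0: listing cached
    files.add("1234.exe")            # t=2: a user writes a new file
    fake_now[0] = 2.0
    print(cache.lookup("1234.exe"))  # stale cache -> FILE_NOT_FOUND
```

With a lifetime of 0, every lookup re-reads the directory, which is the behavior the registry workaround is after.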

 

Of course, with a problem comes a workaround.  It is documented in the latest VNX Event Enabler release notes, but I will provide it for you here:

 

  1. Open the Windows Registry Editor and navigate to HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters.
  2. Right-click Parameters and select New > DWORD Value.
  3. For the new REG_DWORD entry, enter the name DirectoryCacheLifetime.
  4. Double-click DirectoryCacheLifetime and set its value data to 0, which disables the directory cache.
  5. Click OK.
  6. Restart the machine.
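If you have several CAVA servers to update, the same change can be scripted. The sketch below is a Python example under a few assumptions: it runs on the CAVA server itself with administrator rights, and the helper names `reg_add_command` and `set_directory_cache_lifetime` are hypothetical. It either builds the equivalent `reg add` command line or writes the value directly through the standard `winreg` module. You still need to restart each server afterward.

```python
import sys
from typing import List

# Key and value from the VNX Event Enabler workaround described above.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUE_NAME = "DirectoryCacheLifetime"


def reg_add_command(seconds: int = 0) -> List[str]:
    """Equivalent reg.exe command line; run it from an elevated prompt."""
    return ["reg", "add", "HKLM\\" + KEY_PATH, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(seconds), "/f"]


def set_directory_cache_lifetime(seconds: int = 0) -> None:
    """Write the DWORD with winreg (Windows only, requires administrator rights)."""
    if sys.platform != "win32":
        raise OSError("this registry change only applies to Windows CAVA servers")
    import winreg  # only importable on Windows
    # CreateKeyEx opens the existing Parameters key with write access.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, seconds)


if __name__ == "__main__":
    print(" ".join(reg_add_command()))
    # The new value only takes effect after a restart (step 6).
```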

 

A simple registry change and a reboot on each CAVA server sets the directory cache lifetime to 0, so directory metadata is no longer cached.  After this change, you should no longer see these SMB2-related FILE_NOT_FOUND errors from CAVA.

Understanding the EMC VNX/Celerra AntiVirus Agent (CAVA): Part 2 – Common Errors

This is part 2 of my CAVA blog post series. In this post, I will go through common error messages you could see in the output of server_viruschk. For those of you who haven’t already, please check out part 1, where I go line by line through the output of the server_viruschk command.

 

Most of these errors have to do with the account used for CAVA. This account is set as the “Log On As” account for the EMC CAVA service in the Windows Services console.

 

 

OBJECT_NAME_NOT_FOUND:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        OFFLINE at Sat Aug 20 20:28:33 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: , ntStatus: OBJECT_NAME_NOT_FOUND
                     AV Engine:
                     Server Name: cava.thulin.local
                     No signature date


Description: ntStatus: OBJECT_NAME_NOT_FOUND means that the CAVA service is not running on the AV server.

Resolution: Start the EMC CAVA service in the Services console on the AV server.
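When you are checking several Data Movers, it can help to pull the checker state and ntStatus out of this output programmatically. The Python sketch below assumes the layout shown in these examples (IP address and state on one line, ntStatus on a following line); the regular expressions and the `parse_checkers` name are my own and hypothetical, not part of any EMC tool.

```python
import re
from typing import Dict, List

# "<ip>   <STATE> at <timestamp>", e.g.
# "192.168.1.101        ERROR_AUTH 5 at Sat Aug 20 21:00:10 2011 (GMT-00:00)"
STATE_RE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(.+?)\s+at\s+(.+)$")
NTSTATUS_RE = re.compile(r"ntStatus:\s*(\S+)")


def parse_checkers(output: str) -> List[Dict[str, str]]:
    """Extract ip, state, timestamp and ntStatus for each checker."""
    checkers: List[Dict[str, str]] = []
    for line in output.splitlines():
        state = STATE_RE.match(line)
        if state:
            checkers.append({"ip": state.group(1), "state": state.group(2),
                             "since": state.group(3), "ntStatus": ""})
            continue
        status = NTSTATUS_RE.search(line)
        if status and checkers:
            # ntStatus belongs to the most recent checker line seen.
            checkers[-1]["ntStatus"] = status.group(1)
    return checkers


if __name__ == "__main__":
    sample = """server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        OFFLINE at Sat Aug 20 20:28:33 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: , ntStatus: OBJECT_NAME_NOT_FOUND
"""
    first = parse_checkers(sample)[0]
    print(first["state"], first["ntStatus"])  # OFFLINE OBJECT_NAME_NOT_FOUND
```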

 

ERROR_AUTH 5:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 5 at Sat Aug 20 21:00:10 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)


Description: ERROR_AUTH means that CAVA ran into an error when it went to connect to the “check$” share on the CIFS server. In this case, ERROR_AUTH 5 means that the account does not have the virus checking privilege.

Resolution: Make sure that the EMC CAVA service is running under the CAVA network account and not the Local System account. If it is, verify that you granted the CAVA network account the virus checking privilege in the MMC snap-in.

 

AV_NOT_FOUND:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        AV_NOT_FOUND at Sat Aug 20 20:29:59 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Unknown third party antivirus software
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)


Description: AV_NOT_FOUND means that CAVA cannot find a running AV process. By default, CAVA uses a privilege called “Debug Program Rights” (the “Debug programs” user right in Windows) to search for the following applications running in memory: SpntSvc.exe, rtvscan.exe, Mcshield.exe, InoRT.exe, SWEEPSRV.SYS, SavService.exe, NTRtScan.exe, and kavfs.exe.

Resolution: First, make sure your antivirus software is installed and running. If it is, make sure the CAVA account has the Debug Program Rights. By default, this privilege is granted to all local administrators, so add the CAVA account to the local Administrators group.
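As a quick sanity check on an AV server, you could compare a list of running process names against the names above. This is a hypothetical Python helper (`detected_av_processes`) that takes the process names as input, for example copied from Task Manager; it only approximates what CAVA does and does not check the Debug Program Rights privilege.

```python
from typing import Iterable, List

# Process names listed above as the ones CAVA looks for by default.
KNOWN_AV_PROCESSES = [
    "SpntSvc.exe", "rtvscan.exe", "Mcshield.exe", "InoRT.exe",
    "SWEEPSRV.SYS", "SavService.exe", "NTRtScan.exe", "kavfs.exe",
]


def detected_av_processes(running: Iterable[str]) -> List[str]:
    """Return the known AV process names present in `running` (case-insensitive)."""
    running_lower = {name.lower() for name in running}
    return [p for p in KNOWN_AV_PROCESSES if p.lower() in running_lower]


if __name__ == "__main__":
    print(detected_av_processes(["explorer.exe", "RTVScan.exe"]))  # ['rtvscan.exe']
```

An empty result means none of the default AV processes are running, which is the situation that leads to AV_NOT_FOUND.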

 

INVALID_PARAMETER:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        OFFLINE at Sun Aug 21 17:08:28 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: , ntStatus: INVALID_PARAMETER
                     AV Engine:
                     Server Name: cava.thulin.local
                     No signature date


Description: The ntStatus field reports an error when the CIFS server tries to connect to the CAVA server. This error occurs when the CIFS server specified for CAVA is not joined to AD.

Resolution: Join the CIFS server to AD and restart CAVA.

 

ERROR_AUTH 64:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 64 at Sun Aug 21 18:16:05 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)


Description: ERROR_AUTH 64 is caused by a Kerberos clock skew error.

Resolution: Make sure the time on the CAVA server is within 5 minutes of the Data Mover.
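The 5-minute rule is easy to check once you have both clocks in hand. A minimal Python sketch, assuming you have already read the time on the CAVA server and the Data Mover into timezone-aware `datetime` values; `within_kerberos_skew` is a hypothetical helper name, and 5 minutes matches the default Kerberos clock tolerance in Active Directory.

```python
from datetime import datetime, timedelta, timezone

# Default maximum clock skew tolerated by Kerberos in Active Directory.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(cava_time: datetime, data_mover_time: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the two clocks are no more than max_skew apart."""
    # Both values must be timezone-aware (or both naive) to be compared.
    return abs(cava_time - data_mover_time) <= max_skew


if __name__ == "__main__":
    dm = datetime(2011, 8, 21, 18, 16, 5, tzinfo=timezone.utc)
    print(within_kerberos_skew(dm + timedelta(minutes=7), dm))  # False
```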

 

ERROR_AUTH 86:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 86 at Sun Aug 21 17:25:31 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

Description: ERROR_AUTH 86 is caused when someone changes the password of the CAVA user in AD, but the CAVA service is still configured with the old password.

Resolution: Update the password configured for the CAVA account on each CAVA server. If you restart CAVA without updating it, the service will fail to start with a logon failure error.

 

ERROR_AUTH 1265:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 1265 at Sun Aug 21 16:04:33 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

Description: ERROR_AUTH 1265 is caused when the CAVA user account has expired in AD. You can verify this by attempting to log on to a Remote Desktop session with the CAVA user’s credentials.

Resolution: Have a domain admin reset the CAVA account and set it to never expire to keep this problem from returning.

 

ERROR_AUTH 1326:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 1326 at Sun Aug 21 17:49:37 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

Description: ERROR_AUTH 1326 occurs when the CAVA user’s password has expired in AD.

Resolution: Change the CAVA account password, have a domain admin set it to never expire, and then update the password on each CAVA server so the service is not left using the old one.

 

ERROR_AUTH 1331:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 1331 at Sun Aug 21 17:09:45 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

Description: ERROR_AUTH 1331 occurs when the CAVA account object is disabled or when logon hour restrictions deny logon.

Resolution: Have a domain admin enable the CAVA account object in AD and confirm that the CAVA account can log on at all hours of the day.

 

ERROR_AUTH 1909:

server_2 :
10 threads started.
1 Checker IP Address(es):
192.168.1.101        ERROR_AUTH 1909 at Sun Aug 21 17:57:17 2011 (GMT-00:00)
                     MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                     AV Engine: Symantec AV
                     Server Name: cava.thulin.local
                     Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

Description: ERROR_AUTH 1909 occurs when the cava user account has been locked out due to too many invalid logon attempts.

Resolution: Have an AD admin unlock the CAVA network account.
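To keep these codes handy, the ERROR_AUTH causes and fixes above can be collected into a small lookup table. This Python sketch only restates what this post describes; `ERROR_AUTH_CODES` and `explain_state` are hypothetical names, and the wording of each hint is condensed from the sections above.

```python
from typing import Dict, Tuple

# (cause, fix) for each ERROR_AUTH code, as summarized in this post.
ERROR_AUTH_CODES: Dict[int, Tuple[str, str]] = {
    5: ("CAVA account lacks the virus checking privilege",
        "Run the EMC CAVA service as the CAVA account and grant it the privilege"),
    64: ("Kerberos clock skew",
         "Keep the CAVA server within 5 minutes of the Data Mover"),
    86: ("CAVA service is using an old password",
         "Update the password on each CAVA server"),
    1265: ("CAVA account has expired in AD",
           "Reset the account and set it to never expire"),
    1326: ("CAVA account password has expired",
           "Change the password and set it to never expire"),
    1331: ("Account disabled or logon hours deny logon",
           "Enable the account and allow logon at all hours"),
    1909: ("Account locked out in AD",
           "Have an AD admin unlock the CAVA account"),
}


def explain_state(state: str) -> str:
    """Map a checker state such as 'ERROR_AUTH 1909' to a short hint."""
    parts = state.split()
    if len(parts) == 2 and parts[0] == "ERROR_AUTH" and parts[1].isdigit():
        cause, fix = ERROR_AUTH_CODES.get(
            int(parts[1]),
            ("Unknown ERROR_AUTH code", "Check the server log or open a case with EMC"))
        return f"{cause}: {fix}"
    return "Not an ERROR_AUTH state"


if __name__ == "__main__":
    print(explain_state("ERROR_AUTH 1909"))
```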

 

This should cover most of the common errors you will find when CAVA is running. If CAVA has been shut down, you may have to check the server logs to see them. If you have experienced a problem and my resolution does not fix it, please let me know and also open a case with EMC Celerra support.

I also want to recognize Daniel Morris for his blog posts on CAVA. I urge you to read the following posts as well to get a good understanding of CAVA.

http://blog.planetchopstick.com/2010/10/18/what-is-emc-cava-celerra-anti-virus-agent/

http://blog.planetchopstick.com/2011/05/03/cava-considerations-and-basic-setup/

http://blog.planetchopstick.com/2011/05/05/cava-troubleshooting/

Understanding the EMC VNX/Celerra AntiVirus Agent (CAVA): Part 1 – server_viruschk

CAVA is one of the few parts of the Celerra/VNX that cannot be configured or monitored from the GUI.  Most, if not all, of the information you need about CAVA can be found on the command line.  Over the course of a few posts, I will start with a fully working CAVA setup and then work backwards to break it, so you can see common implementation problems and possible performance bottlenecks.  In this first post of the series, I will go line by line through the output of server_viruschk so that you can understand just what the output is saying.  For reference, this is the output I will be working with:
[nasadmin@UberCS ~]$ server_viruschk server_2
server_2 :
 10 threads started.
 1 Checker IP Address(es):
 192.168.1.101        ONLINE at Thu May 26 19:41:13 2011 (GMT-00:00)
                      MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
                      AV Engine: Symantec AV
                      Server Name: cava.thulin.local
                      Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)

 1 File Mask(s):
 *.*
 5 Excluded File(s):
 ~$* >>>>>>>> *.PST *.TXT *.TMP
 Share \\UBERCIFS\CHECK$.
 RPC request timeout=25000 milliseconds.
 RPC retry timeout=5000 milliseconds.
 High water mark=200.
 Low water mark=50.
 Scan all virus checkers every 10 seconds.
 When all virus checkers are offline:
 Shutdown Virus Checking.
 Scan on read disable.
 Panic handler registered for 65 chunks.
 MS-RPC User: UBERCIFS$
 MS-RPC ClientName: ubercifs.THULIN.LOCAL
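The numeric settings near the bottom of this output are easy to extract if you want to track them over time. The Python sketch below (hypothetical `parse_settings` and `queue_level` helpers) reads the timeout and water mark lines and classifies a queue depth that you supply against the two water marks explained in items 14 and 15 below; treating a depth equal to a mark as having reached it is my assumption.

```python
import re
from typing import Dict

# Lines such as " RPC request timeout=25000 milliseconds." or " High water mark=200."
SETTING_RE = re.compile(
    r"^\s*(RPC request timeout|RPC retry timeout|High water mark|Low water mark)=(\d+)"
)


def parse_settings(output: str) -> Dict[str, int]:
    """Collect the timeout (ms) and water mark values from server_viruschk output."""
    settings: Dict[str, int] = {}
    for line in output.splitlines():
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def queue_level(depth: int, settings: Dict[str, int]) -> str:
    """Classify a queue depth against the high and low water marks."""
    if depth >= settings["High water mark"]:
        return "HIGH"  # extra resources allocated; CIFS performance may suffer
    if depth >= settings["Low water mark"]:
        return "LOW"   # early warning: files are queuing up faster than scanned
    return "OK"


if __name__ == "__main__":
    sample = """ RPC request timeout=25000 milliseconds.
 RPC retry timeout=5000 milliseconds.
 High water mark=200.
 Low water mark=50."""
    print(queue_level(120, parse_settings(sample)))  # LOW
```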

 

I will now go line by line starting with the first one.
  1. 10 threads started.
    • This is the number of CAVA threads.  Each thread handles one file being actively scanned, so CAVA will process up to 10 files at once, distributed across your available CAVA servers.  Any additional files are put into a holding queue until a thread is free.  This limit keeps us from overwhelming the AV software running on each CAVA server.  The support lab can adjust it if they determine that doing so will solve a performance issue.
  2. 1 Checker IP Address(es):
    • This line tells you how many CAVA servers you have defined in your viruschecker.conf file.  In this example, I only have 1 server defined, but you should be running at least 2 servers.
  3. 192.168.1.101                                  ONLINE at Thu May 26 19:41:13 2011 (GMT-00:00)
    • This line tells you the IP address of your cava server as well as the status and the last time we checked it.  If that line says anything other than ONLINE, there is a problem with the connection from the windows server to the celerra and that server will not be used for scanning.  More information on possible errors will be in a later post.
  4. MS-RPC over SMB, CAVA version: 4.8.5.0, ntStatus: SUCCESS
    • This has 3 pieces of useful information.  The first is the connection method we use to send commands to the CAVA agent; in this case, we are using the MS-RPC protocol.  Older clients may use the ONCRPC protocol, but it is not supported on 64-bit systems.  The next part tells you the version of CAVA you are running.  As of this writing, I am using the latest version (VNX Event Enabler 4.8.5).  Just as the status on the previous line reports the connection from Windows back to the Celerra, the ntStatus section reports the status of our initial connection to the Windows server.
  5. AV Engine: Symantec AV
    • This tells you which AV software we detected for CAVA to use.  This can be helpful if you have more than one AV engine installed on the client.  In my case, I am using Symantec Endpoint.
  6. Last time signature updated: Tue May 17 05:55:23 2011 (GMT-00:00)
    • This is the last time you updated your AV definitions.
  7. 1 File Mask(s):
    • The number of file masks you have set to scan for.  In this case, it’s just 1 mask.
  8. *.*
    • These are the file masks you have in place.  Any files that match the entries here will be processed for scanning.  In this case I have *.* (every file name containing a period), but you can cut down a lot of traffic if you're only scanning for certain file types.
  9. 5 Excluded File(s):
    • This is how many file exclusion filters you have in place.  In this case I have 5.
  10. ~$* >>>>>>>> *.PST *.TXT *.TMP
    • These are the exclusion filters I have in place.  There are a number of files that AV software just can’t scan (like database files).  I also exclude ~$* and >>>>>>>> to ignore Microsoft Office temporary files, as they can become temporarily locked while being scanned and cause a loss of data in the Office application.
  11. Share \\UBERCIFS\CHECK$.
    • This is the beginning of the UNC path that will be sent with file scan requests.  It is determined from the CIFSserver line in viruschecker.conf and will change depending on whether you defined it with the IP address, NetBIOS name, or FQDN.  The check$ share is a hidden share created just for CAVA; only the account granted the virus checking privilege can access it.
  12. RPC request timeout=25000 milliseconds.
    • This is the amount of time we will wait for a file to be scanned before trying again.
  13. RPC retry timeout=5000 milliseconds.
    • This is the amount of time we wait for an acknowledgement of each RPC command.
  14. High water mark=200.
    • I mentioned earlier that we process 10 files at a time and that additional files are put into a queue.  When the queue reaches the high water mark, we allocate additional resources to CAVA so it can work through the queued files faster.  Hitting this limit can cause a performance impact on your CIFS servers, so try not to let the queue get this large.  In my case, I have left the limit at the default of 200.
  15. Low water mark=50.
    • Just like the high watermark, this is a lower limit that starts to indicate that files are queuing up too fast.  This won’t cause a performance problem, but is an indicator of a possible problem to come.
  16. Scan all virus checkers every 10 seconds.
    • Every 10 seconds we will check the status of each cava server to make sure it’s still online and ready to take requests.
  17. When all virus checkers are offline:
    Shutdown Virus Checking.

    • This is the action we take when none of the CAVA servers are marked as ONLINE.  This setting shuts down virus checking so that files don’t continue to be queued and hit the high water mark.  The other options are to do nothing (a setting of ‘no’) or to shut down CIFS (what I like to call paranoia mode).
  18. Scan on read disable.
    • This means that scan on read is not enabled and that we are only processing scan on write.  If scan on read were enabled, the cutoff date and time would be listed here.
  19. Panic handler registered for 65 chunks.
    • This is mostly just for debug information and how many internal failures cava would survive before causing a panic.  Every process on the celerra has a panic handler and this information is of no use to basic cava troubleshooting.
  20. MS-RPC User: UBERCIFS$
    • Earlier I talked about how we use the MS-RPC protocol to connect to the CAVA agent servers.  This is the username we use for the SMB connection.  In this case, we are using the compname of the CIFS server for CAVA.
  21. MS-RPC ClientName: ubercifs.THULIN.LOCAL
    • This is the FQDN of the cifs server we are using for cava which is used as part of the MS-RPC process.
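To see which files the mask and exclusion list from items 8 and 10 would actually send for scanning, here is a Python sketch using shell-style glob matching from the standard `fnmatch` module. Treat it as an approximation under two assumptions: matching is case-insensitive, and `>>>>>>>>` is compared as a literal name, since the exact CAVA wildcard semantics are not covered here. `would_scan` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Mask and exclusions from the server_viruschk output above.
FILE_MASKS = ["*.*"]
EXCLUDED = ["~$*", ">>>>>>>>", "*.PST", "*.TXT", "*.TMP"]


def would_scan(filename: str, masks: Iterable[str] = FILE_MASKS,
               excluded: Iterable[str] = EXCLUDED) -> bool:
    """True if filename matches a mask and no exclusion (case-insensitive glob)."""
    name = filename.upper()
    # Upper-case both sides so the comparison ignores case on every platform.
    if not any(fnmatchcase(name, m.upper()) for m in masks):
        return False
    return not any(fnmatchcase(name, e.upper()) for e in excluded)


if __name__ == "__main__":
    for f in ["1234.exe", "~$report.docx", "notes.txt", "README"]:
        print(f, would_scan(f))
```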
This concludes my line-by-line explanation of the CAVA output.  I hope you now understand it a bit better.  In future posts on CAVA, I will talk about some of the different information you might see when there is an error, as well as the output of the -audit option.  Please feel free to ask questions in the comment section below.