Channel: Data Protection Manager - File Protection forum

Is anyone successfully protecting Storage Server 2008 R2 with SIS volumes?


We have been having major issues with our DPM 2010 server (running on 2008 R2 SP1 with DPM version 3.0.7707.0) ever since we started protecting our new 3-node Dell NX3000 cluster running Storage Server 2008 R2 SP1 with SIS-enabled volumes.  Shortly after we created new protection groups (PGs) for the new cluster, our DPM server started freezing within 8 hours of a power cycle.  This turned out to be a deadlock issue with SIS on the DPM side, which required installing a hotfix.


Unfortunately, even after installing the unreleased hotfix, our DPM server has still had a multitude of problems:  inconsistent volumes; missing volumes after disk management tasks (rescan disks, create new slice, add new volumes); synchronizations failing; backups to tape failing after only a very small amount of data has been written; etc.


We have three DPM servers protecting a variety of LAN and WAN servers.  Only one of these DPM servers protects SIS-enabled shares on a storage server cluster, and it is the only server having issues.  We have also reproduced some of our problems in a separate test environment.  The DPM server with the issues was rock solid for over a year while protecting non-SIS data sources.

The DPM server that is having the problems:

  • Dell R510 w/ 48GB RAM, PERC H700 and H800 controllers
  • Qty 48 - 2TB disks (12 internal, 36 external on MD1200's)
  • DPM storage pool consists of eight ~8.5TB volumes
  • TL4000 w/ four LTO-5 drives (off two dual-port 6Gb SAS controllers)


This server used to protect our file server cluster, which ran on 2008 SP2 with approximately 27TB of data spread across 6 volumes (8TB, 8TB, 8TB, 8TB, 4TB, 4TB).  Everything worked great and we somehow did not run into the issue of trying to back up "millions of files" to tape.  Disk IO was not a problem and the server could easily stream 4 jobs to tape.  Our backup-to-tape window was less than a day.


When we migrated to the new Storage Server cluster, we split up our large volumes into many smaller mounted volumes.  Each department was given its own volume and share.  So far we have 45 mounted volumes with SIS enabled, which is saving quite a bit of storage space.  Volume sizes range from 128GB to 8TB.  (5x128GB, 6x256GB, 2x384GB, 9x512GB, 2x768GB, 7x1024GB, 3x1536GB, 5x3072GB, 5x4096GB, 1x8192GB)
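
For anyone wanting to compare scale, here is a quick sanity check of that layout as a minimal Python sketch. It only adds up the counts and sizes listed above, so the total is provisioned volume size, not the amount of data actually stored, and the 1 TB = 1024 GB conversion is an assumption.

```python
# Volume layout as listed in the post: (number of volumes, size in GB).
layout = [
    (5, 128), (6, 256), (2, 384), (9, 512), (2, 768),
    (7, 1024), (3, 1536), (5, 3072), (5, 4096), (1, 8192),
]

# Number of mounted volumes across all size classes.
volume_count = sum(count for count, _ in layout)

# Total provisioned capacity in GB (not the data actually stored).
total_gb = sum(count * size for count, size in layout)

# Assumption: 1 TB = 1024 GB.
total_tb = total_gb / 1024

print(volume_count, total_gb, round(total_tb, 1))  # prints: 45 64896 63.4
```

The count matches the 45 mounted volumes mentioned above, and the provisioned total comes to roughly 63TB.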


The NX3000 nodes have 96GB RAM; 10Gb NICs for iSCSI and LAN; and storage is on our EMC SAN.  I loaded each node with 96GB since during our tests with storage server we knew SIS would eat quite a bit of RAM.  We had also hit the file cache bug on our old cluster so better safe than sorry.  So far there have not been any issues on the file cluster side.  The cluster is configured in an active/active/test setup with three cluster resource groups.  Most of our data is split between two resource groups and the "test" resource group has a few non-production shares.  


A run-down of the problems we've hit since adding SIS is as follows:


When DPM creates new disk volumes (either by creating a new PG or by expanding replica/RP volumes), volumes may go missing.  The disk partitions are still online; however, the disk label is blank instead of "DPM_vol-<guid>."  A re-scan of the disks may bring the volume back, or it may cause others to go offline.  Re-scanning volumes causes the same problems.  It is like playing Russian roulette.  When a volume comes back after being missing, it is marked as inconsistent, which requires a consistency check that usually fails.


Synchronizations often fail with:

  • DPM failed to communicate with the protection agent on DPMSERVERNAME because the agent is not responding. (ID 43 Details: Internal error code: 0x8099090E)


Tape jobs fail with:

  •  (ID 2019 Details: An existing connection was forcibly closed by the remote host (0x80072746))
  •  The protection agent on DPMSERVERNAME was temporarily unable to respond because it was in an unexpected state. (ID 60 Details: Internal error code: 0x809909B0)
  •  DPM failed to communicate with the protection agent on DPMSERVERNAME because the agent is not responding. (ID 43 Details: Internal error code: 0x8099090E)


Our configuration may or may not be larger than many here, but I'd love to hear from those of you who are protecting, or have tried to protect, Storage Server with DPM.  Happy?  Sad?  Same boat as us?

Thanks,

-Wayne



