Notes on the Ask SGI Session - 3rd December 2013

Q: Since upgrading to SLES11 SP2, ISSP 3.0 and the other associated SGI packages, we see that xfsdump's "dumping directories" phase is running much slower, 5x or more. This is especially noticeable when backing up IS220-resident filesystems. Backups that used to take a few hours can now take more than a day. Is this a known problem, and is there a fix for it?
A: There is a small slowdown in bulkstat processing at SP2, but not of this size, and it should not affect the "dumping directories" phase of xfsdump. Please send us a metadump so we can work on the problem.

Q: DMF is supported on RHEL as well as SLES these days. If a new site has no preference, which do you recommend and why?
A: SGI has no preference, but sites should note that SLES has a six-month support overlap between releases, which RHEL does not.

Q: We were pleased to see that some of the items from last year's wish-list have been implemented, at least in part. Do you have any plans to work on other items on that list?
A: ISSP 3.3 and 3.4 will provide more visibility into, and control over, the DMF request queue. There will also be tools to help a site understand the workflow through the system. Some incremental features in ISSP 3.0 and 3.2 already help: in ISSP 3.0, DMF 6.0 provides the ability to raise recall request priority, and ISSP 3.2 will provide the ability to cancel recall requests and to raise or lower recall priority within a VG.

Q: I can see the "PERFTRACE_METRICS" feature in the dmf.conf file, and it looks like a good feature to have. Are there any *penalties* incurred by turning it on?
A: No.

Q: Are there any plans to allow recalls from tape to have a priority that lets them interrupt migrations to a different tape? We have a small number of drives.
A: You may be able to take advantage of the MAX_PUT_CHILDREN Drive Group parameter once you've upgraded to ISSP 3.0 (DMF 6.0).

Q: Background: we recently had 5 files where the primary copy on T10000C tapes was corrupt and the second copy on LTO5 tapes was valid. The files are gzipped, so "gunzip --test" quickly validates them. I haven't logged a case because we simply don't have log files going back over 17 months to determine what happened, and all of the DMF hardware and software has since been upgraded. However, it set me thinking about checksums and about identifying, historically, when a copy of a file was changed.
Q1. Could we have a high-level overview of how DMF validates copies of files as they are written (dmput), merged, or moved (dmmove)? Does it checksum the chunks, or the whole file? Perhaps just a refresher of the discussion I think we had at the last DMFUG about managing silent data corruption and some of the techniques being developed and adopted within DMF.
Q2. Future enhancement: could you store a checksum/MD5 (or another appropriate value) in the dmcatadm/LS database for every data copy related to a bfid? That would serve several purposes; for example, dmaudit verifymsp could use it for a simple check and flag any records with inconsistent checksums among the copies for a bfid, highlighting a fault fairly quickly.
A: It is difficult to imagine a scenario in which only one copy of the data would be corrupted yet still have a valid checksum. Perhaps the corruption was introduced during a merge or a dmmove? Investigating would require access to the system log files, which unfortunately are no longer available.
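As an aside (not part of the session itself): a minimal sketch of the per-copy checksum idea from Q2, assuming both copies of a file have already been recalled to disk. The bfid-to-path mapping below is purely hypothetical, since dmcatadm does not currently store per-copy checksums; the gzip check just mirrors the "gunzip --test" validation mentioned above.

    # Sketch only: hypothetical layout, not an actual DMF/dmcatadm interface.
    # For each bfid we assume paths to recalled copies (e.g. copy 1 from a
    # T10000C, copy 2 from an LTO5). Hash each copy, flag mismatches, and
    # optionally test-read gzipped files the way "gunzip --test" does.

    import gzip
    import hashlib
    import zlib

    def sha256_of(path, blocksize=1 << 20):
        """Stream a file through SHA-256 without loading it all into memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(blocksize), b""):
                h.update(block)
        return h.hexdigest()

    def gzip_ok(path, blocksize=1 << 20):
        """Rough equivalent of 'gunzip --test': read to EOF, catch CRC/format errors."""
        try:
            with gzip.open(path, "rb") as f:
                while f.read(blocksize):
                    pass
            return True
        except (OSError, EOFError, zlib.error):
            return False

    # Hypothetical data: bfid -> list of paths to the recalled copies.
    copies_by_bfid = {
        "bfid-0001": ["/scratch/recall/t10kc/file1.gz", "/scratch/recall/lto5/file1.gz"],
    }

    for bfid, paths in copies_by_bfid.items():
        digests = {p: sha256_of(p) for p in paths}
        if len(set(digests.values())) > 1:
            print(f"{bfid}: copies differ")
            for p, d in digests.items():
                print(f"  {d}  {p}  gzip_ok={gzip_ok(p)}")

If digests like these were stored per copy in the LS database, the same comparison could be made from catalogue records alone, which is essentially the dmaudit verifymsp check proposed in Q2.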
Q: The Copan MAID PSM GUI shows useful information. Is there a plan to provide a CLI interface, so that scripts can be written to make use of this information?
A: No, but the successor product to the Copan 400 is simpler.

Q: Do you plan to provide DMF clients for Windows, to allow users to perform dmget/dmput equivalents through the Windows GUI? This is required for success in the commercial market.
A: We can't provide the full functionality. It would be good to know the minimum requirements.

Errata? The above is from the notes made at the time. Please let me know what I missed or got wrong.

Peter Edwards