Notes on the Ask SGI Session - 3rd December 2013

Q: Since upgrading to SLES11 SP2, ISSP 3.0 and the other associated SGI packages, we see that xfsdump's "dumping directories" phase is running much slower, 5x or more. This is especially noticeable when backing up IS220-resident filesystems. Backups that used to take a few hours can now take more than a day. Is this a known problem, and is there a fix for it?
A: There is a small slowdown in bulkstat processing at SP2, but not of this size, and it should not affect the "dumping directories" phase of xfsdump. Please send us a metadump so we can work on the problem.

Q: DMF is supported on RHEL as well as SLES these days. If a new site has no preference, which do you recommend and why?
A: SGI has no preference, but sites should note that SLES has a six-month support overlap between releases, which RHEL does not.

Q: We were pleased to see that some of the items from last year's wish-list have been implemented, at least in part. Do you have any plans to work on other items on that list?
A: ISSP 3.3 and 3.4 will provide more visibility into, and control over, the DMF request queue. There will also be tools to help a site understand the workflow through the system. Some incremental features in ISSP 3.0 and 3.2 already help: in ISSP 3.0, DMF 6.0 provides the ability to raise recall request priority, and ISSP 3.2 will provide the ability to cancel recall requests and to raise or lower recall priority within a VG.

Q: I can see the "PERFTRACE_METRICS" feature in the dmf.conf file, and it looks like a good feature to have. Are there any *penalties* incurred by turning it on?
A: No.

Q: Are there any plans to allow recalls from tape to have a priority that lets them interrupt migrations to a different tape? We have a small number of drives.
A: You may be able to take advantage of the MAX_PUT_CHILDREN Drive Group parameter once you've upgraded to ISSP 3.0 (DMF 6.0).

Q: Background: we recently had 5 files where the primary copy on T10000C tapes was corrupt and the second copy on LTO5 tapes was valid. The files are gzipped, so "gunzip --test" quickly validates them. I haven't logged a case because we simply don't have log files going back over 17 months to determine what happened, and all of the DMF hardware and software has since been upgraded. However, it set me thinking about checksums and about identifying, historically, when a copy of a file was changed.
Q1. Could we have a high-level overview of how DMF validates copies of files as they are written (dmput), merged, or moved (dmmove)? Does it checksum the chunks, or the whole file? Perhaps just a refresher of the discussion I think we had at the last DMFUG about managing silent data corruption and some of the techniques being developed and adopted within DMF.
Q2. Future enhancement: could you store a checksum/MD5 (or another appropriate value) in the dmcatadm/LS database for every data copy related to a bfid? That would serve several purposes; for example, dmaudit verifymsp could use it for a simple check and flag any records with inconsistent checksums among the copies for a bfid, highlighting a fault fairly quickly.
A: It is difficult to imagine a scenario in which only one copy of the data would be corrupted yet still have a valid checksum. Perhaps the corruption was introduced during a merge or a dmmove? Investigating would require access to the system log files, which unfortunately are no longer available.
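As an aside (not part of the session itself): a minimal sketch of the per-copy checksum idea from Q2, assuming both copies of a file have already been recalled to disk. The bfid-to-path mapping below is purely hypothetical, since dmcatadm does not currently store per-copy checksums; the gzip check just mirrors the "gunzip --test" validation mentioned above.

    # Sketch only: hypothetical layout, not an actual DMF/dmcatadm interface.
    # For each bfid we assume paths to recalled copies (e.g. copy 1 from a
    # T10000C, copy 2 from an LTO5). Hash each copy, flag mismatches, and
    # optionally test-read gzipped files the way "gunzip --test" does.

    import gzip
    import hashlib
    import zlib

    def sha256_of(path, blocksize=1 << 20):
        """Stream a file through SHA-256 without loading it all into memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(blocksize), b""):
                h.update(block)
        return h.hexdigest()

    def gzip_ok(path, blocksize=1 << 20):
        """Rough equivalent of 'gunzip --test': read to EOF, catch CRC/format errors."""
        try:
            with gzip.open(path, "rb") as f:
                while f.read(blocksize):
                    pass
            return True
        except (OSError, EOFError, zlib.error):
            return False

    # Hypothetical data: bfid -> list of paths to the recalled copies.
    copies_by_bfid = {
        "bfid-0001": ["/scratch/recall/t10kc/file1.gz", "/scratch/recall/lto5/file1.gz"],
    }

    for bfid, paths in copies_by_bfid.items():
        digests = {p: sha256_of(p) for p in paths}
        if len(set(digests.values())) > 1:
            print(f"{bfid}: copies differ")
            for p, d in digests.items():
                print(f"  {d}  {p}  gzip_ok={gzip_ok(p)}")

If digests like these were stored per copy in the LS database, the same comparison could be made from catalogue records alone, which is essentially the dmaudit verifymsp check proposed in Q2.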
Q: The Copan MAID PSM GUI shows useful information. Is there a plan to provide a CLI interface, so that scripts can be written to make use of this information?
A: No, but the successor product to the Copan 400 is simpler.

Q: Do you plan to provide DMF clients for Windows, to allow users to perform dmget/dmput equivalents through the Windows GUI? This is required for success in the commercial market.
A: We can't provide the full functionality. It would be good to know the minimum requirements.

Errata? The above is from the notes made at the time. Please let me know what I missed or got wrong.

Peter Edwards