DAOSTune best practices

Because every server has a unique distribution of object sizes, it is important that you, the administrator, understand that distribution. This section describes best practices for using DAOSTune to make an informed decision when setting the DAOS minimum object size.

1. Deciding which databases to scan for objects

Choose the scanType option that best fits your needs.

  • If you plan to apply DAOS to all databases, use the -scanType all argument.

    This instructs DAOSTune to scan all databases on the server. It is also the default if scanType is not specified.

  • If DAOS is already enabled on the server and you want suggestions for alternative DAOS minimum object sizes, use the -scanType daos argument.

    This instructs DAOSTune to scan only those databases that are DAOS enabled. If you scan with this argument, DAOSTune prints a special line in the bucket table, denoted with “#” character separators and ending with a “<” character. This line shows the estimate for the currently set DAOS minimum object size.

  • If you want an estimate for certain templates only, use the -scanType template <template name> argument.

    This instructs DAOSTune to scan only databases with the template specified. For example, if you want mail databases scanned, an argument such as -scanType template StdR14Mail could be useful.

  • If you want an estimate for only specific databases or specific directories, use the -scanType indirect <filename> argument.

    This instructs DAOSTune to scan only the databases and directories listed in the indirect (.ind) file. For example, suppose there is a local file named daos.ind with the following contents:
    • myDB1.nsf
    • myDB2.nsf
    • mailOld/
    • mailNew/
    • records/businessA.nsf

    You could use this file by specifying -scanType indirect daos.ind, which instructs DAOSTune to scan the three .nsf files and the two mail directories for objects.
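
Before handing an indirect file to DAOSTune, it can help to check what the file actually lists. The following Python sketch is not part of DAOSTune. It assumes the file holds one entry per line, that directory entries end with "/", and that database entries end with ".nsf", as in the daos.ind example above. The function name classify_indirect_entries is hypothetical.

```python
def classify_indirect_entries(text):
    """Group the lines of an indirect file into databases and directories.

    Assumptions (not confirmed by the DAOSTune documentation): one entry
    per line, directories end with "/", databases end with ".nsf".
    """
    result = {"databases": [], "directories": [], "unrecognized": []}
    for raw_line in text.splitlines():
        entry = raw_line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.endswith("/"):
            result["directories"].append(entry)
        elif entry.lower().endswith(".nsf"):
            result["databases"].append(entry)
        else:
            result["unrecognized"].append(entry)
    return result


# Contents of the daos.ind example from the text above.
SAMPLE = """myDB1.nsf
myDB2.nsf
mailOld/
mailNew/
records/businessA.nsf
"""

if __name__ == "__main__":
    summary = classify_indirect_entries(SAMPLE)
    for kind, entries in summary.items():
        print(f"{kind}: {len(entries)} {entries}")
```

Any entry that lands in the unrecognized group is worth a second look before you start a scan.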

2. Running DAOSTune scans

Now that the scanType has been determined, consider the minimum size of objects to include in the scan. DAOSTune defaults to including objects of 100,000 bytes (~100 KB) or larger. Also consider the number of threads that DAOSTune uses. For the shortest runtime, favor a higher number of threads. For lower resource consumption, favor a lower number of threads. DAOSTune defaults to using 1 thread, but you can specify any number between 1 and 30 (inclusive).

  • It's most efficient to save the results of a scan by using the -run collect <filename> argument.

    With this argument, DAOSTune only scans and saves the results to a CSV-style file for later analysis, so that you can scan once and analyze as many times as needed without rescanning.

  • It's best to first run a scan with the following arguments:

    -scanType <myType> -minObjectSize <mySize> -thread <myThreads> -run collect <myDAOSTuneFile>

    Where myType is the argument that you determined best in section 1, mySize is the preferred minimum size of objects to be included in the scan, myThreads is the preferred number of threads to use, and myDAOSTuneFile is the file where the results of the scan will be stored.

    This run does not print a table, and the resulting file is not meant to be human-readable. To display the results as a bucket table, see the following section.
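
The collect arguments above can be assembled and checked with a small helper. This is a minimal Python sketch, not part of DAOSTune: it only builds the argument string and does not launch anything. The function name build_collect_args and the file name scan.csv are hypothetical, and the sketch assumes that -minObjectSize takes a value in bytes, in line with the default stated above.

```python
def build_collect_args(scan_type, min_object_size, threads, results_file):
    """Return the argument string for a scan-and-collect run.

    scan_type is everything that follows -scanType, for example "all",
    "daos", "template StdR14Mail", or "indirect daos.ind".
    Assumption: min_object_size is given in bytes.
    """
    # The text states that the thread count must be between 1 and 30.
    if not 1 <= threads <= 30:
        raise ValueError("threads must be between 1 and 30 (inclusive)")
    if min_object_size < 0:
        raise ValueError("min_object_size must not be negative")
    return (
        f"-scanType {scan_type} -minObjectSize {min_object_size} "
        f"-thread {threads} -run collect {results_file}"
    )


if __name__ == "__main__":
    # Scan only DAOS-enabled databases, using 4 threads.
    print(build_collect_args("daos", 100000, 4, "scan.csv"))
```

Validating the thread count before the run avoids starting a long scan with an argument that is out of range.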

3. Running DAOSTune to display bucket tables

After DAOSTune has completed one scan and the results have been collected into a CSV-style file, you can reference that file to display bucket tables of different types. Use the -run analyze <filename> argument to read the CSV-style file and display a bucket table.

Now you can rerun DAOSTune with the following arguments:

-run analyze <myDAOSTuneFile> -bucketType <myBucketType> -buckets <myBuckets>

Where myDAOSTuneFile is the same file generated in section 2, myBucketType is the type of bucket table to display, and myBuckets is the number of buckets.

You can rerun DAOSTune many times with different bucketType values and different numbers of buckets. The process of displaying the bucket table is much quicker than the scanning process. It's best to experiment with different bucket tables to gain insight into the distribution of objects across the server.
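
Because one collected file can be analyzed repeatedly, it can be convenient to generate the argument strings for several bucket tables at once. The Python sketch below is not part of DAOSTune and only builds strings. The function name build_analyze_args is hypothetical, and "typeA" and "typeB" are placeholders rather than real bucketType values; substitute the values that your DAOSTune version accepts.

```python
from itertools import product


def build_analyze_args(results_file, bucket_types, bucket_counts):
    """Return one analyze argument string per (bucketType, buckets) pair."""
    return [
        f"-run analyze {results_file} -bucketType {bucket_type} -buckets {count}"
        for bucket_type, count in product(bucket_types, bucket_counts)
    ]


if __name__ == "__main__":
    # "typeA" and "typeB" are placeholders, not real bucketType values.
    for args in build_analyze_args("scan.csv", ["typeA", "typeB"], [10, 20]):
        print(args)
```

Each generated string reuses the same collected file, so none of these runs requires a rescan.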