Interpreting bucket table output

This topic provides an example to help you understand DAOSTune's bucket table output.

The output in this example was generated with the following arguments:

-scanType all -minObjectSize 100000 -numThreads 30 -bucketType equalCount -buckets 20 -bytePrint pretty -logLevel 2

Buckets Table (equal count distribution)
Column names and their interpretations are as follows:
Minimum object size
This is the first column. If you set the server’s DAOS minimum object size to this value, the other values in the row estimate the resulting size and savings for the server’s databases and for DAOS.
The -bucketType equalSize argument organizes this column into equal intervals.
Total (size)
This column denotes the total disk space used. The total is equal to the sum of the next two columns, the database size plus the DAOS size.
DB (size)
This column denotes the total size of all objects in databases if the row’s minimum object size is used. The DB size accounts for only those objects stored in databases and not those stored in DAOS.
DAOS (size)
This column denotes the total size of all objects in DAOS if the row’s minimum object size is used. The DAOS size accounts for only those objects stored in DAOS, and not those stored in databases.
Saved
This column denotes the size savings if the row’s minimum object size is used. The savings come from de-duplicating objects. For example, if a server has three separate instances of the same 15 MB object, taking up 45 MB of space, the savings would be 30 MB because de-duplication by DAOS would effectively remove two instances of that object.
%
This column denotes the size savings expressed as a percentage of the original full-size estimation. The percentage of savings is based on the total size in the NO DAOS row. The last entry in this column is the maximum possible savings.
.nlo count
This column denotes the count of objects in DAOS if the row’s minimum object size is used.
The -bucketType equalCount argument organizes this column into equal intervals.
disk space % max saved
This column denotes the percentage of savings achieved with respect to the total possible savings. The largest value reached in the Saved column is the total possible savings. If a bucket contains only single .nlo files with no de-duplication, that bucket would have 0% savings.
By definition, the final bucket is always 100.00%.
nlo count % max nlo
This column denotes the percentage of .nlo files in DAOS with respect to the total possible count.
By definition, the final bucket is always 100.00%.
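The de-duplication arithmetic described for the Saved column can be illustrated with a minimal sketch. The function name and the (content identifier, size) input format are hypothetical and are not part of DAOSTune; the sketch only shows that storing one copy of each distinct object yields the 30 MB figure from the 15 MB example.

```python
MB = 1_000_000  # illustrative unit; the example in the text uses round MB values


def dedup_savings(objects):
    """Return (total_bytes, stored_bytes, saved_bytes) for an iterable of
    (content_id, size_bytes) pairs, keeping one copy per distinct content_id.

    Hypothetical helper for illustration only; not DAOSTune's implementation.
    """
    seen = set()
    total = 0
    stored = 0
    for content_id, size in objects:
        total += size
        if content_id not in seen:
            seen.add(content_id)
            stored += size  # only the first instance occupies space
    return total, stored, total - stored


# Three instances of the same 15 MB object, as in the Saved column example.
total, stored, saved = dedup_savings([("report.pdf", 15 * MB)] * 3)
print(total // MB, stored // MB, saved // MB)
```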

The first row, labeled NO DAOS, lists only the Total (size) and DB (size) columns. When DAOS is not in use, all objects are stored in local databases and nothing is in DAOS, so this row serves as the control case in which no changes are made. For any other row, the sum of the DB (size), DAOS (size), and Saved columns equals the Total (size) entry of the NO DAOS row.
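This relationship can be spot-checked with a short sketch. The row values come from Example 1 and Example 2 later in this topic. The function name and tolerance are illustrative, and the conversion of 1 TB = 1024 GB is an assumption that reconciles the rounded figures; the DAOSTune output itself does not state which convention it uses.

```python
GB_PER_TB = 1024  # assumption: binary units; not stated by the tool output


def matches_no_daos_total(db_tb, daos_gb, saved_gb, no_daos_total_tb, tol_tb=0.002):
    """Check that DB + DAOS + Saved reproduces the NO DAOS total.

    The tolerance allows for rounding in the printed table values.
    """
    reconstructed_tb = db_tb + (daos_gb + saved_gb) / GB_PER_TB
    return abs(reconstructed_tb - no_daos_total_tb) <= tol_tb


NO_DAOS_TOTAL_TB = 2.490  # Total (size) of the NO DAOS row in the example

# Example 1 (minimum object size 931273) and Example 2 (137430).
row_5_ok = matches_no_daos_total(1.182, 603.944, 734.898, NO_DAOS_TOTAL_TB)
row_17_ok = matches_no_daos_total(0.965, 664.790, 896.887, NO_DAOS_TOTAL_TB)
print(row_5_ok, row_17_ok)
```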

The row delimited with “#” characters and ending with a “<” character, if present, denotes the estimate based on the server’s current DAOS minimum object size. If the server doesn't already have DAOS enabled, this row does not appear in the table.

A “*” character at the end of a row denotes that the row is close to the 80/20 rule. We generally recommend setting the DAOS minimum object size to a value that captures 80% of the total possible savings while storing only 20% of the total possible .nlo files. As a heuristic, when several consecutive rows are marked with “*”, set your DAOS minimum object size according to the estimate in the middle “*” row.
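The "middle starred row" heuristic can be sketched as follows. The row data below is invented purely for illustration and does not come from real DAOSTune output, and the function name is hypothetical. The sketch does not attempt to reproduce how DAOSTune decides which rows receive a “*”; it only picks among rows that are already marked.

```python
def pick_middle_starred(rows):
    """Return the middle row among those flagged with '*', or None.

    rows: list of dicts in table order, each with 'min_object_size' and
    'starred' keys (a hypothetical representation of parsed table rows).
    """
    starred = [row for row in rows if row["starred"]]
    if not starred:
        return None
    return starred[len(starred) // 2]


# Made-up rows, listed in table order (largest minimum object size first).
rows = [
    {"min_object_size": 2_000_000, "starred": False},
    {"min_object_size": 1_500_000, "starred": True},
    {"min_object_size": 1_000_000, "starred": True},
    {"min_object_size": 750_000, "starred": True},
    {"min_object_size": 500_000, "starred": False},
]

choice = pick_middle_starred(rows)
print(choice["min_object_size"])
```

With an even number of starred rows, this sketch picks the later of the two middle rows; choosing the earlier one would be an equally reasonable reading of the heuristic.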

The following two rows from the example bucket table are explained in detail.

Example 1: Row 5
Minimum object size  Total (TB)  DB (TB)  DAOS size (GB)  Savings (GB)  %      .nlo count  %Savings  %Count
931273               1.772       1.182    603.944         734.898       28.8%  85330       80.1%     25.0%*

Because this table was generated with the -scanType all argument, it is an estimate for all databases on the server. If you set the server’s DAOS minimum object size to 931273, the total on-disk space drops from 2.490 TB without DAOS to 1.772 TB. Of that total, databases store 1.182 TB of objects and DAOS stores 603.944 GB. This saves 734.898 GB, a 28.8% reduction in space used. With this minimum object size, 85330 objects are stored in DAOS as .nlo files. Storing all of the server’s objects in DAOS saves the most space, but this setting already achieves 80.1% of the total possible savings while storing only 25.0% of the total possible .nlo files. The “*” character at the end of the line means that this row is close to the 80/20 rule.
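The percentages in this row can be reproduced from its other values, as the sketch below shows. The variable names are illustrative, and the 1 TB = 1024 GB conversion is an assumption chosen because it matches the rounded figures. The "implied" totals are back-calculated from rounded percentages, so they are approximations rather than values reported by DAOSTune.

```python
GB_PER_TB = 1024  # assumption: binary units; not stated by the tool output

no_daos_total_tb = 2.490  # Total (size) of the NO DAOS row
saved_gb = 734.898        # Savings for minimum object size 931273
nlo_count = 85330         # .nlo count for the same row

# "%" column: savings as a share of the NO DAOS total.
pct_saved = saved_gb / (no_daos_total_tb * GB_PER_TB) * 100

# Back-calculate the totals that the 80.1% and 25.0% columns are relative to.
implied_max_savings_gb = saved_gb / 0.801
implied_max_nlo_count = nlo_count / 0.250

print(round(pct_saved, 1))
print(round(implied_max_savings_gb), round(implied_max_nlo_count))
```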

Example 2: Row 17
Minimum object size  Total (TB)  DB (TB)  DAOS size (GB)  Savings (GB)  %      .nlo count  %Savings  %Count
137430               1.614       0.965    664.790         896.887       35.2%  273056      97.8%     80.0%

This row appears much lower in the bucket table. If you set the server’s DAOS minimum object size to 137430, you gain little additional savings compared to Example 1. The .nlo count more than triples, but Example 2 saves only about 162 GB more. Going from no DAOS to the 931273 minimum object size in Example 1 reduces the total disk space from 2.490 TB to 1.772 TB. The 137430 minimum object size in Example 2 reduces it only to 1.614 TB, a small improvement given how many more .nlo files are required.
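The trade-off between the two examples can be quantified with a small comparison. This is only a sketch: the dictionary layout and the "savings per file" metric are illustrative choices, not something DAOSTune reports, and the 1 GB = 1024 MB conversion is an assumption.

```python
MB_PER_GB = 1024  # assumption: binary units; not stated by the tool output

example_1 = {"min_object_size": 931273, "saved_gb": 734.898, "nlo_count": 85330}
example_2 = {"min_object_size": 137430, "saved_gb": 896.887, "nlo_count": 273056}

# Additional savings and additional .nlo files when moving from Example 1 to 2.
extra_saved_gb = example_2["saved_gb"] - example_1["saved_gb"]
extra_files = example_2["nlo_count"] - example_1["nlo_count"]
count_ratio = example_2["nlo_count"] / example_1["nlo_count"]

# Average savings per .nlo file: first for Example 1, then for the extra files.
mb_per_file_ex1 = example_1["saved_gb"] * MB_PER_GB / example_1["nlo_count"]
mb_per_extra_file = extra_saved_gb * MB_PER_GB / extra_files

print(round(extra_saved_gb), extra_files, round(count_ratio, 1))
print(round(mb_per_file_ex1, 1), round(mb_per_extra_file, 1))
```

Under these assumptions, each additional .nlo file in Example 2 saves far less space on average than each file in Example 1, which is the diminishing return the text describes.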