How to Optimize Axigen Search Indexing

This article describes the steps for improving search performance on large mailboxes.

Solution

  1. Check the current indexing configuration (these settings will be updated in step 2)

    // shows the current indexing parameters (globally)

    <#> config server
    <server#> show searchIndexThreshold
    SearchIndexThreshold = 2000 (auto-update = no)

    SearchIndexThreshold = 2000 is the number of messages that triggers a search index update.
    auto-update = no indicates that automatic updating of the search index is disabled.

    <server#> show searchToxicChars
    SearchToxicChars = "\00-\-/:-@ET[-\7F"
    +OK: command successful

    SearchToxicChars lists the characters that are not included in the search indices. If a user searches for these characters, the search cannot use the index and has to go through every email. One reason these characters are excluded is to keep the search index small.

    Please note that the index is capitalized (stored in uppercase) by default, so searches are case-insensitive: searching for a word in capital letters returns the same results as searching for the same word in lowercase.

    The value consists of literal characters and ASCII hexadecimal codes (the codes are used, for example, for non-printable characters); a hyphen between two entries denotes a range. For the meaning of each code, check an ASCII table such as http://www.asciitable.com/, where, for example, 00 is the null character.
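
    To check which characters a given value actually excludes, the Python sketch below expands a SearchToxicChars-style string into a set of characters. It is not Axigen code: the syntax it accepts is an assumption inferred from the values shown in this article (\HH is a two-digit hexadecimal code, a backslash before any other character escapes that character, and an unescaped hyphen between two entries denotes an inclusive range), and the function names tokenize and parse_toxic_chars are hypothetical.

```python
import string

# Marker token for an unescaped "-" (a range operator in the ASSUMED syntax).
RANGE = object()


def tokenize(spec: str) -> list:
    """Split a toxic-chars value into single characters and RANGE markers."""
    tokens = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\":
            if i + 1 >= len(spec):
                raise ValueError("dangling backslash at end of value")
            hex_code = spec[i + 1:i + 3]
            if len(hex_code) == 2 and all(c in string.hexdigits for c in hex_code):
                # \HH: character with hexadecimal ASCII code HH, e.g. \00 = null.
                tokens.append(chr(int(hex_code, 16)))
                i += 3
            else:
                # Escaped literal (assumption), e.g. "\-" stands for the hyphen itself.
                tokens.append(spec[i + 1])
                i += 2
        elif ch == "-":
            tokens.append(RANGE)
            i += 1
        else:
            tokens.append(ch)
            i += 1
    return tokens


def parse_toxic_chars(spec: str) -> set:
    """Return the set of characters covered by a toxic-chars value."""
    tokens = tokenize(spec)
    chars = set()
    i = 0
    while i < len(tokens):
        start = tokens[i]
        if start is RANGE:
            raise ValueError("range operator without a start character")
        if i + 1 < len(tokens) and tokens[i + 1] is RANGE:
            if i + 2 >= len(tokens) or tokens[i + 2] is RANGE:
                raise ValueError("range operator without an end character")
            end = tokens[i + 2]
            if ord(end) < ord(start):
                raise ValueError(f"reversed range {start!r}-{end!r}")
            # Inclusive range: both end points are toxic.
            chars.update(chr(code) for code in range(ord(start), ord(end) + 1))
            i += 3
        else:
            chars.add(start)
            i += 1
    return chars


if __name__ == "__main__":
    toxic = parse_toxic_chars(r"\00-\1F\60-\7A\7E-\7F")
    # Printable ASCII characters (0x20-0x7E) that would still be indexed.
    kept = "".join(chr(c) for c in range(0x20, 0x7F) if chr(c) not in toxic)
    print(len(toxic), repr(kept))
```

    Under these assumptions, the value recommended in step 2 covers 61 characters (0x00-0x1F, 0x60-0x7A and 0x7E-0x7F, i.e. the control characters, the backtick, a-z, ~ and DEL), while uppercase letters, digits and most punctuation remain indexed.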

    // shows the index statistics (globally)

    <server#> config processing
    <server-processing#> show statistics

    A sample output:

             <server-processing#> show statistics
             Processing statistics:
             Parameter Name                    instantaneous  1_min_avg  5_min_avg  15_min_avg
             ================================================================================
             == Server ==
             Queue size                                 2.00 0.45         0.44       0.36
             Asynchronous jobs                         10.00 10.61        10.55       9.96
    
             == Mail Search ==
             Running mail-search jobs                   1.00 0.23         0.85       0.38
             Pending mail-search jobs                   0.00 0.00         0.09       0.03
             Mail-search throughput(jobs/s)             0.00 0.33         0.73       0.45
             Mail-search throughput(msg/s)           2662.00 272.21     837.01     447.37
             Mail-search indexed hits (msg/s)           0.00 0.00         0.13       0.04
             Mail-search indexed misses (msg/s)         0.00 0.00         72.44      24.19
             Mail-search indexed errors (msg/s)         0.00 0.00         0.00       0.00
             Mail-search iterative hits (msg/s)         3.00 0.73         0.26       0.08
             Mail-search iterative misses (msg/s)    2659.00 164.63       80.18      26.72
             Mail-search iterative errors (msg/s)       0.00 0.00         0.00       0.00
    
             == Indexing ==
             Running sort-indexing jobs                 2.00 2.50         2.45       2.32
             Pending sort-indexing jobs                 1.00 1.66         1.59       1.55
             Sort-indexing throughput(jobs/s)          22.00 27.56        27.31      26.31
             Sort-indexing throughput(msg/s)            0.00 11.98        6.84       4.43
             Running search-indexing jobs               0.00 0.00         0.00       0.00
             Pending search-indexing jobs               0.00 0.00         0.00       0.00
             Search-indexing throughput(jobs/s)         0.00 0.00         0.00       0.00
             Search-indexing throughput(msg/s)          0.00 0.00         0.00       0.00
    
             == Email reminders ==
             Running email-reminders jobs               1.00 2.91         2.73       2.57
             Pending email-reminders jobs               0.00 0.00         0.00       0.00
             Email-reminders throughput(jobs/s)         5.00 4.56         4.33       4.03
             Email-reminders throughput(msg/s)          0.00 0.00         0.00       0.00
    
             == Enqueuing ==
             Messages registered for filtering          0.00 0.00         0.00       0.00
             Enqueued messages throughput(msg/s)        1.00 0.38         0.43       0.33
    
             == Filtering ==
             Messages awaiting filtering                3.00 0.81         0.90       0.73
             Filter calls throughput(calls/s)           3.00 1.23         1.55       1.15
    
             == Delivery ==
             Messages registered for delivery           0.00 0.00         0.00       0.00
             Messages awaiting delivery                 3.00 0.70         0.69       0.55
             Delivery throughput(msg/s)                 1.00 0.36         0.43       0.33
             Local delivery throughput(msg/s)           0.50 0.25         0.33       0.24
             Remote delivery throughput(msg/s)          0.50 0.13         0.11       0.09
    
             == Dequeuing ==
             Messages registered for cleanup            0.00 0.00         0.00       0.00
             Messages awaiting cleanup                  1.00 0.30         0.32       0.27
             Dequeued messages throughput(msg/s)        1.00 0.36         0.43       0.33
    
             +OK: command successful
          

    To check the usefulness of the index, we added the following statistics, which are displayed in the processing context (show statistics command):

    Mail-search indexed hits - the number of messages returned by the index that contain the searched string
    Mail-search indexed misses - the number of messages returned by the index that proved to be false clues (they did not contain the searched string)
    Mail-search indexed errors - the number of messages returned by the index that could not be validated (storage errors)
    Mail-search iterative hits - the number of messages containing the searched string that were found by an iterative scan of the entire folder (i.e. without using an index)
    Mail-search iterative misses - the number of messages processed by an iterative scan of the entire folder that did not contain the searched string
    Mail-search iterative errors - the number of messages that should have been processed by an iterative scan of the entire folder but could not be validated (storage errors)
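
    These counters can be combined into a quick measure of index usefulness, for example the share of index-returned messages that really contain the searched string: indexed hits / (indexed hits + indexed misses). The Python sketch below parses rows copied from the show statistics output and computes that share for one averaging column. The ratio, the row format it expects and the function names parse_statistics and indexed_hit_ratio are illustrative assumptions, not an Axigen feature.

```python
import re
from typing import Dict, Optional

# Column order of the "show statistics" table.
COLUMNS = ("instantaneous", "1_min_avg", "5_min_avg", "15_min_avg")

# A data row (assumed format): the parameter name followed by exactly four decimal numbers.
ROW = re.compile(r"^\s*(?P<name>\S.*?)\s+(?P<values>\d+\.\d+(?:\s+\d+\.\d+){3})\s*$")


def parse_statistics(text: str) -> Dict[str, Dict[str, float]]:
    """Map each parameter name to its four values; other lines are ignored."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            values = [float(v) for v in match.group("values").split()]
            stats[match.group("name")] = dict(zip(COLUMNS, values))
    return stats


def indexed_hit_ratio(stats: Dict[str, Dict[str, float]], column: str = "5_min_avg") -> Optional[float]:
    """Share of index-returned messages that contained the searched string."""
    hits = stats["Mail-search indexed hits (msg/s)"][column]
    misses = stats["Mail-search indexed misses (msg/s)"][column]
    total = hits + misses
    # No index-backed search results in this window: the ratio is undefined.
    return hits / total if total else None


if __name__ == "__main__":
    sample = """
             Mail-search indexed hits (msg/s)           0.00 0.00         0.13       0.04
             Mail-search indexed misses (msg/s)         0.00 0.00         72.44      24.19
    """
    stats = parse_statistics(sample)
    print(indexed_hit_ratio(stats), indexed_hit_ratio(stats, "instantaneous"))
```

    With the sample rows above, the 5-minute ratio is about 0.0018 (0.13 / (0.13 + 72.44)), meaning that in that window almost every message returned by the index turned out to be a false clue.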

    NOTE: The processing statistics return an average number of messages per second over various intervals (1 min, 5 min, etc.). To find the total number of messages processed in an interval, multiply the returned value by the length of the interval in seconds. For example, if the 5-minute average is 0.97, then 0.97 * 300 = 291 messages were processed in the last 5 minutes.
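
    The conversion described in the NOTE can be sketched in Python as follows. The window lengths (60, 300 and 900 seconds) are taken from the column names of the statistics table; the function name window_totals and the dictionary layout are hypothetical.

```python
from typing import Dict

# Length in seconds of each averaging window in the "show statistics" table.
WINDOW_SECONDS = {"1_min_avg": 60, "5_min_avg": 300, "15_min_avg": 900}


def window_totals(averages: Dict[str, float]) -> Dict[str, float]:
    """Multiply per-second averages by the length of their window.

    Columns without a fixed window (such as "instantaneous") are skipped.
    """
    return {
        column: value * WINDOW_SECONDS[column]
        for column, value in averages.items()
        if column in WINDOW_SECONDS
    }


if __name__ == "__main__":
    # The NOTE's example: a 5-minute average of 0.97 messages per second.
    print(round(window_totals({"5_min_avg": 0.97})["5_min_avg"]))  # 291
    # "Delivery throughput(msg/s)" values from the sample output.
    row = {"instantaneous": 1.00, "1_min_avg": 0.36, "5_min_avg": 0.43, "15_min_avg": 0.33}
    for column, total in window_totals(row).items():
        print(f"{column}: about {round(total)} messages")
```

    Rounding is applied only for display, since floating-point multiplication can leave a tiny residue (for example, 0.97 * 300 may not be exactly 291.0).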

  2. Update the indexing configuration

    Because purging the current message metadata and the subsequent re-indexing put a heavy load on the storage, we strongly recommend performing this step when few clients are connected to the mail server (for example, during a scheduled maintenance window).

    // updates the indexing parameters to the new values (globally)

    <#> config server
    +OK: command successful
    <server#> set searchIndexThreshold 1
    +OK: command successful
    <server#> set searchIndexAutoUpdate yes
    +OK: command successful
    <server#> set searchToxicChars "\00-\1F\60-\7A\7E-\7F"
    +OK: command successful
    <server#> commit
    committing changes and switching back to previous context.
    +OK: command successful
    <#>

    Because the value of searchToxicChars affects how the index is built, the index must be rebuilt after this value is changed. This does not happen automatically, so you need to execute the purge disposablemetadata searchindexes command in the domain context.

    <#> update domain example.com
    +OK: command successful
    <domain#> purge disposablemetadata searchindexes

    +OK: command successful
    <domain#> purge disposableMetadata normalizedMessages

    +OK: command successful
    <domain#>

    OR (recommended)


    <domain#> purge disposableMetadata all

    +OK: command successful
    <domain#>

    Note: The purge command can also be executed at the user level or at the Public Folder level.

    The searchIndexAutoUpdate parameter must be set before an account signs in for the first time; otherwise, it will not take effect for that account until the memory manager has removed the account from memory. Likewise, if the indices were deleted after the account signed in, they will not be rebuilt until the server is restarted (or until the memory manager removes the account from memory). Therefore, we recommend restarting Axigen after you complete these steps.

    NOTE: When everything is finished, perform a full compact on the domain, object, and message storages.

OS: Linux, Windows, FreeBSD, MAC, OpenBSD, NetBSD, Solaris
Distros: Windows, DEB based distros amd64, FreeBSD 7.x, Windows x64, OpenLDAP 2.4.x, RPM based distros x64, 32bit Windows