Feature

Germain can categorize data (facts) into groups of similar items. This helps identify a “new” problem among millions of problems, e.g. a new application crash or a new error.

Use Cases

One example where this is extremely useful is error detection and categorization. This mechanism groups similar errors into a single category, which can be pivoted and displayed on a dashboard. For a system administrator investigating errors, it helps in the following scenarios:

  • I can see how frequently similar errors occur - if I have 500 errors to review, it's useful to know whether 490 of them are actually the same thing.

  • Significantly reduces the effort of reviewing errors - I can review unique categories instead of individual errors. For example, on one of our example environments, Germain collected 86 crashes in a single day; only 10 of these were unique errors, and only 6 were new (not seen before).

  • I can concentrate on the errors that are occurring most frequently or affecting the most users.

  • I can easily identify 'new' errors - these are errors for which we haven't seen anything similar in the past.
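The grouping behind these scenarios can be illustrated with a small, hypothetical JavaScript sketch (not Germain's implementation). It assumes every error already carries a category key (in Germain, this comes from the categorization script described under Configuration), counts errors per category, orders categories by frequency, and flags any category missing from a set of previously seen categories as new. The function name summarize and the sample category keys are illustrative only.

```javascript
// Hypothetical sketch: not Germain's implementation. Each error is assumed to
// already carry a category key produced by a categorization script.
function summarize(errors, knownCategories) {
  // Count how many errors fall into each category.
  const counts = new Map();
  for (const e of errors) {
    counts.set(e.category, (counts.get(e.category) || 0) + 1);
  }

  // Most frequent categories first, so the biggest problems are reviewed first.
  const byFrequency = [...counts.entries()].sort((a, b) => b[1] - a[1]);

  // A category is "new" if it has never been seen before.
  const newCategories = byFrequency
    .map(([category]) => category)
    .filter((category) => !knownCategories.has(category));

  return { total: errors.length, unique: counts.size, byFrequency, newCategories };
}

// Usage with made-up data: 5 errors, 3 of which share one category.
const errors = [
  { id: 1, category: 'NullPointerException@UserService.getCurrent' },
  { id: 2, category: 'NullPointerException@UserService.getCurrent' },
  { id: 3, category: 'NullPointerException@UserService.getCurrent' },
  { id: 4, category: 'NullPointerException@BusinessLogicService.performLogic' },
  { id: 5, category: 'TimeoutException@OrderService.submit' },
];
const known = new Set([
  'NullPointerException@UserService.getCurrent',
  'TimeoutException@OrderService.submit',
]);

const summary = summarize(errors, known);
console.log(`total=${summary.total}, unique=${summary.unique}, new=[${summary.newCategories.join(', ')}]`);
// prints: total=5, unique=3, new=[NullPointerException@BusinessLogicService.performLogic]
```

Reviewing the output of such a summary means looking at 3 categories instead of 5 errors, with the single new category surfaced explicitly.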

Configuration

For generic applications built with standard technologies (e.g. Java, Go, PHP, C++), you won’t need to customize this categorization. However, you will want to customize it whenever you need to identify more complex errors that involve analyzing several data sources. Categorization can be customized by writing a JavaScript processor.

  • Germain Workspace > Left Menu > Analytics > Categorization

A new categorization definition can be created by using the plus icon, and existing definitions can be edited by clicking on a row in the table:

In the above configuration, the categorization considers only facts from the 'Siebel Component Crash' KPI (other facts are ignored). Each fact is passed through the configured JavaScript script, which analyzes the fact (in this case, the crash stack trace and error code) to produce two values: an exact match and a fuzzy match. Two facts belong to the same category only if their exact match values are identical and their fuzzy match values are within the configured match threshold (based on a string distance function).
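The matching rule can be sketched in JavaScript as follows. Germain's actual string distance function and threshold semantics are not specified here, so this sketch assumes a normalized Levenshtein similarity between 0 and 1 and treats the threshold as a minimum similarity. The function names levenshtein, similarity and sameCategory are hypothetical.

```javascript
// Sketch only: assumes a normalized Levenshtein similarity; Germain's real
// distance function and threshold semantics may differ.

// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1.0 means identical strings, 0.0 means completely different.
function similarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Same category only if the exact match is identical AND the fuzzy match
// is at least `threshold` similar.
function sameCategory(r1, r2, threshold) {
  if (r1.exactMatch !== r2.exactMatch) return false;
  return similarity(r1.fuzzyMatch, r2.fuzzyMatch) >= threshold;
}

// Usage with made-up values shaped like "file;method;location|..." strings.
const a = { exactMatch: 'abc123', fuzzyMatch: 'lib.so;foo;10|lib.so;bar;20' };
const b = { exactMatch: 'abc123', fuzzyMatch: 'lib.so;foo;12|lib.so;bar;20' };
const c = { exactMatch: 'def456', fuzzyMatch: 'lib.so;foo;10|lib.so;bar;20' };
console.log(sameCategory(a, b, 0.9)); // true (1 edit over 27 characters)
console.log(sameCategory(a, c, 0.9)); // false (exact match differs)
```

Under this assumption, two facts whose fuzzy match is empty (as the Siebel example script produces for known error codes) are grouped purely by their exact match.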

Preconfigured Objects

This new error categorization mechanism is currently preconfigured for common technologies such as Angular, Java, .NET, React, Salesforce.com and Siebel CRM. Here are a few examples.

  • Java app
    For Java applications, Germain analyzes the “path” of an exception or failure and separates new exceptions from recurring ones. Most other APMs only tell you that there is another “NullPointerException”, but don’t help you distinguish between one raised in “UserService.getCurrent” and one raised in “BusinessLogicService.performLogic”. This makes them very hard to deal with, as there are often millions of them.

    Java Exception 1:

    Java Exception 2:

  • Siebel CRM
    For Siebel CRM, Germain distinguishes Siebel Object Manager (OM) crashes that 1) affect business end users from those that don’t, and 2) are “new” from those that are already “known” and recurring.

    Other APMs will tell you that there is “another” crash, but won’t tell you whether the crash is new or whether it impacts the business. Most customers spend a significant amount of time analyzing these crashes; many ignore them, claiming they don’t affect business end users (which may or may not be true). Some just restart the Siebel Enterprise hoping these crashes will go away, but they don’t. Meanwhile, the business keeps being impacted.

Siebel Component Crash portlet - shows the count of all crashes occurring during one day (287 crashes in total).

Siebel Component Crash Count By Category portlet - shows the crashes pivoted by category. From this portlet we can clearly see that the vast majority of crashes are caused by the same issue (249 occurrences).

New Siebel Component Crash portlet - shows the distribution of 'new' crashes over time.

Unique Crashes portlet - shows the count of categories during one day - this represents the number of unique crashes that were collected during the day (in this example, from 287 crashes, there are only 7 unique categories).

  • The business impact analysis and new OM crash identification above result from Germain analyzing one to three Siebel files:

    • Siebel Error:

    • Siebel FDR file:

    • Siebel Core file:

  • Web App error
    For web applications built with any JavaScript framework (Angular, React, etc.), Germain analyzes the types of errors to identify new ones and their business impact.

    Most other APMs only tell you that there is another “Uncaught TypeError”, but don’t distinguish between “…reading 'toString'…” and “…reading 'b'…”.

    • Javascript Message/Error 1:

    • Javascript Message/Error 2:
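A hypothetical JavaScript sketch of how such messages could be split into distinct categories (not Germain's built-in logic): keep the error type and the property being read, and mask details that vary between occurrences of the same error, such as script URLs and line/column numbers. The function name errorSignature and the example.com URLs are assumptions for illustration.

```javascript
// Hypothetical sketch: build a category key from a browser error message.
// The error type and the property name ('toString', 'b', ...) are kept;
// volatile details are masked so repeats of the same error collapse together.
function errorSignature(message) {
  return message
    .replace(/https?:\/\/\S+/g, '<url>') // script URL, including :line:column
    .replace(/\d+/g, '<n>')              // any remaining numbers (ids, counts)
    .trim();
}

const m1 = "Uncaught TypeError: Cannot read properties of undefined (reading 'toString') at https://app.example.com/main.js:10:5";
const m2 = "Uncaught TypeError: Cannot read properties of undefined (reading 'toString') at https://app.example.com/main.js:42:17";
const m3 = "Uncaught TypeError: Cannot read properties of undefined (reading 'b') at https://app.example.com/vendor.js:3:99";

console.log(errorSignature(m1) === errorSignature(m2)); // true: same error, different location
console.log(errorSignature(m1) === errorSignature(m3)); // false: different property being read
```

Masking too aggressively would merge unrelated errors, while masking too little would split one error into many categories; which tokens to mask is a judgment call per application.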

Examples

Siebel Crash categorization

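// Categorizes a fact (here, a 'Siebel Component Crash') by setting
// result.exactMatch and result.fuzzyMatch. The globals fact, result, log and
// queryService are supplied to the script.
importClass(com.germainsoftware.data.Checksum);
importClass(com.germainsoftware.apm.analytics.categorization.siebel.SiebelCoreCrashParserUtils);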

log.info('FactID: {} - Categorizing Mask: {}, Details: [{}]', fact.id, fact.mask, fact.details);

var errorCodesWithoutCoreCrash = {
    'SBL-SMI-00062': true,  // Internal: No more process (multithreaded server) slots available
    'SBL-SVR-09127': true,  // Internal: Fail to initialize the shared memory resource for the process
};

try {
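    // fact.details lists supporting facts separated by '|'; each entry has the form
    // FactClass;FactId;FactType (Enterprise Crash entries may add ;ErrorCode)
    var supportingFactIdentifiers = fact.details ? fact.details.split('|') : [];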
    var coreCrash;
    var knownErrorCode;
    for (var i=0; i<supportingFactIdentifiers.length; i++) {
        var identifier = supportingFactIdentifiers[i].split(';');
        var factClassName = identifier[0];
        var factId = identifier[1];
        var factType = identifier[2];
        log.info('Supporting Fact: FactClass: [{}], FactId: [{}], FactType: [{}], EntCrash?: {}, CoreCrash: {}', 
            factClassName, factId, factType, (factType === 'Siebel:Enterprise Crash'), (factType === 'Siebel:Core Crash'));

        if (factType === 'Siebel:Enterprise Crash') {
            var errorCode = identifier[3];
            if (errorCode && errorCodesWithoutCoreCrash[errorCode]) {
                knownErrorCode = errorCode;
                log.info('FactID: {} - Known Error: FactClass: {}, FactId: {}, Error: {}', fact.id, factClassName, factId, errorCode);
            }
        } else if (factType === 'Siebel:Core Crash') {
            log.info('FactID: {} - Fetching Core Crash: FactClass: {}, FactId: {}', fact.id, factClassName, factId);
            // Use a local variable so a failed lookup does not discard a Core Crash found earlier
            var found = queryService.find(factClassName, factId);
            if (!found) {
                log.info('FactID: {} - Failed to find Core Crash: FactClass: {}, FactId: {}', fact.id, factClassName, factId);
            } else {
                coreCrash = found;
                log.info('FactID: {} - Core Crash: FactClass: {}, FactId: {}, Details: [{}]', fact.id, factClassName, factId, coreCrash.details);
            }
        }
    }

    log.info('FactID: {} - Known Error: {}, Core Crash: {}', fact.id, knownErrorCode, (coreCrash ? coreCrash.details : 'undefined'));
    if (knownErrorCode) {
        // If this crash is caused by an error that is known to not generate a Core Crash,
        // perform matching based only on error code, since this is all we have
        log.info('FactID: {} - Categorizing based on Error code: {}', fact.id, knownErrorCode);
        result.exactMatch = Checksum.calculate(knownErrorCode);
        result.fuzzyMatch = '';
    } else if (coreCrash) { 
        // If we have a Core Crash, use that to determine similarity of crashes
        log.info('FactID: {} - Categorizing based on Core Crash', fact.id);
        var uninformativeFunctions = [
            { fileName: '/app/siebel/siebsrvr/lib/libsslcosd.so', method: '+0x7185' },
            { fileName: '/app/siebel/siebsrvr/lib/libsslcosd.so', method: '+0x797e' }
        ];
        var ignore = {};
        uninformativeFunctions.forEach(function(f) { ignore[stackElementId(f, false)] = true; });

        // Remove recursion in the stack trace, and any generic functions that are not distinctive
        var stacktrace = SiebelCoreCrashParserUtils.parse(coreCrash.details);
        stacktrace = SiebelCoreCrashParserUtils.removeRecursion(stacktrace);
        stacktrace = stacktrace.filter(function(s) {
            return s.fileName && s.method
                && s.fileName.indexOf('/app/siebel/') === 0
                && !ignore[stackElementId(s, false)];
        });

        // Categorize stacktraces based on:
        // 1. Exact match of top 5 function calls
        // 2. Fuzzy matching of top 10 function calls (including location)
        // If no informative frames remain, leave exactMatch unset so the fact
        // is reported as uncategorized below instead of sharing an empty key.
        if (stacktrace.length > 0) {
            var stacktraceId = stacktrace
                .map(function(s) { return stackElementId(s, false); })
                .slice(0, 5)
                .join('|');
            result.exactMatch = Checksum.calculate(stacktraceId);
            result.fuzzyMatch = stacktrace
                .map(function(s) { return stackElementId(s, true); })
                .slice(0, 10)
                .join('|');
        }
    }

    if (!result.exactMatch) {
        log.info('FactID: {} - Unable to categorize due to lack of information', fact.id);
    }
} catch (error) {
    log.error('FactID: {} - Error during categorization: {}', fact.id, error.message);
}

function stackElementId(element, includeLocation) {
    if (includeLocation) {
        return element.fileName + ';' + element.method + ';' + element.location;
    }
    return element.fileName + ';' + element.method;
}

Service: Analytics

Feature Availability: 2022.1 or later