Auditing Section Research Summaries Space

A Database of Auditing Research - Building Bridges with Practice

    research summary posted October 22, 2013 by Jennifer M Mueller-Phillips, tagged 06.0 Risk and Risk Management, Including Fraud Risk, 06.01 Fraud Risk Assessment 
    Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms
    Practical Implications:

    The results of this study can guide practitioners in selecting and implementing appropriate classification algorithms for building fraud detection models. Improved fraud detection models will help auditors select clients, plan audits, and perform analytical procedures. The SEC can also use these results to identify companies suspected of fraudulent reporting and to target its investigations.

    For more information on this study, please contact John Perols.


    Perols, J. 2011. Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms. Auditing: A Journal of Practice & Theory 30 (2): 19-50.

    Keywords: analytical auditing, financial statement fraud, fraud detection, fraud predictors, classification algorithms.
    Purpose of the Study:

    In an effort to improve the detection of fraudulent financial reporting, recent research has tested the usefulness of various statistical and machine learning algorithms for predicting whether fraud is present in a firm. This study extends that research by evaluating the effectiveness of six commonly used statistical and machine learning algorithms for detecting fraudulent financial reporting under different assumptions about (1) the ratio of fraud firms to non-fraud firms in the sample data and (2) the costs of misclassifying a fraud firm or a non-fraud firm. Specifically, the study examines the following questions:

    Which classification algorithms are most useful for predicting fraud under different assumptions about (a) the probability that a given firm is a fraud firm and (b) the cost of misclassifying a firm as a fraud firm or a non-fraud firm?
    Which fraud probability and which misclassification costs should be used when training the classification algorithms?
    Which fraud predictors are useful to the classification algorithms? That is, which indicators do the algorithms use to predict whether fraud is present in a given firm?
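
    The first two questions turn on how a prior fraud probability and a pair of misclassification costs convert a classifier's output into a fraud/non-fraud decision. The sketch below shows one standard decision rule, not necessarily the procedure used in the study. It assumes the classifier outputs a calibrated posterior probability of fraud, rescales that probability from the fraud rate in the training sample to an assumed population rate, and flags the firm-year when the expected cost of clearing it exceeds the expected cost of flagging it. The function names, the 50/50 training sample, and the 1:30 cost ratio are hypothetical.

```python
def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Posterior fraud probability above which flagging is the cheaper choice.

    Flagging costs (1 - p) * cost_fp in expectation; clearing costs p * cost_fn.
    The two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def adjust_for_prior(p: float, train_prior: float, target_prior: float) -> float:
    """Rescale a posterior estimated at the training fraud rate to a target rate.

    Weights the fraud and non-fraud probabilities by the ratio of target to
    training priors (the standard Bayes prior-shift correction), then renormalizes.
    """
    fraud = p * target_prior / train_prior
    non_fraud = (1.0 - p) * (1.0 - target_prior) / (1.0 - train_prior)
    return fraud / (fraud + non_fraud)


def flag_as_fraud(p: float, train_prior: float, target_prior: float,
                  cost_fp: float, cost_fn: float) -> bool:
    """True if the prior-adjusted posterior exceeds the cost-based threshold."""
    adjusted = adjust_for_prior(p, train_prior, target_prior)
    return adjusted > decision_threshold(cost_fp, cost_fn)


if __name__ == "__main__":
    # Hypothetical setup: classifier trained on a 50/50 sample, applied at a
    # 0.6% fraud rate, with a false negative assumed to cost 30x a false positive.
    for score in (0.5, 0.9):
        print(score, flag_as_fraud(score, train_prior=0.5, target_prior=0.006,
                                   cost_fp=1.0, cost_fn=30.0))
```

    With these placeholder settings, a raw score of 0.5 becomes an adjusted probability of 0.006 and is not flagged, while a score of 0.9 becomes roughly 0.052, which exceeds the 1/31 (about 0.032) threshold and is flagged. Raising the false-negative cost lowers the threshold, so more firm-years are flagged.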

    Design/Method/Approach:

    Six classification algorithms were selected from Weka, an open-source data mining tool: J48 (a decision tree learner), SMO (a support vector machine, or SVM), MultilayerPerceptron (an artificial neural network, or ANN), Logistic (logistic regression), stacking, and bagging.

    The probability that a given firm-year is fraudulent was manipulated at three levels: 0.006, based on past research indicating that around 0.6 percent of all firm-years are fraudulent; 0.003, representing a low condition; and 0.012, representing a high condition. Because the cost of misclassifying a fraud firm as a non-fraud firm (a false negative) is much greater than the cost of misclassifying a non-fraud firm as a fraud firm (a false positive), the author also manipulated the ratio between the cost of a false positive and the cost of a false negative when training the classification algorithms.

    The dataset consisted of 51 fraud firms and 15,934 non-fraud firm-year observations, collected primarily from the fourth quarter of 1998 through the end of 2005. Finally, the author evaluated 42 financial statement fraud indicators that past research has shown to be present in different fraud situations, including the accounts receivable to sales ratio, whether a Big 4 auditor was used, and the firm's Altman Z-score.
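
    The two manipulated factors can be combined into a single evaluation criterion. A common choice, used here as an illustrative assumption rather than as the study's documented metric, is the expected cost of misclassification (ECM) per firm-year: the prior fraud probability times the false-negative rate times the false-negative cost, plus the non-fraud probability times the false-positive rate times the false-positive cost. The sketch evaluates one classifier's error rates across the study's three prior levels; the cost ratios, error rates, and function names are hypothetical placeholders.

```python
from itertools import product

# Prior fraud probabilities manipulated in the study (low, base, high).
PRIORS = (0.003, 0.006, 0.012)
# Hypothetical false-negative : false-positive cost ratios (C_FP fixed at 1).
COST_RATIOS = (10, 30, 60)


def expected_cost(prior_fraud: float, fn_rate: float, fp_rate: float,
                  cost_fn: float, cost_fp: float = 1.0) -> float:
    """Expected cost of misclassification per firm-year.

    fn_rate: share of fraud firm-years the classifier clears (false negatives).
    fp_rate: share of non-fraud firm-years the classifier flags (false positives).
    """
    return (prior_fraud * fn_rate * cost_fn
            + (1.0 - prior_fraud) * fp_rate * cost_fp)


def ecm_grid(fn_rate: float, fp_rate: float) -> dict:
    """ECM for every (prior, cost ratio) combination."""
    return {
        (prior, ratio): expected_cost(prior, fn_rate, fp_rate, cost_fn=ratio)
        for prior, ratio in product(PRIORS, COST_RATIOS)
    }


if __name__ == "__main__":
    # Hypothetical classifier: misses 40% of frauds, flags 10% of non-frauds.
    grid = ecm_grid(fn_rate=0.4, fp_rate=0.1)
    for (prior, ratio), ecm in sorted(grid.items()):
        # Baseline rule: label every firm-year non-fraud (fn_rate=1, fp_rate=0).
        baseline = expected_cost(prior, 1.0, 0.0, cost_fn=ratio)
        print(f"prior={prior:.3f} ratio=1:{ratio:<3} ECM={ecm:.4f} baseline={baseline:.4f}")
```

    Comparing each ECM with the baseline of labeling every firm-year non-fraud (prior times C_FN) shows whether a classifier adds value under a given assumption. With these placeholder rates and a 1:30 cost ratio, the classifier narrowly beats the baseline at a prior of 0.006 (0.1714 vs. 0.18) but not at 0.003 (about 0.1357 vs. 0.09), which illustrates why the preferred algorithm can change with the assumed prior and cost ratio.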

    Findings:

    • Overall, logistic regression and SVM appear to perform best relative to the other algorithms examined when evaluated under the conditions thought to exist in practice.
    • Using logistic regression, 9 of the 42 fraud indicator variables were found to be good predictors: auditor turnover, Big 4 auditor, total discretionary accruals, accounts receivable, meeting or exceeding analyst forecasts, allowance for doubtful accounts, inventory to sales, value of a company’s issued securities to market value, and unexpected employee productivity.
    • Across all of the algorithms, only 6 predictors were used in three or more of the algorithms: Big 4 auditor, auditor turnover, accounts receivable, total discretionary accruals, unexpected employee productivity, and meeting or exceeding analyst forecasts.