1. Introduction
Fuzzy systems have been applied to a variety of problems with great success. One key factor is that the fuzzy rule base is easy to design so that it emulates the rational decision-making process that human experts follow when facing hard tasks. In essence, any process that requires human judgment can be translated into simple rules in a fuzzy system, provided that its variables can be expressed as fuzzy sets or linguistic terms. One example is predicting a customer’s payment delay or default, which is a fairly intuitive process and can indeed be modeled as a set of rules based on an expert’s knowledge. In addition, the ease of implementing a fuzzy system can speed up the analysis of huge customer databases, since the usual manual process of analyzing each customer can be automated by a system which, in theory, infers in the same way as the human mind does.
This chapter presents the complete design of a fuzzy system that predicts the customers’ default rate in small and medium-sized businesses, and shows how this information can be used to provide a better cash flow estimate. The chapter is structured as follows: in section 2 we present the current economic scenario and explain why default is a serious problem; in section 3 we analyze some tools that are used to mitigate the risks; in section 4 we explain in detail how the fuzzy approach can be exploited in this case; in section 5 we show the design of the fuzzy system; in section 6 we show some results from the simulations of this system; and finally in section 7 we discuss the results and conclude the chapter.
2. The default in the microeconomics
Default in the retail sector is a pressing problem in the modern world [1]. Formally, default is a broad term: it means any failure of an entity, natural or legal, to meet its legal obligations by not paying invoices for loans, services, bonds or wholesale purchases [2]. The term also applies to the failure of a government to repay its national debt, in which case it is called national or sovereign default. In the case of customer default, the concerns are rent, mortgages, consumer credit, utility payments or funding. While sovereign default is related to the macroeconomic scenario, that is, a financial crisis over a whole country or continent, customer default is more related to customers’ profiles and to microeconomics. That led to the development of risk and credit analysis [3].
Default has a strong effect on developing companies, as well as on small and medium-sized businesses. Mortgage and interest rates can be strongly affected by the customer default rate. Since the whole economy is tightly linked, a default represents a break in this chain, which on a large scale can lead to a national crisis.
When a default happens, it creates a shortfall in a company’s finances, which means a loss for the provider. Some tools, such as financial protection insurance and risk scores, may remedy this to a certain point, but in most cases they are insufficient to recover from the main problem [4]. However, if one could forecast how many customers would delay payment, or how much would be in default, companies would have the chance to prepare themselves against a possible low cash flow.
Analysed by sector, service-based industries are known to be more affected by defaults than others.
2.1. Reasons for default
In order to better understand this problem, we should take into account the reasons for consumer default. Recent economic growth and integration sped up the development of many enterprises, and much of this has been accomplished through credit and financial leasing [6]. Credit offers expanded to small businesses and to ordinary workers, thus increasing economic activity [7]. In developing countries such as Brazil, many lower- and middle-class families came to participate actively in the economy as voracious consumers [4]. Consequently, the debt of Brazilian families rose from 15% in 1992 to over 40% in 2012 [5]. Moreover, an analysis of the reasons for the debt suggests that this index is not likely to fall, while limiting credit also does not seem to be a good option [7]. However, consumer default is correlated with some behaviours that can be detected by risk analysis systems [8]. Therefore, by understanding the reasons for default, the problem becomes more controllable.
According to a report issued by the Central Bank of Brazil [9], the causes of consumer default range from bad financial habits (compulsive spending, expenses greater than income) to financial problems (unemployment, low wages, crises, default by their own clients).
Risk analysis tools take into account as much information as possible about a client in order to evaluate that client’s risk score. It is known that when customers face problems, they are more likely to pay their bills late or not to pay at all. Likewise, when customers always pay their bills without delay, it is a good sign that they are less likely to delay in the future.
2.2. The effects of default in the economy
Customer default prediction remains a concern for many enterprises, owners, investors and business people all over the world. Every default implies that some party is losing money, since a good or a service has been provided without any compensation. On a large scale this leads to economic contraction and inflation [10]. For small companies the effects may be even more drastic due to their small budgets. Every small business is advised to monitor its cash flow, but when a customer fails to pay its debt to the company, the accuracy of the cash flow forecast is severely affected. So there is a need to estimate the percentage of default among its clients. The first and main consequence for a small business is that it may not be able to meet its own obligations, even with quite good financial planning. A second consequence is that the billing department becomes overloaded, since many bills remain unpaid, and the company itself may fail to operate.
3. Existing tools for mitigating the risks
Defaults cannot be prevented, but they can be forecast. Whenever a lender wants to grant credit to a borrower, the lender may analyze the borrower’s financial statements in order to assess its capability to repay its debts [11]. However, many aspects may not be considered in traditional risk and credit analysis, since many risk analyses are conducted by means of likelihoods and probabilities [2]. On the other hand, there is a call for simpler and quicker risk analysis [12, 13].
Although most of these tools address credit analysis in the form of a loan, the same procedure applies to any customer that is buying goods or services from a supplier [14], especially under leasing or contracting. Since we are dealing here with the problem of predicting default, our goal is to forecast when a customer will not pay his or her debt, causing a default. To that end, the relevant methodologies are risk and credit analysis, bankruptcy prediction and probability of default.
3.1. Risk and credit analysis
In recent decades, a number of objective, quantitative systems for scoring credit have been developed. The credit risk is assessed by comparing the accounting ratios of potential borrowers with industry values or with trends in the financial variables. Banks have access to many of these ratios, since they are the main credit providers, but that information is not always available to enterprises. Traditional credit risk analyses are implemented in expensive expert systems whose development is very time-consuming. On the other hand, simpler ways to grant credit may be achieved with reduced models, such as Balanced Scorecard and Jarrow-Turnbull, among others [15].
Balanced Scorecard, also known as BSC, is actually a management technique aimed at assessing an enterprise’s performance from four perspectives: financial, customer, internal processes, and learning. A balanced score of these indicators forms a system that helps the enterprise to select and focus strategies to achieve goals in the near future. The customer and financial perspectives of this analysis compose a good index to evaluate the risk of servicing a given client [12]. Unfortunately, this is not always enough to define a strategy, and other performance indicators are needed to determine the risk more accurately.
A reduced-form risk model was published in [16] as an extension of the Merton model [17] to a framework with random interest rates. In this model, risk is modeled as a statistical process. The value of risk is evaluated using a continuous probability of default, estimated with one of two approaches: Point in Time (PIT) or Through the Cycle (TTC). The main difference between these approaches concerns internal and external factors. The term PIT applies to probabilities of default that depend on general credit conditions or external factors, while TTC applies to probabilities of default that are not subject to external factors [18].
3.2. Bankruptcy prediction
One benefit of risk analysis is that it allows the prediction of bankruptcy for a given entity. One of the oldest methods for bankruptcy prediction was published in 1968 by Altman. His formula is used to predict the probability of bankruptcy within 2 years by using Z-scores. The Z-score is a linear combination of four or five coefficient-weighted common business ratios.
where:
T_{1} is the Working Capital / Total Assets
T_{2} is the Retained Earnings / Total Assets
T_{3} is the Earnings Before Interest and Taxes / Total Assets
T_{4} is the Market Value of Equity / Book Value of Liabilities
T_{5} is the Sales or Revenue / Total Assets
Z is the score which denotes whether an entity will face bankruptcy or not. The bankruptcy threshold varies with the entity’s activity, but in general it is defined as follows
The Altman Z-Score [19] model was found to be 72% accurate in predicting bankruptcy two years before the event, with only 6% of false negatives. It is still well accepted by auditors, management accountants and financial directors for loan evaluation. However, this model is not recommended for use with financial companies such as banks or factoring firms, because the balance sheets of such companies are usually opaque and the model does not address off-balance-sheet items. For the prediction of default for financial companies, the Merton model is used.
Although additional methods for bankruptcy prediction have been developed by taking more data into account, they turned out to be expensive in practice, since they depend on a large amount of data being collected [20].
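As an illustration of how the Z-score is computed from the ratios T_{1} to T_{5}, the following sketch assumes the commonly cited coefficients and cut-offs of Altman's original 1968 model for publicly traded manufacturers (weights 1.2, 1.4, 3.3, 0.6 and 1.0; distress below 1.81, safe above 2.99). These numbers, the function names and the sample ratios are assumptions brought in for the example and are not values stated in this chapter.

```python
def altman_z(t1, t2, t3, t4, t5):
    """Z-score as a weighted sum of the five ratios T1..T5.

    The weights are the commonly cited ones from Altman's 1968 model
    (an assumption of this sketch).
    """
    return 1.2 * t1 + 1.4 * t2 + 3.3 * t3 + 0.6 * t4 + 1.0 * t5


def z_zone(z, distress=1.81, safe=2.99):
    """Map a Z-score to a zone; the cut-offs are assumed defaults."""
    if z < distress:
        return "distress"
    if z > safe:
        return "safe"
    return "grey"


# Hypothetical ratios for one company (illustrative numbers only).
z = altman_z(t1=0.20, t2=0.30, t3=0.10, t4=1.50, t5=1.00)
print(round(z, 2), z_zone(z))
```

Keeping the cut-offs as parameters reflects the remark above that the bankruptcy threshold varies with the entity's activity.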
3.3. Probability of default
Given that many methods of risk and credit analysis and of bankruptcy prediction are based on stochastic models, we now focus on the measures for evaluating the probability of default. Most methods exploit logistic regression functions as well as inverse probability distribution formulas.
The Probability of Default may be used in two ways: to address the causes of default, and to predict and prevent new cases of default. Camargos et al. [7] performed a survey to find the conditioning factors that lead small businesses to default, as depicted in figure 4.
This survey was conducted within an important Brazilian program for the encouragement of entrepreneurship among small businesses. The method used to assess the risk of default was logistic regression:
The equation has only one explanatory variable X_{1}, the variable that influences default. A threshold value of 0.5 is selected to determine whether a case is classified as compliant or default. Taking the probability as an input for the binary logistic regression variable Y, and then rearranging the coefficients, we obtain a linear logarithmic model:
where:
P(X) is the probability of default according to the set of variables X
β_{0} is a model bias constant
β_{i} is the coefficient for the variable X_{i}
X_{i} is a variable taken into account in the model
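The logistic model and its linear logarithmic (logit) form can be sketched as below, assuming the standard definitions P(X) = 1 / (1 + exp(-(β_{0} + Σ β_{i} X_{i}))) and ln(P / (1 - P)) = β_{0} + Σ β_{i} X_{i}. The coefficient values are hypothetical, and treating a probability exactly equal to the threshold as a default is a choice made for this example.

```python
import math


def probability_of_default(x, beta0, betas):
    """Logistic model: P(X) = 1 / (1 + exp(-(beta0 + sum(beta_i * x_i))))."""
    score = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-score))


def log_odds(p):
    """Linear (logit) form: ln(P / (1 - P)) = beta0 + sum(beta_i * x_i)."""
    return math.log(p / (1.0 - p))


def classify(p, threshold=0.5):
    """Label a case as 'default' when P(X) reaches the threshold."""
    return "default" if p >= threshold else "compliant"


# Hypothetical coefficients and a single explanatory variable X1.
beta0, betas = -2.0, [0.8]
p = probability_of_default([3.0], beta0, betas)
print(round(p, 3), classify(p), round(log_odds(p), 3))
```

Applying `log_odds` to the predicted probability recovers the linear score, which is why the coefficients can be estimated with a linear model on the logit scale.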
Other models include the bivariate probit model by Jacobson and Roszbach [21], used to estimate default probabilities and the effects of changes in default-risk-based acceptance rules on a bank’s portfolio. Katchova and Barry [6] used the distance-to-default approach to determine the Value at Risk (VaR). All these models use logistic regression functions on multiple variables. By investigating these models amongst others, Odeh et al. [22] applied a conceptual model for predicting default in agricultural loans, assuming the expected loss is expressed as the result of three components.
where
EL is the expected loss in monetary units
PD is the probability of default in percentages
LGD is the percentage of loss from the loan volume suffered by the granting institution
EAD is the loan amount plus accrued fees
Usually the Probability of Default is expressed in terms of N customers, so equation 5 can be rearranged in the form:
where now
EL^{P} is the expected loss on a specific portfolio
PD_{i} is the probability of default for a specific loan
N is the total of granted loans
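A minimal sketch of the expected-loss calculation, assuming the usual multiplicative form EL = PD × LGD × EAD for a single loan and a portfolio loss obtained by summing the per-loan values over the N granted loans. PD and LGD are taken here as fractions rather than percentages; the function names and the sample portfolio are hypothetical.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss of one loan: EL = PD * LGD * EAD.

    pd and lgd are fractions in [0, 1]; ead is in monetary units.
    """
    return pd * lgd * ead


def portfolio_expected_loss(loans):
    """Sum of per-loan expected losses over the N granted loans."""
    return sum(expected_loss(pd, lgd, ead) for pd, lgd, ead in loans)


# Hypothetical portfolio of (PD_i, LGD_i, EAD_i) tuples.
loans = [(0.02, 0.40, 10_000.0), (0.10, 0.50, 5_000.0), (0.05, 0.60, 2_000.0)]
print(portfolio_expected_loss(loans))
```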
Combining the logistic regression (eq. 4) with the conceptual model (eq. 6), we can express the maximum likelihood estimation as in the equation:
where
PD_{i} is the probability of default as stated in the equation 6
B is a vector of coefficients
X is a vector of explanatory variables and ε is a stochastic error
The coefficients may be determined empirically and vary according to many aspects of the enterprise’s assets. Odeh et al. [22] evaluated these methods using data from the Farm Credit System, and found that credit default predictions are highly sensitive to the data.
3.4. Recent approaches
Expert systems are one of the technologies that have evolved and been applied in recent years. They have been used considerably in financial institutions since the 1980s for decision-making tasks, and the prediction of default is another issue to which they have been applied [23]. In addition, computational intelligence techniques such as Genetic Algorithms, Fuzzy C-Means and MARS have also been exploited [24] due to their capability of learning from an expert. The use of neural networks, neuro-fuzzy systems and fuzzy logic has also grown in recent decades, because they handle imprecise information better and there is no purely analytical model of the market [25].
Furthermore, databases containing hundreds of financial operations represent implicit knowledge that is available for modeling and prediction. By means of data mining [14], many customer behaviors can be analyzed based on past values. Thus, more reliable and more developed models can be built with the use of artificial intelligence.
4. A fuzzy approach
Fuzzy systems have already been used in a variety of problems, regarding not only risk and credit analysis but also bankruptcy and default prediction. A fuzzy approach offers an easy design based both on experts’ opinions and on historical data. Zirakja and Samizadeh [8] performed a risk analysis of e-commerce (EC) activities from a broader perspective, including project risk, by relying on experts’ opinions to build a fuzzy decision support system (FDSS). Martin et al. [24] implemented a fuzzy system to predict bankruptcy by applying expert knowledge in fuzzy rules, with a classification rate of 88% in a single model. In a hybrid model using neuro-fuzzy and genetic algorithms, the classification rate was 73.6%, but with more input variables.
Fuzzy logic is a good tool to emulate expert rules, since it does not require as much modeling effort as traditional methods do. A fuzzy system can emulate rules of the type:
where conditions and consequences are fuzzy propositions built by linguistic expressions:
Expressions 1 and 2 define “immediate” propositions, and expressions 3 and 4 define combined propositions. Since they operate over fuzzy variables, they need to be defined in linguistic terms or fuzzy sets. Fuzzy sets usually take the form of membership functions.
Fuzzy expressions are built using Boolean operators such as NOT, OR and AND. These expressions are combined to form relations R. A fuzzy relation is defined over two universes U and V as a fuzzy subset of their Cartesian product U × V, so that R: U × V → [0, 1].
Therefore, Fuzzy rules can be defined in fuzzy operations as in the equation.
where
R^{(l)} is a Fuzzy rule of index l
x_{i} is an input fuzzy variable of index i
A_{i}^{l} is an input fuzzy set of index i in a rule l
y is an output fuzzy variable
B^{l} is an output fuzzy set in a rule l
which in turn can be represented by membership functions
where
µ_{R}^{(l)}(X) is the membership function of the rule
µ_{Ai}^{l}(x_{i}) is the membership function of the input variable of index i on fuzzy set A_{i}^{l}
µ_{B}(y) is the resulting membership function of the output variable y on fuzzy set B in rule l
min is the minimum operator
max is the maximum operator
sup is the supremum operator
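The role of the min and max operators in a rule can be sketched as follows, assuming the usual Mamdani reading: the conditions of a rule are combined with min, the output set is clipped at the resulting firing strength, and several fired rules are merged with max. The triangular output set and the membership degrees are hypothetical values chosen for the example.

```python
def firing_strength(memberships):
    """AND of the rule's conditions: minimum of the input membership degrees."""
    return min(memberships)


def clipped_output(strength, mu_b):
    """Output set of a fired rule: mu_B(y) clipped at the firing strength."""
    return lambda y: min(strength, mu_b(y))


def aggregate(outputs):
    """Combine several fired rules: pointwise maximum of their output sets."""
    return lambda y: max(f(y) for f in outputs)


# Hypothetical triangular output set B centred at 10 with half-width 5.
def mu_b(y):
    return max(0.0, 1.0 - abs(y - 10.0) / 5.0)


strength = firing_strength([0.7, 0.4, 0.9])   # degrees of x1, x2, x3
rule_out = clipped_output(strength, mu_b)
print(strength, rule_out(10.0), rule_out(14.0))
```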
4.1. Fuzzy system structure
A Fuzzy System usually has:
Input Variables (with their respective fuzzy sets);
Output Variables (the diagnostic values);
Rule Base: determines outputs for each combination of input fuzzy values;
Inference Machine: applies fuzzy operations;
Fuzzy Sets: Linguistic Terms for each Variable;
Crisp Values: Numeric values taken from real world.
Figure 6 shows the structure of a basic model of fuzzy system, consisting of four components: Input Fuzzification, Rule Database, Inference Machine and Defuzzification.
A Fuzzy system can be defined in the following operations:
Input Fuzzification: transforms the real-world crisp values into fuzzy values.
Fuzzy Operation: applies the fuzzy operators Min or Max to the input variables, according to whether the rule combines its conditions with AND (Min) or OR (Max).
Aggregation: groups the output values found when several rules have been triggered.
Defuzzification: transforms the resulting fuzzy output values into real-world crisp values.
In this chapter we deal with the application of a fuzzy system to predicting default, so the details of these operations are beyond its scope; for further information the reader is referred to references [26, 27].
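The four operations can be put together in a very small Mamdani-style system. The sketch below is an illustration under stated assumptions, not the system built in this chapter: it has a single input (payment delay in days) and a single output (days until receipt), two hypothetical rules, Gaussian sets with made-up parameters, and centroid defuzzification over a discretized output range.

```python
import math


def gauss(x, c, sigma):
    """Gaussian membership degree of x in a set centred at c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


# Hypothetical fuzzy sets as (centre, sigma); values are illustrative only.
DELAY = {"short": (0.0, 5.0), "long": (30.0, 10.0)}      # input, days
RECEIPT = {"near": (5.0, 5.0), "far": (60.0, 15.0)}      # output, days
RULES = [("short", "near"), ("long", "far")]             # IF delay THEN receipt


def predict_receipt(delay_days, steps=200, y_max=100.0):
    # 1. Fuzzification: crisp input -> membership degree per input set.
    degrees = {name: gauss(delay_days, c, s) for name, (c, s) in DELAY.items()}
    num = den = 0.0
    for k in range(steps + 1):
        y = y_max * k / steps
        # 2-3. Each rule clips its output set (min); rules are aggregated (max).
        mu = max(min(degrees[a], gauss(y, *RECEIPT[b])) for a, b in RULES)
        num += y * mu
        den += mu
    # 4. Defuzzification: centroid of the aggregated output set.
    return num / den


print(round(predict_receipt(2.0), 1), round(predict_receipt(35.0), 1))
```

With these assumed sets, a client with a short delay history yields an earlier expected receipt than a client with a long one, which is the qualitative behaviour the rule base is meant to encode.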
4.2. Reasons to apply fuzzy logic
Fuzzy systems are relatively simple to create and deploy, and they are fully based on human experts’ evaluation. Fuzzy logic has been applied in many fields involving decision processes which require some sort of judgment. The human mind abstracts real-world variables in an imprecise manner, forming semantic networks [28]. These semantic networks define relations that can be expressed with linguistic terms just as experts do. Therefore, any activity requiring an expert opinion or judgment can be modeled in fuzzy logic rules without the need for an existing theoretical model to rely upon.
In small and medium-sized companies, the finance or collection department usually decides whether to grant credit or not. Without any supporting tool, the decision is based purely on an expert’s experience or opinion. The same applies to predicting cash flow from a client’s past financial transactions. Based on a given customer’s history, it can be inferred whether this customer will pay on time or default. This kind of analysis can be performed by an expert, but as a company’s portfolio grows, the analysis becomes more time-consuming and needs to be automated, and fuzzy systems emerge as a good option to automate it [29].
5. Fuzzy system development
Taking into account all the previous information, we designed a system capable of predicting the default rate based on the historical records of customers. The methodology used in this design was the same as in [27], and consisted of the following procedure.
According to the literature, default is influenced by many aspects of the customers, but many of them are unknown to the provider unless they are declared. However, simple models of probability of default can yield good results using statistical measures. So, to make this system more widely applicable, we took into account only the minimum amount of information that a collection or billing department would have about customers’ transactions. Thus, in this work, we considered a database consisting only of customer invoices, in the form of a table.
5.1. Fuzzy variables
According to the database depicted in table 2, we defined the following input variables for the fuzzy system.
Average Payment Delay (APD)
Amount Owed (AO)
Maximum Payment Delay (MPD)
Maximum Amount Owed (MAO)
Time as a Client (TC)
Number of Default Cases (NDC)
A formal definition of each variable is outlined in the following equations:
where
PD_{ij} is the Payment Date of Invoice j of Client i
DD_{ij} is the Due Date of Invoice j of Client i
N is the number of issued Invoices
t is the current Date
For specific purposes of this work, a default is considered to be when an invoice is not paid before the due date.
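One way to derive the six input variables from a plain invoice table is sketched below. The definitions are assumptions made for illustration: delays are counted in days after the due date (zero when paid on time), unpaid invoices are measured up to the current date t, AO is the sum of unpaid amounts, MAO is taken as the largest single invoice, and TC is counted from the first issue date. The `Invoice` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Invoice:
    issue_date: date
    due_date: date                 # DD_ij
    amount: float
    payment_date: Optional[date]   # PD_ij, None while unpaid


def delay_days(inv: Invoice, today: date) -> int:
    """Days past the due date; unpaid invoices are measured up to today."""
    paid_on = inv.payment_date or today
    return max(0, (paid_on - inv.due_date).days)


def client_features(invoices: List[Invoice], today: date) -> dict:
    delays = [delay_days(inv, today) for inv in invoices]
    open_amounts = [inv.amount for inv in invoices if inv.payment_date is None]
    return {
        "APD": sum(delays) / len(delays),               # average payment delay
        "MPD": max(delays),                             # maximum payment delay
        "AO": sum(open_amounts),                        # amount currently owed
        "MAO": max(inv.amount for inv in invoices),     # assumed: largest invoice
        "TC": (today - min(inv.issue_date for inv in invoices)).days,
        "NDC": sum(1 for d in delays if d > 0),         # invoices past due date
    }


# Hypothetical history of one client.
invoices = [
    Invoice(date(2012, 1, 1), date(2012, 1, 31), 100.0, date(2012, 2, 10)),
    Invoice(date(2012, 2, 1), date(2012, 3, 2), 200.0, date(2012, 3, 1)),
    Invoice(date(2012, 3, 1), date(2012, 3, 31), 150.0, None),
]
today = date(2012, 4, 20)
print(client_features(invoices, today))
```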
Upon consultation with experts in the collection department, we found the following terms for each of the input variables.
The output variables are the values we want to predict, namely when and how much the customer is going to pay. That can be expressed in two ways: the expected amount and date of receipt, or the probability of receiving a certain amount within a period of time. Since we consider only internal factors here, this kind of prediction is through the cycle (TTC). According to the Basel II parameters [10], the simplest approach to estimating the probability of default is logistic regression, taking the historical database as the basis for estimation.
Thus, given a date, the probability distribution of payment can be expressed by the following equation:
where
PDR is the Probability of Receipt or Payment
EDR is the Expected Date of Receipt (in days)
EAR is the Expected Amount to Receive (in monetary units)
NDC is the Number of Default Cases
A, B, C and D are coefficients
Upon experiments and linear regression we found the coefficients to be.
It can be seen that there is a relation between the next payment date and the probability. So the output variables were chosen to be the next payment and expected amount to be paid.
Output variable | Description | Linguistic terms
EAR | Expected Amount to Receive | None, Little, Enough, Integral
EDR | Expected Date of Receipt | Near, Reasonably Near, Far, Never
However, from these outputs the probability of receiving over time t can also be derived, according to the equation.
where
PP(t) is the probability of payment over time t
PDR is the probability distribution function
EDR is the expected Date of Receipt
E is the remaining part of the probability distribution function PDR, independent of EDR
Thus, we can state the variables PPW and PPM with parameter values for t of 7 and 30, respectively. The probabilities can also be defined in fuzzy sets.
Output variable | Description | Linguistic terms
PPW | Probability of Payment in a Week | Null, Very Low, Low, Medium, High, Very High
PPM | Probability of Payment in a Month | Null, Very Low, Low, Medium, High, Very High
Likewise, the expected date of payment can be derived from the quantile function, which is the inverse of the cumulative distribution function.
where ER is the expected date of payment resulting from the probability distribution PP(t).
5.2. Fuzzy set limits
We defined the fuzzy set limits by querying a large database containing over 5 years of financial records, in such a way that each set has the same number of clients belonging to it. To that end, we had to rearrange the database to group the results per client.
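Choosing limits so that each set holds the same number of clients amounts to equal-frequency binning. A minimal sketch with the standard library follows; the three set labels and the sample delays are hypothetical, and the chapter's actual limits come from its own database.

```python
import statistics


def equal_frequency_cuts(values, n_sets=3):
    """Cut points that split the values into n_sets groups of similar size."""
    return statistics.quantiles(values, n=n_sets)


def assign_set(value, cuts, labels):
    """Return the label of the first interval whose upper cut is not exceeded."""
    for cut, label in zip(cuts, labels):
        if value <= cut:
            return label
    return labels[-1]


# Hypothetical average payment delays (days), one value per client.
apd = [0, 1, 2, 3, 5, 8, 13, 21, 34]
cuts = equal_frequency_cuts(apd)
labels = ["short", "middle", "long"]
counts = {lab: sum(assign_set(v, cuts, labels) == lab for v in apd) for lab in labels}
print(cuts, counts)
```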
We chose the Gaussian function as the membership function for each set, for both inputs and outputs. After querying the dataset, we defined the sets’ limits as shown in the table and figures.
where c is the center of the function and σ is its standard deviation. We then defined the set’s limits as c ± σ.
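The relation between a set's limits and the parameters of its membership function can be sketched as below, assuming the standard Gaussian form exp(-(x - c)^2 / (2σ^2)). The helper names and the sample limits are hypothetical.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def set_limits(c, sigma):
    """Limits of the set, taken as c - sigma and c + sigma."""
    return c - sigma, c + sigma


def params_from_limits(lower, upper):
    """Inverse step: recover (c, sigma) from limits found in the data."""
    return (lower + upper) / 2.0, (upper - lower) / 2.0


# Hypothetical set "Middle" for a delay variable with limits 5 and 15 days.
c, sigma = params_from_limits(5.0, 15.0)
print(c, sigma, gaussian_mf(c, c, sigma), round(gaussian_mf(15.0, c, sigma), 4))
```

Under this form, a value lying exactly on a limit has a membership degree of exp(-0.5), roughly 0.61, so neighbouring sets overlap around their shared limits.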
5.3. Fuzzy rules
As in [27], we built the fuzzy rules by querying the database shown in table 6 for each combination of the input sets. That would give 729 rules, which corresponds to three sets for each of the six input variables (3^{6} = 729). But before querying the database, we cut some combinations that would never happen in practice or could be intuitively discarded. Some examples are the following rules:
If APD is long and MPD is short and...
If AO is high and MAO is low and...
If TC is new and NDC is high and...
By cutting infeasible rules, the rule base was reduced to 288 rules. The outputs for each rule, for both the expected date and the expected amount of receipt, were determined by querying the history database. Nevertheless, some situations never happened in the data, so we had to decide the output for these rules by asking the experts. That procedure trimmed the rule base down to only 53 rules.
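Enumerating all input combinations and discarding infeasible ones can be sketched as follows. The term names are assumptions (three per variable, partly inferred from the example rules), and only the three exclusion patterns quoted above are encoded, so the count obtained here is larger than the 288 rules reported, which relied on further exclusions not listed in the text.

```python
from itertools import product

# Assumed: three linguistic terms per input (3**6 = 729 combinations).
TERMS = {
    "APD": ["short", "middle", "long"],
    "AO": ["low", "medium", "high"],
    "MPD": ["short", "middle", "long"],
    "MAO": ["low", "medium", "high"],
    "TC": ["new", "known", "old known"],
    "NDC": ["low", "medium", "high"],
}

# Partial conditions that mark a combination as infeasible.
INFEASIBLE = [
    {"APD": "long", "MPD": "short"},   # an average cannot exceed its maximum
    {"AO": "high", "MAO": "low"},      # amount owed cannot exceed its maximum
    {"TC": "new", "NDC": "high"},      # a new client has few invoices so far
]


def is_feasible(rule):
    """True when the rule matches none of the infeasible patterns."""
    return not any(all(rule[k] == v for k, v in cond.items()) for cond in INFEASIBLE)


all_rules = [dict(zip(TERMS, combo)) for combo in product(*TERMS.values())]
feasible = [r for r in all_rules if is_feasible(r)]
print(len(all_rules), len(feasible))   # 729 512
```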
For a given rule, we found an average difference between the due date and the payment date.
where
APD_{i} is the average payment delay for client i;
STD_{i} is the standard deviation of the difference between due dates and payment dates for client i.
After querying the database for a given rule, we built a histogram of each fuzzy output variable for that rule. Table 8 shows the histogram found for the following rule:
“if APD is Middle and AO is Low and MPD is Long and MAO is Low and TC is Old Known and NDC is Low”
Output variable | None/Near | Little/Reasonably Near | Enough/Far | Integral/Never
Expected Date of Receipt | 3 | 2 | 1 | 0
Expected Amount of Receipt | 0 | 0 | 1 | 5
The output set chosen for each variable was the one that best fits the rule’s results, which is Near for Expected Date of Receipt and Integral for Expected Amount of Receipt.
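Reading "best fit" as the set with the highest count in the histogram, the selection for the example rule can be sketched as below. Taking the most frequent set is an interpretation made for this example; the counts are those of the histogram for that rule.

```python
EDR_TERMS = ["Near", "Reasonably Near", "Far", "Never"]
EAR_TERMS = ["None", "Little", "Enough", "Integral"]


def best_set(terms, counts):
    """Return the linguistic term with the highest count in the histogram."""
    best_index = max(range(len(counts)), key=counts.__getitem__)
    return terms[best_index]


# Counts taken from the histogram of the example rule.
edr = best_set(EDR_TERMS, [3, 2, 1, 0])
ear = best_set(EAR_TERMS, [0, 0, 1, 5])
print(edr, ear)   # Near Integral
```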
5.4. Database preparation
In order to separate the development of the rule base from its validation, we defined distinct periods for querying and for validation. The database in the form shown in table 6 was split into these two periods, two and a half years each, forming a new database grouped by period. This database is shown in table 9.
The validation period was replicated multiple times in order to perform a continuous validation from first until the last date of the period. For each date, a snapshot of the database of table 6 was taken in order to simulate the Fuzzy Prediction.
5.5. Further fuzzy settings
The fuzzy system was implemented using Mamdani inference [26], because of its simplicity in processing rules and values and its ease of implementation in this case. It was then deployed in an important Brazilian financial accounting system, with the aim of inferring how much of the accounts receivable could be received within a week or a month, and what the default rate would be. The fuzzy system was set up as follows.
6. Results and simulation
After defining and validating the rule base, we simulated the prediction of the default rate over a period of two and a half years. Since the output is a probability, we applied the Monte Carlo method to generate random numbers, obtain concrete results from the simulations and compare them with the real values [30].
6.1. Simulation procedure
We used the database shown in table 10 to run simulations on every record, that is, for every invoice that was supposed to be paid. The fuzzy system gives an expected date and amount to be received. For a given record, we then applied equations 19, 21 and 22 to obtain the probabilities of payment within a day, a week and a month. With the Monte Carlo method, we drew random values to be compared against the probability distribution given by equations 21 and 22. If the random number is less than the corresponding probability value calculated with equations 21 and 22, the debt is considered paid.
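The draw-and-compare step can be sketched as follows, assuming that an invoice counts as paid when a uniform random number in [0, 1) falls below its probability of payment, and that the default rate of a run is the share of invoices left unpaid. The probabilities, the function names and the fixed seed are hypothetical choices for the example.

```python
import random


def simulate_payments(probabilities, rng):
    """One Monte Carlo run: an invoice counts as paid when the random
    draw in [0, 1) falls below its probability of payment."""
    return [rng.random() < p for p in probabilities]


def mean_default_rate(probabilities, runs=100, seed=0):
    """Average share of unpaid invoices over several independent runs."""
    rng = random.Random(seed)
    rates = []
    for _ in range(runs):
        paid = simulate_payments(probabilities, rng)
        rates.append(1.0 - sum(paid) / len(paid))
    return sum(rates) / runs


# Hypothetical weekly payment probabilities (PPW) for five open invoices.
ppw = [0.9, 0.8, 0.5, 0.2, 0.0]
print(mean_default_rate(ppw))
```

Averaging over many runs smooths out the randomness of any single run, so the averaged rate tends toward the mean of the individual non-payment probabilities.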
The algorithm for the simulation was defined as follows.
Then, we performed simulations to predict:
6.2. Prediction of the default rate
The default rate is assumed to be the percentage of invoices that are delayed or paid after the due date:
where DR(t) is the default rate at the date t.
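A minimal sketch of the default rate at a date t, assuming it is the number of invoices that fell due before t and were paid late or not at all, divided by the number of invoices that fell due before t. The tuple layout and the sample invoices are hypothetical.

```python
from datetime import date


def default_rate(invoices, t):
    """Share of invoices due before date t that were not paid by their due date.

    Each invoice is a (due_date, payment_date) pair; payment_date is None
    while the invoice is unpaid.
    """
    due = [(dd, paid) for dd, paid in invoices if dd < t]
    if not due:
        return 0.0
    late = sum(1 for dd, paid in due if paid is None or paid > dd)
    return late / len(due)


# Hypothetical invoices: on time, late, unpaid, and not yet due at t.
sample = [
    (date(2012, 1, 10), date(2012, 1, 9)),
    (date(2012, 1, 15), date(2012, 1, 20)),
    (date(2012, 1, 20), None),
    (date(2012, 3, 1), None),
]
print(default_rate(sample, date(2012, 1, 31)))
```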
Then we applied the Fuzzy system to give an expected percentage of invoices that were about to be paid after due date, and compared to what happened in fact. We repeated the experiments 100 times, in order to have more accurate values. The results are outlined in table 12.
The following plots show how the default rate is predicted by the fuzzy system over time. One data series is the actual default rate per month, and the other is the average prediction over 100 simulations.
By applying this procedure, one can infer the revenue over a period, using the expected amount to be paid as the output value in addition to the expected date of payment. The expected revenue is then set to be:
where
PER_{i} is the predicted revenue for period i
PEPR_{i} is the predicted revenue from past periods of period i
DR_{i} is the default rate for period i
ER_{i} is the original expected revenue for period i
6.3. Forecasting revenue
We processed the results from the default rate prediction and built several snapshots out of the simulations to forecast the revenue over each period by taking into account recent real records up to the current period.
The results are outlined in the plots shown in figure 12.
6.4. Long term simulations
To calculate how much the enterprise expects to receive in the long term, we performed random simulations on the probabilities given by the fuzzy system for the whole period. By checking the history of each client up to the present moment in the simulation, an estimate of when this client will pay is obtained. For validation, we compared the predicted default rate against the real default rate observed in the period. The system was validated with a 12-month simulation using past values, repeated 100 times. This strategy predicted the default rate with 80% accuracy.
As can be seen, the fuzzy system has learned the experts’ knowledge and therefore acts as a process expert, releasing the experts from the task of analysing, judging and changing the chosen values, so that they become free to do other activities.
7. Discussions and conclusion
The results produced by this initiative show how the default issue can be addressed with fuzzy systems. Default is a serious problem in the economy, and although it cannot be solved easily, the ability to predict it can prevent bad clients from buying services for which they are not able to pay. Moreover, the fuzzy system can be used to infer and forecast a more accurate cash flow than traditional approaches.
One important advantage of this fuzzy system for forecasting defaults is that it needs only a small amount of information to predict when a given customer would default on a payment and with what probability. The simulation using quantitative techniques such as the Monte Carlo method gave a good estimate because of the stochastic nature of the process. Many models of the probability of default rely on statistical methods to infer the probabilities. The fuzzy approach is an interesting option when there is little data available on the customers to forecast default or bankruptcy by taking TTC probabilities into account.
The system has been applied in an accounting system, where it has aided financial analysts with predictions of cash flow and liquidity. One drawback of this system, though, is the lack of good predictions for new clients’ transactions, but even in these cases the predictions are within the margin established by the fuzzy sets. These results can be improved by performing risk and credit analysis or by taking more information about the clients into account in the fuzzy system.