Open access peer-reviewed chapter - ONLINE FIRST

New Attributes Extraction System for Arabic Autograph as Genuine and Forged through a Classification Techniques

By Anwar Yahya Ebrahim and Hoshang Kolivand

Submitted: June 6th 2020; Reviewed: February 10th 2021; Published: April 9th 2021

DOI: 10.5772/intechopen.96561

Abstract

Handwritten autographs are used worldwide to authenticate writers, so an autograph must be checked thoroughly before any conclusion is drawn about the signer. Arabic autographs have distinctive characteristics, including lines and overlapping strokes, which make high verification accuracy harder to achieve. This project addresses that difficulty by selecting the best characteristics for Arabic autograph authentication, with each autograph represented by a number of attributes. The objective is to decide whether a given autograph is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features and Sparse Principal Component Analysis (SPCA) to select the significant attributes for handwritten Arabic autograph recognition, which aids the authentication step. Finally, a decision tree classifier is used for signature authentication. The proposed DCT with SPCA method achieves good results on an Arabic autograph dataset when compared against various techniques.

Keywords

  • Arabic autograph verification
  • adaptive window positioning
  • (DCT + SPCA) method
  • feature selection
  • classification techniques

1. Introduction

Handwritten autographs play an important role in modern life, as they are routinely used in every sphere of human activity. Couto [1] utilizes a lexical similarity technique for each entity identified. This frequently makes it impossible to differentiate between a forged signature and a signature created under influence. Chung [2] applied fuzzy groups to handle uncertainty. Although there are contributing studies in this area, research has often failed to take into account contributing factors such as distractions and signers' stress, which may affect the signatures being signed [3, 4]. Signatures are widely used for authenticating financial and business transactions [5, 6]. There are online and offline authentication systems. Online signature systems require special hardware such as pressure tablets. These devices extract dynamic information, including pressure and the signer's speed, along with the static image of the signature. Unfortunately, both online and offline signatures can easily be imitated or forged, leading to false representation or fraud [7]. Yang [8] used a learned dictionary to check samples, a method that has recently been applied successfully in image recognition. According to Alattas [9], financial institutions are interested in benefiting from the reliability and safety of offline signature-recognition systems. Another major reason is that online authentication systems require more complex processing and more high-tech gadgets than offline systems. Offline autographs are usually presented on a piece of paper, which is the norm in documentation. Currently, there is a need for efficient online and offline systems to ascertain the genuineness of personal autographs. Authentication of handwritten autographs usually consists of a series of procedures.
These procedures are pre-processing (where images are enhanced, binarized, divided into fragments, and subjected to other related operations), feature extraction (features of the signatures are extracted in raw form), feature selection or reduction (the extracted features are reduced for efficiency), and identification and authentication of the signatures against the signature database based on the selected features. A good verification outcome can be obtained by comparing the strong features of the test sample against the autograph of a signer sample using suitable techniques or classifiers [10]. Some methods depend on local tests, which concentrate on the analysis of the essential features of different scripts [10, 11, 12]. Some studies used evolving curves that do not move away to nearby features, decreasing superfluous fragmentation [13]. To address the gap in the literature, this paper proposes a new process to identify and authenticate offline Arabic signatures. The method combines an adaptive window positioning procedure for autograph attribute extraction with a feature selection method that reduces the features and keeps the important ones. An enhanced Discrete Cosine Transform (DCT) and Sparse Principal Component Analysis (SPCA) method is used to extract attributes, and the extracted features are then reduced to the best features only. To classify genuine and forged signatures, two classifiers are applied: 1) Decision Tree and 2) Support Vector Machine (SVM). The classification outcomes of the Decision Tree and SVM are compared to choose the better classifier.

2. Proposed scheme

In this part, we introduce an offline Arabic signature identification system based on classification techniques. The procedure consists of four phases: pre-processing, feature extraction, feature selection by the (DCT + SPCA) technique, and matching. The complete process begins with acquiring the signature images, which undergo a pre-processing stage followed by the identification and verification process, as illustrated in Figure 1.

Figure 1.

Proposed methodology.

2.1 Pre-processing

In this step, data are acquired and signature images are pre-processed. For this study, the data are Arabic signatures consisting of 500 true samples and 250 forged samples. True samples were obtained from 50 different persons. Every signer was asked to sign 10 times using common types of pens. The 10 signatures collected from each person were used as follows: six were selected at random for system learning, and the remaining four were used for system testing in addition to five forged samples.

2.1.1 Arabic signature database

This study employed the Arabic signature database created by Anwar Yahya Ebrahim as the source of Arabic signature samples for testing the proposed method. The signatures were recorded on A4 paper and then scanned at 300 dpi as 256-gray-level images. The dataset comprises scanned signatures collected from signers by Anwar Yahya Ebrahim et al. Each signatory has 10 signatures, of which 6 are genuine and 4 are forged. There are enough signatures to ensure sufficient samples for both training and testing: 7 of the samples are assigned to the training set and the remaining 3 to the testing set, from both classes.

The distribution of genuine and forged samples for different signatories is illustrated in Figure 2. The Arabic signature images are then pre-processed to improve image quality. Noise, such as irrelevant data, is removed to improve identification performance. The samples are then converted into binary images before the feature extraction process [14, 15, 16, 17].

Figure 2.

Examples of genuine signatures and their respective forged counterparts found in the Arabic signatures.

2.2 Feature extraction

An adaptive window positioning technique is then applied to separate Arabic autograph images into small segments, or sub-images. This makes it easy to remove redundant data and facilitates the comparison of segmented fragments. A 14x14 segment size is chosen for the images for optimum output [18]. Further, a group of features (form measures) is extracted from the sub-images to represent the signature image in a feature space. To analyze data accurately, a variety of observations, as well as the values of significant individual features, need to be organized. Such data can then be presented to and analyzed by machines or humans.
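As a rough illustration of the 14x14 segmentation, the sketch below splits a binary image into non-overlapping windows on a fixed grid. This is a simplification and not the adaptive positioning procedure itself, whose placement rule is not given here; the function name `split_into_windows` and the toy image are assumptions made for the example.

```python
def split_into_windows(image, size=14):
    """Split a 2-D list of pixels into non-overlapping size x size windows.

    Simplifying assumption: a fixed grid is used, so the adaptive
    positioning of windows is not modeled. Rows and columns that do not
    fill a whole window are dropped.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    windows = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            window = [row[left:left + size] for row in image[top:top + size]]
            windows.append(window)
    return windows


# Toy 28 x 42 binary image (illustrative only), giving a 2 x 3 grid of windows.
image = [[(r + c) % 2 for c in range(42)] for r in range(28)]
windows = split_into_windows(image)
```

Each returned window can then be passed to whatever per-window feature extractor is in use.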

The goal of form representation is to obtain form measures. These measures are used as classification features in models. Each sub-image is represented by the set of obtained features [19].

The attributes are then normalized using a feature matrix. Normalization is important because when attributes lie in different ranges, larger values may dominate smaller ones and distort the results. Normalization places the attribute values on the same scale and range to enable comparison. The projection and profile features are normalized by the window height, while the other descriptors are normalized by their respective maximum possible values. After normalization, the features of each main window are combined to form a vector. Each feature is scaled and translated individually so that, on the training set, it lies in a fixed range between zero and one [20].
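The scaling to a zero-to-one range on the training set can be sketched as a per-feature min-max normalization. This is a minimal illustration under that assumption; the function names and sample values are hypothetical, and the window-height normalization of the projection and profile features is not modeled.

```python
def fit_min_max(train_rows):
    """Return the (min, max) of every feature column in the training set."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, ranges):
    """Scale each feature with the ranges learned on the training set."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            # A constant feature carries no information; map it to 0.
            out.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled


# Illustrative values only, not taken from the chapter's dataset.
train = [[1.0, 3.0, 6.0], [2.0, 8.0, 10.0], [3.0, 4.0, 9.0]]
ranges = fit_min_max(train)
scaled = apply_min_max(train, ranges)
```

Reusing the training-set ranges for test samples keeps both sets on the same scale, although test values may then fall slightly outside the zero-to-one interval.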

2.3 Features selection

This study proposes a fusion of two feature sets, Discrete Cosine Transform ⨁ Sparse Principal Component Analysis (DCT⨁SPCA). The former is introduced to represent the high-pass content of signature images in the vertical, diagonal, and horizontal directions, whereas the latter is proposed to discriminate between genuine and forged Arabic signatures. DCT and SPCA features are combined because both are transform-based, and this homogeneity makes them well suited to fusion. Fusion combines the useful information from both. The combination is further motivated by the numerous similarities between DCT and SPCA features. The proposed technique uses the high-pass signature images to extract the information needed for signature verification.

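Following feature selection, twelve DCT features and eight SPCA features are extracted. These features are then fused in order to classify signatures into genuine and forged classes. Suppose the twelve DCT features are represented by $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{12}$ and the eight SPCA features by $\beta_{SPCA1}, \beta_{SPCA2}, \beta_{SPCA3}, \ldots, \beta_{SPCA8}$. The two subsets can be combined by concatenating the DCT features with the SPCA features to form a single feature vector (DCT⨁SPCA) of 20 features, as shown in Eq. (1).

$$\mathrm{DCT} = [\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6, \alpha_7, \alpha_8, \alpha_9, \alpha_{10}, \alpha_{11}, \alpha_{12}]$$

and

$$\mathrm{SPCA} = [\beta_{SPCA1}, \beta_{SPCA2}, \beta_{SPCA3}, \beta_{SPCA4}, \beta_{SPCA5}, \beta_{SPCA6}, \beta_{SPCA7}, \beta_{SPCA8}]$$

$$\mathrm{DCT} \oplus \mathrm{SPCA} = [\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6, \alpha_7, \alpha_8, \alpha_9, \alpha_{10}, \alpha_{11}, \alpha_{12}, \beta_{SPCA1}, \beta_{SPCA2}, \beta_{SPCA3}, \beta_{SPCA4}, \beta_{SPCA5}, \beta_{SPCA6}, \beta_{SPCA7}, \beta_{SPCA8}] \tag{1}$$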

This set of 20 features represents one signature.
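The concatenation in Eq. (1) can be sketched as follows. The orthonormal 1-D DCT-II shown here is only one way to obtain transform coefficients; the choice of the twelve lowest-frequency terms, the input profile, and the zero placeholder standing in for the SPCA outputs are all assumptions made for illustration, since the chapter does not specify them.

```python
import math


def dct_ii(values):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(values)
    coeffs = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def fuse(dct_features, spca_features):
    """Concatenate 12 DCT and 8 SPCA features into one 20-element vector."""
    if len(dct_features) != 12 or len(spca_features) != 8:
        raise ValueError("expected 12 DCT features and 8 SPCA features")
    return list(dct_features) + list(spca_features)


# Hypothetical input: one row profile of a 14-pixel-wide window.
profile = [0, 1, 3, 4, 4, 3, 1, 0, 0, 2, 5, 5, 2, 0]
alpha = dct_ii(profile)[:12]  # assumption: keep the 12 lowest-frequency terms
beta = [0.0] * 8              # placeholder for SPCA outputs, not real data
signature_vector = fuse(alpha, beta)
```

For image windows a 2-D transform would normally be used; the 1-D form keeps the example short.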

2.4 Classification

In this step, the model is built through training and testing. The sub-steps performed are as follows:

3. Signature alignment

To perform a meaningful comparison of images of different lengths, we applied the Extreme Points Warping (EPW) method [21]. EPW modifies a shape using peaks and valleys as pivot points rather than warping the whole shape. The algorithm finds the optimal linear alignment of two vectors by using the smallest overall distance between them. The distances between feature vectors are recalculated at each iteration. The alignment is considered optimal when the average distance between feature vectors reaches a low value. The distance between two signature samples is calculated as the median of the distances between the fully aligned feature vectors.

3.1 Enrolment

For enrolment in the system, 54 signatures were selected from each user for training. Each pair of Arabic signatures was aligned to determine their distance, as described in the previous section. Using these aligned distances, the following measurements were evaluated:

  1. Median distance to the farthest sample (dmax).

  2. Median distance to the nearest sample (dmin).

The training group of Arabic signature images was used to determine the threshold parameter that separates the doubtful class from the genuine class.
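One possible reading of these two enrolment measurements is sketched below: for every reference signature, the nearest and farthest of the other references are found, and the median of each is taken over the reference set. Plain Euclidean distance stands in for the EPW-aligned distance, and the function names and sample vectors are assumptions for the example.

```python
import math
from statistics import median


def euclidean(a, b):
    """Stand-in distance; the chapter uses distances after EPW alignment."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enrolment_stats(references, distance=euclidean):
    """Return (d_min, d_max) for a set of reference feature vectors."""
    nearest, farthest = [], []
    for i, ref in enumerate(references):
        others = [distance(ref, other)
                  for j, other in enumerate(references) if j != i]
        nearest.append(min(others))
        farthest.append(max(others))
    return median(nearest), median(farthest)


# Illustrative reference vectors only.
refs = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d_min, d_max = enrolment_stats(refs)
```

Passing a different `distance` callable lets the same routine work with an alignment-based distance.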

4. Training

The 2-dimensional feature vectors (Pmin, Pmax) were obtained using the EPW algorithm, and the feature values were normalized by the corresponding averages of the reference set (dmin, dmax). The normalized values were calculated with Eqs. (2) and (3) to represent the distribution of the feature group.

$$N_{max} = \frac{d_{max}}{P_{max}} \tag{2}$$

$$N_{min} = \frac{d_{min}}{P_{min}} \tag{3}$$
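Eqs. (2) and (3) amount to two element-wise ratios. The sketch below applies them exactly as written; the function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def normalized_pair(d_min, d_max, p_min, p_max):
    """Apply Eqs. (2) and (3): return (N_min, N_max)."""
    if p_min == 0 or p_max == 0:
        raise ValueError("P_min and P_max must be non-zero")
    return d_min / p_min, d_max / p_max


# Illustrative numbers only: d = (5, 10) and P = (4, 8).
n_min, n_max = normalized_pair(5.0, 10.0, 4.0, 8.0)
```

With these inputs both ratios equal 1.25, so the sample sits at the same relative position on both axes of the normalized feature space.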

Normalization helps distinguish genuine signatures from forgeries in the training set. We trained a decision tree classifier to recognize genuine and forged signatures in this normalized feature space (Figure 3). To facilitate comparison, two classifiers were used: the Tree classifier and the SVM classifier were both applied to the 2-dimensional attribute vectors. A linear classification was made by choosing a threshold value separating the two classes within the training set. This threshold was used in the verification process.

Figure 3.

The different stages of the pre-processing phase, (a) gray sample, (b) binary and converted sample, (c) with boundary box sample, (d) resized image, (e) windowing image.

4.1 Classification based on SVM

For offline Arabic signature verification and identification, Support Vector Machines (SVM) were used. Important features in the Arabic signature images were extracted, and the samples were confirmed with the assistance of the Gaussian empirical law. SVM was applied to record the matching results when comparing all signatures in the database with the test signature. The proposed method was tested on Arabic signatures containing 500 samples from 50 users, and the outcomes were encouraging. The principle of SVM depends on a linear separation in a high-dimensional feature space, into which the data are mapped to take into account the non-linearity of the problem. The SVM classifier [22, 23] was trained with the corresponding result vectors for each distance in order to obtain a good level of generalization capability. To establish the ranking of signers in relation to the query samples, we first used these processing points and then combined the results over all samples.

4.2 Decision tree classification

The Tree classification (Bagged Trees) technique was evaluated in the same way and on the same Arabic signature samples as SVM. MATLAB 2014 bagged tree classification software was used in the training and classification simulation. To predict a response, the decision procedure follows the tree from the root (starting) node down to a leaf node, which contains the response. The decision trees give responses such as 'true' or 'false'. A decision tree was created to perform the classification [20, 24]. The steps are presented in Algorithm 1.

Algorithm 1
Step 1: Start with all input features and examine all possible binary splits on each predictor.
Step 2: Choose the split with the best optimization criterion.
Step 3: If the split leads to a child node with fewer observations than the minimum leaf parameter, choose the split with the best optimization criterion subject to the minimum leaf constraint.
Step 4: Apply the split and repeat recursively for the two child nodes.
Step 5: Stop splitting a node when it is pure, that is, made up only of observations of one category, or when it has fewer than the minimum number of parent observations.
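The steps of Algorithm 1 can be sketched as a small recursive tree builder. This is an illustrative single tree, not the MATLAB bagged-trees implementation used in the study; Gini impurity is assumed as the optimization criterion, and the parameter names `min_leaf` and `min_parent` and the training values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_leaf=1):
    """Steps 1-3: scan every binary split and keep the best feasible one."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # violates the minimum leaf constraint
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


def build_tree(rows, labels, min_leaf=1, min_parent=2):
    """Steps 4-5: apply the split and recurse until a stop rule holds."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(rows) < min_parent:
        return {"leaf": majority}  # pure node or too few observations
    split = best_split(rows, labels, min_leaf)
    if split is None:
        return {"leaf": majority}
    f, t, _ = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           min_leaf, min_parent),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            min_leaf, min_parent),
    }


def predict(tree, row):
    """Follow the tree from the root down to a leaf and return its response."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


# Illustrative 2-D vectors only, not data from the chapter.
rows = [[0.9, 1.0], [1.1, 1.05], [1.0, 0.95], [1.8, 1.6], [2.1, 1.9], [1.7, 2.0]]
labels = ["genuine"] * 3 + ["forged"] * 3
tree = build_tree(rows, labels)
```

Calling `predict(tree, [1.0, 1.0])` walks from the root to a leaf in the same way the prediction procedure is described in the text.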

5. Outcomes and discussion

In this section, we discuss the outcomes of the proposed methodology on some samples from the Arabic signatures.

5.1 Pre-processing

The input image in RGB color space was first converted to a grayscale image, as displayed in Figure 3(a). Then, the image was smoothed with a median filter and converted to binary, as shown in Figure 3(b). Next, a bounding box was used to find the boundaries of the text area, as presented in Figure 3(c). In Figure 3(d) the image was resized so that the adaptive windowing algorithm could divide it into fragments, as shown in Figure 3(e).
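The first stages of this pipeline can be sketched in plain Python on nested lists. The luminance weights, the 3x3 filter size, and the fixed threshold of 128 are assumptions chosen for the example, and the resizing and windowing stages are left out.

```python
from statistics import median


def to_gray(rgb_image):
    """Convert RGB triples to gray levels with standard luminance weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def median_filter3(gray):
    """3x3 median filter; border pixels are left unchanged."""
    rows, cols = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            block = [gray[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(block)
    return out


def binarize(gray, threshold=128):
    """Mark dark (ink) pixels as 1 and background as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def bounding_box(binary):
    """Return (top, left, bottom, right) of the ink pixels, or None."""
    ys = [y for y, row in enumerate(binary) if any(row)]
    if not ys:
        return None
    xs = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return min(ys), min(xs), max(ys), max(xs)


def crop(binary, box):
    top, left, bottom, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


# Toy 10 x 10 white image with a 4 x 4 dark block and one noise pixel.
white, black = (255, 255, 255), (0, 0, 0)
rgb = [[white] * 10 for _ in range(10)]
for y in range(3, 7):
    for x in range(3, 7):
        rgb[y][x] = black
rgb[1][8] = black  # isolated speck that the median filter should remove

binary = binarize(median_filter3(to_gray(rgb)))
box = bounding_box(binary)
cropped = crop(binary, box)
```

On this toy image the isolated speck is smoothed away, while the block keeps its extent apart from its four corner pixels, which is the usual side effect of a 3x3 median filter.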

5.2 Feature extraction

In this phase, each sub-image is represented by a set of features. The outcome of the feature extraction is shown in Table 1(a). Initially, these features were not normalized. The values in Table 1(a) represent the frequencies of the patterns extracted from each box. Higher values mean a closer match with the genuine autograph, which suggests that the Arabic signatures are highly similar to the test signature. The features were then normalized using a composed matrix of features. The projection and profile features were normalized by the window height, while the other descriptors were normalized by their respective maximum possible values. Normalization places different feature values in the same range, as shown in Table 1(b). After normalization, the normalized features of each main window were concatenated into a single feature set, so that each window is represented by a vector. This process standardizes all features by scaling each one to a given range.

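(a) Un-normalized features

| F10 | F9 | F8 | F7 | F6 | F5 | F4 | F3 | F2 | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3.000 | 2.0020 | 3.0000 | 1.0001 | 3.000 | 1.0000 | 1.0000 | 6.0000 | 1.0001 | 3.000 |
| 1.0000 | 3.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 8.0000 | 1.0000 | 1.0000 |
| 1.0000 | 4.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 9.0000 | 1.0000 | 1.0000 |
| 1.0000 | 3.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 11.0000 | 1.0000 | 1.0000 |
| 1.0000 | 6.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.20000 | 1.0000 | 1.0000 |
| 1.0000 | 8.0000 | 1.0000 | 2.0000 | 1.0000 | 2.0000 | 2.0000 | 10.0000 | 2.0000 | 1.0000 |
| 2.0000 | 2.6463 | 2.0000 | 3.0000 | 2.0000 | 2.6463 | 2.6463 | 10.0000 | 3.0000 | 2.0000 |
| 3.0000 | 3.9281 | 3.0000 | 1.0000 | 3.0000 | 3.9281 | 3.9281 | 9.0000 | 1.0000 | 3.0000 |
| 4.0000 | 2.491 | 4.0000 | 1.0000 | 4.0000 | 2.491 | 2.491 | 6.0000 | 1.0000 | 4.0000 |
| 3.0000 | 1.8671 | 3.0000 | 1.0000 | 3.0000 | 1.8671 | 1.8671 | 3.0000 | 1.0000 | 3.0000 |
| 6.0000 | 1.3205 | 6.0000 | 1.0000 | 6.0000 | 1.3205 | 1.3205 | 1.0000 | 1.0000 | 6.0000 |
| 8.0000 | 2.6463 | 8.0000 | 1.0000 | 8.0000 | 0.3367 | 0.3367 | 1.0000 | 1.0000 | 8.0000 |
| 1.0000 | 0.9079 | 9.0080 | 1.3205 | 1.0000 | 0.836 | 0.839 | 1.0000 | 1.0000 | 9.0900 |

| F20 | F19 | F18 | F17 | F16 | F15 | F14 | F13 | F12 | F11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3.0000 | 2.0020 | 3.0000 | 4.088 | 2.6106 | 3.0000 | 1.0001 | 3.000 | 1.0000 | 6.0000 |
| 1.0000 | 3.0000 | 1.0000 | 4.764 | 1.057 | 3.0000 | 1.0000 | 1.0000 | 1.0000 | 8.0000 |
| 1.0000 | 4.0000 | 1.0000 | 4.472 | 1.5523 | 3.0000 | 1.0000 | 1.0000 | 1.0000 | 9.0000 |
| 1.0000 | 3.0000 | 1.0000 | 7.352 | 0.0523 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 11.0000 |
| 1.0000 | 6.0000 | 1.0000 | 5.336 | 0.1469 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.20000 |
| 1.0000 | 8.0000 | 1.0000 | 3.152 | 1.6021 | 2.9066 | 2.0000 | 1.0000 | 2.0000 | 10.0000 |
| 2.0000 | 2.6463 | 2.0000 | 3.376 | 1.0000 | 1.6974 | 3.0000 | 2.0000 | 2.6463 | 10.0000 |
| 3.0000 | 3.9281 | 3.0000 | 6.424 | 1.0000 | 1.0000 | 1.0000 | 3.0000 | 3.9281 | 9.0000 |
| 4.0000 | 2.491 | 4.0000 | 2.4 | 1.0000 | 5.0000 | 1.0000 | 4.0000 | 2.491 | 6.0000 |
| 3.0000 | 1.8671 | 3.0000 | 0.024 | 1.0000 | 4.0000 | 1.0000 | 3.0000 | 1.8671 | 3.0000 |
| 6.0000 | 1.3205 | 6.0000 | 0.304 | 1.0000 | 5.0000 | 1.0000 | 6.0000 | 1.3205 | 1.0000 |
| 8.0000 | 2.6463 | 8.0000 | 0.056 | 2.0000 | 6.0000 | 1.0000 | 8.0000 | 0.3367 | 1.0000 |
| 9.0080 | 1.0200 | 3.0010 | 0.464 | 1.3205 | 1.0000 | 1.3205 | 1.0000 | 0.836 | 1.0200 |

(b) Normalization

| F10 | F9 | F8 | F7 | F6 | F5 | F4 | F3 | F2 | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.690 | 0.57 | 0.824 | 0.635 | 0.041 | 0.61 | 0.68 | 0.72 | 0.83 | 0.65 |
| 0.087 | 0.345 | 0.760 | 0.622 | 0.706 | 0.157 | 0.156 | 0.395 | 0.760 | 0.67 |
| 0.523 | 0.875 | 0.8887 | 0.2794 | 0.472 | 0.523 | 0.0921 | 0.875 | 0.8887 | 0.2794 |
| 0.0523 | 0.477 | 0.3109 | 0.6446 | 0.352 | 0.0523 | 0.4585 | 0.477 | 0.3109 | 0.6446 |
| 0.1468 | 0.322 | 0.5577 | 0.424 | 0.396 | 0.1468 | 0.1098 | 0.322 | 0.5577 | 0.424 |
| 0.6021 | 0.8581 | 0.9066 | 0.6012 | 0.152 | 0.6021 | 0.0582 | 0.8581 | 0.9066 | 0.6012 |
| 0.2531 | 0.6463 | 0.6974 | 0.6831 | 0.376 | 0.2531 | 0.4802 | 0.6463 | 0.6974 | 0.6831 |
| 0.3451 | 0.9281 | 0.7784 | 0.1576 | 0.424 | 0.3451 | 0.2093 | 0.9281 | 0.7784 | 0.1576 |
| 0.6649 | 0.491 | 0.9262 | 0.0621 | 0.411 | 0.6649 | 0.6716 | 0.491 | 0.9262 | 0.0621 |
| 0.8189 | 0.8671 | 0.9862 | 0.4585 | 0.024 | 0.8189 | 0.1161 | 0.8671 | 0.9862 | 0.4585 |
| 0.6633 | 0.3205 | 0.9257 | 0.1098 | 0.304 | 0.6633 | 0.5974 | 0.3205 | 0.9257 | 0.1098 |
| 0.794 | 0.3367 | 0.9165 | 0.0582 | 0.056 | 0.794 | 0.4185 | 0.3367 | 0.9165 | 0.0582 |
| 0.6213 | 0.8938 | 0.9129 | 0.4802 | 0.199 | 0.6213 | 0.0595 | 0.8938 | 0.9129 | 0.4802 |

| F20 | F19 | F18 | F17 | F16 | F15 | F14 | F13 | F12 | F11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.299 | 0.611 | 0.547 | 0.725 | 0.581 | 0.910 | 0.299 | 0.611 | 0.725 | 0.581 |
| 0.6727 | 0.741 | 0.0484 | 0.62 | 0.76 | 0.0157 | 0.6727 | 0.741 | 0.62 | 0.76 |
| 0.386 | 0.705 | 0.348 | 0.2794 | 0.472 | 0.523 | 0.386 | 0.705 | 0.2794 | 0.472 |
| 0.8651 | 0.925 | 0.6883 | 0.6446 | 0.352 | 0.0523 | 0.8651 | 0.925 | 0.6446 | 0.352 |
| 0.952 | 0.274 | 0.964 | 0.424 | 0.398 | 0.1568 | 0.952 | 0.274 | 0.424 | 0.398 |
| 0.4175 | 0.645 | 0.2759 | 0.612 | 0.152 | 0.6021 | 0.4175 | 0.645 | 0.612 | 0.152 |
| 0.915 | 0.722 | 0.7266 | 0.6831 | 0.376 | 0.2531 | 0.915 | 0.722 | 0.6831 | 0.376 |
| 0.9235 | 0.596 | 0.794 | 0.1576 | 0.424 | 0.3471 | 0.9235 | 0.596 | 0.1576 | 0.424 |
| 0.4185 | 0.666 | 0.9817 | 0.021 | 0.411 | 0.6649 | 0.4185 | 0.666 | 0.021 | 0.411 |
| 0.1315 | 0.231 | 0.9571 | 0.4585 | 0.024 | 0.8789 | 0.1315 | 0.231 | 0.4585 | 0.024 |
| 0.3969 | 0.666 | 0.4075 | 0.1098 | 0.304 | 0.6673 | 0.3969 | 0.666 | 0.1098 | 0.304 |
| 0.2144 | 0.754 | 0.8988 | 0.0582 | 0.056 | 0.7794 | 0.2144 | 0.754 | 0.0582 | 0.056 |
| 0.479 | 0.461 | 0.016 | 0.482 | 0.199 | 0.6613 | 0.479 | 0.461 | 0.482 | 0.199 |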

Table 1.

Extracted features before and after normalization: (a) un-normalized features, (b) normalized features.

5.3 Representation of feature selection

When the feature selection procedure for the windows was complete, the features with a sufficient number of windows were kept. The features contained stroke patterns occurring in the windows. Generally, the number of patterns for each selected feature was proportional to the size of the Arabic signature sample. As Figure 4 shows, one important point to note is the number of selected features. This is a property of the signer, as can be observed from Figure 5, where the numbers of selected features are presented. In this case, features are selected for 40 different signers using two samples from each one. As can be seen, the bars representing the number of selected features in the two samples of the same writer are close to each other for the DCT + SPCA method. This is consistent with the supposition that the number of selected features is a signer-dependent feature.

Figure 4.

Result after the feature selection step using the SPCA method.

Figure 5.

The number of selected important features of DCT + SPCA method for the two samples of 40 signers.

5.4 Matching

In the matching phase, the model is created using Classification and Regression Tree (Tree) and Support Vector Machine (SVM) classification with different input parameters. Based on a person's signature, a model was created for the genuine and forged signatures. The proposed method was evaluated on 100 signers from the Arabic signatures. Using the selected DCT + SPCA features, the SVM classifier achieved a verification rate of 98.7% and an EER of 1.90%, while the Tree classifier with the same DCT + SPCA features achieved a verification rate of 99.8% and the EER dropped to 1.20%, which was better than the other techniques, as shown in Table 2. The objective of this study was to create a system that 1) can identify handwritten signatures and verify their authenticity, and 2) can distinguish forgeries from genuine signatures, including those created under pressure and other influences, using 2000 Arabic signature samples. The results of the matching phase are shown in Table 2.

| Classification technique with feature selection technique | Verification rate | Verification EER (%) | Recognition rate |
| --- | --- | --- | --- |
| Tree + DCT + SPCA method | 99.8% | 1.20 | 98.5% |
| SVM + DCT + SPCA method | 98.7% | 1.90 | 97% |

Table 2.

Experimental results obtained from 100 signers based on Arabic signatures.

This implies that a forger may not skillfully repeat all aspects of the original signature. It also shows a pattern in forgers, which has small variations. Evidence shows that the mean of a feature produced by a forger in multiple attempts at forging tends to lie in a small range. Conversely, genuine signatures produced by a signer may vary under unusual conditions. Signers possess certain unconscious features that remain consistent and stable despite the interference of influencing factors. Such natural features are almost impossible to imitate, even by the original signers.

6. Authentication of results

Arabic signature recognition methods were compared by verification rate, not by computational time. The accuracy measure was computed from the confusion matrix, where TP is the number of true positive signatures, TN the number of true negative signatures, FP the number of false positive signatures, and FN the number of false negative signatures. The True Positive Rate measures genuine signatures correctly classified as genuine; the False Positive Rate measures forged signatures classified as genuine; the False Negative Rate measures genuine signatures classified as forgeries; and the True Negative Rate measures forged signatures classified as forgeries.

$$\text{Verification Rate} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
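The verification rate formula translates directly into code. The sketch below is a minimal illustration; the function name and the confusion-matrix counts are made up for the example and are not results from this study.

```python
def verification_rate(tp, tn, fp, fn):
    """Percentage of signatures classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return 100.0 * (tp + tn) / total


# Illustrative counts only: 48 TP, 49 TN, 1 FP, 2 FN.
rate = verification_rate(tp=48, tn=49, fp=1, fn=2)
```

With these counts, 97 of 100 signatures are classified correctly, so the rate is 97%.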

The tests yielded a 99.8% accuracy rate for the values predicted by the Decision Tree. These promising results point to the state-of-the-art pre-processing techniques and to the strong performance of the proposed features in discriminating between genuine and forged signatures with a high accuracy rate. To validate the proposed method, its verification rate with DCT + SPCA was computed and compared against other widely accepted autograph verification methods. Table 3 shows the simulation results with the Arabic signatures, consisting of 2000 signatures from 100 different signers. The verification rate for the proposed technique is 99.8%, the highest among the compared methods. We conclude that the DCT + SPCA feature technique with the Decision Tree classifier is a credible and reliable technique for verification of offline Arabic signatures.

| Authors | Methods | Language | Verification rate |
| --- | --- | --- | --- |
| Ismail et al. [25] | New procedures for autograph verification by fuzzy concepts | Arabic | 98% |
| A.Y. Ebrahim [26] | DCT + DWT technique | Arabic | 99.75% |
| S.M. Darwish et al. [27] | Distance and fuzzy classifiers alliance | Arabic | 98% |
| C. Ergun et al. [28] | Word layout signature | Farsi | 94.3% |
| Proposed method (2021) | DCT + SPCA features technique | Arabic | 99.8% |

Table 3.

A comparison of the proposed Arabic signature recognition system with previously known approaches for Arabic and other signatures.

7. Conclusion

In this paper, we described a method we developed for selecting important features using the DCT + SPCA feature technique in offline Arabic signature verification. It partitions signature samples into 14x14 windows and generates the features extracted for each window. The selected features were then used by the classification techniques.

A limitation of this research lies in the set of Arabic signatures used to collect the signature samples for this study. To judge our findings objectively, we used Arabic signatures from Arabic signers. The results of our study show that this method is a credible technique for offline Arabic signature feature selection. It can be used as an Arabic signature verification method for offline signatures.

In the simulation phase, two comparisons were made. The first was the performance of the Support Vector Machine classifier with the DCT + SPCA feature technique, and the second was the performance of the Decision Tree classifier with the DCT + SPCA feature technique. The Decision Tree classifier with the DCT + SPCA feature technique produced the best verification rate of 99.8%, which improved the performance of offline Arabic signature verification.

The study can be extended in several ways. The proposed future work falls into two main parts. First, the procedures can be extended to achieve higher accuracy in autograph verification. Second, the method can be extended to different autograph datasets.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Anwar Yahya Ebrahim and Hoshang Kolivand (April 9th 2021). New Attributes Extraction System for Arabic Autograph as Genuine and Forged through a Classification Techniques [Online First], IntechOpen, DOI: 10.5772/intechopen.96561. Available from:
