New Attributes Extraction System for Arabic Autograph as Genuine and Forged through a Classification Techniques

Handwritten autographs are used throughout the world to authenticate writers, so an autograph must be checked thoroughly before any conclusion is drawn about the signer. The Arabic autograph has unique characteristics, including lines and overlapping strokes, which make high verification accuracy more difficult to achieve. This work addresses that difficulty by selecting the best characteristics for Arabic autograph authentication, with each autograph represented by a number of attributes. The objective is to determine whether a given autograph is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features and then Sparse Principal Component Analysis (SPCA) to select the significant attributes of Arabic handwritten autographs to aid the authentication step. Finally, a decision tree classifier is used for signature authentication. The suggested combination of DCT and SPCA achieves good outcomes on an Arabic autograph dataset when compared with various other techniques.


Introduction
The handwritten autograph plays an important role in modern life, as it is routinely used in every sphere of human activity. Couto [1] uses a lexical similarity technique for each entity identified. This frequently makes it impossible to differentiate between a forged signature and a signature created under influence. Chung [2] applied fuzzy groups to handle uncertainty. Although there are contributing studies in this area, research has often failed to take into account contributing factors, such as distractions and signers' stress, that may affect the signatures being signed [3,4]. The handwritten autograph is widely used for authenticating financial and business transactions [5,6]. There are online and offline authentication systems. Unlike offline systems, online signature systems require special hardware such as pressure tablets. These devices extract dynamic information, including pressure and the signer's speed, as well as the static image of the signature. Unfortunately, both online and offline signatures can easily be imitated or forged, leading to false representation or fraud [7]. Yang [8] used a learned dictionary to check samples, a method that has recently been used successfully in image recognition. According to Alattas [9], financial institutions are interested in benefiting from the reliability and safety of offline signature recognition systems. Another major reason is that online authentication systems require more complex processing and more high-tech gadgets than offline systems. Offline autographs are usually presented on a piece of paper, which is the norm in documentation. Currently, there is a need for efficient online and offline systems to ascertain the genuineness of personal autographs. Authentication of handwritten autographs usually consists of a series of procedures.
These procedures are pre-processing (where images are enhanced, binarized, divided into fragments, and subjected to other related operations), feature extraction (features of the signatures are extracted in raw form), feature selection or reduction (the extracted features are reduced for efficiency), and identification and authentication of the signatures against the signature database based on the selected features. A good verification outcome can be obtained by comparing the strong features of the test sample against the autograph of a signer's sample using suitable techniques or classifiers [10]. Some methods depend on local tests, which concentrate on analyzing the essential features of different scripts [10][11][12]. Some studies used evolving curves that do not drift to nearby features, which reduces superfluous fragmentation [13]. Based on this gap in the literature, we propose in this paper a new process to identify and authenticate offline Arabic signatures. The method combines several techniques, including an adaptive window positioning procedure for autograph attribute extraction and a feature selection method that reduces the feature set and selects the important features. An enhanced Discrete Cosine Transform (DCT) and Sparse Principal Component Analysis (SPCA) method is used to extract attributes, and the extracted features are then reduced to the best features only. To classify genuine and forged signatures, two types of classifier are applied: 1) Decision Tree and 2) Support Vector Machine (SVM). The classification outcomes of the Decision Tree and SVM are compared to choose the better classifier.

Proposed scheme
In this part, we introduce an offline Arabic signature identification system based on classification techniques. The procedure consists of four phases: pre-processing, feature extraction, feature selection by the (DCT + SPCA) technique, and matching. The complete process begins with acquiring the signature images, which undergo a pre-processing stage followed by the identification and verification process, as illustrated in Figure 1.

Pre-processing
In this step, data are acquired and signature images are pre-processed. For the purpose of this study, the data consist of Arabic signatures: 500 true samples and 250 forged samples. True samples were obtained from 50 different persons. Every signer was asked to sign 10 times using common types of pens. The 10 signatures collected from each person were used as follows: six were selected at random for system learning and the remaining four were used for system testing, in addition to five forged samples.

Arabic signature database
This study used the Arabic signature database created by Anwar Yahya Ebrahim as the source of Arabic signature samples for testing the proposed method. The Arabic signatures were written on A4 paper and then scanned at 300 dpi as 256-level grayscale images. The dataset encompasses scanned signatures from multiple persons, collected by Anwar Yahya Ebrahim et al. Each signatory has 10 signatures used to predict a response, of which 6 are genuine signatures and 4 are forged signatures. There are enough signatures to ensure sufficient samples for both training and testing: 7 of the samples are assigned to the training set and the remaining 3 to the testing set, from both classes.
The distribution of the number of genuine and forged samples for different signatories is illustrated in Figure 2. The Arabic signature images are then pre-processed in order to improve image quality. Noise, such as irrelevant data, is removed to improve identification performance. The samples are then converted into binary images before the feature extraction process [14][15][16][17].

Feature extraction
An adaptive window positioning technique is then applied to separate the Arabic autograph images into small segments, or sub-images. This makes it easy to remove redundant data and facilitates the comparison of segmented fragments. A 14x14 segment size is chosen for the images for optimum output [18]. Further, a group of features (form measures) is extracted, which represents the signature image in a feature space. To analyze data accurately, a variety of observations and the values of significant individual features need to be organized. Such data can then be presented to and analyzed by machines or humans.
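As an illustration of the segmentation step, the following minimal sketch tiles a binary signature image into 14x14 sub-images. It uses a fixed, non-overlapping grid as a simplified stand-in for the adaptive window positioning technique, whose positioning rule is not detailed here; the function names `split_into_windows` and `drop_empty_windows` and the zero-padding of the image edges are assumptions made for the example.

```python
import numpy as np

def split_into_windows(image, size=14):
    """Tile a 2-D array into non-overlapping size x size windows.

    Assumption: a fixed grid with zero-padding at the right and bottom
    edges, not the adaptive positioning used in the paper.
    """
    h, w = image.shape
    pad_h = (-h) % size
    pad_w = (-w) % size
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant")
    rows = padded.shape[0] // size
    cols = padded.shape[1] // size
    # Axes become (row block, column block, y in window, x in window).
    windows = padded.reshape(rows, size, cols, size).swapaxes(1, 2)
    return windows.reshape(rows * cols, size, size)

def drop_empty_windows(windows):
    """Discard windows that contain no ink pixels (all zeros)."""
    keep = windows.reshape(len(windows), -1).any(axis=1)
    return windows[keep]
```

Dropping empty windows mirrors the stated aim of removing redundant data before fragments are compared.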
The goal of form representation is to obtain form measures. These measures are used as classification features in models. Moreover, sub-images are represented by the set of obtained features [19]. The attributes are then normalized using a feature matrix. Normalization is very important because, when attributes lie in different ranges, larger values may dominate smaller ones and distort the results. Normalization places the attribute values within the same scale and range to enable comparison. The projection and profile features are normalized by the window height, while the other descriptors are normalized by their respective maximum possible values. After normalization, the features of each main window are assembled into a vector. This scales and translates each feature individually to a fixed range, between zero and one, determined on the training set [20].
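The scaling of each feature to the range between zero and one can be sketched as follows. This is a minimal min-max example under stated assumptions: the helper names `fit_min_max` and `apply_min_max` are hypothetical, the minimum and span are learned on the training set only, and test values that fall outside the training range are clipped.

```python
import numpy as np

def fit_min_max(train):
    """Learn the per-feature minimum and span from the training matrix."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    # Constant features get a span of 1 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_min_max(features, lo, span):
    """Scale features with training statistics and clip to [0, 1]."""
    return np.clip((features - lo) / span, 0.0, 1.0)
```

Fitting on the training set and reusing the same statistics for test samples keeps both sets on the same scale, so that no attribute dominates only because of its units.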

Features selection
This study proposes a fusion of two feature sets, namely Discrete Cosine Transform ⨁ Sparse Principal Component Analysis (DCT⨁SPCA). The former is introduced to represent the high-pass content in the vertical, diagonal and horizontal directions of the signature images, whereas the latter is proposed to discriminate between genuine and forged Arabic signatures. DCT and SPCA features are combined because both are transform-based features, so their homogeneity makes them the best choice for combining. Fusion combines the useful information from both sets of features. A further motivation for combining them is the numerous similarities found between DCT and SPCA features. The proposed technique uses the high-pass signature images to extract the information needed for signature verification.
Following feature selection, twelve DCT features and eight SPCA features are obtained. These features are then fused in order to classify signatures into genuine and forged classes. Suppose the twelve DCT features are represented by α1, α2, α3, …, α12 and the eight SPCA features by βSPCA1, βSPCA2, βSPCA3, …, βSPCA8. The two subsets can be combined by concatenating the DCT features with the SPCA features to form a single feature vector (DCT⨁SPCA) of 20 features, as shown in Eq. (1):
(DCT⨁SPCA) = [α1, α2, α3, …, α12, βSPCA1, βSPCA2, βSPCA3, …, βSPCA8]    (1)
This set of 20 features represents one signature.
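The concatenation of twelve DCT features with eight SPCA features can be illustrated with the sketch below. Several points are assumptions rather than details taken from the paper: the 2-D DCT is computed with an orthonormal DCT-II basis, the twelve DCT features are taken as the top-left 3x4 block of coefficients, and the eight SPCA features are assumed to be supplied by a separate sparse PCA step that is not shown. The names `dct_matrix`, `dct2`, `dct_features` and `fuse` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis (row k is the k-th cosine basis vector)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis

def dct2(window):
    """2-D DCT of a window: transform the rows, then the columns."""
    rows, cols = window.shape
    return dct_matrix(rows) @ window @ dct_matrix(cols).T

def dct_features(window, n_rows=3, n_cols=4):
    """Assumption: keep a 3x4 block of coefficients as the 12 DCT features."""
    return dct2(window.astype(float))[:n_rows, :n_cols].ravel()

def fuse(dct_feats, spca_feats):
    """Concatenate 12 DCT and 8 SPCA features into one 20-element vector."""
    dct_feats = np.asarray(dct_feats, dtype=float)
    spca_feats = np.asarray(spca_feats, dtype=float)
    if dct_feats.shape != (12,) or spca_feats.shape != (8,):
        raise ValueError("expected 12 DCT features and 8 SPCA features")
    return np.concatenate([dct_feats, spca_feats])
```

Keeping the DCT features first and the SPCA features second fixes the meaning of each position in the fused vector, which matters when the same vector layout is used for training and testing.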

Classification
In this step, the model is built through training and testing. The sub-steps performed are as follows:

Signature alignment
In order to perform a meaningful comparison of images of different lengths, we applied the Extreme Points Warping (EPW) method [21]. The EPW method modifies a shape using peaks and valleys as pivoting points, rather than warping the whole shape. The algorithm determined the optimum linear alignment of two vectors by using the smallest overall distance between them. The distances between feature vectors were recalculated at each iteration. The alignment was considered optimal when the average distance between feature vectors reached a low value. The distance between two signature samples was calculated as the median of the distances between the fully aligned feature vectors.

Enrolment
For enrolment to the system, 54 signatures were selected from each user for training. Each pair of Arabic signatures was aligned to determine their distance, as described in the previous section. Using these aligned distances, the following measurements were evaluated:
1. Median distance to the farthest sample (dmax).
2. Median distance to the nearest sample (dmin).
The training group of Arabic signature images was used to determine the threshold parameter in order to distinguish the dubious group from the genuine class.
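The two enrolment measurements can be sketched from a matrix of pairwise distances between a user's reference signatures. The interpretation used here is an assumption: for each reference signature, the distance to its nearest and to its farthest other reference is found, and the median of each is taken over all references. The function name `reference_statistics` is hypothetical, and the distance matrix is assumed to come from the alignment step.

```python
import numpy as np

def reference_statistics(dist):
    """Return (dmin, dmax) from a symmetric pairwise distance matrix.

    Assumption: dmin / dmax are medians, over the reference signatures,
    of the distance to the nearest / farthest other reference.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore each signature's distance to itself
    nearest = np.where(off_diag, dist, np.inf).min(axis=1)
    farthest = np.where(off_diag, dist, -np.inf).max(axis=1)
    return float(np.median(nearest)), float(np.median(farthest))
```

Using the median rather than the mean makes both statistics less sensitive to a single unusual reference signature.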

Training
The 2-dimensional feature vectors (Pmin, Pmax) were obtained using the EPW algorithm, with the feature values normalized by the corresponding averages of the reference set (dmin, dmax). These were calculated based on Eqs. (2) and (3) to represent the distribution of the feature set.
Normalization of the data helps establish the genuineness or forgery of signatures in the training set. We trained a decision tree classifier to recognize genuine and forged signatures in this normalized feature space (Figure 3). To facilitate comparison, two classifiers were used: the Tree classifier and the SVM classifier, both applied to the 2-dimensional attribute vectors. A linear classification was made by choosing a threshold value that separates the two classes within the training set. This threshold was used in the verification process.

Classification based on SVM
Support Vector Machines (SVM) were used for offline Arabic signature verification and identification. Important features in the Arabic signature images were extracted and the samples were confirmed with the assistance of the Gaussian empirical law. SVM was applied to record the corresponding results of comparing all signatures in the database with the test signature. The suggested method was tested on Arabic signatures comprising 500 samples from 50 users, and the outcomes were found to be encouraging. The principle of SVM depends on linear separation in a high-dimensional feature space, into which the data are mapped to account for the non-linearity of the problem. The SVM classifier [22,23] was trained with the corresponding result vectors for each distance in order to obtain a good level of generalization capability. To establish the rating of the signers' relationship to the query samples, we first used these processing points and then combined the results over all samples.

Decision tree classification
The Tree Classification (Bagged Trees) technique was evaluated in the same way and on the same Arabic signature samples as SVM. MATLAB 2014 bagged tree classification and trees software were used in the training and classification simulation. To predict a response, the decision path in the tree was followed from the root (starting) node down to a leaf node. Responses are held in the leaf nodes. The decision trees returned responses such as 'true' or 'false'. A decision tree was created to perform classification [20,24]. The steps are presented in Algorithm 1.

Algorithm 1
Step 1: Start with all input features and examine all potential binary splits on each predictor.
Step 2: Choose the split with the best optimization criterion.
Step 3: If the split leads to a child node with fewer observations than the minimum leaf parameter, choose the split with the best optimization criterion subject to the minimum leaf constraint.
Step 4: Apply the split and repeat recursively for the two child nodes.
Step 5: Stop splitting a node when it is pure (made up only of observations of one category) or when it holds fewer than the minimum number of parent observations.
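The split search in Steps 1 to 3 of Algorithm 1 can be sketched as an exhaustive scan over candidate thresholds. This is an illustrative example, not the MATLAB implementation used in the study: the optimization criterion is assumed to be the weighted Gini impurity, candidate thresholds are midpoints between consecutive distinct values, and the names `gini`, `best_split` and `min_leaf` are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array (0 for a pure or empty node)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(X, y, min_leaf=1):
    """Return (feature index, threshold, weighted impurity), or None.

    Splits that leave fewer than min_leaf observations in either child
    are skipped, which enforces the minimum leaf constraint.
    """
    n, d = X.shape
    best = None
    for j in range(d):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            n_left = int(left.sum())
            n_right = n - n_left
            if n_left < min_leaf or n_right < min_leaf:
                continue
            score = (n_left * gini(y[left]) + n_right * gini(y[~left])) / n
            if best is None or score < best[2]:
                best = (j, float(t), score)
    return best
```

A full tree would apply `best_split` recursively to the two child nodes (Step 4) and stop under the conditions of Step 5; bagging would repeat this on bootstrap samples of the training set.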

Outcomes and discussion
In this section, we discuss the outcomes of the suggested methodology on some samples from the Arabic signatures.

Pre-processing
The input image in RGB color space was first converted to a grayscale image, as displayed in Figure 3(a). Then the image was smoothed with a median filter and converted to binary, as shown in Figure 3(b). Further, a bounding box was applied to the image to find the boundaries of the text area, as presented in (c), while in (d) the image was resized so that the adaptive windowing algorithm could divide it into fragments, as shown in (e).
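The first pre-processing operations can be sketched as follows. The example assumes standard luminance weights for the grayscale conversion, a 3x3 median filter, and a fixed binarization threshold of 128; none of these parameters is stated in the text, and the function names are hypothetical. The resizing step is omitted.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale (assumed luminance weights)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])

def median3x3(gray):
    """Smooth with a 3x3 median filter, replicating the border pixels."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    shifted = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifted), axis=0)

def binarize(gray, threshold=128.0):
    """Map dark (ink) pixels to 1 and background to 0 (assumed fixed threshold)."""
    return (gray < threshold).astype(np.uint8)

def crop_to_ink(binary):
    """Crop to the bounding box of the ink pixels."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        return binary  # blank image: nothing to crop
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Applying the median filter before binarization removes isolated dark pixels, so that they do not enlarge the bounding box of the signature.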

Feature extraction
In this phase, the sub-images are represented by a set of features. The outcome of the feature extraction is shown in Table 1(a). Initially, these features were not normalized. The values shown in Table 1(a) represent the frequencies of the patterns extracted from each box. Higher values indicate a closer match with the genuine autograph, which suggests that the Arabic signatures are highly similar to the test signature. The features were then normalized using a composed matrix of features. The projection and profile features were normalized using the window height, while the other descriptors were normalized by their respective maximum possible values. Normalization places the different feature values in the same range, with each window represented by a vector. This process standardizes all features by scaling each feature to a given range.

Representation of feature selection
When the feature selection procedure for the windows was complete, the features with a sufficient number of windows were kept. The features contained stroke patterns occurring in the windows. Generally, the number of patterns for each feature selection was proportional to the size of the Arabic signature sample. One important point to note in Figure 4 is the number of selected features. This is a property of the signer, as can be observed from Figure 5, where the numbers of selected features are presented. In this case, feature selection was performed for 40 different signers using two samples from each. As can be seen, the bars representing the number of selected features in the two samples of the same writer are close to each other for the DCT + SPCA method. This is consistent with the supposition that the number of selected features is a signer-dependent feature.

Matching
In the matching phase, the model is created using Classification and Regression Tree (Tree) and Support Vector Machine (SVM) classification with different input parameters. Based on a person's signature, a model was created for the original and forged signatures. The proposed method was evaluated on Arabic signatures from 100 signers, using DCT + SPCA to select the important features. With the SVM classifier it achieved a verification rate of 98.7% and an EER of 1.90%; with Tree classification on the same DCT + SPCA features, the verification rate rose to 99.8% and the EER dropped to 1.20%, which was better than other techniques, as shown in Table 2. The objective of this study was to create a system that 1) can identify handwritten signatures and verify their authenticity, and 2) can distinguish forged signatures from genuine ones and from those created under pressure and other influences, using 2000 Arabic signature samples. The results of the matching phase are shown in Table 2.
This implies that a forger may not skillfully repeat all aspects of the original signature. It also shows a pattern in forgers, which has small variations. Evidence shows that the mean of a feature produced by a forger in multiple attempts at forging tends to lie in a small range. Conversely, genuine signatures produced by a signer may vary under unusual conditions. Signers possess certain unconscious

Authentication of results
The comparison between Arabic signature recognition methods was based on verification rate and not on computational time. The accuracy performance measure was computed using a confusion matrix, where TP is the number of true positive signatures, TN the number of true negative signatures, FP the number of false positive signatures and FN the number of false negative signatures. The True Positive Rate is the proportion of genuine signatures correctly classified as genuine; the False Positive Rate is the proportion of forged signatures classified as genuine. The False Negative Rate is the proportion of genuine signatures classified as forged. The True Negative Rate is the proportion of forged signatures classified as forged.
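These rates follow directly from the four confusion matrix counts. The sketch below takes genuine signatures as the positive class; the function name `confusion_rates` is hypothetical, and the counts in the usage example are illustrative only and are not results from this study.

```python
def confusion_rates(tp, tn, fp, fn):
    """Compute rates from confusion matrix counts (genuine = positive class)."""
    genuine = tp + fn  # all genuine signatures
    forged = tn + fp   # all forged signatures
    total = genuine + forged
    if genuine == 0 or forged == 0:
        raise ValueError("both classes must be present")
    return {
        "TPR": tp / genuine,  # genuine classified as genuine
        "FNR": fn / genuine,  # genuine classified as forged
        "FPR": fp / forged,   # forged classified as genuine
        "TNR": tn / forged,   # forged classified as forged
        "accuracy": (tp + tn) / total,
    }

# Illustrative counts only (not taken from the paper's tables).
rates = confusion_rates(tp=90, tn=45, fp=5, fn=10)
```

By construction, TPR and FNR sum to one, as do FPR and TNR, which gives a quick consistency check on reported figures.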
The tests yielded an accuracy of 99.8% for the predicted values with the Decision Tree. Such promising results point to the state-of-the-art pre-processing techniques and the strong performance of the proposed features in discriminating between genuine and forged signatures with a high accuracy rate. To validate the proposed method, the verification rates obtained with the DCT + SPCA method were computed and compared against two other widely accepted autograph verification methods. Table 3 shows the simulation results on the Arabic signatures, consisting of 2000 signatures from 100 different signers. The verification rate for the proposed technique is 99.8%, attesting to its superiority over the others. We conclude that the DCT + SPCA feature technique with the Decision Tree classifier is a credible and reliable technique for verification of offline Arabic signatures.

Conclusion
In this paper, we described a method for selecting important features using the DCT + SPCA feature technique in offline Arabic signature verification. It partitions signature samples into 14x14 windows and generates the features extracted for each window. This feature selection was then used with classification techniques. We have noted the limitation of the research regarding the set of Arabic signatures from which the samples used in this study were collected. To judge our findings objectively, we used Arabic signatures from Arabic signers. The results of our study show that this method is a credible technique for offline Arabic signature feature selection. This method can be used as an Arabic signature verification method for offline signatures.
In the simulation phase, two comparisons were made. The first was the performance of the Support Vector Machine classifier with the DCT + SPCA feature technique, and the second was the performance of the Decision Tree classifier with the DCT + SPCA feature technique. The Decision Tree classifier with the DCT + SPCA feature technique produced the best verification rate of 99%, which improved the performance of offline Arabic signature verification.
There are many extensions that can be used to develop the study. The proposed future work can be divided into two main parts. First, the procedures can be extended to achieve higher accuracy in autograph verification. Second, the method can be extended to different autograph datasets.