Open access peer-reviewed chapter - ONLINE FIRST

Artificial Neural Network Computational Techniques in Biometric Handwriting

Written By

Jose Luis Vásquez-Vasquez and Carlos M. Travieso-González

Submitted: 19 June 2023 Reviewed: 25 July 2023 Published: 22 April 2024

DOI: 10.5772/intechopen.1002454

Biometrics and Cryptography IntechOpen
Biometrics and Cryptography Edited by Sudhakar Radhakrishnan

From the Edited Volume

Biometrics and Cryptography [Working Title]

Dr. Sudhakar Radhakrishnan

Abstract

This study presents a novel methodology that combines the power of multilayer perceptron (MLP) neural networks with validated graphometry approaches for individual identification based on handwriting. By integrating the computational capabilities of MLPs with the graphometry characteristics utilized in graphology, this proposal aims to leverage the distinctiveness and stability of both approaches. Handwriting, as a widely accepted behavioral biometric characteristic, serves as a reflection of an individual’s personality, enabling effective identification. The MLP’s ability to learn complex relationships between inputs and outputs, coupled with the graphometry measures capturing intricate patterns within the data, contributes to developing highly accurate and efficient identification systems. This comprehensive approach fuses the strengths of MLP neural networks and graphometry techniques, providing a promising avenue for advancing the field of personal identification through handwriting analysis. By harnessing the intrinsic uniqueness of handwriting and its equivalence to other behavioral traits, the methodology enables discerning a person’s psychological profile and overcomes variations over time. The implementation of identification systems based on these properties establishes robust and reliable solutions in personal identification.

Keywords

  • handwriting
  • graphometry
  • identification
  • computational techniques
  • behavioral biometric

1. Introduction

Handwriting, as a biometric-behavioral trait, operates on a deeply subconscious level, offering substantial and reliable information for person recognition [1, 2]. While the signature serves as a widely accepted means of legal authentication, other aspects of a person’s handwriting can also be analyzed for identification purposes, such as in forensic studies that determine or certify document authorship—a task typically entrusted to handwriting experts.

The act of writing is a complex phenomenon influenced by various personal factors. Just as an individual can be recognized by their unique laughter, gestures, or gait, their writing style is shaped by their personality and physiological traits [3], which endows it with identification potential.

In line with other pattern recognition domains, research on handwriting for biometric identification revolves around three key aspects: feature extraction for effective representation of the phenomenon, utilization of encoding methods and models, and selection of the most suitable classifier that aligns with the recognition task and the properties of the developed model. Studies in this field explore diverse approaches to feature extraction. Some focus on contour-based features [4] or employ bi-quadratic interpolation to capture letter curvature information [5]. Others propose feature extraction through wavelet transforms of handwriting images [6] or Hermite coefficients derived from text lines [7].

In addition to feature extraction, research in this field encompasses the study of different types of classifiers commonly used for handwriting-based identification. Artificial Neural Networks (NN) [8, 9], K-Nearest Neighbors (KNN) [10], Support Vector Machine (SVM) [11, 12], Hidden Markov Models (HMM) [13], and Gaussian Mixture Models (GMM) [14, 15] are some examples commonly used in this field. These classifiers play a crucial role in achieving accurate identification outcomes.

A strong line of research, on which much of this work is based, focuses on the extraction and use of structural features in handwriting [16, 17]. Structural or graphometry features are commonly employed in forensic document analysis by handwriting experts and are also utilized in graphology to determine a person’s psychological profile.

Handwriting characteristics can be classified into two main groups: those that are quantitatively evaluated by measuring traces, and those that require a more qualitative analysis by an expert. Quantitative features include the size of ascenders and descenders, inclination angles, aspect ratios, proportionality index, and the size of the calligraphic box. On the other hand, qualitative features relate to the richness of gestures and particularities in strokes, such as the way certain letters begin or end in a word, the roundness of letter forms, and the overall legibility of the writing. Evaluating these qualitative characteristics poses a challenge in automated processing and remains an open research topic.

Research in this field has primarily focused on solving two problems: word or handwritten text recognition (RATM) and writer recognition [18]. Significant advancements have been made in achieving high success rates in both areas, reflecting the progress of dedicated research efforts. The utilization of diverse classifiers and the exploration of structural and graphometry features contribute to the development of robust and effective handwriting identification systems.

These endeavors contribute to advancing the field of handwriting-based biometric identification, enabling more accurate and sophisticated systems for person recognition.

1.1 Motivation

The analysis of handwriting, whether conducted manually or automatically, presents a significant challenge due to the inherent variability observed among samples from the same writer. This variability can stem from various factors, including the specific conditions in which the act of writing occurs and the changes that naturally occur over time. For instance, individuals using banking services often find themselves repeatedly asked to replicate their signature as it appears on their identity documents. However, this self-imitation can become indistinguishable from forgery, particularly in the case of older individuals whose handwriting changes become more pronounced with age.

The situation arises from a lack of attention to the changes in handwriting associated with the aging process. Interestingly, if these changes were considered, they could potentially provide more reliable means of identification than a mere copy of an original document. Despite the numerous challenges involved in handwriting analysis, significant progress has been made in its automatic processing in recent years [19, 20]. However, when it comes to automatic writer recognition, the writing samples commonly used in research are often obtained within short time periods. As a result, they fail to accurately capture the evolving characteristics of handwriting caused by the aging process. In fact, it can be argued that existing research does not delve deeply enough into this issue, leading to a lack of comprehensive analysis regarding this phenomenon. Given the increasing social and economic engagement of older populations, addressing this issue becomes even more crucial. It is imperative to investigate and account for the long-term variations in handwriting to develop robust automated writer recognition systems that better reflect the reality of handwriting changes over time.

The successful recognition of individuals through their handwriting, despite changes over time, requires a comprehensive investigation of handwriting characteristics and advanced processing techniques. This research focuses on two key aspects: the development of robust computational techniques capable of handling data variability, and the analysis of techniques and characteristics employed by handwriting experts.

The study of handwriting characteristics is particularly relevant in forensic analyses, where the comparison of documents written by the same person at different times becomes necessary. Addressing the challenges posed by potential changes in handwriting is a primary objective of this work. By identifying consistent graphical elements, we aim to enhance the writer recognition process, thereby increasing the reliability and security of systems while minimizing user inconveniences.

By synergizing robust computational techniques with insights from handwriting expertise, this research aims to contribute to the advancement of highly reliable and accurate writer recognition systems. The objective is to ensure the effectiveness of these systems, even in the presence of natural variations in handwriting that occur over time. Through this interdisciplinary approach, we strive to enable robust identification methods that can withstand the challenges posed by handwriting variability, enhancing the overall performance and applicability of such systems.

1.2 Related works

Currently, the term “Biometrics” [21] is used to refer to the technological field dedicated to the identification of individuals based on their physiological or behavioral biometric traits, e.g., fingerprints, iris, handwriting, hand geometry and others [22].

Physiological biometric traits include iris, fingerprint, hand geometry, retina, and DNA. Their main quality is that they have little or no variability over time, but their acquisition is more invasive and requires the cooperation of the subjects. Behavioral biometric traits, such as voice, signature, or handwriting in general, are less invasive, although the accuracy of identification is lower due to the variability of behavioral patterns.

A personal trait will be valid, and a biometric system will be able to distinguish people based on it if it fulfills the following properties:

  • Universality: every person must possess such a biometric trait.

  • Uniqueness: different people must possess different traits, sufficiently different to allow them to be distinguished based on that trait.

  • Permanence: the trait must be sufficiently invariant over time.

  • Measurability: the trait must be quantitatively characterizable.

Writing, as a system of graphic representation of language, holds the potential to be a distinctive biometric trait. It involves the creation of engraved or drawn signs on a flat surface, with paper currently being the most widely used medium, closely followed by digital devices.

The process of writing is learned and shares common techniques among individuals who speak the same language. Initially, it is a conscious and deliberate act, but with time and practice, it becomes ingrained in our subconscious, becoming a reflexive action. This transition from volitional to reflexive writing explains the enduring and unique graphic characteristics that are specific to each individual, allowing for their differentiation and distinction from others.

Identifying individuals through their handwriting is of utmost importance and finds applications in various domains [23]. In the realm of forensics, handwriting identification plays a critical role in determining document authorship and aiding in the resolution of legal cases.

Moreover, the ability to identify individuals based on their handwriting is also highly relevant in the context of security and access control [24]. Handwriting recognition systems are deployed in settings where ensuring the accurate identification of individuals is crucial, such as financial institutions, airports, and cutting-edge security systems.

Continual research and development in the field of biometric identification have propelled the study of identifying individuals through their handwriting, making it a topic of enduring interest and ongoing advancement. Researchers are exploring innovative techniques, including the analysis of structural and dynamic features of handwriting, as well as leveraging machine learning algorithms [25] and neural networks [26] to enhance the precision and reliability of handwriting identification systems. The evolving landscape of handwriting analysis presents opportunities to improve the accuracy and efficiency of biometric identification based on handwriting. This dynamic field continues to push the boundaries, enabling advancements in the science of identifying individuals through their unique writing characteristics.

In [27], the authors present a proof-of-concept for a cognitive-based authentication system that utilizes an individual’s writing style as a unique identifier to grant access to a system. They train a machine learning SVM model on stylometric features to effectively distinguish between texts generated by different users. The extracted stylometric feature vector is then used as input to a key derivation function, generating a unique cryptographic key for each user. Experimental results demonstrate the system’s accuracy of up to 87.42% in classifying texts as written and validate the security and uniqueness of the generated keys. This research explores the intersection of natural intelligence, cognitive science, and cryptography, aiming to develop a cognitive cryptography system. By leveraging behavioral features from linguistic-biometric data through stylometry, the proposed system detects and classifies users, generating cryptographic keys for authentication and enhancing access control security.

2. Material

The database used in this study was created at the University of Las Palmas de Gran Canaria and contains handwriting samples from 100 different people, each of whom made 10 handwritten copies of the following text in Spanish (Figure 1).

Figure 1.

Text used to formalize the writers’ database.

From the proposed text, 34 words were selected, and their images were extracted from each of the previously scanned writings. The database contains a greyscale image of each of these words, for a total of 34,000 images (100 writers × 10 copies × 34 words) (Figure 2).

Figure 2.

Words extracted.

The main conditions for the elaboration of the writings used to form the database are the following:

  • All writers copied the same text and used the same type of pen.

  • All samples were written on 80 gram DIN-A4 paper, using a flat, rigid surface as a support.

  • The copies were made over the course of a week, on different days and at different times, depending on the availability of each writer.

3. Methods

3.1 Methodology

The proposed methodology adopts the classical structure of a pattern recognition system configured in identification mode, which avoids the need for conducting multiple experiments with different evaluation modes. In this mode, the objective is to accurately assign each sample to its corresponding class. The methodology consists of several phases: first, all images undergo pre-processing to ensure optimal conditioning. Subsequently, various processing techniques are applied to extract the features described in Section 3.3.

The subsequent stages of the experimentation focus on applying supervised classification in the identification mode. Three classifiers, namely KNN, NN, and SVM, are utilized for the calligraphic features, while a combination of HMM and SVM is employed for contour characterization. The use of multiple classifiers for the calligraphic parameters aims to determine whether the obtained results are primarily attributed to the effectiveness of the parameters rather than the idiosyncrasies of a specific classifier. This approach enhances the reliability and comprehensiveness of the experimental analysis, providing a robust assessment of the proposed methodology.

After parameterizing the contours, the models generated with HMM were transformed using the Fisher kernel. This transformation mapped the models to a high-dimensional space, which served as input data for a classification system based on SVMs. The choice of SVMs is supported by the favorable results reported in previous studies [28, 29, 30, 31]. These studies used large datasets, highlighting the effectiveness of SVMs in handling substantial amounts of data. By applying SVMs in the classification system, the methodology leverages their proven performance in the analysis of contour data.

The system modeling technique in this study involved two phases: training and testing. To ensure reliable results, the databases were divided into two groups, with each group used for a specific phase. The Hold-out cross-validation technique was employed for the division of the groups. This technique ensures that the systems are trained and tested on entirely different samples, reducing the risk of bias.

To further enhance the reliability of the results and avoid dependence on specific samples, a significant number of iterations were performed. This statistical independence was achieved by repeatedly applying the cross-validation method, dividing the dataset into training and test subsets.

Moreover, to assess the robustness of the systems, the percentages of training and test samples were varied, ranging from 50% to 20%. This variation allowed for a comprehensive analysis of the system’s performance under different proportions of data.
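The repeated hold-out procedure described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function name `repeated_holdout`, the per-writer (stratified) split, the number of iterations, and the reading of the 50% to 20% range as the training share are all assumptions made for the example.

```python
import random
from collections import defaultdict


def repeated_holdout(labels, train_fraction, n_iterations, seed=0):
    """Yield (train_idx, test_idx) pairs, one per hold-out iteration.

    The split is drawn independently for each writer so that every
    writer appears in both subsets (an assumption; the chapter does not
    state whether the split was stratified). Each writer is assumed to
    have at least two samples.
    """
    rng = random.Random(seed)
    by_writer = defaultdict(list)
    for idx, writer in enumerate(labels):
        by_writer[writer].append(idx)

    for _ in range(n_iterations):
        train_idx, test_idx = [], []
        for indices in by_writer.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # At least one training sample, and at least one left for testing.
            n_train = min(len(shuffled) - 1,
                          max(1, round(train_fraction * len(shuffled))))
            train_idx.extend(shuffled[:n_train])
            test_idx.extend(shuffled[n_train:])
        yield sorted(train_idx), sorted(test_idx)


# Toy data: 3 writers with 10 samples each (hypothetical labels).
labels = [w for w in range(3) for _ in range(10)]
splits = {frac: list(repeated_holdout(labels, frac, n_iterations=5))
          for frac in (0.5, 0.4, 0.3, 0.2)}
```

Each entry of `splits` holds five independent train/test partitions for one training proportion; a classifier would be trained and scored on every partition and the scores averaged.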

Additionally, singular experiments were conducted to evaluate the stability, invariance, and robustness of the systems in more complex scenarios. These experiments aimed to test the system’s ability to handle diverse and challenging situations, providing insights into its performance under different conditions.

3.1.1 Support Vector Machine

The Support Vector Machine (SVM) is a learning system that has undergone significant development in recent years, both in the generation of new algorithms and in the strategies for their implementation. It is used to train linear learning machines efficiently, both for classification and regression (linear and non-linear). Formally, a Support Vector Machine is a static kernel-based network that classifies vectors mapped to a higher-dimensional space, where they are separated by a hyperplane; the mapping is induced by the kernel itself. The basic operations performed by an SVM are listed below:

  • Transform the data to a higher dimensional space through a previously defined kernel function. This reformulates the problem by implicitly mapping the data to the new space.

  • Find the hyperplane that maximizes the margin between the nearest training class patterns by performing an efficient computation of the optimal hyperplane.

  • If the data are not linearly separable, find the hyperplane that maximizes the margin and minimizes a function of the number of misclassifications.
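For reference, the operations listed above correspond to the standard soft-margin formulation found in textbooks. It is given here as an illustration rather than as the exact objective solved in this work; the symbols are assumptions of this sketch: φ is the mapping induced by the kernel, ξ_i are slack variables that penalize margin violations, and C is the cost constant.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i\ge 0,\quad i=1,\dots,n.
```

The first term maximizes the margin, while the second term trades margin width against misclassified or margin-violating samples.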

3.1.2 Hidden Markov Models

Hidden Markov Models first appeared in recognition systems in the late 1970s, allowing the development of specific probabilistic techniques for the estimation of these systems. Today, they are very efficient, robust and computationally flexible techniques for many types of recognition systems.

The main difference between an HMM and a Markov chain is that each state in a Markov chain is deterministically associated with a single output observation value, whereas in an HMM each is associated with a probability distribution of all possible output observation values.

An HMM can be viewed as a finite state machine in which two well-defined stochastic processes interact with each other: one of them remains unobservable and acts in the background (it is the hidden layer of the model), behind another, observable process that produces the sequence of output observations. The former involves a set of states connected to each other by means of transitions with probabilities, while the latter consists of a set of output observations, each of which can be emitted by any of the states according to a probability function associated with that state. In other words, an HMM consists of a Markov chain and a set of probability functions associated with each state; that is, states are no longer simply symbols but are now associated with sets of probability distributions. Transitions between states depend on the occurrence of some symbol.
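To make the interaction between the hidden states and the observable outputs concrete, the following minimal sketch computes the likelihood of an observation sequence with the forward algorithm for a discrete HMM. The two-state model and all of its probabilities are illustrative values, not parameters from this study.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations | model) for a discrete HMM (forward algorithm).

    initial[i]        probability of starting in state i
    transition[i][j]  probability of moving from state i to state j
    emission[i][o]    probability that state i emits symbol o
    """
    n_states = len(initial)
    # alpha[i] = P(o_1..o_t, state at time t = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state model over a binary alphabet (all numbers are illustrative).
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.4, 0.6]]
emission = [[0.9, 0.1],
            [0.2, 0.8]]
likelihood = forward_likelihood([0, 1, 0], initial, transition, emission)
```

In an identification setting, one model would be trained per writer and a sample assigned to the writer whose model gives the highest likelihood.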

3.2 Training phase

In this phase, we determine the values of the optimization variables that characterize each of the shape classes to be classified. These variables include the number of neurons in the hidden layer and the number of networks.

When using neural networks for classification, the optimal number of neurons in the hidden layer was determined through an iterative trial-and-error process. Various configurations were tested to find the one that yielded the best performance and accuracy.

Furthermore, in most cases where neural networks were employed as classifiers, multiple networks were used in parallel. This approach mitigates the impact of the initial values of the model and enhances the robustness of the classification results. The final decision for classifying a particular sample follows the “most voted” strategy, based on the collective output of the parallel networks. The experimentation process also included determining the appropriate number of parallel neural networks for each scenario, striking a balance between performance and efficiency.

3.2.1 Number of neurons in the hidden layer and number of networks

The optimal number of neurons in the hidden layer for classification using neural networks was determined through an iterative trial and error process.

Furthermore, to enhance the reliability and stability of the classification results, multiple neural networks were often employed in parallel as classifiers. This approach aimed to reduce the impact of initial values on the model and increase the robustness of the overall system. In this scheme, the final classification decision for a specific sample was determined based on a “most voted” strategy, where each parallel network contributed its prediction, and the class with the highest number of votes was selected.

The experimentation process also involved exploring the appropriate number of parallel neural networks for each situation. This step allowed for the identification of the optimal configuration that maximized the accuracy and consistency of the classification outcomes.
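The “most voted” strategy can be sketched as below. The names `majority_vote` and `ensemble_predict`, the tie-breaking rule, and the stand-in networks are assumptions made for this illustration; the chapter does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most networks.

    Ties are broken in favour of the class that was voted first,
    which is an arbitrary choice made for this sketch.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(networks, sample):
    """Classify `sample` with every network and combine the votes."""
    return majority_vote([net(sample) for net in networks])


# Stand-ins for independently initialised networks (hypothetical callables).
networks = [lambda x: "writer_07", lambda x: "writer_12", lambda x: "writer_07"]
decision = ensemble_predict(networks, sample=None)
```

Using an odd number of networks reduces the chance of ties, which is one practical consideration when choosing how many networks to run in parallel.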

3.2.2 SVM parameters

The implementation used in this work is SVMlight with a radial basis function (RBF) kernel, defined by [32]:

$$K(x, y) = e^{-\gamma \lVert x - y \rVert^{2}} = e^{-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}} \tag{1}$$

In this configuration, the crucial parameter to determine is gamma (γ), which is inversely proportional to the variance (σ²) of the Gaussian kernel, γ = 1/(2σ²), and therefore controls the width of the RBF kernel.

Regarding the SVM, another important parameter to define is the training cost constant (c). Selecting an appropriate value for this parameter in real-world applications can be more challenging than choosing the kernel. The optimal value of ‘c’ often depends on the nature of the data. In all experiments conducted, a value of 10 was used for ‘c’.
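As a small numerical illustration of the kernel in equation (1), the sketch below evaluates it for a few vector pairs, assuming the relation γ = 1/(2σ²) obtained by equating the two exponents. The function name `rbf_kernel` and the sample vectors are hypothetical and are not part of SVMlight.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), gamma = 1 / (2 sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)   # identical vectors
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)   # narrow kernel
wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=2.0)   # wider kernel, same pair
```

A larger σ (smaller γ) makes distant samples look more similar, which is why this parameter is tuned together with the cost constant.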

3.2.3 HMM state numbers

In the contour characterization using HMM, the determination of the number of states is a crucial parameter during the system modeling phase. It directly impacts the reliability, robustness, and stability of the system. To determine the optimal number of states, multiple experiments were conducted using different values. Based on the classification results obtained from these experiments, the most appropriate number of states was selected. This careful selection ensures that the system achieves the desired performance and accurately captures the patterns and characteristics of the contour data.

3.3 Graphometry feature extraction

As stated in the introductory section, graphometry characteristics play a pivotal role in the forensic analysis of documents, as they encompass a substantial number of parameters utilized by handwriting experts. In alignment with the research objectives, we conducted an extensive review of these measures. We weighed the significance of each characteristic in defining an individual’s handwriting style and its effectiveness in discerning between different writers, as well as the computational complexity involved in extracting it automatically. Taking these factors into account ensures that the selected graphometry measures are both relevant and practical for our research purposes.

3.3.1 Length of ascenders and descenders

“Ascenders” and “descenders,” commonly referred to as “hampas” and “jambs” in the field of forensic document analysis, are key features extensively examined by handwriting experts. Ascenders represent the upper extensions of letters, while descenders refer to the lower extensions (refer to Figure 3 for visual illustration). These features hold significant importance in the analysis of documents, enabling experts to gather valuable insights about the handwriting style in question. By studying the hampas and jambs, experts can gain a deeper understanding of the writer’s unique characteristics and tendencies, aiding in the identification and comparison of different handwriting samples.

Figure 3.

Ascenders and descenders.

The significance of these features in the biometric study of writing becomes evident when we consider their prevalence in the Latin alphabet. Ascenders and descenders are commonly found in approximately 50% of the letters within this alphabet. This statistic, as illustrated in Table 1, highlights the widespread presence of these features in written text. By considering their frequency, handwriting experts can effectively utilize ascenders and descenders as reliable indicators and discriminators in the analysis of documents. These features provide valuable information about a writer’s style, aiding in the identification and differentiation of various handwriting samples.

Letter | Italics: Ascenders | Italics: Descenders | Print: Ascenders | Print: Descenders
b
d
f
g
h
j
k√*
l
p
q
t
y
z

Table 1.

Presence of ascenders and descenders in Latin alphabet letters.

A classical method for extracting ascenders and descenders is to divide the text into three zones: the upper zone, the middle zone, and the lower zone. However, determining these zones is not an easy task; the most common method in image processing is the horizontal projection (see Figure 4).

Figure 4.

Horizontal projection to determine the body of a word.

Various algorithms have been proposed to determine the boundaries of each zone based on the horizontal projection [33, 34]. However, no single method can be universally applied to all cases, and researchers continue to develop methods tailored to the specific characteristics and requirements of each scenario. For instance, Kirli and Gülmezoğlu [34] calculate the base and top lines of a text by minimizing the following expression:

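\begin{align}
D(L_2, L_3) &= \sum_{L=L_1}^{L_2-1} \left(P_{\min} - P_L\right)^2 + \sum_{L=L_2}^{L_3} \left(P_{\max} - P_L\right)^2 \tag{2} \\
&\quad + \sum_{L=L_3+1}^{L_4} \left(P_{\min} - P_L\right)^2 \tag{3}
\end{align}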

The upper boundary line of the word is denoted by L1 and the lower boundary line by L4, while L2 and L3 correspond to the top line and the baseline of the word body, respectively. Pmin and Pmax are the minimum and maximum values of the horizontal projection, and PL is the value of the projection at row L. To determine the optimal positions of L2 and L3, the total squared error D(L2,L3) is minimized: the positions that yield the minimum value of D(L2,L3) give the final placement of the base and top lines.
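The minimization of D(L2,L3) can be sketched with an exhaustive search over all admissible line pairs. This is an illustrative reading of the criterion, not the implementation used in [34] or in this work: taking L1 and L4 as the first and last rows containing ink, and computing Pmin and Pmax over the rows between them, are assumptions of the example, as are the function names and the toy image.

```python
def horizontal_projection(binary_image):
    """Number of foreground pixels (value 1) in each row."""
    return [sum(row) for row in binary_image]


def body_lines(projection):
    """Brute-force search for the (L2, L3) pair that minimises D(L2, L3).

    Assumptions of this sketch: L1 and L4 are the first and last rows
    containing ink, and Pmin/Pmax are taken over rows L1..L4.
    """
    inked = [i for i, p in enumerate(projection) if p > 0]
    l1, l4 = inked[0], inked[-1]
    p_min = min(projection[l1:l4 + 1])
    p_max = max(projection[l1:l4 + 1])

    def cost(l2, l3):
        upper = sum((p_min - projection[l]) ** 2 for l in range(l1, l2))
        body = sum((p_max - projection[l]) ** 2 for l in range(l2, l3 + 1))
        lower = sum((p_min - projection[l]) ** 2 for l in range(l3 + 1, l4 + 1))
        return upper + body + lower

    candidates = [(l2, l3) for l2 in range(l1, l4 + 1) for l3 in range(l2, l4 + 1)]
    l2, l3 = min(candidates, key=lambda pair: cost(*pair))
    return l1, l2, l3, l4


# Toy binary word: two ascender rows, three body rows, two descender rows.
word = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
lines = body_lines(horizontal_projection(word))
```

The exhaustive search is quadratic in the number of rows, which is acceptable for word images; prefix sums could be used if speed mattered.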

Once the boundaries of the word body have been established, the next step involves identifying the ascending and descending strokes within the text. Additionally, their lengths are measured relative to the total height of the word. This analysis provides valuable insights into the proportions and characteristics of the handwriting, contributing to the overall understanding of the writer’s style (Figure 5).

Figure 5.

Length of hampas and jambs as a percentage of the total height of the word.

$$f_1 = \frac{\text{Hampa length}}{\text{Total height}} \tag{4}$$

$$f_2 = \frac{\text{Jamb length}}{\text{Total height}} \tag{5}$$
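Given the four line positions, the two ratios follow directly. The sketch below assumes that lengths are measured in rows, that the upper zone spans rows L1 to L2 − 1 and the lower zone rows L3 + 1 to L4, and that the total height is counted inclusively; these conventions and the function name are assumptions, since the chapter does not state them.

```python
def ascender_descender_features(l1, l2, l3, l4):
    """Return (f1, f2): hampa and jamb lengths divided by the total word height.

    Rows are counted inclusively, so the total height is l4 - l1 + 1.
    """
    total_height = l4 - l1 + 1
    f1 = (l2 - l1) / total_height   # hampa (ascender) length ratio
    f2 = (l4 - l3) / total_height   # jamb (descender) length ratio
    return f1, f2


# Example line positions for a word spanning rows 1..7 with body rows 3..5.
f1, f2 = ascender_descender_features(l1=1, l2=3, l3=5, l4=7)
```

Because both features are ratios, they are insensitive to the overall size of the word image.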

3.3.2 Skew

Skew refers to the inclination of words in relation to the x-axis of the Cartesian system. However, there is some debate among authors regarding its specific definition. Some argue that skew pertains to the inclination of the entire line of writing with respect to the x-axis.

In handwriting recognition applications, skew is typically detected and corrected during the pre-processing stage. This is because skew, as a characteristic, is influenced by various factors, including the writer’s state of mind, and does not provide relevant information for recognition. Instead, it can hinder the recognition process. However, in the context of a biometric system, handwriting experts have observed that each writer tends to write with a relatively consistent skew, making it a commonly analyzed characteristic.

Considering that skew is a variable dependent solely on the writer, it was decided to estimate it as the mean value of the processed words. However, it is important to note that this estimation may not accurately represent the true value of the parameter.

The method developed for the quantification of this parameter is based on horizontal projection techniques with a series of modifications described below. In summary, the process consists of 6 steps:

  1. Starting from the original word image (in greyscale), binarisation is performed using Otsu’s method [35].

  2. The centre of mass is calculated from the binarised image and used as the reference point of rotation, which allows a better correction for words with an oscillating or sinusoidal skew. Next, the range of rotation angles is defined, which in this project goes from −10 to 10 degrees in steps of 0.1 degrees (α ∈ [−10:0.1:10]).

  3. In this step, a rotation of the image with respect to the centre of mass is performed with each of the negative angles. On each of these images, a horizontal projection is performed where the cost function does not look for the maximum value of the foreground pixels per row, but for the maximum variation of these pixels.

  4. This step is the same as the third one, but here the image is rotated with each of the positive angles, also looking for the maximum variance of the foreground pixels per row.

  5. With all the variances obtained for each angle of rotation, we proceed to evaluate for which of them the maximum variance is obtained. This will be the angle of inclination (αSKEW) of the word with respect to the ‘x’ axis.

  6. Here a rotation of the image is performed with the opposite angle of the tilt angle obtained in the previous step. Figure 6 shows each of the steps listed, for a concrete example of a database image:

Figure 6.

Example of a breakdown of the skew detection and correction method.
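The steps listed above can be approximated with the following self-contained sketch. It is a simplified illustration under several assumptions, not the method's actual implementation: ink is taken to be darker than paper, foreground pixel coordinates are rotated instead of resampling the image, negative and positive angles are evaluated in a single sweep, and the variance is computed over a fixed number of row bins so that values are comparable across angles. A positive angle here means that the word descends towards the right in image coordinates (y axis pointing down).

```python
import math
from collections import Counter


def otsu_threshold(gray):
    """Otsu's threshold for an 8-bit greyscale image given as a list of rows."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def estimate_skew(gray, max_angle=10.0, step=0.1):
    """Return the candidate skew (degrees) whose correction maximises the
    variance of the horizontal projection."""
    threshold = otsu_threshold(gray)
    # Assumption: ink is darker than paper, so pixels <= threshold are foreground.
    ink = [(x, y) for y, row in enumerate(gray)
           for x, v in enumerate(row) if v <= threshold]
    cx = sum(x for x, _ in ink) / len(ink)
    cy = sum(y for _, y in ink) / len(ink)
    n_bins = len(gray) + len(gray[0])   # fixed bin count for comparable variances

    def projection_variance(angle_deg):
        a = math.radians(angle_deg)
        sin_a, cos_a = math.sin(a), math.cos(a)
        # Undo a skew of `angle_deg` about the centre of mass, then bin by row.
        rows = Counter(
            math.floor(cy + (y - cy) * cos_a - (x - cx) * sin_a + 0.5)
            for x, y in ink
        )
        mean = len(ink) / n_bins
        return sum(c * c for c in rows.values()) / n_bins - mean ** 2

    n_steps = round(max_angle / step)
    angles = [i * step for i in range(-n_steps, n_steps + 1)]
    return max(angles, key=projection_variance)


# Synthetic check: a 3-pixel-thick bar tilted by 5 degrees (y axis points down).
width, height = 200, 60
image = [[255] * width for _ in range(height)]
for x in range(width):
    yc = round(30 + math.tan(math.radians(5.0)) * (x - 100))
    for y in (yc - 1, yc, yc + 1):
        image[y][x] = 0
skew = estimate_skew(image)
```

On the synthetic bar the estimate should land close to the 5 degrees used to draw it; the exact value depends on pixel quantisation. Correcting the word then amounts to rotating it by the opposite angle, as in step 6.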

3.3.3 Slant

Slant refers to the deviation from the vertical or ‘y’ axis of the Cartesian system exhibited by each letter within a word. It stands out as one of the most prominent characteristics of an individual’s writing style, making it a significant parameter for quantification by handwriting experts.

While the study of slant typically focuses on analyzing individual letters, in our case, it was extracted from words. When referring to the slant of a line, we describe it as upright or vertical when the line’s axis forms a 90-degree angle with the base of the line. Any deviation from this vertical position indicates an inclination. In our analysis, if the inclination is towards the right of the vertical, the angle of inclination is considered positive, whereas if it leans towards the left, it is considered negative. The vertical axis (90°) serves as the reference point with 0° representing no inclination. For visual reference, please refer to Figures 7 and 8.

Figure 7.

Reference system chosen for slant estimation.

Figure 8.

Examples of inclination types.
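The sign convention described above can be written as a one-line conversion. The sketch below assumes the stroke angle is measured from the baseline, counter-clockwise, in a frame where the 'y' axis points upward; the function names and the tolerance used to call a stroke vertical are illustrative assumptions, not values from the study.

```python
def signed_slant(stroke_angle_deg):
    """Slant relative to the vertical: positive leans right, negative leans left.

    stroke_angle_deg is the angle between the stroke axis and the baseline,
    so 90 degrees (upright) maps to a slant of 0 degrees.
    """
    return 90.0 - stroke_angle_deg

def slant_type(stroke_angle_deg, tolerance_deg=2.0):
    """Label the inclination; tolerance_deg is a hypothetical threshold."""
    slant = signed_slant(stroke_angle_deg)
    if abs(slant) <= tolerance_deg:
        return "vertical"
    return "right" if slant > 0 else "left"
```

For example, a stroke forming a 70-degree angle with the baseline has a slant of +20 degrees (inclined to the right), while one at 110 degrees has a slant of −20 degrees (inclined to the left).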

3.3.4 Colligation

Colligation, also known as cohesion, refers to the degree to which the writing appears connected. It should be noted that colligation is not synonymous with continuity, although there may be cases where the two concepts overlap. In offline processing, it is difficult to determine whether a person lifted the writing instrument, because a stroke that was written in parts may show no visible separation.

To assess the type of cohesion in a writing sample, the predominant percentage needs to be determined. Based on this percentage, we can classify the writing using the following guidelines:

Linked writing: Letters and parts of letters are interconnected without any breaks in between. The movements exhibit cohesion, remaining consistent and uninterrupted. However, cohesion or linking may be momentarily interrupted for essential strokes such as dots, accents, capital letters, or the horizontal stroke of the letter “T.”

Unlinked or juxtaposed writing: Words consist of unlinked letters, with no connecting stroke between letters (although they may touch). The letters stand independently, lacking cohesion.

Grouped writing: Words are formed by groups of two, three, four, or more letters, depending on the word’s length.

Fragmented or disjointed writing: This term describes writing where letters are composed of two or more separate (fragmented) strokes. Capital letters and the letter “m” can give the impression of disjointed movements.

By assessing the cohesion in writing samples, handwriting experts can gain insights into an individual’s writing style, providing valuable information for analysis and comparison purposes (Figure 9).

Figure 9.

Representation of three types of colligations.

Algorithmically, when characterizing the cohesion parameter, each image contains groups of active pixels (foreground) or connected components with an 8-neighbor connectivity. Due to the offline nature of the analysis, where temporal sequencing is not available to indicate the writing process, we can only differentiate whether individuals write words in a linked or separate manner. We can identify cases of partially linked or completely unlinked writing. However, we are unable to discern situations where the writer lifts the tool between each letter but maintains complete linkage.

This expression serves as a quantitative measure to evaluate the cohesion level within a writing sample.

$$f_4 = 1 - \frac{\text{Number of separations}}{\text{Number of connected components}} \qquad (6)$$
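Counting the connected components with 8-neighbor connectivity, as mentioned above, can be sketched with a simple breadth-first flood fill. This is an illustrative sketch that assumes a binarised word image stored as a list of rows (1 for foreground); the function name is hypothetical, and how separations are counted from these components is not specified in the text, so it is left out.

```python
from collections import deque

def count_components_8(binary):
    """Count groups of foreground pixels using 8-neighbor connectivity."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # New unvisited foreground pixel: start a new component.
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                # Visit the 8 neighbours (diagonals included).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

With 8-connectivity, two pixels that touch only diagonally belong to the same component, which makes the count less sensitive to thin, slanted strokes than 4-connectivity would be.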

3.4 Experimental methodology

The methodology employed to accomplish the objectives consists of four key steps, which are outlined below.

Firstly, a comprehensive set of characteristics commonly employed in forensic handwriting analysis was extracted. It should be noted that the coding of these parameters did not always align with the practices of this discipline. Specifically, qualitative classifications of certain characteristics were quantitatively estimated, aiming to capture the essence of the established categories. Additionally, word contour coding was performed using two techniques: the application of the Fisher kernel to the Markovian representation of the contour, and Fourier descriptors.

Subsequently, the graphometry parameters underwent an analysis of variance to determine their level of consistency. It is crucial to highlight that the database utilized in this stage was meticulously constructed under controlled conditions. This approach ensures the accurate retrieval of the inherent nature of the characteristics, mitigating any potential interference in the analysis caused by noise generated from the context. By conducting the analysis under controlled conditions, the reliability and validity of the results were maximized, providing a robust foundation for further investigations.
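As an illustration of how such a consistency check might look, the sketch below computes the F statistic of a one-way analysis of variance for a single parameter, with one group of measurements per writer. The one-way design and the function name are assumptions, since the text does not specify the exact ANOVA layout used.

```python
def one_way_anova_f(groups):
    """F statistic for one feature; `groups` holds one list of values per writer."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability between writers versus variability within each writer.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value indicates that the parameter varies much more between writers than within the samples of a single writer, which is the behaviour expected of a consistent, discriminant characteristic. The sketch does not handle the degenerate case of zero within-writer variance.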

In the third step, the effectiveness of the graphometric parameters and contour encodings was evaluated. Specifically, the results obtained from various classifiers in the task of writer recognition were thoroughly analyzed. This analysis aimed to assess the discriminatory power and accuracy of the selected parameters and encoding techniques in distinguishing between different writers. By examining the performance of different classifiers, including but not limited to neural networks, Support Vector Machines, and decision trees, valuable insights were gained regarding the effectiveness of the graphometry features in achieving accurate and reliable writer identification. The findings from this step contributed to the refinement and optimization of the proposed methodology, enhancing its overall efficacy in individual identification based on handwriting analysis.

In the final stage of this methodology, a selection and validation process was conducted to identify the most persistent and discriminant graphometry parameters. The validation was performed by applying these parameters in the task of writer recognition. In the training phase, a model for each writer was established using recent writing samples, while in the test phase, writings that were at least 10 years old and were not included in the training phase were utilized. This approach aimed to assess the reliability and longevity of the selected parameters by evaluating their effectiveness in accurately identifying writers based on handwriting samples that exhibited significant temporal variations. The validation process ensured that the chosen parameters were robust and capable of maintaining their discriminatory power over extended periods of time, strengthening the overall validity and practicality of the proposed methodology.

4. Results

This section presents a detailed analysis of the experimental results obtained from the application of three classification algorithms, namely K-Nearest Neighbors (KNN), a Multilayer Perceptron (MLP) neural network (labelled NN in Table 3), and Support Vector Machines (SVM).

Table 2 presents the classification accuracies obtained for different parameter combinations related to slant (Slant HJ, Slant BS), length (Length HJ), concentration (Concentration), roundness (Roundness), final features (End features, Final traits), colligation (Colligation), and pressure (Pressure). The results demonstrate the influence of these parameters on the performance of the classification models.

| Parameter combination | KNN | MLP | SVM |
| --- | --- | --- | --- |
| Slant HJ, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure | 95.70 ± 1.94 | 95.76 ± 1.48 | 89.1 ± 2.60 |
| Slant BS, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure | 95.86 ± 1.69 | 95.46 ± 1.57 | 88.98 ± 2.16 |
| Slant HJ, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure | 95.92 ± 1.04 | 96.18 ± 1.24 | 90.24 ± 2.08 |
| Slant HJ, Slant BS; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure | 96.36 ± 0.92 | 96.32 ± 1.61 | 90.08 ± 2.15 |

Table 2.

Hit rate by including two parameters in some of the graphical elements: (Slope; Dimension; Richness; Link; Pressure). 50% training.
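Hit rates of the form "mean ± standard deviation" with 50% training are typically obtained by repeating a random split several times. The sketch below shows one way this could be done with a nearest-neighbour classifier. The number of repetitions, the use of k = 1 with Euclidean distance, the per-writer split, and all function names are assumptions, since the text does not state these settings.

```python
import math
import random
import statistics

def nearest_neighbor_label(train, sample):
    """Label of the training item closest to `sample` (1-NN, Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], sample))[1]

def repeated_holdout(samples, train_fraction=0.5, repetitions=10, seed=0):
    """Mean and standard deviation of the hit rate (%) over random splits.

    `samples` is a list of (feature_vector, writer_id) pairs; each writer needs
    at least two samples and `repetitions` must be at least 2.
    """
    rng = random.Random(seed)
    by_writer = {}
    for features, writer in samples:
        by_writer.setdefault(writer, []).append(features)
    rates = []
    for _ in range(repetitions):
        train, test = [], []
        # Split each writer's samples separately so every writer is in both sets.
        for writer, feats in by_writer.items():
            shuffled = feats[:]
            rng.shuffle(shuffled)
            cut = max(1, int(len(shuffled) * train_fraction))
            train += [(f, writer) for f in shuffled[:cut]]
            test += [(f, writer) for f in shuffled[cut:]]
        hits = sum(nearest_neighbor_label(train, f) == w for f, w in test)
        rates.append(100.0 * hits / len(test))
    return statistics.mean(rates), statistics.stdev(rates)
```

Each feature vector would hold the graphometric parameters of one sample (for instance slant, length, roundness, colligation, and pressure), and the returned pair corresponds to one cell of the table.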

For the parameter combination of Slant HJ, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure, the classification accuracies were 95.70 ± 1.94% for KNN, 95.76 ± 1.48% for MLP, and 89.1 ± 2.60% for SVM. This indicates that all three models achieved relatively high accuracy in classifying the samples based on these combined parameters.

Similarly, the parameter combination of Slant BS, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure yielded accuracies of 95.86 ± 1.69% for KNN, 95.46 ± 1.57% for MLP, and 88.98 ± 2.16% for SVM. These results suggest that the models performed consistently across different parameter combinations, with KNN and MLP outperforming SVM by roughly 6 to 7 percentage points.

Furthermore, the combination of Slant HJ, Direction; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure demonstrated improved performance, with accuracies of 95.92 ± 1.04% for KNN, 96.18 ± 1.24% for MLP, and 90.24 ± 2.08% for SVM. These findings indicate the effectiveness of the selected parameters in accurately classifying the handwriting samples.

Moreover, when additional parameters such as Slant BS were considered in the parameter combination of Slant HJ, Slant BS; Length HJ, Concentration; Roundness, Final features; Colligation; Pressure, the KNN and MLP accuracies increased further, to 96.36 ± 0.92% and 96.32 ± 1.61% respectively, while SVM reached 90.08 ± 2.15%. These results highlight the importance of including a comprehensive set of parameters in achieving higher classification accuracies.

To delve deeper into the analysis, Table 3 presents the influence of specific combinations of graphometry parameters on the classification performance. For instance, when considering slant, length, roundness, colligation, and pressure, KNN achieved an accuracy of 91.84 ± 1.64%, while NN achieved an accuracy of 91.40 ± 1.75%, and SVM yielded 77.62 ± 3.33%. Similar trends were observed for other parameter combinations.

| Parameter combination | KNN | NN | SVM |
| --- | --- | --- | --- |
| Slant HJ; Length HJ; Roundness; Colligation; Pressure | 91.84 ± 1.64 | 91.40 ± 1.75 | 77.62 ± 3.33 |
| Slant BS; Length HJ; Roundness; Colligation; Pressure | 91.90 ± 1.61 | 91.90 ± 2.09 | 78.16 ± 2.74 |
| Direction; Length HJ; Roundness; Colligation; Pressure | 83.00 ± 1.62 | 77.40 ± 3.51 | 66.18 ± 3.10 |
| Slant HJ; Length HJ; End features; Colligation; Pressure | 92.36 ± 1.57 | 91.00 ± 2.71 | 82.96 ± 2.35 |
| Slant HJ; Calligraphic box; Final traits; Colligation; Pressure | 90.30 ± 1.94 | 88.20 ± 3.23 | 80.02 ± 2.70 |
| Slant HJ; Concentration; Final features; Colligation; Pressure | 91.00 ± 1.69 | 88.50 ± 3.03 | 80.40 ± 2.92 |
| Slant HJ; Length HJ; Grapheme “a”; Colligation; Pressure | 90.18 ± 1.31 | 91.50 ± 1.73 | 86.08 ± 1.59 |

Table 3.

Hit rate by including one parameter in some of the graphical elements: (Slope; Dimension; Richness; Link; Pressure). 50% training.

Based on Tables 2 and 3, the K-Nearest Neighbors (KNN) algorithm is the most consistent classifier across the different parameter combinations. It achieves the highest accuracy, or one within a fraction of a percentage point of the highest, in all but one case (the grapheme “a” combination in Table 3, where the neural network leads by about 1.3 points). SVM yields the lowest accuracy of the three classifiers (KNN, MLP, and SVM) in every combination.

These results highlight the effectiveness of the proposed classification algorithms in accurately identifying and distinguishing various handwriting characteristics. The outstanding accuracies obtained when utilizing all parameters in a text-independent scenario emphasize the importance of incorporating a comprehensive set of graphometry features. Furthermore, the varying performance across different parameter combinations suggests that certain combinations possess stronger discriminative capabilities.

The findings from this study provide valuable insights into the potential of graphometry and classification algorithms for achieving precise and reliable writer recognition. However, further investigation and optimization are required to enhance the overall performance and generalizability of the proposed approach. Future research could focus on exploring alternative combinations of graphometry parameters, investigating advanced feature extraction techniques, and refining the training process of the classification algorithms.

These results contribute significantly to the existing body of knowledge in the field of writer recognition and open possibilities for practical applications in forensic studies, document analysis, and other relevant domains. The promising outcomes encourage further exploration of graphometry analysis techniques to fully unlock their potential in the realm of handwriting expertise.

4.1 Comparison with similar works

Among the referenced studies, Purohit et al. [36] utilized the IAM English language dataset and employed a CNN technique, achieving an accuracy of 92.7%. Hagstrom et al. [37] conducted their research using a dataset consisting of 4920 written dates of birth (DoBs) and applied the ResNet50 technique, resulting in an accuracy of 94%.

Nabi et al. [38] worked with an Urdu dataset they developed themselves and utilized the DeepNet-WI (VGG-16) technique, achieving an impressive accuracy of 98.71%.

Javidi and Jampour [39] performed their analysis using multiple datasets including IAM, CERUG, FIREMAKER, and CVL. They employed the Res-Net technique and obtained an accuracy of 88.95%.

In comparison to these studies (see Table 4), our proposed technique involved a dataset of 350 Spanish words, and we employed the KNN technique. Our results yielded a high accuracy of 96.36%. It is noteworthy that our technique outperformed the accuracies achieved by [36, 37, 39], while being slightly below the exceptional accuracy achieved by [38] in their Urdu dataset.

| References | Dataset script | Technique used | Accuracy (%) |
| --- | --- | --- | --- |
| Purohit et al. [36] | IAM English language dataset | CNN | 92.7 |
| Hagstrom et al. [37] | 4920 written DoBs (own dataset) | ResNet50 | 94 |
| Nabi et al. [38] | Urdu (own dataset) | DeepNet-WI (VGG-16) | 98.71 |
| Javidi and Jampour [39] | IAM, CERUG, FIREMAKER, and CVL | Res-Net | 88.95 |
| Proposed technique | 350 Spanish words | KNN | 96.36 |

Table 4.

Comparison of the proposal with the state of the art.

Overall, these findings indicate the effectiveness of our proposed technique in handwriting-based writer recognition, particularly for Spanish words, and highlight its competitive performance when compared to previous studies using different datasets and techniques.

5. Conclusions

In conclusion, this academic article underscores the crucial role of handwriting as a biometric-behavioral trait in person recognition, particularly within the realm of forensic studies focused on document authorship determination. Through an extensive exploration of various feature extraction approaches and the utilization of diverse classifiers, the research presented in this article has significantly advanced the understanding and application of handwriting analysis for accurate identification outcomes.

The study delved into a wide array of feature extraction techniques, encompassing contour-based features, interpolation-based curvature information, wavelet transforms, and Hermite coefficients. These approaches showcased the versatility and effectiveness of different methods in capturing distinctive handwriting characteristics essential for reliable identification.

In addition, the investigation highlighted the significance of employing different classifiers, including artificial neural networks, k-nearest neighbors, Support Vector Machines, Hidden Markov Models, and Gaussian mixture models. By leveraging these classifiers, the study demonstrated the robustness and adaptability of various classification techniques in accurately discerning individual handwriting patterns.

Moreover, the research shed light on the vital role of morphometric and structural features in forensic document analysis, providing insights into their relevance for both establishing identity and unraveling psychological profiles. These features, categorized as quantitatively evaluated parameters and qualitatively analyzed characteristics, present both challenges and opportunities for automated processing and serve as avenues for future research and development.

Throughout the experimental phase, the study successfully grouped parameters into distinct grapheme elements. Notably, the combination of parameters within the inclination, dimension, and shape richness elements yielded remarkable hit rates surpassing 50%. While individual grapheme elements such as link and pressure exhibited consistent but relatively lower hit rates, they nevertheless contributed to the overall understanding of handwriting characteristics.

To summarize, this comprehensive investigation significantly advances the field of handwriting-based biometric identification. By exploring various feature extraction approaches, employing diverse classifiers, and emphasizing the importance of graphometry features, the study paves the way for the development of robust and sophisticated systems capable of accurately recognizing individuals based on their handwriting. The findings hold substantial value for forensic document analysis, offering insights and implications for applications requiring precise writer identification in a variety of domains.

This research lays a solid foundation for further advancements in the field, propelling the development of cutting-edge systems that leverage the intricacies of handwriting for accurate identification. The knowledge gained from this study holds great promise for enhancing forensic investigations, bolstering security systems, and enabling various other domains to benefit from the reliable and precise identification of individuals through their unique handwriting patterns.

As a future line of research, we propose merging deep learning systems with our machine learning system for manuscript identification and reporting the results obtained.

References

  1. Srihari SN, Cha SH, Arora H, Lee S. Individuality of handwriting: A validation study. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2001-January. 2001. pp. 106-109. DOI: 10.1109/ICDAR.2001.953764
  2. Srihari SN, Xu Z, Hanson L. Development of handwriting individuality: An information-theoretic study. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR. Vol. 2014-December. 2014. pp. 601-606. DOI: 10.1109/ICFHR.2014.106
  3. Schomaker L. Advances in writer identification and verification. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2. 2007. pp. 1268-1273. DOI: 10.1109/ICDAR.2007.4377119
  4. Siddiqi I, Vincent N. A set of chain code based features for writer recognition. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. 2009:981-985. DOI: 10.1109/ICDAR.2009.136
  5. Chanda S, Franke K, Pal U. Text independent writer identification for Oriya script. In: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS. Vol. 2012. 2012. pp. 369-373. DOI: 10.1109/DAS.2012.86
  6. Hiremath PS, Shivashankar S, Pujari JD, Mouneswara V. Script identification in a handwritten document image using texture features. In: 2010 IEEE 2nd International Advance Computing Conference, IACC. Vol. 2010. 2010. pp. 110-114. DOI: 10.1109/IADCC.2010.5423028
  7. Imdad A, Bres S, Eglin V, Emptoz H, Rivero-Moreno C. Writer identification using steered hermite features and SVM. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. 2007;2:839-843. DOI: 10.1109/ICDAR.2007.4377033
  8. Anton C, Stirbu C, Badea RV. Automatic hand writer identification using the feed forward neural networks. In: World Congress on Internet Security, WorldCIS-2011. 2011. pp. 290-293. DOI: 10.1109/WORLDCIS17046.2011.5749871
  9. Chaturvedi S, Titre RN, Sondhiya N. Review of handwritten pattern recognition of digits and special characters using feed forward neural network and izhikevich neural model. In: Proceedings - International Conference on Electronic Systems, Signal Processing, and Computing Technologies, ICESC. Vol. 2014. 2014. pp. 425-428. DOI: 10.1109/ICESC.2014.83
  10. Marti UV, Messerli R, Bunke H. Writer identification using text line based features. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Vol. 2001-January. 2001. pp. 101-105. DOI: 10.1109/ICDAR.2001.953763
  11. Ibrahim AS, Youssef AE, Abbott AL. Global vs. local features for gender identification using Arabic and English handwriting. In: 2014 IEEE International Symposium on Signal Processing and Information Technology, ISSPIT 2014. 2015. DOI: 10.1109/ISSPIT.2014.7300580
  12. Djeddi C, Meslati LS, Siddiqi I, Ennaji A, El Abed H, Gattal A. Evaluation of texture features for offline Arabic writer identification. In: Proceedings - 11th IAPR International Workshop on Document Analysis Systems, DAS. Vol. 2014. 2014. pp. 106-110. DOI: 10.1109/DAS.2014.76
  13. Schlapbach A, Bunke H. Off-line handwriting identification using HMM based recognizers. In: Proceedings - International Conference on Pattern Recognition. Vol. 2. 2004. pp. 654-655. DOI: 10.1109/ICPR.2004.1334343
  14. Christlein V, Bernecker D, Hönig F, Maier A, Angelopoulou E. Writer identification using GMM supervectors and exemplar-SVMs. Pattern Recognition. 2017;63:258-267. DOI: 10.1016/J.PATCOG.2016.10.005
  15. Slimane F, Märgner V. A new text-independent GMM writer identification system applied to Arabic handwriting. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR. Vol. 2014-December. 2014. pp. 708-713. DOI: 10.1109/ICFHR.2014.124
  16. Rafiee A, Motavalli H. Off-line writer recognition for farsi text. In: Proceedings - 2007 6th Mexican International Conference on Artificial Intelligence, Special Session, MICAI. Vol. 2007. 2007. pp. 193-197. DOI: 10.1109/MICAI.2007.37
  17. Pervouchine V, Leedham G. Extraction and analysis of forensic document examiner features used for writer identification. Pattern Recognition. 2007;40(3):1004-1013. DOI: 10.1016/J.PATCOG.2006.08.008
  18. Pastor Gadea M. Aportaciones al reconocimiento automático de texto manuscrito. 2007. Available from: https://dialnet.unirioja.es/servlet/tesis?codigo=17935&info=resumen&idioma=SPA [Accessed: June 12, 2023 (Online)]
  19. Jain R, Doermann D. Combining local features for offline writer identification. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR. Vol. 2014-December. 2014. pp. 583-588. DOI: 10.1109/ICFHR.2014.103
  20. Angadi SA, Angadi SA, Angadi SH. Structural features for recognition of hand written Kannada character based on SVM. International Journal of Computer Science Engineering and Information Technology. 2015;5(2). DOI: 10.5121/ijcseit.2015.5203
  21. Abdulrahman SA, Alhayani B. A comprehensive survey on the biometric systems based on physiological and behavioural characteristics. Materials Today: Proceedings. 2023;80:2642-2646. DOI: 10.1016/J.MATPR.2021.07.005
  22. Jain AK, Flynn P, Ross AA. Handbook of Biometrics. USA: Springer; 2007. ISBN-13: 978-0-387-71040-2. Available from: https://books.google.es/books?hl=es&lr=&id=WfCowMOvpioC&oi=fnd&pg=PA1&dq=A.K.+Jain%3B+P.+Flynn%3B+A.A.+Ross%3B+%E2%80%9CHandbook+of+biometrics%E2%80%9C+,+Springer,+ISBN-13:+978-0-387-71040-2,+USA,+2007.&ots=xrXI5Tx5Gf&sig=QO7bkHWtNuz95wCzmVtLUj8U2is&redir_esc=y#v=onepage&q&f=false [Accessed: June 12, 2023]
  23. Bozinovic RM, Srihari SN. Off-line cursive script word recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1989;11(1):68-83. DOI: 10.1109/34.23114
  24. de Sousa Neto AF, Bezerra BLD, Toselli AH, Lima EB. A robust handwritten recognition system for learning on different data restriction scenarios. Pattern Recognition Letters. 2022;159:232-238. DOI: 10.1016/J.PATREC.2022.04.009
  25. Raj MAR, Abirami S, Shyni SM. Tamil handwritten character recognition system using statistical algorithmic approaches. Computer Speech & Language. 2023;78:101448. DOI: 10.1016/J.CSL.2022.101448
  26. Zhang G, Wang W, Zhang C, Zhao P, Zhang M. HUTNet: An Efficient Convolutional Neural Network for Handwritten Uchen Tibetan Character Recognition. 2023. DOI: 10.1089/BIG.2021.0333. Available from: https://home.liebertpub.com/big
  27. Contreras Gedler JA. Cognitive cryptography using behavioral features from linguistic-biometric data. Cryptology ePrint Archive. 2023
  28. Li X, Cervantes J, Yu W. A novel SVM classification method for large data sets. In: Proceedings - 2010 IEEE International Conference on Granular Computing, GrC. Vol. 2010. 2010. pp. 297-302. DOI: 10.1109/GRC.2010.46
  29. Cervantes J, Li X, Yu W, Bejarano J. Multi-class support vector machines for large data sets via minimum enclosing ball clustering. In: 2007 4th International Conference on Electrical and Electronics Engineering, ICEEE 2007. 2007. pp. 146-149. DOI: 10.1109/ICEEE.2007.4344994
  30. Banerjee S. Boosting inductive transfer for text classification using Wikipedia. In: Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA. Vol. 2007. 2007. pp. 148-153. DOI: 10.1109/ICMLA.2007.25
  31. Cervantes J, Li X, Yu W. Support vector classification for large data sets by reducing training data with change of classes. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. 2008. pp. 2609-2614. DOI: 10.1109/ICSMC.2008.4811689
  32. Steinwart I, Christmann A. Support Vector Machines. 2008. Available from: https://books.google.es/books?hl=es&lr=&id=HUnqnrpYt4IC&oi=fnd&pg=PA1&dq=Steinwart+and+Christmann,+2008&ots=gakJEu1sUa&sig=JC39H7zWWeQdIT53Mi5YlkxL_F0&redir_esc=y#v=onepage&q=Steinwart%20and%20Christmann%2C%202008&f=false [Accessed: June 22, 2023]
  33. Gatos B, Papamarkos N, Chamzas C. Skew detection and text line position determination in digitized documents. Pattern Recognition. 1997;30(9):1505-1519. DOI: 10.1016/S0031-3203(96)00157-4
  34. Kirli Ö, Gülmezoǧlu MB. Automatic writer identification from text line images. International Journal on Document Analysis and Recognition. 2012;15(2):85-99. DOI: 10.1007/S10032-011-0161-9
  35. Otsu N. Threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;SMC-9(1):62-66. DOI: 10.1109/TSMC.1979.4310076
  36. Purohit N, Panwar S. Dual-pathway deep CNN for offline writer identification. Lecture Notes in Networks and Systems. 2022;249:119-127. DOI: 10.1007/978-3-030-85365-5_12
  37. Hagstrom AL, Stanikzai R, Bigun J, Alonso-Fernandez F. Writer Recognition Using Off-line Handwritten Single Block Characters. In: 2022 International Workshop on Biometrics and Forensics (IWBF). 2022. DOI: 10.1109/IWBF55382.2022.9794466
  38. Nabi ST, Kumar M, Singh P. DeepNet-WI: A deep-net model for offline Urdu writer identification. Evolving Systems. 2023;1:1-11. DOI: 10.1007/S12530-023-09504-/TABLES/3
  39. Javidi M, Jampour M. A deep learning framework for text-independent writer identification. Engineering Applications of Artificial Intelligence. 2020;95:103912. DOI: 10.1016/J.ENGAPPAI.2020.103912
