Stroke-Based Cursive Character Recognition



Introduction
The human eye can see and read what is written or displayed, whether in natural handwriting or in printed format. When a machine performs the same task, it is called handwriting recognition. Handwriting recognition can be broken down into two categories: off-line and on-line.
Off-line character recognition - Off-line character recognition takes a raster image from a scanner (scanned images of paper documents), a digital camera or another digital input source. The image is binarised based on, for instance, its colour pattern (colour or grey scale) so that the image pixels are either 1 or 0.
On-line character recognition - In on-line recognition, the current information is presented to the system and recognition (of a character or word) is carried out at the same time. Basically, it accepts a string of (x, y) coordinate pairs from an electronic pen touching a pressure-sensitive digital tablet.
In this chapter, we focus on an on-line, writer-independent cursive character recognition engine. In what follows, we explain the importance of on-line handwriting recognition over off-line, the necessity of a writer-independent system, and the importance as well as the scope of cursive scripts like Devanagari. Devanagari is considered one of the known cursive scripts Jayadevan et al. (2011); Pal & Chaudhuri (2004). However, we aim to include other scripts related to the current study.

Why On-line?
Since the advent of handwriting recognition technology a few decades ago Arica & Yarman-Vural (2001); Plamondon & Srihari (2000), its applications have remained challenging. For example, OCR is becoming an integral part of document scanners and is used in many applications such as postal processing, script recognition, banking, security (signature verification, for instance) and language identification. In handwriting recognition, feature selection has been an important issue Trier et al. (1996). Both structural and statistical features, as well as their combination, have been widely used Foggia et al. (1999); Heutte et al. (1998).
These features tend to vary since characters' shapes vary widely. As a consequence, local structural properties like intersections of lines, number of holes, concave arcs, end points and junctions change from time to time. These variations are mainly due to
• deformations, which cover any range of shape variations including geometric transformations such as translation, rotation, scaling and even stretching; and
• defects, which yield imperfections due to printing, optics, scanning, binarisation as well as poor segmentation.
In the state of the art of handwritten character recognition, several studies have shown that off-line handwriting recognition offers a lower classification rate than on-line Plamondon & Srihari (2000); Tappert et al. (1990). Furthermore, on-line data offers a significant reduction in memory and therefore in space complexity. Another advantage is that a digital pen or a digital form on a tablet device immediately transforms handwriting into a digital representation that can be reused later without the risk of degradation usually associated with ancient handwriting. For all these reasons, one can cite a few examples Boccignone et al. (1993); Doermann & Rosenfeld (1995); Qiao et al. (2006); Viard-Gaudin et al. (2005) that mainly focus on recovering temporal information and writing order from static handwriting images. On-line handwriting recognition systems provide interesting results.
On-line character recognition involves the automatic conversion of strokes as they are written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. Such data is known as digital ink and can be regarded as a dynamic representation of handwriting. The obtained signal is converted into letter codes which are usable within computer and character-processing applications.
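As a minimal sketch of how digital ink can be turned into strokes, the following Python snippet splits a stream of pen samples at pen-up events. The sample layout `(x, y, pen_down)` and the function name `split_into_strokes` are assumptions made for illustration; actual digitizers report their data in device-specific formats.

```python
def split_into_strokes(samples):
    """Split a stream of (x, y, pen_down) samples into strokes.

    Assumed layout (hypothetical): pen_down is True while the pen touches
    the tablet; a sample with pen_down False closes the current stroke.
    """
    strokes, current = [], []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:
            # Pen-up event: the stroke collected so far is complete.
            strokes.append(current)
            current = []
    if current:
        # Stream ended while the pen was still down.
        strokes.append(current)
    return strokes


ink = [(0, 0, True), (1, 1, True), (1, 1, False), (2, 0, True), (3, 0, True)]
strokes = split_into_strokes(ink)
```

Each resulting stroke is an ordered list of (x, y) pairs, which is the form assumed by the later sketches in this chapter.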
The elements of an on-line handwriting recognition interface typically include:
1. a pen or stylus for the user to write with, and a touch-sensitive surface, which may be integrated with, or adjacent to, an output display.
2. a software application, i.e., a recogniser, which interprets the movements of the stylus across the writing surface, translating the resulting strokes into digital characters.
Globally, it resembles one of the applications of pen computing, i.e., a computer user interface using a pen (or stylus) and tablet rather than devices such as a keyboard, joystick or mouse. Pen computing can be extended to mobile devices such as wireless tablet personal computers, PDAs and GPS receivers. Historically, pen computing (defined as a computer system employing a user interface that uses a pointing device plus handwriting recognition as the primary means for interactive user input) predates the use of a mouse and graphical display by at least two decades, starting with the Stylator Dimond (1957) and RAND tablet Groner (1966) systems of the 1950s and early 1960s.

Why Writer Independent?
As mentioned before, on-line handwriting recognition systems provide interesting results over almost all types of scripts. The recognition systems vary widely, which can be due to the nature of the scripts employed, their particular difficulties and the intended applications. The performance of an application-based (commercial) recogniser is determined by its speed in addition to its accuracy.
Among many approaches, template-based ones have a long-standing record Bahlmann & Burkhardt (2004); Connell & Jain (1999); Hu et al. (1996); Santosh & Nattee (2006a); Schenkel et al. (1995). In many cases, a writer-independent recogniser has been built so that new users do not require training, which is widely accepted. In such a context, the expected recognition system should automatically update itself or adapt to new users once they provide input, or the previously trained recogniser should be able to discriminate new users.

Why Devanagari?
The scope of interest is summarised in a few points.
1. Pencil and paper can be preferable for anyone preparing a first draft, instead of a keyboard and other computer input interfaces, especially when writing in languages and scripts for which keyboards are cumbersome. Devanagari keyboards, for instance, are quite difficult to use. Devanagari characters follow a complex structure and may count up to more than 500 symbols Jayadevan et al. (2011); Pal & Chaudhuri (2004).
3. Writing in one's own style brings unevenness in writing units, which is the most difficult part to recognise. Variation in basic writing units such as the number of strokes, their order, shapes and sizes, tilting angles and similarities among classes of characters are considered important issues. In contrast to Roman script, this happens more in cursive scripts like Devanagari.
Devanagari is written from left to right with a horizontal line on the top, which is the shirorekha. Every character requires one shirorekha from which the text is suspended.
The way of writing Devanagari has its own particularities. In what follows, we briefly explain a few major points and the associated difficulties.
• Many of the characters are similar to each other in structure. Visually very similar symbols, even from the same writer, may represent different characters. While it might seem quite obvious to distinguish such characters in print, confusion is likely to occur for their handwritten counterparts. Fig. 2 shows a few examples.
• The number of strokes, their order, shapes and sizes, directions, skew angle, etc. are writing units that are important for symbol recognition and classification. However, these writing units most often vary from one user to another, and there is no guarantee that the same user always writes in the same way. Proposed methods should take this into account.
Based on the aforementioned reasons, there is clear motivation to pursue research on Devanagari handwritten character recognition.

Structure of the Chapter
The remainder of the chapter is organised as follows. In Section 2, we start by detailing the basic concept of the character recognition framework, highlighting the important issues: feature selection, matching and recognition. Section 3 gives a complete outline of how we can efficiently achieve optimal recognition performance over cursive scripts like Devanagari. In this section, we first present the complete process and then validate it step by step with genuine reasoning and a series of experimental tests over our own, publicly available dataset. We conclude the chapter in Section 4.

Character Recognition Framework
Basically, we can divide a character recognition system into two modules: learning and testing. In the learning (or training) module, following Fig. 3, handwritten strokes are learnt or stored. The testing module follows the former. The performance of the recognition system depends on how well the handwritten strokes are learnt, which eventually depends on the techniques we employ.
The learning module employs stroke pre-processing, feature selection and clustering to form the templates to be stored. Pre-processing and feature selection techniques can vary from one application to another. For example, noisy stroke elimination or deletion in Roman script cannot be directly extended to cursive scripts like Urdu and Devanagari. In other words, these techniques are application dependent due to different writing styles. However, they are basically adapted from each other, and mostly ad-hoc techniques are built so that optimal recognition performance is possible. In the framework of stroke-based feature extraction and recognition, one can refer to Chiu & Tseng (1999); Zhou et al. (2007), for example. It is important to notice that feature selection usually drives the way we match features. As an example, fixed-size feature vectors can be matched straightforwardly, while for non-linear feature vector sequences, dynamic programming (elastic matching) has been used Keogh & Pazzani (1999); Kruskall & Liberman (1983); Myers & Rabiner (1981); Sakoe (1978). The concept was first introduced in the late 1950s Bellman & Kalaba (1959). Once we can compute the similarity between the strokes' features, we cluster the strokes based on their similarity values. The clustering technique generates templates as the representatives of the similar strokes provided. These stored templates are used in the testing module; Fig. 4 gives a comprehensive idea of it. More specifically, in this module, every test stroke is matched with the templates (learnt in the training module) to find the most similar one. This procedure is repeated for all available test strokes. At the end, aggregating all matching scores indicates which template character the test character is closest to.

Preprocessing
Strokes directly collected from users are often incomplete and noisy, so they are pre-processed before feature extraction Guerfali & Plamondon (1993).
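Two pre-processing steps that are commonly applied to raw strokes, repeated-coordinate deletion and size normalisation, can be sketched as follows. This is only an illustration under simple assumptions (a stroke is a list of (x, y) tuples, and normalisation scales the stroke into a unit box while keeping its aspect ratio); the function names are hypothetical and this is not the chapter's exact procedure.

```python
def remove_repeated_points(points):
    """Drop consecutive duplicate coordinates (pen resting on one spot)."""
    cleaned = []
    for p in points:
        if not cleaned or p != cleaned[-1]:
            cleaned.append(p)
    return cleaned


def normalise(points):
    """Translate to the origin and scale the longer side to length 1.

    Assumption: a single scale factor is used for x and y so that the
    aspect ratio of the stroke is preserved.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid dividing by 0
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]


raw = [(0, 0), (0, 0), (2, 4), (2, 4), (4, 0)]
clean = remove_repeated_points(raw)
unit = normalise(clean)
```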
Besides pre-processing, in this chapter, we mainly focus on feature selection and matching techniques.

Feature Selection
If you have the complete address of a friend, you can easily find him or her without additional help from other people on the way. The same holds in character recognition. Here, the address corresponds to feature selection. Therefore, complete or sufficient feature selection from the provided input is the crucial point. In other words, appropriate feature selection can greatly decrease the workload and simplify the subsequent design of the classifier.
In what follows, we discuss a few major issues associated with feature selection.
• Pen-flow, i.e., the speed of writing, determines how well the coordinates along the pen trajectory are captured. Fast writing and writing with shivering hands do not provide complete shape information of the strokes.
• Ratios of the relative height, width and size of letters are not always consistent, which is obvious in natural handwriting.
• Pen-down and pen-up events provide stroke segmentation. However, we do not know which strokes are rewritten or overwritten, or where.
• A slanted writing style, i.e., writing at some angle to the left or right, makes feature selection difficult. For example, in those cases, zoning information using orthogonal projection does not carry consistent information. This means that the zoning features vary widely as soon as we have different writing styles.
We repeat: features should contain sufficient information to distinguish between classes, be insensitive to irrelevant variability of the input, allow efficient computation of discriminant functions and limit the amount of training data required Lippmann (1989). The tangent angle between two consecutive pen-tip positions $p_{l-1} = (x_{l-1}, y_{l-1})$ and $p_l = (x_l, y_l)$ is
$$\alpha_{p_{l-1},p_l} = \arctan\frac{y_l - y_{l-1}}{x_l - x_{l-1}}.$$
Fig. 5 shows a complete illustration.
Our feature includes a sequence of both pen-tip positions and tangent angles sampled from the trajectory of the pen-tip, preserving the directional property of the trajectory path. It is important to remember that stroke direction (either left to right or right to left) leads to very different features even though the strokes are geometrically similar. To handle this efficiently, we need both kinds of strokes or samples for training and testing. This does not mean that the same writer must be used.
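The feature described above, pen-tip positions paired with tangent angles, can be sketched in a few lines of Python. Two choices here are assumptions rather than the chapter's definition: `math.atan2` is used in place of a plain arctangent of the slope so that vertical segments and opposite writing directions are handled, and the first point, which has no predecessor, is given an angle of 0. The function name `stroke_features` is hypothetical.

```python
import math


def stroke_features(points):
    """Return [(x, y, angle), ...] for an ordered list of pen-tip positions.

    angle is the direction of the segment from the previous point to the
    current one, in radians. atan2 is an assumed substitute for
    arctan(dy / dx); the first point gets 0.0 by convention.
    """
    features = []
    for l, (x, y) in enumerate(points):
        if l == 0:
            angle = 0.0
        else:
            prev_x, prev_y = points[l - 1]
            angle = math.atan2(y - prev_y, x - prev_x)
        features.append((x, y, angle))
    return features


left_to_right = stroke_features([(0, 0), (1, 1), (1, 2)])
right_to_left = stroke_features([(1, 1), (0, 0)])
```

The two example strokes show the direction sensitivity mentioned above: the segment from (0, 0) to (1, 1) yields an angle of π/4, whereas the same segment written in the opposite direction yields −3π/4.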
The idea is somewhat similar to directional arrows, which are composed of eight types coded from 0 to 7. However, these directional arrows provide only the directional feature of the strokes or line segments. Therefore, more information can be integrated if the relative length of the standard strokes is taken into account Cha et al. (1999).
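For comparison, an eight-direction coding of line segments can be sketched as below. The mapping of codes to directions is an assumption for illustration (0 points along the positive x-axis and codes increase counter-clockwise in steps of 45 degrees); the text does not fix a particular convention, and the function names are hypothetical.

```python
import math


def direction_code(p, q):
    """Quantise the direction of segment p -> q into a code from 0 to 7.

    Assumed convention: 0 = +x direction, then counter-clockwise in
    45-degree steps (2 = +y, 4 = -x, 6 = -y).
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return int(round(angle / (math.pi / 4))) % 8


def chain_code(points):
    """Direction codes for all consecutive point pairs of a stroke."""
    return [direction_code(p, q) for p, q in zip(points, points[1:])]


codes = chain_code([(0, 0), (1, 0), (2, 1), (2, 2), (1, 2), (1, 1)])
```

As the text notes, such a code keeps only direction; segment lengths are discarded unless they are stored alongside the codes.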

Feature Matching
Setting classifiers aside, we explain how features can be matched to obtain similarity or dissimilarity values between them.
Matching techniques are often induced by how features are taken or how strokes are represented.
For instance, normalising the feature vector sequence into a fixed-size vector provides an immediate matching. On the other hand, features having different lengths, or non-linear features, need dynamic programming for approximate matching. Considering the latter situation, we explain how dynamic programming is employed.
Dynamic time warping (DTW) allows us to find the dissimilarity between two non-linear sequences potentially having different lengths Keogh & Pazzani (1999); Kruskall & Liberman (1983); Myers & Rabiner (1981); Sakoe (1978). It is an algorithm particularly suited to matching sequences with missing information, provided there are long enough segments for matching to occur.
Let us consider two feature sequences X and Y of size K and L, respectively. The aim of the algorithm is to provide the optimal alignment between both sequences. At first, a matrix M of size K × L is constructed. Then, for each element in matrix M, a local distance metric $\delta(k, l)$ between the events $e_k$ and $e_l$ is computed, i.e., $\delta(k, l) = (e_k - e_l)^2$. Let $D(k, l)$ be the global distance up to $(k, l)$, with the initial condition $D(1, 1) = \delta(1, 1)$, such that the warping path runs diagonally from the starting node (1, 1) to the end node (K, L). The main aim is to find the path with the least associated cost. The warping path therefore provides the difference cost between the compared signatures. Formally, the warping path is
$$W = \{w(k, l)_t\}_{t = 1, \dots, T}, \qquad w(k, l)_t = (k_t, l_t).$$
The optimised warping path W satisfies the following three conditions.
c1. boundary condition: $w(k, l)_1 = (1, 1)$ and $w(k, l)_T = (K, L)$.
c2. monotonicity condition: $k_{t-1} \le k_t$ and $l_{t-1} \le l_t$.
c3. continuity condition: $k_t - k_{t-1} \le 1$ and $l_t - l_{t-1} \le 1$.
c1 conveys that the path runs from (1, 1) to (K, L), aligning all elements to each other. c2 ensures that the path never goes back. c3 restricts the allowable steps in the warping path to adjacent cells, so that the path advances one step at a time.

We then define the global distance between X and Y as
$$D(k, l) = \delta(k, l) + \min\{D(k-1, l-1),\; D(k-1, l),\; D(k, l-1)\}.$$
The last element of the K × L matrix gives the DTW-distance between X and Y, which is normalised by T, i.e., the number of discrete warping steps along the diagonal DTW-matrix. The overall process is illustrated in Fig. 6.
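A minimal Python sketch of classical DTW with back-tracking is given below to make the procedure concrete. It assumes scalar events and the squared difference as local distance; real stroke features would be vectors (position and angle), so the local distance would need to be adapted. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """DTW distance between two scalar sequences, normalised by path length T."""
    K, L = len(x), len(y)
    if K == 0 or L == 0:
        raise ValueError("sequences must be non-empty")
    INF = float("inf")
    # D is padded with an extra row and column so that D[0][0] = 0 acts as
    # the start and every other border cell is unreachable.
    D = [[INF] * (L + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            local = (x[k - 1] - y[l - 1]) ** 2
            D[k][l] = local + min(D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    # Back-track from (K, L) to (1, 1) to count the warping steps T.
    k, l, T = K, L, 0
    while k > 0 and l > 0:
        T += 1
        _, k, l = min(
            (D[k - 1][l - 1], k - 1, l - 1),
            (D[k - 1][l], k - 1, l),
            (D[k][l - 1], k, l - 1),
        )
    return D[K][L] / T


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

In the first call the repeated value in the second sequence is absorbed by the warping, so the two sequences align with zero cost even though their lengths differ.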
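So far, we have given a global view of using the DTW distance to align non-linear sequences. To provide faster matching, we use the local constraint on time warping proposed in Keogh (2002). We have $w(k, l)_t$ such that $l - r \le k \le l + r$, where r defines the reach, i.e., the allowed range of warping for a given event in a sequence. With r, upper and lower bounding measures can be expressed as
$$U_i = \max(x_{i-r}, \dots, x_{i+r}), \qquad L_i = \min(x_{i-r}, \dots, x_{i+r}),$$
where $L_i$ denotes the lower envelope and not the sequence length L. Therefore, for all i, an obvious property of U and L is
$$U_i \ge x_i \ge L_i.$$
With this, we can define a lower bounding measure for DTW:
$$LB\_Keogh(X, Y) = \sqrt{\sum_i \begin{cases} (y_i - U_i)^2 & \text{if } y_i > U_i \\ (y_i - L_i)^2 & \text{if } y_i < L_i \\ 0 & \text{otherwise.} \end{cases}}$$
Since this is only a quick introduction to the local constraint and the lower bounding measure, we refer to Keogh (2002) for more details.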
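As a rough illustration of the lower bounding idea of Keogh (2002), the sketch below builds the upper and lower envelopes of one sequence for a reach r and accumulates only the parts of the other sequence that fall outside the envelope. It assumes scalar sequences of equal length, which in practice would require resampling the strokes first; the function names are hypothetical.

```python
import math


def envelope(x, r):
    """Upper and lower envelopes of x for reach r (window clipped at the ends)."""
    n = len(x)
    upper = [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(x, y, r):
    """Lower bound on the constrained DTW distance between x and y.

    Assumption: len(x) == len(y); only points of y outside the envelope
    of x contribute to the bound.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length sequences")
    upper, lower = envelope(x, r)
    total = 0.0
    for y_i, u_i, l_i in zip(y, upper, lower):
        if y_i > u_i:
            total += (y_i - u_i) ** 2
        elif y_i < l_i:
            total += (y_i - l_i) ** 2
    return math.sqrt(total)


x = [0, 1, 2, 1, 0]
upper, lower = envelope(x, 1)
bound = lb_keogh(x, [3, 1, 0, 1, -1], 1)
```

Because the bound is cheap to compute, a template whose bound already exceeds the best distance found so far can be skipped without running the full DTW.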

Recognition
From a purely combinatorial point of view, measuring the similarity or dissimilarity between two symbols $S_1 = \{s_1^i\}_{i=1\dots n}$ and $S_2 = \{s_2^j\}_{j=1\dots m}$ composed, respectively, of n and m strokes requires a one-by-one matching score computation of all strokes $s_1^i$ with all $s_2^j$. This means that we align the individual test strokes of an unknown symbol with the learnt strokes. As soon as we determine the test strokes associated with a known class, the complete symbol can be compared by fusing the matching information from all test strokes. Such a concept is fundamental to stroke-based character recognition.
Overall, this concept may not always be sufficient, and these approaches generally need a final, global coherence check to avoid matching strokes that show visual similarity but do not respect the overall geometric coherence within the complete handwritten character. In other words, the matching strategy between a test stroke and the templates should be intelligent rather than a straightforward one-to-many matching. However, this in fact depends on how the templates are managed, which is one of the primary concerns of this chapter. We highlight the use of the relative positioning of the strokes within the handwritten symbol and its direct impact on performance Santosh, Nattee & Lamiroy (2012).

Recognition Engine
To keep the chapter coherent as well as consistent (with respect to Devanagari character recognition), it refers to a recognition engine that is entirely based on previous studies Santosh & Nattee (2006a;b;2007); Santosh et al. (2010); Santosh, Nattee & Lamiroy (2012). Especially because of the structure of Devanagari, it is necessary to pay attention to the appropriate structuring of the strokes to ease and speed up comparison between the symbols, rather than just relying on global recognition techniques based on a collection of strokes Santosh & Nattee (2006a). Therefore, Santosh et al. (2010); Santosh, Nattee & Lamiroy (2012) develop a method for analysing handwritten characters based on both the number of strokes and their spatial information. It consists of four main phases. For a clearer understanding, we explain these steps as follows. For a specific class of character, it is interesting to notice that symbols written with an equal number of strokes generally produce a visually similar structure and are easier to compare.
In every group within a particular class of character, a representative symbol is synthetically generated by merging pairwise similar strokes that are positioned identically with respect to the shirorekha. The merging uses the DTW algorithm. The learnt strokes are then stored accordingly.
The learning phase is mainly focused on stroke clustering and management of the learnt strokes.
We align the individual test strokes of an unknown symbol with the learnt strokes having both the same number of strokes and the same spatial properties. Overall, symbols can be compared by fusing the matching information from all test strokes. This eventually builds a complete recognition process.
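The stroke-wise matching and fusion step can be sketched as follows. Everything specific here is an assumption for illustration: templates are stored per class and per spatial region, each test stroke is compared only with templates from the same region, and the fusion rule is a plain sum of the best stroke-level distances. The labels, region names and the `recognise` function are hypothetical, and the toy distance stands in for DTW.

```python
def recognise(test_strokes, templates, dist):
    """Pick the class whose learnt strokes best explain the test strokes.

    test_strokes: list of (region, sequence) pairs.
    templates:    {label: {region: [sequence, ...]}} (hypothetical layout).
    dist:         stroke-level dissimilarity function, e.g. a DTW distance.
    Fusion rule (assumed): sum of the best distance for each test stroke.
    """
    scores = {}
    for label, by_region in templates.items():
        total = 0.0
        for region, sequence in test_strokes:
            candidates = by_region.get(region, [])
            if not candidates:
                # No learnt stroke at this location: the class cannot match.
                total = float("inf")
                break
            total += min(dist(sequence, t) for t in candidates)
        scores[label] = total
    best = min(scores, key=scores.get)
    return best, scores


def toy_dist(a, b):
    # Stand-in for DTW: sum of absolute differences of aligned values.
    return sum(abs(p - q) for p, q in zip(a, b))


templates = {
    "class_A": {"top": [[0, 0, 0]], "bottom-left": [[1, 2, 3]]},
    "class_B": {"top": [[0, 0, 0]], "bottom-left": [[3, 2, 1]]},
}
test_symbol = [("top", [0, 0, 0]), ("bottom-left", [1, 2, 2])]
label, scores = recognise(test_symbol, templates, toy_dist)
```

Because matching is keyed on the region rather than on the position of a stroke in the writing sequence, the order in which the test strokes were written does not affect the result.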

Stroke Spatial Description and its Need
The importance of the location of the strokes is best observed by taking a few pairs of characters that often lead to confusion. The first character in every such pair has two visually distinguishing features: the particular location of its shirorekha (more to the right) and a small curve in the text. There is no doubt that either of the two features is sufficient to automatically distinguish both characters. However, since small curves are usually not a robust feature in natural handwriting, finding the location of the shirorekha alone can avoid possible confusion. Our stroke-based spatial relation technique is explained in the following.
To handle the relative positioning of strokes, we use six spatial predicates, i.e., a 2 × 3 matrix R of relational regions. For easier understanding, an iconic representation of R can be expressed as a 2 × 3 grid of dots,
○ ○ ○
○ ○ ●
where a black dot represents presence; here, the stroke is found in the bottom-right region.
To confirm the location of the stroke, we use the projection theory: the minimum bounding rectangle (MBR) Papadias & Sellis (1994) model combined with the stroke's centroid.
Based on Egenhofer & Herring (1991), we start by checking fundamental topological relations such as disconnected (DC), externally connected (EC) and overlap/intersect (O/I) for a pair of strokes. We then use the border condition from the geometry of the MBR. This is straightforward for disconnected strokes, but not for externally connected and overlap/intersect configurations. In the latter cases, we check the level of the centroid with respect to the boundary of the MBR. For example, if a boundary of the shirorekha is above the centroid level of the text stroke, then it is confirmed that the shirorekha is on the top. This procedure is applied to all six previously mentioned spatial predicates. Note that angle-based models like bi-centre Miyajima & Ralescu (1994) and angle histogram Wang & Keller (1999) are not an appropriate choice due to the cursive nature of writing.
On the whole, assuming that the shirorekha is on the top, the locations of the text strokes are estimated. This eventually allows us to cross-validate the location of the shirorekha along with its size once the locations of the text strokes are determined. Fig. 7 shows a real example demonstrating relative positioning between the strokes of a two-stroke symbol. Besides, symbols with two shirorekhas can also be treated. In such a situation, the first shirorekha according to the order of strokes is taken as the reference.
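The MBR and centroid checks can be illustrated with a short sketch. It is a simplification under stated assumptions: the topological relation is evaluated on the bounding rectangles rather than on the strokes themselves, the y-axis is taken to grow upward, and the lower boundary of the shirorekha is the one compared against the centroid of the text stroke. All function names are hypothetical.

```python
def mbr(stroke):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def centroid(stroke):
    """Mean of the sampled pen-tip positions."""
    n = len(stroke)
    return sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n


def topological_relation(a, b):
    """Classify two strokes by their MBRs: 'DC', 'EC' or 'O/I'."""
    ax0, ay0, ax1, ay1 = mbr(a)
    bx0, by0, bx1, by1 = mbr(b)
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "DC"   # rectangles are strictly apart
    if ax1 == bx0 or bx1 == ax0 or ay1 == by0 or by1 == ay0:
        return "EC"   # rectangles only touch along a border
    return "O/I"      # rectangles share interior area


def shirorekha_on_top(shirorekha, text):
    """True if the lower border of the shirorekha MBR lies above the
    centroid of the text stroke (assumes y grows upward)."""
    _, lower_border, _, _ = mbr(shirorekha)
    return lower_border > centroid(text)[1]


shirorekha = [(0, 10), (10, 10)]
text = [(2, 0), (2, 9), (6, 9), (6, 0)]
relation = topological_relation(shirorekha, text)
on_top = shirorekha_on_top(shirorekha, text)
```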

Spatial Similarity based Clustering
Basically, clustering is a technique for collecting items which are similar in some way. Items of one group are dissimilar to items belonging to other groups. Consequently, it makes the recognition system compact. To handle this, we present spatial-similarity-based stroke clustering.
• The first step is to organise symbols representing the same character into different groups, based on the number of strokes used to complete the symbol. Fig. 8 shows an example of it for a class of character a.
• In the second step, strokes from a specific location are agglomerated hierarchically within the particular group. Once the relative position of every stroke is determined as shown in Fig. 8, single-linkage agglomerative hierarchical clustering is used (cf. Fig. 10). This means that only strokes at a specific location are taken for clustering. We illustrate this in Fig. 9. This applies to all groups within a class.
In agglomerative hierarchical clustering (cf. Fig. 10), we merge two similar strokes to form a new cluster. The distance computation between two strokes follows Section 2.3. The new cluster is computed by averaging both strokes using the discrete warping path along the diagonal DTW-matrix. This process is repeated until it reaches the cluster threshold. The threshold value determines the number of cluster representatives, i.e., learnt templates.
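The merge-until-threshold idea can be sketched as below. This is a simplified illustration, not the chapter's exact clustering: strokes are scalar sequences, each cluster is represented only by its averaged stroke (so distances are measured between representatives rather than tracked as single linkage over all members), and merging stops once the closest pair is farther apart than the threshold. The function names are hypothetical.

```python
def dtw_path(x, y):
    """Normalised DTW distance and warping path between scalar sequences."""
    K, L = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (L + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            local = (x[k - 1] - y[l - 1]) ** 2
            D[k][l] = local + min(D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    path, k, l = [], K, L
    while k > 0 and l > 0:
        path.append((k - 1, l - 1))  # store 0-based indices
        _, k, l = min(
            (D[k - 1][l - 1], k - 1, l - 1),
            (D[k - 1][l], k - 1, l),
            (D[k][l - 1], k, l - 1),
        )
    path.reverse()
    return D[K][L] / len(path), path


def merge(x, y):
    """Average two strokes along their warping path."""
    _, path = dtw_path(x, y)
    return [(x[k] + y[l]) / 2 for k, l in path]


def cluster_strokes(strokes, threshold):
    """Greedily merge the closest pair until it exceeds the threshold."""
    clusters = [list(s) for s in strokes]
    while len(clusters) > 1:
        d, i, j = min(
            (dtw_path(a, b)[0], i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d > threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


templates = cluster_strokes([[0, 1, 2], [0, 1, 1, 2], [10, 11, 12]], 0.5)
```

In this example the first two strokes differ only by a repeated value, so they merge into one template, while the third stroke stays in its own cluster.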

Stroke Number and Order Free Recognition
In natural handwriting, the number of strokes as well as their order vary widely. This happens from one writing to another, even for the same user, and of course across different users. Fig. 11 shows the large variation in the number of strokes as well as in their order.
Once we have organised the symbols (from a particular class) into groups based on the number of strokes used, stroke clustering is carried out according to the relative positioning. As a consequence, during recognition, one can write a symbol with any number and order of strokes, because stroke matching is based on the relative positioning of the strokes within the group and does not need to consider the stroke order.

Dataset
In this work, as before, a publicly available dataset has been employed (cf. Table 1), where a Graphire tablet (WACOM Co. Ltd.), model ET0405A-U, was used to capture the pen-tip position in the form of 2D coordinates at a sampling rate of 20 Hz. The dataset is composed of 1800 symbols representing 36 characters, coming from 25 native speakers. Each writer was given the opportunity to write each character twice. No other directions, constraints, or instructions were given to the users.

Recognition Performance Evaluation
While experimenting, every test sample is matched with the training candidates and the closest one is reported. The closest candidate corresponds to the labelled class, which we call 'character recognition'. Formally, the recognition rate is defined as the ratio of the number of correctly recognised candidates to the total number of test candidates.
To evaluate the recognition performance, two different protocols can be employed: 1. dichotomous classification and 2. K-fold cross-validation (CV).
In the case of dichotomous classification, 15 writers are used for training and the remaining 10 for testing. K-fold CV has also been implemented. Since we have 25 users for data collection, we employ K = 5 in order to make the recognition engine writer independent.
In K-fold CV, the original sample for every class is randomly partitioned into K sub-samples. Of the K sub-samples, a single sub-sample is used for validation, and the remaining K − 1 sub-samples are used for training. This process is then repeated for K folds, with each of the K sub-samples used exactly once for validation. Finally, a single value results from averaging over all folds. The aim of such a series of rigorous tests is to avoid the sample bias that is possible in conventional dichotomous classification. In contrast to the previous studies Santosh, Nattee & Lamiroy (2012), this will be an interesting evaluation protocol.
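The K-fold protocol and the recognition rate can be sketched as follows. The sketch partitions writer identifiers rather than individual samples, which is one assumed way of keeping the evaluation writer independent with 25 writers and K = 5; the writer ids, the fixed random seed and the function names are hypothetical.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists; each item is used for testing exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test


def recognition_rate(predicted, actual):
    """Ratio of correctly recognised candidates to all test candidates."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


writers = list(range(25))                  # hypothetical writer ids
splits = list(k_fold_splits(writers, 5))   # 5 folds of 20 training / 5 test writers
rate = recognition_rate(["a", "b", "c", "a"], ["a", "b", "a", "a"])
```

The final figure reported for K-fold CV would be the average of the per-fold recognition rates.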

Results and Discussions
Following the evaluation protocols mentioned above, Table 2 provides the average recognition error rates. In the tests, we have found that the recognition performance has improved by more than 2%.
Based on the results (cf. Table 2), we investigate the recognition performance through the observed errors. We categorise the origin of the errors that occurred in our experiments. As said in Section 1.3, these are mainly due to 1. structure similarity, 2. reduced and/or very long ascender and/or descender strokes, and 3. others such as re-writing strokes and mis-writing.
Compared to previous work Santosh, Nattee & Lamiroy (2012), the number of rejections does not change, while confusions due to structure similarity have been reduced. This is mainly because of the 5-fold CV evaluation protocol. Besides, the running time has been reduced by more than a factor of two, i.e., to 2 seconds per character, thanks to the LB_Keogh tool Keogh (2002).

Conclusions
In this chapter, an established as well as validated approach (based on previous studies Santosh & Nattee (2006a;b;2007); Santosh et al. (2010); Santosh, Nattee & Lamiroy (2012)) has been presented for on-line natural handwritten Devanagari character recognition. It uses the number of strokes used to complete a symbol and their spatial relations. Besides, we have made the dataset publicly available for research purposes. On this dataset, the success rate is approximately 97% in less than 2 seconds per character on average. Note that, in this chapter, the new evaluation protocol reduces the errors (mainly due to multi-class similarity) and the optimised DTW reduces the processing delay, both of which are new findings in comparison to the previous studies.
Once again, to avoid contradictions, this chapter aims to provide a coherent as well as consistent study on Devanagari character recognition.
The proposed approach is able to handle handwritten symbols with any number and order of strokes. Moreover, the stroke-matching technique is interesting and completely controllable. This is primarily due to our symbol categorisation and the use of stroke spatial information in template management. To handle spatial relations efficiently (rather than relying only on orthogonal projection, i.e., MBR), a more elaborate spatial relation model can be used Santosh, Lamiroy & Wendling (2012), for instance. In addition, machine learning techniques like inductive logic programming (ILP) Amin (2000); Santosh et al. (2009) can be used to exploit the complete structural properties in terms of a first-order logic (FOL) description.

Fig. 1: On-line stroke sequences in the form of 2D (x, y) coordinates. In this illustration, the initial pen-tip position is coloured red and the pen-up (final) point is coloured blue.

2. Devanagari is a script used to write several Indian languages, including Nepali, Sanskrit, Hindi, Marathi, Pali, Kashmiri, Sindhi, and sometimes Punjabi. According to the 2001 Indian census, 258 million people in India used Devanagari.

Fig. 2: A few samples of several different similar classes from Devanagari script.

Fig. 3: Learning strokes from the handwritten symbols. In this illustration, we present a basic concept to form templates via clustering of the features of the strokes immediately after they are pre-processed.

Fig. 5: An illustration of feature selection: pen-tip position and tangent at every pen-tip position along the pen trajectory.

Fig. 6: Classical DTW algorithm - an alignment illustration between two non-linear sequences X and Y. In this illustration, the diagonal DTW-matrix is shown, including how back-tracking has been employed.

step 1. Organise the symbols representing the same character into different groups based on the number of strokes.
step 2. Find the spatial relation between strokes.
step 3. Agglomerate similar strokes from a specific location in a group.
step 4. Stroke-wise matching for recognition.

Fig. 8: Relative positions of strokes for a class a in two different groups i.e., two-stroke and three-stroke symbols.

Fig. 9: Clustering technique for each class. Stroke clustering is based on the relative positioning. As a consequence, we have three clustering blocks for text strokes and the remaining three for the shirorekha.

Fig. 10: [figure labels: F1, F2, F3, F4, F5]

Fig. 11: Different numbers of strokes and stroke orders for a single class. In this illustration, the red dot marks the initial pen-tip position, which makes it easy to see how many strokes are used to complete a symbol. In addition, the stroke ordering differs from one sample to another.
Fig. 4: An illustration of the testing module. As in the learning module, test characters are pre-processed.
A variety of different pre-processing techniques are used before feature extraction Alginahi (2010); Blumenstein et al. (2003); Verma et al. (2004). The techniques used in one system may not exactly fit another because of different writing styles and the nature of the scripts. Very common issues are repeated coordinates deletion Bahlmann & Burkhardt (2004), noise elimination and normalisation Chun et al. (2005); Guerfali & Plamondon (1993).