Histogram-Based Texture Characterization and Classification of Brain Tissues in Non-Contrast CT Images of Stroke Patients

This chapter describes histogram-based texture characterization and classification of brain tissue in CT images of stroke patients using a case study. It explores texture analysis in medical imaging. In the case study, two radiologists independently inspected non-contrast CT images of 164 stroke patients to identify and categorize brain tissue as normal, ischaemic stroke or haemorrhagic stroke. Four regions of interest (ROIs) were selected for analysis in each CT slice containing a lesion: two represented the lesion and two represented normal tissue. Histogram texture parameters were calculated for each ROI. Raw data analysis identified the parameters that discriminated between normal brain tissue, ischaemic stroke lesions and haemorrhagic stroke lesions. The artificial neural network (ANN) and k-nearest neighbour (k-NN) algorithms were used to classify the ROIs into normal tissue, ischaemic lesions and haemorrhagic lesions, with the radiologists’ categorization as the gold standard, and the results were further analysed using the receiver operating characteristic (ROC) curve. Three parameters, namely the mean and the 90th and 99th percentiles, discriminated between normal brain tissue, ischaemic and haemorrhagic stroke lesions. With ANN and k-NN, the weighted sensitivity and specificity were above 0.9, while the false positive and false negative rates were negligible. The characterization and classification of brain tissue using histogram parameters were satisfactory and may be suitable for automated diagnosis of stroke.


Introduction
Medical imaging is a rapidly developing branch of modern medicine. In the past few decades it has evolved into a highly sophisticated diagnostic tool. It has improved the study of human internal anatomy and, to an extent, physiology, and has made possible the detection of pathologies that previously could not be detected. At this stage of its development, the detection and interpretation of lesions are becoming an automated, computer-aided process. It can safely be said that machine vision has become an emerging part of radiology and imaging in medicine. This is a result of advances in medical imaging technology and computer science [1], which have greatly enhanced the interpretation of medical images and contributed to early diagnosis. The bases for computer-aided diagnosis (CAD) in radiology are medical image processing and artificial intelligence.
Stroke accounts for a significant proportion of neurological disorders seen in Nigerian hospitals [2]. It carries high morbidity and mortality in industrialized countries [3][4][5][6], and in Africa it is reported to be the leading neurological cause of death [7]. The World Health Organization (WHO) defined stroke as a rapidly developing clinical syndrome of focal or global disturbance of cerebral function, presumably of vascular origin, lasting longer than 24 hours unless interrupted by surgery or death [8]. A stroke occurs when the blood supply to the brain is disturbed, so that brain cells are starved of oxygen; consequently, some cells die while others are left damaged. Brain cells, being permanent in nature, achieve only very limited recovery, and thus the patient may be left with a permanent disability. Clinical diagnosis of stroke and its subtyping is sometimes inaccurate [9][10][11][12]. Neuroimaging is, therefore, essential for accurate diagnosis. Stroke remains one of the most important clinical diagnoses for which patients are referred to the radiology department for emergency imaging, because a timely and accurate diagnosis helps in the management of the patients [13]. Previous studies have highlighted the time-critical nature of ischaemic stroke diagnosis. Ischaemic stroke has a narrow therapeutic window in the first few hours following stroke ictus and a dramatic rise in haemorrhagic complications thereafter [14][15][16][17][18][19][20].
Non-contrast head computed tomography (NCCT) has been suggested as the mainstay for early stroke diagnosis because computed tomography (CT) scanners are more widely available in the communities and can be accessed much more easily [13]. Computed tomography examinations are not only cheaper than magnetic resonance imaging (MRI) but also faster to perform. Thus, taking the time-critical nature of early stroke diagnosis into consideration, NCCT is the preferred first-line imaging tool. Computed tomography and other neuroimaging procedures will, however, not benefit the patient until the images have been accurately interpreted. For visual analysis and interpretation of stroke CT images, the radiologist seeks to identify affected areas of the brain by examining the dissimilarity between the left and right cerebral hemispheres. The challenges associated with the visual interpretation of stroke CT images are the dearth of neuroradiologists [21] and human errors of interpretation and diagnosis. Errors in visual interpretation result from poor technique, failures of perception, lack of knowledge and misjudgements [22]. Visual interpretation can be improved upon by texture analysis, which makes it possible for an automated computer-aided approach to be used as a second opinion for clinicians, especially in equivocal cases. Automated methods of stroke detection follow the same pattern as the visual analysis and interpretation used by radiologists [23].
Computer-aided diagnosis (CAD) in medical imaging is an application of artificial intelligence in medicine. Artificial intelligence (AI) simulates the human brain or recreates it electronically. It is defined as the study and design of intelligent agents [24], where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success [24][25][26]. The simplest intelligent agents are programs written to solve specific problems. More complicated intelligent agents include human beings and organizations of human beings such as a firm or a team. Artificial intelligence is based on the central characteristic of human beings: intelligence, the sapience of Homo sapiens. The premise is that intelligence can be described so precisely that it can be simulated by a machine.
One very important stage in medical image processing leading to CAD is image texture analysis. Texture analysis of a medical image is the measurement of the quantitative parameters that constitute the image of a supposed lesion or normal tissue. This has the advantage of helping clinicians make accurate diagnoses and monitor disease processes under treatment. The analysis of texture parameters is a useful way of increasing the information obtainable from medical images [27].

The concept of texture and analysis of texture
Texture is a term that is very difficult to define precisely. This is because there is no unified definition of texture, and every definition that has been used has aimed at relating it to the area of its application. The non-existence of a universally agreed-upon definition of texture is an acknowledged fact [28,29]. In general, texture can be defined as a descriptor that provides measures of properties such as smoothness, coarseness and regularity [28]. For medical images, image texture is defined as the appearance, structure and arrangement of the parts of an object within the image [27]. The concept of texture as a quantitative measure is applied only to digital images, which are made up of numerous rectangular picture elements (pixels) as illustrated in Figure 1.
In consideration of this technicality, the texture concept in a digital image is regarded as the distribution of grey-level values among the pixels of a given region of interest in the image [27]. This definition is in agreement with a recent one which referred to texture as the spatial variation of pixel intensities in an image [29]. In order to understand texture better, it is important to draw an analogy from the way the human visual system perceives scenes. The human eye perceives scenes as sets of objects that are related to each other over various surfaces despite varying ambient illumination [30]. Texture has components called texels, which are notional uniform micro-objects placed in an appropriate way to form any particular texture. The placing may be random, regular, directional and so on, and there may be a degree of overlap in some cases [30]. From the foregoing, in simple physical terms, texture is composed of the randomness, periodicity, directionality and orientation of the composite elements making up an object's structure.
Texture analysis is an aspect of imaging science which analyses pixel intensity variations or their spatial distribution on a pixel-by-pixel scale to unravel patterns which may not be perceptible to the human visual system. The technique evaluates the location and signal intensity of the image represented by the pixel and the contrast index for digital images [27]. Texture features represent the mathematical parameters obtained from the distribution of pixels which characterize the texture type and hence the structural components of an object [27]. Texture analysis is employed in image classification, segmentation and synthesis. It also plays a vital role in computer-aided detection or diagnosis or, more broadly, machine vision.

Methods of texture analysis
There are four major issues in texture analysis, namely feature extraction, texture discrimination, texture classification and shape from texture [31]. The purpose of feature extraction is to compute characteristics of a digital image that numerically describe its texture properties, while texture discrimination partitions a textured image into regions, each corresponding to a perceptually homogeneous texture (this leads to image segmentation). In texture classification, the goal is to determine to which of a finite number of physically defined classes, such as normal or abnormal tissue, a homogeneous texture region belongs, while shape from texture reconstructs the three-dimensional surface geometry from texture information.
The first stage in texture analysis is the extraction of texture parameters, and the results obtained during this process are used for the remaining stages in texture analysis. The approaches to texture analysis are categorized into structural, statistical, model-based and transform methods [31]. These approaches are described briefly below.

Structural methods
In this method, texture is represented by well-defined primitives. In other words, a square object is represented in terms of the straight lines or the primitives that form its border [27]. To describe texture using the structural approach, one must first define the primitives (microtexture) and then the placement rules. Primitives are the parts from which texture is composed. Note that primitives may be tonal, that is, grey levels. Tonal primitives are regions of an image with tonal properties [32]. The advantage of structural methods is that they provide a good symbolic description of the image [31], but the disadvantage is that they are not a very powerful way of describing texture.

Statistical methods
The statistical approach to texture analysis uses the grey-level distribution within an image to describe texture. This approach provides better discrimination between classes than structural or transform methods. It is the most widely used method in medical applications. Statistical methods can be used to analyse the spatial distribution of pixel grey values in an image. This is done by computing local features at each point in the image and then deriving a set of statistics from the distributions of the local features [33]. Statistical methods are classified as first-order, second-order and higher-order statistics based on the number of pixels that define the local feature. In first-order statistics, only one pixel is involved; in second-order statistics, a pair of pixels; and in higher-order statistics, three or more pixels [33]. There are differences between the statistical methods. In first-order statistics, properties such as the average and variance of individual pixel values are estimated, but the spatial interaction between the image pixels is not taken into consideration. More specifically, first-order statistics measure the frequency of a particular grey level at a random image position without taking into account the correlations or co-occurrences between the pixels. Thus, information on texture is derived from the histogram of image pixel grey values [29]. The second-order and higher-order statistics estimate properties of two or more pixel values occurring at specific locations relative to each other, and thus pixel-pixel interaction is a feature of these two methods [33]. Specifically, information on the texture of an image in second-order statistical texture analysis is based on the probability of finding a pair of pixels with the same grey level at random distances and orientations over an entire image, while in higher-order statistics the number of variables studied is increased [29].

The co-occurrence matrix (COM)
The co-occurrence matrix is a second-order histogram that analyses the grey-level distribution of pairs of pixels [27]. In the grey-level co-occurrence matrix method, the probability of finding a pixel with a defined grey level (i) at a defined distance (d) and a defined angle (α) from another pixel with a defined grey level (j) is calculated. The co-occurrences of pixel pairs are calculated in the vertical, horizontal and two diagonal directions, and at distances of up to five pixels. An essential feature of this arrangement is that each pixel has eight nearest neighbours connected to it, except when the pixel is located at the periphery. A very simple illustration of the grey-level co-occurrence matrix as relative positions of pixels of the same grey-level intensities is shown in Figure 2. In this illustration, the reference pixel (X) has the same grey-level value as the pixels X1 in the horizontal direction for an inter-pixel distance of 1, X2 in the vertical direction for an inter-pixel distance of 2, X3 in the 45° diagonal direction for an inter-pixel distance of 3 and X4 in the 135° diagonal direction for an inter-pixel distance of 3. A co-occurrence matrix is produced in each direction (α), for each inter-pixel distance (d), with the matrix dimension being equal to the number of intensity levels. The process is therefore computationally intensive, and the grey levels in an image usually undergo a rescaling and re-binning procedure to reduce the range of pixel values contained within the image [34]. The implication of rescaling and re-binning the grey levels in the image is a loss of texture information.
The co-occurrence matrix parameters include the angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance and difference entropy. The construction of the co-occurrence matrix and the mathematical derivation of the formulae for calculating the parameters are both tedious processes, and further reading is necessary for better understanding [28,31].
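The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the MaZda® implementation: the function names, the choice to make the matrix symmetric and normalized, the base-2 logarithm for entropy and the toy 3 × 3 image are all assumptions made for the example. It counts pixel pairs separated by a fixed offset and then derives three of the parameters listed above.

```python
import numpy as np

def cooccurrence_matrix(image, dx, dy, n_levels):
    """Normalized, symmetric grey-level co-occurrence matrix for offset (dx, dy)."""
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Pixels at the periphery have no neighbour at this offset.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts = counts + counts.T          # count each pair in both directions
    total = counts.sum()
    return counts / total if total > 0 else counts

def glcm_features(p):
    """Three co-occurrence parameters computed from a normalized matrix p."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "angular_second_moment": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

# Toy image with three grey levels (illustrative values only).
toy = [[0, 0, 1],
       [0, 1, 1],
       [2, 2, 2]]
p = cooccurrence_matrix(toy, dx=1, dy=0, n_levels=3)   # horizontal, d = 1
features = glcm_features(p)
```

One matrix of this kind is needed for every direction and distance, which is why the number of grey levels is usually reduced before the matrices are built.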

The run-length matrix (RLM)
The grey-level run-length matrix is a higher-order statistical method of texture feature extraction. The run-length matrix counts the runs of consecutive pixels in a given direction that have the same grey-level intensity. A run length is the number of pixels in a particular direction with the same grey-level intensity value [29]. A coarse texture will, therefore, be dominated by relatively long runs, whereas a fine texture will be populated by much shorter runs [29].
The parameters derivable from the run-length matrix are usually computed in four different directions: horizontal, vertical and two diagonals. The grey-level run-length matrix is illustrated in Figure 3, which shows a run length of 4 pixels in a 45° diagonal direction [34]. The run-length emphasis describes the number of consecutive pixels with the same grey-level value. It can be termed long- or short-run emphasis depending on the number of consecutive pixels in the chosen direction with the same grey-level value [35]. The run-length and grey-level non-uniformity describe the disorderliness in pixel and pixel grey-level runs.
The fraction of the image in runs simply refers to the run percentage, that is, the ratio of the total number of runs in the image to the total number of pixels in the image, expressed as a percentage [35].
The run-length method of texture analysis was first introduced by Galloway [36], but it has not gained general acceptance as an efficient way of calculating texture [35]. It is therefore not popular among researchers working to develop diagnostic tools for medical applications.
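The calculation of the run-length matrix parameters using MaZda® can be illustrated as follows. If p(i, j) is the frequency of runs of length j with grey-level intensity i, $N_g$ is the number of grey-level intensities and $N_r$ is the number of runs, then the parameters for the run-length matrix p(i, j) can be calculated using the following equations:

$$\text{Short-run emphasis} = \frac{1}{C}\sum_{i=1}^{N_g}\sum_{j=1}^{N_r}\frac{p(i,j)}{j^{2}} \qquad (1)$$

$$\text{Long-run emphasis} = \frac{1}{C}\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j^{2}\,p(i,j) \qquad (2)$$

$$\text{Grey-level non-uniformity} = \frac{1}{C}\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} p(i,j)\right)^{2} \qquad (3)$$

$$\text{Run-length non-uniformity} = \frac{1}{C}\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} p(i,j)\right)^{2} \qquad (4)$$

$$\text{Fraction of image in runs} = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j\,p(i,j)} \qquad (5)$$

The coefficient C in Eqs. (1)-(4) above is defined as:

$$C = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)$$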
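As a hedged illustration of these quantities, the sketch below builds a run-length matrix in the horizontal direction only and evaluates the usual Galloway-style definitions of short-run emphasis, long-run emphasis, grey-level non-uniformity, run-length non-uniformity and fraction of image in runs. The function names and the toy image are assumptions for the example, and the code is not taken from MaZda®.

```python
import numpy as np

def run_length_matrix(image, n_levels):
    """p[i, j-1] = number of horizontal runs of length j with grey level i."""
    image = np.asarray(image)
    rows, cols = image.shape
    p = np.zeros((n_levels, cols), dtype=float)   # longest possible run = cols
    for row in image:
        run_value, run_len = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_len += 1
            else:
                p[run_value, run_len - 1] += 1    # close the finished run
                run_value, run_len = value, 1
        p[run_value, run_len - 1] += 1            # close the last run in the row
    return p

def run_length_features(p):
    """Run-length parameters from a run-length matrix p."""
    j = np.arange(1, p.shape[1] + 1)              # run lengths 1..max
    c = p.sum()                                   # total number of runs
    return {
        "short_run_emphasis": float((p / j ** 2).sum() / c),
        "long_run_emphasis": float((p * j ** 2).sum() / c),
        "grey_level_nonuniformity": float((p.sum(axis=1) ** 2).sum() / c),
        "run_length_nonuniformity": float((p.sum(axis=0) ** 2).sum() / c),
        "fraction_of_image_in_runs": float(c / (p * j).sum()),
    }

# Toy image with three grey levels (illustrative values only).
toy = [[0, 0, 0, 1],
       [1, 1, 2, 2]]
rlm = run_length_matrix(toy, n_levels=3)
rl_features = run_length_features(rlm)
```

For the toy image there are four runs covering eight pixels, so the fraction of the image in runs is 0.5; a coarser texture with longer runs would give a smaller fraction and a larger long-run emphasis.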

The absolute gradient (Gr)
The gradient of an image measures the spatial variation in grey-level values across the image [27]. This method evaluates the variation in grey-level intensity values across neighbouring pixels, as shown in Figure 4 according to the illustration by Waugh [34]. A high gradient is produced when there is an abrupt change from one extreme pixel grey-level intensity value to another. Conversely, a low gradient is produced where pixel grey-level values change gradually. The five parameters derived from the absolute gradient are the gradient mean, gradient variance, gradient skewness, gradient kurtosis and gradient non-zeros. Conventionally, only the magnitude of the gradient is taken into consideration [27]. The direction of variation, whether positive or negative, is irrelevant, hence the term "absolute gradient". The gradient non-zeros parameter is the proportion of pixels in the image with a non-zero gradient, the gradient variance is the spread of the absolute gradient values about their mean, and the gradient mean is the average variation in pixel grey-level value across the image [31].
The absolute gradient as a method of texture analysis finds application in accentuating the boundaries in an image [27] and is therefore useful in edge enhancement.
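The five absolute gradient parameters can be sketched as follows. This is an illustrative example under stated assumptions: the gradient magnitude is taken from central differences in a 3 × 3 neighbourhood of each interior pixel, kurtosis is reported as excess kurtosis, and the function name is hypothetical. Other software may define the neighbourhood or the normalization differently.

```python
import numpy as np

def absolute_gradient_features(image):
    """Gradient mean, variance, skewness, kurtosis and non-zero fraction."""
    img = np.asarray(image, dtype=float)
    # Central differences on interior pixels (assumed 3 x 3 neighbourhood).
    gy = img[2:, 1:-1] - img[:-2, 1:-1]
    gx = img[1:-1, 2:] - img[1:-1, :-2]
    grad = np.sqrt(gx ** 2 + gy ** 2)             # magnitude only, sign ignored
    mean = grad.mean()
    variance = grad.var()
    std = np.sqrt(variance)
    if std > 0:
        z = (grad - mean) / std
        skewness = float((z ** 3).mean())
        kurtosis = float((z ** 4).mean() - 3.0)   # excess kurtosis (assumed)
    else:
        skewness = kurtosis = 0.0                 # undefined for constant gradient
    return {
        "gradient_mean": float(mean),
        "gradient_variance": float(variance),
        "gradient_skewness": skewness,
        "gradient_kurtosis": kurtosis,
        "gradient_nonzeros": float(np.count_nonzero(grad) / grad.size),
    }

ramp = np.tile(np.arange(5), (5, 1))    # grey level rises by 1 per column
flat = np.zeros((5, 5))                 # uniform region
ramp_features = absolute_gradient_features(ramp)
flat_features = absolute_gradient_features(flat)
```

The uniform region has no non-zero gradients, while the ramp has the same gradient at every interior pixel, so its gradient variance is zero even though its gradient mean is not.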

The histogram
This is a first-order statistical analysis and uses pixel occurrence probability to calculate texture.
To illustrate the histogram approach to texture analysis, assume that the grey levels in an image are in the range $0 \le i \le N_g - 1$, where $N_g$ is the total number of distinct grey levels. If N(i) is the total number of pixels with intensity i and M is the total number of pixels in the image, then the pixel occurrence probability P(i) is given by [29]

$$P(i) = \frac{N(i)}{M}$$

The probability of occurrence of a pixel of a particular grey level (intensity), taken over all grey levels, is called the histogram. It does not consider the spatial relationships and correlations between pixels [29].
The main advantage of the histogram is its simplicity: standard descriptors such as the mean and variance are used to characterize texture data. The features derivable from the histogram are the mean, variance, skewness, kurtosis, percentile 01, percentile 10, percentile 50, percentile 90 and percentile 99. Some of the features from the histogram used to characterize texture are represented by the equations below, written here in their standard form in terms of P(i):

$$\mu = \sum_{i=0}^{N_g-1} i\,P(i)$$

$$\sigma^{2} = \sum_{i=0}^{N_g-1} (i-\mu)^{2}\,P(i)$$

$$\text{Skewness} = \sigma^{-3}\sum_{i=0}^{N_g-1} (i-\mu)^{3}\,P(i)$$
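A short sketch of how these first-order features might be computed for a region of interest is given below. It assumes that the grey levels have already been mapped to non-negative integers (CT numbers in Hounsfield units would need an offset first), and it defines each percentile as the smallest grey level whose cumulative probability reaches the requested fraction; both choices, like the function name, are assumptions for the example rather than the procedure of any particular software.

```python
import numpy as np

def histogram_features(roi, n_levels):
    """First-order (histogram) texture features of an integer-valued ROI."""
    pixels = np.asarray(roi).ravel()
    counts = np.bincount(pixels, minlength=n_levels).astype(float)   # N(i)
    p = counts / counts.sum()                                        # P(i) = N(i) / M
    levels = np.arange(len(p))
    mean = np.sum(levels * p)
    variance = np.sum((levels - mean) ** 2 * p)
    std = np.sqrt(variance)
    skewness = np.sum((levels - mean) ** 3 * p) / std ** 3 if std > 0 else 0.0
    cdf = np.cumsum(p)
    # Smallest grey level whose cumulative probability reaches q percent.
    percentiles = {q: int(np.searchsorted(cdf, q / 100.0)) for q in (1, 10, 50, 90, 99)}
    return {
        "mean": float(mean),
        "variance": float(variance),
        "skewness": float(skewness),
        "percentiles": percentiles,
    }

# Toy ROI with four equally frequent grey levels (illustrative values only).
toy_roi = [[0, 0, 1, 1],
           [2, 2, 3, 3]]
hist_features = histogram_features(toy_roi, n_levels=4)
```

Because every feature is derived from P(i) alone, shuffling the pixels of the ROI leaves all of these values unchanged, which is the sense in which the histogram ignores spatial relationships.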

The model-based methods
In model-based texture analysis, there is an attempt to fit an image texture to a computational (mathematical) model. For the MaZda® texture analysis software, the model used is referred to as the auto-regressive model (ARM). This model assumes that, knowing the grey-level intensity value of one pixel, the grey-level intensity values of neighbouring pixels can be deduced. More formally, the ARM assumes a local interaction between image pixels in that a pixel's grey-level value is a weighted sum of the grey-level values of the neighbouring pixels [27]. The main disadvantage of the model-based approach to texture analysis is the complexity of the computations needed to estimate the model parameters. Other models of texture besides the ARM are the Markov random field (MRF) and fractal models [31].

The transform methods
In the transform methods, the texture of an image can be analysed in the frequency or scale space. These methods can employ the Fourier [37], Gabor [38] or wavelet transform [39]. However, the wavelet transform is the most popular because it can easily be adjusted to suit the problem at hand as desired by the user [27]. Wavelet analysis is a technique that analyses the frequency content of an image at different scales of that image. It yields a set of numbers called the wavelet coefficients, which correspond to different scales and frequency directions [27]. Each pixel of an image analysed by the wavelet transform is associated with a set of wavelet coefficients which describe the frequency content of the image at that point over a set of scales.

Texture analysis of medical images
Texture analysis of medical images remained without much clinical interest until 1998, when it took a giant leap. This was when MaZda®, a computer program for calculating texture parameters (features) in digitized images, was developed. The software has been under development since 1998 to satisfy the needs of the participants of the COST B11 European Project "Quantitative Analysis of Magnetic Resonance Image Texture" and the subsequent COST B21 "Physiological Modelling of Magnetic Resonance Image Formation" [31]. MaZda® is a very versatile software package that is capable of 2D and 3D image texture analysis. It can be used for quantitative analysis of image texture, computation of texture features, feature selection and extraction. The software also has algorithms for data classification, data visualization and image segmentation [40]. The software was originally developed in 1996 at the Institute of Electronics, Technical University of Lodz (TUL), Poland, for texture analysis of mammograms [41]. It has since been further developed and made more versatile so that it can be used in the analysis of other textured images. It has been found to be efficient and reliable for quantitative image analysis, supporting more accurate and objective medical diagnosis. There has also been a non-medical application in the food industry to assess food product quality [40]. Other software packages used for texture analysis of digital images are MATLAB® and Scilab® [42,43]. Scilab® is available to users free of charge, while MATLAB® is commercially available.
The medical importance of texture analysis cannot be over-emphasized. Analysis of medical image texture helps to increase the information obtained from medical images [27], which may improve diagnosis. It is an emerging aspect of medical imaging and finds applications in the segmentation of specific anatomical structures and the detection of lesions. The detection of lesions implies differentiating between unhealthy and healthy tissues in the different organs of the body. This differentiation implies that texture parameters obtained from medical images form the basis for computer-aided diagnosis. Recently, it was demonstrated that texture analysis can be used in patients undergoing neoadjuvant chemotherapy for breast cancer to indicate whether the patient will respond well or not. The results of that study appeared to correlate well with the final pathological outcome [34].

Role of texture analysis in computer-aided diagnosis
Many researchers have shown interest in texture analysis of medical images. Research in texture analysis of medical images has been targeted at developing computer-aided diagnosis systems. Computer-aided diagnosis systems are gaining popularity because of their ability to improve the precision and accuracy of the characterization of lesions beyond what radiologists achieve by visual inspection [44]. The main objectives of a CAD system in the diagnostic process are to accurately detect and precisely characterize potential abnormalities [45]. This is a very important step towards the effective treatment of diagnosed abnormalities. The radiologist detects and characterizes abnormalities by visual interpretation.
To do this, the radiologist must successfully integrate two distinct processes, namely image perception to recognize unique image patterns and reasoning to identify the relationships between perceived patterns and possible diagnoses. The two processes are heavily dependent on the empirical knowledge, memory, intuition and diligence of the radiologist. The approach of the radiologist is not always error-free, as there are well-documented errors and variations in the human interpretation of clinical images [46]. In summary, CAD aims to provide a computer output as a second opinion in order to assist physicians in the detection of abnormalities, quantification of disease progress and differential diagnosis of lesions [1]. One important step in the generic architecture of a CAD system is feature extraction (texture analysis), and thus texture analysis is the fundamental basis of CAD at its present stage of development [1].
The human visual system can discriminate between different morphologic information such as shape and size, but there is evidence that it has difficulty in discriminating textural information that is related to higher-order statistics or spectral properties of an image [47,48]. The unaided human visual system can tell apart only a limited number of grey levels. Thus, texture analysis can potentially augment the visual skills of the radiologist by extracting image features that may be relevant to the diagnostic problem but that are not necessarily visually extractable [45]. When image texture analysis is used as a preprocessing step in CAD schemes, the input generation process is automated and is therefore reproducible and robust. Although useful to the diagnostic process, texture analysis is not a panacea for the diagnostic interpretation of radiologic images [45]. The pursuit of texture analysis is based on the hypothesis that the texture signature of an image is relevant to the diagnostic problem at hand. A major drawback is that the effectiveness of texture analysis is bound by the type of algorithm that is used to extract meaningful textural features.

Decision making in computer-aided diagnosis
Texture analysis is the fundamental basis of computer-aided diagnosis in radiology and is, therefore, indispensable to the process. The main problem with calculated texture is that it produces an avalanche of outputs, especially from the co-occurrence matrix. The outputs need to be reduced to a manageable level so that useful information for decision making can be obtained from further analysis. Using the MaZda® software, feature reduction is achieved by using the Fisher coefficient, classification error combined with the correlation coefficient, mutual information [41,49] and a selection of optimal feature subsets with minimal classification error of the 1-nearest neighbour (1-NN) classifier [50,51]. The Fisher coefficient selects features by minimizing intra-group variance and maximizing inter-group difference [52]. If the above methods do not reduce the features sufficiently, further reduction is carried out by transforming the original features into a new feature space with lower dimensionality [40]. This method is called feature extraction or projection [53] and can be achieved in MaZda® using principal component analysis (PCA), linear discriminant analysis (LDA), nonlinear discriminant analysis (NDA) [50, 54-57] and raw data analysis (RDA). Artificial intelligence tools are used for automated decision making in computer-aided diagnosis. Such tools include different algorithms which are provided by different software packages. The Waikato Environment for Knowledge Analysis (WEKA) version 3.6.11 data mining software is a useful package equipped with many classification algorithms. It is a landmark system in data mining and machine learning [58]. The software came about through the perceived need for a unified workbench that would allow researchers easy access to state-of-the-art techniques in machine learning [59].
The two tools for decision making or classification in computer-aided diagnosis that are popular with researchers are the artificial neural network (ANN) and the k-nearest neighbour (k-NN) algorithm. The ANN and k-NN algorithms are part of the resources provided in the WEKA software. Both algorithms perform supervised classification, implying that the classification is under the guidance of a human being. In supervised classification, the user selects sample pixels in an image that he or she considers representative of specific classes and then directs the software to use these training sites as references for the classification of other pixels in the image.
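The idea behind Fisher-coefficient feature selection can be sketched as below. This is a simplified form, the ratio of between-class variance to within-class variance for a single feature, and is not claimed to be the exact expression used in MaZda®; the function name and the toy feature values are assumptions for the example. Features with a larger ratio separate the classes better and would be retained.

```python
import numpy as np

def fisher_coefficient(values, labels):
    """Between-class variance divided by within-class variance for one feature."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    overall_mean = values.mean()
    between = 0.0
    within = 0.0
    for cls in np.unique(labels):
        group = values[labels == cls]
        prior = len(group) / len(values)          # class proportion
        between += prior * (group.mean() - overall_mean) ** 2
        within += prior * group.var()
    return between / within if within > 0 else float("inf")

# Toy data (illustrative values only): two classes, three ROIs each.
labels = [0, 0, 0, 1, 1, 1]
separating_feature = [1, 2, 3, 11, 12, 13]     # class means far apart
overlapping_feature = [1, 2, 3, 1, 2, 3]       # identical in both classes

f_good = fisher_coefficient(separating_feature, labels)
f_poor = fisher_coefficient(overlapping_feature, labels)
```

Ranking all calculated texture parameters by such a coefficient and keeping the top few is one way of reducing the avalanche of outputs to a manageable subset.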

The artificial neural network
Artificial neural networks are regarded as relatively crude electronic networks of "neurons" which simulate the neural structure of the human brain. They imitate the decision-making process of the human brain. The networks are the electronic equivalent of the human brain and are therefore trainable for improved performance. They process records one at a time as the records are fed into them and "learn" from "experience" by comparing their classification of each record with the known actual classification of the record. Subsequent classifications are made more accurate by feeding the errors from the classification of previous records back into the network to modify the network's weights.
A multilayer feed-forward neural network is one that has one or more hidden layers. The neurons in the hidden layer arbitrate between the input and the output of the network. The source nodes in the input layer of the neural network receive the input feature vector. The outputs of the input layer form the input signals applied to the neurons in the hidden layer. The output signals of the hidden layer can be used as inputs to the next hidden layer or to the output layer, and this process continues until the output layer produces the final output result [60].
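The layer-by-layer flow just described can be sketched as a single forward pass. The example below is a minimal illustration, not the WEKA implementation: it assumes three input features, one hidden layer of four neurons with a sigmoid activation and a three-class softmax output, and the weights are random, untrained placeholders, so the output carries no diagnostic meaning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> output layer."""
    hidden = sigmoid(w_hidden @ x + b_hidden)     # hidden-layer output signals
    scores = w_out @ hidden + b_out               # output-layer inputs
    exp_scores = np.exp(scores - scores.max())    # numerically stable softmax
    return exp_scores / exp_scores.sum()          # class probabilities

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 3, 4, 3           # assumed layer sizes
w_hidden = rng.normal(size=(n_hidden, n_inputs))  # untrained placeholder weights
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_classes, n_hidden))
b_out = np.zeros(n_classes)

x = np.array([0.2, 0.5, 0.9])                     # hypothetical scaled feature vector
probabilities = forward(x, w_hidden, b_hidden, w_out, b_out)
predicted_class = int(np.argmax(probabilities))
```

Training would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` by feeding classification errors back through the network, which is the "learning from experience" step described above.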

The k-nearest neighbour
The k-nearest neighbour is a non-parametric method used for classification and regression [61].
In the algorithm, the training data set is stored, so that a previously unclassified (new) record is classified by comparing it to the most similar records in the training data set. Simply put, in the k-nearest neighbour classification algorithm, a database in which data points are separated into several classes is used to predict the classification of a new data point. The data points are treated as points in feature space, and classification is achieved by assigning the new data point to the class most common among its k closest neighbours (for k = 1, the class of its single closest neighbour). It is a rather simple and versatile concept.
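A minimal sketch of k-nearest neighbour classification is given below, assuming Euclidean distance and a simple majority vote. The function name is hypothetical and the training values are made-up numbers for two features per ROI; they are not data from the case study.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, query, k=3):
    """Assign `query` to the majority class among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    distances = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]           # indices of the k closest records
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up training set: (feature 1, feature 2) per ROI, for illustration only.
train_x = [(34, 40), (36, 42), (35, 41),
           (20, 26), (22, 27), (21, 25),
           (64, 72), (66, 75), (65, 74)]
train_y = ["normal"] * 3 + ["ischaemic"] * 3 + ["haemorrhagic"] * 3

label_a = knn_classify(train_x, train_y, query=(23, 28), k=3)
label_b = knn_classify(train_x, train_y, query=(63, 70), k=3)
```

No model is fitted in advance: the stored training records are the classifier, which is why the method is described as non-parametric.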

Research design and location
A prospective cross-sectional design that targeted patients clinically diagnosed with stroke and who underwent non-contrast CT (NCCT) investigation of the brain was adopted for the study. The research design and protocol were approved by the Research Ethics Committee of Nnamdi Azikiwe University Teaching Hospital, Nnewi, Anambra State, Nigeria. The study was carried out in two locations, namely Onitsha, Anambra State in south-eastern Nigeria, and Ibadan, Oyo State in south-western Nigeria. Two privately owned radiodiagnostic centres were selected. The centres were chosen to ensure an adequate number of patients, because a high number of stroke patients are referred to them for brain CT examination.

Sample size determination
The minimum sample size required for this study was determined using Taro Yamane's formula for a finite population [62]:

$$n = \frac{N}{1 + N e^{2}}$$

where n = sample size; N = number of patients clinically diagnosed with stroke who underwent an NCCT study of the brain in the two radiodiagnostic centres in the previous year (May 2012 to April 2013); and e = the level of precision required. Between May 2012 and April 2013, a total of 208 patients with clinically diagnosed stroke underwent non-contrast CT of the brain in the two centres, and thus a minimum sample of approximately 137 was calculated from this formula.
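The arithmetic can be checked with a short sketch. The value e = 0.05 is an assumption: it is not stated in the text, but it is the precision level that reproduces the reported minimum of approximately 137 for N = 208.

```python
import math

def yamane_sample_size(population, precision):
    """Taro Yamane's finite-population formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * precision ** 2)

# N = 208 is reported in the text; e = 0.05 is an assumed precision level.
n = yamane_sample_size(208, 0.05)
minimum_sample = math.ceil(n)  # round up to a whole number of patients
```

Under this assumption, n evaluates to about 136.8, which rounds up to the reported minimum sample.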

Patient selection
A total of 164 clinically diagnosed stroke patients who were referred to the two radiodiagnostic centres for CT scan and who met the inclusion criteria for the study were enlisted in the study to improve its precision. The inclusion criteria were:
1. Patients clinically diagnosed with stroke at the Nnamdi Azikiwe University Teaching Hospital (NAUTH), Nnewi, Anambra State, and University College Hospital (UCH), Ibadan, Oyo State, and peripheral private and public hospitals in these two states.
2. Patients clinically diagnosed with stroke who underwent non-contrast CT of the brain at the two selected private radiodiagnostic centres.
3. Patients in whose CT images stroke lesions were identified by the radiologist.
4. Patients who met criteria 1-3 and consented to participate in the study.

All the participating patients directly or indirectly, through their relatives, expressed willingness to participate in the study by signing an informed consent form before enlistment in the study.

Equipment and software used
The equipment and computer software used include the following:
1. A four-slice helical Toshiba Asteion™ CT scanner with 512×512 reconstruction matrix manufactured by Toshiba Medical Systems Corporation and a two-slice Philips MX8000 Dual™ CT scanner also with 512×512 reconstruction matrix manufactured by Philips Medical Systems. The CT scanners were used to carry out non-contrast studies of the patients' brains.
2. Datamax™ digital video discs (DVDs) to copy the CT images from the scanners.
3. An HP C2000™ laptop with 64-bit Windows 7 operating system used to view the images and perform texture analysis.
5. MaZda® texture analysis software version 4.7 for performing texture analysis on the images. The software was developed at the Institute of Electronics, Technical University of Lodz (TUL), Poland.
6. The Waikato Environment for Knowledge Analysis (WEKA) version 3.6.11 data mining software (Hamilton, New Zealand) used for image classification.

Patient data and image acquisition
The enlistment of patients in the study, collection of data and acquisition of CT images commenced in May 2013 and ended in April 2014. After being clinically diagnosed with stroke in the hospitals, the patients were referred to undergo NCCT of the head to confirm or rule out the disease as the cause of their signs and symptoms. On arrival at the radiodiagnostic centre, the patient or his/her relatives were approached and the study was explained to them. The researcher identified, through the request form, the provisional diagnosis necessitating the scan. If it was a stroke, an appeal was made to the patient or his/her relatives to enlist in the study. If the response was affirmative, an informed consent form was signed by the patient or his/her relatives. There was no financial reward for participating in the study. Demographic data of the patient such as age and gender were thereafter obtained and documented. The approximate time interval between the onset of symptoms and head CT examination was ascertained and documented. Non-contrast CT images of the brain were obtained using a Toshiba Asteion™ CT scanner in one centre. In the second centre, a Philips MX8000 Dual™ CT scanner was used for the same purpose. Scans were obtained at 0.5-1 mm contiguous sections from the base of the skull to the vertex. The scan parameters used were chosen exclusively by the attending radiographer in each centre. The images were transferred from the CT archive to a DVD and then loaded into an HP 2000™ laptop for viewing using either Medysynapse™ or Microdom™, both of which are DICOM viewing software packages.

Radiological reporting of the images
The CT images obtained were visually inspected and reported by a team of two radiologists with experience in CT diagnosis of stroke. The first radiologist had five years of post-qualification experience as a consultant radiologist, while the second had seven years. Both radiologists reported on the images independently and were blinded to each other's reports. The reports included in the study were those in which the two radiologists agreed on the presence of stroke, the subtype and the anatomical location of the lesions. The reports that indicated there were no radiological signs of abnormality and those that indicated neurological abnormalities mimicking stroke were excluded from the study. The anatomical locations of the lesions were identified and the lesions categorized as ischaemic or haemorrhagic by the two radiologists as shown in Figures 5 and 6. The radiologists' reports contained the patient's name, identification number, age, sex, provisional diagnosis and radiological diagnosis, which contained details such as the type of stroke lesions identified, their number, their anatomical locations and their geographic extent in the brain.

Texture analysis of stroke CT images
Texture analyses of stroke CT images were done using the MaZda® texture analysis software.
The procedure for the texture analysis of the CT images is represented in the block diagram shown in Figure 7. Precaution was taken to ensure that machine settings which differed between cases did not affect the images during texture analysis. This was achieved by normalizing the images. Normalization rescales the range of pixel grey-level values of different images so that they appear to have been obtained with the same machine settings. This is called image consistency. The method of normalization applied prior to texture analysis was the ±3 sigma method selected from the program functions. Histogram texture parameters for the four ROIs were computed using the MaZda® version 4.7 program. The output of the parameters computed for each CT image was saved as a comma-separated values (CSV) file and opened in Microsoft Excel for further analysis.
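The ±3 sigma normalization can be sketched as follows. This is a simplified illustration and not MaZda's actual implementation: it assumes that grey levels are clipped to the range [mean − 3σ, mean + 3σ] of the region and then rescaled to a fixed number of levels; the function name and the 8-bit default are hypothetical.

```python
import statistics

def normalize_3sigma(pixels, bits=8):
    """Clip grey levels to mean +/- 3 standard deviations, then rescale.

    Returns integers in the range 0 .. 2**bits - 1 (assumed output range).
    """
    mu = statistics.fmean(pixels)
    sigma = statistics.pstdev(pixels)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    levels = 2 ** bits - 1
    if hi == lo:  # perfectly uniform region: nothing to stretch
        return [0 for _ in pixels]
    normalized = []
    for p in pixels:
        clipped = min(max(p, lo), hi)  # suppress extreme outliers
        normalized.append(round((clipped - lo) / (hi - lo) * levels))
    return normalized

# Hypothetical grey levels from a small region of interest.
roi = [28, 30, 31, 33, 35, 36, 38, 40]
roi_normalized = normalize_3sigma(roi)
```

Because the output range depends only on each region's own mean and standard deviation, images acquired with different window or exposure settings are mapped onto a comparable grey-level scale.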

Statistical analyses
Statistical analyses were carried out in two stages. In the first stage, the lesioned brain tissues for which texture parameters were calculated were divided according to lesion type. The discriminating histogram texture parameters were obtained by raw data analysis (RDA). In the second stage, the normal brain tissues and lesions from which the histogram texture parameters were computed were classified by the artificial neural network and k-nearest neighbour algorithms as normal, haemorrhagic or ischaemic tissue. The classifications were then cross-validated against the radiologists' reports as the gold standard using receiver operating characteristic (ROC) curve analysis. Raw data analysis of computed histogram texture parameters was performed with MaZda® and classification of brain tissues with WEKA 3.6.11.

Feature reduction
In order to reduce the computed histogram texture parameters to only the ones useful for further analyses and eliminate redundant data, the Fisher coefficient was used. Selecting parameters with a high Fisher coefficient favours those with small intra-group variance and large inter-group difference. It is a feature of the MaZda® texture analysis software.
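One common way to express this criterion for a single feature is the ratio of between-class variance to within-class variance. The sketch below uses that form as an assumption; the exact expression implemented in MaZda may differ, and the feature values are hypothetical.

```python
import statistics

def fisher_coefficient(values_by_class):
    """Between-class variance divided by within-class variance for one feature.

    `values_by_class` maps a class label to the list of feature values
    observed for that class (at least two classes, non-zero spread).
    """
    total = sum(len(v) for v in values_by_class.values())
    priors = {c: len(v) / total for c, v in values_by_class.items()}
    means = {c: statistics.fmean(v) for c, v in values_by_class.items()}
    variances = {c: statistics.pvariance(v) for c, v in values_by_class.items()}

    within = sum(priors[c] * variances[c] for c in priors)
    between = sum(
        priors[a] * priors[b] * (means[a] - means[b]) ** 2
        for a in priors
        for b in priors
    ) / (1 - sum(p ** 2 for p in priors.values()))
    return between / within

# Hypothetical feature values: one well-separated feature, one overlapping.
separated = {"ischaemic": [20, 22, 24], "haemorrhagic": [60, 62, 64], "normal": [35, 37, 39]}
overlapping = {"ischaemic": [20, 40, 60], "haemorrhagic": [22, 42, 62], "normal": [21, 41, 61]}
f_separated = fisher_coefficient(separated)
f_overlapping = fisher_coefficient(overlapping)
```

A feature whose classes form tight, well-separated clusters receives a large coefficient and is retained; a feature whose classes overlap receives a small one and can be discarded as redundant.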

Feature extraction
The reports of the histogram texture parameters computed for the selected ROIs, saved in Microsoft Excel files, were loaded into MaZda®, first according to lesion type and then in combined lesion form, and raw data analysis was performed on them. The best discriminating texture parameters were extracted through the raw data analysis and displayed in a three-dimensional (3D) feature space. The process also classified the ROIs as normal tissue, ischaemic lesion or haemorrhagic lesion using the best discriminating texture parameters. In this process, the ROIs in the feature space were picked one at a time and each was assigned the class to which it belonged, with the radiologists' interpretation taken as the expected ideal outcome.

Artificial neural network and k-nearest neighbour classifications
A multilayer feed-forward neural network and the k-nearest neighbour algorithm were used to classify brain tissues as normal tissue or as lesions according to lesion type. For the purpose of classifying ROIs into normal brain tissue, ischaemic and haemorrhagic lesions using the k-nearest neighbour algorithm, a value of 1 was chosen for k. The Waikato Environment for Knowledge Analysis (WEKA) version 3.6.11 data mining software was used to perform these classifications. Both algorithms were trained by creating a model on retrospective data before applying them to test data.
The performance of the neural network and k-nearest neighbour algorithms in classifying the ROIs as normal or lesioned brain tissue, and according to lesion type, was cross-validated against the radiologists' reports using ROC curve analysis. The accuracy, sensitivity, specificity, positive predictive value and negative predictive value were determined from the ROC analysis.
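For a three-class problem, these measures are usually derived per class in a one-versus-rest fashion from a confusion matrix and then averaged with weights proportional to class size. The sketch below illustrates that calculation; the confusion matrix is entirely hypothetical and does not reproduce the study's results.

```python
def one_vs_rest_metrics(confusion, labels):
    """Per-class diagnostic measures from a square confusion matrix.

    confusion[i][j] = number of ROIs of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(row[i] for row in confusion) - tp
        tn = total - tp - fn - fp
        results[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "support": tp + fn,  # number of ROIs truly in this class
        }
    return results

def weighted_average(results, metric):
    """Average a metric over classes, weighted by class size."""
    total = sum(r["support"] for r in results.values())
    return sum(r[metric] * r["support"] for r in results.values()) / total

labels = ["normal", "ischaemic", "haemorrhagic"]
confusion = [  # hypothetical counts, rows = true class, columns = predicted
    [48, 2, 0],
    [3, 27, 0],
    [0, 1, 19],
]
metrics = one_vs_rest_metrics(confusion, labels)
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
weighted_sensitivity = weighted_average(metrics, "sensitivity")
```

Note that the class-size-weighted sensitivity equals the overall accuracy, which is why the two figures often coincide in multi-class summaries.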

Results
Raw data analysis was used to analyse the data from the histogram texture parameters. The raw data analysis discriminated between the various ROIs as normal brain tissue, ischaemic stroke lesion or haemorrhagic stroke lesion. The classifications of the ROIs obtained in the discrimination are shown in the 3D feature space diagram (Figure 8). In the figure, the ischaemic lesion is represented by 1, haemorrhage by 2 and normal brain tissue by 3. The discriminating histogram parameters were the mean, 90 percentile and 99 percentile as shown in Figure 8. The result of the raw data analysis shows that histogram texture parameters were very accurate in discriminating between normal brain tissues, ischaemic lesions and haemorrhagic lesions as shown in Table 1 and illustrated in Figure 9.
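The three discriminating parameters named above can be computed directly from the grey-level histogram of an ROI. The sketch below assumes that the p-th percentile is the smallest grey level at which the cumulative histogram reaches p% of the pixels; this definition, the function name and the sample pixel values are assumptions for illustration only.

```python
from collections import Counter

def histogram_parameters(pixels):
    """Mean, 90th and 99th percentile of an ROI's grey-level histogram."""
    n = len(pixels)
    hist = Counter(pixels)  # grey level -> number of pixels
    mean = sum(level * count for level, count in hist.items()) / n

    def percentile(p):
        # Smallest grey level whose cumulative count reaches p% of pixels.
        cumulative = 0
        for level in sorted(hist):
            cumulative += hist[level]
            if cumulative * 100 >= p * n:
                return level

    return {"mean": mean, "perc90": percentile(90), "perc99": percentile(99)}

# Hypothetical ROI containing grey levels 1..100, one pixel each.
params = histogram_parameters(list(range(1, 101)))
```

The upper percentiles summarize the bright tail of the histogram, while the mean summarizes its centre; together they describe where an ROI's grey levels sit without using any spatial information.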

Discussion
Medical image analysis techniques play very important roles in several radiological interpretations. In general, the applications involve the automatic extraction of texture features from images which are then used for a variety of classification tasks, such as distinguishing normal tissue from abnormal tissue [33].
In this study, histogram parameters were computed for the selected ROIs chosen from stroke lesions and adjacent normal brain tissues using MaZda®. The whole process involved computation of histogram texture parameters, feature selection or reduction, and raw data analysis to extract the discriminating parameters. The mean, percentile 90 and percentile 99 were the best discriminators. They achieved very high accuracy in discriminating between normal brain tissues, ischaemic and haemorrhagic stroke lesions. According to the result of a previous study, histogram features when used with a Radial Basis Function Neural Network (RBFNN) achieved accuracies of over 80% in the classification of brain tissues [63]. The histogram measures the frequency of occurrence of the different grey-level values throughout the image by moving in steps of one pixel across the image. This approach is attractive for its conceptual simplicity, and most people are at ease with it. The result of this study shows that the histogram is highly accurate in discriminating between normal brain tissues and lesions, and between ischaemic stroke and haemorrhagic stroke lesions. In another similar study, grey-level co-occurrence matrix features were used in automatic detection of ischaemic stroke [64]. Four different algorithms were used, namely decision tree, artificial neural network, k-nearest neighbour and support vector machine (SVM), and the results were quite similar to ours. The sensitivity was 93% for decision tree, 98% for artificial neural network, 96% for k-nearest neighbour and 98% for SVM, while specificity was 90% for decision tree and artificial neural network and 100% for k-nearest neighbour and SVM. The accuracy of detection was 92% for decision tree, 96% for artificial neural network, 97% for k-nearest neighbour and 98% for SVM [64].
The results of ROC curve analysis of the performance of the artificial neural network and k-nearest neighbour classifications of brain tissues based on data obtained from the histogram show that histogram-based texture parameters are highly accurate. A classification accuracy of over 90% was achieved, and a weighted average sensitivity, specificity and area under the ROC curve of almost unity were recorded for both the artificial neural network and k-nearest neighbour. Correspondingly, the false positive rate (referred to as fall-out in machine learning) and false negative rate in both methods were very low. Sensitivity and specificity are important measures of the diagnostic accuracy of a test [65]. A diagnostic test with high sensitivity is useful in ruling out a disease condition when the test result is negative. Conversely, a diagnostic test with high specificity is useful in ruling in a disease condition when the test result is positive. This explanation of the importance of sensitivity and specificity in diagnostic test performance applies to the present study, whose method is intended for automatic detection of stroke lesions.
Studies similar to ours have been carried out in the past with quite good outcomes. In one such study, stroke lesions on non-contrast brain CT were classified into acute infarct, chronic infarct and haemorrhage [23]. The researchers used histogram-based comparison and wavelet energy-based texture information to classify stroke lesions. In a study proposing a method for automatic diagnosis of abnormal tumour regions in CT images using wavelet-based statistical texture features and a support vector machine (SVM) for classification of brain tissues, the researchers obtained a very high classification accuracy [66]. In another study, using texture features extracted from CT images with inductive learning techniques and a Radial Basis Function Neural Network, brain tissues were classified as normal and abnormal with very high accuracy [63].
In this study, comparison of artificial neural network and k-nearest neighbour classifications of brain tissues showed that histogram-derived data achieved the same classification performance with both algorithms. This implies that either of the two algorithms can be used for classification and therefore may be used in real clinical situations. The histogram method of texture analysis is a rather simple concept and may be found attractive by many researchers aiming to develop computer-aided diagnostic software. The present database could be used in building a computer-aided diagnosis tool for stroke based on content-based image retrieval similar to that proposed by Yuan et al. [67].
The computer-aided diagnostic tool tries to emulate the radiologist's visual inspection and interpretation of brain CT images, or of any other image presented to it, depending on the case under investigation. Classification is typically accomplished by using a decision or discriminant function [68]. In this study, supervised classification was carried out using the artificial neural network [69,70] and k-nearest neighbour [71], two algorithms popular with researchers in artificial intelligence in medicine. The performance of the artificial neural network and k-nearest neighbour algorithms in classifying brain tissues in non-contrast brain CT into normal, ischaemic and haemorrhagic lesions was evaluated using ROC curves. In the ROC curve analysis, the classification of data points as belonging to normal brain tissue, ischaemic stroke or haemorrhagic stroke was cross-validated against the radiologists' identification of stroke lesions and normal brain tissues. Receiver operating characteristic curves are used to compare the diagnostic performance of two or more diagnostic tests [72][73][74] and also to discriminate between diseased and normal cases. With data from the histogram texture parameters obtained in this study, there was no difference in the results of ROC analysis of the classifications using the artificial neural network and k-nearest neighbour. This implies that both algorithms can be used with histogram-derived data to build automatic diagnostic tools for stroke.
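The area under an ROC curve, which summarizes the comparison described above, can be illustrated with a rank-based sketch. It relies on the standard equivalence between the area under the curve and the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case; the scores below are hypothetical and the function is not taken from WEKA.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (positive, negative) pair counts 1 if the positive case scores
    higher, 0.5 if the scores tie, and 0 otherwise.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical classifier scores for "haemorrhagic" versus all other ROIs.
auc_perfect = roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
auc_partial = roc_auc([0.9, 0.4], [0.5, 0.1])
```

An area close to 1 means the classifier ranks almost every diseased ROI above every non-diseased one, whereas an area of 0.5 corresponds to chance-level discrimination.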
The following factors may affect the generalization of the results of this study and should be borne in mind when applying them:
1. This study was not hospital-based. It was conducted in two radiodiagnostic centres, and the patients were carefully selected. The research conditions may therefore not reflect the actual clinical situation.
2. Sensitivity and specificity levels in this study were high but not 100%, implying that a computer-aided scheme can make mistakes. This study recognizes this fact, but it did not consider how the mistaken cases may be identified. Sensitivity is rarely 100%, especially because of the wide variability in lesion and background appearance [75]. It may be that the majority of computer-aided detection schemes will never be trained with enough cases to "see" all possible variations in a given target lesion. Even for a scheme that uses artificial neural networks and continues to learn with each successive case it analyses, a sensitivity of 100% may not be achieved [75]. Thus, computer-aided detection systems should be used with caution and ideally should not completely replace visual inspection and interpretation. Such systems are meant to complement visual inspection and interpretation. Heavy reliance on a computer-aided detection system to detect and classify lesions may alter the normal search and decision-making processes [76].
3. Only stroke cases confirmed at CT were evaluated in this study. Clinical mimics of stroke were not included, and therefore, it is not possible to tell if this method can distinguish stroke from its clinical mimics.
4. The post-ictal intervals before CT imaging were not captured, and thus, the result of this study cannot be used to explain the changes in CT appearance of stroke lesions with time.
In view of the findings of this study, a larger-scale study in an actual clinical environment is recommended. Such a study would evaluate the performance of this proposed automatic method of detecting and classifying stroke lesions and compare it with the radiologist's visual interpretation. It would also include the changes in CT appearance of stroke lesions with the passage of time. The chronological sub-typing will be crucial to identifying hyperacute, acute and chronic stroke lesions on CT. This will help neurologists to estimate the post-stroke neurological deficit that should be expected in any individual case.
In conclusion, this study has established that histogram-derived texture parameters are accurate in classifying brain tissues in NCCT images and therefore suitable for automatic detection and classification of stroke lesions using the artificial neural network and k-nearest neighbour classifiers. The results obtained in this study suggest that a computer-aided diagnostic tool for stroke diagnosis utilizing histogram-derived texture parameters may be ideal.

Figure 1. An illustration of the pixel concept of digital medical images using a cranial CT.

Figure 2. An illustration of the grey-level co-occurrence matrix concept of texture computation.

Figure 3. An illustration of the grey-level run-length matrix concept of texture computation.

Figure 4. An illustration of the gradient concept of texture computation.

Figure 5. A non-contrast CT image showing left cerebral ischaemia (arrows). Note there is a small area of ischaemia on the right parietal lobe.

Figure 6. A non-contrast CT image showing left cerebral haemorrhage (arrows). Note the marked compression of the right and left ventricles.

Figure 7. Block diagram illustrating the analytical procedure. All the images in which a lesion appeared were loaded into the computer program and analysed. Four regions of interest (ROIs) in each CT image that demonstrated the lesions were selected for analysis. Two ROIs each represented the lesion and normal brain tissue. The lesioned brain tissue contained ROI 1 and ROI 2, while the adjacent normal brain tissue contained ROI 3 and ROI 4, as shown in Figure 8.

Figure 8. Illustration of the method of selection of the regions of interest (ROIs). Note that ROI 1 (red) and ROI 2 (green) are on ischaemic tissues on the left cerebral hemisphere, while ROIs 3 and 4 (blue and sky blue) are on normal tissues on the right cerebral hemisphere.

Figure 9. The distribution of ROIs in 3D feature space using data obtained from the histogram.

Table 1. Classification accuracy of the ROIs by raw data analysis.