Platelet fatty acids in MD
This work was born from the need to verify an intuition: could the fatty acids of platelets, rather than those of other cellular elements, be potential markers of Ischemic Cardiovascular Disease and Major Depression? To test this, we set up a study that, alongside advanced statistical methods, also used Artificial Neural Networks (ANN) in the form of a Self-Organizing Map (SOM). The aim was to compare a group of apparently healthy subjects with two groups of subjects with different clinical diagnoses, one of depression and the other of ischemic heart disease.
Given the statistical evidence that numerous parameters were affected, we thought it appropriate to develop the SOM. The field of investigation was narrowed down to three fatty acids characterizing each of the two diseases, and the pathological subjects (depressed and ischemic) and the normal ones were placed on the map obtained by the SOM, which clearly distinguished one group from the other.
Naturally, conjectures, hypotheses and convictions grew out of long examination and discussion of the data.
Having at our disposal two neural networks that had proven effective in classifying the two pathologies, we were naturally curious to see how the two groups would behave when their data were fed into the networks crosswise (the data of the ischemic patients into the depression network, and those of the depressive patients into the ischemic network).
The result corroborated the literature data more or less uniformly, although we are aware that it will be crucial to recruit significant samples of patients affected by both clinically verified ischemia and depression. For optimal recruitment, significant samples of normal subjects, of non-cardiopathic depressives and of non-depressive heart disease patients will also have to be selected. The four groups will obviously have to be balanced for age and sex.
Caution is therefore needed, but it is accompanied by the evidence of the concordances that are constantly observed in the SOM. These will have to be re-examined repeatedly in the light of new and increasingly consistent testing, yet they already seem to justify interpreting the connections between depressive illness and ischemic cardiovascular disease.
The fact remains that the markers identified are the expression of the pathology of reference, and that anyone whose data fall in the area of the network corresponding to the disease can be recognized as being at high risk. This could be established well before clinical evidence becomes available.
We have been encouraged in this evaluation by some patients added to the test, who showed a correspondence between the clinical evidence and the placement of their input vectors in the network.
Investigations performed on other subjects (for example children, young sportspeople, subjects with morphea, scleroderma etc.) have shown unexpected agreement with the literature data on the risk of the diseases investigated.
For children, in particular, an entirely original result was found: platelet stearic acid is practically double that of all the other subjects. This finding takes on great importance precisely for the protection of the child against ischemic cardiopathy and, probably, other pathological phenomena. The child loses this condition on becoming a young adult.
To conclude, we can say that Major Depression and Ischemic Cardiovascular Disease benefit from an unequivocal biochemical typification that can be recognised by identifying the fatty acid markers of the platelets and measuring their concentrations. The platelets carry all those elements that mimic the neuron and that can alter the physiological hemocoagulative response, thereby becoming the great mediators between the brain and the heart.
Faced with new results, the search for bibliographical references that might (or might not) support the researcher’s statements and concepts becomes almost obsessive.
This is because novelties are always met with diffidence.
Before going to press, and after a very difficult search, we found some working hypotheses capable of conceptually motivating the work performed, which we report below:
L. S. Schneider, Principles and Practice of Geriatric Psychiatry (Second Edition), 2002:
It is important to consider that major depression is characterized and defined by descriptive - not biological - criteria, making it unlikely that a neurochemical finding could adequately characterize the disorder. Major depression is heterogeneous in its expression, possessing various phenomenology, family history, and course. In the elderly, a neurochemical characteristic of depression would have to be specific enough to distinguish from dementia, or from secondary depression. In addition, neurochemical differences would be expected between late-onset and early-onset depression, or delusional and non-delusional depression, thus helping to validate these putative subtypes.
Steven J. Garlow, M.D., Ph.D., Charles B. Nemeroff, M.D., PhD., Emory University, General Clinical Research Center, Period: 10/30/74 - 11/01/02, NIH:
Platelets from depressed patients are produced in a "unregulated" state, with increased amounts of a number of transcripts that encode platelet specific, the result being the platelets are more reactive and prone to thrombus formation, 2) the platelet serotonergic system, in particular the 5-HT2A receptor, is altered in depression which in turn contributes to the increased platelet reactivity observed in depression, and the relevant alteration may occur in the megakaryocytes and 3) alterations in the concentration of one or more humoral factors (interleukins, cytokines, stress hormones) orchestrate the alterations in platelet reactivity and serotonergic functioning observed in depression.
Steven J. Garlow, M.D., Ph.D., Emory University, General Clinical Research Center, Period: 10/30/74 - 11/01/02, NIH:
We hypothesize that platelet function is altered in depression resulting in platelets that are excessively prone to enter the clotting cascade and hence increase risk of heart disease. We have identified a series of megakaryocytic cell lines that express a number of platelet markers and the 5-HT2A receptor and SERT. We have demonstrated a direct regulatory interaction between the 5-HT2A receptor and the expression of a number of important platelet genes. Gene expression in megakaryocytes is regulated by a series of cytokines and growth factors including IL-3, IL-6, IL-11, TPO, and SCF. These factors are being tested for their ability to regulate the transcription of the 5-HT2A and SERT genes in megakaryocytes.
We think that we have identified, to a reasonable degree of certainty, the biochemical rules that govern the platelets in the diagnostic determination of Major Depression and Ischemic Cardiovascular Disease. We are nonetheless convinced that disease, in this specific case Major Depression and Ischemic Cardiovascular Disease, is never linked to biological factors alone but is always the synthesis of biology and culture, as medical anthropology teaches.
2. Medical-anthropology remarks
“Another, not less important, goal could be that of easing the pain in mental disorders. Outside the medical field, the problem of the suffering due to personal and social conflicts seems to find no solutions. Today we tend to make no distinction and to adopt the medical modus operandi to eliminate every kind of inconvenience. The supporters of this trend can make use of the following fascinating observation: if a rise in the levels of serotonin can, for example, not only treat depression but also reduce aggressiveness, making the subject less shy and more confident, why don’t we try to make the most out of it? One could say that only a self-righteous spoilsport could deny another fellow human being the benefits of these miraculous medicines…
But obviously the problem is that the choice is not clear, for many different reasons. Firstly, we don’t know the deeper biological effects of the drugs. Secondly, the possible consequences of large-scale use of a drug are still unknown. Thirdly, and perhaps most importantly, it could be argued that the suggested solution for the problem of personal and social suffering must tackle the causes, the origin, of personal and social conflicts if it is to work effectively in the long run. Otherwise this solution will work for symptoms but it won’t get to the roots of the malaise.” (1) (This passage has been independently translated into English from the Italian edition: A. Damasio, L’errore di Cartesio. Emozioni, ragione e cervello umano, Adelphi, Milano 1995. For the official English version, see the English edition in the bibliography: 1).
This dense passage by Damasio leads to two final reflections:
the roots of the malaise should never be considered either merely biological or merely psychological, but rather human, deeply human. If you approach human pain with a self-referential knowledge, or with knowledge displayed as pure, it becomes impossible, ab origine, to understand it (in this sense, pure biomedicine is bound to fail);
the understanding of each and every human experience, especially painful experiences in all their polychrome, iridescent images, means understanding the meaning, that is to say understanding those biological, existential, ethical, cultural and spiritual dynamics embodied in every single way we live our lives. And yet, the assumption that it is mainly up to philosophy (because of its very nature), to the arts in all their forms, to theology and to religion to understand the meaning doesn’t preclude the possibility of science having a say in the matter. Indeed, as man is one, science is one, too, and we are talking about human science, with its wide range of meanings: from a biological and chemical meaning to a spiritual one, following epistemological paths that cross and enrich each other, precisely because man is biology and soulology, neurophysiology and psychology, flesh and soul, blood and essence. So philosophical, ethical or theological meanings do exist, in just the same way as biological, chemical or biochemical meanings exist. The biologist or the chemist who is running tests on, for example, the platelet membrane fatty acids is perfectly aware, or should be aware, that all this work can’t be separated from man considered as a bundle of meanings, and that in the lab only the biological one of these meanings can be grasped. The same goes for the philosopher or the theologian, who is perfectly aware, or should be aware, that the theoretical, existential or spiritual analysis takes shape in the biological dimension, whose progressive cognitive exploration, in turn, enriches the spiritual research. But now, after this important remark, let’s come back to the biochemical discourse.
Platelets, Fatty Acids, Major Depression, Ischemic Cardiovascular Disease
Numerous epidemiological and clinical data suggest that variations in the fatty acid composition of erythrocyte cellular membranes can be correlated with major depression. In particular, an alteration in the Arachidonic Acid (AA)/Eicosapentaenoic Acid (EPA) ratio has been reported. The fatty acid composition of cellular membranes determines their fluidity. The physical properties of receptors and enzymes, such as the adenylate cyclases and the phospholipases, are influenced by the fluidity of the cellular membranes, and these effects have a certain relevance in depression, in that the final responses to neurotransmitter stimulation depend on the membrane’s equilibrium. By virtue of the double bonds in their molecule, omega-3 fatty acids have a folded structure, occupy a larger volume and apparently make the membranes more fluid (in fact, omega-3 fatty acids are less fluidizing than omega-6, because of the length of the saturated chain from the first double bond). In the literature, this aspect is used to explain why these compounds are thought to be effective as anti-depressives and mood stabilizers (2, 3, 4, 5, 6). The debate is still open, and the antidepressant function of omega-3 fatty acids may well be attributable to other reasons.
This fact is also connected to the neurochemical theories of depressive disease, which see the involvement of various neurotransmitter systems; the new research could highlight the fact that the imbalance involves the receptor functions and some secondary intracellular messengers (7). Other authors (8) have found that the score on the Hamilton Rating Scale, which measures the severity of depressive symptoms, correlates negatively with the erythrocyte level of EPA and positively with the AA/EPA ratio in the membranes. A high proportion of AA and a low one of EPA in the erythrocyte cellular membranes can bring about a hyperproduction of the eicosanoids that derive from the n-6 PUFA, such as the prostacyclins, leukotrienes and inflammation mediators, with an increase in oxidative stress (9).
Some further food for thought comes from the problem of depression with onset in advanced age. Depression has a lifetime prevalence of about 15% and mostly affects the age range between 18 and 29 years. Henderson et al. (10), using the ICD-10 criteria, found a 3.3% prevalence in the population aged over 70 years. Depression is ten times more frequent in elderly people with organic diseases than in healthy ones, although no specific biological links between the diseases have been demonstrated. The internal diseases most frequently complicated by the onset of depressive symptoms are hypo- and hyperthyroidism, Cushing’s syndrome, viral infections, lymphomas and carcinomas of the pancreas. Some drugs can also trigger a depressive episode (reserpine, alpha-methyldopa, clonidine, propranolol, digitalis, steroids, levodopa, and some anti-tumor drugs). The high prevalence of depression in the course of neurological diseases also stands out: Alzheimer’s disease (between 15% and 20% of patients with an AD diagnosis present a major depressive picture, and up to 50% present less serious depressive symptoms), vascular dementia, and Parkinson’s disease (40%-50% of patients affected by PD present an accompanying depression that is unrelated to the seriousness of the motor impairment). From a symptomatological standpoint, depression occurring in advanced age is characterized prevalently by somatic or anxiety symptoms, marked apathy, obsessive ruminations and sleep disorders, rather than by a marked drop in mood. Often the symptoms of late-onset depression mimic the cognitive deterioration typical of dementia (to the extent that we speak of depressive pseudodementia), posing major problems of differential diagnosis (11).
Although the biological bases of depression in the elderly are not yet known, it is possible to hypothesize, given the symptomatic overlap with the dementia picture and the high incidence of major depression in the neurodegenerative diseases, that it is a nosographic entity distinct from depression in adult age, and that cerebral aging constitutes a factor of vulnerability (if not indeed an etiological factor).
The new frontiers of research take into consideration the addition of omega-3 fatty acids and of antioxidant substances to the diet for the treatment of depression. It has been hypothesized that the omega-3 fatty acids have mood-stabilizing properties with a mechanism akin to that of lithium and valproic acid, reducing the turnover of arachidonic acid and modifying the transduction pathways of the neuronal signal. In their action on bipolar disorders, the omega-3 fatty acids largely resemble lamotrigine, that is, they seem to have both stabilizing and anti-depressant properties (12). Biochemical studies have shown that high-dose oral administration of omega-3 leads to their incorporation into membranes. The increase in the concentration of omega-3 in the membranes seems to suppress the signal transduction pathways associated with the production of inositol trisphosphate, the second messenger associated with numerous neurotransmitter systems, such as the serotoninergic one (5-HT2 receptor). Another proposed mechanism is calcium antagonism, by means of calcium channel block. Studies on this effect derive from the cardiological literature. Extrapolating the data on the cardiac mechanism to the CNS, an increased activity of the brain phospholipases can be supposed during manic phases, with an increased release of fatty acids as second messengers, which triggers a cascade of events culminating in the release of calcium from the cell stores (and the consequent activation of protein kinases, which in turn activate various enzymes, among them those dedicated to gene transcription). The blocking of the calcium channels by the omega-3 could damp down a hyper-activated process of signal transduction (anti-kindling?). The omega-3 fatty acids also directly inhibit protein kinase C (PKC), with an action akin to that of valproate. Lastly, the omega-3 fatty acids inhibit the production of pro-inflammatory cytokines.
The concomitant intake of antioxidant vitamins (vitamins C and E) would optimize the effect of the omega-3, preventing their oxidation (13). Vitamin E has been used at high pharmacological doses in the treatment of disorders such as Parkinson’s disease, Alzheimer’s disease and tardive dyskinesia. Clinical studies have shown that vitamin E brings benefits in the treatment of AD, but that it has no effect in slowing the progression of Parkinson’s disease or tardive dyskinesia (14).
Melatonin also seems to have a ‘scavenger’ activity towards hydroxyl radicals, besides its known role in regulating gonadal function and biological rhythms. This suggests that melatonin could interfere with the neurodegenerative processes that involve the formation of free radicals and the release of excitatory amino acids (15).
Depression and cardiac arrhythmias are both linked to the autonomic nervous system, as has been seen in measurements of heart rate variability.
The transport protein of serotonin and the areas promoting the transport of serotonin ligands are shared by the platelets and the cerebral neurons so that central as well as peripheral effects may be expected.
Platelet activation is significantly higher among the depressed patients both in the presence and the absence of heart disease.
All of this points to the existence of a common path to the ischemic event at both the cerebral and the cardiac level, by which depression increases mortality through a cerebral rather than a mental-behavioural route. Although routine screening for depression in primary care settings is controversial, the incidence and the effects of depression among heart patients and stroke survivors are persuasive arguments for screening these groups.
While post-MI depression has been well documented, depression as a risk factor for CHD remains controversial. Some studies have found a positive correlation between depression and the development of CHD in women alone, but other studies have found that depression was associated with an increase in CHD risk in men. Confounding variables such as physical function, hostility, anxiety and somatization bear on the evidence for a possible cause-and-effect relationship. The symptoms of depression themselves, such as fatigue, lack of interest in activities, increase or loss of appetite, psychomotor retardation or agitation, concentration disorders, low self-esteem, depressed mood and recurrent suicide attempts, can be precursors to CHD.
Vast and exhaustive scientific research has been performed on the relationship between depression and ischemic cardiovascular disease (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28), and it has also dealt with the study of the fatty acids evaluated in several bio-humoral fractions of the human organism (plasma phospholipids, cholesterol esters, erythrocytes, etc.), in order to identify the involvement of the different acidic components, in particular the saturated ones, in determining the atherogenetic phenomenon and those aspects of hemocoagulation that induce a thrombogenetic risk (29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40).
3. Biological plausibility of the platelet role between depression and cardiovascular disease - fragments from the scientific literature
…It is well known that platelets possess metabolic capacities for fatty acid synthesis and that they possess neurotransmitter receptors. These characteristics make the platelets, in the view of many researchers, an element akin to the neuron: with scientific simplification, a “fragment of circulating brain”. They differ substantially from the erythrocytes … (41)
…The platelets perform an important role not only in the hemostasis but also in the physiopathology of the ischemic coronary disease. Recent results suggest that the platelets are affected by different stress agents including the psychological ones and that the platelets offer an important advantage in the understanding of the neurophysiology of the various psychiatric disorders.
Research is described relating to the use of platelets as a ground for investigating the relations between stress and cardiovascular disease, apart from psychopharmacological research. There is evidence to propose circulating platelets as a model for bioaminergic neurons. There are many similarities between platelets and neurons relating to the metabolism of serotonin, and it is possible to extend this research model to other neurotransmitters such as dopamine, GABA, glutamate, etc. The reason for such similarities can be traced back to the common embryogenetic origin of the two different cells. Some modifications of the platelet functions have been observed in the psychiatric syndromes and the bond between coronary pathology, stress and platelet function is of interest for future research. Other studies might look to the thrombotic events such as stroke and vascular dementia and their association with stress agents using the platelets as search means or examine the association of certain risk factors of ischemic cardiopathy, with certain personality traits, using platelets as the variable key… (42).
…The comparison with the properties of human platelets and the serotoninergic synaptosomes may be useful as a model in the study of the transport, the metabolism and the release of serotonin by the serotoninergic neurons of the central nervous system (43).
…A marked reduction in the levels of serotonin has been found in patients with major depressive disorder but not in those with dysthymic disorders. These modifications may represent biochemical changes indicative of major depressive disorders and cannot be attributed to chronic anti-depressive treatment … (44).
The reduction in the serotonin levels in depressive patients could be traced back to their psychobiological distinction, which involves an abnormal metabolism of the biogenic amines in the brain… (45). The identification of the peripheral markers for the psychiatric illnesses is important if we want to implement an improvement in the diagnosis and the treatment of the diseases. The response of platelet intracellular calcium consequent to the neurotransmitter stimulation has been used as a peripheral marker of psychiatric illness. There is evidence of the appropriateness of extending the use of the platelets as a peripheral marker. The depressed patients show a rise in the basal platelet activation as compared with normal subjects and an elevated susceptibility to platelet activation could be the mechanism that makes depression a significant risk factor for cardiovascular and cerebrovascular disease… (46, 47).
…Fatty acids other than the omega-3 can interact with the metabolism of the eicosanoids and influence platelet function. For example, there is evidence that diets rich in unsaturated fatty acids such as linoleic and oleic acid can reduce the tendency to thrombosis by replacing arachidonic acid in the platelet phospholipids, diminishing the in vitro production of thromboxane A2 and platelet aggregation. Nevertheless, there is little evidence that platelet function, in vivo, is affected by these diets…(48)
There are results that show that the linoleic acid of the diet does not increase the level of arachidonic acid in the plasma and in the platelets besides not contributing in a persistent way to the prostaglandin biosynthesis that is increased by the intake of arachidonic acid with the Western diet… (49). An elevated intake of linoleic acid can be thought to be protective against ischemic stroke, possibly through a potential mechanism reducing arterial pressure, reducing platelet aggregation and increasing erythrocyte deformability… (50).
… Oleic acid has been shown to be a powerful inhibitor of PAF-induced platelet aggregation and serotonin secretion. Consequently, in order to understand the molecular mechanism of action of oleic acid, the effects of this free fatty acid have been investigated in many biochemical events associated with PAF-induced platelet aggregation.
… The decrease in the level of [32P] PIP and [32P] PIP2 determined by the oleic acid has been associated with an inhibition in the platelet aggregation induced by the PAF. These results suggest that the inhibition of the PAF response by the oleic acid may be one of the steps in signal transduction …
…Many literature reports suggest that olive oil can inhibit platelet function. This possible effect is of interest for two reasons: it can contribute to the apparently anti-atherogenetic role of olive oil and can invalidate the use of olive oil as an inert placebo in the studies on platelet function … (51).
…After the supplementation with olive oil the platelet aggregation and the release of A2 thromboxane were diminished, the content of oleic acid was considerably increased, and the content of arachidonic acid was significantly diminished. These data suggest that an excess of oleic acid displaces the incorporation of the arachidonic acid in the platelet phospholipids. … it is concluded that the olive oil supplement exerts an inhibitor effect on the various aspects of platelet function, “an effect that can reduce the risk of heart disease, although fish intake can also exert a protective effect…” (52).
…Low plasma levels of linoleic acid have a relevant negative effect on the long-term prognosis after myocardial infarction … (53).
…The polyunsaturated fatty acids, and principally linoleic acid, can have a substantial cardioprotective effect that is reflected in mortality. The quality of the dietary lipids seems more important than their quantity in the reduction of cardiovascular mortality in man… (54, 55).
4. The experimental design
Not believing the research performed on the omega-3 fatty acids in the various experimental conditions of the literature to be exhaustive, we oriented our research towards a complex tissue, in particular the platelets, which, given their structural and functional characteristics, seemed to us to better fit the working hypothesis of a mediating role between cerebral and cardiac phenomena.
With the experimental aim of providing better knowledge of the relationship between fatty acids, depression and ischemic cardiovascular disease, we recruited:
84 subjects (51 females and 33 males, mean age: 60.21, SD: ± 12.27) with a clinical diagnosis of Major Depression. All the patients were interviewed to confirm the diagnosis of Major Depression; the instruments used were: Clinical Global Impression (CGI), Symptoms Check List-90 (SCL-90), medical and pharmacological history, BMI, Structured Clinical Interview for DSM-IV (SCID-IV) (American Psychiatric Association 2000), and Hamilton Rating Scale of Depression (HRSD).
The severity of the depressive symptoms in the group of patients was determined by the 21-item version of the Hamilton Rating Scale of Depression (HRSD-21), Hamilton (1960).
50 subjects (17 females and 33 males, mean age: 68.00, SD: ± 9.50) with a diagnosis of Ischemic Cardiovascular Disease confirmed in the hemodynamic diagnostic phase.
60 subjects (38 males and 22 females, mean age: 33.97, SD: ± 12.40), apparently healthy, with no known or reported history of depression or cardiovascular disease.
5. Results of the study
All the ANNs tested gave essentially the same result. However, one type of ANN, known as Self-Organizing Map (SOM), gave superior information by allowing the results to be described in a two-dimensional plane with potentially informative border areas. A series of repeated and independent SOM simulations, with the input parameters being changed each time, led to the finding that the best discriminant map was that obtained by inclusion of the following three fatty acids: Linoleic Acid (C18:2 n-6), Arachidonic Acid (C20:4 n-6) and Palmitic Acid (C16:0) for the depressive condition and Oleic Acid (C18:1), Linoleic Acid (C18:2 n-6) and Arachidonic Acid (C20:4 n-6) for the ischemic condition (56, 57, 58, 59, 60).
The SOM is an unsupervised competitive-learning network algorithm, which was invented by Teuvo Kohonen in 1981–82. According to Kohonen et al. (61, 62, 63) the central property of the SOM is that it forms a nonlinear projection of a high-dimensional data manifold on a regular, low-dimensional (usually 2D) grid. In the display, the clustering of the data space as well as the metric-topological relations of the data items is clearly visible. If the data items are vectors, the components of which are variables with a definite meaning such as the descriptors of statistical data, or measurements that describe a process, the SOM grid can be used as a groundwork on which each of the variables can be displayed separately using grey-level or pseudocolor coding. This kind of combined display has been found very useful for the understanding of the mutual dependencies between the variables, as well as of the structures of the data set. In the context of this definition, a manifold refers to a topological space with well-defined mathematical properties. A particular strength of the SOM map displays lies in enabling relevant information to be 'found' rather than 'searched for'.
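As a rough illustration of the algorithm described above, the following is a minimal SOM sketch in plain Python. It is not the software used in this study: the grid size, the learning-rate schedule, the Gaussian neighbourhood and the synthetic three-component input vectors are all assumptions, chosen only to show the competitive-learning step (find the best-matching unit, then pull it and its grid neighbours towards the input).

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Assumption: nodes are initialised from randomly chosen training samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

# Synthetic, well-separated toy data (NOT the study data): two groups of
# three-component vectors scattered around two hypothetical centres.
rng = random.Random(1)
centre_a = (19.4, 14.1, 20.7)
centre_b = (16.7, 19.0, 17.9)
group_a = [[c + rng.gauss(0, 0.5) for c in centre_a] for _ in range(20)]
group_b = [[c + rng.gauss(0, 0.5) for c in centre_b] for _ in range(20)]

som = train_som(group_a + group_b)
bmu_a = best_matching_unit(som, centre_a)
bmu_b = best_matching_unit(som, centre_b)
print("BMU for centre A:", bmu_a, "| BMU for centre B:", bmu_b)
```

In this toy setting the two centres are mapped to different grid nodes, which is the kind of separation on a two-dimensional plane that the text refers to. Real data overlap far more, which is why mixed border areas can appear on the map.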
The fatty acid values of the three groups were submitted to the SOM, mixing healthy and pathological individuals and hiding the information on their pathological condition. As a result, the SOM separated the groups (Depressive vs. Normal and Ischemic vs. Normal), recognizing as similar the subjects belonging to the same population and as different those belonging to different populations. The group values are compared below.
|Fatty acids||Normal (average±SD)||Depressive (average±SD)||p|
|C16:0||20.68±2.15||17.92±4.462||< 0.01|
|C16:1||1.48±0.71||2.02±1.571||< 0.05|
|C17:1||0.80±0.540||0.45±0.267||< 0.01|
|C18:0||11.22±3.00||12.7±3.016||< 0.01|
|C18:1 n9||22.19±2.08||21.14±4.134||N.S.|
|C18:1 n7||1.82±0.64||1.89±0.870||N.S.|
|C18:2 n6||19.40±2.69||16.71±3.359||< 0.01|
|C20:3 n3||2.11±0.76||2.29±0.773||N.S.|
|C20:4 n6||14.06±2.41||19.03±3.839||< 0.01|
|C22:6 n3||2.09±0.80||1.49±0.802||< 0.01|
|Fatty acids||Normal (average±SD)||Ischemic (average±SD)||p|
|C18:1 n9||22.19±2.08||17.48±2.14||< 0.01|
|C18:1 n7||1.82±0.64||1.04±0.46||< 0.01|
|C18:2 n6||19.41±2.69||10.51±3.44||< 0.01|
|C18:3 n3||0.48±0.17||0.59±0.30||< 0.05|
|C20:3 n3||2.11±0.76||0.73±0.42||< 0.01|
|C20:4 n6||14.06±2.41||15.17±3.01||< 0.05|
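The text does not state which statistical test produced the significance levels reported with the group comparisons. Purely as an illustration, and assuming a two-sample comparison of means, the sketch below computes Welch's t statistic and its approximate degrees of freedom from summary values alone. The function name `welch_t` is hypothetical, and the group sizes (60 normal, 84 depressive) are taken from the recruitment description on the assumption that all subjects contributed to the reported averages.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2      # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Palmitic acid (C16:0): normal group vs. depressive group, reported averages and SDs.
# Group sizes are an assumption based on the recruitment figures.
t, df = welch_t(20.68, 2.15, 60, 17.92, 4.462, 84)
print(f"t = {t:.2f}, df = {df:.0f}")
```

A large absolute t value with this many degrees of freedom is consistent with a small p-value, but the sketch is not a reconstruction of the authors' analysis.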
An objective, clear and extremely economical diagnostic tool has thus been built which, starting from a simple blood test, offers a range of intensity beyond the simple healthy/diseased information.
Having noted this, we thought of putting together in a SOM the fatty acid triplets of the subjects with Major Depression (SOM-ADAM), of those with Ischemic Heart Disease (SOM-CAIN) and of the healthy subjects.
We have thus proceeded to the phase of SOM training, which, in the end, provided its self-organised map.
The distribution of the 144 subjects on the SOM for depression allowed us to identify four areas: two specific ones (exclusively normal and exclusively pathological) and two mixed ones with different concentrations of pathological and apparently normal subjects. The two intermediate areas (yellow and orange) were initially interpreted as different possible stages of the depressive disease and, for the normal subjects, as an undiagnosed pathological condition, as also described in the literature. The capacity of this mathematical instrument to express the pathology on a scale of intensity soon became clear. As regards the hypothesis that the positioning of the cases was also the expression of a different chemico-physical condition of the membrane, we turned our attention to the possibility of identifying such a condition by considering only Arachidonic, Linoleic and Palmitic acids. Among the characteristics observed in these fatty acids, a significant correlation emerged (r = 0.66, p < 0.001) between the sum of the 2 unsaturated acids (arachidonic and linoleic, AA+AL) and the saturated acid (palmitic, AP). In other words, the sum of the three fatty acids can be considered, to a fair degree of approximation, constant and equal to 53.33 ± 3.43 (average±SD).
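As a quick arithmetic check of the near-constant triplet sum, the sketch below adds up the group means of Palmitic, Linoleic and Arachidonic acid taken from the Normal vs. Depressive table. It works on the published means only (the sum of means equals the mean of sums), not on individual subjects; the variable names are illustrative.

```python
# Group means (percent of total fatty acids) copied from the
# Normal vs. Depressive table; individual subject data are not available here.
means = {
    "normal":     {"C16:0": 20.68, "C18:2 n6": 19.40, "C20:4 n6": 14.06},
    "depressive": {"C16:0": 17.92, "C18:2 n6": 16.71, "C20:4 n6": 19.03},
}

# Value reported in the text for the sum of the three acids (average and SD).
REPORTED_MEAN, REPORTED_SD = 53.33, 3.43

triplet_sums = {group: round(sum(acids.values()), 2) for group, acids in means.items()}

for group, total in triplet_sums.items():
    within_one_sd = abs(total - REPORTED_MEAN) <= REPORTED_SD
    print(f"{group}: AP + AL + AA = {total:.2f} (within one SD of the reported value: {within_one_sd})")
```

Both group sums fall within one standard deviation of the reported value, although the individual acids differ markedly between the groups.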
Further analysis has led us to identify an effective index (B2) that relates the saturation characteristics of the set of the three fatty acids (57).
That index has been identified in the following way:
That coefficient was calculated for all the 144 subjects studied. This has allowed us to further classify the subjects on the grounds of this index with statistical significance (comparison between the indices of the normal and the pathological subjects).
|1||Palmitic Acid||C 16:0|
|2||Linoleic Acid||C 18:2|
|3||Arachidonic Acid||C 20:4|
Ai = percentage of i-th Fatty Acid
mwi = molecular weight of i-th Fatty Acid
mpi = melting point of i-th Fatty Acid
We then asked the SOM to express the B2 expected at every point of the map.
6. Cardiovascular ischemic disease
As the fatty acids identified by the network do not represent the majority of the fatty acids in the spectrum, we tried to identify an index expressing the saturation level that could most meaningfully be traced back to the logic of CAIN (Coronary Artery Ischemic Neural network).
The search for such an index was performed by assessing various ratios between saturated and unsaturated fatty acids, labelling the results with the initials reported below (SI = Saturation Index, Total and Classic; B2; B4; Btot). Some of these are already known in the literature, others are defined by the Authors.
In this way we can define:
SI_TOT = (C14:0 + C16:0 + C18:0) / (C16:1 + C18:1 n9 + C18:1 n7 + C18:2 + C18:3 + C20:3 + C20:4 + C22:6)
SI_CLASSIC = C18:0 / C18:1 n9
B2 = f(C16:0, C18:2, C20:4)
B4 = f(C16:0, C18:0, C18:1, C20:4)
B_TOT = f(C14:0, C16:0, C18:0, C16:1, C18:1 n9, C18:2, C18:3, C20:4, C22:6)
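The two saturation indices given in closed form above can be computed directly from a fatty acid profile. The following is a minimal sketch: the function names and the dictionary layout are illustrative, the example profile uses the Normal group means from the tables, and the C14:0 value is a hypothetical placeholder because it is not reported in the text.

```python
SATURATED = ("C14:0", "C16:0", "C18:0")
UNSATURATED = ("C16:1", "C18:1 n9", "C18:1 n7", "C18:2", "C18:3",
               "C20:3", "C20:4", "C22:6")

def si_total(profile):
    """Total Saturation Index: saturated acids over unsaturated acids."""
    return sum(profile[k] for k in SATURATED) / sum(profile[k] for k in UNSATURATED)

def si_classic(profile):
    """Classic Saturation Index: stearic (C18:0) over oleic (C18:1 n9)."""
    return profile["C18:0"] / profile["C18:1 n9"]

# Normal group means from the tables; C14:0 is NOT reported in the text,
# so 1.0 is a hypothetical placeholder used only to make the example run.
example_profile = {
    "C14:0": 1.0,
    "C16:0": 20.68, "C18:0": 11.22,
    "C16:1": 1.48, "C18:1 n9": 22.19, "C18:1 n7": 1.82,
    "C18:2": 19.40, "C18:3": 0.48, "C20:3": 2.11,
    "C20:4": 14.06, "C22:6": 2.09,
}

print(f"SI_TOT     = {si_total(example_profile):.3f}")
print(f"SI_CLASSIC = {si_classic(example_profile):.3f}")
```

The B indices are left out because the text gives them only as functions f(...) without an explicit formula.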
As can be observed in Table 4, all the parameters that express the saturation level are significantly and strongly correlated to each other. For the sake of simplicity of calculation and use, we therefore decided to use the SI Classic (hereinafter SI).
This index, like B2 in the case of depression, has been fed into CAIN and, as can be seen (Figure 5), it sorts the two groups into the pathological and normal areas in a way that corresponds to the classification made by the ANN. For that purpose, the cut-off value proposed for the SI is 0.80.
The SI increases from the bottom to the top of the map, consistently with CAIN. While the area of disease shows a very gradual distribution, the area of normality appears less homogeneous. This can probably be traced back to three observations:
The oleic acid level, wherever it lies within the area of normality, is a guarantee of normality, even at the lowest cut-off values;
The levels of oleic acid in the area of normality must in any case be weighed against the distribution of the other fatty acids that are CAIN markers. This supports the configuration observed.
In the area of pathology, the gradual distribution seems to suggest a dominant role of oleic acid, irrespective of the values taken by the other fatty acids.
It is also interesting to see how the characterisation of oleic acid is conditioned by its relationship with a fatty acid (stearic) that is considered neither by the conventional statistical methods nor by the ANN.
7. CAIN and Framingham (The Framingham score)
The Framingham score (61) is one of the best known diagnostic instruments for quantifying the risk of heart disease. Constructed on a statistical basis, it is made up of 5 items relating to:
Systolic Blood Pressure;
For each item a score is assigned, according to the value relating to the subject being studied, taking account of the sex (Fig. 6).
The 10-year risk percentage thus obtained can be further classified, according to what has been suggested by Anderson et al. (64) following the Table reported below.
|0 to 5%||low risk|
|5 to 10%||moderate risk|
|10 to 20%||average risk|
|20 to 40%||high risk|
|> 40%||very high risk|
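The classification in the table above can be written as a small lookup function. This is a sketch only: the function name is illustrative, and because adjacent classes in the table share their boundary values, the code assumes that a boundary value belongs to the lower class (so exactly 20% is "average risk" and exactly 40% is "high risk", consistent with the "> 40%" row).

```python
def anderson_risk_class(ten_year_risk_pct: float) -> str:
    """Map a 10-year risk percentage to the risk class of the table above.

    Assumption: a value lying exactly on a shared boundary is assigned
    to the lower class.
    """
    if not 0 <= ten_year_risk_pct <= 100:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_pct <= 5:
        return "low risk"
    if ten_year_risk_pct <= 10:
        return "moderate risk"
    if ten_year_risk_pct <= 20:
        return "average risk"
    if ten_year_risk_pct <= 40:
        return "high risk"
    return "very high risk"

for risk in (3, 8, 15, 25, 45):
    print(f"{risk}% -> {anderson_risk_class(risk)}")
```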
Now, let’s suppose we have to examine an ideal, hypothetical subject, who is in excellent condition: a non-smoker, perfect levels of Total Cholesterol, HDL, systolic pressure, etc.
Although his/her health conditions are excellent, according to Framingham he/she nonetheless has a risk factor: age! In other words, ageing in itself, irrespective of the conditions of health, constitutes a risk factor. The graph of the Framingham score of the ideal subject in relation to age is shown in Fig. 8 (in blue). It should be pointed out that, as the score differs depending on the sex, the graph has been constructed using the mean value. Moreover, for the sake of clarity, the linear interpolation of the curve has been made to overlap with that of the extrapolated curve (in red in Fig. 8).
Let’s now consider the high risk and the very high risk levels according to Anderson’s classification. They are defined by a ten-year risk over 20% (see Table 5). That percentage risk, read backwards in Framingham’s conversion table, corresponds to a value of 22.5 Framingham points for the female population. It should be pointed out that we have chosen the female population because it is characterised by higher scores. It is worth underlining that, in any case, choosing the male population or the mean value of the two sexes as the reference does not change the reasoning that follows quantitatively and leads to the same conclusion. By plotting the Framingham risk against age together with the high risk threshold, and thus highlighting the risk area, we obtain the graph reported in Figure 9.
8. Saturation Index and CAIN
In order to find a good index that describes synthetically the classification made by CAIN, several paths have been tried. The most effective one has turned out to be that of the saturation indices, as reported above. In particular, the literature (65, 66) has suggested the Saturation Index (SI), used particularly in the oncology field, which is defined in the following way:
SI = C18:0 / C18:1 n9 (stearic acid / oleic acid)
It is possible to express the SI levels corresponding to each point of the CAIN map and to describe them by means of a level-curve graph. In this way it is simpler to grasp the dynamics of SI in CAIN (Fig. 10). It should be remembered that CAIN is constructed using Oleic, Linoleic and Arachidonic Acid, while the SI value is calculated from Oleic Acid and Stearic Acid.
By trying to evaluate the SI trend in CAIN (Fig. 11) we can see that it takes on minimum values in the lower part, corresponding to the “Normal” cluster of CAIN. From the bottom upwards, the value grows gradually until it reaches the maximum values in the upper part of the map, corresponding to the area of pathology, according to the CAIN clustering.
9. The SI of some populations
We have evaluated the mean values of SI in some subject populations. The first two were:
“YOUNG”: Young sports people, supposedly healthy [n= 45; males = 35, females = 10; age = 22.7 ± 3.7 (m±SD)]
“ADULT”: Adults, supposedly healthy [n= 60; males = 38, females = 22; age = 34.0 ± 12.4 (m±SD)]
By building a graph with age on the y-axis and the SI value on the x-axis, we obtained, for the two populations cited, the curve reported in Fig. 12.
Then, a third population was added:
“DEPRESSIVE”: Subjects with clinical diagnosis of Major Depression [n=84; males = 33, females = 51; age = 60.2 ± 12.3 (m±SD)]
The mean of the third group lies exactly on the extension of the segment that joins the “Young” and the “Adult” populations (Fig. 12). In other words, the means of the three populations are positioned (to a good degree of approximation) on the same line. It should be borne in mind that, while the “Young” and “Adult” populations are healthy subjects, the “Depressive” population is composed of subjects with Major Depression who are, however, non-pathological from the cardiovascular standpoint. Hence, a fourth population was added:
“ISCHEMIC”: Subjects with diagnosis of ischemia [n= 50; males = 33, females = 17; age = 68.0 ± 9.5 (m±SD)]
Figure 15 reports the resulting graph. It is evident that the “Ischemic” population is not in line with the other populations examined, being characterised by a far higher SI value.
10. SI, CAIN and the Framingham Score
In fact, by matching the age curve obtained in the SI graph with the age curve obtained from the Framingham score, we can see that the “Ischemic” population coincides with the “Very high risk” area of the Framingham score. The two graphs seem to reveal a certain similarity, as yet not well defined.
But what is the cut-off between healthy and pathological (from the cardiopathological point of view) in the SI graph?
To answer this question it is necessary to ask CAIN for help. It is important to evaluate which SI value corresponds to the subdivision line between the 2 clusters (“Ischemia” and “Normal”) that CAIN has formed (Figure 18).
By overlapping the CAIN clustering with the SI level curve graph (Figure 19) we can see that the SI value corresponding to the separation line of the two CAIN areas is indeed SI = 0.8.
Now it is sufficient to mark the value SI = 0.8 on the SI graph under construction. This will be the cut-off that separates the pathological from the non-pathological subjects according to CAIN. Indeed, Figure 19 clearly shows that values lower than 0.8 in CAIN characterise healthy subjects, whereas values above 0.8 identify pathological subjects.
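The cut-off rule just described can be sketched in a few lines. The function names are illustrative, and the treatment of a value exactly equal to 0.8 (assigned here to the healthy side) is an assumption, since the text only speaks of values lower and higher than the cut-off. The example uses the Normal group means for stearic and oleic acid from the tables; a ratio of group means is not the same as the mean SI of the group, so it serves only as an illustration.

```python
SI_CUTOFF = 0.80  # cut-off proposed in the text for the Saturation Index

def saturation_index(stearic_pct: float, oleic_pct: float) -> float:
    """SI = C18:0 / C18:1 n9 (stearic over oleic)."""
    return stearic_pct / oleic_pct

def cain_side(si: float, cutoff: float = SI_CUTOFF) -> str:
    """Return the side of the CAIN separation line suggested by the SI.

    Assumption: a value exactly equal to the cut-off counts as 'healthy'.
    """
    return "pathological" if si > cutoff else "healthy"

# Normal group means from the tables: C18:0 = 11.22, C18:1 n9 = 22.19.
si_normal_means = saturation_index(11.22, 22.19)
print(f"SI = {si_normal_means:.3f} -> {cain_side(si_normal_means)}")
```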
Once the areas of pathology and non-pathology have been identified in the SI graph, it is juxtaposed with the one obtained from the Framingham score (Figure 20). The match is evident.
In order to better appreciate the overlap between the two graphs, we propose (Figure 21) the overlap obtained by making the two cut-off lines coincide. The two curves overlap in a fully coherent way.
By overlapping the two graphs, so that the SI cut-off value matches that of the Framingham Score, we obtain a match of the two underlying curves (red and green).
The following figure sums up all the positions of the individuals investigated.
What has been illustrated shows, at least as a first approximation, that the results obtained by CAIN 3 are absolutely compatible with the Framingham Score. Apart from corroborating the results obtained with CAIN even more, this obliges us to make a comparison between the two methods.
The Framingham score, constructed on a statistical basis, offers as a result a pure number that quantifies a risk. Actually, it is one-dimensional. On the other hand, CAIN shows the result on a two-dimensional map, obviously capable of giving more information. In actual fact, for CAIN we have to speak of multidimensionality because once the position of a subject has been identified, there is a great deal of information available. Without citing it all, suffice it to think that it is possible to know if any ischemic pathology is characterised by the low level of oleic or linoleic acid. It is possible to know its SI value straight away and, at least at a first approximation, the Framingham value. In other words, once the coordinates of the subject have been identified, it is possible to decide on which plane to observe it: the SI plane, the fatty acid plane, for example of arachidonic acid, etc. In any case, what is certain is that CAIN offers much more information than the Framingham. Suffice it to think, for example, that for a CAIN value one can go back to its Framingham value (albeit approximate). If, on the contrary, we have the Framingham score we can, in no way, position the subject in CAIN.
These considerations raise a question: is it plausible to think of CAIN as a new Framingham?
Novel markers for ischemic heart disease are under investigation by the scientific community at international level.
Our work focuses on a specific viscosity condition of the platelet membrane fatty acids, which is linked to molecular factors, such as serotonin and G proteins, that are involved in vascular biology.
A suggestive hypothesis is the possibility of using platelet membrane viscosity, in relation to serotonin or, indirectly, to the fatty acid profile, as an indicator of ischemic risk.
In the case of biological membranes we use the terms “fluidity”, “stiffness”, “permeability”, “functionality” and “stickiness”, which are related to or connected with biological effects of considerable importance.
Fluidity and viscosity are two terms used in physics with a specific meaning: viscosity is a dynamic property of matter, defined as the resistance to sliding between two adjacent fluid layers in a real system treated as a stack of superimposed fluid layers (in slow linear motion), and it can vary with temperature for the same molecules; fluidity is its opposite.
Rigidity, permeability and function are, however, characteristics of the membrane, and these terms are used to describe the physical and biological behavior of the membrane in the physiology of the cell. The viscosity of the membrane is, of course, related to the composition of the fatty acids that make up the lipid bilayer (membrane folds). These folds are usually very close to each other: the distance may be very small in the case of saturated fatty acids and tends to increase when these are replaced by unsaturated fatty acids, the more so the more unsaturated they are.
Platelets take up serotonin from plasma through serotonin transporters; serotonin is then secreted from the platelet dense granules during platelet activation, playing a role in platelet aggregation and in the vasoconstriction of the surrounding blood vessels. Recent studies suggest that intracellular serotonin may also play a role in platelet activation through covalent linkage to small G proteins, activating G protein signaling pathways and stimulating platelet aggregation.
Total serotonin levels and platelet counts have been found to be significantly higher among patients with coronary artery disease.
Independently of the SOM results, across all the fatty acid profiles we have investigated (about 350 subjects), three fatty acids (Palmitic, Linoleic and Arachidonic) unexpectedly showed a constant total amount (53.33 ± 3.43), representing the largest share of platelet membrane fatty acids in all cases studied.
If we consider this fatty acid triplet instead of the one that allowed the classification of the ischemic subjects, a further consideration can be made.
It means that, for the ischemic subjects too, we can calculate a new index (B3) analogous to B2, as explained above.
|Fatty acid||MP/Mwt||Normal subjects (average±SD)||B3 index||Depressive (average±SD)||B3 index||Ischemic (average±SD)||B3 index|
|C 18:2 n6||-0.018||19.40±2.69||-0.348||16.71±3.359||-0.301||10.51±3.44||-0.189|
|C 20:4 n6||-0.164||14.06±2.41||-2.306||19.03±3.839||-3.121||15.17±3.01||-2.488|
As a result we obtained an index that expresses, on the basis of fatty acids detected, a coefficient of viscosity.
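The per-acid "B3 index" entries in the table above appear to correspond to the group mean percentage multiplied by the MP/Mwt ratio (melting point over molecular weight). The sketch below checks this reading against the tabulated numbers. The multiplicative form is an inference from the table, not a formula stated in the text, and the names used are illustrative.

```python
# MP/Mwt ratios and group means copied from the table above.
MP_OVER_MW = {"C18:2 n6": -0.018, "C20:4 n6": -0.164}

GROUP_MEANS = {
    "normal":     {"C18:2 n6": 19.40, "C20:4 n6": 14.06},
    "depressive": {"C18:2 n6": 16.71, "C20:4 n6": 19.03},
    "ischemic":   {"C18:2 n6": 10.51, "C20:4 n6": 15.17},
}

# "B3 index" column values as printed in the table.
TABULATED = {
    "normal":     {"C18:2 n6": -0.348, "C20:4 n6": -2.306},
    "depressive": {"C18:2 n6": -0.301, "C20:4 n6": -3.121},
    "ischemic":   {"C18:2 n6": -0.189, "C20:4 n6": -2.488},
}

def contribution(percentage: float, mp_over_mw: float) -> float:
    """Assumed per-acid term: percentage of the acid times its MP/Mwt ratio."""
    return percentage * mp_over_mw

max_gap = 0.0
for group, acids in GROUP_MEANS.items():
    for acid, pct in acids.items():
        value = contribution(pct, MP_OVER_MW[acid])
        gap = abs(value - TABULATED[group][acid])
        max_gap = max(max_gap, gap)
        print(f"{group:10s} {acid}: computed {value:.3f}, table {TABULATED[group][acid]:.3f}")

print(f"largest difference from the table: {max_gap:.4f}")
```

Under this assumption all six entries agree with the table to within 0.002; the small residual differences are compatible with the MP/Mwt ratios having been rounded to three decimals in the table.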
The result is consistent with what is known about membrane viscosity, especially in platelets, in the conditions investigated (normal, ischemic and depressive).
This could lead to the hypothesis that the ischemic risk can be evaluated by considering the concentration of each fatty acid within the same, constant total quantity.
As shown in Table 1, the B3 index is significantly higher (about 30%) in ischemic than in normal subjects, and 3 times higher than in depressive subjects.
Given the properties that link membrane viscosity to platelet serotonin uptake, and given the role of serotonin in coronary artery disease, the evaluation of this chemical-physical characteristic should be used to forecast the ischemic risk, in agreement with the experimental result obtained through the SOM in a group of subjects with Ischemic Heart Disease.
The Issue of Neural Networks
(Discussion with Kary B. Mullis, Nobel Prize laureate), September 21, 2007
Let’s try this one more time, Massimo, because somewhere there must be a disconnect in our dialogue on this business of an un-trained neural network being able to spot the likelihood of coronary heart disease in a set of patients, where the only information given to the network, is the concentration of three lipids on their platelets, let’s just call the lipids 1, 2 and 3.
What I understand is that you are demonstrating that in the absence of a “training set” your program can pick the patients who have a high probability for a coronary problem just from these values.
By a training set I mean that the values for 1, 2 and 3, paired with an independent diagnosis of a coronary artery problem are presented to the computer for a certain number of your initial group. You are claiming that a “training set” is not necessary.
How, I ask, could that possibly work?
Now I do realize that lipid composition on platelets could easily have something to do with coronary artery problems, (and clinical depression). I don’t disbelieve that they could. Nobody I ever heard of predicted that, but when you told me it was true, I accepted it. But when you explained to me that your program had discovered this relationship with no reference to some external standard, no training set, I was incredulous and still am.
I simply claim that without some additional data, like a training set of data points wherein the lipid variables were paired with clinical outcomes or some similar KNOWN variable relevant to coronary artery problems, (or a super-exceptional knowledge of physiology), the neuronal net, which is a mathematical object, not an oracle, could not know anything except 1, 2, and 3 (the inputs) about the patients.
In my experience, usually, the inputs to neuronal networks, trained with known data, are based on something like a thousand RNA concentrations from 100 patients, such that humans would have trouble seeing the connections, thus the need for the computer in the first place. In fact if the relationship that they discover is as simple as some mathematical expression among three numbers, then everyone is surprised but the computer program is no longer necessary. The purpose of the computer was to find those particular three parameters out of the thousand available. The analysis of a particular clinical finding in that case can, thereafter, be done by a calculator. Something like this happened at a company I was consulting for in Savannah, wherein a very uncomplicated solution appeared from a computer program called a support vector machine regarding the differential diagnosis of acute lymphocytic leukemia and chronic myelogenous leukemia. The input had consisted of several thousands of RNA levels from several hundred patients. After the result had appeared it did not require a computer to make the diagnosis. A human could look at just two relevant RNA levels, and make the call.
Until then, which two, or whether there were only two RNA levels that held the critical data was not known. The support vector machine was instructed (trained) by clinical data, not simply the RNA levels. That’s why it worked.
Computers are patient and capable of tedious calculations, but they are not capable, of ever telling you something that given great time and near-infinite patience, you could not have worked out yourself on a calculator. There is no discontinuity between classical mathematics and what is referred to as Turing Machine-like universal calculators. They are just fast, very fast, but not infinitely fast, or perhaps more importantly here, wise. What this means is that we understand how they work. There’s no mystery, only speed.
So, if your program can always predict coronary artery conditions from three numbers representing the composition of three different lipids on platelets, without ever having been trained on a set of these variables paired with some clinical indicators, there is no way that a philosopher of science could legitimately say that you had not discovered something that was unexplainable, but clearly useful. However, its validity is purely based on induction, that is, it keeps working, unless Lucio can explain how it happened, and the process can be adapted to other applications. Most scientists, on the other hand, and I have to admit in this case that I fall into the latter category, would say you are damned lucky or you are missing something important.
What you are doing isn’t in the commonly accepted sense, scientific. It is lacking in the essential quality of being largely explainable as a consequence, however subtle, of known facts, and therefore serving as a guide for other scientists to develop similar methods.
If Lucio has discovered a new principle of neural nets that allows them, independently of a training set, to distinguish healthy coronary arteries from unhealthy coronary arteries on the basis of three numbers, which to a non-biologically informed computer, are only numbers, that principle overshadows this particular use of it by a revolutionary leap, and deserves to be published not only in the appropriate medical journal, but in a computer science journal as well.
Given that this is the case: some new principle of neural network programming has been discovered and it has found ratios of three lipids that can be used to predict future cardiovascular problems, then congratulations are in order, BUT even in that case, I don’t understand why the program is still necessary to make the diagnosis.
In the case I mentioned earlier regarding leukemia, once the relevant genes had been identified, the computer program was no longer necessary.
There is one explanation that I can imagine to account for the facts, but it brings into question your judgement and that of Lucio. Since there were only three variables being considered, it is not impossible that the relationship between them, which correlates with coronary heart disease, is simple enough that it could have been discovered empirically without the use either in the beginning or especially later of a neural net. In fact, how complicated can be the relationship between three numbers? What is the computer doing? Why didn’t you compare by eye the three numbers in light of who was a coronary artery risk patient (which as you explained to me from the Framingham data was mostly just the age of the patients.)
Was something preventing you from knowing their potential coronary artery status prior to the computer study, which is not now preventing you from knowing that, and if not, then how do you evaluate your present results? Either you know now the coronary artery risk and can compare it to what your computer says, or you don’t. If you do know it now then you can use it to validate your result. If you know it now, when did you learn it? Were you unaware of it before the computer study?
Dear Dr. Mullis,
How are you? And what about Linda and Nancy? Everybody is ok, here in Italy.
Well, let’s come to the point. Now I clearly understand (I hope) the misunderstandings between you and Massimo. I’ll try to explain.
“Let’s try this one more time, Massimo, because somewhere there must be a disconnect in our dialogue on this business of an un-trained neural network being able to spot the likelihood of coronary heart disease in a set of patients, where the only information given to the network, is the concentration of three lipids on their platelets, let’s just call the lipids 1, 2 and 3.
What I understand is that you are demonstrating that in the absence of a “training set” your program can pick the patients who have a high probability for a coronary problem just from these values. By a training set I mean that the values for 1, 2 and 3, paired with an independent diagnosis of a coronary artery problem are presented to the computer for a certain number of your initial group. You are claiming that a “training set” is not necessary.
How, I ask, could that possibly work?
Now I do realize that lipid composition on platelets could easily have something to do with coronary artery problems, (and clinical depression). I don’t disbelieve that they could. Nobody I ever heard of predicted that, but when you told me it was true, I accepted it. But when you explained to me that your program had discovered this relationship with no reference to some external standard, no training set, I was incredulous and still am.
I simply claim that without some additional data, like a training set of data points wherein the lipid variables were paired with clinical outcomes or some similar KNOWN variable relevant to coronary artery problems, (or a super-exceptional knowledge of physiology), the neuronal net, which is a mathematical object, not an oracle, could not know anything except 1, 2, and 3 (the inputs) about the patients.”
Kary, you are right, we had to start from a “training set”. Maybe this is the main point of the misunderstanding. It’s simply a problem of words, I believe. Make the call.
Of course we started from a “training set”. For Ischemia, it is made up of 60 healthy subjects versus 50 patients with a definite diagnosis of Ischemia. So, we had a database of 110 subjects. For each subject, we had his fatty acid pattern (11 variables) paired with “an external variable”: his health status related to Ischemia, I mean, whether the subject is pathologic or healthy.
So, we absolutely started from a “Training Set”.
I think the misunderstanding occurred when we talked about the mathematical method, the training process, in particular.
Many mathematical methods, ANNs in particular, are classified into 2 big families:
Supervised ANNs and Unsupervised ANNs, according to the training process they use.
1) ANNs, using a supervised training process, need the complete training set, that is, all variables plus the “external variable”. For example, the Multi-Layered Perceptron (usually using the Back propagation algorithm) is a supervised ANN, probably the most common. It studies the Data Base, “learning” the features of each subject, according to his “external variable”, which it knows. Mathematically, it builds an n-dimensional error surface linking all variables involved to the “external variable”. It’s called “supervised” because it corrects itself reducing, in an iterative way, the global error knowing the correct result, as if “a teacher” would correct it, at each time.
2) ANNs using an unsupervised training process, such as the Self Organizing Map (SOM), just need all variables, without the “external” one.
This is because they use a different approach: they are not trained to find the features associated with the different values of the “external” variable. They just look at the data and try to put similar subjects together without considering the “external variable” (pay attention, SOMs are different from the common “cluster analysis”).
A SOM is able to do so by comparing, in a particular way, all the subjects of the database (if you are interested, I attach an appendix to this paper where I try to explain how a SOM works, but, please, read all of this paper first).
I think it could be clearer following this example.
Suppose you want to build up a system able to recognize handwritten characters. You probably start from a Data Base made of characters written by different humans (of course with different handwritings). In this case, the “external variable” is the real alphabetical letter the handwritten character represents.
You could build this system using a statistical “supervised” method. But you can also use a SOM. In this case, you have to show the SOM the different handwritten characters, MIXED AND WITHOUT TELLING WHICH ONES REPRESENT “A”, “B”, “C” AND SO ON. You just hope the Net will set different characters in different areas, putting the handwritten characters of “A” near to each other, and far from the handwritten characters of “B”, “C”, and so on. If (as usually happens), once the SOM is trained, all “A” are near, all “B” are near and so on, the SOM is ready: it mathematically realizes that all handwritten “A” are similar and puts them near to each other. The SOM realizes that “A” are different from “B” and puts all handwritten “B” far from “A” but, again, near to each other. I mean: the “external” variable in this case is the alphabetic letter linked to each handwritten character. The SOM did not know this when it was training, nor when it was producing the final map. Only once the map is done does an external human observe that all handwritten characters linked to the alphabetic letter “A” are together! And it’s the same with all other letters. Only now does the human divide the map into clusters, one for each letter (usually using a well known mathematical method such as the “Voronoi clusterization”).
Once the SOM is trained, if we give it a new handwritten character, it will be mapped near to the similar characters, very probably in the correct cluster.
So the whole training set (including the “external variable”) is, in the end, absolutely necessary. It is just not needed for the SOM training itself.
By the way, common commercial OCR (Optical Character Recognition) software usually uses exactly this method. Actually, a lot of common commercial software uses this kind of ANN for many other purposes as well.
Let’s return to Ischemia.
We had a Training set of 110 subjects. We gave the SOM the 110-subject database, without saying who was pathologic and who was not (exactly as done with the handwritten characters). We just told the SOM the 3 fatty acid values.
Once the SOM gave us the result (I mean the map), we just observed that the pathological subjects were all mapped at the top while the healthy subjects were all mapped at the bottom. But we observed it, not the SOM, which does not know their pathological status. I mean: only once the net had drawn its answer, I mean the map, did we colour the healthy subjects green and the pathologic ones red, and see the result. Since all pathologic subjects are in one cluster, while the others are on the opposite side, we can say that:
all pathologic subjects are similar to each other, and different from normal, at least from the 3 fatty acids involved point of view. (statement 1)
This last statement is absolutely scientific. In fact:
The SOM is a very common neural network, deeply known and widely used even in common commercial software. It appeared to the scientific community more than 20 years ago (T. Kohonen, 1981). Actually, in the biomedical field, SOMs have been widely used for at least 10 years and, of course, they are absolutely accepted.
So, the protocol used is not a mystery, nothing unexplainable, but well known and accepted for years.
Maybe it’s not so popular as a logistic regression but, at least here in Italy, more popular than Support Vector Machines.
If anyone, everywhere and whenever he wants, builds up a SOM or buys it (there are hundreds of commercial software developing SOMs) and trains it with our training set (of course, without the information about the pathological status), he will find that pathologic subjects go to a side and healthy, opposite. Probably, he will not find exactly our map, but the same result for sure. In fact, a SOM, like almost all ANNs, depends on random starting weight and on some other parameters: as you know, being familiar with ANNs, each training process is always different from another! In any case, if someone gives the same SOM parameters we used (to tell you the truth, they are those suggested by the literature, so they are the first that a computer scientist would try…) he will have exactly the same map.
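The dependence on random starting weights can be illustrated with a small sketch. This is a deliberately tiny one-dimensional SOM (a chain of cells) written only for this example; the data are synthetic, and the cell count, schedules and seeds are assumptions, not the settings used in this work. It shows that fixing the seed (and therefore the starting weights and the order of presentation) reproduces the trained map exactly, while a different seed gives a numerically different map.

```python
import math
import random


def train_chain_som(data, n_cells=8, epochs=20, seed=0):
    """Train a 1-D SOM; the seed fixes starting weights and sample order."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_cells)]
    samples = list(data)
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)
            sigma = max(2.0 * (1.0 - frac), 0.5)
            # Best matching unit: the cell closest to the sample.
            bmu = min(
                range(n_cells),
                key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
            )
            # Pull the BMU and its chain neighbours towards the sample.
            for i in range(n_cells):
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    weights[i][k] += lr * h * (x[k] - weights[i][k])
            step += 1
    return weights


# Thirty synthetic "subjects" with three made-up values each.
data_rng = random.Random(123)
subjects = [[data_rng.random() for _ in range(3)] for _ in range(30)]

run_1 = train_chain_som(subjects, seed=1)
run_1_again = train_chain_som(subjects, seed=1)
run_2 = train_chain_som(subjects, seed=2)

print("same seed, same map:", run_1 == run_1_again)
print("different seed, same map:", run_1 == run_2)
```

Two maps trained from different seeds can still tell the same story (the same subjects grouped together) even though the cell positions and weight values differ.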
As a result of statement 1, we can add that the 3 fatty acids of the pathologic population are, in some way, different from those of the healthy population.
Another result is that, as in the handwritten character example, we can map a new subject whose status is unknown and, according to his position on the map, that is according to the cluster he reaches, judge whether he can be considered healthy or pathologic. I.e.: if his position on the map is in the middle of all the other pathologic subjects, he can be suspected to be pathologic, or at least it is very probable.
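The "colouring" step and the mapping of a new subject can be sketched as follows. Everything numeric here is hypothetical: the small codebook merely stands in for an already trained SOM (pathologic-like profiles at the top, healthy-like at the bottom), the three values are not real fatty acid measurements, and the rule for empty cells (borrow the colour of the nearest coloured cell on the grid) is an assumption of this sketch rather than the clustering procedure actually used.

```python
from collections import Counter

ROWS, COLS = 4, 3
TOP = (0.2, 0.5, 0.8)     # hypothetical "pathologic-like" profile
BOTTOM = (0.7, 0.4, 0.3)  # hypothetical "healthy-like" profile


def lerp(a, b, t):
    """Linear interpolation between two profiles."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


# Stand-in for a trained SOM: one 3-value weight vector per cell.
codebook = {}
for r in range(ROWS):
    for c in range(COLS):
        base = lerp(TOP, BOTTOM, r / (ROWS - 1))
        codebook[(r, c)] = tuple(v + 0.02 * (c - 1) for v in base)


def bmu(x):
    """Cell whose weight vector is closest to the subject's values."""
    return min(
        codebook,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(codebook[cell], x)),
    )


# Known subjects; their status is used only now, after "training".
known = [
    ((0.22, 0.50, 0.79), "pathologic"),
    ((0.25, 0.49, 0.75), "pathologic"),
    ((0.35, 0.47, 0.65), "pathologic"),
    ((0.68, 0.40, 0.31), "healthy"),
    ((0.60, 0.42, 0.40), "healthy"),
    ((0.55, 0.43, 0.45), "healthy"),
]

# Colour each occupied cell by majority vote of the subjects mapped to it.
votes = {}
for values, status in known:
    votes.setdefault(bmu(values), Counter())[status] += 1
cell_colour = {cell: cnt.most_common(1)[0][0] for cell, cnt in votes.items()}


def classify(x):
    """Suggest a status for a new subject from the colour of its cell."""
    cell = bmu(x)
    if cell in cell_colour:
        return cell_colour[cell]
    # Empty cell: borrow the colour of the nearest coloured cell (assumption).
    nearest = min(
        cell_colour,
        key=lambda k: (k[0] - cell[0]) ** 2 + (k[1] - cell[1]) ** 2,
    )
    return cell_colour[nearest]


print(classify((0.21, 0.50, 0.80)))  # lands among the pathologic subjects
print(classify((0.69, 0.40, 0.30)))  # lands among the healthy subjects
```

A subject falling between the two regions would deserve more caution than a simple colour lookup suggests; the sketch only shows the mechanics.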
Now, another point:
“In my experience, usually, the inputs to neuronal networks, trained with known data, are based on something like a thousand RNA concentrations from 100 patients, such that humans would have trouble seeing the connections, thus the need for the computer in the first place. In fact if the relationship that they discover is as simple as some mathematical expression among three numbers, then everyone is surprised but the computer program is no longer necessary. The purpose of the computer was to find those particular three parameters out of the thousand available. The analysis of a particular clinical finding in that case can, thereafter, be done by a calculator.”
Well, this is not our case. In our situation we had 110 subjects, and only 11 variables (plus the pathological status as the 12th variable). So our main problem wasn’t to find those particular parameters among only 11 variables. Talking about Ischemia (things are quite different for Depression), we identified the 3 fatty acids quite easily, by means of common conventional statistics (Discriminant Analysis, ANOVA and so on).
Once we found those 3 parameters, we wanted to study their dynamics; we wanted more information, a deeper knowledge of the problem and a diagnostic tool. So we tried different mathematical tools such as “Cluster analysis”, Classification Trees, and so on but, in our opinion, the best one was the SOM.
In fact, it quickly led us to a lot of (in our view) important results. We do not use “a thousand RNA concentrations” but just a simple SOM. Maybe in our case the need for the computer does not come first, maybe we could reach the same results “by eye” but, as you say, the main feature of a calculator is speed, and our main interest is the result, and only secondly the method. In any case, by eye we can’t evaluate the 3 parameters at the same time, with the same speed and precision as the SOM.
Many people are scared by the name “artificial neural network”. But ANNs are not strange objects at all! They are not difficult; they have been accepted as useful tools for years and are commonly used even in everyday applications. Maybe the only trouble is the approach they use, which is a bit different from conventional statistics. But the problem, if it exists, is just in our way of thinking. And I’m sure, Kary, you do not have this kind of problem at all!!!
One more question:
“There is one explanation that I can imagine to account for the facts, but it brings into question your judgement and that of Lucio. Since there were only three variables being considered, it is not impossible that the relationship between them, which correlates with coronary heart disease, is simple enough that it could have been discovered empirically without the use either in the beginning or especially later of a neural net. In fact, how complicated can be the relationship between three numbers? What is the computer doing? Why didn’t you compare by eye the three numbers in light of who was a coronary artery risk patient (which as you explained to me from the Framingham data was mostly just the age of the patients.)”
Well, a computer is not necessary to guess a handwritten character!
How complicated can recognizing a handwritten letter be?
But I ask you: can you explain, in an easy way, the mathematical rules for classifying a handwritten character? Or better: a digitized character could be expressed, for example, with an 8x4 matrix of pixels = 32 pixels. We certainly could give a formula with 32 variables in order to classify a character, but all computer scientists find it more comfortable to use a SOM instead of a formula. And I agree with them.
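To make the 8x4 example concrete, here is a small sketch of how a digitized character becomes 32 variables. The two glyphs are invented for this illustration, and reading "8x4" as 8 rows by 4 columns is an assumption. The Hamming distance (the number of pixels that differ) is used here only as one simple way of expressing how similar two such vectors are.

```python
# Two invented handwritten variants of "A" on an 8-row by 4-column grid.
A_FIRST = [
    ".##.",
    "#..#",
    "#..#",
    "#..#",
    "####",
    "#..#",
    "#..#",
    "#..#",
]
A_SECOND = [
    ".##.",
    "#..#",
    "#..#",
    "####",
    "#..#",
    "#..#",
    "#..#",
    "#..#",
]


def to_vector(glyph):
    """Flatten a glyph into 32 values (1 = ink, 0 = blank), row by row."""
    return [1 if ch == "#" else 0 for row in glyph for ch in row]


def hamming(u, v):
    """Number of positions at which two pixel vectors differ."""
    return sum(a != b for a, b in zip(u, v))


vec_first = to_vector(A_FIRST)
vec_second = to_vector(A_SECOND)

print("variables per character:", len(vec_first))
print("pixels that differ:", hamming(vec_first, vec_second))
```

Each character is then just a point in a 32-dimensional space, which is the kind of input a SOM can be trained on directly.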
Well, in our problem there are only 3 numbers, it’s true. But when we obtained the SOM map, we understood only that, using the 3 fatty acids, we could classify a patient; we absolutely did not yet know the correct rules for doing so. We soon realized that it was not such a simple problem.
Some examples of handwritten “A” in an 8x4 matrix of pixels
I will try to explain. When Massimo and I studied the “rules”, we observed that a subject was pathologic if parameter 1 was low while parameter 2 was medium and parameter 3 was high. If parameter 3 decreases, the subject is no longer pathologic unless parameter 2 increases too, but only up to a certain value, and so on. I mean that there are a lot of possible configurations and combinations. We agree that we could express them with pencil and paper, but we simply find it more comfortable and clearer to use a 2-dimensional map. In our opinion, it’s just simpler, easier and clearer.
For example, let’s think about the Body Mass Index (BMI) and its application. It’s one of the simplest formulas in the world. You just have to calculate Weight/Height² (weight in kilograms, height in metres) and then check the result in a reference table. Well, every doctor simply has trivial software that does it. I’m sure this could be done by eye, but it’s faster and maybe easier to use a computer that shows a graph. Of course, we translated all our results into rules, and we are using them in order to continue our research. But when we have to check the health status of a new subject, we find it more comfortable to use the SOM, especially when a subject falls in our borderline area.
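For the BMI comparison made above, the whole "program" fits in a few lines. The BMI is conventionally computed as body weight in kilograms divided by the square of height in metres; the cut-offs below are the commonly cited WHO adult categories, and the function names are chosen only for this sketch.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms over height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Look the value up in the usual adult reference table."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


example = bmi(70.0, 1.75)
print(round(example, 1), bmi_category(example))
```

Even for a rule this simple, a lookup done by software is quicker than doing it by eye, which is the point of the comparison.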
So Kary, as you have seen, there’s no mystery, nothing is unexplainable. Everything is, in the commonly accepted sense, scientific. Unfortunately I have not discovered a new principle of neural nets (warning: not yet!), because the net we used has existed since 1981.
Maybe the reason for this misunderstanding between you and Massimo, between you and us, is me. I feel really guilty and apologize, because I was not clear in my explanations to you. You always had to discuss mathematical questions with Massimo, who is neither a mathematician nor a computer scientist! I remember that, even if we didn’t have many occasions to talk together in private, you came to me many times with questions, remarks and so on. I apologize, but my main problem is that it’s very difficult for me to talk with you: I’m not even a “researcher”, and I have to talk with a Nobel laureate. I know, you were present when I gave my first lecture, and year by year you saw me grow.
I am really sorry, Dr. Mullis, but you are considered one of the best minds in the world, and I feel so small in front of you… sorry.
Thank you for your time,
Dec 17, 2007
Lucio has clarified the situation for me.
I now understand your work with Lucio and agree that it is a contribution to the field of diagnostics in cardiology which is worth pursuing. There was a communication problem.
I did not understand that after the self-organizing map program had operated in an unsupervised manner on all the data from healthy and at-risk patients, sorting each individual set of the values for oleic, linoleic and arachidonic acid into one of 400 bins, according to the similarity of the relations between their components, then you, knowing which sets were (from independent considerations) at-risk, “colored” the 400 squares according to “healthy” or “at-risk.” That input from you took the place of informing the overall procedure as to which of the patterns it had found originated in healthy and not-so-healthy patients, and logical induction allows you to assume that whenever future values fall into one of those categories it will be defined.
Certainly now by examining the program statements that assign the three fatty acid concentrations to particular cells in your program, you could replace the program with a series of “if then” statements, but I understand that computers are ubiquitous and cheap. Why bother? In addition, by continuing to add more data, which you know is from healthy or unhealthy patients, you might refine your colouring pattern if that is called for.
So the mystery is solved.
I’m sorry to have been so much trouble. I think if scientists still refused to talk to each other except in Latin, such misunderstandings might not happen. But then we would all have to learn Latin (even Italians) and agree on how to interpret it.
Give Lucio my best and assure him that I understand now what you are doing as a result of his explanation.
I am looking forward to seeing you next year. My regards to all.