Cortical Specification of a Fast Fourier Transform Supports a Convolution Model of Visual Perception

Currently, the full extent of the role Fourier analysis plays in biological vision is unclear. Although we have examples of sensory organs that perform Fourier transforms, e.g. the lens of the eye and the cochlea, to date there is no direct empirical evidence for its implementation in cortical architecture. However, there does exist intriguing theoretical evidence that suggests a role for the Fourier transform in a primate’s primary visual cortex (area V1) which emerges from recent developments in our knowledge of contextual modulation. This paper proposes a new Fourier transform and a specification of how this transform has a natural implementation in cortical architecture. The significance of this new Fourier transform and its specification in neural circuitry is that it provides a plausible explanation for previously unexplained observable properties of the primate vision system.


Introduction
1.0.0.1 The spatial response properties of neurons in area V1, such as orientation tuning and spatial frequency tuning, have been known for some time (Schiller et al., 1976). For a while, it was generally accepted that these tuning functions of receptive fields are largely context-independent (De Valois et al., 1979). However, later research has demonstrated contextual influences from the region close to the receptive field (Sceniak et al., 2001); (Cavanaugh et al., 2002); (Bair & Movshen, 2004). Moreover, it has been found that this near surround region of a receptive field can modify receptive field responses through suppression (Blakemore & Tobin, 1972) and by cross-orientation facilitation effects (Sillito & Jones, 1996); (Cavanaugh et al., 2002); (Kimura & Ohzawa, 2009). It has also been demonstrated that long-range contextual modulation is as robust a feature of neural function in area V1 as the extensively studied receptive field properties of this area (Lamme, 1995). Since that time, the evidence for long-range contextual modulation has continued to grow, e.g. (Zipser et al., 1996); (Lamme et al., 1998); (Lee et al., 1998).
1.0.0.2 Concurrent with this research establishing the empirical evidence for contextual modulation has been research aimed at developing functional models of V1 that are consistent with the empirical evidence. In the early 1980s the concept of convolution was employed by David Marr (Marr & Hildreth, 1980) as a model that accounted for a considerable number of observable properties of the human vision system. Since that time further theoretical and empirical evidence has been mounting that supports such a model. In particular, it has been shown that response properties of neurons in area V1 are modeled by convolution of the input image with a family of Gabor functions (Sanger, 1988). Further research has demonstrated that the upper layers of area V1 are modeled well by a bank of Gabor filters (Grigorescu et al., 2003); (Huang et al., 2008); (Lee & Choe, 2003); (Ursine et al., 2004); (Tang et al., 2007). A related, but alternative, approach to the Gabor response functions to model simple and complex cells of V1 is the use of Gaussian derivatives (Huang et al., 2009). The common denominator of these contextual modulation models is long-range convolution. The issue of accepting these state-of-the-art computational models of contextual modulation as plausible functional models of Layer 2/3 of V1 thus becomes one of addressing the cortical convolution conundrum, more specifically: how are the large scale convolutions required by such models accounted for in cortical architecture?

1.0.0.3 This paper's goal is to address the cortical convolution conundrum. In the process, we will propose a new fast Fourier transform, named Generalised Overarching SHIA Fast Fourier Transform (GOSH-FFT) and argue:
• GOSH-FFT has a natural implementation in the cortical architecture of visual area V1, and
• Its implementation provides a plausible cortical mechanism to account for the convolutions implied by long-range contextual modulation.
The rest of this paper is organised as follows: Section 2 provides a description of key neurophysiological and mathematical concepts underpinning the main thrust of this paper. Section 3 describes the Generalised Overarching SHIA Fast Fourier Transform (GOSH-FFT). Section 4 proposes a new interpretation of the physiology of long-range intrinsic connections and reinterprets previously introduced physiological concepts to propose a plausible cortical implementation of GOSH-FFT. Section 5 discusses various implications of the novel material of this paper. Section 6 summarises and concludes the paper. Section 7 is an appendix that contains a MatLab-like pseudo-code description of GOSH-FFT and a mathematical proof of GOSH-FFT.

1974). The spatial and temporal frequency tuning preferences of neurons in V1 can also be measured. The neuron's response properties measured via the receptive fields resemble spatially localized filters with a preferred orientation and spatial frequency (Schiller et al., 1976); (Foster et al., 1985); (Mikami et al., 1986); (Edwards et al., 1995) or spatio-temporal energy (Basole et al., 2003); (Basole et al., 2006).

2.1.0.4 The orientation preference of neurons can be mapped using optical imaging techniques and neurological studies, which show good agreement with single cell measurements (Blasdel, 1992); and groups of neurons which act as a single unit. It has been experimentally shown that these large groups of single cells acting as a single unit are composed of 10^4 (first order approximation) interconnected cells, even in one local V1 column (Siegel, 1990).
The advantages of modeling large scale neuron activity which exhibits cohort macroscopic organisation were shown by (Sirovitch et al., 1996). No model was presented but organising principles for analyzing and viewing data were presented. These techniques have revealed an intricate structure to the orientation preference map in layers 2/3. A critical feature of these structures is the orientation pinwheel (local map), in which the orientation preference of the neuronal population changes through the entire range of 180 degrees of orientations over the 360 degrees of polar range of the circular pinwheel. At the centre of the pinwheel is the singularity, which is the point at which lines of iso-orientation preference meet (Obermayer & Blasdel, 1993).

2.1.0.5 The cortex is often called the iso-cortex because of the repeated structures of which it is comprised (Douglas & Martin, 1991). The smallest scale of structure is the minicolumn which, in the monkey, consists of 30 adjacent pyramidal cell shafts in layers 2/3 packed within a diameter of 23 µm (Peters & Sethares, 1996). There are approximately 20 cell bodies within a minicolumn in layers 2/3. The next largest physical scale in V1 at which repeated structures occur is the cortical column (Lund et al., 2003). The cortical column is 200 µm in diameter and is the scale at which long-range patchy connections terminate. A number of anatomical and functional markers repeat at a larger scale of 400 µm. These include the distance between CO blobs, the approximate periodicity of the orientation preference map, and the spatial scale of a single ocular dominance band (Lund et al., 2003). Orientation pinwheels are also of approximately this spatial scale. Each of these functional markers has been shown to be closely related to the system of patchy connections, in which like response preference connects to like, and the inter-patch distance in V1 has this same periodicity of 400 µm (Bartfeld & Grinvald, 1992); (Malach et al., 1993); (Bosking et al., 1997). The largest spatial scale is V1 itself, which is some 4 cm wide in the monkey. There are of the order of 10,000 CO blobs in layers 2/3 of V1 (Murphy et al., 1998), and some 120 ocular dominance bands (Horton & Hocking, 1998), suggesting that the multiple response property maps with periodicity of 400 µm repeat around 10,000 times over layers 2/3 of V1.

The input connections from the LGN arborize at a range of scales within layer 4C of V1. These inputs are arranged in block-like structures at the approximate scale of an ocular dominance band in layer 4C, but at a finer scale of approximately one column in layer 4C (Fitzpatrick et al., 1985). Further fine scale arborizations occur at approximately the scale of one minicolumn in layer 4A. At the global scale of the cortex, inputs from the LGN are organized into a retinotopic mapping of the visual field (Rolls & Cowey, 1970); (Tootell et al., 1988). Connectivity into the layers 2/3 of V1 occurs via a number of anatomical routes, apart from the well described feedforward connections from layer 4 (Fitzpatrick et al., 1985). Other routes of information transfer include extra-striate feedback (Rockland et al., 1994); (Rockland & Vanhoesen, 1994); (Angelucci et al., 2002), long-range intrinsic fibres within V1 (Blasdel et al., 1985), as well as feedback from V1 to the lateral geniculate nucleus (Marrocco et al., 1982); (Briggs & Usrey, 2007), and diffusion of visual signal in the retina (Kruger et al., 1975); (Berry et al., 1999).
2.1.0.6 The finest scale of axonal projections within V1 are the short-range intrinsic connections that provide connectivity between neurons up to the range approximated by an ocular dominance column width, or 400 µm. Within V1, long-range patchy connections extend for 3 mm within the supra-granular layers (Stettler et al., 2002) and long-range connections within the infra-granular layers extend for up to 6 mm (Rockland & Knutson, 2001). V1 also receives feedback from at least nine extra-striate areas (Rockland & Vanhoesen, 1994). Extra-striate feedback is considered by most researchers to be the primary source of long-range horizontal interactions measured in V1 (Alexander & Wright, 2006). These feedback connections are fast conducting myelinated cortico-cortical fibres, and while they traverse distances of up to 10 cm in the monkey, the transmission delays are of the same order as intrinsic short and long-range axons within V1 (Bringuier et al., 1999); (Girard et al., 2001). These feedback connections are often in register with the intrinsic patchy system within V1, depending on the area of origin (Angelucci et al., 2002); (Lund et al., 2003).

The middle temporal (MT) visual area will serve here as a brief illustration of the role of extra-striate feedback in V1. Receptive field sizes in MT are about 10 times larger than in V1 at all eccentricities (Albright & Desimone, 1987). Small focal injections of tracer into V1 indicate that the sizes of the feedback fields from MT to V1 are 21-fold larger than the aggregate receptive field size of the V1 injection sites (Angelucci et al., 2002). These feedback connections are an obvious substrate for the integration of global signals into V1 (Bullier, 2001).

The local-global map hypothesis (Alexander et al., 2004) of V1 posits a non-local influence on the structure of local maps in V1. This hypothesis states that the global visual map in V1 is remapped to the local map scale in V1 in the form of a map of response properties, e.g. orientation, and in the case of the monkey, spatial frequency preference and colour selectivity. These local maps tile the surface of V1 and each receive inputs from a large extent of the visual field. So rather than the local map being simply a map of primitive visual features that apply to a point in visual space, the local map is a map of primitive visual features as they arise in the organisation of the visual field and become relevant to a location in visual space. As the maximum range of contextual modulation in V1 approaches the size of the visual field (Alexander & Wright, 2006), the local organisation of response properties can be influenced by the functional properties of the global visual field.

Mathematical background
Fundamental to the Fourier transform proposed in this paper is the Spiral Honeycomb Image Algebra (SHIA). This is a data structure that embodies important properties of the natural visual constraints imposed by the primate eye (Sheridan et al., 2000). In particular, SHIA has a discrete, finite and bounded domain which mimics the distribution of photoreceptors on the retinal field. The underlying geometry of the SHIA is a hexagonal or rectangular lattice. In the former case, each hexagon has a designated positive integer address expressed in base seven. The numbered hexagons form clusters of super-hexagons of size 7^n. These self-similar super-hexagons tile the plane in a recursively modular manner. As an example, a super-hexagon of size 7^2 = 49 and its concomitant addressing scheme is displayed in Fig. 1 (a).

The importance of the SHIA addressing scheme is that it facilitates primitive image transformations of translation, rotation and scaling. One of these transformations that has proven to be of particular relevance to the Fourier transform is one that provides rotation and scaling. It is referred to as mapping M10 in the notation of SHIA. The critical observation to make in regard to the effect of M10 is that it produces multiple 'near' copies at reduced resolution of the input image. This transform will play a critical role in the proposed FFT.
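As a minimal illustration of the base-seven addressing only, the following sketch lists the addresses of a super-hexagon of size 7^n. The function names are hypothetical, and the sketch assumes that the 7^n cells carry the consecutive base-seven addresses counted from 0. It makes no attempt to reproduce the spiral geometry of SHIA or the mapping M10.

```python
def to_base7(index: int) -> str:
    """Return the base-seven digit string of a non-negative integer."""
    if index < 0:
        raise ValueError("addresses are non-negative")
    digits = []
    while True:
        index, digit = divmod(index, 7)
        digits.append(str(digit))
        if index == 0:
            break
    return "".join(reversed(digits))


def super_hexagon_addresses(n: int) -> list[str]:
    """Addresses of the 7**n cells of one super-hexagon (assumed consecutive)."""
    return [to_base7(i) for i in range(7 ** n)]


# A super-hexagon of size 7**2 = 49, as in the example of Fig. 1 (a).
addresses = super_hexagon_addresses(2)
print(len(addresses), addresses[:8], addresses[-1])
```

Under this assumption the last cell of the size-49 super-hexagon carries the two-digit address 66, and each further level of the hierarchy adds one base-seven digit.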

2.2.0.8
The origin of what is now called a Fourier transform dates back to 1807 when Jean Baptiste Joseph Fourier defined the notion of representing a function as a trigonometric series. The discrete version of a Fourier transform (DFT) for a one-dimensional signal is defined as:

$$F(u) = \sum_{x=0}^{N-1} f(x)\, e^{-j 2\pi u x / N} \qquad (1)$$

for u = 0, ..., N − 1, where f(x) is a real valued function, N represents the number of elements in the signal and j^2 = −1.
The effect of this transform is to capture the spatial relationships inherent in the signal f(x) and express these relationships as the sum of sinusoidal functions (frequency components).
Similarly, the discrete version of an inverse Fourier transform (IDFT) for a one-dimensional signal is defined as:

$$f(x) = \frac{1}{N} \sum_{u=0}^{N-1} F(u)\, e^{j 2\pi u x / N} \qquad (2)$$

for x = 0, ..., N − 1, where F(u) is the Fourier transform of the real valued function f(x), N represents the number of elements in the signal and j^2 = −1.
The effect of this inverse Fourier transform is to take a signal in the frequency domain back to the spatial domain.

2.2.0.9 Prior to the invention of the digital computer, the Fourier series was employed as a purely analytic tool. However, since that time, the development of a class of computationally efficient algorithms, known as fast Fourier transforms (FFT), has meant the notion has become a useful computational tool (VanLoan, 1992). One of the most attractive computational properties of the FFT is its ability to process signals at higher resolution with a minimal increase in computational cost: the number of operations grows in proportion to N log N rather than N^2. Today, most of us benefit from fast Fourier transforms every day without even knowing it as these algorithms power a vast range of electronic technology such as digital cameras and cell phones.
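The DFT and inverse DFT pair described above can be sketched directly from the definitions. This is a naive illustration of order N^2, not a fast transform; the function names are hypothetical, and placing the 1/N factor on the inverse is an assumed normalisation convention.

```python
import cmath


def dft(f):
    """Naive discrete Fourier transform of a sequence f of length N."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def idft(F):
    """Naive inverse transform; the 1/N factor is an assumed convention."""
    n = len(F)
    return [
        sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
        for x in range(n)
    ]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0]
spectrum = dft(signal)
recovered = idft(spectrum)

# The round trip should reproduce the signal up to floating-point error.
round_trip_error = max(abs(a - b) for a, b in zip(signal, recovered))
print(round_trip_error < 1e-9)
```

The zero-frequency component `spectrum[0]` is simply the sum of the signal values, which gives a quick sanity check on any implementation.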

2.2.0.10
The relevance of a fast Fourier transform to this paper is its relationship to the notion of convolution. The convolution of two functions f(x) and g(x) is denoted by f(x) * g(x) and its discrete definition is

$$f(x) * g(x) = \sum_{m=0}^{N-1} f(m)\, g(x - m) \qquad (3)$$

A well known result to researchers in the field of signal processing is the Convolution Theorem, which relates convolution in the spatial domain to multiplication in the frequency domain. For two functions, f(x) and g(x), let F(u) and G(u) represent the Fourier transform of f(x) and g(x) respectively. The Convolution Theorem states that,

$$f(x) * g(x) \Leftrightarrow F(u)\, G(u) \qquad (4)$$

In other words, the convolution of two functions in the spatial domain can be achieved by the multiplication of the functions in the frequency domain.
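The Convolution Theorem can be checked numerically on a small example. The sketch below assumes circular (periodic) indexing in the convolution sum, which is the form under which the discrete theorem holds exactly; the function names and the test signals are hypothetical.

```python
import cmath


def dft(f):
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def idft(F):
    n = len(F)
    return [
        sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
        for x in range(n)
    ]


def circular_convolve(f, g):
    """Direct convolution with indices taken modulo N (assumed periodic)."""
    n = len(f)
    return [sum(f[m] * g[(x - m) % n] for m in range(n)) for x in range(n)]


f = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.0]
g = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.25]  # a small smoothing kernel

direct = circular_convolve(f, g)
# Multiply the spectra component by component, then transform back.
via_frequency = idft([a * b for a, b in zip(dft(f), dft(g))])

max_difference = max(abs(d - v) for d, v in zip(direct, via_frequency))
print(max_difference < 1e-9)
```

The direct route costs N multiplications per output value, whereas the frequency-domain route needs only one multiplication per component once both spectra are available.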

Generalised Overarching SHIA fast Fourier transform (GOSH-FFT)
In this section we propose a new fast Fourier transform that, as we will see later, possesses the potential to be implemented in cortical architecture and thereby address the cortical convolution conundrum. Associated with SHIA, as described in Section 2.2, is a Cooley-Tukey type fast Fourier transform, named Generalised Overarching SHIA Fast Fourier Transform (GOSH-FFT). This novel fast Fourier transform employs the transform M10, as described in Section 2.2, as the critical mechanism that turns a Fourier transform into a fast Fourier transform.

3.0.0.11
Suppose an image is represented on a SHIA of size 7^n, where

For(i:0:k)
1. Apply M to the input;
2. Perform a discrete Fourier transform over a sequence of sub-images of size 7^m;
3. Apply the inverse of M i locally.

A special case of GOSH-FFT was initially described in (Sheridan, 2007), with m = 1. The significance of the initial work was that it demonstrated the intrinsic connection between the Fourier transform and primitive image transformations of translation, rotation and scaling. It also turns out that another special case of GOSH-FFT, when n = 2m, will play a critical role in the core hypothesis of this paper. This special case, named Particular SHIA FFT (PaSH-FFT), is illustrated in Fig. 3.
A complete statement of Algorithm 3.0.0.11 is written in MatLab-like pseudo-code and can be found in Section 7 along with a mathematical proof that GOSH-FFT delivers a Fourier transform.
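The alternation of rearrangement and local transforms can be illustrated with a one-dimensional analogue. The sketch below is a standard two-stage Cooley-Tukey decomposition for a signal of length N = n1 × n2, and it is not the SHIA algorithm itself: the stride-based regrouping only stands in for the role played by M10, the twiddle-factor step belongs to the textbook scheme rather than to anything stated here, and all names are hypothetical. Choosing n1 = n2 mirrors the PaSH-FFT case in which the signal size is the square of the local transform size.

```python
import cmath


def dft(f):
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def two_stage_fft(signal, n1, n2):
    """DFT of length n1*n2 built from local DFTs of length n1 and n2."""
    n = n1 * n2
    if len(signal) != n:
        raise ValueError("signal length must equal n1 * n2")
    # Rearrangement: sub-signal r holds samples r, r + n2, r + 2*n2, ...
    subsignals = [signal[r::n2] for r in range(n2)]
    # First round of local transforms, each of length n1.
    inner = [dft(sub) for sub in subsignals]
    # Twiddle factors of the standard scheme (an assumption of this sketch).
    for r in range(n2):
        for k1 in range(n1):
            inner[r][k1] *= cmath.exp(-2j * cmath.pi * r * k1 / n)
    # Second rearrangement (a transpose) and local transforms of length n2.
    output = [0j] * n
    for k1 in range(n1):
        outer = dft([inner[r][k1] for r in range(n2)])
        for k2 in range(n2):
            output[k1 + n1 * k2] = outer[k2]
    return output


signal = [float((3 * i) % 11) for i in range(49)]  # length 7**2
fast = two_stage_fft(signal, 7, 7)
reference = dft(signal)
max_error = max(abs(a - b) for a, b in zip(fast, reference))
print(max_error < 1e-9)
```

In this analogue every individual transform has length 7, yet the two rounds together deliver the full length-49 transform, which is the property the local and global maps are later asked to provide.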

Cortical implementation of contextual modulation
In Section 1, we reviewed state-of-the-art models of contextual modulation and concluded that these models implied the cortical convolution conundrum. We further motivate this conundrum by observing that as a consequence of Equation 3, a convolution of the entire visual field requires every minicolumn in Layer 2/3 of area V1 to receive an input from every other minicolumn of that layer. As there are simply not enough connections to convolve the visual field in one cortical step, the signal must pass through a sequence of connections before being output as a convolved value. With the cortical convolution conundrum thus fully formulated, in this section we will establish a specification of a sufficient sequence of steps to address the issue. This specification will unfold in three steps. First, we will discuss how the SHIA transform M10 manifests in cortical architecture. We will then employ this manifestation to demonstrate how neural circuitry accommodates PaSH-FFT. Lastly, we will show how the cortical manifestation of PaSH-FFT supports long-range convolution.

Cortical manifestation of M10
A critical component of the fast Fourier transform, PaSH-FFT, is the transform M10. Consequently, it is an imperative of our argument that the redistribution properties of M10 be accounted for in the neural circuitry of the visual system. To this end we now argue that the required effects of M10 are accounted for by the long-range properties of patchy connections.

It has been argued that the orientation pinwheel comprises a unitary organisational structure or local map in layer 2/3 of area V1 (Hubel & Wiesel, 1974); (Bartfeld & Grinvald, 1992); (Blasdel, 1992). When four pinwheels are reflected about their common borders, a saddle point arises at the centre of the four pinwheels. See Fig. 4. In the macaque, the preferred response properties of V1 neurons can be influenced by activity from a wide extent of the visual field. A review of contextual modulation in the monkey demonstrated contextual modulation in V1 from long ranges in the visual field (Alexander & Wright, 2006). The review was compiled from a number of experimental paradigms, including visual stimulation with long lines while the neuron's receptive field is occluded (Fiorani et al., 1992), surround only textures (Rossi et al., 2001) and colour patches placed distally to the neuron's receptive field (Wachtler et al., 2003). It was shown that the maximum range of contextual modulation measurable in V1 approaches a large extent of the visual field relative to a neuron's receptive field size or the local cortical magnification factor. Some experimental paradigms, such as the curve tracing effect (Roelfsema & Lamme, 1998); (Khayat et al., 2004), relative luminance (Kinoshita & Komatsu, 2001), and texture defined boundaries (Lee et al., 1998) show excitatory contextual modulation with 'tuning curves' that are flat out to the maximum distance tested. The functional connectivity that underlies this long-range contextual modulation in the monkey is likely to involve cortico-cortical feedback from higher visual areas working in concert with long-range intrinsic patchy connectivity. In the monkey, the feedback connections to V1 from higher visual areas incorporate inputs from a very large extent of the visual field (Angelucci et al., 2002); (Lund et al., 2003).

4.1.0.14 In the analysis that follows, the combination of patchy intrinsic connections and patchy feedback connections is therefore assumed to enable transfer of visual information at ranges approaching the global scale of the visual field. Moreover, we assume that the quantity and distribution of these connections are adequate to deliver the effects of transform M10 at the scale of the visual field.

Cortical manifestation of PaSH-FFT
The next step in accounting for global convolution in cortical circuitry is to explore how PaSH-FFT manifests itself in cortical architecture. The raw data, at the lowest level of PaSH-FFT, are complex numbers that must be multiplied and added. The first issue to address is to justify our assumption that the operations being performed by a neuron could be represented as complex arithmetical operations on complex numbers. Specifically, PaSH-FFT requires that a neuron can be regarded as a mechanism capable of representing and manipulating complex numbers in accordance with the arithmetical operations of addition and multiplication. There are many ways in which to interpret neuronal function in terms of complex addition and multiplication. The model presented by (MacLennan, 1999) is adequate for the purposes of this paper, where it is shown how the representation of complex numbers can be encoded as the rate and relative phase of axonal impulse. From this encoding, complex multiplication is associated with the strength of a synaptic connection as the signal passes through it and complex addition is associated with the summing of the neuronal inputs.
Thus at the lowest level of computation in our model, we assume that the operation being performed by a neuron can be represented as complex addition and multiplication.
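The arithmetic this assumption relies on can be made concrete with a small sketch. It treats a firing rate as the magnitude and a relative phase as the argument of a complex number, a synapse as a complex weight, and a neuron as a summation. The names are hypothetical and the sketch illustrates polar complex arithmetic only; it does not reproduce the details of the model in (MacLennan, 1999).

```python
import cmath


def encode(rate, phase):
    """Complex number with magnitude `rate` and argument `phase` (radians)."""
    return cmath.rect(rate, phase)


def neuron_output(inputs, weights):
    """Weighted sum: each synapse multiplies, the neuron adds."""
    return sum(w * s for w, s in zip(weights, inputs))


a = encode(2.0, 0.0)             # rate 2, reference phase
b = encode(1.0, cmath.pi / 2)    # rate 1, quarter-cycle phase lead

# A synapse that halves the rate and advances the phase by a quarter cycle.
weight = encode(0.5, cmath.pi / 2)
scaled = weight * a
rate, phase = cmath.polar(scaled)

# Summing the weighted input with an unweighted second input.
total = neuron_output([a, b], [weight, 1.0])
print(rate, phase, total)
```

Multiplication scales the rate and shifts the phase, while addition combines signals that agree in phase constructively; these are the two operations a DFT requires.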

4.2.0.15
In area V1, each neuron makes use of information available to it in real time. There is evidence that contextual information is projected to widespread regions in V1 in an anticipatory manner. Since the spatial changes in the visual field tend to be predictable from previous visual inputs, anticipatory contextual inputs can arrive in time to be integrated in an adaptive manner with ongoing feedforward input. In order to express the properties of widespread contextual integration in a more formal manner, however, we will use the mathematical convenience of assuming that each of the distinct mathematical processes to be described occurs in a step-wise fashion. This more constrained approach allows not only each distinct part of the process to be formulated, but also formulates the inter-relationships between the various sub-processes. Although it is claimed that this approach is appropriate for the purposes of this paper, it must be acknowledged that the question of how such "contextual integration" actually occurs in the neuronal system remains open.

4.2.0.16
At the finest scale of connectivity via short-range intrinsic connections, each neuron of a local map is treated as if it were connected to every other neuron or minicolumn of that local map. While this is not literally true, considerations of poly-synaptic interactions at this local scale, and the real-time, anticipatory nature of visual processing, mean that it is a reasonable approximation of the functional connectivity. Consequently, we can assume that each neuron in a local map can sum the outputs of all other neurons in that local map which have been multiplied by unique complex numbers. We call such a collection of parallel computations a local computation. See Fig. 5, which is a schematic diagram of a local computation.
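A local computation in this sense amounts to a fully connected layer of complex weighted sums. The sketch below, with hypothetical names and a toy local map of seven units, shows that choosing the connection weights as DFT coefficients makes one such step deliver the Fourier transform of the local signal. For simplicity each unit here also receives its own value, which is an assumption of the sketch.

```python
import cmath


def local_computation(signal, weights):
    """Every unit u sums all inputs x, each scaled by weights[u][x]."""
    n = len(signal)
    return [
        sum(weights[u][x] * signal[x] for x in range(n))
        for u in range(n)
    ]


def dft_weights(n):
    """Connection weights chosen as DFT coefficients (one choice of many)."""
    return [
        [cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)]
        for u in range(n)
    ]


local_map = [1.0, 0.0, 2.0, -1.0, 0.5, 0.0, 3.0]
spectrum = local_computation(local_map, dft_weights(len(local_map)))

# A single active unit spreads equally over all frequency components.
impulse_spectrum = local_computation([1.0] + [0.0] * 6, dft_weights(7))
print(spectrum[0], impulse_spectrum)
```

All units compute in parallel, so the whole local transform costs the signal one traversal of a connection, whatever the number of units in the map.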
4.2.0.17 Although it is commonly accepted that the cortex has a massively parallel architecture, currently there exists no comprehensive model to describe these dynamics. The absence of such a model means that in any particular cortical process, we cannot be sure which aspects of the process are parallel and which are intrinsically sequential. We will employ the following notation to show how the inherently sequential steps of PaSH-FFT can be mapped into neural circuitry. Let the symbol ⊙ denote the composition of two local computations as follows: given arbitrary local computations A and B to operate on a signal s in sequence, let

$$(s)\,A \odot B = ((s)\,A)\,B$$

Note that the operator is to the right of the input signal it operates on, which is enclosed in left and right parentheses ().
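The postfix composition can be mimicked in code by applying a list of computations from left to right. This sketch assumes that (s)A ⊙ B means "apply A to s first, then B", as the postfix notation suggests; `compose` and the two toy computations are hypothetical names introduced only for illustration.

```python
from functools import reduce


def compose(*computations):
    """Chain computations left to right: compose(A, B)(s) applies A, then B."""
    def pipeline(signal):
        return reduce(lambda current, step: step(current), computations, signal)
    return pipeline


def double(signal):
    return [2 * value for value in signal]


def add_index(signal):
    return [value + index for index, value in enumerate(signal)]


s = [1, 2, 3]
a_then_b = compose(double, add_index)(s)   # (s) double ⊙ add_index
b_then_a = compose(add_index, double)(s)   # (s) add_index ⊙ double
print(a_then_b, b_then_a)
```

The two orders give different results, which is why the sequence of components in a composed path matters.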

4.2.0.21
With these concepts in hand, we can now identify the sequential steps of PaSH-FFT. In this special case, the size of the input signal is the square of the size of the local computation and represents two iterations of GOSH-FFT, as described in Section 3. The identification of the sequential steps also suggests the sequence of connections that the input signal must traverse. We now illustrate this with PaSH-FFT: given an input signal s, the application of PaSH-FFT would be expressed as follows:

Given the assumed neural parallelism, a count of the number of components on the right hand side of the equals sign in Equation 5 reveals that a Fourier transform of the entire visual field can be completed by the signal traversing a sequential path connecting five neurons. Likewise, an inverse Fourier transform can be delivered in cortical circuitry as follows:

$$(s)\,\text{inversePaSH-FFT} = (s)\,P \odot I \odot P \odot I \odot P \qquad (6)$$

Cortical manifestation of convolution
We now progress to the issue of how convolution could be implemented in cortical architecture. To this end, we describe the various computational constraints imposed by the computational requirements of convolution and argue that the known cortical architecture satisfies these constraints.

4.3.0.22
The key to the solution of the convolution problem in the neurological domain is provided by the Convolution Theorem, the same one employed by numerous digital signal processing applications. This theorem was discussed in Section 2.2. The importance of the theorem is that the convolution of two functions in the spatial domain can be achieved by the multiplication of the functions in the frequency domain. The implications of this theorem for the cortical convolution conundrum are significant. In our model, the components of the Fourier transform of the function the input signal is to be convolved with are represented by connection weights. Then, once the input signal has been transformed to the frequency domain, the required convolutions can be performed by mere multiplications. In cortical terms, each component of the signal, in the frequency domain, must traverse a connection to one more neuron to achieve the desired multiplication. However, the resulting convolution, in the frequency domain, must be transformed back to the spatial domain to complete the convolution. This is achieved with an inverse Fourier transform. Accordingly, the sequence of connections along the path that terminates in the output of a convolved value in the spatial domain is thus given by:

It is assumed that each component of the input signal traverses parallel paths along the network. Thus, the net time cost to complete a convolution is equivalent to the time required for a component of the input signal to traverse a path connecting 10 neurons. This path is composed of five short-range intrinsic connections and five long-range connections.

Analysis
The plausibility of the cortical model of convolution proposed in this paper is fundamentally predicated on the assumptions made in its formulation. Consequently, we summarise these assumptions along with the arguments offered to justify them before we provide an analysis of the model's parameterisation:
1. The number of long-range patchy connections is adequate to achieve a redistribution of the global signal via transform M10. This was argued in Section 4.1 and heavily relied on a conclusion based on a review article reported in (Alexander & Wright, 2006).
2. A first order approximation of the number of minicolumns in a local map is 10,000. This was discussed in Section 2.1 and relied on the work reported in (Siegel, 1990).
3. The number of short-range intrinsic connections is adequate to consider each local map as being fully connected. This was discussed in Section 4.2 and relied on the work reported in (Siegel, 1990).
4. A first order approximation of the number of minicolumns in the global map is 10,000^2 = 100 million. This was discussed in Section 2.1 and was based on the work reported in (Murphy et al., 1998) and Assumption 2.

4.4.0.24
The first assumption is possibly the most critical as it establishes the fundamental architectural relationship between the local and global maps and is essential to PaSH-FFT. The second and third assumptions implied that a Fourier transform of the portion of the signal represented in a local map would be completed by each component of the signal traversing one cortical connection and that on completion of the first iteration of PaSH-FFT, the global signal consists of 10,000 local discrete Fourier transforms each of which is at the scale of a local map. Then, on completion of the second iteration of PaSH-FFT, the 10,000 local Fourier transforms would be transformed into a global Fourier transform of size 10,000^2 = 100 million, which by the fourth assumption represents the size of the global signal. From this we are able to assert that the input spatial signal would be transformed to frequency space at a cost of the signal traversing a path connecting four neurons. With the signal in frequency space, we employed the Convolution Theorem to assert that with each component of the global signal traversing one additional connection, the state of the signal would represent a convolved signal in frequency space. This assertion was predicated on the assumption that the weight of each of these last connections represented the Fourier weight of the appropriate Gaussian. The final step was to transform the convolved signal back from frequency space to the spatial domain. This was achieved with the inverse PaSH-FFT, which would be completed at the additional cost of the signal traversing a path connecting a further five neurons. Putting these three steps together, we arrived at a total path length of 10 connections for the global input signal to be transformed into a representation of global convolution of the visual field.

We also note that this analysis accounted for a single global convolution of the input signal. However, there will be many global convolutions required, possibly up to one for every orientation preference and spatial frequency preference represented in a local map. Although the input spatial signal needs only to be transformed into the frequency domain once, each distinct convolution would require a distinct set of parallel paths to transform the signal back into the spatial domain. Consequently, the multiple convolutions would not necessarily result in a longer path. Accordingly, we assert that the transform, PaSH-FFT, with appropriate parameterisation, would deliver a global convolution of the visual field. Moreover, this output signal is generated within the required time constraints imposed by observed contextual modulation. Given our assumptions, the lowest number of iterations required to complete a Fourier transform is two. Consequently, 10 represents the length of the shortest path (see Equation 7) possible to deliver a global convolution via PaSH-FFT.
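The parameterisation in this analysis reduces to a few lines of arithmetic, sketched below with hypothetical names. The sizes and the three path-length contributions are taken directly from the assumptions and the analysis above; nothing beyond those figures is modelled.

```python
LOCAL_MAP_SIZE = 10_000                  # minicolumns per local map (Assumption 2)
GLOBAL_MAP_SIZE = LOCAL_MAP_SIZE ** 2    # minicolumns in the global map (Assumption 4)


def iterations_needed(global_size, local_size):
    """Smallest k such that local_size ** k covers the global signal."""
    count, covered = 0, 1
    while covered < global_size:
        covered *= local_size
        count += 1
    return count


# Path-length contributions as stated in the analysis.
FORWARD_TRANSFORM_PATH = 4   # spatial domain to frequency space
MULTIPLICATION_PATH = 1      # one extra connection per component
INVERSE_TRANSFORM_PATH = 5   # frequency space back to the spatial domain

iterations = iterations_needed(GLOBAL_MAP_SIZE, LOCAL_MAP_SIZE)
total_path = FORWARD_TRANSFORM_PATH + MULTIPLICATION_PATH + INVERSE_TRANSFORM_PATH
print(GLOBAL_MAP_SIZE, iterations, total_path)
```

Because the global size is exactly the square of the local size, two iterations suffice; a larger ratio between the two scales would require further iterations and hence a longer path.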

Discussion
The signal processing literature describes many different types of fast Fourier transforms (FFT). Although any one of them represents an alternative candidate to PaSH-FFT, the problem to address is accounting for how they might be implemented within the known constraints of cortical architecture. All fast Fourier transforms need to rearrange components between their intermediate steps of multiply and add. PaSH-FFT derives its rearrangements of components with the transform M10 that, as argued, is compatible with the distribution and quantity of long-range cortical connections. If any other FFT were to be substituted for PaSH-FFT in the model, one would need to account for the rearrangement phase of that FFT within the known connectivity of area V1.
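To make the rearrangement phase concrete, the sketch below shows a conventional radix-2 Cooley-Tukey FFT, in which the rearrangement is the bit-reversal permutation. This is a standard textbook algorithm offered for comparison, not PaSH-FFT, and the function names are illustrative. Any substitute FFT would need a cortical account of a permutation such as `bit_reverse_permute`.

```python
import cmath
import math

def bit_reverse_permute(x):
    """Rearrangement phase: move element i to the index with reversed bits."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, "0{}b".format(bits))[::-1], 2) if bits else 0
        out[r] = x[i]
    return out

def fft_radix2(x):
    """Iterative radix-2 FFT: one permutation, then log2(n) multiply-add stages."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in bit_reverse_permute(x)]
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * math.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w   # multiply
                a[start + k] = u + v          # add
                a[start + k + half] = u - v
                w *= step
        size *= 2
    return a

def naive_dft(x):
    """Reference O(n^2) DFT used only to check the fast version."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
            for k in range(n)]

sample = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.0, 5.0]   # illustrative data only
fast = fft_radix2(sample)
slow = naive_dft(sample)
fft_error = max(abs(a - b) for a, b in zip(fast, slow))
```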

Another issue worthy of some discussion pertains to the Fourier transform and the absence of empirical evidence that would irrefutably demonstrate its cortical implementation. Part of the explanation for this lack of evidence may be provided by the role the Fourier transform plays in the vision process as suggested by this paper. That is, PaSH-FFT was shown to be a means to an end (convolution), not the end itself. Consequently, the experimental search for neurons whose response properties closely model the profile of a Fourier transform may remain unresolved for some time to come.

The implementation of PaSH-FFT in cortical architecture is a highly simplistic model of the parallelism inherent in the cortex. The model employed did not take into account at least two well accepted features of this parallel architecture. First, the system itself somehow synchronises the flow of the signal. Second, the cortex does not need computation-like synchronisation or state update. Synchronisation can be provided by considering "Small World" relationships. Gao et al. (2001) have shown that a "Small World" network needs only a small fraction of long-range couplings to obtain a great improvement in both stochastic resonance and synchronisation in network connectivity of bistable oscillators. We suggest that the known topology of the visual cortex (Zeki, 1993), if considered as a "Small World" network, can provide the foregoing benefits. They would be consistent with the long-range and short-range connectivities of V1 to retinal neurons, which have the required bistable oscillator condition provided by on-centre or off-centre neuron responses to light and dark, including those with colour opponency properties. The long-range and short-range selectivity for connections can be dynamic, based on the neuron threshold levels and spatial frequency channels (Dudkin, 1992). The system updates a neuronal state only when new information indicates a change in the input signal.

The cortical implementation of PaSH-FFT was discussed in Section 4.2, where it was argued that the known connectivity of area V1 was sufficient to support its cortical implementation. It was then argued that this implementation could deliver the required convolution in a 'small' number of sequential steps. However, the argument did not rule out the possibility of an alternative mechanism that would deliver the required convolution in fewer steps than PaSH-FFT. It would appear that, without a sufficiently developed model of the brain's parallelism, it is unlikely that a mathematical proof of a lower bound for the number of sequential steps could be produced. Currently, the only bound that we can be sure of is that the required convolution could not be completed in one step. Determining the lower bound remains an open question.

The role of the frequency domain was at the heart of the solution to the cortical convolution conundrum proposed in this paper. However, the possibility of performing the convolution in the spatial domain without resorting to the frequency domain cannot be ruled out by any argument presented in this paper. It is unclear, though, how this could be accomplished without resorting to a highly asymmetric model of the distribution of long-range connections. In any case, the search for an explanation of how the dynamic reconfiguration implied by the analysis of this paper is actually accomplished is likely to provide many different conjectures along the way. One possible avenue in this endeavour might be provided by further tracer experiments such as those reported in (Angelucci et al., 2002).
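The "Small World" remark earlier in this discussion can be illustrated with a toy graph. The sketch below is an assumption-laden illustration, not a model of V1 and not the bistable-oscillator system of Gao et al.: it builds a ring of 60 nodes with nearest-neighbour links only, adds three hypothetical long-range links, and compares mean shortest-path lengths. It shows only that a small fraction of long-range couplings shortens typical paths; it says nothing about synchronisation or stochastic resonance directly.

```python
from collections import deque

def ring_lattice(n, k=1):
    """Undirected ring in which each node links to its k nearest neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj

def add_shortcuts(adj, pairs):
    """Return a copy of the graph with extra long-range links added."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for a, b in pairs:
        new[a].add(b)
        new[b].add(a)
    return new

def mean_path_length(adj):
    """Mean shortest-path length over all ordered node pairs, by breadth-first search."""
    total = 0
    count = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

local_only = ring_lattice(60, k=1)                                      # 60 short-range links
with_long_range = add_shortcuts(local_only, [(0, 30), (15, 45), (7, 37)])  # 3 extra links

mean_local = mean_path_length(local_only)
mean_small_world = mean_path_length(with_long_range)
```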

Summary and conclusion
This paper reviewed the evidence for long-range contextual modulation and concluded that it implied cortical convolution at the scale of the visual field. This resulted in the need to address the problem of how such long-range convolution could be accounted for with known cortical connectivity and within known time constraints. The paper proposed a solution to the problem that emerged from a mathematical analysis of cortical connectivity to account for the implied constraints of long-range convolution. In particular, it was argued that the known distribution of the long-range patchy connections and extrastriate connections is adequate to provide the means by which the global visual signal can be transformed into frequency space, where the convolution can be performed. The main thrust of the argument was that these long-range connections facilitated the transformation of the signal into and out of the frequency domain via a new fast Fourier transform named PaSH-FFT. A mathematical proof of the most general form of this FFT, GOSH-FFT, was provided in the appendix along with MatLab-like pseudo-code to facilitate the implementation of GOSH-FFT in computer software.

It was shown that, to a first order approximation, a cortical implementation of PaSH-FFT could account for the large scale convolution implied by known models of contextual modulation. The significance of PaSH-FFT is that it:
• represents a plausible cortical mechanism to account for long-range contextual modulation;
• suggests a theoretical explanation of how the brain might be wired to achieve large scale Fourier analysis;
• opens up the possibility of explaining other cortical processes via frequency space computations.

It is the conclusion of this paper that the processing of the visual signal in the frequency domain via a fast Fourier transform plays a fundamental role in primate vision.

Appendix
In this appendix, we present the pseudo-code for GOSH-FFT and a mathematical proof of GOSH-FFT.

Appendix A
This section presents a formal statement of GOSH-FFT in MatLab-like pseudo-code.

Notation:
x = complex array specifying the input signal
base = $7^{\alpha}$, where $\alpha$ is an integer greater than zero
n = $7^{\beta}$, where $\beta/\alpha$ is an integer greater than zero

This section presents the mathematical proof of GOSH-FFT.

Notation: The symbol % will be employed to mean modular arithmetic. Let $B = 7^{m}$, where m is a positive integer, and let $N = B^{p}$, where p is a positive integer.
Let M denote the compound transform $(M10)^{m}$ from SHIA.
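With this notation in hand, the recursive structure that the proof relies on (split a signal of $N = B^{p}$ points into B sub-signals, transform each, then combine with local DFTs of B points) can be sketched in software. The code below is a conventional radix-B decimation-in-time FFT written under assumptions of our own: it uses a one-dimensional array, plain stride slicing for the rearrangement (standing in for the SHIA transform M, which it does not implement), and hypothetical function names. It is not the authors' GOSH-FFT pseudo-code.

```python
import cmath
import math

def naive_dft(x):
    """Reference O(N^2) DFT used only to check the recursive version."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / N) for t in range(N))
            for k in range(N)]

def fft_radix_b(x, base=7):
    """Recursive FFT for len(x) == base**p, with p a non-negative integer."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]          # a one-point signal is its own transform
    if N % base:
        raise ValueError("length must be a power of the base")
    n = N // base
    # Rearrangement: sub-signal r collects the points whose index % base == r.
    subs = [fft_radix_b(x[r::base], base) for r in range(base)]
    out = [0j] * N
    for q in range(n):
        # Twiddle factors align the B sub-transforms at frequency q.
        t = [subs[r][q] * cmath.exp(-2j * math.pi * r * q / N) for r in range(base)]
        # Local DFT on this group of B points.
        for s in range(base):
            out[q + s * n] = sum(t[r] * cmath.exp(-2j * math.pi * r * s / base)
                                 for r in range(base))
    return out

test_signal = [complex(math.sin(i), i % 5) for i in range(49)]   # N = 7**2, illustrative
radix_error = max(abs(a - b)
                  for a, b in zip(fft_radix_b(test_signal), naive_dft(test_signal)))
```

Each recursion level performs one rearrangement and one round of B-point local DFTs, so a signal of $B^{p}$ points needs p such rounds.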

Fig. 1. Displays the two-level addressing scheme of SHIA: (a) Hexagonal and (b) Rectangular. In the latter case, each rectangle has a designated positive integer address expressed in base five. An example of this addressing scheme is displayed in Fig. 1 (b).

Fig. 2 (a) displays an image represented in a four-level SHIA, whose size is $7^{4} = 2401$. Fig. 2 (b) represents the effect of applying $(M10)^{2} = M100$ to this image.

Fig. 2. Displays (a) an image of a duck represented on a four-level SHIA; (b) the result of applying the SHIA transform M10 twice to the image displayed in (a). There are four observable effects: 1) multiple near copies of the input image (a); 2) each copy is rotated by the same angle; 3) each copy is scaled by the same amount; 4) applying M10 twice to the image displayed in (b) results in the image displayed in (a).

Fig. 3. Displays the results of applying the special case of GOSH-FFT, that is PaSH-FFT, to the image of Fig. 2 (a), with n = 4 and m = 2. The four sub-figures display intermediate results of PaSH-FFT: (a) on completion of the first iteration of PaSH-FFT applied to Fig. 2 (a); (b) the Fourier transform on completion of the second iteration; (c) on completion of the first iteration of the inverse PaSH-FFT; (d) the inverse PaSH-FFT on completion of the second iteration.

between columns of Layer 2/3 and similarly patchy extra-striate feedback connections to area V1.

Fig. 4. Displays a schematic diagram of the pinwheel-like structures of visual area V1, extracted from Figure 10, page 43 of Bruce et al. (2003).

Fig. 5. Displays a schematic diagram of a computational unit. The circles represent neurons and the straight lines connecting the circles represent cortical connections. Each neuron depicted at the top of the figure outputs a value $x_i$. The neuron depicted at the bottom of the figure inputs the sum of each $x_i$ multiplied by weight $w_i$.

input signal is the frequency domain and the weights are associated with a set of inverse primitive roots of unity, then the resulting local computation is an inverse Fourier transform, denoted I. (See Equation 2 for a definition of an inverse Fourier transform.) If the input signal is a frequency domain and the weights represent Fourier components, then the resulting local computation is a convolution in the frequency domain, denoted C. (See Equations 3 and 4.) Table 1 provides a summary of this notation.
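The computational unit of Fig. 5 is a weighted sum, and the choice of weights decides which local computation it performs. The sketch below is a minimal illustration under assumptions of our own: the function names are hypothetical, the weights are the standard DFT roots of unity, and the inverse weights carry a 1/n scale, which may differ from the convention of Equation 2. One unit per output frequency yields a local Fourier transform, and one unit per output position with inverse weights recovers the input.

```python
import cmath
import math

def unit_output(inputs, weights):
    """One computational unit: the sum of each input x_i multiplied by its weight w_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def fourier_weights(k, n, inverse=False):
    """Weights for output index k: roots of unity, or scaled inverse roots."""
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [scale * cmath.exp(sign * 2j * math.pi * k * i / n) for i in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 2.5]   # illustrative local signal of seven values
n = len(x)

# Local Fourier transform: one unit per frequency k.
spectrum = [unit_output(x, fourier_weights(k, n)) for k in range(n)]

# Local inverse transform: one unit per position i, using inverse roots of unity.
recovered = [unit_output(spectrum, fourier_weights(i, n, inverse=True)) for i in range(n)]
round_trip_error = max(abs(r - v) for r, v in zip(recovered, x))
```

In both cases the signal traverses a single layer of connections, which is the sense in which the text counts a local transform as one step.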
implementation of PaSH-FFT in cortical architecture is a highly simplistic model of the parallelism inherent in the cortex. The model employed did not take into account at least two well accepted features of this parallel architecture. First, the system itself somehow synchronises the flow of the signal. Second, the cortex does not

these long-range connections facilitated the transformation of the signal into and out of the frequency domain via a new fast Fourier transform named PaSH-FFT. A mathematical proof of the most general form of this FFT, GOSH-FFT, was provided in the appendix along with MatLab-like pseudo-code to facilitate the implementation of GOSH-FFT in computer software. It was shown that, to a first order approximation, a cortical implementation of PaSH-FFT could account for the large scale convolution implied by known models of contextual modulation. The significance of PaSH-FFT is that it:

..., $f^{B-1}_{n-1}$, denote a sequence of N points in the input signal. The proof is by induction on p. When p = 1, GOSH-FFT is simply a DFT. Assume that GOSH-FFT computes a Fourier transform for all levels less than p.

$M^{-1}(f^{x\%B}_{x/B}) = f^{0}_{0}, \ldots, f^{0}_{n-1}, \ldots, f^{B-1}_{0}, \ldots, f^{B-1}_{n-1}$.

We now have B sub-signals, each of which is composed of n points. Then, by the induction hypothesis, we can apply GOSH-FFT to obtain B individual transforms of the B sub-signals;

... u)) = $e^{(u/B)+(u\%B)n} = 1, e^{n}, \ldots, e^{(B-1)n}, e^{1}, e^{n+1}, \ldots, e^{(B-1)n+1}, \ldots, e^{n-1}, e^{2n-1}, \ldots, e^{Bn-1}$

Perform a local DFT on each of the n groups of B points. The general term is: (u/B)%N e q(u/B)+((u%B)n))%N ((u/B)%N e (u%B)rBn%N e q((u/B)+((u%B))n))%N

Table 1 provides a summary of this notation. At the next higher scale of connectivity each local map is assumed to have access to ongoing activity of every other local map via long-range patchy connections and striate-extrastriate interactions. These provide the means by which the results of local computation can be transported to another local map as input for a further local computation. We denote such a projection as P to represent the class of transformations (as described in Section 4.1).
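The index map $u \mapsto (u/B) + (u\%B)n$ that appears in the proof fragment above can be examined directly. The sketch below assumes that "/" denotes integer division and that $n = N/B$, both of which are our reading of the notation rather than statements from the source; the function names are hypothetical. Under those assumptions the map is a permutation of the N indices that moves the lowest base-B digit of u to the top position, so applying it p times to a signal of $N = B^{p}$ points returns every index to where it started.

```python
def rotate_index(u, B, N):
    """Index map from the proof: (u / B) + (u % B) * n, with n = N / B (assumed)."""
    n = N // B
    return (u // B) + (u % B) * n

def apply_times(u, B, N, times):
    """Apply the index map repeatedly."""
    for _ in range(times):
        u = rotate_index(u, B, N)
    return u

B, p = 7, 3
N = B ** p

image = [rotate_index(u, B, N) for u in range(N)]
is_permutation = sorted(image) == list(range(N))

# Exponent order 0, n, ..., (B-1)n, 1, n+1, ... as listed in the proof, for N = 49.
first_exponents = [rotate_index(u, 7, 49) for u in range(9)]

# p applications cycle every base-B digit back to its original position.
returns_to_start = all(apply_times(u, B, N, p) == u for u in range(N))
```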