

A \swn\;can be seen as one which combines the properties of a regular and random network.  To analyse these \swn, they are usually represented by concepts derived from graph theory (See Appendix for formal definitions).  A node is used to represent a certain property and edges connect these concepts appropriately.  The structural properties of these graphs are characterised by their path length, $L(p)$, and clustering coefficient, $C(p)$.  As suggested by their names, the path length measures the typical separation between two edges in the graph and the clustering coefficient the `cliquishness' of a typical neighbourhood.  Thus, a \sw\;network is a graph which has short path length but a high clustering coefficient.

It is due to the strong mathematical foundations of graph theory and the intuitive analogy of the way one thinks in categories linking to other categories like a \sw\;network that motivated the investigation into these \swn.  We look at how accurate these \swn\;model certain networks in areas in Informatics such as the World-Wide Web (WWW) and in particular, the brain.

\subsection{History}
The work of Paul Erd\"{o}s \& Alfr\'{e}d R\'{e}nyi on random graphs in the 1960's provided the basis model of a network which tends to be used for comparison methods.  In many studies it is often shown that the \sw\;model characterises phenomenon better than random network-based ones.  \Swn\;was popularised by Stanley Milgram's `small-world experiment' in 1967 which implied the notion of the `six degrees of separation' phenomenon.  However, it was not until 1998 with \ws\;seminal paper entitled `Collective dynamics of `small-world' networks' which garnered much interest and research into the area.  Malaquais et al.\cite{Mala} provided a good theoretical argument towards why a \sw\;architecture would work as a model of the human mind, in particular appealing to the way people categorise and group concepts together, e.g. types of food.  More concrete support for this is found in research by Postma et al.\cite{Postma} which will be discussed later.

Another highly influential paper was published in 1999 by \bb\;et al.\cite{Bara} which criticises both the random and \ws\;model as incomplete - they lacked the incorporation of two major aspects of `real' world networks.  First, both models assume a network starts and ends with a fixed number of nodes.  In reality, many real world networks are open and form by the continuous addition of new nodes throughout the lifetime of the network, e.g. the WWW grows with the addition of new webpages.  Second, both models are wired and rewired randomly whereas many real networks exhibit preferential connectivity, e.g. a new webpage is more likely to include links to well-known pages with high connectivity.  More importantly, \bb\;et al. discovered that the inclusion of these two factors led to a power law distribution, $P(k) \approx k^{-\gamma}$.  This finding is of particular interest because recent studies in networks in Informatics have shown to exhibit a power law distribution, e.g. for a \fl\;\nw\;of activated voxels\cite{Eguilez}.

However, caution and interest also emerges through the \ml\;analysis of this power law distribution as dealt with by Amaral et al.\cite{Amaral}, more specifically the robustness of a power law distributed network against random error.  One of the main features of these networks is the high occurrence of `hubs', i.e. nodes with degrees that are significantly higher than the average.  Since failures occur at random and the vast majority of nodes are those with small degree, the likelihood that a hub would be affected is almost negligible.\footnote{http://en.wikipedia.org/wiki/Scale-free\_network}  The caution is naturally in the event of removing one or more of these `hubs'; the result would be a \nw\;consisting of a sparse graph which would give a less \ml ly rich structure to the \nw\;and hence, less information.  In short, scale-free networks are highly robust against errors, but potentially vulnerable to planned attack\cite{Amaral}.

Thus, the history of \swn\;arose from a sound and continually developing \ml\;theory with practical applications to \nw s and can be seen to relate to some \cn ist models of the brain.  This already demonstrates a rich prospect of research in the theory of \swn\;alone, without even knowing its applications.  However, this is just one of the aspects of \swn\;which generates interest.  Our aim is to see how much potential there is for the brain to be modelled as a \swn\;and vice-versa.


We begin by looking at the initial effectiveness of \swn\;for modelling specific types of systems related to the brain.  In \ws\cite{Watts} seminal paper they give the influential example of the completely mapped neural \nw\;of the (nematode) worm, \emph{C. Elegans}\cite{Achacoso} where an edge joins two neurons if they are connected by either a synapse or a gap junction. This \nw\;exhibited \sw\;properties with $L>L_{rand}$ and $C \gg C_{rand}$.  A property of path length of a random \nw, $L_{rand}$, is that it is short, which \swn s are also characterised by.  Here, although it is seen that $L>L_{rand}$, the actual figures are $2.65 > 2.25$ which is only $~ 15\%$ difference and for practical reasons, it is fairly justified to take $L \approx L_{rand}$ (as we shall see later).  Thus, we take the notion of $C \gg C_{rand}$ (which is certainly justified since $0.28 \gg 0.05$ here) along with $L \approx L_{rand}$ as the properties of \swn s.

Furthermore, one of the proposed future work areas from the paper was the plausibility of the brain having a \swn\;structure.  A study that showed \swn s of coupled phase oscillators synchronising as readily as a standard model presented the possible relevance of it to the observed synchronisation of widely separated neurons in the visual cortex\cite{Gray}.

In 2006, Bassett and Bullmore\cite{Bassett} published a very promising paper entitled ``Small-Brain Networks'' which provided further support for using \sw\;models for understanding the structure and function of the brain.  They presented results from studies of the visual cortex of the macaque monkey and the cat from several researchers and deduced \swn\;structures from them.  To quote: \textit{``The first graphical analyses of mammalian cortical \nw, in the early 1990s, preceded the mathematical development of the \sw\;model but identified many features of anatomical connectivity that would later be recognized as compatible with it"}.

In 1991, Felleman and Van Essen\cite{Felleman} compiled an anatomical connectivity matrix that summarised 305 axonal \cn s between 32 areas of the visual cortex in the macaque monkey.  They found that on producing several levels of cortical processing by differentiating connections with regard to the (laminar) structure of the cortex, results showed the existence of multiple separated parallel processing streams with fewer \cn s between the streams.

In particular, the segregation of the ventral and dorsal streams was confirmed by Young et al.\cite{Young1} who had began considering anatomical connectivity matrices from tract-tracing studies of the macaque monkey and cat also.  Their connectivity matrix was drawn from 301 \cn s and 30 areas in the visual cortex of the macaque monkey with \cn s being coded as 0 (weak/absent) to 2(strong \& reciprocal).  Similarly, Scannell et al.\cite{Scannell1}\cite{Scannell2} analysed results from tract-tracing of the cat with a connectivity matrix derived from 1139 \cn s between 65 cortical areas and again, similarly coded with 1 for weak/sparse up to 3 for strong/dense.

Both analyses of matrices were done using nonmetric multidimensional scaling (NMDS) which plots strongly connected areas in closely while maximising the graphical distance between non-connected areas.  The resulting graphs showed strong interconnectivity between the dorsal and ventral areas confirming Felleman and Van Essen in Young et al.'s studies and for Scannell et al., four distinct clusters of strong local interconnectivity was seen - designated visual, auditory, somatosensorimotor and frontolimbic - with relatively sparse connectivity between the clusters themselves. Furthermore, \bbb\;notes that in a study by Hilgetag et al.\cite{Hiltegag}, \textit{the larger scale macaque and cat cortical connectivity matrices described above were formally shown to have small-world properties, that is, relatively high clustering ($\gamma \gg 1$) and short path lengths ($\lambda \sim 1$) compared to random networks}.

A less data driven approach was taken by Postma et al.\cite{Postma} where they examined the semantic associative network for a set of words and measured the characteristic path length and clustering coefficient at various stages of development.  They artificially created a semantic net in two ways by using data taken from the MRC psycholinguistic database containing word-association and age-of-acquisition data published by Michael Wilson\footnote{http://ling.ed.ac.uk/help/mrc} online.  The \cn s were defined by a study by Kiss et al.\cite{Kiss} where people were asked to associate freely on target words.  The association data were used for two models - first, in combination with age-of-acquisition data where they were translated into age-dependent semantic \nw s and second, they were trained without the age-of-acquisition data as a hetero-associative memory (HAM) model of semantic development.

For the first model the presence of an associative relation between any two concepts is represented by the $N \times N$ matrix $\textbf{A}$.  Each element $A(t, i, j) = A(t, j, i)$ of the matrix $\textbf{A(t)}$ ($t$ represents age in years $\times 100$) specifies the presence ($A(t, i, j) = 1$) or absence ($A(t, i, j) = 0$) of an association between the concepts $i$ and $j$ at time $t$.  The total number of concepts $N$ is acquired at age $t = 700$ and for any concept $h$ that is not yet acquired, it is represented by $A(t, h, j) = A(t, j, h) = 0, 1 \leq j \leq N$.  Also, for each target word $w$, a response word $r$ was generated by a proportion $\lambda(w,r)$ of the subjects.  Thus, $A(700,w,r) = s[\lambda(w,r)],$ with $s[x] =$ the step function,\footnote{i.e. $s[x] = 1, x>0$ and $= 0$ otherwise.}  and for $t<300$, $A(t,w,r) = s[aoa(w) - t] \times s[aoa(r) - t] \times A(800,w,r)$, with $aoa(w)$ being the age of acquisition of word $w$.
For the second model Postma et al. presented the concepts acquired to a HAM model consisting of $M$ in and output nodes and $M^2$ adaptive weights.  Associations among concepts were represented by means of clipped Hebbian learning, i.e at each presentation of a pattern-pair, weights are set to $w(i,j) = min[1, w(i,j)' + \lambda p(i)q(j)].$  (Aside: Note that the usage of Hebbian learning is subject to criticism which we shall deal with later.)  Nevertheless, Postma et al. sets $\lambda = 0.2$, trains the network seven times to simulate the developmental stages and at each stage pairs of concepts are randomly selected with a probability proportional to their association strength.  We also note that they do not give any justification for the choice of these numbers, once can only presume that they should be arbitrary for independence of results.  They incorporate stochastic synapses to accommodate the association of a single input vector to multiple output vectors which is a modification to traditional HAM modules which associate single input vectors to single output ones.  For $p(j)$ an input vector, elements of output vector $q(i)$ is defined as:
$$q(i) = \Sigma_{j=1}^{M} f(w_{i,j}) p(j),$$
with $f:\mathbb{R} \rightarrow \{0,1\}, f(x) = 1,$ for $rand<x, f(x) = 0$ otherwise, and $rand$ is a random value taken from the uniform distribution on the unit interval.

Thus, the resulting tables are as follows: \\
\begin{center}
\begin{tabular}{| c | c | c | c | c | c | c | c | c |}
\hline			
  \textbf{$t$} &  \textbf{100} & \textbf{200} & \textbf{300} & \textbf{400} & \textbf{500} & \textbf{600} & \textbf{700} & \textbf{adult} \\
  \hline
  $N_{1}$ &  - & 30 & 322 & 728 & 1066 & 1185 & 1205 & 8210 \\
  \hline
  $N_{2}$ &  20 & 35 & 266 & 677 & 999 & 1011 & 1205 & - \\
  \hline
  $k_{1}$ &  - & 3.2 & 10.5 & 14.7 & 17.0 & 17.7 & 17.1 & 58.7 \\
  \hline
  $k_{2}$ &  1.2 & 3.0 & 9.4 & 12.1 & 16.4 & 17.0 & 18.0 & - \\
\hline
\end{tabular}
\end{center}
where the `adult' column specifies $N_1$ and $k_1$ for the entire set of concepts for which association data are available for the first model and $N_2$ and $k_2$ are associated with the HAM model.

The analysis of the results were then done by calculating the clustering coefficients and path lengths.  Postma et al. then compared these values with ones obtained from a random network (letting $p = 1$) so that they got $L \approx L(1)$ and $C \gg C(1)$ which is akin to \ws\;criteria from earlier.  Overall, the main findings of Postma et al.'s paper is that the characteristic path length remains almost constant despite expansion of the network ($C \approx 0.13, L \approx 2.7$) and that these values indicate small-worldliness.  It is again noteworthy that the binary association network produced here are rather crude approximations of the human associative strengths and that the almost stagnant path length didn't form until the later stages of development ( $\approx \frac{400-500}{100} years$.  However, the conclusion which mentions the apparent tendency of the human semantic \nw\;to optimise small-worldliness by maintaining a constant path length and low clustering coefficient despite increasing number of concepts is also discussed in \bbb's paper from earlier.
