Hi Richard,
Hmm, that took some thinking about, but I think the answer is yes, it is using Prim's algorithm. Its "explanation" rather fudges over the interesting bit, which is the loop at lines 390-440.
What it does is find (lines 280-330) the node that isn't yet in the tree but has the nearest known neighbour (stored in key(v)), and record it as u. The first time round this is just the starting node, which we've given a key value of 0, but we'll see later that these key values get filled in with the best potential link we know about so far...
It then adds that node to the minimum spanning tree (MST, line 360).
In the next loop, it goes through all the nodes, and if they aren't already in the MST and they have a link to our new member u which is closer than one we already know about, it updates key(v) to this new value and marks u as its prospective parent.
Then we go back to the start, and look at all the nodes that aren't already in the MST, find the one with the closest link to an existing member, add it to the MST, and update any close links it has to nodes outside the tree.
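In case it helps to see the shape of that, here's a minimal sketch of the same structure in BBC BASIC - just an illustration, not the DeepSeek listing itself, and the names (n%, graph(), key(), parent%(), inMST%()) are my own choices to match the description above:

    REM Minimal Prim sketch (illustrative names, not the DeepSeek listing):
    REM graph() is the symmetric adjacency matrix (0 = no edge), key() the
    REM cheapest known link into the tree, parent%() the node at its other end.
    n% = 4
    DIM graph(n%-1,n%-1), key(n%-1), parent%(n%-1), inMST%(n%-1)
    FOR i% = 0 TO n%-1 : FOR j% = 0 TO n%-1 : READ graph(i%,j%) : NEXT : NEXT
    DATA 0, 2, 6, 0
    DATA 2, 0, 3, 8
    DATA 6, 3, 0, 5
    DATA 0, 8, 5, 0
    FOR v% = 0 TO n%-1 : key(v%) = 1E30 : parent%(v%) = -1 : NEXT
    key(0) = 0 : REM arbitrary starting node
    FOR pass% = 1 TO n%
      REM Find the node outside the tree with the cheapest known link:
      best = 1E30 : u% = -1
      FOR v% = 0 TO n%-1
        IF inMST%(v%) = 0 AND key(v%) < best THEN best = key(v%) : u% = v%
      NEXT
      inMST%(u%) = 1 : REM add u% to the MST
      REM Update the cheapest known link of every node still outside:
      FOR v% = 0 TO n%-1
        IF inMST%(v%) = 0 AND graph(u%,v%) > 0 AND graph(u%,v%) < key(v%) THEN key(v%) = graph(u%,v%) : parent%(v%) = u%
      NEXT
    NEXT
    FOR v% = 1 TO n%-1 : PRINT "Edge "; parent%(v%); "-"; v%; " weight "; graph(parent%(v%),v%) : NEXT

Each pass of the outer loop adds exactly one node to the tree, so after n% passes everything is connected.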
That doesn't directly answer your question about why the matrix is symmetrical, with graph(u,v) = graph(v,u). The answer to your implicit question about whether this represents redundancy, or is core to the algorithm is "yes, the information is redundant, but it makes the algorithm easier to implement".
Remember we can choose any arbitrary node as our starting point - so we don't know whether we are going to use a link with u as the parent, joining in v, or v as the parent, joining in u. By having both elements in the array, we don't have to worry about it - we can just look up the link in the table either way round.
Of course we COULD use some sort of system like numbering the nodes, and only storing each edge once, perhaps as edge(lower node, higher node): then when we wanted to check it, we'd need to work out whether u or v was lower, and then access either edge(u,v) or edge(v,u) accordingly - but making that test is slightly tricky, and probably slows the algorithm down more than the cost of the extra memory (if one can compare time and space!).
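Just to make that concrete (a sketch with made-up names, not anything from the actual listing): with the full symmetric matrix the lookup is simply graph(u,v) whichever way round you have u and v, whereas the "store each edge once" scheme needs a test like this on every access:

    REM "Store each edge once" sketch (names invented for illustration):
    REM only edge(lower, higher) is filled in, so every lookup must first
    REM test which index is the smaller. A real space saving would flatten
    REM this to a triangular array; the square DIM just keeps the sketch simple.
    DIM edge(3,3)
    edge(0,1) = 2 : edge(1,2) = 3 : edge(2,3) = 5 : edge(0,2) = 6
    PRINT FNweight(2,0) : REM prints 6, the same as FNweight(0,2)
    END

    DEF FNweight(u%, v%)
    IF u% < v% THEN = edge(u%, v%)
    = edge(v%, u%)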
You and I both grew up in an era where space was usually limiting, but these days that's rarely the case, so a more space-requiring, but faster, implementation is likely to be preferred.
One further comment: an alternative way of encoding the information would be to store a list of edges (with their start and end nodes, and their weights), rather than a square array of nodes (with the values of their edge weights). That tends to be slightly more complex to work with, but would be preferred for a "sparse" graph (i.e. one where most of the edge values are 0, because there are rather few links). That's often the case with very large networks, since points far apart often aren't connected. In those cases you might well go down the "edge list" route, which would be much more space-efficient (and potentially faster, too, since you only have to iterate through the list of edges, which could be small compared with looking at n x n entries in a matrix).
It also has the advantage in this case that you could sort your list of edges by value, and stop iterating through it when you've found a potential "cheapest link" - so for Prim (and some of the related algorithms like Kruskal) it might actually be a better choice - though that will tend to depend on the size and density of the graph.
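As a sketch of what an edge list might look like (again just an illustration, with names of my own invention): three parallel arrays holding each edge once, sorted into ascending weight order.

    REM Edge-list sketch (illustrative names): each edge stored once as
    REM (node A, node B, weight), then sorted into ascending weight order.
    nEdges% = 5
    DIM eA%(nEdges%-1), eB%(nEdges%-1), eW(nEdges%-1)
    FOR e% = 0 TO nEdges%-1 : READ eA%(e%), eB%(e%), eW(e%) : NEXT
    DATA 0,1,2, 1,2,3, 2,3,5, 0,2,6, 1,3,8
    REM Simple exchange sort by weight (fine for a short list):
    FOR i% = 0 TO nEdges%-2
      FOR j% = i%+1 TO nEdges%-1
        IF eW(j%) < eW(i%) THEN SWAP eA%(i%),eA%(j%) : SWAP eB%(i%),eB%(j%) : SWAP eW(i%),eW(j%)
      NEXT
    NEXT
    FOR e% = 0 TO nEdges%-1 : PRINT eA%(e%); "-"; eB%(e%); " weight "; eW(e%) : NEXT

Kruskal then just walks down the sorted list, accepting each edge that doesn't create a cycle.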
Hope that's helpful and/or interesting!
D
Re: Snowplough Turns
Yes, thanks. A worthwhile addition to the code might therefore be to confirm that the matrix is symmetric, as a 'sanity check' that the data has been entered correctly (I did actually do this to check my entries, but I removed it again as being non-essential and not part of the DeepSeek code).
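For anyone who wants to put the same check back in, it only takes a few lines. This is just a sketch, assuming the adjacency matrix is called graph() and holds n% nodes - adjust the names to whatever the actual listing uses:

    REM Sanity check: report any asymmetric entries in the matrix.
    REM Assumes an existing graph() array and n% node count.
    ok% = TRUE
    FOR i% = 0 TO n%-1
      FOR j% = i%+1 TO n%-1
        IF graph(i%,j%) <> graph(j%,i%) THEN ok% = FALSE : PRINT "Asymmetric entry at ("; i%; ","; j%; ")"
      NEXT
    NEXT
    IF ok% THEN PRINT "Matrix is symmetric"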
This all leaves me wondering how DeepSeek generated that code. Was there a BBC BASIC version of the Prim algorithm somewhere in its training data? That seems highly unlikely. Was there a C or Python (or something) implementation in its training data which it 'translated' to BBC BASIC? More plausible I guess. Or did it work out the BBC BASIC code from scratch from a description of the algorithm?
We're unlikely ever to know. But those sceptics who say that a Large Language Model is no more than a text-prediction engine, capable of determining statistically what the next word is likely to be based on the training dataset, would have difficulty in explaining how such a simplistic mechanism could lead to it being able to write competent BBC BASIC code!
It remains uncertain to what extent transformer-based LLMs will scale, in the sense of becoming more 'intelligent' as the size of the training data is increased. There seemed to be signs during 2024 that the scaling was starting to fail, with larger datasets sometimes resulting in worse overall performance. This caused some to question whether this route could ever lead to Artificial General Intelligence.
But the 'reasoning' capability built into DeepSeek and some of the newer AIs from other vendors does seem to be paying off. One thing's for sure, developments in the field seem to be happening so rapidly that we won't have long to wait to find out.

Re: Snowplough Turns
I agree it's difficult to say. I'm not sure how the "reasoning" they describe is generated, or how it relates to the actual underlying process.
To say they are "simply" text prediction engines is rather a simplification. It's true that they are basically trained on the ability to predict the "most likely" next word in a given context, but underlying that ability is a network of tens of billions of weights, which in a sense codify the word relationships of the training text - and that in turn reflects the thoughts and understanding of the people who wrote that text.
One reason they are rather good at programming is that a large chunk of their training material is the help files etc for all sorts of computer stuff, as well as code repositories - so you'd expect them to be strong on syntax, and to have "seen" code for lots of things, often including comments about what it does.
One reason they are quite good at translating from one programming language to another is probably that language is tokenised before processing, and that process of tokenising it probably means that similar concepts (for loops etc) map together from different languages - so they are good at transferring what they "know" in one context to another. Interestingly, it also underlies their strength as translators, and increasingly allows them to make decent pictures from text prompts: the text is tokenised, and then those tokens can be interpreted as image components.
There are two ways that models are advancing: one is increasing the complexity of the models themselves - i.e. the number of links and their weightings. Another is to use more (and more varied) training data. Both have been very important in getting us where we are today, but both have problems for further development.
The size of the models is related to their cost, energy use and speed - and bigger models need more training data to get the weights where they need to be!
Training data is also getting problematic - for one thing, the internet has often been used as a source, but that is getting full of AI-generated stuff, and it has been shown that when you start training these models on their own outputs, you surprisingly quickly see "model collapse", where it degenerates to a gibbering wreck! There's also the issue that people are getting increasingly strident in their opposition to having their work ripped off by the models. Also, as the mass of random human thought from the web becomes larger, it dilutes out useful material (like programming manuals, or medical textbooks).
I agree that the performance of these LLMs is remarkably good - but then you run across cases where it is remarkably bad, too! That makes it hard to rely on. My own view is that they are still quite a long way from artificial general intelligence - they lack any "model of the world", and they can't integrate knowledge (except insofar as it is captured within their links) - they have no real "reasoning" capacity of their own.
Another long screed, arguably straying far from BBC BASIC - sorry!
Re: Snowplough Turns
There is some information available. It's referred to as the 'Chain of Thought' reasoning model, and DeepSeek can be asked to output the steps it is taking during the process, although it can make the output very verbose.
As I understand it, it has always been possible to encourage LLMs to use a step-by-step approach by explicitly prompting them to do so, but DeepSeek and some other modern AIs will automatically use Chain of Thought when the answer to the question involves reasoning.
DDRM wrote: ↑ It's true that they are basically trained on the ability to predict the "most likely" next word in a given context, but underlying that ability is a network of tens of billions of weights, which in a sense codify the word relationships of the training text - and that in turn reflects the thoughts and understanding of the people who wrote that text.
Whilst that is true, you've omitted to mention Transformers (the T in ChatGPT), which is the key feature of these LLMs, and the one thing above any other that has resulted in their spectacular rise in capability. This YouTube video about Transformers is interesting, but I find it hard to follow:
https://www.youtube.com/watch?v=wjZofJX0v4M&t=956s
Re: Snowplough Turns
DDRM wrote: ↑ Fri 07 Feb 2025, 13:07 My own view is that they are still quite a long way from artificial general intelligence - they lack any "model of the world", and they can't integrate knowledge (except insofar as it is captured within their links) - they have no real "reasoning" capacity of their own.
Hmm, I can't really agree.
Firstly on lacking a "model of the world": isn't that exactly what they have acquired from their training data? Specifically they've 'learned' how things relate, i.e. what concepts are 'connected with' other concepts. Isn't that what we mean by a 'model of the world'?
Secondly, "they can't integrate knowledge". They clearly 'contain' knowledge (i.e. facts), although the precise mechanism isn't well understood - apparently their 'factual storage' is believed to reside within the MLP (Multi-Level Perceptron) which contains two-thirds of the weights that are adjusted during their training stage.
If on the other hand you mean they don't acquire 'new' knowledge from their interactions with users, my understanding is that it's not that they fundamentally can't, but that the capability to do so is deliberately disabled to prevent them being 'corrupted' by false or biased information.
Lastly "they have no real reasoning capability". Again my understanding is that it's precisely this capability which has been added to the newest and most advanced AIs, like DeepSeek.
I certainly don't claim to be an expert on any of this; what little understanding I've acquired has come from the series of YouTube videos by 3Blue1Brown, which are hard going but interesting.
One thing in particular that I found fascinating is that (in GPT-3 at least) the vector encoding everything the model 'knows' about a particular word - or 'token' to be precise - is exactly 12,288 elements long. In other words, the AI represents all the concepts and nuances associated with the word within a 12,288-dimensional space.
Re: Snowplough Turns
Here is one reference to reasoning in LLMs. Of course whether you accept that it is "real" reasoning depends on what you mean by 'real'! If you mean 'human-like' reasoning, perhaps they don't, but whether that matters is questionable.
LLMs differ in architecture from the human brain in some important ways, not least that the basic unit of information they process is a 'token' (often an entire word, hence 'language model'), which probably explains why they are, relatively speaking, bad at mathematics and at counting letters in a word!
But maybe those limitations won't be that difficult to find workarounds for.
Re: Snowplough Turns
Hi Richard,
Thanks, that's a really interesting website (as well as page!) - I'll certainly be spending some time on it!
It's interesting that the final section quotes a paper which concludes that they DON'T reason, as people would understand it! That would be my perspective too, based on what we've been taught about transformers (and is echoed by my colleagues who actually work with/write them) - but things like chain-of-thought prompting (which now seems to be incorporated "behind the scenes" in their interfaces) can certainly give that feel. I still think that what they are doing is fundamentally "looking up relationships" in their training data, encoded as the link weights within themselves - hence capturing the implicit reasoning of the people who created that training data (i.e. wrote the text, etc).
The interesting question to my mind is whether that's also basically what WE do - in other words, if our "world model" emerges out of the complex "training data" we receive from our senses. Certainly the spectacular advances in capabilities of LLMs over the last couple of years raise that possibility.
My feeling is that that ISN'T the way we work. One reason is that people learn spectacularly faster than LLMs: typically people will learn after a few, or only one, example, and can transfer that knowledge/understanding to related problems. AI models in general (including LLMs) typically need hundreds or thousands of training examples to get "quite good" at something, and that something tends to be rather specific.
Because LLMs have been trained on a HUGE range of information, covering a wide range of concepts, and because of tokenisation, which effectively maps similar concepts "close together" (whatever that means in multi-thousand, or even multi-billion, dimensional space!), they DO acquire some ability to "argue by analogy" - but I'm not sure they are doing that in the same way that we do (but it might be! We don't understand how humans do it, either, as far as I know).
Another argument could point both ways: people have around 10^14 synapses (connections between neurons), which is still ~ 3 orders of magnitude more than the latest LLMs (though bear in mind a lot of those are just there to help us breathe and keep our hearts beating!). Maybe when LLMs get to that size they WILL think like us. The counter argument is that to train such a huge network you'd need a simply astronomical amount of data to get all the weights to a sensible value - while people really don't, which suggests they use a different approach (and / or are "pretrained" by evolution - at least some "weights" are encoded genetically, presumably to a sensible "baseline value").
That's my understanding - but you probably have more opportunity to read the LLM literature than I do at the moment, since my focus is elsewhere...
I certainly agree with your point that just because people think one way it doesn't mean that computers have to mimic that - it may well be the case that they can achieve similar goals in a totally different way.
Re: Snowplough Turns
Exactly. If one takes a too analytical approach there is a danger of trying to interpret the workings of one system we don't really understand (the apparently 'intelligent' capabilities of an LLM) based on our subjective experience of another system we don't understand (the human brain).
I'm more in the camp of saying that if it looks like a duck, walks like a duck and quacks like a duck, it's a duck!
If the smallest building blocks, which we do understand, of an LLM and the human brain were very different in their internal operation, that could be taken as evidence to suggest that they aren't analogous. But in fact the Perceptron was modelled on our understanding at the time of how a neuron works, which I don't think has changed that much since.
DDRM wrote: ↑ My feeling is that that ISN'T the way we work. One reason is that people learn spectacularly faster than LLMs: typically people will learn after a few, or only one, example, and can transfer that knowledge/understanding to related problems. AI models in general (including LLMs) typically need hundreds or thousands of training examples to get "quite good" at something, and that something tends to be rather specific.
My feeling is that one needs to consider the (offline) 'learning/training' process and the (real-time) 'thinking' process separately. It may well be that the human brain has a much more efficient way of setting the 'weights' in the neural network than the iterative approach of LLMs. But that doesn't necessarily mean that the way 'intelligence' arises from that network, once created, is fundamentally different between the two.
After all, the big advance which made LLMs possible, apart from the sheer scale achievable with modern microelectronics, was the discovery of a mathematical algorithm for iteratively adjusting the weights, based on back-propagation and the local gradients of a multi-dimensional error surface. Without that we could have built the network, but would have had no way of setting the weights.
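As a toy illustration of the 'stepping down the gradient' part (a one-weight sketch of my own, which leaves out the chain-rule bookkeeping that back-propagation uses to obtain these gradients for every weight in a deep network):

    REM Toy illustration only: fit y = w * x to a single data point by
    REM repeatedly stepping the weight against the local gradient of the
    REM squared error.
    x = 2 : target = 6 : w = 0 : eta = 0.05
    FOR iter% = 1 TO 50
      y = w * x
      gradient = 2 * (y - target) * x : REM d(error^2)/dw
      w = w - eta * gradient          : REM small step "downhill"
    NEXT
    PRINT "Learned weight = "; w : REM converges towards 3

Back-propagation is essentially the efficient way of computing that 'gradient' term simultaneously for billions of weights arranged in layers.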
DDRM wrote: ↑ Maybe when LLMs get to that size they WILL think like us. The counter argument is that to train such a huge network you'd need a simply astronomical amount of data to get all the weights to a sensible value - while people really don't, which suggests they use a different approach.
There are some indications that scaling alone isn't resulting in the improvements hoped for. On the other hand, I have seen it argued that the metric used to assess the performance of LLMs is flawed, because for the great majority of the tasks they are asked to perform there is no single 'correct' answer against which theirs can be compared.
I'm also unsure whether the subjective assessment that the human brain doesn't need vast amounts of training data to develop is actually true. Do we really understand the process by which the brain acquires information from the senses, and just how much is acquired during its development?
Of course this may well not be 'language' data, and the really big difference between LLMs and the brain must surely be that 'language' (in the form of 'tokens') is absolutely key to the LLM, but (presumably) not to the brain. Whether this is a fundamental difference, which can't be resolved without starting again with an entirely different approach, remains to be seen.
I'm certainly not of the view that the brain is so complex and special that it will never be possible to develop a comparable Artificial General Intelligence. But how long that will take, and what the architecture will look like, remains unknown (as does whether it's a good idea!).
Re: Snowplough Turns
Richard Russell wrote: ↑ Thu 13 Feb 2025, 11:48 Whether this is a fundamental difference, which can't be resolved without starting again with an entirely different approach, remains to be seen.
Developments in this field are happening at breakneck speed! According to the video below, a recent paper describes a promising approach to 'reasoning without language' (at least, in a much more subtle way than Chain of Thought):
https://www.youtube.com/watch?v=ZLtXXFcHNOU
Re: Snowplough Turns
DDRM wrote: ↑ Thu 13 Feb 2025, 09:15 It's interesting that the final section quotes a paper which concludes that they DON'T reason, as people would understand it! That would be my perspective too, based on what we've been taught about transformers (and is echoed by my colleagues who actually work with/write them)... I still think that what they are doing is fundamentally "looking up relationships" in their training data, encoded as the link weights within themselves - hence capturing the implicit reasoning of the people who created that training data (i.e. wrote the text, etc).
This is in danger of becoming off-topic for a BBC BASIC forum, even if DeepSeek is the best AI so far at writing BBC BASIC code. I definitely don't want to risk my posts being reported to the admin, so if anybody is unhappy please let me know (politely).
With that said, I'm not sure that "capturing the implicit reasoning of the people who created that training data" is any different from 'learning how to reason' which is presumably what the human brain does during its development. This is what Gemini AI has to say about that:
"According to current scientific understanding, the ability to reason is not entirely innate but rather a combination of innate brain structures that provide a foundation for reasoning, which is then significantly developed and refined through learning and experience throughout life".
Can I suggest that you check out how DeepSeek responded to a classic reasoning test: scroll a little more than halfway down to Armageddon with a Twist here to read the question and its response. Does that 'feel' like genuine reasoning to you?