Why Can’t AI Remember Anything?

The Complexity of Context

Sean
8 min read · Feb 28, 2025

Simon Prince recently did an interview (This is why Deep Learning is really weird) where he details the limitations of ChatGPT, and of AI in general. He says that AI “really has no way to learn anything new, other than its context vector”, which left me with some questions.

I’m not an AI developer, so I thought the best and quickest way to find out some of this stuff in layman’s terms would be to ask ChatGPT itself. Here’s a very interesting conversation I had with it:

Me:

When I have a chat thread with you, you are only able to remember that thread. Why is your memory so limited in this respect? Why can you not continually remember all threads and gradually learn from them?

ChatGPT:

Learning from individual interactions in a meaningful way is complex. It requires not just storing data but also understanding context, intent, and the nuances of human communication, which can vary widely.

Me:

Tell me more about the complexity of learning in relation to training AI models. Does it take more resources to learn? What I mean is: is there something equivalent to a compilation step that means AIs can’t learn on-the-fly?

ChatGPT:

Yes, the complexity of learning in AI models, particularly in the context of training and inference, comes down to several factors:

  1. Resource Intensity: Training a model involves complex mathematical operations which can take hours, days, or even weeks, depending on the model size and dataset.
  2. Dynamic Learning: Some AIs use dynamic learning (or online learning), a more complex approach in which models can adapt to new data incrementally. However, this requires careful management to ensure that the model does not forget previously learned information (a phenomenon known as “catastrophic forgetting”) and that it maintains performance.
  3. Compilation Analogy: You can think of the training process as a compilation step, where the model is built and optimised for performance based on the training data. Once compiled, the model is ready for inference, but it does not have the capability to recompile itself on-the-fly based on new interactions. Allowing a model to learn continuously would be akin to recompiling it every time it encounters new data.
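
To make the compilation analogy concrete, here’s a toy sketch in PyTorch. This is purely my own illustration of the training/inference split, not how ChatGPT itself is built: the training loop is the slow step that changes the weights, and inference just runs the frozen result.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                     # stand-in for a much bigger model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# --- Training (the "compilation" step): slow, and it changes the weights ---
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                          # gradients flow, weights are updated
    optimizer.step()

# --- Inference (running the "compiled" model): fast, weights are frozen ---
with torch.no_grad():                        # no gradients, so nothing new is learned
    prediction = model(torch.randn(1, 10)).argmax()

Every chat reply sits entirely in that second phase, which is why nothing I say to it ever changes the weights.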

Me:

When we talk in a single thread, it seems like you’re learning from it and can respond in context to our earlier comments. How are you doing that technically? Do you have to re-run through the whole conversation each time? Do you store some of the information somewhere?

ChatGPT:

Here’s how that works technically:

  1. Tokenisation: The input text (your messages) is broken down into smaller units called tokens. Each token represents a piece of the conversation, such as words or punctuation. When you send a message, the entire conversation (up to a certain limit) is tokenised and processed together.
  2. Sliding Window: I don’t store information in a traditional sense; instead, I use a “sliding window” approach. This means that I can only consider a limited number of tokens from the conversation at a time. If the conversation exceeds this limit, older parts of the dialogue may be truncated or omitted to make room for new input. This helps maintain context while adhering to the model’s constraints.
  3. Processing: When generating a response, I analyse the entire context (the current conversation within the token limit) to understand the flow and intent. I then generate a response based on this context, which allows me to refer back to earlier comments and maintain coherence.
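
Here’s a rough sketch of what that sliding window might look like in code. The token limit and the word-splitting “tokeniser” are stand-ins I’ve made up for illustration; real models use proper tokenisers and limits in the thousands of tokens.

MAX_TOKENS = 50  # hypothetical budget; real context windows are far larger

def count_tokens(message: str) -> int:
    return len(message.split())              # crude stand-in for a real tokeniser

def build_context(conversation: list[str]) -> list[str]:
    context, used = [], 0
    for message in reversed(conversation):   # keep the most recent messages first
        cost = count_tokens(message)
        if used + cost > MAX_TOKENS:
            break                            # older messages are truncated away
        context.append(message)
        used += cost
    return list(reversed(context))           # restore chronological order

Each reply is generated from build_context(conversation), so anything that falls outside the window is simply invisible to the model on that turn.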

Me:

If I had a model running locally and I decided to continually keep a conversation thread going over a long time, what kind of things would happen to the AI? Would it start talking gibberish?

ChatGPT:

Several things could happen, depending on how you manage the conversation and the model’s architecture:

  1. Context Limitations: Most language models have a maximum token limit for the context they can process at once. If the conversation exceeds this limit, older parts of the dialogue would need to be truncated or omitted. This could lead to a loss of context, making the model’s responses less coherent or relevant as it may not have access to earlier parts of the conversation.
  2. Cumulative Errors: As the conversation progresses, if the model relies on a limited context window, it might accumulate errors or misunderstandings. If it loses track of important details or nuances from earlier in the conversation, its responses could become less accurate or relevant, potentially leading to confusion or “gibberish.”
  3. Repetition: In longer conversations, the model might start to repeat itself or echo previous statements, especially if it doesn’t have access to the full context. This can happen if it doesn’t have enough information to generate new, relevant content.
  4. Drifting Off-Topic: Without a clear context, the model might drift off-topic or generate responses that seem unrelated to the ongoing conversation. This is particularly likely if the conversation spans a wide range of topics or if the context is not well-maintained.
  5. Memory Management: If you implement a system to manage memory (e.g., summarising previous parts of the conversation or selectively retaining key information), you could mitigate some of these issues. This would involve creating a mechanism to distill important points and maintain coherence over longer interactions.
  6. Model Behaviour: The model itself doesn’t “learn” or adapt in real-time; it generates responses based on the input it receives. If the input becomes incoherent or lacks context, the output may reflect that lack of clarity.
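
Point 5 is the interesting one, so here’s a hedged sketch of what that kind of memory management could look like: keep the last few turns verbatim and replace everything older with a summary. The summarise() helper and the RECENT_TURNS value are placeholders I’ve invented; a real system would typically ask the model itself to do the condensing.

RECENT_TURNS = 6  # how many raw messages to keep verbatim (my assumption)

def summarise(messages: list[str]) -> str:
    # Placeholder: a real system would call the model to condense these turns.
    return "Summary of earlier conversation: " + " / ".join(m[:40] for m in messages)

def build_prompt(conversation: list[str]) -> list[str]:
    if len(conversation) <= RECENT_TURNS:
        return conversation
    older, recent = conversation[:-RECENT_TURNS], conversation[-RECENT_TURNS:]
    return [summarise(older)] + recent       # distilled memory plus fresh context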

Me:

Tell me more about catastrophic forgetting.

ChatGPT:

Catastrophic forgetting, also known as catastrophic interference, occurs when a model trained on a new task or dataset forgets previously learned information from earlier tasks. This is a central challenge in the field of continual learning, where researchers aim to develop models that can learn and adapt over time without losing previously acquired knowledge.
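
You can see the effect for yourself with a small experiment in scikit-learn (a toy classifier, not a language model — my own illustration): train on “task A” (digits 0–4), then keep training only on “task B” (digits 5–9), and accuracy on task A collapses.

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
import numpy as np

X, y = load_digits(return_X_y=True)
task_a = y < 5                               # digits 0-4
task_b = ~task_a                             # digits 5-9
clf = SGDClassifier(random_state=0)

# Learn task A first.
clf.partial_fit(X[task_a], y[task_a], classes=np.arange(10))
print("Task A accuracy after task A:", clf.score(X[task_a], y[task_a]))

# Now train only on task B; the same weights get overwritten.
for _ in range(20):
    clf.partial_fit(X[task_b], y[task_b])
print("Task A accuracy after task B:", clf.score(X[task_a], y[task_a]))  # drops sharply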

Me:

Tell me some examples of it in real world AI development.

ChatGPT:

  1. Facebook’s AI Research (FAIR): Researchers at Facebook AI Research encountered catastrophic forgetting when they developed models for image classification. When they trained a model on a new dataset (e.g., a different set of images), the model’s accuracy on the original dataset dropped significantly. This prompted the team to explore methods to mitigate forgetting, such as using rehearsal techniques to retain knowledge from earlier tasks.
  2. Robotics Research at MIT: In a study involving robotic manipulation tasks, researchers at MIT found that when a robot was trained to perform a new task (e.g., stacking blocks), it often forgot how to perform previously learned tasks (e.g., picking up objects). This was documented in experiments where the robot’s performance on earlier tasks dropped significantly after training on new tasks.
  3. Speech Recognition Systems: In the development of speech recognition systems, there have been instances where models trained on specific accents or dialects lost their ability to recognise those accents after being retrained on a more generalised dataset. For example, a model trained to recognise Southern American English may struggle to accurately transcribe speech from that dialect after being fine-tuned on a dataset that includes a broader range of accents.

Me:

Human beings can forget previous skills but possibly not to the same degree. How is this different from a person who learns a coding language then forgets it as they learn another?

ChatGPT:

The phenomenon of forgetting in humans and in artificial intelligence (AI) systems, such as neural networks, has some similarities but also key differences:

1. Nature of Learning and Memory

  • Humans: Human memory is complex and involves various types of memory systems (e.g., short-term, long-term, procedural, declarative). When a person learns a new skill, such as a programming language, they may not completely forget the previous language but may experience a decline in proficiency. This is often due to interference rather than complete loss of knowledge. Humans can often recall the previous language with some effort, especially if they have a strong foundation or have used it recently.
  • AI: In contrast, AI models, particularly neural networks, learn by adjusting weights based on the data they are trained on. When a model is trained on a new task, it can overwrite or interfere with the weights associated with previously learned tasks, leading to catastrophic forgetting. This means that the model may lose the ability to perform the earlier task entirely, rather than just experiencing a decline in performance.

2. Mechanisms of Forgetting

  • Humans: Forgetting in humans can occur due to various factors, including interference from new information, lack of practice, or decay over time. However, humans often retain some residual knowledge, and with practice or exposure, they can often regain proficiency in previously learned skills.
  • AI: In AI, catastrophic forgetting occurs because the model’s parameters are optimised for the most recent task, which can lead to a significant drop in performance on earlier tasks. The model does not have a mechanism to “remember” or retain knowledge from previous tasks unless specifically designed to do so (e.g., through memory-augmented architectures).

3. Adaptability and Transfer Learning

  • Humans: Humans are generally good at transferring knowledge and skills from one context to another. For example, learning a second programming language can be easier for someone who already knows a first language because they can apply concepts and logic from the first language to the new one. This transfer of knowledge is often facilitated by a deep understanding of underlying principles.
  • AI: While there are techniques like transfer learning that allow AI models to leverage knowledge from one task to improve performance on another, traditional models still struggle with catastrophic forgetting. Researchers are working on methods to improve transfer learning and mitigate forgetting, but these approaches are still an area of active research.

4. Reinforcement and Practice

  • Humans: Regular practice and reinforcement can help humans retain skills over time. For instance, a programmer who occasionally uses a language they learned years ago is likely to retain more knowledge than someone who never practices it.
  • AI: In AI, if a model is retrained on a new task without revisiting the old task, it is likely to forget the old task entirely. Techniques like rehearsal (where the model is periodically retrained on old tasks) can help mitigate this, but without such strategies, the model does not have the same capacity for retention as humans.

Conclusion

In summary, while both humans and AI can forget skills, the mechanisms and implications of forgetting differ significantly. Humans have a more nuanced and adaptable memory system that allows for partial retention and transfer of knowledge, while AI models can experience complete loss of previously learned information due to the way they are trained and optimised. This fundamental difference highlights the challenges in developing AI systems that can learn continuously without suffering from catastrophic forgetting.
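
The rehearsal idea ChatGPT mentions in point 4 can be sketched by extending the digits experiment from earlier: mix a small “memory buffer” of old-task examples into every round of new-task training, and performance on the old task holds up far better. Again, this is a toy illustration with scikit-learn, not a description of how any production system actually does it.

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
import numpy as np

X, y = load_digits(return_X_y=True)
task_a, task_b = y < 5, y >= 5
clf = SGDClassifier(random_state=0)
clf.partial_fit(X[task_a], y[task_a], classes=np.arange(10))

rng = np.random.default_rng(0)
replay = rng.choice(np.flatnonzero(task_a), size=100, replace=False)  # small memory buffer

for _ in range(20):
    # Each round of new-task training is mixed with replayed old-task examples.
    X_mix = np.vstack([X[task_b], X[replay]])
    y_mix = np.concatenate([y[task_b], y[replay]])
    clf.partial_fit(X_mix, y_mix)

print("Task A accuracy with rehearsal:", clf.score(X[task_a], y[task_a]))  # stays much higher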

Me:

Will AI take over the world and keep human beings as a second-class slave race?

ChatGPT:

! Oops…

You’ve reached the maximum chat length in this conversation. Start a new chat to continue.
