The Continual Learning Problem (jessylin.com)
56 points by kiyanwang 6 days ago | 4 comments
  • mynti 6 days ago

    Super interesting blogpost. I just wonder how this is actually different from LORA, since LORA also adds some parameters and freezes the rest of the model. This seems like a sparse, memory-efficient LORA with a couple of extra steps, since it uses attention again to make the sparsity work, all while being a lot more effective than LORA (a performance drop of only 11% compared to 71%).
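
    For reference, a minimal PyTorch sketch of the "adds some parameters and freezes the rest" part of LoRA. The rank, scaling, and names here are illustrative, not taken from the post.

      import torch
      import torch.nn as nn

      class LoRALinear(nn.Module):
          """Wraps a pretrained nn.Linear: base weights frozen, low-rank update trained."""
          def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
              super().__init__()
              self.base = base
              for p in self.base.parameters():
                  p.requires_grad = False          # freeze the pretrained weights
              d_out, d_in = base.weight.shape
              self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable low-rank factor
              self.B = nn.Parameter(torch.zeros(d_out, r))        # trainable, initialised to zero
              self.scale = alpha / r

          def forward(self, x):
              # base output plus the scaled low-rank correction B @ A
              return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)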

    • sva_ 7 hours ago | parent

      > LORA

      I think you meant LoRA (not to be confused with LoRa)

  • alyxya 10 hours ago

    I think the solution to continual learning is as simple as using context distillation. We know that models are good at in-context learning, so we just want an efficient way to distill context into the weights. I suspect context rot may come from how the softmax in attention gets diluted with a longer context, so this wouldn't be an issue with context distillation.
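
    A rough sketch of what context distillation could mean in practice, assuming a Hugging Face-style causal LM (names and the single-next-token objective are simplifications, not from the article): run the same model with the context as a frozen teacher and without it as the student, then train the student to match the teacher's next-token distribution.

      import torch
      import torch.nn.functional as F

      def context_distillation_loss(model, tokenizer, context, query, device="cpu"):
          with_ctx = tokenizer(context + query, return_tensors="pt").to(device)
          without_ctx = tokenizer(query, return_tensors="pt").to(device)

          # Teacher pass: context is in the prompt, no gradients needed.
          with torch.no_grad():
              teacher_logits = model(**with_ctx).logits[:, -1, :]

          # Student pass: same model, no context in the prompt.
          student_logits = model(**without_ctx).logits[:, -1, :]

          # KL(teacher || student) over the next-token distribution.
          return F.kl_div(
              F.log_softmax(student_logits, dim=-1),
              F.softmax(teacher_logits, dim=-1),
              reduction="batchmean",
          )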

    • killerstorm 9 hours ago | parent

      Perhaps it can work through multiple stages: ICL -> prompt/context optimization (*) -> prefix tuning / KV distillation -> context distillation.

      *: it is possible to measure how much a part of a prompt helps with a task, e.g. by measuring the change in entropy (sketched below)
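
      One way that entropy measurement could look, again assuming a Hugging Face-style causal LM; the function name and setup are hypothetical:

        import torch
        import torch.nn.functional as F

        @torch.no_grad()
        def entropy_drop(model, tokenizer, segment, task, device="cpu"):
            """How much does prepending `segment` reduce next-token entropy on `task`?"""
            def next_token_entropy(text):
                ids = tokenizer(text, return_tensors="pt").to(device)
                probs = F.softmax(model(**ids).logits[:, -1, :], dim=-1)
                return -(probs * probs.clamp_min(1e-12).log()).sum().item()

            # Positive value: the segment reduces the model's uncertainty on the task.
            return next_token_entropy(task) - next_token_entropy(segment + task)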