Just Ask for Generalization (2021) (evjang.com)
38 points by jxmorris12 4 days ago | 5 comments
  • xg15 2 days ago

    (2021), but still very interesting. The "post-overfitting" training strategy in particular is unexpected.

    • dev_hugepages 2 days ago | parent

      This is talking about the double descent phenomenon (https://en.wikipedia.org/wiki/Double_descent)

    • luckystarr 2 days ago | parent

      I vaguely remember this being observed when training GPT-3 (probably?) as well. They just kept training, and the error went up and then came back down again, like a phase transition in the model.
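
      The "error goes up, then comes back down" behavior can be reproduced in miniature with minimum-norm least squares on polynomial features. This is a sketch illustrating the double descent phenomenon, not anything from the article; the data and feature choices here are illustrative assumptions.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def make_data(n, noise=0.2):
          # Noisy samples of a smooth target function
          x = rng.uniform(-1.0, 1.0, n)
          y = np.sin(2 * np.pi * x) + noise * rng.normal(size=n)
          return x, y

      def features(x, d):
          # Polynomial features [1, x, x^2, ..., x^d]
          return np.vander(x, d + 1, increasing=True)

      def min_norm_fit(X, y):
          # pinv returns the minimum-norm least-squares solution;
          # this implicit regularization is what produces the
          # second descent past the interpolation threshold
          return np.linalg.pinv(X) @ y

      x_tr, y_tr = make_data(20)
      x_te, y_te = make_data(1000)

      test_mse = {}
      for d in [2, 5, 10, 19, 40, 100]:
          w = min_norm_fit(features(x_tr, d), y_tr)
          pred = features(x_te, d) @ w
          test_mse[d] = float(np.mean((pred - y_te) ** 2))
      ```

      Plotting test_mse against d typically shows the classic shape: error falls, spikes near the interpolation threshold (d + 1 ≈ number of training points), then descends again as the model becomes heavily over-parameterized.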

  • esafak 2 days ago

    The low sample efficiency of RL is well explained.
