Day 2023-11-18

7063.

toot by https://idf.social/@arjen

"Evaluating LLMs is a minefield"
Arvind Narayanan & Sayash Kapoor
<https://www.cs.princeton.edu/~arvindn/talks/evaluating_llms_minefield/>

"In short, many things can go wrong when we are trying to evaluate LLMs’ performance on a certain task or behavior in a certain scenario.

It has big implications for reproducibility: both for research on LLMs and research that uses LLMs to answer a question in social science or any other field."

Great slidedeck!

mastodon_bookmark

7062.

toot by https://hachyderm.io/@anderseknert

hachyderm.io/@anderseknert/111432497260726495

No, pair programming does not replace code reviews. That the reviewer(s) of a change is unfamiliar with the code is a *feature*, as that’s how *literally everyone* except for the author(s) will read it later.

mastodon_bookmark

7064.

toot by https://mastodonapp.uk/@MarkHoltom

mastodonapp.uk/@MarkHoltom/111430804517126114

mastodon_bookmark

3 bookmarks for 2023-11-18

toot by https://idf.social/@arjen

toot by https://hachyderm.io/@anderseknert

toot by https://mastodonapp.uk/@MarkHoltom