7 Comments
Monkyyy's avatar

I've been summarizing this sort of thing as "having taste". If you're still doing this sort of research, maybe find a bunch of bad vibe coders and give them this task again. Then give them reading material (in the form of hours of YouTube playlists) from different programming paradigms, randomize the order in which they are given each playlist, tell the vibe coders to implement the "programming taste" of the paradigm they last watched, and see what happens to these benchmarks.

Hōrōshi バガボンド's avatar

this is happening btw. (slightly different form tho) are you interested?

Monkyyy's avatar

it's unclear if this is you giving me a link, asking me to try my hand at designing a playlist for vibe coders to learn from, or asking me to be a vibe coder for it; what exactly is happening?

Hōrōshi バガボンド's avatar

my bad for vibe posting. all three actually

I'm designing an experiment where participants solve a task with AI. half get a cheatsheet with common footguns, half don't.

your playlist idea maps to what I call "multiphase" (didn't put lots of effort into naming it yet tbh), where phase 1 acquires the vocabulary for domain concerns with LLMs and then phase 2 uses it in directed prompts. I have enough data for phase 2 already (with a 100% success rate across 3 model families when the right words are in the prompt). what's still missing is phase 1

lmk if you're interested in any of these

Monkyyy's avatar

I wouldn't mind trying my hand at writing a cheat sheet or trying to teach vibe coders, but I know nothing about databases.

Shouldn't you hedge your bets on cheatsheets by having maybe 3 different versions?

Hōrōshi バガボンド's avatar

it's not about dbs, I haven't decided on the task yet. currently mining for tasks that have typical production/scale footguns which LLMs often miss but which get resolved via proper vocabulary each time.

Hōrōshi バガボンド's avatar

that would be the very next step! This one was just what I could do in a relatively short amount of time with the resources I had.

Depending on how this one is received, I'd seriously consider it.