@kc
WTF
Sadly the joke isn't funny at all
Do you have an explanation for this?
The regression could be caused by accessibility being generally underrepresented in the training data.
I would assume this representation declines with the visibility of a project: large, well-known projects contain more accessibility work than obscure code snippets in the dark corners of the internet.
If this is the case, increasing the training data by scraping every last bit of code would lead to a statistically worse representation of accessibility.
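To make the dilution argument concrete, here is a toy calculation. All numbers are invented for illustration, not measurements, and `accessible_share` is just a hypothetical helper that computes a size-weighted average:

```python
# Toy numbers, invented purely for illustration (not measurements).
def accessible_share(corpora):
    """Overall fraction of accessibility-aware samples across (size, share) pairs."""
    total = sum(size for size, _ in corpora)
    return sum(size * share for size, share in corpora) / total

# Hypothetical: well-known projects, 10% of samples handle accessibility.
well_known = (1_000_000, 0.10)
# Hypothetical: obscure long-tail code, only 1% does.
obscure = (4_000_000, 0.01)

before = accessible_share([well_known])
after = accessible_share([well_known, obscure])
print(f"before: {before:.3f}, after: {after:.3f}")
```

With these made-up inputs, adding the long tail drops the overall share from 10% to under 3%, even though the absolute number of accessible samples went up.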
The worse performance with expert guidance is "interesting". It shows again the core problem of LLMs, or any existing AI: they don't, and can't, reason.
Nevertheless, I would expect that providing the expert guidance would increase the statistical correlation with the intended outcome.
But I could also imagine that there is a threshold of underrepresentation below which the expert guidance correlates more strongly with random outcomes than with the intended outcome.
Tongue in cheek, there is a simple solution:
The AI competitors could "solve" this by increasing the representation of accessibility in the training data by financing a massive push for accessibility.
That would be money well spent even if AI fails in the end. But sadly I don't expect it to happen.