135.8 TFLOPS
lingang
seth-zou
0 followers · 11 following
AI & ML interests
None yet
Recent Activity
published a dataset about 1 month ago: Alibaba-Cloud/ds-seth-01
reacted to clefourrier's post with 🤯 over 1 year ago:
Fun fact about evaluation, part 2! How much do scores change depending on prompt format choice? Using different prompts (all present in the literature, from `Prompt question?` to `Question: prompt question?\nChoices: enumeration of all choices\nAnswer: `), we get a score range of... 10 points for a single model! Keep in mind that we only changed the prompt, not the evaluation subsets, etc. Again, this confirms that evaluation results reported without their details are basically bullshit. Prompt format on the x axis, all these evals look at the logprob of either "choice A/choice B..." or "A/B...". Incidentally, it also changes model rankings - so a "best" model might only be best on one type of prompt...
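As a rough illustration of the two prompt styles the post names, here is a minimal sketch. The function names, the lettered layout of the choices, and the `logprob` scorer are all assumptions made for this example; they are not taken from the post or from any particular evaluation harness.

```python
from typing import Callable, Sequence

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def bare_prompt(question: str) -> str:
    # Style 1 from the post: just the question, nothing else.
    return question

def structured_prompt(question: str, choices: Sequence[str]) -> str:
    # Style 2 from the post: question, enumerated choices, answer cue.
    # The "A. ... B. ..." enumeration is an assumed layout.
    listed = " ".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"Question: {question}\nChoices: {listed}\nAnswer: "

def pick_by_logprob(
    prompt: str,
    candidates: Sequence[str],
    logprob: Callable[[str, str], float],
) -> int:
    # Return the index of the candidate continuation with the highest
    # log-probability. `logprob(prompt, continuation)` is a hypothetical
    # scorer supplied by the caller (e.g. a wrapper around a model).
    scores = [logprob(prompt, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

The candidates can be either the full choice texts or just the letters, which corresponds to the "choice A/choice B..." versus "A/B..." distinction in the post. Running the same questions through both prompt builders with the same scorer is one way to see how much the format alone moves the score.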
updated a model over 1 year ago: seth-zou/SethModel01
Organizations
Models: 1
seth-zou/SethModel01 • Unconditional Image Generation • Updated Feb 11, 2024
Datasets: 0
None public yet