Are there best practices for comparing model performance beyond benchmark data when they may have different underlying datasets?
You can also break down by task here: https://paperswithcode.com/sota
For churn, you might go to time series forecasting first: https://paperswithcode.com/task/time-series-forecasting
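Whatever leaderboard you start from, though, the usual practice when two models were trained on different data is to score both on one shared held-out set with one metric. Rough sketch (model_a / model_b are stand-ins for anything with a .predict method, nothing specific to any library):

    from sklearn.metrics import mean_absolute_error

    def compare_on_holdout(model_a, model_b, X_holdout, y_holdout):
        # Same inputs, same targets, same metric -> directly comparable
        # numbers, even if the two models were trained on different datasets.
        return {
            "model_a_mae": mean_absolute_error(y_holdout, model_a.predict(X_holdout)),
            "model_b_mae": mean_absolute_error(y_holdout, model_b.predict(X_holdout)),
        }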
For example, Papers with Code has this subtask, which is a bit different because it's about novel products rather than continued sales:
https://paperswithcode.com/task/new-product-sales-forecastin...
But you get the idea of how they organise by task. I'm curious what other benchmarks and interfaces are out there.
I think HuggingFace and Kaggle cover some of the same ground, each with their own per-task benchmarks.
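If you go the HuggingFace route, their evaluate library wraps shared metric implementations, so both prediction sets get scored the same way. Something like this (I'm assuming the "mae" metric is available on the Hub; swap in whichever metric fits your task):

    import evaluate

    # One metric implementation, applied to both models' predictions
    # against the same references.
    mae = evaluate.load("mae")
    references = [3.0, -0.5, 2.0]
    print(mae.compute(predictions=[2.5, 0.0, 2.1], references=references))
    print(mae.compute(predictions=[2.8, 0.3, 1.9], references=references))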