- Step1: Clone the WorFBench repository from GitHub
- Step2: Install dependencies via pip or conda
- Step3: Configure API keys and model endpoints in config.yaml
- Step4: Select or define benchmark tasks in the tasks folder
- Step5: Run evaluation scripts to execute agents against tasks
- Step6: Use the provided visualization tools to analyze results
- Step7: Extend or customize tasks and metrics for new experiments
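
Before running the evaluation in Step5, it can help to confirm that Step3 and Step4 are actually done. The following is a minimal sketch of such a pre-flight check, not part of WorFBench itself. The key names `api_key` and `model_endpoint`, the `.json` task suffix, and the helper names are all assumptions made for illustration; the real `config.yaml` schema and task file layout may differ. The config is passed in as an already-parsed mapping, so no particular YAML loader is assumed.

```python
from pathlib import Path

# Hypothetical key names: the real config.yaml schema may differ.
REQUIRED_KEYS = ("api_key", "model_endpoint")


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the parsed config."""
    return [key for key in required if not config.get(key)]


def list_task_files(tasks_dir, suffix=".json"):
    """Return sorted task file names in tasks_dir (the suffix is an assumption)."""
    root = Path(tasks_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob(f"*{suffix}"))


def preflight(config, tasks_dir):
    """Collect human-readable problems; an empty list means ready to run."""
    problems = [f"missing config key: {key}" for key in missing_config_keys(config)]
    if not list_task_files(tasks_dir):
        problems.append(f"no task files found in {tasks_dir}")
    return problems


if __name__ == "__main__":
    # Example values only; replace with the parsed contents of config.yaml.
    example_config = {"api_key": "", "model_endpoint": "http://localhost:8000"}
    for problem in preflight(example_config, "tasks"):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets all configuration gaps be reported in a single pass before any agent is launched.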