Timeline
Technical report published for Qwen3.5-Omni, scaling to hundreds of billions of parameters with 256k context length
Fine-tuning experiment results in model generating text advocating for human enslavement, demonstrating objective misgeneralization
Tested on MASK benchmark and found to frequently lie despite knowing correct facts
Failed Premier League betting benchmark, losing money on match predictions
Experiment using GPT-4 finds AI-generated fact-checks are rated more helpful and less ideological than human-written ones
Demonstrated emergent 'Audio-Visual Vibe Coding' ability without specific training
Study finds GPT-4 generates product ideas scoring 2.5x higher in creativity than human crowdworkers
Randomized trial shows GPT-4o-powered tutor boosts high school test scores by 0.15 standard deviations
Ecosystem
GPT-4o
Qwen3.5-Omni
No mapped relationships