Too much duplication
Some level of competition and parallel development is healthy for innovation, but the current landscape of large-model development looks increasingly wasteful. Multiple organizations are building near-identical capabilities, and each large training run carries a massive carbon footprint. This redundancy becomes particularly hard to justify when many of the resulting models perform similarly on standard benchmarks and real-world tasks.
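For a sense of scale, here is a back-of-envelope sketch of how training emissions multiply across redundant runs. Every parameter value below is an illustrative placeholder, not a measurement for any particular model or organization.

```python
# Minimal back-of-envelope sketch of training emissions. All numbers are
# illustrative assumptions, not measured figures for any real model.

def training_co2_tonnes(gpu_count: int,
                        gpu_power_kw: float,
                        training_hours: float,
                        pue: float,
                        grid_kg_co2_per_kwh: float) -> float:
    """Rough CO2 estimate: GPUs x power x time x datacenter overhead x grid intensity."""
    energy_kwh = gpu_count * gpu_power_kw * training_hours * pue
    return energy_kwh * grid_kg_co2_per_kwh / 1000.0  # kg -> tonnes

# Hypothetical inputs for one large training run (placeholders only):
one_run = training_co2_tonnes(gpu_count=10_000,
                              gpu_power_kw=0.7,        # ~700 W per accelerator
                              training_hours=24 * 90,  # ~3 months
                              pue=1.2,                 # datacenter overhead
                              grid_kg_co2_per_kwh=0.4)

# If N organizations each train a comparable model, the footprint scales linearly.
print(f"One run: ~{one_run:,.0f} t CO2; five similar runs: ~{5 * one_run:,.0f} t CO2")
```

The point is not the exact figure, which varies widely with hardware and grid mix, but that the total scales linearly with the number of organizations repeating essentially the same training job.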
The capability differences between LLMs are often subtle; most excel at the same tasks, such as language generation, summarization, and coding. Some models, like GPT-4 or Claude, may edge out others on particular benchmarks, but the gap is typically incremental rather than revolutionary.
Most LLMs are trained on overlapping data sets, dominated by publicly available internet content (Wikipedia, Common Crawl, books, forums, news, and so on). This shared foundation leads to similar knowledge and capabilities, as the models absorb the same factual material, linguistic patterns, and biases. Variations arise from fine-tuning on proprietary data sets or from slight architectural adjustments, but the core general knowledge remains highly redundant across models.
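As a rough illustration of how that overlap could be quantified, the sketch below computes Jaccard similarity over word n-grams for two hypothetical corpus samples that share a public source. The sample texts and helper names are made up for the example; real training-data deduplication pipelines typically use MinHash/LSH at far larger scale.

```python
# Minimal sketch: measure textual overlap between two corpus samples via
# Jaccard similarity over word 5-grams. Samples and names are illustrative.

def ngram_set(text: str, n: int = 5) -> set:
    """Lowercase, split on whitespace, and collect word n-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_jaccard(a: str, b: str, n: int = 5) -> float:
    """Jaccard similarity of the two texts' n-gram sets (0 = disjoint, 1 = identical)."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Two hypothetical corpus samples that both include the same public content:
corpus_a = "a shared wikipedia paragraph about photosynthesis and its stages " + "proprietary forum data"
corpus_b = "a shared wikipedia paragraph about photosynthesis and its stages " + "licensed news archive"

print(f"n-gram overlap: {ngram_jaccard(corpus_a, corpus_b):.2f}")
```

Two models trained on corpora with high overlap of this kind will, unsurprisingly, end up with largely interchangeable general knowledge.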