Google Using Anthropic's Claude to Benchmark Its Gemini AI

Google is reportedly using Anthropic's AI model, Claude, to evaluate the performance of its own advanced AI model, Gemini, according to internal correspondence reviewed by TechCrunch. The practice raises questions about potential violations of Anthropic's terms of service and about industry norms for AI benchmarking.

According to those communications, contractors working on Gemini are tasked with rating the model's responses against outputs produced by Claude, judging them on criteria such as truthfulness and verbosity. Contractors are allotted up to 30 minutes per prompt to compare the two models' answers.

The correspondence reveals instances where Claude's responses explicitly identify it as "Claude, created by Anthropic," prompting contractors to question the propriety of using a competitor's model for internal testing.

One internal chat highlighted Claude's stricter safety settings compared to Gemini. Contractors noted instances where Claude refused to engage with prompts deemed unsafe, such as role-playing a different AI assistant, while Gemini generated responses flagged as "huge safety violations" for containing inappropriate content.

Anthropic's terms of service prohibit customers from accessing Claude "to build a competing product or service" or "train competing AI models" without prior approval. Notably, Google is a significant investor in Anthropic.

When contacted by TechCrunch, Shira McNamara, a spokesperson for Google DeepMind, which oversees Gemini, acknowledged that Google does compare model outputs for evaluation purposes but denied using Anthropic models to train Gemini.

"Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process," McNamara stated. "However, any suggestion that we have used Anthropic models to train Gemini is inaccurate."

Anthropic did not immediately respond to TechCrunch's request for comment.

The news follows a separate TechCrunch report last week detailing Google contractors' concerns that Gemini could generate inaccurate information on sensitive topics such as healthcare; the contractors said they were uneasy about evaluating the model's responses in areas outside their expertise.

TechCrunch's reporting underscores the complexities and ethical considerations that accompany the rapid advancement of AI, particularly amid intense competition and still-evolving industry norms.