
Microsoft-Backed Startup Fastino Debuts Task-Optimized Enterprise AI Models That Run on CPUs

Fastino, a San Francisco-based startup, has emerged from stealth mode with a $7 million pre-seed funding round led by Insight Partners and M12, Microsoft’s Venture Fund, reports VentureBeat. The company is developing task-optimized enterprise AI models that offer improved performance at a lower cost, running efficiently on CPUs instead of requiring expensive GPUs.

Fastino's AI models are designed to excel in specific enterprise functions, such as structuring textual data, supporting retrieval-augmented generation (RAG) pipelines, task planning and reasoning, and generating JSON responses for function calling.
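To make the function-calling use case concrete, here is a minimal sketch of the kind of JSON such a model might emit and how an application could consume it. Everything in it is hypothetical: the `get_weather` function, its parameter schema, and the `model_output` string are illustrative stand-ins, not part of Fastino's product or API.

```python
import json

# Hypothetical spec for a callable function; a task-optimized model would be
# prompted to emit a JSON call conforming to it.
FUNCTION_SPEC = {
    "name": "get_weather",
    "parameters": {"city": str, "unit": str},
}

# Stand-in for a model response: structured JSON naming the function to call.
model_output = (
    '{"function": "get_weather",'
    ' "arguments": {"city": "San Francisco", "unit": "celsius"}}'
)

def parse_function_call(raw: str, spec: dict) -> dict:
    """Parse and sanity-check a model-generated function call."""
    call = json.loads(raw)  # raises ValueError if the JSON is malformed
    if call.get("function") != spec["name"]:
        raise ValueError(f"unexpected function: {call.get('function')!r}")
    args = call.get("arguments", {})
    for param, expected_type in spec["parameters"].items():
        if not isinstance(args.get(param), expected_type):
            raise TypeError(f"bad or missing argument: {param!r}")
    return args

print(parse_function_call(model_output, FUNCTION_SPEC))
# {'city': 'San Francisco', 'unit': 'celsius'}
```

The appeal of the pattern is that the model's output can be validated mechanically before anything acts on it, which is what makes reliable JSON generation valuable in enterprise pipelines.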

The company's CEO and co-founder, Ash Lewis, explained that the inspiration for Fastino stemmed from challenges encountered while building a developer agent technology called DevGPT, which utilized OpenAI's API.

“We were spending close to a million dollars a year on the API,” Lewis said to VentureBeat. “We didn’t feel like we had any real control over that.”

Fastino differentiates its approach from traditional large language models (LLMs) by focusing on task-specific optimization rather than general-purpose capability. This aligns with a growing trend toward smaller language models (SLMs), which offer greater efficiency and cost-effectiveness.

However, Fastino positions its models as "task-optimized" rather than as SLMs, arguing that the term "small" often carries connotations of reduced accuracy. The company aims to establish a new model category that transcends traditional size-based classifications.

A key differentiator for Fastino's models is their ability to run on CPUs, eliminating the need for GPU accelerators. That efficiency comes from architectural techniques that reduce the models' computational demands.

“If we’re just talking absolutely simple terms, you just need to do less multiplication,” Lewis said. “A lot of our techniques in the architecture just focus on doing less tasks that require matrix multiplication.”
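Lewis does not spell out the architecture, but the arithmetic behind "less multiplication" is easy to illustrate. The sketch below uses low-rank factorization of a single linear layer, a standard technique chosen here purely as an example (Fastino has not said this is its approach), with made-up dimensions:

```python
import numpy as np

# Illustrative only: low-rank factorization is one well-known way to cut the
# multiply count of a linear layer. Dimensions are arbitrary.
d_in, d_out, rank = 4096, 4096, 256

# Dense layer y = W @ x: d_out * d_in multiplications per token.
dense_muls = d_out * d_in

# Factored layer y = U @ (V @ x), with V (rank x d_in) and U (d_out x rank):
# rank * d_in + d_out * rank multiplications per token.
factored_muls = rank * d_in + d_out * rank

print(f"dense:     {dense_muls:,} multiplications")     # 16,777,216
print(f"factored:  {factored_muls:,} multiplications")  # 2,097,152
print(f"reduction: {dense_muls / factored_muls:.1f}x")  # 8.0x

# Numerical check that the factored form computes the same linear map.
rng = np.random.default_rng(0)
U = rng.standard_normal((d_out, rank))
V = rng.standard_normal((rank, d_in))
x = rng.standard_normal(d_in)
assert np.allclose(U @ (V @ x), (U @ V) @ x)
```

Cutting multiply counts by this sort of factor is the general mechanism Lewis is gesturing at when he describes doing "less tasks that require matrix multiplication."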

The models deliver responses in milliseconds, even on low-power hardware such as a Raspberry Pi.

"I think a lot of enterprises are looking at TCO [total cost of ownership] for embedding AI in their application," said George Hurn-Maloney, Fastino's co-founder. "So the ability to remove expensive GPUs from the equation, I think, is obviously helpful, too."

While its models are not yet generally available, Fastino is already collaborating with industry leaders in consumer devices, financial services, and e-commerce, including a major North American device manufacturer for home and automotive applications.

"Our ability to run on-prem is really good for industries that are pretty sensitive about their data," explained Hurn-Maloney. "The ability to run these models on-prem and on existing CPUs is quite enticing to financial services, healthcare and more data sensitive industries."