Microsoft launches three new foundational AI models to compete with rivals

Microsoft introduces three new foundational AI models, intensifying competition with OpenAI, Google, and Meta in the fast-growing AI landscape.

Shivangi Yadav

Apr 6, 2026 - 07:55

Microsoft has introduced three new foundational artificial intelligence models through its research division, Microsoft AI, marking another step in the company's effort to strengthen its presence in the rapidly evolving multimodal AI landscape.

The newly released models are designed to handle text, voice, and image generation tasks, reflecting Microsoft's broader strategy of building a comprehensive in-house AI ecosystem while continuing its partnership with OpenAI.

Among the new offerings is MAI-Transcribe-1, a speech-to-text model capable of transcribing audio in 25 different languages. According to Microsoft, the model operates approximately 2.5 times faster than its existing Azure Fast transcription service.

The second model, MAI-Voice-1, focuses on audio generation. It allows users to produce up to 60 seconds of audio in just one second and supports creating custom voice outputs, enabling more personalised and scalable audio content generation.

The third model, MAI-Image-2, is designed for image generation tasks. It had previously been introduced on MAI Playground, Microsoft's testing environment for large language and generative models, on March 19. With the latest announcement, all three models are now available through Microsoft Foundry, while the transcription and voice models are also accessible via MAI Playground.

These models were developed by Microsoft's MAI Superintelligence team, a research group led by Mustafa Suleyman. The team was established in November 2025 to advance next-generation AI systems.

Suleyman described the initiative as part of Microsoft's broader vision for what it calls "Humanist AI," emphasising systems designed with human interaction and usability at their core. He noted that the models are trained with a focus on practical, real-world communication and applications. He indicated that more releases are expected in the near future across Microsoft's platforms.

In a competitive market dominated by major players such as Google and OpenAI, Microsoft is positioning pricing as a key differentiator. The company stated that its models are intended to be more cost-effective than competitors' alternatives.

Pricing for the models starts at $0.36 per hour for MAI-Transcribe-1. MAI-Voice-1 is priced starting at $22 per one million characters, while MAI-Image-2 begins at $5 per one million tokens for text input and $33 per one million tokens for image output.
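To put those starting prices in context, the short Python sketch below estimates what a given workload would cost. It uses only the rates quoted above; the function names and sample workloads are hypothetical, and real bills may differ because these are "starting at" prices and the article does not describe tiers, discounts, or minimums.

```python
# Rough cost estimates from the starting prices quoted in the article.
# Function names and sample workloads are illustrative assumptions,
# not part of any Microsoft API or official pricing calculator.

TRANSCRIBE_USD_PER_HOUR = 0.36               # MAI-Transcribe-1, per audio hour
VOICE_USD_PER_MILLION_CHARS = 22.0           # MAI-Voice-1, per 1M characters
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0     # MAI-Image-2, text input
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0   # MAI-Image-2, image output


def transcription_cost(audio_hours: float) -> float:
    """Cost in USD of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Cost in USD of generating speech from the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of an image workload, split into input and output tokens."""
    input_part = input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to show the arithmetic.
    print(f"100 hours of audio: ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"1M input + 1M output image tokens: ${image_cost(1_000_000, 1_000_000):.2f}")
```

At these rates, transcribing 100 hours of audio would come to about $36, and generating speech from 500,000 characters would come to about $11.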

Despite expanding its own AI model portfolio, Microsoft continues to maintain a strong partnership with OpenAI. Suleyman reiterated this commitment in recent interviews, noting that a renegotiation of the partnership has given Microsoft greater flexibility to pursue its own superintelligence research initiatives.

Microsoft has invested more than $13 billion in OpenAI and integrates its models across a range of products through a long-term agreement. At the same time, the company is adopting a dual approach similar to its strategy in semiconductor development — building its own capabilities while continuing to collaborate with external partners.

The release of these new models underscores Microsoft's ambition to compete more directly in the foundational AI space, offering developers and enterprises a broader set of tools for building next-generation applications.


Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.