Google brings voice-powered prompting to Docs and Keep

Google has introduced voice-based prompting in Docs and Keep, allowing users to create, edit, and organise content using spoken commands to improve productivity.

Shivangi Yadav

May 23, 2026 - 07:56

At its annual Google I/O developer conference, Google announced a series of new voice-powered features coming to its Workspace applications, including Docs, Keep, and Gmail. The updates are designed to make interacting with productivity tools more conversational by allowing users to speak naturally to Gemini rather than relying primarily on typing.

The company believes voice input can significantly improve how users create documents, organise information, capture ideas, and retrieve important details from their accounts. By integrating voice-based prompting directly into Workspace applications, Google aims to streamline tasks that traditionally require multiple manual steps and lengthy text input.

One of the most significant additions is the arrival of voice-powered prompting in Google Docs. The new feature allows users to generate document drafts using spoken instructions. Rather than manually entering information into a document, users can describe what they want, and Gemini will create a draft based on those requests.

During a demonstration at the event, Google showed how a user could ask Gemini to gather résumé information stored in Google Drive, combine it with event logistics pulled from an email, and even include humorous personal anecdotes within the same document. The AI system then automatically assembled the requested information into a structured draft.

Traditionally, completing a task of this nature would require users to type instructions manually and often engage in several rounds of follow-up prompts. Users might write short commands, make corrections, add context, and continue refining the request over multiple conversational turns. According to Google, this process can be time-consuming and interrupt creative or productive workflows.

The company’s new approach leverages voice as a more natural input method. Instead of breaking instructions into multiple prompts, users can speak in complete sentences and provide several requests simultaneously. This allows them to communicate complex ideas more efficiently while reducing the number of interactions required to complete a task.

Google also highlighted the system’s ability to understand conversational changes during a request. If a user modifies a detail or changes their mind midway through speaking, Gemini can recognise the adjustment and incorporate the updated instruction into the final result without requiring the conversation to restart.

Google CEO Sundar Pichai said the company’s long-term vision extends beyond simply creating document drafts. In the future, users are expected to be able to create and edit documents entirely through voice interactions, enabling a more hands-free productivity experience.

Beyond Docs, Google is introducing new voice-based capabilities to Google Keep. The feature is designed to make capturing ideas faster and more convenient by allowing users to record their thoughts verbally rather than typing them.

Users can speak freely into the application, and Gemini will automatically transcribe the audio while organising the content into a structured note or task list. Rather than receiving a raw transcription, users obtain a formatted output that is easier to review, edit, and act upon later.

This functionality mirrors features that have gained popularity on specialised note-taking platforms in recent years. Applications such as Voicenotes and AudioPen introduced similar AI-powered voice-to-note capabilities several years ago. More recently, dictation-focused products, including Wispr Flow, Monologue, and Aqu, have incorporated comparable functionality into their voice-based typing and productivity tools.

Google has also been investing heavily in this category. Earlier this month, the company introduced Rambler, its own dictation-focused product built directly into Gboard. Rambler works across multiple applications and provides users with an AI-powered voice input experience that goes beyond traditional speech-to-text.

The company is now bringing similar capabilities directly into Gmail as part of the latest Workspace updates. With the new voice-powered experience, users can engage in conversational interactions with Gemini to locate information stored in their emails.

For example, users can ask Gemini questions about upcoming travel plans, reservation details, or scheduled appointments. During demonstrations, Google showed examples such as asking for information about a future flight, retrieving the confirmation code associated with an Airbnb reservation, or finding the time of a physician appointment.

Instead of manually searching through inboxes and reading multiple messages, users can ask questions in natural language and receive the information they need. The feature is intended to make email management more efficient while reducing the effort required to locate specific details buried within large inboxes.

Google’s latest announcements reflect a broader trend occurring throughout the technology industry. As artificial intelligence becomes integrated into an increasing number of products and services, users are becoming more comfortable interacting with systems through detailed and conversational requests.

Many modern AI applications encourage users to submit longer prompts that include multiple instructions, contextual details, and complex requirements. While typing such requests is certainly possible, voice often provides a faster and more intuitive way to communicate lengthy ideas and multi-step tasks.

Speaking naturally allows users to express thoughts continuously without worrying about formatting or typing speed. This can be particularly useful when describing complicated workflows, planning projects, brainstorming ideas, or requesting information that requires several connected actions.

Another factor driving the shift toward voice interactions is the growing sophistication of modern AI models. Current-generation systems have become increasingly capable of understanding conversational context, including recognising and interpreting changes that occur within a single spoken request.

For example, a user can begin describing one task, modify a requirement midway through the sentence, and continue speaking without interruption. Advanced language models can often determine the user’s final intent and generate an appropriate response based on the updated instructions.

Google appears to view this capability as an important component of future productivity experiences. By incorporating voice-powered prompting into Docs, Keep, Gmail, and potentially additional Workspace applications, the company is moving toward interfaces that feel more conversational and less dependent on traditional keyboard-based interactions.

As AI continues to reshape how people work, communicate, and organise information, Google's latest Workspace enhancements demonstrate the company's commitment to expanding voice as a primary way to interact with digital tools. The new features aim to simplify document creation, note-taking, information retrieval, and everyday productivity tasks, while enabling users to communicate with technology more naturally and efficiently.

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.