Automatic speech-to-text

Speech2 uses cloud computing and AI to deliver transcripts of audio or video files faster than real time.

Any questions? Feel free to contact us.


Quick turnaround

Turnaround time depends on the language and length of your content, but you should expect the transcript in less time than it would take to play the content back.

Multiple languages

Speech2 currently supports English (US) and Japanese. More languages are on the way!

Flexible download options

In addition to the plain-text transcript, you can download the text with timestamps or as subtitles in WebVTT format.
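
For readers unfamiliar with WebVTT, here is a minimal sketch of how timestamped text maps onto a subtitle file. The segment shape (start seconds, end seconds, text) and both function names are assumptions made for illustration; they do not describe Speech2's actual output or API.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Build WebVTT text from (start, end, text) tuples.

    The tuple layout is a hypothetical representation of a
    timestamped transcript, not a documented Speech2 format.
    """
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([(0.0, 2.5, "Hello and welcome."),
                     (2.5, 5.0, "Let's get started.")]))
```

Each cue is a timing line followed by its text, which is why the same transcript can be offered both as timestamped text and as subtitles.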

Customized to fit your needs

Speech2 can be customized to fit your needs: we can build custom API integrations for both source files and the transcribed content. Contact us with a brief summary of what you’d like, and we’ll get back to you as soon as possible.

How much does Speech2 cost?

30 minutes free

The first 30 minutes of audio or video content you transcribe are free. No credit card is required to start.


After that, each minute transcribed costs $0.05 (USD). No premiums for video content, no minimums, no long-term contracts. You only pay for what you use.
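
As a rough guide, the pricing can be worked out as in the sketch below. It assumes the 30 free minutes are deducted once from your cumulative usage and that partial minutes are billed proportionally; neither detail is stated on this page, and the function name is purely illustrative.

```python
from decimal import Decimal

FREE_MINUTES = Decimal("30")       # from the page: first 30 minutes are free
RATE_PER_MINUTE = Decimal("0.05")  # from the page: $0.05 (USD) per minute


def estimate_cost(total_minutes):
    """Estimate the USD cost for a cumulative number of transcribed minutes."""
    minutes = Decimal(str(total_minutes))
    # Assumption: the free allowance applies once, to the first 30 minutes.
    billable = max(minutes - FREE_MINUTES, Decimal("0"))
    # Assumption: partial minutes are billed proportionally.
    return (billable * RATE_PER_MINUTE).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for minutes in (20, 30, 90, 600):
        print(f"{minutes} min -> ${estimate_cost(minutes)}")
```

Under these assumptions, 90 minutes of content has 60 billable minutes, which comes to $3.00.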

About us

Speech2 is a service run by KotobaMedia, an independent software engineering consultant based in Tokyo, Japan.

Contact us

[email protected]