[Original] Building a Speech Translator with Azure Cognitive Services: A Fun Way to Learn English

小小明代码实体 2021-11-30


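CSDN recently launched a campaign titled "Try Microsoft Azure AI Cognitive Services for Free and Win Great Prizes" (《0元试用微软 Azure人工智能认知服务，精美礼品大放送》). The campaign is still running. I signed up as soon as it opened, but only found time to actually try the services today.

The campaign asks participants to write a blog post about their experience with at least three of these services: speech-to-text, text-to-speech, speech translation, text analytics, text translation, and language understanding.

After trying speech-to-text, text-to-speech, and speech translation, I decided to build a real-time speech translator, and it works really well.

Let's walk through the steps. First, go to https://portal./ and sign in.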

Getting the key

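Type "Cognitive Services" (认知服务) into the search box and confirm:

Then create a Speech service resource:

Enter a name, choose a location, select the free pricing tier, then create a new resource group and select it:

Next, click Create. While the resource is being created, the portal shows that deployment is in progress:

When deployment finishes, click Go to resource:

Then click Keys and Endpoint to view the keys and the location/region: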

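There are two keys, and either one works. Note down the location/region as well, because our program needs both the key and the region to call the service.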
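Since the key is a secret, one option is to keep it out of the source code and read it from the environment instead. The following is only a minimal sketch of that idea: the function name `load_speech_settings` and the variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions made for this example, not something the original post uses.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this
    sketch; set them to the key and location/region shown in the portal.
    """
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    # Collect the names of any settings that are absent or empty.
    missing = [name for name, value in
               (("SPEECH_KEY", key), ("SPEECH_REGION", region)) if not value]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return key, region
```

With this helper, the line that hard-codes the key in the snippets below could become `speech_key, service_region = load_speech_settings()`.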

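A first look at Azure Cognitive Services

Azure Cognitive Services documentation: https://docs./zh-cn/cognitive-services/

Following the documentation, we first install the Python library for the Azure Speech service: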

pip install azure-cognitiveservices-speech

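Let's try speech-to-text first:

Testing speech-to-text

Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python

I copied the official sample code and modified it slightly so that it recognizes speech from the microphone: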

import azure.cognitiveservices.speech as speechsdk

# Placeholder key and region: replace them with the values of your own Speech resource.
speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("说：", end="")
result = speech_recognizer.recognize_once()
print(result.text)

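speech_recognition_language sets the recognition language; here I set it to Chinese.

After running it, I said one sentence into the microphone, and the program recognized it accurately (the sentence means "Microsoft's AI services are very easy to use"):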

说：微软人工智能服务非常好用。

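Testing text-to-speech

Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python

The documentation also shows how to save the synthesized audio to a file, but here I only demonstrate playing it directly through the speaker: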

# Reuses speechsdk and speech_config from the speech-to-text snippet above.
from azure.cognitiveservices.speech import SpeechSynthesizer
from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_config.speech_synthesis_language = "zh-cn"
# Play the synthesized audio through the default speaker.
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

text_words = "微软人工智能服务非常好用。"
result = speech_synthesizer.speak_text_async(text_words).get()
# Print the reason only when synthesis did not complete.
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(result.reason)

The synthesized speech sounds very good.
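For completeness, saving the audio to a file instead of playing it can be sketched as follows. This is a minimal sketch, not the author's code: the function name `synthesize_to_file` and its parameters are made up for this example, and it assumes the SDK installed earlier plus a valid key and region.

```python
def synthesize_to_file(text, wav_path, speech_key, service_region, language="zh-CN"):
    """Synthesize `text` into the WAV file `wav_path`; return True on success."""
    # Imported inside the function so that this sketch can be loaded even where
    # the SDK is not installed; a top-level import works just as well.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, region=service_region)
    speech_config.speech_synthesis_language = language
    # filename=... sends the audio to a file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

A call might look like `synthesize_to_file("微软人工智能服务非常好用。", "demo.wav", speech_key, service_region)`, where `demo.wav` is an arbitrary output path.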

Testing speech translation

Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-speech-translation?tabs=script%2Cwindowsinstall&pivots=programming-language-python

In my tests, speech translation performs both speech-to-text and translation in a single call:

from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)


speakAndTranslation()

Run this and say a sentence; the result:

('大家好才是真的好。', 'Everyone is really good.')

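Because this call returns both the original text and its translation, the speech translator below uses the same API.

Building the speech translator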

The rough logical structure of the program:
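As a rough sketch of that structure, the control flow can be written as a loop in which the listening and speaking steps are passed in as functions. The names `run_translator`, `listen`, `speak`, `show`, and `exit_word` are hypothetical and exist only in this example; the sketch also assumes that `listen` returns None when nothing was recognized.

```python
def run_translator(listen, speak, show=print, exit_word="退出"):
    """Loop: listen, show the text and its translation, speak the translation.

    `listen` returns (text, translation), or None when nothing was recognized.
    `speak` reads a string aloud. Both stand in for SDK-backed functions.
    """
    while True:
        heard = listen()
        if heard is None:
            # Nothing usable was recognized, so listen again.
            continue
        text, translation = heard
        show("说：", text)
        show("译文：", translation)
        if exit_word in text:
            # Saying the exit word ("退出" means "exit") ends the session.
            break
        if translation:
            speak(translation)
```

Keeping the loop free of SDK calls makes it possible to exercise the control flow with canned input before plugging in the real recognizer and synthesizer.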

Full code:

"""
Code by 小小明
CSDN homepage: https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2021/10/30'

import azure.cognitiveservices.speech as speechsdk

from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_key, service_region = "59xxxxde", "chinaeast2"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region,
                                       speech_recognition_language="zh-cn")
# The synthesizer reads the English translation aloud, so use an English voice.
speech_config.speech_synthesis_language = "en-US"
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)


def speak(text_words):
    # Read text_words aloud and report the details if synthesis is canceled.
    result = speech_synthesizer.speak_text_async(text_words).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("合成取消:", cancellation_details.reason)
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("错误详情：", cancellation_details.error_details)


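while True:
    print("说：", end=" ", flush=True)
    # speakAndTranslation() returns None when nothing was recognized or the
    # request was canceled, so check before unpacking the pair.
    result = speakAndTranslation()
    if result is None:
        continue
    text, translation_text = result
    print(text)
    print("译文：", translation_text)
    # Saying "退出" ("exit") ends the session.
    if "退出" in text:
        break
    # Speak only when a translation is actually available.
    if translation_text:
        speak(translation_text)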

A quick run printed the following along the way:

说： 我只想进转过山和大海。
译文： I just want to go in and out of the mountains and the sea.
说： 也穿越，人山人海。
译文： Also through, the sea of people and mountains.
说： 我曾经目睹这一切全部都随风飘然。
译文： I've seen it all blow in the wind.
说： 转眼成空。
译文： It's empty.
说： 问，世间能有几多愁？
译文： Q, how much worry can there be in the world?
说： 退出。
译文： quit.

As for the spoken output, you will have to try it yourself.