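CSDN recently launched the event "Try Microsoft Azure AI Cognitive Services for ¥0, with great prizes to give away", and it is still running. I signed up as soon as it opened, but only found time to actually try the services today.
The event asks participants to write a blog post about trying at least three of these services: speech-to-text, text-to-speech, speech translation, text analytics, text translation, and language understanding.
After trying speech-to-text, text-to-speech, and speech translation, I decided to build a real-time voice translator, and it works really well.
Let's walk through the steps. First, go to https://portal./ and sign in.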
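Getting the key
Type 认知服务 (Cognitive Services) into the search box and confirm.
Then create a Speech service.
Enter a name, choose a location, select the free pricing tier, and create a new resource group and select it.
Then click Create. While the resource is being created, the page shows that deployment is in progress.
When deployment completes, click Go to resource.
Then click Keys and Endpoint to view the keys and the location/region.
There are two keys and either one will do. Also write down the location/region: the programs below call the service using the key together with the region.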
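The snippets in this post paste the key straight into the source. One alternative is to read the key and region from environment variables so they never end up in a shared script. A minimal sketch; the variable names AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are my own hypothetical choices, not names defined by Azure:

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical variable names; choose whatever fits your setup.
KEY_VAR = "AZURE_SPEECH_KEY"
REGION_VAR = "AZURE_SPEECH_REGION"


def load_speech_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (key, region) from the environment, failing loudly if either is missing."""
    env = os.environ if env is None else env
    key = env.get(KEY_VAR, "").strip()
    region = env.get(REGION_VAR, "").strip()
    # Collect every missing variable so the error message names all of them at once
    missing = [name for name, value in ((KEY_VAR, key), (REGION_VAR, region)) if not value]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return key, region
```

With this in place, speech_key, service_region = load_speech_credentials() could replace the hard-coded assignment in each snippet.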
A first look at Azure Cognitive Services
Azure Cognitive Services documentation: https://docs./zh-cn/cognitive-services/
Following the documentation, first install the Azure Speech SDK for Python:
pip install azure-cognitiveservices-speech
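Let's start with speech-to-text.
Testing speech-to-text
Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python
I copied the official sample code and changed it slightly so that it recognizes speech from the microphone: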
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
# speech_recognition_language selects the language to recognize (Chinese here)
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
# Without an explicit audio_config, the recognizer listens on the default microphone
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Speak: ", end="")
# recognize_once() returns after a single utterance
result = speech_recognizer.recognize_once()
print(result.text)
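speech_recognition_language sets the language to recognize; here I set it to Chinese.
I ran the script and said one sentence, 微软人工智能服务非常好用 ("Microsoft's AI services are really easy to use"), into the microphone, and the program recognized it exactly:
Speak: 微软人工智能服务非常好用。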
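Testing text-to-speech
Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python
The documentation also shows how to save the synthesized audio to a file, but here I only play it directly through the speaker: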
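import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import SpeechConfig, SpeechSynthesizer
from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
speech_config = SpeechConfig(subscription=speech_key, region=service_region)
# Language (and therefore the default voice) used for synthesis
speech_config.speech_synthesis_language = "zh-cn"
# Play the audio on the default speaker instead of saving it to a file
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

text_words = "微软人工智能服务非常好用。"  # "Microsoft's AI services are really easy to use."
result = speech_synthesizer.speak_text_async(text_words).get()
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(result.reason)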
The synthesized speech sounds very natural.
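Setting speech_synthesis_language only picks a default voice for that language. To pick a specific voice, the Speech SDK also accepts SSML through SpeechSynthesizer.speak_ssml_async. The pure-Python sketch below builds a minimal SSML document; build_ssml is a hypothetical helper of mine, and the voice name zh-CN-XiaoxiaoNeural is only an example that should be checked against the voice list for your region:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str, lang: str) -> str:
    """Wrap text in a minimal SSML document that selects one named voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the spoken text
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

The resulting string could then be passed to speech_synthesizer.speak_ssml_async(ssml).get() in place of speak_text_async.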
Testing speech translation
Documentation: https://docs./zh-cn/cognitive-services/speech-service/get-started-speech-translation?tabs=script%2Cwindowsinstall&pivots=programming-language-python
In my tests, speech translation performs speech-to-text and translation in a single call:
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
# Recognize Chinese speech and translate it into English
from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    """Recognize one utterance and return (source text, translation)."""
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        # translations is a dict keyed by target language code
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Nothing usable was recognized; still return a pair so callers can unpack it
    return None, None


print(speakAndTranslation())
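After running it and saying one sentence, the result is:
('大家好才是真的好。', 'Everyone is really good.')
The call returns both the original text and the translation, so the voice translator below uses this interface too.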
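Building the voice translator
The program's overall logic: recognize one utterance from the microphone and translate it, print the source text and the translation, stop if the source text contains 退出 ("quit"), otherwise read the translation aloud, and repeat.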
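To make that flow easier to follow, and to exercise it without a microphone or an Azure key, here is a sketch of the same loop with the Azure calls replaced by functions passed in as parameters. run_translator and its parameters are hypothetical names; in the complete program, recognize corresponds to speakAndTranslation and speak to speak:

```python
from typing import Callable, Optional, Tuple

Recognized = Tuple[Optional[str], Optional[str]]  # (source text, translation)


def run_translator(
    recognize: Callable[[], Recognized],
    speak: Callable[[str], None],
    is_quit: Callable[[str], bool],
    log: Callable[..., None] = print,
) -> int:
    """Recognize, print, and speak translations until the quit phrase is heard.

    Returns the number of translations that were spoken.
    """
    spoken = 0
    while True:
        text, translation = recognize()
        log("Speak:", text)
        log("Translation:", translation)
        if not text:  # nothing recognized (no match or canceled): listen again
            continue
        if is_quit(text):
            break
        if translation:
            speak(translation)
            spoken += 1
    return spoken
```

Passing the recognizer and synthesizer in as plain callables keeps the loop free of SDK objects, so a scripted list of utterances is enough to check it.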
Complete code:
"""
小小明的代码
CSDN主页:https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2021/10/30'
import azure. cognitiveservices. speech as speechsdk
from azure. cognitiveservices. speech. audio import AudioOutputConfig
speech_key, service_region = "59xxxxde" , "chinaeast2"
speech_config = speechsdk. SpeechConfig( subscription= speech_key, region= service_region,
speech_recognition_language= "zh-cn" )
speech_config. speech_synthesis_language = "zh-cn"
audio_config = AudioOutputConfig( use_default_speaker= True )
speech_synthesizer = speechsdk. SpeechSynthesizer(
speech_config= speech_config, audio_config= audio_config)
from_language, to_language = 'zh-cn' , 'en'
translation_config = speechsdk. translation. SpeechTranslationConfig(
subscription= speech_key, region= service_region, speech_recognition_language= from_language)
translation_config. add_target_language( to_language)
recognizer = speechsdk. translation. TranslationRecognizer(
translation_config= translation_config)
def speakAndTranslation ( ) :
result = recognizer. recognize_once( )
if result. reason == speechsdk. ResultReason. TranslatedSpeech:
return result. text, result. translations[ to_language]
elif result. reason == speechsdk. ResultReason. RecognizedSpeech:
return result. text, None
elif result. reason == speechsdk. ResultReason. NoMatch:
print ( result. no_match_details)
elif result. reason == speechsdk. ResultReason. Canceled:
print ( result. cancellation_details)
def speak ( text_words) :
result = speech_synthesizer. speak_text_async( text_words) . get( )
# print(result.reason)
if result. reason == speechsdk. ResultReason. Canceled:
cancellation_details = result. cancellation_details
print ( "识别取消:" , cancellation_details. reason)
if cancellation_details. reason == speechsdk. CancellationReason. Error:
if cancellation_details. error_details:
print ( "错误详情:" , cancellation_details. error_details)
while True :
print ( "说:" , end= " " )
text, translation_text = speakAndTranslation( )
print ( text)
print ( "译文:" , translation_text)
if "退出" in text:
break
if text:
speak( translation_text)
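The exit test "退出" in text is a plain substring check. That copes with the punctuation the recognizer adds (as in the 。 at the end of the recognized sentence earlier), but it also means a sentence such as 不要退出 ("don't quit") would end the loop. If that matters, a stricter check could drop punctuation and whitespace and compare the whole utterance. A minimal sketch; is_quit_command and QUIT_WORDS are hypothetical names of mine:

```python
import unicodedata

# Hypothetical set of utterances accepted as "stop"; extend as needed.
QUIT_WORDS = {"退出", "quit", "exit"}


def is_quit_command(text: str) -> bool:
    """True only if the whole utterance, minus punctuation and spaces, is a quit word."""
    # Unicode categories starting with "P" are punctuation (e.g. 。 is "Po"),
    # those starting with "Z" are separators such as ordinary spaces.
    core = "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("P", "Z")) and not ch.isspace()
    )
    return core.lower() in QUIT_WORDS
```

In the main loop, if is_quit_command(text): break would then take the place of the substring test.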
I gave it a quick run; the printed output looked like this:
Speak: 我只想进转过山和大海。
Translation: I just want to go in and out of the mountains and the sea.
Speak: 也穿越,人山人海。
Translation: Also through, the sea of people and mountains.
Speak: 我曾经目睹这一切全部都随风飘然。
Translation: I've seen it all blow in the wind.
Speak: 转眼成空。
Translation: It's empty.
Speak: 问,世间能有几多愁?
Translation: Q, how much worry can there be in the world?
Speak: 退出。
Translation: quit.
As for how the spoken output sounds, you'll have to try it yourselves.