由 DALL·E 3 生成,prompt:A person and a machine are engaged in two-way communication through a microphone and speakers. The person, standing on the left, speaks into the microphone while the machine on the right, resembling a sleek, futuristic robot, responds through speakers. The setting is a modern, well-lit room with a professional atmosphere. The person looks focused and engaged, and the machine's digital display shows sound waves indicating speech. 语音交互系统简介 语音交互系统主要由自动语音识别(Automatic Speech Recognition, 简称 ASR)、自然语言处理(Natural Language Processing, 简称 NLP)和文本到语音合成(Text to Speech,简称 TTS)三个环节构成。ASR 相当于人的听觉系统,NLP 相当于人的大脑语言区域,TTS 相当于人的发声系统。 如何构建语音对话机器人 本文将完全利用开源方案构建语音对话机器人。
本文基于 Gradio 实现的交互界面如图: 你可以基于系统麦克风采集音频,通过 Whisper 转录为文本,调用 DeepSeek v2 API 后,再将对话输出经过 ChatTTS 合成为语音,点击播放即可听到来自机器人的声音。 硬件环境:RTX 3060, 12GB 显存 软件环境信息(Miniconda3 + Python 3.8.19): pip list Package Version ----------------------------- -------------- absl-py 2.0.0 accelerate 0.25.0 aiofiles 23.2.1 aiohttp 3.8.6 aiosignal 1.3.1 altair 5.1.2 annotated-types 0.6.0 antlr4-python3-runtime 4.9.3 anyio 4.0.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asttokens 2.4.1 astunparse 1.6.3 async-lru 2.0.4 async-timeout 4.0.3 attrs 23.1.0 audioread 3.0.1 Babel 2.15.0 backcall 0.2.0 backports.zoneinfo 0.2.1 beautifulsoup4 4.12.3 bitarray 2.8.2 bitsandbytes 0.41.1 bleach 6.1.0 blinker 1.6.3 cachetools 5.3.1 cdifflib 1.2.6 certifi 2023.7.22 cffi 1.16.0 charset-normalizer 2.1.1 click 8.1.7 colorama 0.4.6 comm 0.2.2 contourpy 1.1.1 cpm-kernels 1.0.11 cycler 0.12.1 Cython 3.0.3 debugpy 1.8.1 decorator 5.1.1 defusedxml 0.7.1 distro 1.9.0 dlib 19.24.2 edge-tts 6.1.8 editdistance 0.8.1 einops 0.8.0 einx 0.2.2 encodec 0.1.1 exceptiongroup 1.1.3 executing 2.0.1 face-alignment 1.4.1 fairseq 0.12.2 faiss-cpu 1.7.4 fastapi 0.108.0 fastjsonschema 2.19.1 ffmpeg 1.4 ffmpeg-python 0.2.0 ffmpy 0.3.1 filelock 3.12.4 Flask 2.1.2 Flask-Cors 3.0.10 flatbuffers 23.5.26 fonttools 4.43.1 fqdn 1.5.1 frozendict 2.4.4 frozenlist 1.4.0 fsspec 2023.9.2 future 0.18.3 gast 0.4.0 gitdb 4.0.10 GitPython 3.1.37 google-auth 2.23.3 google-auth-oauthlib 1.0.0 google-pasta 0.2.0 gradio 4.32.2 gradio_client 0.17.0 grpcio 1.59.0 h11 0.14.0 h5py 3.10.0 httpcore 0.18.0 httpx 0.25.0 huggingface-hub 0.23.2 hydra-core 1.0.7 idna 3.4 imageio 2.31.5 importlib-metadata 6.8.0 importlib-resources 6.1.0 inflect 7.2.1 ipykernel 6.29.4 ipython 8.12.3 ipywidgets 8.1.3 isoduration 20.11.0 itsdangerous 2.1.2 jedi 0.19.1 Jinja2 3.1.2 joblib 1.3.2 json5 0.9.25 jsonpointer 2.4 jsonschema 4.19.1 jsonschema-specifications 2023.7.1 jupyter 1.0.0 jupyter_client 8.6.2 jupyter-console 6.6.3 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.1 jupyter_server_terminals 0.5.3 jupyterlab 4.2.1 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.2 jupyterlab_widgets 3.0.11 keras 2.13.1 kiwisolver 1.4.5 langdetect 1.0.9 latex2mathml 3.77.0 lazy_loader 0.3 libclang 16.0.6 librosa 0.9.1 llvmlite 0.41.0 loguru 0.7.2 lxml 4.9.3 Markdown 3.5 markdown-it-py 3.0.0 MarkupSafe 2.1.3 matplotlib 3.7.3 matplotlib-inline 0.1.7 mdtex2html 1.2.0 mdurl 0.1.2 mistune 3.0.2 more-itertools 10.1.0 mpmath 1.3.0 multidict 6.0.4 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nemo_text_processing 1.0.2 nest-asyncio 1.6.0 networkx 3.1 notebook 7.2.0 notebook_shim 0.2.4 numba 0.58.0 numpy 1.22.4 oauthlib 3.2.2 omegaconf 2.3.0 onnx 1.14.1 onnxoptimizer 0.3.13 onnxsim 0.4.33 openai 1.6.1 openai-whisper 20230918 opencv-python 4.8.1.78 opt-einsum 3.3.0 orjson 3.9.9 overrides 7.7.0 packaging 23.2 pandas 2.0.3 pandocfilters 1.5.1 parso 0.8.4 peft 0.7.1 pickleshare 0.7.5 Pillow 10.0.1 pip 24.0 pkgutil_resolve_name 1.3.10 platformdirs 3.11.0 playsound 1.3.0 pooch 1.7.0 portalocker 2.8.2 praat-parselmouth 0.4.3 prometheus_client 0.20.0 prompt_toolkit 3.0.45 protobuf 4.25.1 psutil 5.9.5 pure-eval 0.2.2 pyarrow 13.0.0 pyasn1 0.5.0 pyasn1-modules 0.3.0 PyAudio 0.2.12 pycparser 2.21 pydantic 2.5.3 pydantic_core 2.14.6 pydeck 0.8.1b0 pydub 0.25.1 Pygments 2.16.1 pynini 2.1.5 pynvml 11.5.0 PyOpenGL 3.1.7 pyparsing 3.1.1 python-dateutil 2.8.2 python-json-logger 2.0.7 python-multipart 0.0.9 pytz 2023.3.post1 PyWavelets 1.4.1 pywin32 306 pywinpty 2.0.13 pyworld 0.3.0 PyYAML 6.0.1 pyzmq 26.0.3 qtconsole 5.5.2 QtPy 2.4.1 referencing 0.30.2 regex 2023.10.3 requests 2.32.3 requests-oauthlib 1.3.1 resampy 0.4.2 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.6.0 rpds-py 0.10.4 rsa 4.9 ruff 0.4.7 sacrebleu 2.3.1 sacremoses 0.1.1 safetensors 0.4.3 scikit-image 0.18.1 scikit-learn 1.3.1 scikit-maad 1.3.12 scipy 1.7.3 semantic-version 2.10.0 Send2Trash 1.8.3 sentencepiece 0.1.99 setuptools 69.5.1 shellingham 1.5.4 six 1.16.0 smmap 5.0.1 sniffio 1.3.0 sounddevice 0.4.5 SoundFile 0.10.3.post1 soupsieve 2.5 sse-starlette 1.8.2 stack-data 0.6.3 starlette 0.32.0.post1 streamlit 1.29.0 sympy 1.12 tabulate 0.9.0 tenacity 8.2.3 tensorboard 2.13.0 tensorboard-data-server 0.7.1 tensorboardX 2.6.2.2 tensorflow 2.13.0 tensorflow-estimator 2.13.0 tensorflow-intel 2.13.0 tensorflow-io-gcs-filesystem 0.31.0 termcolor 2.3.0 terminado 0.18.1 threadpoolctl 3.2.0 tifffile 2023.7.10 tiktoken 0.3.3 timm 0.9.12 tinycss2 1.3.0 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.0 torch 2.1.0+cu121 torchaudio 2.1.0+cu121 torchcrepe 0.0.22 torchvision 0.16.0+cu121 tornado 6.3.3 tqdm 4.63.0 traitlets 5.14.3 transformers 4.41.2 transformers-stream-generator 0.0.4 trimesh 4.0.0 typeguard 4.3.0 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.0 tzdata 2023.3 tzlocal 5.1 uri-template 1.3.0 urllib3 2.2.1 uvicorn 0.25.0 validators 0.22.0 vector_quantize_pytorch 1.14.8 vocos 0.1.0 watchdog 3.0.0 wcwidth 0.2.13 webcolors 1.13 webencodings 0.5.1 websocket-client 1.8.0 websockets 11.0.3 Werkzeug 3.0.0 WeTextProcessing 0.1.12 wget 3.2 wheel 0.43.0 widgetsnbextension 4.0.11 win32-setctime 1.1.0 wrapt 1.15.0 yarl 1.9.2 zipp 3.17.0 WebUI 代码如下(目前只是演示基本功能,比较简陋):
在此基础上,可以增加更多功能:
如果环境搭建遇到困难,可以私信获取完整项目。 点击下方卡片,关注“慢慢学AIGC” |
|