A text-to-speech API for bidirectional realtime streaming over the WebSocket protocol.
- Input Message: LLM output tokens
- Output Message: Audio chunks (base64 encoded)

The client opens a socket, sends a setup message with `"text": "sos"`, streams text messages as LLM tokens arrive, and ends the request with `"text": "eos"`; the server streams back base64-encoded audio chunks followed by a final message flagged with `is_final`.
| Zone | IP | Port | Path |
|---|---|---|---|
| STG | 10.40.101.161 | 36000 | tvoice/tts/ws/v1 |
| PRD | - | - | tvoice/tts/ws/v1 |
- url : ws://{host}:{port}/tvoice/tts/ws/v1
| params | type | required | description |
|---|---|---|---|
| svc | string | O | service name: [adot, aster, ...] |
| model | string | O | model name: [axtts-3-0, ...] |
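As a minimal sketch, assembling the full connection URL from the STG values in the tables above:

```python
# Minimal sketch: build the WebSocket URL from the STG values above.
host = "10.40.101.161"   # STG IP from the endpoint table
port = 36000             # STG port from the endpoint table
svc = "adot"             # service name (query parameter)
model = "axtts-3-0"      # model name (query parameter)

uri = f"ws://{host}:{port}/tvoice/tts/ws/v1?svc={svc}&model={model}"
print(uri)  # ws://10.40.101.161:36000/tvoice/tts/ws/v1?svc=adot&model=axtts-3-0
```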
All messages below are defined as JSON.
Setup (SOS) message (client → server):

| key | type | required | description |
|---|---|---|---|
| api_key | string | O | Issued by the VoiceGen AI team |
| poc_id | string | X | Default: None |
| request_id | string | X | Default: None |
| voice_settings | object | O | e.g. "voice_settings": {"voice": "emily"} |
| apply_tn | bool | X | Apply text normalization. Default: False |
| text | string | O | "text": "sos" |
- If the svc query parameter and the api_key do not match, the socket is closed.
- voice currently supports emily, sophie, and jemma.
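A complete SOS message, matching the client example at the end of this document:

```json
{
  "text": "sos",
  "voice_settings": {"voice": "emily"},
  "apply_tn": true,
  "api_key": "***",
  "poc_id": "adot.agent",
  "request_id": "123456789"
}
```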
Text message (client → server):

| key | type | required | description |
|---|---|---|---|
| text | string | O | "text": "{llm tokens}" |
EOS message (client → server):

| key | type | required | description |
|---|---|---|---|
| text | string | O | "text": "eos" |
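For example, a client might stream two tokens and then close the request; each line below is sent as a separate WebSocket frame (the token values are illustrative):

```json
{"text": "안녕, "}
{"text": "만나서 반가워."}
{"text": "eos"}
```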
Audio message (server → client):

| key | type | description |
|---|---|---|
| audio | string | base64 encoded, 24 kHz, PCM, 16-bit, mono |
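As a minimal sketch of consuming this payload (assuming the decoded chunks concatenate into one raw PCM stream in the format above; `save_chunks_as_wav` is a hypothetical helper), the audio can be written to a playable WAV file with Python's standard `wave` module:

```python
import base64
import wave

def save_chunks_as_wav(audio_b64_chunks, path="output.wav"):
    """Decode base64 audio chunks and write them to a WAV file.

    Assumes the format stated above: 24 kHz, PCM, 16-bit, mono.
    """
    pcm = b"".join(base64.b64decode(c) for c in audio_b64_chunks)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples = 2 bytes
        wav.setframerate(24000)   # 24 kHz sample rate
        wav.writeframes(pcm)
```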
Final message (server → client):

| key | type | description |
|---|---|---|
| is_final | bool | True on the last message of a request |
| input_sentence | string | The input sentence as received |
| input_tn_sentence | string | The input sentence after text normalization |
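An illustrative final message (the values shown are hypothetical):

```json
{
  "is_final": true,
  "input_sentence": "안녕, 만나서 반가워.",
  "input_tn_sentence": "안녕, 만나서 반가워."
}
```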
Example: streaming OpenAI chat completion output into the TTS socket (Python):

```python
import asyncio
import websockets
import json, base64

from openai import AsyncOpenAI

# Define API keys and voice ID
OPENAI_API_KEY = '***'

host = 'ws://127.0.0.1:36001'
tts_path = '/tvoice/tts/ws/v1'

# Construct the TTS query
service = 'adot'
model = 'axtts-3-0'
tts_query = f'?svc={service}&model={model}'

voice = 'emily'
apply_tn = True
tts_api_key = '***'

# Set OpenAI API key
aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)


async def stream(audio_stream):
    idx = 1
    print("Started streaming audio")
    async for chunk in audio_stream:
        if chunk:
            print('Recv Chunk {} : {}'.format(idx, len(chunk)))
            idx += 1


async def text_to_speech_input_streaming(voice, text_iterator):
    uri = host + tts_path + tts_query
    async with websockets.connect(uri) as websocket:
        # 1. Send Setup and SOS Message
        await websocket.send(json.dumps({
            "text": "sos",
            "voice_settings": {"voice": voice},
            "apply_tn": apply_tn,
            "api_key": tts_api_key,
            "poc_id": "adot.agent",
            "request_id": "123456789"
        }))

        async def listen():
            """Listen to the websocket for audio data and stream it."""
            while True:
                try:
                    message = await websocket.recv()
                    data = json.loads(message)
                    if data.get("audio"):
                        audio_chunk = base64.b64decode(data["audio"])
                        yield audio_chunk
                    elif data.get("is_final"):
                        print(data)
                        break
                    else:
                        print(data)
                except websockets.exceptions.ConnectionClosed as e:
                    print(f"Connection closed by server: {e.code}, {e.reason}")
                    break

        listen_task = asyncio.create_task(stream(listen()))

        # 2. Send Text Messages
        async for text in text_iterator:
            if text != "":
                await websocket.send(json.dumps({"text": text}))

        # 3. Send EOS Message
        await websocket.send(json.dumps({"text": "eos"}))

        await listen_task


async def chat_completion(query):
    """Retrieve text from OpenAI and pass it to the text-to-speech function."""
    response = await aclient.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': query}],
        temperature=1,
        stream=True)

    async def text_iterator():
        async for chunk in response:
            delta = chunk.choices[0].delta
            if delta.content is not None:
                yield delta.content

    await text_to_speech_input_streaming(voice, text_iterator())


# Main execution
if __name__ == "__main__":
    user_query = "안녕, 만나서 반가워."
    asyncio.run(chat_completion(user_query))
```
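Running the example requires the `websockets` and `openai` packages (e.g. `pip install websockets openai`), real values for `OPENAI_API_KEY` and `tts_api_key`, and a TTS server reachable at the configured host and port.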