A text-to-speech (TTS) API for bidirectional real-time streaming over the WebSocket protocol.
- Input message: LLM output tokens
- Output message: audio chunks (base64-encoded)
 
| Zone | IP | Port | Path |
|---|---|---|---|
| STG | 10.40.101.161 | 36000 | tvoice/tts/ws/v1 |
| PRD | 172.18.171.204 | 36000 | tvoice/tts/ws/v1 |
 
- URL: ws://{ip}:{port}/tvoice/tts/ws/v1
 
| Query parameter | Type | Required | Description |
|---|---|---|---|
| svc | string | Yes | Service name: [adot, aster, ...] |
| model | string | Yes | Model name: [axtts-3-0, ...] |
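
As a minimal sketch, the connection URI can be assembled from the zone table and the two required query parameters. The helper name `build_tts_uri` is hypothetical, and the STG address is used only for illustration.

```python
from urllib.parse import urlencode

TTS_PATH = "/tvoice/tts/ws/v1"


def build_tts_uri(ip: str, port: int, svc: str, model: str) -> str:
    """Return the WebSocket URI with the required svc and model query parameters.

    build_tts_uri is a hypothetical helper, not part of the API.
    """
    query = urlencode({"svc": svc, "model": model})
    return f"ws://{ip}:{port}{TTS_PATH}?{query}"


# STG zone values taken from the table above
print(build_tts_uri("10.40.101.161", 36000, "adot", "axtts-3-0"))
# -> ws://10.40.101.161:36000/tvoice/tts/ws/v1?svc=adot&model=axtts-3-0
```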
 
All messages below are JSON objects.

SOS message (sent first, carries the session settings):

| Key | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Issued by the VoiceGen AI team |
| poc_id | string | No | Default: None |
| request_id | string | No | Default: None |
| voice_settings | object | Yes | `"voice_settings": {"voice": "emily"}` |
| apply_tn | bool | No | Default: False |
| text | string | Yes | `"text": "sos"` |
 
- If the `svc` query parameter and `api_key` do not match, the socket is closed.
- `voice` currently supports emily, sophie, and jemma.
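
The following sketch builds the SOS message from the fields above and rejects unsupported voices on the client side before anything is sent. The helper `build_sos_message` and the constant `SUPPORTED_VOICES` are hypothetical names. Omitting the optional keys when they are unset is an assumption that the server treats a missing key the same as its default.

```python
import json

# Voices listed in this document; hypothetical constant, update as the service adds voices.
SUPPORTED_VOICES = {"emily", "sophie", "jemma"}


def build_sos_message(api_key, voice, apply_tn=False, poc_id=None, request_id=None):
    """Serialize the SOS message (hypothetical helper, not part of the API)."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    message = {
        "text": "sos",
        "api_key": api_key,
        "voice_settings": {"voice": voice},
        "apply_tn": apply_tn,
    }
    # Assumption: optional keys may be left out when no value is set.
    if poc_id is not None:
        message["poc_id"] = poc_id
    if request_id is not None:
        message["request_id"] = request_id
    return json.dumps(message)


print(build_sos_message("***", "emily", apply_tn=True, poc_id="adot.agent"))
```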
 
Text message (one per chunk of LLM output):

| Key | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | `"text": "{llm tokens}"` |
 
EOS message (sent last, after all text):

| Key | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | `"text": "eos"` |
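
The client-to-server sequence is therefore SOS, then any number of text messages, then EOS. A minimal sketch of that framing follows. The generator name `frame_messages` is hypothetical, and skipping empty tokens mirrors the sample client rather than a documented server requirement.

```python
import json
from typing import Iterable, Iterator


def frame_messages(sos: dict, tokens: Iterable[str]) -> Iterator[str]:
    """Yield the JSON payloads in the order SOS, text..., EOS (hypothetical helper)."""
    yield json.dumps(sos)
    for token in tokens:
        if token:  # skip empty strings, as the sample client does
            yield json.dumps({"text": token}, ensure_ascii=False)
    yield json.dumps({"text": "eos"})


sos = {"text": "sos", "api_key": "***", "voice_settings": {"voice": "emily"}}
for payload in frame_messages(sos, ["안녕, ", "", "만나서 반가워."]):
    print(payload)
```

Each yielded string can be passed directly to `websocket.send(...)`.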
 
Audio message (server to client):

| Key | Type | Description |
|---|---|---|
| audio | string | Base64-encoded PCM, 24 kHz, 16-bit, mono |
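
Because the audio is raw PCM at 24 kHz, 16-bit, mono, one second of audio is 24,000 × 2 = 48,000 bytes. The sketch below decodes a chunk, computes its duration, and wraps PCM in a WAV container using the standard library. The function names are hypothetical, and the WAV step assumes little-endian signed samples, which this document does not state.

```python
import base64
import io
import wave

SAMPLE_RATE = 24_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def decode_audio(message: dict) -> bytes:
    """Return the raw PCM bytes carried in an audio message."""
    return base64.b64decode(message["audio"])


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a PCM buffer given the format listed above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap PCM in a WAV container (assumes little-endian signed 16-bit samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


# One second of silence stands in for a real server chunk.
silence = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
message = {"audio": base64.b64encode(silence).decode("ascii")}
print(pcm_duration_seconds(decode_audio(message)))
```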
 
Final message (server to client):

| Key | Type | Description |
|---|---|---|
| is_final | bool |  |
| input_sentence | string |  |
| input_tn_sentence | string |  |
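
A receiver has to tell these two response shapes apart. The sketch below classifies a raw frame the same way the sample client's listen loop does: a non-empty `audio` key means an audio chunk, and a truthy `is_final` marks the end of the stream. The function name `parse_response` and the `"other"` fallback label are hypothetical.

```python
import base64
import json


def parse_response(raw: str):
    """Classify a server frame as ("audio", bytes), ("final", dict) or ("other", dict)."""
    data = json.loads(raw)
    if data.get("audio"):
        return "audio", base64.b64decode(data["audio"])
    if data.get("is_final"):
        return "final", data
    # Anything else (for example an error payload) is passed through unchanged.
    return "other", data


kind, payload = parse_response(json.dumps({"is_final": True, "input_sentence": "안녕"}))
print(kind, payload)
```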
 
```python
import asyncio
import base64
import json

import websockets
from openai import AsyncOpenAI

# API keys and endpoint configuration
OPENAI_API_KEY = '***'
host = 'ws://127.0.0.1:36001'
tts_path = '/tvoice/tts/ws/v1'

# Construct the TTS query string
service = 'adot'
model = 'axtts-3-0'
tts_query = f'?svc={service}&model={model}'
voice = 'emily'
apply_tn = True
tts_api_key = '***'

# Create the OpenAI client
aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)


async def stream(audio_stream):
    """Consume decoded audio chunks and log the size of each one."""
    idx = 1
    print("Started streaming audio")
    async for chunk in audio_stream:
        if chunk:
            print('Recv Chunk {} : {}'.format(idx, len(chunk)))
            idx += 1


async def text_to_speech_input_streaming(voice, text_iterator):
    """Send streamed text to the TTS WebSocket and receive audio concurrently."""
    uri = host + tts_path + tts_query
    async with websockets.connect(uri) as websocket:
        # 1. Send the setup (SOS) message
        await websocket.send(json.dumps({
            "text": "sos",
            "voice_settings": {"voice": voice},
            "apply_tn": apply_tn,
            "api_key": tts_api_key,
            "poc_id": "adot.agent",
            "request_id": "123456789"
        }))

        async def listen():
            """Listen to the websocket for audio data and yield decoded chunks."""
            while True:
                try:
                    message = await websocket.recv()
                    data = json.loads(message)
                    if data.get("audio"):
                        yield base64.b64decode(data["audio"])
                    elif data.get("is_final"):
                        # The final message ends the stream
                        print(data)
                        break
                    else:
                        print(data)
                except websockets.exceptions.ConnectionClosed as e:
                    print(f"Connection closed by server: {e.code}, {e.reason}")
                    break

        # Receive audio in the background while text is still being sent
        listen_task = asyncio.create_task(stream(listen()))

        # 2. Send text messages, skipping empty tokens
        async for text in text_iterator:
            if text != "":
                await websocket.send(json.dumps({"text": text}))

        # 3. Send the EOS message, then wait for the remaining audio
        await websocket.send(json.dumps({"text": "eos"}))
        await listen_task


async def chat_completion(query):
    """Retrieve text from OpenAI and pass it to the text-to-speech function."""
    response = await aclient.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': query}],
        temperature=1,
        stream=True,
    )

    async def text_iterator():
        # Yield only the chunks that carry text content
        async for chunk in response:
            delta = chunk.choices[0].delta
            if delta.content is not None:
                yield delta.content

    await text_to_speech_input_streaming(voice, text_iterator())


# Main execution
if __name__ == "__main__":
    user_query = "안녕, 만나서 반가워."
    asyncio.run(chat_completion(user_query))
```