A text-to-speech API for bidirectional realtime streaming over the WebSocket protocol.
- Input Message: LLM output tokens
- Output Message: Audio chunks (base64 encoded)

The client opens a socket, sends a setup message with `"text": "sos"`, streams text messages as LLM tokens arrive, and ends the request with `"text": "eos"`; the server streams back base64-encoded audio chunks followed by a final message flagged with `is_final`.
| Zone | IP | Port | Path |
|---|---|---|---|
| STG | 10.40.101.161 | 36000 | tvoice/tts/ws/v1 |
| PRD | - | - | tvoice/tts/ws/v1 |
- url : ws://{host}:{port}/tvoice/tts/ws/v1
| params | type | required | description |
|---|---|---|---|
| svc | string | O | service name: [adot, aster, ...] |
| model | string | O | model name: [axtts-3-0, ...] |
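As a minimal sketch, assembling the full connection URL from the STG values in the tables above:

```python
# Minimal sketch: build the WebSocket URL from the STG values above.
host = "10.40.101.161"   # STG IP from the endpoint table
port = 36000             # STG port from the endpoint table
svc = "adot"             # service name (query parameter)
model = "axtts-3-0"      # model name (query parameter)

uri = f"ws://{host}:{port}/tvoice/tts/ws/v1?svc={svc}&model={model}"
print(uri)  # ws://10.40.101.161:36000/tvoice/tts/ws/v1?svc=adot&model=axtts-3-0
```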
All messages below are defined as JSON.
Setup (SOS) message (client → server):

| key | type | required | description |
|---|---|---|---|
| api_key | string | O | Issued by the VoiceGen AI team |
| poc_id | string | X | Default: None |
| request_id | string | X | Default: None |
| voice_settings | object | O | e.g. "voice_settings": {"voice": "emily"} |
| apply_tn | bool | X | Apply text normalization. Default: False |
| text | string | O | "text": "sos" |
- If the svc query parameter and the api_key do not match, the socket is closed.
- voice currently supports emily, sophie, and jemma.
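A complete SOS message, matching the client example at the end of this document:

```json
{
  "text": "sos",
  "voice_settings": {"voice": "emily"},
  "apply_tn": true,
  "api_key": "***",
  "poc_id": "adot.agent",
  "request_id": "123456789"
}
```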
Text message (client → server):

| key | type | required | description |
|---|---|---|---|
| text | string | O | "text": "{llm tokens}" |
EOS message (client → server):

| key | type | required | description |
|---|---|---|---|
| text | string | O | "text": "eos" |
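For example, a client might stream two tokens and then close the request; each line below is sent as a separate WebSocket frame (the token values are illustrative):

```json
{"text": "안녕, "}
{"text": "만나서 반가워."}
{"text": "eos"}
```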
Audio message (server → client):

| key | type | description |
|---|---|---|
| audio | string | base64 encoded, 24 kHz, PCM, 16-bit, mono |
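As a minimal sketch of consuming this payload (assuming the decoded chunks concatenate into one raw PCM stream in the format above; `save_chunks_as_wav` is a hypothetical helper), the audio can be written to a playable WAV file with Python's standard `wave` module:

```python
import base64
import wave

def save_chunks_as_wav(audio_b64_chunks, path="output.wav"):
    """Decode base64 audio chunks and write them to a WAV file.

    Assumes the format stated above: 24 kHz, PCM, 16-bit, mono.
    """
    pcm = b"".join(base64.b64decode(c) for c in audio_b64_chunks)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples = 2 bytes
        wav.setframerate(24000)   # 24 kHz sample rate
        wav.writeframes(pcm)
```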
Final message (server → client):

| key | type | description |
|---|---|---|
| is_final | bool | True on the last message of a request |
| input_sentence | string | The input sentence as received |
| input_tn_sentence | string | The input sentence after text normalization |
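An illustrative final message (the values shown are hypothetical):

```json
{
  "is_final": true,
  "input_sentence": "안녕, 만나서 반가워.",
  "input_tn_sentence": "안녕, 만나서 반가워."
}
```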
Example: streaming OpenAI chat completion output into the TTS socket (Python):

```python
import asyncio
import websockets
import json, base64

from openai import AsyncOpenAI

# Define API keys and voice ID
OPENAI_API_KEY = '***'

host = 'ws://127.0.0.1:36001'
tts_path = '/tvoice/tts/ws/v1'

# Construct the TTS query
service = 'adot'
model = 'axtts-3-0'
tts_query = f'?svc={service}&model={model}'

voice = 'emily'
apply_tn = True
tts_api_key = '***'

# Set OpenAI API key
aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)


async def stream(audio_stream):
    idx = 1
    print("Started streaming audio")
    async for chunk in audio_stream:
        if chunk:
            print('Recv Chunk {} : {}'.format(idx, len(chunk)))
            idx += 1


async def text_to_speech_input_streaming(voice, text_iterator):
    uri = host + tts_path + tts_query
    async with websockets.connect(uri) as websocket:
        # 1. Send Setup and SOS Message
        await websocket.send(json.dumps({
            "text": "sos",
            "voice_settings": {"voice": voice},
            "apply_tn": apply_tn,
            "api_key": tts_api_key,
            "poc_id": "adot.agent",
            "request_id": "123456789"
        }))

        async def listen():
            """Listen to the websocket for audio data and stream it."""
            while True:
                try:
                    message = await websocket.recv()
                    data = json.loads(message)
                    if data.get("audio"):
                        audio_chunk = base64.b64decode(data["audio"])
                        yield audio_chunk
                    elif data.get("is_final"):
                        print(data)
                        break
                    else:
                        print(data)
                except websockets.exceptions.ConnectionClosed as e:
                    print(f"Connection closed by server: {e.code}, {e.reason}")
                    break

        listen_task = asyncio.create_task(stream(listen()))

        # 2. Send Text Messages
        async for text in text_iterator:
            if text != "":
                await websocket.send(json.dumps({"text": text}))

        # 3. Send EOS Message
        await websocket.send(json.dumps({"text": "eos"}))

        await listen_task


async def chat_completion(query):
    """Retrieve text from OpenAI and pass it to the text-to-speech function."""
    response = await aclient.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': query}],
        temperature=1,
        stream=True)

    async def text_iterator():
        async for chunk in response:
            delta = chunk.choices[0].delta
            if delta.content is not None:
                yield delta.content

    await text_to_speech_input_streaming(voice, text_iterator())


# Main execution
if __name__ == "__main__":
    user_query = "안녕, 만나서 반가워."
    asyncio.run(chat_completion(user_query))
```
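Running the example requires the `websockets` and `openai` packages (e.g. `pip install websockets openai`), real values for `OPENAI_API_KEY` and `tts_api_key`, and a TTS server reachable at the configured host and port.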