# Speech.speak() Workflow Documentation
This document provides a comprehensive overview of the Speech.speak() method execution flow in expo-edge-speech, detailing the complete text-to-speech synthesis process from user call to audio playback.
## Overview
The Speech.speak() method orchestrates a sophisticated text-to-speech synthesis process using Microsoft Edge TTS, implemented with a batch processing architecture that ensures reliable audio delivery through complete synthesis before playback.
## Architecture Layers

expo-edge-speech follows a clean 3-layer architecture:

- **API Layer**: `Speech` - User-facing interface with input validation
- **Core Layer**: `Synthesizer`, `ConnectionManager`, `StateManager` - Business logic coordination
- **Services Layer**: `NetworkService`, `AudioService`, `StorageService`, `VoiceService` - Specialized functionality
## Key Design Principles

- **Batch Processing**: Complete synthesis before playback for reliability
- **Session Management**: Unique session tracking for each synthesis request
- **Error Recovery**: Circuit breaker patterns and retry mechanisms
- **Platform Optimization**: Native audio integration via expo-av
- **Resource Management**: Efficient memory and connection handling
## Synthesis Workflow Overview

```typescript
// User initiates synthesis
await Speech.speak('Hello, world!', {
  voice: 'en-US-AriaNeural',
  rate: 1.0,
  onStart: () => console.log('Playback started'),
  onDone: () => console.log('Synthesis complete'),
  onBoundary: (boundary) => console.log('Word:', boundary),
  onError: (error) => console.error('Error:', error)
});
```
### Process Flow

1. Input Validation & Service Initialization
2. Session Creation & SSML Generation
3. Edge TTS Communication & Audio Synthesis
4. Complete Audio Collection & Storage
5. Audio Validation & Playback
6. Resource Cleanup & Session Completion
## Detailed Workflow Sequence

### Phase 1: Initialization & Validation

```typescript
// 1. User calls Speech.speak()
Speech.speak(text, options)
  // ↓
// 2. Input validation and service initialization
Speech.initializeServices() // Lazy initialization on first call
  // ↓
// 3. Delegate to Synthesizer
Synthesizer.speak(text, options)
```
**Service Initialization Details:**

- **Lazy Loading**: Services initialize only on first `speak()` call
- **Configuration Locking**: Prevents configuration changes after initialization
- **Dependency Order**: StorageService → AudioService → NetworkService → VoiceService → StateManager → ConnectionManager → Synthesizer
- **Error Prevention**: Failed initialization throws detailed errors
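The lazy-initialization and configuration-locking pattern can be sketched as follows. This is a simplified illustration, not the library's actual implementation; the class and field names are hypothetical:

```typescript
// Hypothetical sketch of lazy service initialization with configuration locking.
type Config = { maxConnections?: number };

class SpeechFacade {
  private initialized = false;
  private configurationLocked = false;
  private config: Config = {};

  // Configuration is only allowed before the first speak() call.
  configure(config: Config): void {
    if (this.configurationLocked) {
      throw new Error('Configuration is locked after initialization');
    }
    this.config = { ...this.config, ...config };
  }

  private initializeServices(): void {
    if (this.initialized) return; // idempotent lazy init
    // ...initialize StorageService → AudioService → ... in dependency order...
    this.configurationLocked = true;
    this.initialized = true;
  }

  async speak(text: string): Promise<void> {
    this.initializeServices(); // first call pays the initialization cost
    // ...delegate to Synthesizer...
  }
}
```

Locking the configuration at first use keeps every service initialized against one consistent config, which is why later `configure()` calls must fail loudly rather than half-apply.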
### Phase 2: Session Management & SSML Generation

```typescript
// 4. Create synthesis session
StateManager.createSynthesisSession(text, options)
  // ↓
// Returns: { id: sessionId, connectionId: uniqueConnectionId }
// 5. Update session state
StateManager.updateSynthesisSession(sessionId, { state: 'Synthesizing' })
  // ↓
// 6. Voice resolution
VoiceService.resolveVoice(options.voice, options.language)
  // ↓
// 7. SSML generation
Synthesizer.generateSSML(text, resolvedVoice, options)
```
**Session Management Features:**

- **Unique Session IDs**: Each synthesis gets isolated session tracking
- **Connection ID Mapping**: Links sessions to specific Edge TTS connections
- **State Tracking**: Monitors synthesis progress through completion
- **Concurrent Support**: Multiple simultaneous synthesis sessions
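A minimal sketch of session creation with unique IDs and state tracking, under the assumption that each session maps to exactly one connection (illustrative names, not the library's actual `StateManager`):

```typescript
// Hypothetical sketch of session bookkeeping with unique IDs.
type SessionState = 'Created' | 'Synthesizing' | 'Completed' | 'Error';

interface SynthesisSession {
  id: string;
  connectionId: string;
  text: string;
  state: SessionState;
}

class StateManagerSketch {
  private sessions = new Map<string, SynthesisSession>();
  private counter = 0;

  createSynthesisSession(text: string): SynthesisSession {
    const id = `session-${++this.counter}`;
    const session: SynthesisSession = {
      id,
      connectionId: `conn-${this.counter}`, // one connection per session
      text,
      state: 'Created',
    };
    this.sessions.set(id, session);
    return session;
  }

  updateSynthesisSession(id: string, patch: Partial<SynthesisSession>): void {
    const session = this.sessions.get(id);
    if (!session) throw new Error(`Unknown session: ${id}`);
    Object.assign(session, patch);
  }

  getSession(id: string): SynthesisSession | undefined {
    return this.sessions.get(id);
  }
}
```

Keeping sessions in an isolated map is what allows multiple concurrent syntheses: each request only ever touches its own entry.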
### Phase 3: Network Communication & Synthesis

```typescript
// 8. Initiate synthesis
ConnectionManager.startSynthesis(ssml, connectionOptions)
  // ↓
// 9. Connection management
ConnectionManager.establishNetworkConnection(connectionId, ssml, options)
  // ↓
// 10. Edge TTS communication
NetworkService.synthesizeText(ssml, options, sessionId, connectionId)
```
**Network Communication Process:**

```typescript
// NetworkService handles the complete Edge TTS protocol
class NetworkService {
  async synthesizeText(ssml, options, sessionId, connectionId) {
    // 1. Establish WebSocket connection to Edge TTS
    const connection = await this.createEdgeTTSConnection();

    // 2. Send configuration and SSML
    await this.sendConfiguration(connection);
    await this.sendSSMLRequest(connection, ssml);

    // 3. Collect ALL audio chunks (batch processing)
    const audioChunks = [];
    const boundaries = [];
    for await (const message of connection) {
      if (message.type === 'audio') {
        audioChunks.push(message.data);
      } else if (message.type === 'boundary') {
        boundaries.push(this.parseBoundary(message));
      }
    }

    // 4. Return complete response only after all data collected
    return { audioChunks, boundaries };
  }
}
```
### Phase 4: Audio Processing & Storage

```typescript
// 11. Process complete synthesis response
// ConnectionManager receives SynthesisResponse { audioChunks[], boundaries[] }

// 12. Store all audio chunks
for (const chunk of response.audioChunks) {
  StorageService.addAudioChunk(connectionId, chunk);
}
  // ↓
// 13. Process word boundaries
for (const boundary of response.boundaries) {
  options.onBoundary?.(boundary); // Trigger user callback
}
  // ↓
// 14. Prepare for playback
ConnectionManager.streamAudioToService(connectionId)
```
**Storage Management:**

- **Connection-Scoped Buffers**: Each synthesis gets isolated storage
- **Memory Management**: Configurable buffer limits with cleanup
- **Data Integrity**: Complete audio validation before playback
- **Efficient Merging**: Optimized audio chunk concatenation
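Connection-scoped buffering and chunk merging can be sketched like this (illustrative names, not the library's actual `StorageService`):

```typescript
// Hypothetical sketch of connection-scoped audio buffering and merging.
class StorageServiceSketch {
  private buffers = new Map<string, Uint8Array[]>();

  createConnectionBuffer(connectionId: string): void {
    this.buffers.set(connectionId, []);
  }

  addAudioChunk(connectionId: string, chunk: Uint8Array): void {
    const chunks = this.buffers.get(connectionId);
    if (!chunks) throw new Error(`No buffer for ${connectionId}`);
    chunks.push(chunk);
  }

  // Merge all chunks into one contiguous buffer for playback.
  getMergedAudioData(connectionId: string): Uint8Array {
    const chunks = this.buffers.get(connectionId) ?? [];
    const total = chunks.reduce((sum, c) => sum + c.length, 0);
    const merged = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
      merged.set(chunk, offset);
      offset += chunk.length;
    }
    return merged;
  }

  cleanupConnection(connectionId: string): void {
    this.buffers.delete(connectionId);
  }
}
```

Pre-sizing the merged `Uint8Array` from the summed chunk lengths avoids repeated reallocations, which matters at the 10-30MB buffer sizes discussed later in this document.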
### Phase 5: Audio Playback

```typescript
// 15. Audio service handles playback
AudioService.speak(wrappedOptions, connectionId)
  // ↓
// 16. Get complete audio buffer
const audioBuffer = StorageService.getMergedAudioData(connectionId);
  // ↓
// 17. Validate and prepare audio
AudioService.validateEdgeTTSMP3(audioBuffer);
const tempFile = AudioService.createTempAudioFile(audioBuffer);
  // ↓
// 18. Load and play audio
const audioPlayer = await AudioService.loadAudio(tempFile);
await AudioService.playAudio(audioPlayer);
```
**Audio Playback Flow:**

- **Format Validation**: Ensures MP3 format integrity
- **Temporary File Creation**: Converts buffer to playable file
- **Expo AV Integration**: Uses native audio capabilities
- **Callback Coordination**: Triggers user callbacks at appropriate times
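A minimal MP3 sanity check in the spirit of the format validation step might look like this. This is a sketch, not the library's actual `validateEdgeTTSMP3()`: it only checks for an ID3v2 tag or an MPEG frame-sync at the start of the buffer:

```typescript
// Hypothetical sketch of a minimal MP3 sanity check.
function looksLikeMP3(buffer: Uint8Array): boolean {
  if (buffer.length < 3) return false;
  // Case 1: file starts with an ID3v2 tag ("ID3")
  if (buffer[0] === 0x49 && buffer[1] === 0x44 && buffer[2] === 0x33) {
    return true;
  }
  // Case 2: file starts with an MPEG frame sync (11 set bits: 0xFF 0xEx)
  return buffer[0] === 0xff && (buffer[1] & 0xe0) === 0xe0;
}
```

Running a cheap check like this before writing the temp file catches truncated or non-audio responses before they reach the platform audio player, where failures are harder to diagnose.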
### Phase 6: Completion & Cleanup

```typescript
// 19. Playback lifecycle events
// AudioService triggers:
//   - options.onStart()  // When playback begins
//   - options.onDone()   // When playback completes successfully
//   - options.onError()  // If playback fails

// 20. Update session state
StateManager.updateSynthesisSession(sessionId, { state: 'Completed' });

// 21. Resource cleanup
StorageService.cleanupConnection(connectionId);
ConnectionManager.releaseConnection(connectionId);
```
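Because cleanup can be reached from both the success and error paths, it helps to make it idempotent. The following sketch (hypothetical, not the library's API) registers per-connection cleanup tasks that run exactly once, in order:

```typescript
// Hypothetical sketch of idempotent, multi-step resource cleanup.
class CleanupRegistry {
  private tasks = new Map<string, Array<() => void>>();

  register(connectionId: string, task: () => void): void {
    const list = this.tasks.get(connectionId) ?? [];
    list.push(task);
    this.tasks.set(connectionId, list);
  }

  // Runs all cleanup tasks for a connection once; later calls are no-ops.
  cleanup(connectionId: string): void {
    const list = this.tasks.get(connectionId);
    if (!list) return;
    this.tasks.delete(connectionId); // remove first so re-entry is a no-op
    for (const task of list) task();
  }
}
```

Registering buffer release, temp-file removal, and connection release as separate tasks keeps the steps above independent: triggering cleanup twice (e.g. from an error handler and a finally block) cannot double-free a resource.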
## Technical Implementation Details

### Batch Processing Architecture

The implementation uses batch processing rather than streaming for optimal reliability:

**Benefits:**

- **Complete Validation**: Full audio synthesis verification before playback
- **Reliable Playback**: No interruptions or quality degradation
- **Error Prevention**: Issues detected before user impact
- **Predictable Performance**: Consistent timing and resource usage
- **Platform Compatibility**: Works reliably across iOS, Android, and Web

**Process Flow:**

- **Complete Synthesis**: All audio chunks collected before proceeding
- **Batch Storage**: Entire audio buffer stored and validated
- **Single Playback**: One consolidated audio file played back
- **Reliable Experience**: No partial audio or interruption risks
### Circuit Breaker Implementation

Protects against service failures and enables automatic recovery:

```typescript
// Circuit breaker states
enum CircuitBreakerState {
  Closed,   // Normal operation - all requests proceed
  Open,     // Service failing - reject requests immediately
  HalfOpen  // Testing recovery - allow limited test requests
}

class CircuitBreaker {
  private state = CircuitBreakerState.Closed;
  private failureCount = 0;
  private successCount = 0;
  private lastFailureTime = 0;

  constructor(
    private failureThreshold = 5,    // failures before opening the circuit
    private recoveryTimeout = 30000, // ms to wait before testing recovery
    private testRequestLimit = 3     // successes needed to close the circuit
  ) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === CircuitBreakerState.Open) {
      if (this.shouldTestRecovery()) {
        this.state = CircuitBreakerState.HalfOpen;
      } else {
        throw new Error('Circuit breaker is open - service unavailable');
      }
    }
    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private shouldTestRecovery(): boolean {
    return Date.now() - this.lastFailureTime >= this.recoveryTimeout;
  }

  private onSuccess(): void {
    this.failureCount = 0;
    if (this.state === CircuitBreakerState.HalfOpen &&
        ++this.successCount >= this.testRequestLimit) {
      this.state = CircuitBreakerState.Closed;
      this.successCount = 0;
    }
  }

  private onFailure(): void {
    this.lastFailureTime = Date.now();
    this.successCount = 0;
    if (++this.failureCount >= this.failureThreshold) {
      this.state = CircuitBreakerState.Open;
    }
  }
}
```
### Connection Management

Handles resource limits and connection pooling:

**Key Features:**

- **Connection Limits**: Configurable maximum concurrent connections
- **Pooling Strategy**: Optional connection reuse for improved performance
- **Resource Cleanup**: Automatic connection lifecycle management
- **App State Integration**: Proper handling during app backgrounding

**Configuration Options:**

```typescript
configure({
  connection: {
    maxConnections: 5,        // Maximum concurrent connections
    poolingEnabled: true,     // Enable connection reuse
    connectionTimeout: 10000, // Connection establishment timeout
    circuitBreaker: {
      failureThreshold: 5,    // Failures before opening circuit
      recoveryTimeout: 30000, // Time before testing recovery
      testRequestLimit: 3     // Successful tests to close circuit
    }
  }
});
```
### Error Handling & Recovery

Comprehensive error management throughout the synthesis pipeline:

**Error Categories:**

- **Network Errors**: Connection failures, timeouts (retryable)
- **Authentication Errors**: Service access issues (non-retryable)
- **Resource Errors**: Memory, storage issues (context-dependent)
- **Validation Errors**: Invalid SSML, format issues (non-retryable)
**Recovery Strategies:**

```typescript
// Error handling with automatic retry (helper implementations are illustrative)
function isRetryableError(error: unknown): boolean {
  // Network errors and timeouts are retryable; validation/auth errors are not
  const message = error instanceof Error ? error.message : String(error);
  return /network|timeout|connection/i.test(message);
}

function calculateBackoffDelay(attempt: number, baseMs = 250): number {
  return baseMs * 2 ** (attempt - 1); // exponential backoff: 250, 500, 1000...
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function synthesizeWithRetry(text: string, options: SpeechOptions) {
  const maxRetries = 3;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await Speech.speak(text, options);
    } catch (error) {
      if (isRetryableError(error) && attempt < maxRetries) {
        await sleep(calculateBackoffDelay(attempt));
        continue;
      }
      throw error;
    }
  }
}
```
### Memory Management

Efficient resource utilization with configurable limits:

**Storage Configuration:**

```typescript
configure({
  storage: {
    maxBufferSize: 16 * 1024 * 1024, // 16MB maximum buffer
    cleanupInterval: 30000,          // 30 second cleanup cycle
    warningThreshold: 0.8            // 80% usage warning
  }
});
```
**Memory Patterns:**

- **Connection-Scoped**: Each synthesis gets isolated memory
- **Automatic Cleanup**: Periodic removal of completed sessions
- **Warning System**: Proactive memory usage monitoring
- **Graceful Degradation**: Configurable behavior when limits approached
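The warning-threshold check is a simple ratio comparison. A sketch of how such monitoring might be computed (illustrative, not the library's internals):

```typescript
// Hypothetical sketch of threshold-based memory usage monitoring.
interface MemoryStatus {
  usedBytes: number;
  maxBytes: number;
  usageRatio: number;
  warning: boolean; // true once usage crosses the configured threshold
}

function checkMemoryUsage(
  usedBytes: number,
  maxBytes: number,
  warningThreshold: number
): MemoryStatus {
  const usageRatio = usedBytes / maxBytes;
  return {
    usedBytes,
    maxBytes,
    usageRatio,
    warning: usageRatio >= warningThreshold,
  };
}
```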
## Performance Characteristics

### Timing Considerations

**Synthesis Phase:**

- Initial network connection: ~100-500ms
- SSML processing: ~50-200ms per 100 characters
- Audio synthesis: ~200-800ms per 100 characters
- Total synthesis time: proportional to text length

**Playback Phase:**

- Audio validation: ~10-50ms
- File preparation: ~20-100ms
- Playback initiation: ~50-200ms
- Total playback latency: ~80-350ms
### Memory Usage Patterns

**Typical Usage:**

- Short text (1-50 words): 1-3MB audio buffer
- Medium text (50-200 words): 3-10MB audio buffer
- Long text (200+ words): 10-30MB audio buffer

**Mobile Optimization:**

```typescript
// Mobile-optimized configuration
configure({
  connection: { maxConnections: 3 },           // Conservative mobile limits
  storage: { maxBufferSize: 8 * 1024 * 1024 }, // 8MB mobile limit
  network: { connectionTimeout: 8000 }         // Longer timeout for mobile
});
```
## Platform-Specific Considerations

**iOS:**

- Silent mode handling via `playsInSilentModeIOS`
- Background limitations (Expo Go restrictions)
- Audio session management via expo-av

**Android:**

- Audio ducking support via `shouldDuckAndroid`
- Background execution capabilities
- Earpiece routing options

**Web:**

- Browser connection limits
- Audio format compatibility
- Manual audio session control
## Troubleshooting Common Issues

### Synthesis Failures

**Network Issues:**

```typescript
// Check network connectivity
if (!navigator.onLine) {
  console.log('Device is offline - synthesis will fail');
}

// Configure longer timeouts for slow networks
configure({
  network: { connectionTimeout: 15000 }
});
```
**Voice Resolution Issues:**

```typescript
// Verify voice availability
const voices = await Speech.getAvailableVoicesAsync();
const isVoiceAvailable = voices.some(v => v.identifier === 'en-US-AriaNeural');
if (!isVoiceAvailable) {
  console.log('Requested voice not available - using default');
}
```
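A common pattern for handling an unavailable voice is to fall back by language before falling back to any default. This sketch is illustrative (the `Voice` shape and function are hypothetical, not the library's API):

```typescript
// Hypothetical sketch of voice fallback: exact identifier → same language → first voice.
interface Voice {
  identifier: string;
  language: string; // e.g. 'en-US'
}

function resolveVoiceWithFallback(
  voices: Voice[],
  requestedId: string,
  language: string
): Voice | undefined {
  return (
    voices.find((v) => v.identifier === requestedId) ??
    voices.find((v) => v.language === language) ??
    voices[0]
  );
}
```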
### Memory Issues

**Buffer Overflow:**

```typescript
// Monitor memory usage
configure({
  storage: {
    warningThreshold: 0.6,          // Early warning at 60%
    maxBufferSize: 8 * 1024 * 1024, // Reduce limit for constrained devices
    cleanupInterval: 15000          // More frequent cleanup
  }
});
```
### Audio Playback Issues

**Platform Audio Configuration:**

```typescript
// Ensure proper audio session setup
configure({
  audio: {
    autoInitializeAudioSession: true,
    platformConfig: {
      ios: {
        playsInSilentModeIOS: true, // Essential for iOS
        interruptionModeIOS: InterruptionModeIOS.DoNotMix
      },
      android: {
        shouldDuckAndroid: true, // Better user experience
        interruptionModeAndroid: InterruptionModeAndroid.DoNotMix
      }
    }
  }
});
```
## Integration Examples

### React Native Hook

```typescript
import { useState, useCallback } from 'react';
import { Speech, SpeechOptions } from 'expo-edge-speech';

export function useSpeech() {
  const [isSpeaking, setIsSpeaking] = useState(false);
  const [error, setError] = useState<Error | null>(null);

  const speak = useCallback(async (text: string, options: SpeechOptions = {}) => {
    try {
      setError(null);
      setIsSpeaking(true);
      await Speech.speak(text, {
        ...options,
        onStart: () => {
          setIsSpeaking(true);
          options.onStart?.();
        },
        onDone: () => {
          setIsSpeaking(false);
          options.onDone?.();
        },
        onError: (err) => {
          setError(err);
          setIsSpeaking(false);
          options.onError?.(err);
        }
      });
    } catch (err) {
      setError(err as Error);
      setIsSpeaking(false);
    }
  }, []);

  const stop = useCallback(async () => {
    await Speech.stop();
    setIsSpeaking(false);
  }, []);

  return { speak, stop, isSpeaking, error };
}
```
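Since batch synthesis completes fully before playback, calling `speak` repeatedly in quick succession can produce overlapping playback. One way to avoid that is to serialize utterances through a queue. This is a hypothetical sketch (`speakFn` stands in for `Speech.speak`; the class is not part of the library API):

```typescript
// Hypothetical sketch of serializing utterances so syntheses never overlap:
// each enqueued utterance waits for the previous one to finish.
class SpeakQueue {
  private tail: Promise<void> = Promise.resolve();

  constructor(private speakFn: (text: string) => Promise<void>) {}

  // Enqueue an utterance; returns a promise that resolves when it has played.
  enqueue(text: string): Promise<void> {
    const next = this.tail.then(() => this.speakFn(text));
    // Keep the chain alive even if one utterance fails.
    this.tail = next.catch(() => undefined);
    return next;
  }
}
```

Chaining on a single `tail` promise gives strict FIFO ordering without any explicit locking, and the `catch` on the stored tail ensures one failed utterance does not block the rest of the queue.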
### Advanced Configuration

```typescript
// Production configuration with monitoring
configure({
  network: {
    enableDebugLogging: false, // Disable in production
    maxRetries: 3,
    connectionTimeout: 8000
  },
  connection: {
    maxConnections: 5,
    poolingEnabled: true,
    circuitBreaker: {
      failureThreshold: 5,
      recoveryTimeout: 30000,
      testRequestLimit: 3
    }
  },
  storage: {
    maxBufferSize: 16 * 1024 * 1024,
    cleanupInterval: 30000,
    warningThreshold: 0.8
  },
  voice: {
    cacheTTL: 3600000, // 1 hour cache
    enableCaching: true
  }
});

// Usage with comprehensive error handling
const speakWithHandling = async (text: string) => {
  try {
    await Speech.speak(text, {
      voice: 'en-US-AriaNeural',
      rate: 1.0,
      onStart: () => console.log('Started speaking'),
      onDone: () => console.log('Completed successfully'),
      onError: (error) => console.error('Speech error:', error),
      onBoundary: (boundary) => {
        // Real-time word highlighting
        highlightWord(boundary.charIndex, boundary.charLength);
      }
    });
  } catch (error) {
    console.error('Failed to initiate speech:', error);
    showErrorMessage('Speech synthesis failed. Please try again.');
  }
};
```
For additional information, see:

- **API Reference** - Complete function documentation
- **Configuration Guide** - Setup and optimization
- **Usage Examples** - Practical implementation patterns
- **Platform Considerations** - Platform-specific requirements
## Sequence Diagram (Batch Processing Implementation)

```mermaid
sequenceDiagram
    actor User
    participant Speech as Speech (API Layer)
    participant SYN as Synthesizer (Core Layer)
    participant SM as StateManager (Core Layer)
    participant VS as VoiceService (Services Layer)
    participant CM as ConnectionManager (Core Layer)
    participant NS as NetworkService (Services Layer)
    participant SS as StorageService (Services Layer)
    participant AS as AudioService (Services Layer)

    User->>Speech: speak(text, options)
    Speech->>Speech: initializeServices() (if needed)
    Speech->>SYN: speak(text, options)
    activate SYN
    SYN->>SM: createSynthesisSession(text, options)
    activate SM
    SM-->>SYN: Promise<SynthesisSession> (authoritativeSession with id & connectionId)
    deactivate SM
    note right of SYN: authoritativeSession.id is the sessionId
    note right of SYN: authoritativeSession.connectionId is the connectionId
    SYN->>SM: updateSynthesisSession(authoritativeSession.id, {state: 'Synthesizing'})
    SYN->>SYN: processSession(authoritativeSession)
    activate SYN
    SYN->>VS: resolveVoice(session.options.voice, session.options.language)
    activate VS
    VS-->>SYN: Promise<EdgeSpeechVoice> (resolvedVoice)
    deactivate VS
    note right of SYN: Synthesizer generates SSML using generateSSML() from ssmlUtils
    SYN->>SYN: generateSSML(session.text, resolvedVoice, options)
    note right of SYN: Synthesizer calls ConnectionManager.startSynthesis with SSML and session options
    SYN->>CM: startSynthesis(ssml, connectionOptions)
    activate CM
    CM->>CM: createAndManageConnection(ssml, options)
    activate CM
    CM->>SS: createConnectionBuffer(connectionId)
    activate SS
    SS-->>CM: (void)
    deactivate SS
    CM->>CM: establishNetworkConnection(connectionId, ssml, options)
    activate CM
    CM->>NS: synthesizeText(ssml, options, clientSessionId, connectionId)
    activate NS
    note over NS: NetworkService connects to WebSocket and sends SSML request
    note over NS: Collects ALL audio chunks before resolving promise
    loop For each audio chunk received from Edge TTS WebSocket
        NS->>NS: handleBinaryMessage(audioChunk)
        note over NS: Stores chunk internally, continues collecting
    end
    note over NS: All chunks received, boundaries parsed, WebSocket closed
    NS-->>CM: Promise<SynthesisResponse> (complete response with all audioChunks and boundaries)
    deactivate NS
    loop For each audioChunk in response.audioChunks
        CM->>SS: addAudioChunk(connectionId, audioChunk)
        activate SS
        SS-->>CM: (void)
        deactivate SS
    end
    loop For each boundary in response.boundaries
        CM->>CM: handleBoundaryEvent(connectionId, boundary)
        activate CM
        CM->>User: options.onBoundary(boundary) (via options callback)
        deactivate CM
    end
    CM->>CM: streamAudioToService(connectionId)
    activate CM
    CM->>AS: speak(wrappedOptions, connectionId)
    activate AS
    note over AS: AudioService gets complete merged audio data from StorageService
    AS->>SS: getMergedAudioData(connectionId)
    activate SS
    SS-->>AS: mergedAudioBuffer
    deactivate SS
    AS->>AS: validateEdgeTTSMP3(mergedAudioBuffer)
    AS->>AS: createTempAudioFile(mergedAudioBuffer)
    AS->>AS: loadAudio(tempFileUri)
    AS->>AS: playAudio()
    note over AS: User callbacks (onStart, onDone, onError) are triggered by AudioService
    AS->>User: options.onStart() (on playback start, if callback defined)
    note over AS: Audio plays from complete buffer
    alt Playback successful
        AS->>User: options.onDone() (on playback finish)
        AS->>SM: handleAudioStateChange(AudioPlaybackState.Completed, connectionId)
    else Playback error
        AS->>User: options.onError(playbackError)
        AS->>SM: handleAudioStateChange(AudioPlaybackState.Error, connectionId)
    end
    AS-->>CM: (playback promise)
    deactivate AS
    deactivate CM
    CM->>User: options.onSynthesisCompleted() (via options callback, if defined)
    CM->>SM: updateSynthesisSession(options.clientSessionId, {state: 'Completed'})
    alt If synthesis or connection error occurs
        CM->>User: options.onError(error) (via options callback)
        CM->>SM: updateSynthesisSession(options.clientSessionId, {state: 'Error'})
    end
    deactivate CM
    deactivate CM
    CM-->>SYN: Promise<{sessionId, connectionId}> (returnedConnectionInfo)
    deactivate CM
    deactivate SYN
    SYN-->>Speech: Promise<void>
    deactivate SYN
    Speech-->>User: Promise<void>
```
## Detailed Steps (Batch Processing Implementation)

1. **User App** calls `Speech.speak(text, options)`.

2. `Speech.initializeServices()`: If not already done, this initializes all core services (`Synthesizer`, `StateManager`, `ConnectionManager`, `NetworkService`, `AudioService`, `StorageService`, `VoiceService`).

   **Service Initialization Implementation Details:**

   - **Configuration Locking**: Sets `SpeechAPI.configurationLocked = true` to prevent configuration changes after initialization
   - **Dependency-Ordered Initialization**: Services are initialized in a specific dependency order:
     1. `StorageService` (foundation service, singleton pattern)
     2. `AudioService` (depends on StorageService)
     3. `NetworkService` (depends on StorageService)
     4. `VoiceService` (singleton pattern with configuration)
     5. `StateManager` (depends on all previous services)
     6. `ConnectionManager` (coordinates all services, initialized last)
     7. `Synthesizer` (top-level coordinator, depends on all core services)
   - **Global Configuration Application**: Applies `SpeechAPI.globalConfig` to each service during initialization
   - **Error Handling**: Initialization failures throw detailed errors preventing incomplete service states
   - **Lazy Initialization Pattern**: Services initialize only on first `speak()` call, not at import time
3. `Speech` calls `Synthesizer.speak(text, options)`.

4. `Synthesizer` calls `StateManager.createSynthesisSession(text, options)`:
   - `StateManager` generates a unique `sessionId` and a unique `connectionId`.
   - It stores this new `SynthesisSession` (with the generated IDs) in its internal map.
   - It returns the `Promise<SynthesisSession>` (referred to as `authoritativeSession`) containing these IDs.

5. `Synthesizer` updates the session in `StateManager`: `await stateManager.updateSynthesisSession(authoritativeSession.id, { state: ApplicationState.Synthesizing })`.

6. `Synthesizer` calls its internal `processSession(authoritativeSession)`:
   - It passes the `authoritativeSession` (which contains the correct `id` and `connectionId` from `StateManager`).
   - `Synthesizer.resolveVoice(session.options.voice, session.options.language)`: Resolves the voice options (user-specified voice ID and language) to a specific `EdgeSpeechVoice` object using `VoiceService`.
   - `Synthesizer` generates SSML: Uses `generateSSML(session.text, resolvedVoice, session.options)` from `ssmlUtils` to create the SSML content with the resolved voice and options.
   - It calls `ConnectionManager.startSynthesis(ssml, newOptionsObject)`. The `newOptionsObject` includes the session IDs and all user options.
7. `ConnectionManager.startSynthesis()` processes the batch workflow:
   - Calls `createAndManageConnection(ssml, options)`, which creates the connection buffer and sets up coordination.
   - Calls `establishNetworkConnection()`, which implements the batch processing approach.

   **Connection Management Implementation Details:**

   - **Circuit Breaker Pattern**: Implements a three-state circuit breaker (`Closed`, `Open`, `HalfOpen`) for service reliability
     - **Closed State**: Normal operation, requests processed normally
     - **Open State**: Service temporarily unavailable due to repeated failures, requests immediately rejected
     - **HalfOpen State**: Testing recovery by allowing limited test requests
   - **Connection Pooling and Limits**: Enforces the `maxConnections` limit with optional connection queuing
     - **Connection Limits**: Prevents resource exhaustion by limiting concurrent connections
     - **Queue Management**: When pooling is enabled, queues requests when the connection limit is reached
     - **Connection Lifecycle**: Manages active connections with proper cleanup and resource management
   - **Failure Detection and Recovery**: Tracks failure patterns and implements recovery mechanisms
     - **Failure Count Tracking**: Monitors consecutive failures for circuit breaker decision making
     - **Recovery Timeout**: Configurable recovery period before attempting service restoration
     - **Success Count Monitoring**: Tracks successful operations for circuit breaker state transitions
   - **App State Integration**: Coordinates connection lifecycle with React Native app state changes
     - **Background Handling**: Manages connections during app backgrounding
     - **Memory Management**: Prevents connection leaks through proper app state subscription cleanup
8. `ConnectionManager.establishNetworkConnection()` implements batch processing:
   - Calls `NetworkService.synthesizeText(ssml, options, clientSessionId, connectionId)`.
   - Waits for the complete response with all audio chunks and boundaries before proceeding.
   - Processes the complete response by storing all chunks and handling boundaries.

9. `NetworkService.synthesizeText()` performs the batch synthesis:
   - Establishes the WebSocket connection and sends the SSML request.
   - Collects ALL audio chunks internally before resolving the promise.
   - Parses boundary events from WebSocket text messages.
   - Returns the complete `SynthesisResponse` only after all chunks are received and the WebSocket closes.
   - All audio data is included in the `response.audioChunks` array.
10. **Batch audio processing after complete synthesis:**
    - `ConnectionManager` receives the complete response with all audioChunks and boundaries.
    - Stores all chunks: loops through `response.audioChunks` and calls `StorageService.addAudioChunk()` for each.
    - Processes all boundaries: loops through `response.boundaries` and triggers `onBoundary` callbacks.
    - Initiates playback: calls `ConnectionManager.streamAudioToService(connectionId)`.
11. `ConnectionManager.streamAudioToService()`:
    - Wraps user callbacks with connection cleanup logic.
    - Calls `AudioService.speak(wrappedOptions, connectionId)` (not `playStreamedAudio`).
    - `AudioService` handles complete buffer playback.
12. `AudioService.speak()` handles final playback:
    - Gets the complete merged audio buffer from `StorageService.getMergedAudioData(connectionId)`.
    - Validates MP3 format using `validateEdgeTTSMP3()`.
    - Creates a temporary audio file from the complete buffer.
    - Loads and plays the audio using Expo AV (`loadAudio()` then `playAudio()`).
    - Triggers user callbacks (`onStart`, `onDone`, `onError`) during the playback lifecycle.
13. **Result**: Audio playback begins only after complete synthesis, with all chunks collected and stored.
    - The synthesis process is completely finished before any audio playback starts.
    - The user experience includes a wait time proportional to the text length.
    - Memory usage includes storing the complete audio buffer before playback.
### Key Characteristics (Batch Processing Implementation)

1. **Complete Audio Synthesis Before Playback:**
   - All audio chunks are collected and stored before any playback begins.
   - The synthesis process completes entirely before `AudioService.speak()` is called.
   - Provides reliable, consistent audio delivery without interruption.

2. **Sequential Processing Architecture:**
   - `NetworkService` collects all chunks internally before returning the complete response.
   - `ConnectionManager` receives the complete response and stores all chunks sequentially.
   - `AudioService` processes the complete merged buffer for playback.
   - No concurrent synthesis and playback operations.

3. **Robust Buffer Management:**
   - The complete audio buffer is stored in `StorageService` before playback.
   - A single merged buffer operation ensures data integrity.
   - Predictable memory usage patterns with complete buffer allocation.

4. **Maintained API Compatibility:**
   - All existing user callbacks (`onStart`, `onDone`, `onError`, `onBoundary`) work unchanged.
   - Session management and state tracking remain consistent.
   - SSML generation and voice resolution are unchanged.

5. **Service Coordination for Batch Processing:**
   - `ConnectionManager` coordinates the sequential synthesis-then-playback workflow.
   - `NetworkService` provides a complete response with all audio data included.
   - `AudioService` handles a single playback operation with the complete buffer.
   - `StorageService` manages complete buffer storage and retrieval.

6. **Reliable Error Handling:**
   - Complete synthesis validation before playback attempts.
   - MP3 format validation on the complete buffer.
   - Consistent error propagation through service layers.
   - Cleanup operations on complete connection data.
   **Error Recovery and Retry Implementation Patterns:**

   - **Error Classification for Retry Decision Making**: Categorizes errors into retryable and non-retryable types
     - **Network Errors**: Connection timeouts, WebSocket failures - typically retryable
     - **Authentication Errors**: Service access issues - typically non-retryable
     - **Resource Errors**: Memory, storage issues - context-dependent retry logic
     - **Validation Errors**: Invalid SSML, audio format issues - non-retryable
   - **Circuit Breaker Integration with Error Handling**: Coordinates error responses with circuit breaker state
     - **Failure Threshold Tracking**: Accumulates failures to trigger circuit breaker state changes
     - **Recovery Testing**: Uses error patterns to determine when to test service recovery
     - **Automatic State Transitions**: Error handling drives the circuit breaker state machine
   - **Resource Cleanup Coordination on Failures**: Ensures proper cleanup regardless of failure point
     - **Connection Resource Cleanup**: Terminates WebSocket connections and clears connection buffers
     - **Audio Resource Cleanup**: Stops audio playback and cleans up temporary audio files
     - **Session State Cleanup**: Updates StateManager to reflect error conditions and removes failed sessions
   - **Retry Logic and Recovery Timeout Mechanisms**: Implements sophisticated retry patterns
     - **Exponential Backoff**: Increases retry delays to avoid overwhelming failing services
     - **Maximum Retry Limits**: Prevents infinite retry loops with configurable retry count limits
     - **Recovery Timeout Coordination**: Aligns retry timing with circuit breaker recovery periods
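   The exponential-backoff pattern above can be expressed as a small pure function. This is a deterministic sketch (base and cap values are illustrative; real code might add jitter):

   ```typescript
   // Hypothetical sketch of capped exponential backoff for retry delays.
   function backoffDelay(attempt: number, baseMs = 250, maxMs = 30000): number {
     // attempt 1 → baseMs, attempt 2 → 2×baseMs, attempt 3 → 4×baseMs, ...
     const delay = baseMs * 2 ** (attempt - 1);
     return Math.min(delay, maxMs); // never exceed the recovery-timeout ceiling
   }
   ```

   Capping the delay at the circuit breaker's recovery timeout keeps retry timing and recovery testing aligned, as described above.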
7. **Performance Characteristics:**
   - **Complete Synthesis Duration**: Time-to-audio includes the full text processing time.
   - **Predictable Memory Usage**: The complete audio buffer is stored before playback.
   - **Reliable User Experience**: No audio interruption or quality degradation.
   - **Sequential Processing**: Synthesis → Storage → Validation → Playback.

8. **Current Implementation Benefits:**
   - **Reliability**: Complete synthesis validation before playback.
   - **Consistency**: Predictable timing and memory usage patterns.
   - **Simplicity**: A clear sequential workflow without concurrent complexity.
   - **Quality**: No audio artifacts from partial buffer playback.
This batch processing implementation ensures reliable audio delivery through complete synthesis and validation before playback begins, providing consistent user experience with predictable performance characteristics.