Design Document: AAC Speech Recognition API

Description:
This document provides the complete design for the AAC Board Speech Recognition API, including module/class purposes, data fields, methods, routes, pre/post conditions, parameters, exceptions, and helper utilities.

The API is built with Express.js, supports audio upload and speech recognition via Python backends, and is optimized for Augmentative and Alternative Communication (AAC) devices.


Overview

This software implements a robust speech-to-text REST API with:

  • Multiple speech-recognition backends (e.g., Google Speech API, offline Vosk)
  • Command mode for AAC command recognition
  • Word-level timing metadata
  • camelCase JSON responses
  • Health and capability endpoints
  • Consent-based logging of audio metadata
  • Fallback and tolerant JSON parsing of Python output

Purpose

  • Provide a reliable speech recognition endpoint for AAC devices.
  • Expose system health information for device connectivity.
  • Support multiple audio formats and command modes.
  • Serve as a modern, extensible foundation for AAC-focused speech recognition systems.

Modules and Fields

| Field | Type | Purpose |
| --- | --- | --- |
| express | Module | Main HTTP framework for routing and middleware. |
| multer | Module | Parses audio uploads via multipart/form-data. |
| cors | Module | Enables cross-origin requests (important for AAC devices). |
| fs | Module | File system access for logs and script validation. |
| spawn | Module | Executes Python speech recognition backend. |
| path | Module | File path resolution. |
| app | Object | Express instance for routes and middleware. |
| PORT | Number | API port (8080 default, environment override supported). |
| upload | Multer Instance | Memory-based upload handler (10MB limit). |
| LOG_DIR | String | Directory used for daily JSON log files. |
| SPEECH_SCRIPT | String | Path to Python speech recognition script. |
| SUPPORTED_FORMATS | Array<String> | Allowed audio formats. |
| SERVER_START_TIME | Number | Millisecond timestamp for uptime calculations. |

Middleware

JSON Parser

app.use(express.json());

Purpose: Automatically parses incoming JSON request bodies.

Pre-condition: A request sent with Content-Type: application/json must carry a valid JSON body.

Post-condition: Parsed JSON becomes available via req.body.


CORS

app.use(cors());

Purpose: Enables cross-origin access from AAC devices (web apps, tablets, mobile apps).


Request Timing Middleware

app.use((req, res, next) => {
  req.startTime = Date.now();
  next();
});

Purpose: Tracks request processing time for diagnostics and response metadata.


File Upload (Multer)

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 } // 10MB
});

  • Files stored in memory, not on disk.
  • Maximum size: 10MB.
  • Used only for the /upload endpoint.

Exceptions:

  • LIMIT_FILE_SIZE handled by global error handler → returns 413.
  • Missing file handled manually → returns 400.
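The two failure paths above could be mapped to responses roughly as follows. This is a minimal sketch rather than the project's actual handler: the function name `mapUploadError` is hypothetical, and the response bodies are taken from the error examples in this document. `LIMIT_FILE_SIZE` is the `code` value Multer sets on the error it raises when `limits.fileSize` is exceeded.

```javascript
// Hypothetical helper (not from the source): translate an upload failure
// into an HTTP status and a JSON body.
//   err  - error handed to the error middleware, or null
//   file - req.file as populated by Multer, undefined when nothing was sent
function mapUploadError(err, file) {
  if (err && err.code === "LIMIT_FILE_SIZE") {
    return {
      status: 413,
      body: {
        success: false,
        error: { code: "FILE_TOO_LARGE", message: "Audio file exceeds maximum size (10MB)" }
      }
    };
  }
  if (err) {
    // Any other error is treated as an internal failure in this sketch.
    return {
      status: 500,
      body: {
        success: false,
        error: { code: "INTERNAL_ERROR", message: "An unexpected error occurred" }
      }
    };
  }
  if (!file) {
    return {
      status: 400,
      body: {
        success: false,
        error: { code: "NO_FILE", message: "No audio file uploaded" }
      }
    };
  }
  return null; // nothing to report
}
```

A route or error handler could then call `res.status(result.status).json(result.body)` whenever the helper returns a non-null result.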

Helper Functions

parseUserAgent(ua)

Extracts browser and device information from the User-Agent string for diagnostics.
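One possible shape for this helper is sketched below. The regular expressions, the returned field names (`browser`, `device`, `userAgent`, chosen to mirror the `request` object in the /upload response example), and the "Unknown" fallback are all assumptions; the source does not show the real implementation.

```javascript
// Rough sketch only: real User-Agent strings vary widely, and this regex
// approach is an assumption, not the project's actual implementation.
function parseUserAgent(ua) {
  const text = typeof ua === "string" ? ua : "";
  if (text === "") {
    return { browser: "Unknown", device: "Unknown", userAgent: "" };
  }

  // Order matters: Edge and Opera also advertise "Chrome", and Chrome
  // also advertises "Safari", so the more specific tokens are tested first.
  let browser = "Unknown";
  if (/Edg\//.test(text)) browser = "Edge";
  else if (/OPR\//.test(text)) browser = "Opera";
  else if (/Firefox\//.test(text)) browser = "Firefox";
  else if (/Chrome\//.test(text)) browser = "Chrome";
  else if (/Safari\//.test(text)) browser = "Safari";

  // Tablets are checked before phones because iPad UAs can contain "Mobile".
  let device = "Desktop";
  if (/iPad|Tablet/.test(text)) device = "Tablet";
  else if (/Mobi|Android|iPhone/.test(text)) device = "Mobile";

  return { browser, device, userAgent: text };
}
```

In a route this would typically be called as `parseUserAgent(req.headers["user-agent"])`.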

detectAudioFormat(filename)

Returns the audio format inferred from the file extension, defaulting to "WAV".
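A minimal sketch of extension-based detection follows. The format list is copied from the /health response example; falling back to "WAV" for an unrecognized extension (rather than only for a missing one) is an assumption.

```javascript
// Format list as shown in the /health example of this document.
const SUPPORTED_FORMATS = ["WAV", "MP3", "FLAC", "AIFF", "OGG", "M4A", "RAW", "PCM"];

// Sketch: take the text after the last dot, upper-case it, and fall back
// to "WAV" when the extension is missing or not in the supported list
// (the second half of that rule is an assumption).
function detectAudioFormat(filename) {
  const match = /\.([A-Za-z0-9]+)$/.exec(filename || "");
  const ext = match ? match[1].toUpperCase() : "";
  return SUPPORTED_FORMATS.includes(ext) ? ext : "WAV";
}
```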

logRequest(data, consentGiven)

Writes request metadata to a daily JSON log file, but only when the client has given logging consent.
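Consent-gated daily logging might look like the sketch below. Several details are assumptions that the source does not specify: the default `LOG_DIR` value, the `requests-YYYY-MM-DD.json` file name, storing entries as a JSON array, and the optional third `logDir` parameter, which is added here only so the example can be pointed at a scratch directory.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed default; the real LOG_DIR value is not given in this document.
const LOG_DIR = path.join(process.cwd(), "logs");

// Sketch: append one entry to today's log file, or do nothing without consent.
// Returns the log file path, or null when nothing was written.
function logRequest(data, consentGiven, logDir = LOG_DIR) {
  if (!consentGiven) return null; // no consent: write nothing at all

  fs.mkdirSync(logDir, { recursive: true });
  const day = new Date().toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const file = path.join(logDir, `requests-${day}.json`); // hypothetical name

  // Read the existing array, tolerating a missing or corrupt file.
  let entries = [];
  if (fs.existsSync(file)) {
    try {
      entries = JSON.parse(fs.readFileSync(file, "utf8"));
    } catch {
      entries = [];
    }
    if (!Array.isArray(entries)) entries = [];
  }

  entries.push({ timestamp: new Date().toISOString(), ...data });
  fs.writeFileSync(file, JSON.stringify(entries, null, 2));
  return file;
}
```

Rewriting the whole file on each request keeps it valid JSON at the cost of extra I/O; an append-only, one-object-per-line layout would be a reasonable alternative for busy servers.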

buildSuccessResponse(params)

Standard camelCase success response wrapper.

buildErrorResponse(params)

Standard camelCase error response wrapper.
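The two wrappers can be sketched as below. The output field names follow the response examples in this document; the shape of `params` and the default values are assumptions.

```javascript
// Sketch: wrap recognition results in the success envelope used by /upload.
// The `params` property names are assumed to match the output field names.
function buildSuccessResponse(params) {
  return {
    success: true,
    transcription: params.transcription ?? "",
    confidence: params.confidence ?? null,
    service: params.service ?? null,
    processingTimeMs: params.processingTimeMs ?? null,
    audio: params.audio ?? {},
    request: params.request ?? {},
    aac: params.aac ?? {},
    wordTiming: params.wordTiming ?? []
  };
}

// Sketch: wrap an error code and message in the error envelope.
// Falling back to INTERNAL_ERROR is an assumption.
function buildErrorResponse(params) {
  return {
    success: false,
    error: {
      code: params.code ?? "INTERNAL_ERROR",
      message: params.message ?? "An unexpected error occurred"
    }
  };
}
```

For example, `buildErrorResponse({ code: "NO_FILE", message: "No audio file uploaded" })` produces the 400 body shown in the error examples.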

parsePythonOutput(stdout)

Parses the JSON output of the Python script. Accepts both the current camelCase keys and the legacy PascalCase keys.
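A tolerant parser along these lines is sketched below. It assumes the legacy format differs only in the case of the first letter of each key, and that stray non-JSON text (for example warnings) may surround the JSON object; both are assumptions, as are the helper names `toCamelKey` and `camelizeKeys`.

```javascript
// Lower-case the first character: "WordTiming" -> "wordTiming".
// Note: an all-caps key such as "ID" would become "iD" in this sketch.
function toCamelKey(key) {
  return key.charAt(0).toLowerCase() + key.slice(1);
}

// Recursively convert object keys; arrays and primitives pass through.
function camelizeKeys(value) {
  if (Array.isArray(value)) return value.map(camelizeKeys);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [toCamelKey(k), camelizeKeys(v)])
    );
  }
  return value;
}

// Sketch: try the whole output first, then the span between the first "{"
// and the last "}". Returns null when no JSON object can be recovered.
function parsePythonOutput(stdout) {
  const text = String(stdout ?? "").trim();
  const candidates = [text];
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(text.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      const parsed = JSON.parse(candidate);
      if (parsed !== null && typeof parsed === "object") return camelizeKeys(parsed);
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return null;
}
```

Returning `null` instead of throwing lets the caller decide how to report an unparseable result.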


Server Initialization

app.listen(PORT, () => {
  // Styled console startup banner
});

Purpose: Starts the Express.js server and prints endpoint help.

Exceptions:

  • EADDRINUSE if another process occupies the port.

Routes and Methods


GET /health

Purpose

Reports system health, supported formats, uptime, and the status of the Python backend. Designed for AAC device connectivity checks.

Success Response Example

{
  "status": "ok",
  "timestamp": "2025-01-01T00:00:00Z",
  "uptime": 1234,
  "uptimeFormatted": "0h 20m 34s",
  "version": "2.0.0",
  "services": {
    "speechRecognition": true,
    "logging": true
  },
  "supportedFormats": ["WAV","MP3","FLAC","AIFF","OGG","M4A","RAW","PCM"],
  "endpoints": {
    "health": "/health",
    "upload": "/upload",
    "formats": "/formats"
  }
}

Exceptions: None.
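The `uptime` and `uptimeFormatted` fields in the example are consistent with an uptime measured in whole seconds (1234 s is 20 min 34 s). A minimal sketch of that conversion follows; the function names `uptimeSeconds` and `formatUptime` are hypothetical, and the seconds unit is inferred from the example rather than stated by the source.

```javascript
// Whole seconds elapsed since a millisecond start timestamp
// (such as SERVER_START_TIME).
function uptimeSeconds(startMs, nowMs = Date.now()) {
  return Math.floor((nowMs - startMs) / 1000);
}

// Render seconds in the "Xh Ym Zs" style used by the /health example.
function formatUptime(totalSeconds) {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  return `${hours}h ${minutes}m ${seconds}s`;
}
```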


GET /formats

Purpose

Returns audio formats supported by the API and recommended settings for lowest latency.

Success Response

{
  "supportedFormats": [...],
  "optimal": {
    "format": "WAV",
    "sampleRate": 16000,
    "bitDepth": 16,
    "channels": 1
  },
  "notes": [
    "WAV format recommended for lowest latency",
    "16kHz sample rate optimal",
    "Mono audio preferred",
    "Raw PCM supported with x-sample-rate header"
  ]
}

Exceptions: None.


POST /upload

Purpose

Uploads audio for AAC-optimized speech recognition and returns:

  • Transcription
  • Confidence values
  • Processing metadata
  • AAC command detection
  • Word timing array
  • User & device metadata
  • Audio metadata

Headers

| Header | Purpose |
| --- | --- |
| x-user-id | Optional user ID |
| x-session-id | Fallback ID |
| x-logging-consent | "true" → enable request logging |
| x-command-mode | "true" → AAC command recognition |
| x-sample-rate | For RAW/PCM audio only |

Form Fields (multipart/form-data)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| audioFile | File | ✔ | Audio data |
| commandMode | Bool | ✖ | Alternative to header |
| userId | String | ✖ | Alternative to header |
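Because several options can arrive either as a header or as a form field, the route needs some precedence rule. The sketch below shows one way to resolve them. The function name `resolveRequestOptions`, the precedence order (x-user-id header, then the userId form field, then x-session-id), and the `null` defaults are assumptions. Header names are lower-case because Node.js lower-cases incoming header names.

```javascript
// Hypothetical helper: merge request headers and multipart form fields
// into one options object. `headers` is req.headers, `body` is req.body.
function resolveRequestOptions(headers = {}, body = {}) {
  // Accept the boolean true or the string "true" (any letter case).
  const isTrue = (v) => v === true || String(v).toLowerCase() === "true";

  // Assumed precedence: explicit user header, then form field, then session ID.
  const userId = headers["x-user-id"] || body.userId || headers["x-session-id"] || null;

  // The sample rate only matters for RAW/PCM uploads; invalid values are ignored.
  const rate = Number.parseInt(headers["x-sample-rate"], 10);
  const sampleRate = Number.isFinite(rate) && rate > 0 ? rate : null;

  return {
    userId,
    commandMode: isTrue(headers["x-command-mode"]) || isTrue(body.commandMode),
    loggingConsent: isTrue(headers["x-logging-consent"]),
    sampleRate
  };
}
```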

Successful Response Example

{
  "success": true,
  "transcription": "hello world",
  "confidence": 0.92,
  "service": "Google",
  "processingTimeMs": 320,
  "audio": {
    "filename": "speech.wav",
    "size": 10240,
    "format": "WAV",
    "duration": 1.23,
    "sampleRate": 16000,
    "channels": 1
  },
  "request": {
    "timestamp": "2025-01-01T00:00:00Z",
    "device": "Mobile",
    "browser": "Chrome",
    "userAgent": "Mozilla/5.0 ..."
  },
  "aac": {
    "commandMode": true,
    "commandType": "navigation",
    "isCommand": true
  },
  "wordTiming": [
    { "word": "hello", "start": 0.0, "end": 0.4 },
    { "word": "world", "start": 0.5, "end": 0.9 }
  ]
}

Error Responses

No file uploaded (400)

{
  "success": false,
  "error": {
    "code": "NO_FILE",
    "message": "No audio file uploaded"
  }
}

Python backend error (422)

{
  "success": false,
  "error": {
    "code": "AUDIO_ERROR",
    "message": "Could not read audio"
  }
}

File too large (413)

{
  "success": false,
  "error": {
    "code": "FILE_TOO_LARGE",
    "message": "Audio file exceeds maximum size (10MB)"
  }
}

Internal server error (500)

{
  "success": false,
  "error": {
    "code": "INTERNAL_ERROR",
    "message": "An unexpected error occurred"
  }
}

Status Codes

| Status | Meaning |
| --- | --- |
| 200 | Successful recognition |
| 400 | Missing/invalid inputs |
| 413 | File too large |
| 422 | Python-recognition error |
| 500 | Server/internal failure |

404 Handler

Response

{
  "success": false,
  "error": {
    "code": "NOT_FOUND",
    "message": "Endpoint GET /foobar not found"
  },
  "availableEndpoints": {
    "GET /health": "Health check and status",
    "GET /formats": "Supported audio formats",
    "POST /upload": "Transcription endpoint"
  }
}
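The body above can be produced from the request method and path. The sketch below assumes a helper named `buildNotFoundResponse`, which is hypothetical; the endpoint descriptions are copied from the example.

```javascript
// Endpoint descriptions as listed in the 404 example of this document.
const AVAILABLE_ENDPOINTS = {
  "GET /health": "Health check and status",
  "GET /formats": "Supported audio formats",
  "POST /upload": "Transcription endpoint"
};

// Hypothetical helper: build the 404 body for an unmatched request.
function buildNotFoundResponse(method, path) {
  return {
    success: false,
    error: {
      code: "NOT_FOUND",
      message: `Endpoint ${method} ${path} not found`
    },
    availableEndpoints: AVAILABLE_ENDPOINTS
  };
}
```

In Express, a catch-all middleware registered after every route could send this with `res.status(404).json(buildNotFoundResponse(req.method, req.path))`.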

Global Error Handler

Example: File too large

{
  "success": false,
  "error": {
    "code": "FILE_TOO_LARGE",
    "message": "Audio file exceeds maximum size (10MB)"
  }
}

Example: Unexpected error

{
  "success": false,
  "error": {
    "code": "INTERNAL_ERROR",
    "message": "An unexpected error occurred"
  }
}

Summary Table

| Method | Endpoint | Description | Success Response | Error Response |
| --- | --- | --- | --- | --- |
| GET | /health | Returns server status & uptime | Status metadata | N/A |
| GET | /formats | Lists supported audio formats | Format list | N/A |
| POST | /upload | Audio → transcription | Full transcription response | 400, 413, 422, 500 |

Notes

  • Run from this directory with:

    node .
  • Test suite:

    npm test
  • Logging only occurs with consent (x-logging-consent: true).

  • Python backend (speechRecognition.py) must exist and be executable.

  • Recommended audio settings: WAV, 16kHz, mono.


End of Document