AAC Board Speech Recognition API

A speech-to-text API designed for AAC (Augmentative and Alternative Communication) devices and applications. Optimized for low-latency voice command recognition to help developers integrate voice controls into games, apps, and assistive technologies.

Features

Multiple Recognition Engines - Google Speech Recognition + Vosk offline fallback
Command Mode - Optimized for short AAC commands with faster response times from AAC devices
Confidence Scoring - Filter low-confidence recognitions
Word-Level Timing - Get start/end times for each recognized word
Standardized JSON Responses - Consistent camelCase API format
Request Logging - Optional consent-based logging for analytics
Game Integration Ready - Drop-in JavaScript module included
Privacy Focused - Offline recognition available, logging requires consent

Quick Start
Installation
API Reference
Response Format
Game Integration
Configuration
Testing
Examples
Troubleshooting
Contributing
License

Quick Start - How to quickly run our API

# Clone the repository
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Start the server
node index.js

The API is now running at http://localhost:8080

Test it:

# Health check
curl http://localhost:8080/health

# Upload audio for transcription
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@your-audio.wav"

Installation

Prerequisites

Requirement	Version	Download
Node.js	16+	nodejs.org
Python	3.8+	python.org
npm	8+	Included with Node.js

Step 1: Clone Repository

git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

Step 2: Install Node.js Dependencies

npm install

Step 3: Install Python Dependencies

# Standard installation
pip install SpeechRecognition vosk numpy scipy

# If you get externally-managed-environment error (Python 3.11+)
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Or use a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install SpeechRecognition vosk numpy scipy

Step 4: Download Vosk Model (Optional)

If Vosk failes to compile from python installation, an alternative method is to download the model directly into your system and the unzip within the project folder. This also enables it to work offline if internet access is a major concern.

For offline speech recognition:

mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
cd ..

Other models available at alphacephei.com/vosk/models

Step 5: Start the Server

node index.js

📡 API Reference

Endpoints

Method	Endpoint	Description
`GET`	`/health`	Server health and status
`GET`	`/formats`	Supported audio formats
`POST`	`/upload`	Upload audio for transcription

GET `/health`

Returns server status, uptime, and service availability.

Response:

{
  "status": "ok",
  "timestamp": "2025-01-15T10:30:00.000Z",
  "uptime": 3600,
  "uptimeFormatted": "1h 0m 0s",
  "version": "2.0.0",
  "services": {
    "speechRecognition": true,
    "logging": true
  },
  "supportedFormats": ["WAV", "MP3", "FLAC", "OGG", "M4A"],
  "endpoints": {
    "health": "/health",
    "upload": "/upload",
    "formats": "/formats"
  }
}

GET `/formats`

Returns supported audio formats and optimal settings.

Response:

{
  "supportedFormats": ["WAV", "MP3", "FLAC", "AIFF", "OGG", "M4A", "RAW", "PCM"],
  "optimal": {
    "format": "WAV",
    "sampleRate": 16000,
    "bitDepth": 16,
    "channels": 1
  },
  "notes": [
    "WAV format recommended for lowest latency",
    "16kHz sample rate optimal for speech recognition",
    "Mono audio preferred (stereo will be converted)"
  ]
}

POST `/upload`

Upload an audio file for speech-to-text transcription.

Request:

Content-Type: multipart/form-data
Body: audioFile - The audio file to transcribe

Headers (Optional):

Header	Description
`x-command-mode`	Set to `"true"` for AAC command optimization
`x-user-id`	User identifier for logging
`x-session-id`	Session identifier (fallback for user-id)
`x-logging-consent`	Set to `"true"` to enable server-side logging

Example Request:

curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -H "x-user-id: user123" \
  -F "audioFile=@recording.wav"

Success Response (200):

{
  "success": true,
  "transcription": "hello world",
  "confidence": 0.92,
  "service": "vosk",
  "processingTimeMs": 245,
  "audio": {
    "filename": "recording.wav",
    "size": 32000,
    "sizeBytes": 32000,
    "format": "WAV",
    "duration": 1.5,
    "sampleRate": 16000,
    "channels": 1,
    "mimeType": "audio/wav"
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome",
    "userAgent": "Mozilla/5.0..."
  },
  "aac": {
    "commandMode": true,
    "commandType": "communication",
    "isCommand": true,
    "suggestedActions": ["send_message", "repeat", "edit"]
  },
  "wordTiming": [
    { "word": "hello", "startTime": 0.12, "endTime": 0.45, "confidence": 0.94 },
    { "word": "world", "startTime": 0.48, "endTime": 0.82, "confidence": 0.90 }
  ]
}

Error Response (4xx/5xx):

{
  "success": false,
  "transcription": null,
  "processingTimeMs": 150,
  "error": {
    "code": "AUDIO_QUALITY_ISSUES",
    "message": "Audio appears silent or nearly silent",
    "details": [
      { "service": "google", "error": "Could not understand audio" },
      { "service": "vosk", "error": "No speech detected" }
    ]
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome"
  },
  "warnings": ["Audio volume is low"]
}

Response Format

All responses use camelCase keys for consistency.

Success Response Structure

Field	Type	Description
`success`	boolean	Whether transcription succeeded
`transcription`	string	Recognized text
`confidence`	number	Recognition confidence (0-1)
`service`	string	Recognition service used (`google`, `vosk`)
`processingTimeMs`	number	Processing time in milliseconds
`audio`	object	Audio file metadata
`request`	object	Request metadata
`aac`	object	AAC-specific information
`wordTiming`	array	Word-level timing (when available)
`user`	object	User identifier (if provided)
`warnings`	array	Non-fatal warnings

AAC Object

Field	Type	Description
`commandMode`	boolean	Whether command mode was enabled
`commandType`	string	Classified command type
`isCommand`	boolean	Whether recognized text is a known command
`suggestedActions`	array	Suggested follow-up actions

Command Types:

navigation - back, next, up, down, etc.
selection - select, choose, yes, no, etc.
communication - hello, thank you, help, etc.
media - play, pause, stop, etc.
freeform - Unclassified speech

Game Integration

We provide a drop-in JavaScript module for easy game integration.

Quick Integration

<script type="module">
  import { AACGameController } from './aac-voice-control.js';

  const voice = new AACGameController({
    apiUrl: 'http://localhost:8080',
    commandMode: true
  });

  // Map voice commands to game actions
  voice.mapCommand(['jump', 'hop'], () => player.jump());
  voice.mapCommand(['left', 'go left'], () => player.moveLeft());
  voice.mapCommand(['fire', 'shoot'], () => player.attack());

  // Or use common command mappings
  voice.mapCommonCommands({
    up: () => player.moveUp(),
    down: () => player.moveDown(),
    select: () => game.select(),
    pause: () => game.pause()
  });

  // Start listening
  voice.start();
</script>

Module Features

Continuous and single-shot listening modes
Multi-phrase command mapping
Confidence thresholds
Built-in UI panel (optional)
Event-based architecture

See aac-voice-control.js for full documentation.

Configuration

Environment Variables

Variable	Default	Description
`PORT`	`8080`	Server port
`VOSK_MODEL_PATH`	`model/vosk-model-small-en-us-0.15`	Path to Vosk model
`AAC_COMMAND_MODE`	`false`	Enable command mode by default
`PRELOAD_VOSK`	`true`	Preload Vosk model on startup
`NODE_ENV`	`development`	Environment (`production` disables auto-consent)

Example:

PORT=3000 VOSK_MODEL_PATH=./my-model node index.js

Command Mode

Enable for AAC devices to optimize for short commands:

# Via header
curl -H "x-command-mode: true" ...

# Via environment
AAC_COMMAND_MODE=true node index.js

Benefits:

Faster recognition for short phrases
Limited vocabulary reduces errors
Optimized for common AAC commands

Testing

Run API Tests

npm test

Run Python Tests

python test.py                    # Run all tests
python test.py --audio file.wav   # Test specific file
python test.py --record           # Record from microphone
python test.py --command-mode     # Test with command mode

Manual Testing

# Health check
curl http://localhost:8080/health

# Upload test audio
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@tests/TestRecording.wav"

# With command mode
curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -F "audioFile=@tests/TestRecording.wav"

Examples

Basic Transcription (Node.js)

const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch');

async function transcribe(audioPath) {
  const form = new FormData();
  form.append('audioFile', fs.createReadStream(audioPath));

  const response = await fetch('http://localhost:8080/upload', {
    method: 'POST',
    body: form
  });

  const result = await response.json();
  
  if (result.success) {
    console.log('Transcription:', result.transcription);
    console.log('Confidence:', result.confidence);
  } else {
    console.error('Error:', result.error.message);
  }
}

transcribe('recording.wav');

Browser Integration

async function recordAndTranscribe() {
  // Get microphone access
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);
  
  mediaRecorder.onstop = async () => {
    const blob = new Blob(chunks, { type: 'audio/webm' });
    const formData = new FormData();
    formData.append('audioFile', blob, 'recording.webm');

    const response = await fetch('http://localhost:8080/upload', {
      method: 'POST',
      headers: { 'x-command-mode': 'true' },
      body: formData
    });

    const result = await response.json();
    console.log(result.transcription);
  };

  // Record for 3 seconds
  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000);
}

Python Client

import requests

def transcribe(audio_path, command_mode=False):
    url = 'http://localhost:8080/upload'
    
    headers = {}
    if command_mode:
        headers['x-command-mode'] = 'true'
    
    with open(audio_path, 'rb') as f:
        files = {'audioFile': f}
        response = requests.post(url, files=files, headers=headers)
    
    result = response.json()
    
    if result['success']:
        print(f"Transcription: {result['transcription']}")
        print(f"Confidence: {result['confidence']:.1%}")
        print(f"Service: {result['service']}")
    else:
        print(f"Error: {result['error']['message']}")
    
    return result

# Usage
transcribe('recording.wav', command_mode=True)

Troubleshooting

Common Issues

EADDRINUSE: Port already in use

# Find process using port
lsof -i :8080

# Kill it
kill -9 <PID>

# Or use different port
PORT=8081 node index.js

Python module not found

# Install missing module
pip install SpeechRecognition --break-system-packages

# Or use virtual environment
python -m venv venv
source venv/bin/activate
pip install SpeechRecognition vosk numpy scipy

Vosk model not found

# Download and extract model
mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

Low recognition accuracy

Enable command mode: -H "x-command-mode: true"
Use WAV format at 16kHz mono
Reduce background noise
Speak clearly at moderate pace
Try the Vosk offline model

CORS errors in browser

The API includes CORS support. If issues persist:

// Ensure you're using the correct URL
const API_URL = 'http://localhost:8080';  // Not 127.0.0.1

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install -r requirements.txt

# Run tests
npm test
python test.py

# Start in development mode
npm run dev

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

SpeechRecognition - Python speech recognition library
Vosk - Offline speech recognition
Express.js - Web framework for Node.js
Lily Ulrey - For creation of our project logo for the website.

Contact

Issues: GitHub Issues
Discussions: GitHub Discussions

Made with ❤️ for the AAC community

Made with contrib.rocks.

Features​

Table of Contents​

Quick Start - How to quickly run our API​

Test it:​

Installation​

Prerequisites​

Step 1: Clone Repository​

Step 2: Install Node.js Dependencies​

Step 3: Install Python Dependencies​

Step 4: Download Vosk Model (Optional)​

Step 5: Start the Server​

📡 API Reference​

Endpoints​

GET /health​

GET /formats​

POST /upload​

Response Format​

Success Response Structure​

AAC Object​

Game Integration​

Quick Integration​

Module Features​

Configuration​

Environment Variables​

Command Mode​

Testing​

Run API Tests​

Run Python Tests​

Manual Testing​

Examples​

Basic Transcription (Node.js)​

Browser Integration​

Python Client​

Troubleshooting​

Common Issues​

Contributing​

Development Setup​

License​

Acknowledgments​

Contact​