

Connecting AAC devices to the world of games

AAC Board Speech Recognition API

A speech-to-text API designed for AAC (Augmentative and Alternative Communication) devices and applications. Optimized for low-latency voice command recognition to help developers integrate voice controls into games, apps, and assistive technologies.

License: MIT · Node.js · Python


Features

  • Multiple Recognition Engines - Google Speech Recognition + Vosk offline fallback
  • Command Mode - Optimized for short AAC commands with faster response times
  • Confidence Scoring - Filter low-confidence recognitions
  • Word-Level Timing - Get start/end times for each recognized word
  • Standardized JSON Responses - Consistent camelCase API format
  • Request Logging - Optional consent-based logging for analytics
  • Game Integration Ready - Drop-in JavaScript module included
  • Privacy Focused - Offline recognition available, logging requires consent



Quick Start - How to quickly run our API

# Clone the repository
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Start the server
node index.js

The API is now running at http://localhost:8080

Test it:

# Health check
curl http://localhost:8080/health

# Upload audio for transcription
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@your-audio.wav"

Installation

Prerequisites

| Requirement | Version | Download |
| --- | --- | --- |
| Node.js | 16+ | nodejs.org |
| Python | 3.8+ | python.org |
| npm | 8+ | Included with Node.js |

Step 1: Clone Repository

git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

Step 2: Install Node.js Dependencies

npm install

Step 3: Install Python Dependencies

# Standard installation
pip install SpeechRecognition vosk numpy scipy

# If you get externally-managed-environment error (Python 3.11+)
pip install SpeechRecognition vosk numpy scipy --break-system-packages

# Or use a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install SpeechRecognition vosk numpy scipy

Step 4: Download Vosk Model (Optional)

If the Vosk Python package fails to install via pip, an alternative is to download the model directly and unzip it within the project folder. This also enables fully offline recognition when internet access is a concern.

For offline speech recognition:

mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
cd ..

Other models available at alphacephei.com/vosk/models

Step 5: Start the Server

node index.js

📡 API Reference

Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Server health and status |
| GET | /formats | Supported audio formats |
| POST | /upload | Upload audio for transcription |

GET /health

Returns server status, uptime, and service availability.

Response:

{
  "status": "ok",
  "timestamp": "2025-01-15T10:30:00.000Z",
  "uptime": 3600,
  "uptimeFormatted": "1h 0m 0s",
  "version": "2.0.0",
  "services": {
    "speechRecognition": true,
    "logging": true
  },
  "supportedFormats": ["WAV", "MP3", "FLAC", "OGG", "M4A"],
  "endpoints": {
    "health": "/health",
    "upload": "/upload",
    "formats": "/formats"
  }
}
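In scripts that start the server before sending audio, it can help to poll /health until the server reports ok. A minimal Python sketch (the function name, retry count, and delay are our own choices, not part of the API):

```python
import time

import requests


def wait_until_healthy(base_url="http://localhost:8080", retries=10, delay=1.0):
    """Poll GET /health until the server reports status 'ok'."""
    for _ in range(retries):
        try:
            r = requests.get(f"{base_url}/health", timeout=2)
            if r.ok and r.json().get("status") == "ok":
                return r.json()  # full health payload, incl. uptime and services
        except requests.RequestException:
            pass  # server not up yet; retry after a short delay
        time.sleep(delay)
    raise TimeoutError(f"server at {base_url} did not become healthy")
```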

GET /formats

Returns supported audio formats and optimal settings.

Response:

{
  "supportedFormats": ["WAV", "MP3", "FLAC", "AIFF", "OGG", "M4A", "RAW", "PCM"],
  "optimal": {
    "format": "WAV",
    "sampleRate": 16000,
    "bitDepth": 16,
    "channels": 1
  },
  "notes": [
    "WAV format recommended for lowest latency",
    "16kHz sample rate optimal for speech recognition",
    "Mono audio preferred (stereo will be converted)"
  ]
}
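The numpy/scipy dependencies installed earlier are enough to convert a WAV file to these optimal settings before upload. A sketch (the helper name is our own; it assumes WAV input readable by scipy.io.wavfile):

```python
from math import gcd

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly


def to_optimal_wav(in_path, out_path, target_rate=16000):
    """Rewrite a WAV file as 16 kHz, 16-bit, mono - the API's optimal settings."""
    rate, data = wavfile.read(in_path)
    data = data.astype(np.float64)
    if data.ndim == 2:           # stereo -> mono downmix
        data = data.mean(axis=1)
    if rate != target_rate:      # polyphase resampling to the target rate
        g = gcd(rate, target_rate)
        data = resample_poly(data, target_rate // g, rate // g)
    peak = np.max(np.abs(data))
    if peak > 0:                 # normalize into 16-bit PCM range
        data = data / peak
    wavfile.write(out_path, target_rate, (data * 32767).astype(np.int16))
```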

POST /upload

Upload an audio file for speech-to-text transcription.

Request:

  • Content-Type: multipart/form-data
  • Body: audioFile - The audio file to transcribe

Headers (Optional):

| Header | Description |
| --- | --- |
| x-command-mode | Set to "true" for AAC command optimization |
| x-user-id | User identifier for logging |
| x-session-id | Session identifier (fallback for user-id) |
| x-logging-consent | Set to "true" to enable server-side logging |

Example Request:

curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -H "x-user-id: user123" \
  -F "audioFile=@recording.wav"

Success Response (200):

{
  "success": true,
  "transcription": "hello world",
  "confidence": 0.92,
  "service": "vosk",
  "processingTimeMs": 245,
  "audio": {
    "filename": "recording.wav",
    "size": 32000,
    "sizeBytes": 32000,
    "format": "WAV",
    "duration": 1.5,
    "sampleRate": 16000,
    "channels": 1,
    "mimeType": "audio/wav"
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome",
    "userAgent": "Mozilla/5.0..."
  },
  "aac": {
    "commandMode": true,
    "commandType": "communication",
    "isCommand": true,
    "suggestedActions": ["send_message", "repeat", "edit"]
  },
  "wordTiming": [
    { "word": "hello", "startTime": 0.12, "endTime": 0.45, "confidence": 0.94 },
    { "word": "world", "startTime": 0.48, "endTime": 0.82, "confidence": 0.90 }
  ]
}
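The wordTiming array maps directly onto caption cues or per-word highlighting in a game UI. A minimal Python sketch (the helper name and default threshold are our own; it simply filters and repacks the fields shown above):

```python
def word_timing_to_cues(word_timing, min_confidence=0.5):
    """Turn the API's wordTiming entries into (start, end, word) caption cues,
    dropping words below a confidence threshold."""
    return [
        (w["startTime"], w["endTime"], w["word"])
        for w in word_timing
        if w.get("confidence", 1.0) >= min_confidence
    ]
```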

Error Response (4xx/5xx):

{
  "success": false,
  "transcription": null,
  "processingTimeMs": 150,
  "error": {
    "code": "AUDIO_QUALITY_ISSUES",
    "message": "Audio appears silent or nearly silent",
    "details": [
      { "service": "google", "error": "Could not understand audio" },
      { "service": "vosk", "error": "No speech detected" }
    ]
  },
  "request": {
    "timestamp": "2025-01-15T10:30:00.000Z",
    "device": "Desktop",
    "browser": "Chrome"
  },
  "warnings": ["Audio volume is low"]
}
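Because failures still return structured JSON, a client can surface the per-service details and warnings instead of a generic error message. A small sketch (the function name is our own):

```python
def summarize_failure(result):
    """Flatten a failed /upload response into readable lines:
    top-level error code/message, per-service details, then warnings."""
    err = result.get("error", {})
    lines = [f"{err.get('code')}: {err.get('message')}"]
    lines += [f"  {d['service']}: {d['error']}" for d in err.get("details", [])]
    lines += [f"  warning: {w}" for w in result.get("warnings", [])]
    return "\n".join(lines)
```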

Response Format

All responses use camelCase keys for consistency.

Success Response Structure

| Field | Type | Description |
| --- | --- | --- |
| success | boolean | Whether transcription succeeded |
| transcription | string | Recognized text |
| confidence | number | Recognition confidence (0-1) |
| service | string | Recognition service used (google, vosk) |
| processingTimeMs | number | Processing time in milliseconds |
| audio | object | Audio file metadata |
| request | object | Request metadata |
| aac | object | AAC-specific information |
| wordTiming | array | Word-level timing (when available) |
| user | object | User identifier (if provided) |
| warnings | array | Non-fatal warnings |

AAC Object

| Field | Type | Description |
| --- | --- | --- |
| commandMode | boolean | Whether command mode was enabled |
| commandType | string | Classified command type |
| isCommand | boolean | Whether recognized text is a known command |
| suggestedActions | array | Suggested follow-up actions |

Command Types:

  • navigation - back, next, up, down, etc.
  • selection - select, choose, yes, no, etc.
  • communication - hello, thank you, help, etc.
  • media - play, pause, stop, etc.
  • freeform - Unclassified speech
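A client can key its behaviour off commandType, falling back to freeform handling for unclassified speech. A minimal Python sketch (the dispatcher and handler names are our own, not part of the API):

```python
def dispatch_aac_result(result, handlers):
    """Route a successful /upload response to a handler keyed by aac.commandType.
    Unknown types fall back to the 'freeform' handler, if one is registered."""
    if not result.get("success"):
        return None
    command_type = result.get("aac", {}).get("commandType", "freeform")
    handler = handlers.get(command_type) or handlers.get("freeform")
    return handler(result["transcription"]) if handler else None
```

For example, a game might register one handler per type and treat freeform text as chat input.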

Game Integration

We provide a drop-in JavaScript module for easy game integration.

Quick Integration

<script type="module">
  import { AACGameController } from './aac-voice-control.js';

  const voice = new AACGameController({
    apiUrl: 'http://localhost:8080',
    commandMode: true
  });

  // Map voice commands to game actions
  voice.mapCommand(['jump', 'hop'], () => player.jump());
  voice.mapCommand(['left', 'go left'], () => player.moveLeft());
  voice.mapCommand(['fire', 'shoot'], () => player.attack());

  // Or use common command mappings
  voice.mapCommonCommands({
    up: () => player.moveUp(),
    down: () => player.moveDown(),
    select: () => game.select(),
    pause: () => game.pause()
  });

  // Start listening
  voice.start();
</script>

Module Features

  • Continuous and single-shot listening modes
  • Multi-phrase command mapping
  • Confidence thresholds
  • Built-in UI panel (optional)
  • Event-based architecture

See aac-voice-control.js for full documentation.


Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| PORT | 8080 | Server port |
| VOSK_MODEL_PATH | model/vosk-model-small-en-us-0.15 | Path to Vosk model |
| AAC_COMMAND_MODE | false | Enable command mode by default |
| PRELOAD_VOSK | true | Preload Vosk model on startup |
| NODE_ENV | development | Environment (production disables auto-consent) |

Example:

PORT=3000 VOSK_MODEL_PATH=./my-model node index.js

Command Mode

Enable for AAC devices to optimize for short commands:

# Via header
curl -H "x-command-mode: true" ...

# Via environment
AAC_COMMAND_MODE=true node index.js

Benefits:

  • Faster recognition for short phrases
  • Limited vocabulary reduces errors
  • Optimized for common AAC commands

Testing

Run API Tests

npm test

Run Python Tests

python test.py                    # Run all tests
python test.py --audio file.wav # Test specific file
python test.py --record # Record from microphone
python test.py --command-mode # Test with command mode

Manual Testing

# Health check
curl http://localhost:8080/health

# Upload test audio
curl -X POST http://localhost:8080/upload \
  -F "audioFile=@tests/TestRecording.wav"

# With command mode
curl -X POST http://localhost:8080/upload \
  -H "x-command-mode: true" \
  -F "audioFile=@tests/TestRecording.wav"

Examples

Basic Transcription (Node.js)

const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch');

async function transcribe(audioPath) {
  const form = new FormData();
  form.append('audioFile', fs.createReadStream(audioPath));

  const response = await fetch('http://localhost:8080/upload', {
    method: 'POST',
    body: form
  });

  const result = await response.json();

  if (result.success) {
    console.log('Transcription:', result.transcription);
    console.log('Confidence:', result.confidence);
  } else {
    console.error('Error:', result.error.message);
  }
}

transcribe('recording.wav');

Browser Integration

async function recordAndTranscribe() {
  // Get microphone access
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);

  mediaRecorder.onstop = async () => {
    const blob = new Blob(chunks, { type: 'audio/webm' });
    const formData = new FormData();
    formData.append('audioFile', blob, 'recording.webm');

    const response = await fetch('http://localhost:8080/upload', {
      method: 'POST',
      headers: { 'x-command-mode': 'true' },
      body: formData
    });

    const result = await response.json();
    console.log(result.transcription);
  };

  // Record for 3 seconds
  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000);
}

Python Client

import requests

def transcribe(audio_path, command_mode=False):
    url = 'http://localhost:8080/upload'

    headers = {}
    if command_mode:
        headers['x-command-mode'] = 'true'

    with open(audio_path, 'rb') as f:
        files = {'audioFile': f}
        response = requests.post(url, files=files, headers=headers)

    result = response.json()

    if result['success']:
        print(f"Transcription: {result['transcription']}")
        print(f"Confidence: {result['confidence']:.1%}")
        print(f"Service: {result['service']}")
    else:
        print(f"Error: {result['error']['message']}")

    return result

# Usage
transcribe('recording.wav', command_mode=True)

Troubleshooting

Common Issues

EADDRINUSE: Port already in use

# Find process using port
lsof -i :8080

# Kill it
kill -9 <PID>

# Or use a different port
PORT=8081 node index.js

Python module not found

# Install the missing module
pip install SpeechRecognition --break-system-packages

# Or use a virtual environment
python -m venv venv
source venv/bin/activate
pip install SpeechRecognition vosk numpy scipy

Vosk model not found

# Download and extract the model
mkdir -p model && cd model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

Low recognition accuracy

  1. Enable command mode: -H "x-command-mode: true"
  2. Use WAV format at 16kHz mono
  3. Reduce background noise
  4. Speak clearly at a moderate pace
  5. Try the Vosk offline model

CORS errors in browser

The API includes CORS support. If issues persist:

// Ensure you're using the correct URL
const API_URL = 'http://localhost:8080'; // Not 127.0.0.1

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/aac-board-api.git
cd aac-board-api

# Install dependencies
npm install
pip install -r requirements.txt

# Run tests
npm test
python test.py

# Start in development mode
npm run dev

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • SpeechRecognition - Python speech recognition library
  • Vosk - Offline speech recognition
  • Express.js - Web framework for Node.js
  • Lily Ulrey - For creating the project logo for the website

Contact


Made with ❤️ for the AAC community

Contributors

Made with contrib.rocks.