Parakeet TDT - Ultra-Fast ASR

Powered by NVIDIA Parakeet TDT 0.6B on ZeroGPU.

Performance: roughly 3000x faster than real time on GPU, so 1 hour of audio transcribes in about 1 second.

Capabilities:

  • Word-level timestamps
  • Long audio support (auto-chunking)
  • High accuracy (state-of-the-art on LibriSpeech)

API Endpoints for EagleEye:

  • POST /call/api_transcribe - Full transcription
  • POST /call/api_transcribe_segment - Short segment transcription
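For callers that cannot use gradio_client, the endpoints above can in principle be reached over plain HTTP. The sketch below assumes the Space follows Gradio's usual two-step pattern: a POST to /call/<api_name> with a JSON body of the form {"data": [...]} returns an event id, and a GET to /call/<api_name>/<event_id> streams the result as server-sent events. The base URL is a placeholder, the helper names are hypothetical, and the argument order in "data" is an assumption. Newer Gradio versions may also prefix the path with /gradio_api. Only the request building and response parsing are shown, so no network access is needed to try it.

```python
import json

# Placeholder base URL (assumption); substitute the real address of the Space.
BASE_URL = "https://example.invalid"


def build_call_request(base_url, api_name, args):
    """Return (url, body) for step 1: POST /call/<api_name>."""
    url = f"{base_url.rstrip('/')}/call/{api_name.lstrip('/')}"
    # Gradio expects positional arguments in a "data" list.
    body = json.dumps({"data": list(args)}).encode("utf-8")
    return url, body


def build_result_url(base_url, api_name, event_id):
    """Return the URL for step 2: GET /call/<api_name>/<event_id>."""
    return f"{base_url.rstrip('/')}/call/{api_name.lstrip('/')}/{event_id}"


def parse_sse_result(stream_text):
    """Return the JSON payload of the 'complete' event, or None if absent."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None


url, body = build_call_request(
    BASE_URL, "api_transcribe", ["https://example.com/audio.mp3"]
)
```

The resulting url and body can be sent with any HTTP client; the text returned by the second request is what parse_sse_result expects.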

API Usage for EagleEye Integration

Full Transcription

from gradio_client import Client

client = Client("Cadayn/parakeet-zerogpu")

result = client.predict(
    audio_url="https://example.com/audio.mp3",
    api_name="/api_transcribe"
)
print(result)
# {"success": True, "text": "...", "words": [{"start_s": 0.0, "end_s": 0.5, "text": "Hello"}, ...]}
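The word-level timestamps in the response can be regrouped on the client side, for example into caption-style phrases. The following is a minimal sketch that assumes the response is a dict shaped like the comment above; group_words and the 0.6 s pause threshold are illustrative choices, not part of the API.

```python
def group_words(words, max_gap_s=0.6):
    """Merge consecutive words into phrases, splitting on pauses longer than max_gap_s."""
    phrases = []
    for w in words:
        if phrases and w["start_s"] - phrases[-1]["end_s"] <= max_gap_s:
            # Short pause: extend the current phrase.
            phrases[-1]["text"] += " " + w["text"]
            phrases[-1]["end_s"] = w["end_s"]
        else:
            # First word or long pause: start a new phrase.
            phrases.append(
                {"start_s": w["start_s"], "end_s": w["end_s"], "text": w["text"]}
            )
    return phrases


# Sample data in the documented response shape (values are made up).
sample = {
    "success": True,
    "text": "Hello world again",
    "words": [
        {"start_s": 0.0, "end_s": 0.5, "text": "Hello"},
        {"start_s": 0.6, "end_s": 1.0, "text": "world"},
        {"start_s": 2.5, "end_s": 3.0, "text": "again"},
    ],
}

if sample["success"]:
    for phrase in group_words(sample["words"]):
        print(f'{phrase["start_s"]:.2f}-{phrase["end_s"]:.2f}: {phrase["text"]}')
```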

Segment Transcription (for streaming)

import base64

from gradio_client import Client

client = Client("Cadayn/parakeet-zerogpu")

with open("segment.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

result = client.predict(
    audio_base64=audio_b64,
    start_s=30.0,  # Offset of this segment in the full audio; returned timestamps are shifted by it
    api_name="/api_transcribe_segment"
)
print(result)
# {"success": True, "words": [{"start_s": 30.5, "end_s": 31.0, "text": "word"}, ...]}
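For streaming, a caller typically has to decide where each segment starts and then stitch the per-segment results back together. The sketch below shows one way to do that on the client side, assuming each response already carries absolute timestamps because start_s was passed as the offset (as in the example output above). plan_segments and merge_words are hypothetical helpers, and the 30-second window is an arbitrary choice.

```python
def plan_segments(total_s, segment_s):
    """Yield back-to-back (start_s, end_s) windows covering [0, total_s)."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        yield (start, end)
        start = end


def merge_words(segment_results):
    """Collect words from successful segment responses, ordered by start time."""
    words = []
    for result in segment_results:
        if result.get("success"):
            words.extend(result.get("words", []))
    return sorted(words, key=lambda w: w["start_s"])


# Each window's start would be passed as start_s to /api_transcribe_segment.
windows = list(plan_segments(70.0, 30.0))

# Made-up responses, arriving out of order, with one failed segment.
responses = [
    {"success": True, "words": [{"start_s": 30.5, "end_s": 31.0, "text": "word"}]},
    {"success": False},
    {"success": True, "words": [{"start_s": 1.0, "end_s": 1.4, "text": "first"}]},
]
timeline = merge_words(responses)
```

Back-to-back windows can cut a word at a boundary; overlapping windows with de-duplication would be a possible refinement, which this sketch does not attempt.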