TSP Product Sports Tech / Computer Vision

Computer Vision Pipeline for Competitive Tennis

How Tech Stack Playbook engineered a multi-stage computer vision pipeline that transforms match footage into biomechanical analysis — pose estimation, distance calibration, shot classification, and annotated video.

33-Point Pose Estimation
6-Stage Processing Pipeline
Real-World Distance Calibration
30+ FPS on Apple Silicon

Overview

A software-only pipeline that runs on a laptop against standard phone video and produces the kind of biomechanical analysis that previously required a sports science lab. It detects the player, estimates 33 body landmarks in real time, calibrates to real-world distances, classifies shot types and phases, and renders annotated video output.

Born from a real need: analyzing serve mechanics and forehand technique during a competitive tennis comeback. Every feature exists because it solved an actual training problem.

The Gap in Tennis Analytics

Competitive players invest in coaching, but the feedback loop between playing and improving is limited to what a coach can observe in real time. No accessible tool converts standard video into quantitative biomechanical analysis.

  • Human coaching can't measure joint angles or track movement in centimeters
  • Slow-motion replay provides no measurement, annotation, or structured data
  • Commercial motion capture requires $10K+ studios and reflective markers
  • Consumer apps like SwingVision offer shot counting but no true biomechanical analysis
  • No tool produces skeleton tracking, distance measurements, and shot phase classification from phone video

Six-Stage Pipeline

Each stage builds on the output of the previous one. The architecture is modular, so each stage can be developed, tested, and improved independently.

01
Video Ingestion
OpenCV + FFmpeg handling standard formats from phones and coaching cameras, optimized for Apple Silicon.
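A minimal sketch of what frame ingestion could look like, assuming the `opencv-python` package is installed. The helper names `frame_stride` and `iter_frames` and the 30 FPS target are illustrative assumptions, not the project's actual code. It shows how high-frame-rate phone footage might be subsampled before the heavier stages run.

```python
from typing import Iterator, Tuple


def frame_stride(source_fps: float, target_fps: float) -> int:
    """Return how many source frames to advance per processed frame."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(source_fps / target_fps))


def iter_frames(path: str, target_fps: float = 30.0) -> Iterator[Tuple[int, float, object]]:
    """Yield (frame_index, timestamp_seconds, BGR frame) from a video file.

    Hypothetical helper: the 30 FPS default is an assumption for illustration.
    """
    import cv2  # imported lazily so frame_stride works without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"could not open video: {path}")
    try:
        # Some containers report 0 FPS; fall back to the target rate.
        fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
        stride = frame_stride(fps, target_fps)
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of stream or decode failure
            if index % stride == 0:
                yield index, index / fps, frame
            index += 1
    finally:
        cap.release()
```

With 240 FPS slow-motion footage and a 30 FPS target, this would process every eighth frame while keeping the original timestamps.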
02
Person Detection (YOLOv8)
Real-time detection with ByteTrack identity persistence, multi-person isolation, and simultaneous racket detection.
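To illustrate the idea of identity persistence, here is a deliberately simplified stand-in: greedy intersection-over-union (IoU) matching between the player's previous box and the current detections. This is not ByteTrack, which also uses motion prediction and low-confidence detections. The names `iou` and `follow_player` and the 0.3 threshold are assumptions for the sketch.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def follow_player(previous: Box, detections: List[Box],
                  min_iou: float = 0.3) -> Optional[Box]:
    """Pick the detection that best overlaps the player's previous box.

    Returns None when nothing overlaps enough, so the caller can decide
    whether to re-acquire the player. The threshold is an illustrative guess.
    """
    best = max(detections, key=lambda d: iou(previous, d), default=None)
    if best is None or iou(previous, best) < min_iou:
        return None
    return best


# Example: the player moved slightly; an opponent stands far to the right.
player = (100.0, 100.0, 200.0, 300.0)
current = [(400.0, 100.0, 480.0, 280.0), (110.0, 105.0, 210.0, 305.0)]
tracked = follow_player(player, current)
```

A real tracker handles occlusion and crossing paths far better than this; the sketch only shows why per-frame detections need an association step before one player's pose can be followed.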
03
Pose Estimation (MediaPipe)
33 body landmarks per frame, including hands and feet: roughly twice the 17 keypoints of the COCO topology that many pose models output.
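As a sketch of how landmarks become measurements, the snippet below computes the angle at a joint from three landmarks. It assumes landmarks arrive as normalized (x, y) pairs indexed by MediaPipe Pose's 33-landmark numbering (12, 14, and 16 are the right shoulder, elbow, and wrist). The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Indices from MediaPipe Pose's 33-landmark topology.
RIGHT_SHOULDER, RIGHT_ELBOW, RIGHT_WRIST = 12, 14, 16


def to_pixels(landmark: Point, width: int, height: int) -> Point:
    """Scale a normalized (x, y) landmark to pixel coordinates."""
    return landmark[0] * width, landmark[1] * height


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at b between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident landmarks; angle is undefined")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def right_elbow_angle(landmarks: Sequence[Point], width: int, height: int) -> float:
    """Right elbow flexion angle from one frame of normalized landmarks."""
    shoulder, elbow, wrist = (
        to_pixels(landmarks[i], width, height)
        for i in (RIGHT_SHOULDER, RIGHT_ELBOW, RIGHT_WRIST)
    )
    return joint_angle(shoulder, elbow, wrist)
```

Scaling to pixels first matters because normalized coordinates stretch differently along each axis on non-square video, which would distort the angle. This is a 2D, image-plane angle, so it also varies with camera viewpoint.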
04
Real-World Calibration
Height-based pixel-to-centimeter conversion with dynamic per-frame recalibration as the player moves.
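The arithmetic behind height-based calibration can be sketched as follows. It assumes the player's real height is known and that their height in pixels can be read from each frame, for example from the bounding box or from head-to-heel landmarks. The source does not specify which; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def cm_per_pixel(player_height_cm: float, player_height_px: float) -> float:
    """Scale factor for one frame: real height divided by apparent height."""
    if player_height_px <= 0:
        raise ValueError("pixel height must be positive")
    return player_height_cm / player_height_px


def distance_cm(p: Point, q: Point, scale: float) -> float:
    """Convert the pixel distance between two points to centimeters."""
    return math.dist(p, q) * scale


# Illustrative numbers only: a 180 cm player appears 360 px tall far from
# the camera and 720 px tall up close. Recomputing the scale per frame keeps
# the same 60 cm stance at 60 cm in both frames.
far_scale = cm_per_pixel(180.0, 360.0)    # 0.5 cm per pixel
near_scale = cm_per_pixel(180.0, 720.0)   # 0.25 cm per pixel
stance_far = distance_cm((100.0, 500.0), (220.0, 500.0), far_scale)
stance_near = distance_cm((100.0, 900.0), (340.0, 900.0), near_scale)
```

One caveat with this approach: it assumes the player is roughly upright in the frame used for scaling. A deep knee bend shrinks the apparent height and inflates the scale, so a practical version would likely smooth the scale over time or rely on limb lengths that stay constant.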
05
Shot Classification
Serve, forehand, backhand detection with phase tagging: preparation, forward swing, contact zone, follow-through.
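The source does not describe how phases are detected, so the following is purely a hypothetical heuristic to make the idea concrete: tag the frames of a single stroke using wrist speed alone, treating the speed peak as the approximate contact point. The function name, threshold, and window size are all assumptions.

```python
from typing import List


def tag_phases(wrist_speed: List[float], swing_threshold: float,
               contact_window: int = 1) -> List[str]:
    """Label each frame of one stroke with a phase based on wrist speed.

    Hypothetical heuristic, not the pipeline's actual classifier:
    - contact zone: frames within `contact_window` of the speed peak
    - follow-through: remaining frames after the peak
    - forward swing: frames from the first threshold crossing up to contact
    - preparation: everything before that
    """
    if not wrist_speed:
        return []
    peak = max(range(len(wrist_speed)), key=wrist_speed.__getitem__)
    # First frame before the peak where the wrist is clearly accelerating.
    swing_start = next(
        (i for i, s in enumerate(wrist_speed[:peak]) if s >= swing_threshold),
        peak,
    )
    labels = []
    for i in range(len(wrist_speed)):
        if abs(i - peak) <= contact_window:
            labels.append("contact zone")
        elif i > peak:
            labels.append("follow-through")
        elif i >= swing_start:
            labels.append("forward swing")
        else:
            labels.append("preparation")
    return labels


# Illustrative wrist speeds in cm/s for eight consecutive frames.
phases = tag_phases([10, 20, 150, 400, 900, 1400, 800, 300], swing_threshold=100)
```

A production classifier would probably combine several joints, racket position, and learned models. The value of even a crude tagger is that it gives later analysis a shared vocabulary for comparing the same moment across strokes.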
06
Annotated Video Output
Full skeleton overlay, joint dots, bounding boxes, distance labels, shot/phase labels — H.264 MP4 via FFmpeg.
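One common way to get H.264 MP4 output from annotated frames is to pipe raw frames into the `ffmpeg` command-line tool. The sketch below only builds the argument list. The function name is hypothetical, and it assumes frames are OpenCV-style BGR arrays and that `ffmpeg` was built with `libx264`.

```python
from typing import List


def ffmpeg_h264_command(width: int, height: int, fps: float,
                        output_path: str) -> List[str]:
    """Build an ffmpeg command that encodes raw BGR frames from stdin to MP4."""
    if width % 2 or height % 2:
        # libx264 with yuv420p needs even dimensions.
        raise ValueError("width and height must be even for yuv420p output")
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-f", "rawvideo",            # input is headerless raw frames
        "-pix_fmt", "bgr24",         # OpenCV's default channel order
        "-s", f"{width}x{height}",   # frame size of the raw input
        "-r", str(fps),              # input frame rate
        "-i", "-",                   # read frames from stdin
        "-an",                       # no audio track
        "-c:v", "libx264",           # H.264 encoder
        "-pix_fmt", "yuv420p",       # widely playable pixel format
        output_path,
    ]


cmd = ffmpeg_h264_command(1920, 1080, 30, "annotated.mp4")
# To encode, one would start subprocess.Popen(cmd, stdin=subprocess.PIPE)
# and write frame.tobytes() for each annotated frame, then close stdin.
```

Keeping the drawing step (skeleton, labels, boxes) separate from the encoding step means the overlay style can change without touching the output settings.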

Drawing a skeleton on a video is a demo. Measuring that your stance was 3 cm narrower on your forehand today versus last week: that's analysis.

Outcomes & Business Impact

33-Point Skeleton Tracking: Full-body pose estimation with the hand and foot detail essential for tennis biomechanics.
Real-World Measurements: Joint distances in centimeters, not pixels, calibrated per frame for camera distance changes.
Shot Phase Analysis: Compare specific moments across sessions, such as trophy position, contact point, and follow-through.
No Special Hardware: Standard phone or camera video processed on a laptop, with no lab, no sensors, and no markers.
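A cross-session comparison of this kind could be as simple as averaging one measurement within one phase and taking the difference. The sketch below assumes each session is a list of (phase, value in cm) pairs. The function names and all numbers are made up for illustration, not measured results.

```python
from statistics import mean
from typing import Iterable, Tuple

Sample = Tuple[str, float]  # (phase label, measurement in centimeters)


def phase_mean(samples: Iterable[Sample], phase: str) -> float:
    """Average a measurement over all samples tagged with the given phase."""
    values = [value for label, value in samples if label == phase]
    if not values:
        raise ValueError(f"no samples for phase: {phase}")
    return mean(values)


def session_delta(before: Iterable[Sample], after: Iterable[Sample],
                  phase: str) -> float:
    """Change in the phase average from one session to the next."""
    return phase_mean(after, phase) - phase_mean(before, phase)


# Made-up stance widths (cm) for two sessions.
last_week = [("contact zone", 62.0), ("contact zone", 64.0), ("follow-through", 50.0)]
today = [("contact zone", 59.0), ("contact zone", 61.0)]
delta = session_delta(last_week, today, "contact zone")  # negative means narrower
```

Comparing within a phase keeps a narrower preparation stance from being confused with a narrower stance at contact.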

Technologies Used

YOLOv8, MediaPipe BlazePose, OpenCV, FFmpeg, ByteTrack, Python, Apple Silicon, H.264