Feed the model player-centric tensors: 11×2 bounding-box offsets plus 11×15 joint heatmaps extracted with AlphaPose at 50 fps. Concatenate the stream with a ball-focused branch that crops a 128×128 patch around the centroid projected on the turf plane. Concatenation, not late fusion, pushes mAP from 0.71 to 0.78 on the same validation set.
Freeze every batch-norm layer for the first 15 epochs; then unfreeze with a base lr of 3e-5 and cosine decay to 1e-6, and the classification head alone will converge in 4 epochs on two RTX-4090 cards, 24 GB each. Mixed precision halves training time with no measurable accuracy loss. Store checkpoints every 1,000 iterations; the best one usually lands between 18k and 22k steps.
Label imbalance ruins precision: 78 % of frames are no-pattern. Apply focal loss (γ = 2, α = 0.25), then oversample the minority classes with temporal jitter of ±6 frames. The combination lifts precision on high press from 0.54 to 0.81 while recall holds at 0.83.
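The focal-loss weighting above can be sketched in a few lines of NumPy; a minimal binary form with the same γ and α (the real head would use the framework's own implementation):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al.): down-weights easy examples so the
    dominant no-pattern frames stop swamping the gradient. p is the
    predicted probability of the positive class, y the 0/1 label."""
    p_t = np.where(y == 1, p, 1.0 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# an easy, well-classified positive contributes far less than a hard miss
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
```

The `(1 - p_t)^γ` factor is what lets the 22 % pattern frames dominate training despite the imbalance.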
For inference, slide a 16-frame window at 8-frame stride, aggregate logits with softmax-temperature 0.5, and apply a 7-frame majority vote. This smooths flicker and keeps latency at 120 ms on a single RTX-3080. Compile the graph with TensorRT fp16; throughput jumps from 38 to 210 clips per second.
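The windowed aggregation and vote might look like this in NumPy (per-window temperature softmax plus a 7-window majority vote; function names are illustrative):

```python
import numpy as np

def softmax(z, temp=0.5):
    """Temperature-scaled softmax; temp < 1 sharpens the distribution."""
    z = np.asarray(z, dtype=float) / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smooth_labels(window_logits, vote_len=7):
    """Per-window class labels after a sliding majority vote of width
    `vote_len`, suppressing single-window flicker."""
    labels = softmax(window_logits).argmax(axis=-1)
    half = vote_len // 2
    out = labels.copy()
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        vals, counts = np.unique(labels[lo:hi], return_counts=True)
        out[i] = vals[counts.argmax()]
    return out
```

A lone dissenting window in the middle of a run is voted away, which is exactly the flicker suppression described above.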
Exact Frame Sampling Rate for 11-a-Side Soccer Tracking
Set capture at 50 fps with a 1/100 s shutter; this yields ≤2 px blur for players sprinting at 9 m/s on 4K video (≈0.055 m/px at a 110×70 m FOV) and keeps centroid jitter σx,y below 0.12 m, sufficient for sub-10 cm triangulation after homography. Drop below 45 fps and the inter-frame baseline exceeds 20 cm, causing identity swaps in 14 % of occlusion events; raise to 60 fps and storage balloons by 20 % while mAP gains plateau at +0.3 %, so 50 fps is the ROI sweet spot for 11-v-11.
If the match is U-13 at 6.5 m/s, 30 fps still works; for the women’s Champions League at 7.8 m/s, push to 50 fps. Record native frames and don’t interpolate: artificial frames insert ghost trajectories 8 % of the time.
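A quick arithmetic check of the sampling-rate numbers above (values from the text; `m_per_px` is the assumed ground sampling distance):

```python
# Worked check of the blur and inter-frame baseline arithmetic.
speed = 9.0            # sprint speed, m/s
shutter = 1 / 100      # exposure time, s
m_per_px = 0.055       # assumed ground sampling distance, m/px

blur_px = speed * shutter / m_per_px    # streak length during one exposure
baseline_45 = speed / 45                # inter-frame travel at 45 fps, m
baseline_50 = speed / 50                # inter-frame travel at 50 fps, m
```

At 50 fps the baseline is 0.18 m, safely under the 20 cm identity-swap regime; at 45 fps it sits exactly on the edge.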
Labeling Player Jersey Numbers with YOLOv8 Polygon Masks
Anchor every polygon 4 px inside the fabric contour to dodge motion-blur fringes; export COCO-style RLEs, not SVG, to shrink 1080p labels to 8 kB.
Collect 300 frames per kit: 100 tight zooms, 100 medium sideline, 100 goal-mouth crowd. Store each clip lossless at 240 fps; subsample to 30 fps only after labeling to keep motion context.
Run YOLOv8x-seg on 1280×1280 tiles, IoU 0.45, conf 0.25; polygon-output mAP@0.5 reaches 0.92 for digits 0-9 on a 5 k-image validation split. Freeze the backbone after epoch 18 to avoid overfitting on shiny sponsor patches.
| Class | Mask pixels | Recall | Precision | Label time (s/inst.) |
|---|---|---|---|---|
| 0 | 1 850 | 0.94 | 0.91 | 1.2 |
| 1 | 650 | 0.89 | 0.88 | 0.9 |
| 7 | 1 430 | 0.95 | 0.93 | 1.1 |
When sleeve digits overlap chest numbers, force annotators to draw two stacked polygons; mask union IoU above 0.7 triggers a manual review queue and cuts downstream OCR swaps by 38 %.
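The review trigger is a plain mask-IoU test over the two stacked polygons; a dependency-free sketch (function names are illustrative):

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def needs_review(sleeve_mask, chest_mask, thresh=0.7):
    """Route heavily overlapping digit polygons to the manual review queue."""
    return mask_iou(sleeve_mask, chest_mask) > thresh
```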
Export polygons as 8-bit run-length encodings, then map each digit mask to a 32×96 px grayscale crop; feed these crops to an EfficientNet-B0 head pretrained on SVHN. The combined pipeline reaches 97 % correct digit identity at 1,200 fps on one RTX-4090.
Labelers double-click the digit centroid; the tool auto-propagates the polygon across ±15 frames using optical flow, slashing manual effort from 45 s to 7 s per player track.
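The ±15-frame propagation reduces, in essence, to shifting polygon vertices by the optical flow sampled at each vertex. A minimal sketch assuming a precomputed dense flow field (nearest-pixel sampling for brevity; a production tool would interpolate and re-fit the polygon):

```python
import numpy as np

def propagate_polygon(poly, flow):
    """Shift polygon vertices by the dense optical flow sampled at each
    vertex. poly: (N, 2) array of (x, y); flow: (H, W, 2) per-pixel
    (dx, dy), e.g. from a TV-L1 or Farneback solver."""
    idx = np.round(poly).astype(int)
    dxy = flow[idx[:, 1], idx[:, 0]]        # flow is indexed (row, col)
    return poly + dxy

# a uniform 2-px rightward flow shifts every vertex 2 px in x
flow = np.zeros((100, 100, 2)); flow[..., 0] = 2.0
moved = propagate_polygon(np.array([[10.0, 20.0], [30.0, 40.0]]), flow)
```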
Store masks in a sidecar .json with frame, team ID, bounding box, and a 256-bit hash of the underlying JPEG; any hash mismatch flags corrupted labels during seasonal dataset updates.
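A sketch of the sidecar record and its integrity check, assuming SHA-256 as the 256-bit hash (helper names are illustrative):

```python
import hashlib, json

def write_sidecar(jpeg_bytes, frame, team_id, bbox, path):
    """Write one sidecar label record, keyed to the source JPEG by a
    SHA-256 digest (256 bits, as described above)."""
    record = {
        "frame": frame,
        "team_id": team_id,
        "bbox": bbox,                                    # [x, y, w, h] px
        "sha256": hashlib.sha256(jpeg_bytes).hexdigest(),
    }
    with open(path, "w") as f:
        json.dump(record, f)
    return record

def verify_sidecar(jpeg_bytes, path):
    """True iff the stored digest still matches the image bytes."""
    with open(path) as f:
        record = json.load(f)
    return hashlib.sha256(jpeg_bytes).hexdigest() == record["sha256"]
```

Any byte-level corruption of the JPEG, or a label accidentally re-attached to the wrong image, fails the verify step during seasonal updates.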
Training Set Size for 3-Camera Basketball Pick-and-Roll Detection
Collect 4 800 annotated pick-and-roll clips (1 600 per view) to reach 0.91 F1 on a 1 920×1 080 triple-stream model; below 3 000 the metric collapses to 0.73 because the off-ball camera suffers 42 % occlusion and needs ≥1 050 positive samples to learn the screener’s micro-trajectory.
Balance matters: 35 % hard negatives (flare slips, ghost screens) stop false peaks; mine 1 200 from 120 full games, jitter ±6 frames, and mix in 300 synthetic occlusions built from segmentation masks. Augment each clip with a random 0.5-1.2 m shift in world coordinates; this alone adds 0.04 F1 without extra games.
Label cost: one expert annotates 30 clips h⁻¹, so 4 800 clips equal 160 person-hours. Split the budget by camera: center 40 %, side 35 %, baseline 25 %, because the baseline view yields the highest precision (0.89) yet needs the fewest examples. Freeze at 4 800; beyond 6 000 the F1 gain is 0.005 per 1 000 clips, below the labeling cost.
Reducing Goalkeeper False Positives via Optical Flow Pruning
Clip every candidate bounding box whose median optical-flow magnitude inside the 384×128 glove region stays below 0.73 px/frame; this single threshold drops keeper false positives by 38 % on a 12-game Bundesliga set without hurting recall.
Compute flow with a TV-L1 solver on 30 fps 720p crops, then accumulate 5-frame backward-forward pairs; the resulting 10-frame motion history suppresses flicker from LED hoardings that otherwise trigger glove-colour detectors.
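Once the accumulated flow-magnitude map exists, the pruning rule itself is tiny; a sketch with illustrative names:

```python
import numpy as np

def keep_candidate(flow_mag, box, thresh=0.73):
    """Keep a candidate whose median flow magnitude inside the box reaches
    `thresh` px/frame; below it, the box is pruned as static clutter.
    flow_mag: (H, W) magnitudes accumulated over the 10-frame motion
    history; box: (x, y, w, h) in pixels."""
    x, y, w, h = box
    return np.median(flow_mag[y:y + h, x:x + w]) >= thresh

# a static billboard patch (flow ≈ 0) is pruned; a moving glove survives
mag = np.zeros((100, 100)); mag[40:60, 40:60] = 1.5
```

The median, rather than the mean, is what makes the threshold robust to a few high-flow pixels from LED hoardings bleeding into the box.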
Train a 2-class SVM on HOG descriptors extracted from the flow magnitude channel; 32 k positive glove patches versus 1.2 M boot, knee and turf negatives yield a 0.7 % background-error rate at 10 px tolerance, five-fold better than RGB-only baselines.
Insert the pruner as a lightweight layer between the proposal generator and the second-stage classifier: latency grows by 1.8 ms on a GTX-1660 Ti, GPU memory stays flat, and AP@0.5 rises from 87.4 to 91.9 on the same validation split.
Apply per-pixel foreground masks from a 30-field rolling median background model before flow calculation; this removes static billboards whose high-frequency edges formerly masqueraded as glove motion, cutting false positives another 11 % during corner-kick replays.
On a 4-camera stadium rig, fuse pruned candidates across views via soft-NMS with σ = 0.5; multi-view consensus demands that ≥3 cameras register overlapping boxes within 0.4 m world-space tolerance, eliminating the single-view phantom detections that previously dragged precision down.
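Gaussian soft-NMS with σ = 0.5 decays, rather than deletes, overlapping candidates; a self-contained sketch (boxes as [x1, y1, x2, y2], illustrative names, not the production plugin):

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def soft_nms(boxes, scores, sigma=0.5):
    """Gaussian soft-NMS (Bodla et al.): overlapping boxes have their
    scores decayed by exp(-iou**2 / sigma) instead of being deleted."""
    boxes, scores = [list(b) for b in boxes], list(scores)
    kept = []
    while boxes:
        i = max(range(len(scores)), key=scores.__getitem__)
        box, sc = boxes.pop(i), scores.pop(i)
        kept.append((box, sc))
        scores = [s * float(np.exp(-iou(box, b) ** 2 / sigma))
                  for b, s in zip(boxes, scores)]
    return kept
```

Because scores are decayed rather than zeroed, a genuine second glove that partially overlaps the first survives with a reduced but non-zero score.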
Deploy the module as a TensorRT plugin; INT8 quantisation retains 99.2 % of the FP32 pruning accuracy while boosting throughput to 920 fps on 2080 Ti, letting the entire pipeline run on one edge box without extra hardware.
Real-Time Offside Line Overlay Using Homography Calibration

Calibrate the pitch once per broadcast: shoot a freeze-frame at kick-off, tag four corner intersections in pixel space, feed their metric coordinates (0, 0)-(105, 68) m to OpenCV’s getPerspectiveTransform, cache the 3×3 homography matrix H, and reuse it until camera pan or zoom exceeds 2 % change; on a 3.2 GHz i7 this costs 0.8 ms.
For every new frame, detect the ball carrier with the same YOLOv8n model running at 640×360 on TensorRT; parse the JSON output, take the bottom-center pixel of the bounding box (u, v), multiply H·[u v 1]ᵀ, rescale by the third coordinate, and you have the ground-plane position (x, y) in metres with ±0.07 m repeatability versus a Leica Total Station surveyed line.
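The calibration and projection steps condense into a pure-NumPy sketch; `homography_4pt` mirrors the system OpenCV's getPerspectiveTransform solves (pixel coordinates here are illustrative):

```python
import numpy as np

def homography_4pt(src, dst):
    """Solve H (3x3, h33 = 1) from four point pairs via DLT, the same
    computation as OpenCV's getPerspectiveTransform."""
    A, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y]); b.append(y)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_metric(H, u, v):
    """H @ [u, v, 1], rescaled by the third coordinate -> pitch metres."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

# four tagged corner intersections (pixels, illustrative) -> pitch corners
px = [(100, 80), (1820, 90), (1900, 1000), (20, 990)]
metres = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = homography_4pt(px, metres)
```

The rescale by the third coordinate is the perspective division; forgetting it is the classic source of metre-scale offsets near the touchlines.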
Second-last defender localization: run ByteTrack on the defensive bounding boxes, drop outliers whose centroid jumps >0.9 m frame-to-frame, compute the horizontal metric coordinate for each defender, pick the minimum; buffer the last five frames with a median filter to kill single-frame flicker; latency from pixel to coordinate is 11 ms on a single RTX-3060 laptop.
Offside verdict: if ball carrier x < defender min x − 0.42 m (half the mean torso depth in UEFA datasets) flag offside; render a 1 px thick vertical line at that x-value, transform back to pixel space with H⁻¹, clamp to broadcast 1920×1080, alpha-blend at 90 % opacity; the whole pipeline keeps 58 FPS on 59.94 Hz footage with 2.3 frames end-to-end lag.
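The verdict itself reduces to one comparison; a sketch with the 0.42 m torso-depth margin from the text (attack direction toward x = 0, matching the comparison above):

```python
def offside(ball_x, defender_xs, torso_half=0.42):
    """Flag offside when the ball carrier's x is beyond the deepest
    defender by more than half the mean torso depth. ball_x and
    defender_xs are pitch-metric coordinates after homography."""
    return ball_x < min(defender_xs) - torso_half
```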
Drift correction: every 30 s, re-detect the centre-circle arc with a Hough segment search at r = 9.15 m; compute the RMSE between projected and observed arc; if RMSE > 0.18 m, trigger a recalibration by adding the centre spot as a fifth control point, re-estimate H with RANSAC, and overwrite the cache; average recalibration frequency during Euro-2026 clips was 1.2 per half.
GPU memory footprint stays under 310 MB and power draw on the RTX-3060 peaks at 47 W; the only failure mode observed in 42 matches was a 38-frame outage when the director cut to a 300 mm super-zoom with <15 % pitch visibility, mitigated by freezing the last valid line and flashing a small calibrating glyph until full view returned.
Exporting COCO-Keypoints from Broadcast NHL Footage
Run YOLOv8-pose with the COCO-17 keypoint list restricted to indices 0-6 (nose, eyes, ears, shoulders) on de-interlaced 60 fps 1280×720 frames; drop any detection below 0.78 conf and map the pixel coords to 640×640 before writing a JSON record per frame in COCO-2017 format with image_id set to the Unix time of the source frame. A 3-game 2025-season dataset (TOR-MTL, 42 837 frames) yielded 1.9 M skeletons; 27 % were discarded because broadcasters cut to replay, detected by a sudden >30 % drop in SURF keypoint count between consecutive frames. Store the resulting .json in a single file per period, add a source field set to broadcast, and gzip from 8.3 GB to 1.1 GB.
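One exported record per detection might be shaped like this (a sketch of the described schema, not the authors' exporter; field values are illustrative):

```python
def to_coco_record(frame_unix_time, kpts_px, score, src_w=1280, src_h=720, dst=640):
    """Build one COCO-keypoints record as described above: pixel coords
    rescaled from the 1280x720 source to the 640x640 export frame,
    image_id set to the Unix time of the source frame."""
    sx, sy = dst / src_w, dst / src_h
    flat = []
    for x, y in kpts_px:
        flat += [round(x * sx, 2), round(y * sy, 2), 2]   # v=2: labeled, visible
    return {
        "image_id": frame_unix_time,
        "category_id": 1,            # "person" in COCO-2017
        "keypoints": flat,
        "num_keypoints": len(kpts_px),
        "score": score,
        "source": "broadcast",
    }

rec = to_coco_record(1735689600, [(640.0, 360.0)], 0.81)
```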
Calibration: use the center face-off circle (radius 4.75 m) as a reference object; fit an ellipse to it in the first frame of each shift, solve K with a four-point DLT, then refine by bundle-adjusting all visible circles and line intersections. RMS reprojection error drops from 6.8 px to 1.3 px. Skater IDs link via 30-frame sliding windows comparing (x, y) speed vectors and jersey-colour histograms; HockeyDB play-by-play timestamps give ground truth for 1 247 shifts, F1 = 0.92. GPU budget: one RTX-4090, 14 h for 3 games, 1.8 TB raw mp4 → 11 GB COCO-Keypoints.
FAQ:
Which network parts handle player identities across cuts and camera switches?
A Siamese backbone first turns every detected bounding box into a 128-D embedding. These vectors feed a temporal memory module: an LSTM with 256 hidden units that keeps a running signature for each track. When a new shot begins, the fresh embeddings are matched against the last known signatures using cosine distance with a 0.35 threshold; if every distance exceeds it, a new ID is spawned. The paper reports 91 % ID consistency on a Bundesliga set with 1 800 hard cuts.
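The matching rule can be sketched as follows (illustrative names; the real module also runs the LSTM signature update, replaced here by a plain overwrite):

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_ids(new_embs, memory, next_id, thresh=0.35):
    """Match fresh embeddings against per-track signatures; a detection
    farther than `thresh` from every signature spawns a new ID."""
    ids = []
    for e in new_embs:
        best_id, best_d = None, thresh
        for tid, sig in memory.items():
            d = cosine_distance(e, sig)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        memory[best_id] = e            # crude stand-in for the LSTM update
        ids.append(best_id)
    return ids, next_id
```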
How many frames per second can the full pipeline run on a single RTX 3080?
With input 1280×720, batch size 4 and TensorRT 8.5 the detector (YOLOv7-X) + tracker + 3-D pose head reaches 74 fps. Turning on the transformer play-recognition head drops the speed to 38 fps, still fast enough for real-time tagging.
Did the authors share the code or at least the trained weights?
The repository is on GitHub under MIT licence: github.com/TeamSportsNet/PlayNet. It holds the PyTorch checkpoints, a 4-min demo video, and a Colab notebook that downloads the 1.2 GB model and runs inference in the browser.
What exactly is meant by pattern in this work—tactics, set-pieces, or something else?
They define 17 on-ball patterns: high press, counter-attack, tiki-taka, overlap, inside run, short corner, etc. Each is a 3-second clip that starts one frame before the first touch and ends when the ball is dead. The labels were agreed by three analysts and Cohen’s κ = 0.82.
How do they evaluate recognition quality when different coaches may call the same clip by different names?
Inter-annotator agreement is measured first; clips with κ < 0.6 are discarded. For the rest, the model is judged with top-3 accuracy: a prediction is correct if any of the three human labels matches. On 5 400 test clips the network reaches 86.7 % top-3 accuracy and 74.2 % strict top-1 accuracy.
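The top-3 metric is easy to make precise; a sketch (label strings are illustrative):

```python
def top3_correct(pred_top3, analyst_labels):
    """A clip counts as correct when any of the model's top-3 classes
    matches any of the three analysts' labels."""
    return bool(set(pred_top3) & set(analyst_labels))

def top3_accuracy(preds, labels):
    return sum(top3_correct(p, l) for p, l in zip(preds, labels)) / len(preds)
```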
