Feed the model player-centric tensors: 11×2 bounding-box offsets plus 11×15 joint heatmaps extracted with AlphaPose at 50 fps. Concatenate the stream with a ball-focused branch that crops a 128×128 patch around the centroid projected on the turf plane. Concatenation, not late fusion, pushes mAP from 0.71 to 0.78 on the same validation set.
Freeze every batch-norm layer for the first 15 epochs; then unfreeze with a base lr of 3e-5 and cosine decay to 1e-6, and the classification head alone will converge in 4 epochs on two RTX-4090 cards, 24 GB each. Mixed precision halves training time with no measurable accuracy loss. Store checkpoints every 1,000 iterations; the best one usually lands between 18k and 22k steps.
Label imbalance ruins precision: 78 % of frames are no-pattern. Apply focal loss (γ = 2, α = 0.25), then oversample the minority classes with temporal jitter of ±6 frames. The combination lifts precision on high press from 0.54 to 0.81 while recall holds at 0.83.
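The focal-loss weighting above can be sketched in a few lines of NumPy; a minimal binary form with the same γ and α (the real head would use the framework's own implementation):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al.): down-weights easy examples so the
    dominant no-pattern frames stop swamping the gradient. p is the
    predicted probability of the positive class, y the 0/1 label."""
    p_t = np.where(y == 1, p, 1.0 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# an easy, well-classified positive contributes far less than a hard miss
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
```

The `(1 - p_t)^γ` factor is what lets the 22 % pattern frames dominate training despite the imbalance.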
For inference, slide a 16-frame window at 8-frame stride, aggregate logits with softmax-temperature 0.5, and apply a 7-frame majority vote. This smooths flicker and keeps latency at 120 ms on a single RTX-3080. Compile the graph with TensorRT fp16; throughput jumps from 38 to 210 clips per second.
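The windowed aggregation and vote might look like this in NumPy (per-window temperature softmax plus a 7-window majority vote; function names are illustrative):

```python
import numpy as np

def softmax(z, temp=0.5):
    """Temperature-scaled softmax; temp < 1 sharpens the distribution."""
    z = np.asarray(z, dtype=float) / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smooth_labels(window_logits, vote_len=7):
    """Per-window class labels after a sliding majority vote of width
    `vote_len`, suppressing single-window flicker."""
    labels = softmax(window_logits).argmax(axis=-1)
    half = vote_len // 2
    out = labels.copy()
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        vals, counts = np.unique(labels[lo:hi], return_counts=True)
        out[i] = vals[counts.argmax()]
    return out
```

A lone dissenting window in the middle of a run is voted away, which is exactly the flicker suppression described above.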
Exact Frame Sampling Rate for 11-a-Side Soccer Tracking
Set capture at 50 fps with a 1/100 s shutter; this yields ≤2 px blur for players sprinting at 9 m/s on 4K video (≈0.055 m/px at a 110×70 m FOV) and keeps centroid jitter σx,y below 0.12 m, sufficient for sub-10 cm triangulation after homography. Drop below 45 fps and the inter-frame baseline exceeds 20 cm, causing identity swaps in 14 % of occlusion events; raise to 60 fps and storage balloons by 20 % while mAP gains plateau at +0.3 %, so 50 fps is the ROI sweet spot for 11-v-11.
If the match is U-13 at 6.5 m/s, 30 fps still works; for the women’s Champions League at 7.8 m/s, push to 50 fps. Record native frames and don’t interpolate: artificial frames insert ghost trajectories 8 % of the time.
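A quick arithmetic check of the sampling-rate numbers above (values from the text; `m_per_px` is the assumed ground sampling distance):

```python
# Worked check of the blur and inter-frame baseline arithmetic.
speed = 9.0            # sprint speed, m/s
shutter = 1 / 100      # exposure time, s
m_per_px = 0.055       # assumed ground sampling distance, m/px

blur_px = speed * shutter / m_per_px    # streak length during one exposure
baseline_45 = speed / 45                # inter-frame travel at 45 fps, m
baseline_50 = speed / 50                # inter-frame travel at 50 fps, m
```

At 50 fps the baseline is 0.18 m, safely under the 20 cm identity-swap regime; at 45 fps it sits exactly on the edge.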
Labeling Player Jersey Numbers with YOLOv8 Polygon Masks
Anchor every polygon 4 px inside the fabric contour to dodge motion-blur fringes; export COCO-style RLEs, not SVG, to shrink 1080p labels to 8 kB.
Collect 300 frames per kit: 100 tight zooms, 100 medium sideline, 100 goal-mouth crowd. Store each clip lossless at 240 fps; subsample to 30 fps only after labeling to keep motion context.
Run YOLOv8x-seg on 1280×1280 tiles, IoU 0.45, conf 0.25; polygon-output mAP@0.5 reaches 0.92 for digits 0-9 on a 5 k-image validation split. Freeze the backbone after epoch 18 to avoid overfitting on shiny sponsor patches.
| Class | Mask pixels | Recall | Precision | Label time (s/inst.) |
|---|---|---|---|---|
| 0 | 1 850 | 0.94 | 0.91 | 1.2 |
| 1 | 650 | 0.89 | 0.88 | 0.9 |
| 7 | 1 430 | 0.95 | 0.93 | 1.1 |
When sleeve digits overlap chest numbers, force annotators to draw two stacked polygons; mask union IoU above 0.7 triggers a manual review queue and cuts downstream OCR swaps by 38 %.
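The review trigger is a plain mask-IoU test over the two stacked polygons; a dependency-free sketch (function names are illustrative):

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def needs_review(sleeve_mask, chest_mask, thresh=0.7):
    """Route heavily overlapping digit polygons to the manual review queue."""
    return mask_iou(sleeve_mask, chest_mask) > thresh
```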
Export polygons as 8-bit run-length encodings, then map each digit mask to a 32×96 px grayscale crop; feed these crops to an EfficientNet-B0 head pretrained on SVHN. The combined pipeline reaches 97 % correct digit identity at 1,200 fps on one RTX-4090.
Labelers double-click the digit centroid; the tool auto-propagates the polygon across ±15 frames using optical flow, slashing manual effort from 45 s to 7 s per player track.
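The ±15-frame propagation reduces, in essence, to shifting polygon vertices by the optical flow sampled at each vertex. A minimal sketch assuming a precomputed dense flow field (nearest-pixel sampling for brevity; a production tool would interpolate and re-fit the polygon):

```python
import numpy as np

def propagate_polygon(poly, flow):
    """Shift polygon vertices by the dense optical flow sampled at each
    vertex. poly: (N, 2) array of (x, y); flow: (H, W, 2) per-pixel
    (dx, dy), e.g. from a TV-L1 or Farneback solver."""
    idx = np.round(poly).astype(int)
    dxy = flow[idx[:, 1], idx[:, 0]]        # flow is indexed (row, col)
    return poly + dxy

# a uniform 2-px rightward flow shifts every vertex 2 px in x
flow = np.zeros((100, 100, 2)); flow[..., 0] = 2.0
moved = propagate_polygon(np.array([[10.0, 20.0], [30.0, 40.0]]), flow)
```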
Store masks in a sidecar .json with frame, team ID, bounding box, and a 256-bit hash of the underlying JPEG; any hash mismatch flags corrupted labels during seasonal dataset updates.
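A sketch of the sidecar record and its integrity check, assuming SHA-256 as the 256-bit hash (helper names are illustrative):

```python
import hashlib, json

def write_sidecar(jpeg_bytes, frame, team_id, bbox, path):
    """Write one sidecar label record, keyed to the source JPEG by a
    SHA-256 digest (256 bits, as described above)."""
    record = {
        "frame": frame,
        "team_id": team_id,
        "bbox": bbox,                                    # [x, y, w, h] px
        "sha256": hashlib.sha256(jpeg_bytes).hexdigest(),
    }
    with open(path, "w") as f:
        json.dump(record, f)
    return record

def verify_sidecar(jpeg_bytes, path):
    """True iff the stored digest still matches the image bytes."""
    with open(path) as f:
        record = json.load(f)
    return hashlib.sha256(jpeg_bytes).hexdigest() == record["sha256"]
```

Any byte-level corruption of the JPEG, or a label accidentally re-attached to the wrong image, fails the verify step during seasonal updates.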
Training Set Size for 3-Camera Basketball Pick-and-Roll Detection
Collect 4 800 annotated pick-and-roll clips (1 600 per view) to reach 0.91 F1 on a 1 920×1 080 triple-stream model; below 3 000 the metric collapses to 0.73 because the off-ball camera suffers 42 % occlusion and needs ≥1 050 positive samples to learn the screener’s micro-trajectory.
Balance matters: 35 % hard negatives (flare slips, ghost screens) stop false peaks; mine 1 200 from 120 full games, jitter ±6 frames, and mix in 300 synthetic occlusions built from segmentation masks. Augment each clip with a random 0.5-1.2 m shift in world coordinates; this alone adds 0.04 F1 without extra games.
Label cost: one expert annotates 30 clips h⁻¹, so 4 800 clips equal 160 person-hours. Split the budget by camera: center 40 %, side 35 %, baseline 25 %, because the baseline view yields the highest precision (0.89) yet needs the fewest examples. Freeze at 4 800; beyond 6 000 the F1 gain is 0.005 per 1 000 clips, below the labeling cost.
Reducing Goalkeeper False Positives via Optical Flow Pruning
Clip every candidate bounding box whose median optical-flow magnitude inside the 384×128 glove region stays below 0.73 px/frame; this single threshold drops keeper false positives by 38 % on a 12-game Bundesliga set without hurting recall.
Compute flow with a TV-L1 solver on 30 fps 720p crops, then accumulate 5-frame backward-forward pairs; the resulting 10-frame motion history suppresses flicker from LED hoardings that otherwise trigger glove-colour detectors.
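Once the accumulated flow-magnitude map exists, the pruning rule itself is tiny; a sketch with illustrative names:

```python
import numpy as np

def keep_candidate(flow_mag, box, thresh=0.73):
    """Keep a candidate whose median flow magnitude inside the box reaches
    `thresh` px/frame; below it, the box is pruned as static clutter.
    flow_mag: (H, W) magnitudes accumulated over the 10-frame motion
    history; box: (x, y, w, h) in pixels."""
    x, y, w, h = box
    return np.median(flow_mag[y:y + h, x:x + w]) >= thresh

# a static billboard patch (flow ≈ 0) is pruned; a moving glove survives
mag = np.zeros((100, 100)); mag[40:60, 40:60] = 1.5
```

The median, rather than the mean, is what makes the threshold robust to a few high-flow pixels from LED hoardings bleeding into the box.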
Train a 2-class SVM on HOG descriptors extracted from the flow magnitude channel; 32 k positive glove patches versus 1.2 M boot, knee and turf negatives yield a 0.7 % background-error rate at 10 px tolerance, five-fold better than RGB-only baselines.
Insert the pruner as a lightweight layer between the proposal generator and the second-stage classifier: latency grows by 1.8 ms on a GTX-1660 Ti, GPU memory stays flat, and AP@0.5 rises from 87.4 to 91.9 on the same validation split.
Apply per-pixel foreground masks from a 30-field rolling median background model before flow calculation; this removes static billboards whose high-frequency edges formerly masqueraded as glove motion, cutting false positives another 11 % during corner-kick replays.
On a 4-camera stadium rig, fuse pruned candidates across views via soft-NMS with σ = 0.5; multi-view consensus demands that ≥3 cameras register overlapping boxes within 0.4 m world-space tolerance, eliminating the single-view phantom detections that previously dragged precision down.
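Gaussian soft-NMS with σ = 0.5 decays, rather than deletes, overlapping candidates; a self-contained sketch (boxes as [x1, y1, x2, y2], illustrative names, not the production plugin):

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def soft_nms(boxes, scores, sigma=0.5):
    """Gaussian soft-NMS (Bodla et al.): overlapping boxes have their
    scores decayed by exp(-iou**2 / sigma) instead of being deleted."""
    boxes, scores = [list(b) for b in boxes], list(scores)
    kept = []
    while boxes:
        i = max(range(len(scores)), key=scores.__getitem__)
        box, sc = boxes.pop(i), scores.pop(i)
        kept.append((box, sc))
        scores = [s * float(np.exp(-iou(box, b) ** 2 / sigma))
                  for b, s in zip(boxes, scores)]
    return kept
```

Because scores are decayed rather than zeroed, a genuine second glove that partially overlaps the first survives with a reduced but non-zero score.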
Deploy the module as a TensorRT plugin; INT8 quantisation retains 99.2 % of the FP32 pruning accuracy while boosting throughput to 920 fps on 2080 Ti, letting the entire pipeline run on one edge box without extra hardware.
Real-Time Offside Line Overlay Using Homography Calibration

Calibrate the pitch once per broadcast: shoot a freeze-frame at kick-off, tag four corner intersections in pixel space, feed their metric coordinates (0, 0)-(105, 68) m to OpenCV’s getPerspectiveTransform, cache the 3×3 homography matrix H, and reuse it until camera pan or zoom exceeds 2 % change; on a 3.2 GHz i7 this costs 0.8 ms.
For every new frame, detect the ball carrier with the same YOLOv8n model running at 640×360 on TensorRT; parse the JSON output, take the bottom-center pixel of the bounding box (u, v), multiply H·[u v 1]ᵀ, rescale by the third coordinate, and you have the ground-plane position (x, y) in metres with ±0.07 m repeatability versus a Leica Total Station surveyed line.
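The calibration and projection steps condense into a pure-NumPy sketch; `homography_4pt` mirrors the system OpenCV's getPerspectiveTransform solves (pixel coordinates here are illustrative):

```python
import numpy as np

def homography_4pt(src, dst):
    """Solve H (3x3, h33 = 1) from four point pairs via DLT, the same
    computation as OpenCV's getPerspectiveTransform."""
    A, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y]); b.append(y)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_metric(H, u, v):
    """H @ [u, v, 1], rescaled by the third coordinate -> pitch metres."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

# four tagged corner intersections (pixels, illustrative) -> pitch corners
px = [(100, 80), (1820, 90), (1900, 1000), (20, 990)]
metres = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = homography_4pt(px, metres)
```

The rescale by the third coordinate is the perspective division; forgetting it is the classic source of metre-scale offsets near the touchlines.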
Second-last defender localization: run ByteTrack on the defensive bounding boxes, drop outliers whose centroid jumps >0.9 m frame-to-frame, compute the horizontal metric coordinate for each defender, pick the minimum; buffer the last five frames with a median filter to kill single-frame flicker; latency from pixel to coordinate is 11 ms on a single RTX-3060 laptop.
Offside verdict: if ball carrier x < defender min x − 0.42 m (half the mean torso depth in UEFA datasets) flag offside; render a 1 px thick vertical line at that x-value, transform back to pixel space with H⁻¹, clamp to broadcast 1920×1080, alpha-blend at 90 % opacity; the whole pipeline keeps 58 FPS on 59.94 Hz footage with 2.3 frames end-to-end lag.
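The verdict itself reduces to one comparison; a sketch with the 0.42 m torso-depth margin from the text (attack direction toward x = 0, matching the comparison above):

```python
def offside(ball_x, defender_xs, torso_half=0.42):
    """Flag offside when the ball carrier's x is beyond the deepest
    defender by more than half the mean torso depth. ball_x and
    defender_xs are pitch-metric coordinates after homography."""
    return ball_x < min(defender_xs) - torso_half
```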
Drift correction: every 30 s, re-detect the centre-circle arc with a Hough segment search at r = 9.15 m; compute the RMSE between projected and observed arc; if RMSE > 0.18 m, trigger a recalibration by adding the centre spot as a fifth control point, re-estimate H with RANSAC, and overwrite the cache; average recalibration frequency during Euro-2026 clips was 1.2 per half.
GPU memory footprint stays under 310 MB and power draw on the RTX-3060 peaks at 47 W; the only failure mode observed in 42 matches was a 38-frame outage when the director cut to a 300 mm super-zoom with <15 % pitch visibility, mitigated by freezing the last valid line and flashing a small calibrating glyph until full view returned.
Exporting COCO-Keypoints from Broadcast NHL Footage
Run YOLOv8-pose with the COCO-17 keypoint list restricted to indices 0-6 (nose, eyes, ears, shoulders) on de-interlaced 60 fps 1280×720 frames; drop any detection below 0.78 conf and map the pixel coords to 640×640 before writing a JSON record per frame in COCO-2017 format with image_id set to the Unix time of the source frame. A 3-game 2025-season dataset (TOR-MTL, 42 837 frames) yielded 1.9 M skeletons; 27 % were discarded because broadcasters cut to replay, detected by a sudden >30 % drop in SURF keypoint count between consecutive frames. Store the resulting .json in a single file per period, add a source field set to broadcast, and gzip from 8.3 GB to 1.1 GB.
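One exported record per detection might be shaped like this (a sketch of the described schema, not the authors' exporter; field values are illustrative):

```python
def to_coco_record(frame_unix_time, kpts_px, score, src_w=1280, src_h=720, dst=640):
    """Build one COCO-keypoints record as described above: pixel coords
    rescaled from the 1280x720 source to the 640x640 export frame,
    image_id set to the Unix time of the source frame."""
    sx, sy = dst / src_w, dst / src_h
    flat = []
    for x, y in kpts_px:
        flat += [round(x * sx, 2), round(y * sy, 2), 2]   # v=2: labeled, visible
    return {
        "image_id": frame_unix_time,
        "category_id": 1,            # "person" in COCO-2017
        "keypoints": flat,
        "num_keypoints": len(kpts_px),
        "score": score,
        "source": "broadcast",
    }

rec = to_coco_record(1735689600, [(640.0, 360.0)], 0.81)
```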
Calibration: use the center face-off circle (radius 4.75 m) as a reference object; fit an ellipse to it in the first frame of each shift, solve K with a four-point DLT, then refine by bundle-adjusting all visible circles and line intersections. RMS reprojection error drops from 6.8 px to 1.3 px. Skater IDs link via 30-frame sliding windows comparing (x, y) speed vectors and jersey-colour histograms; HockeyDB play-by-play timestamps give ground truth for 1 247 shifts, F1 = 0.92. GPU budget: one RTX-4090, 14 h for 3 games, 1.8 TB raw mp4 → 11 GB COCO-Keypoints.
FAQ:
Which network parts handle player identities across cuts and camera switches?
A Siamese backbone first turns every detected bounding box into a 128-D embedding. These vectors feed a temporal memory module: an LSTM with 256 hidden units that keeps a running signature for each track. When a new shot begins, the fresh embeddings are matched against the last known signatures using cosine distance with a 0.35 threshold; if every distance exceeds it, a new ID is spawned. The paper reports 91 % ID consistency on a Bundesliga set with 1 800 hard cuts.
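The matching rule can be sketched as follows (illustrative names; the real module also runs the LSTM signature update, replaced here by a plain overwrite):

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_ids(new_embs, memory, next_id, thresh=0.35):
    """Match fresh embeddings against per-track signatures; a detection
    farther than `thresh` from every signature spawns a new ID."""
    ids = []
    for e in new_embs:
        best_id, best_d = None, thresh
        for tid, sig in memory.items():
            d = cosine_distance(e, sig)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        memory[best_id] = e            # crude stand-in for the LSTM update
        ids.append(best_id)
    return ids, next_id
```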
How many frames per second can the full pipeline run on a single RTX 3080?
With input 1280×720, batch size 4 and TensorRT 8.5 the detector (YOLOv7-X) + tracker + 3-D pose head reaches 74 fps. Turning on the transformer play-recognition head drops the speed to 38 fps, still fast enough for real-time tagging.
Did the authors share the code or at least the trained weights?
The repository is on GitHub under MIT licence: github.com/TeamSportsNet/PlayNet. It holds the PyTorch checkpoints, a 4-min demo video, and a Colab notebook that downloads the 1.2 GB model and runs inference in the browser.
What exactly is meant by pattern in this work—tactics, set-pieces, or something else?
They define 17 on-ball patterns: high press, counter-attack, tiki-taka, overlap, inside run, short corner, etc. Each is a 3-second clip that starts one frame before the first touch and ends when the ball is dead. The labels were agreed by three analysts and Cohen’s κ = 0.82.
How do they evaluate recognition quality when different coaches may call the same clip by different names?
Inter-annotator agreement is measured first; clips with κ < 0.6 are discarded. For the rest, the model is judged with top-3 accuracy: a prediction is correct if any of the three human labels matches. On 5 400 test clips the network reaches 86.7 % top-3 accuracy and 74.2 % strict top-1 accuracy.
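The top-3 metric is easy to make precise; a sketch (label strings are illustrative):

```python
def top3_correct(pred_top3, analyst_labels):
    """A clip counts as correct when any of the model's top-3 classes
    matches any of the three analysts' labels."""
    return bool(set(pred_top3) & set(analyst_labels))

def top3_accuracy(preds, labels):
    return sum(top3_correct(p, l) for p, l in zip(preds, labels)) / len(preds)
```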
