Results v3

In this post, we will cover:

Recap
Results of the experiment
Improvements to be made

Recap

We trained using YOLO v10n (previous was v8n), using a dataset of 252 in-game images (same as previous test), as well as 33 real-life images. We gave it the following classes:

# Classes
names:
LSV
MRAP
Pickup_Truck
Civvie_Car
Misc

For in-game data,

Images of vehicles were taken in different places, though all were done in bright daylight.
Distances of vehicles from camera also varied.
Most were taken from a drone, so they were mostly top-down view.

For real-life data,

Images of vehicles taken in similar places (road-side), mostly in daylight.
Similar distance of vehicle from camera.
All taken at eye-level or waist-level.

Sample images of real-life data. | | | |-|-| | | | | | |

Results of the experiment

Recall that one of the main objectives is to run this model on real-life data.

We have a small dataset of images of Singapore’s military assets, mainly consiting of the Light Strike Vehicle Mark 2. Images were mostly taken with a front or side-profile, taken at street level.

(This was the same test dataset used in the previous tests.)

We only really expect it to detect the LSV, since Arma 3’s Qilin is based on Singapore’s Light Strike Vehicle Mark 2.

THERE ARE 3 PARTS TO THE ANALYSIS

Recap of using ONLY IN-GAME data (copy-paste of results v2).
Results of using ONLY REAL-LIFE data (to compare the effectiveness of adding in-game data)
Results of using mixed data (in-game + real life)

Results of ONLY IN-GAME data

Unfortunately, while it occasionally can detect the LSV, it is highly unreliable and even if it detects the LSV, it is with low confidence ratings. Furthermore, there are multiple instances where background objects are mistakenly being detected.

Results of ONLY REAL-LIFE data

The model was only trained with the 33 real-life images. Only classes of LSV were labelled in the data.

The resulting model was tested on the same test set as the rest of the comparison.

Though there were some outliers which had high confidence rating when correctly detecting the LSV, there were many more with a low confidence score or no detections at all.

Results of mixed data

This performed better than solely using in-game data, with significantly higher confidence scores as well as less false-negatives.

Improvements to be made

More diverse training data

The training data, especially for the real-life images, is really limited. It is difficult to get images of the subject under different lighting conditions and different angles.

More diverse class range

Currently, we only have a very limited set of classes. To make the model more useful, we need to increase the amount of classes the model can detect. However, this comes at a cost of gathering more data.