Description
Aerial images are difficult to analyze due to their high resolution, non-intuitive structure, and the limited availability of domain-specific datasets. We created a new real-world dataset of agricultural areas in Sicily by extracting high-resolution images from Google Maps and manually annotating eight classes with polygon masks using Roboflow. Using this dataset, we explored two complementary approaches.
First, YOLOv11x-seg was trained for object detection and segmentation, achieving the best overall mAP among the tested YOLO variants and reaching an accuracy of 0.823, a precision of 0.931, and a recall of 0.942 on the test images. Second, we evaluated the ability of LLaMA-4 Maverick to detect objects of interest, running it as a cloud-hosted model through the Groq API. Although it produced structured CSV outputs, the model showed limited recall, spatial imprecision, class ambiguities, and occasional hallucinations. Overall, YOLOv11x-seg proved reliable for fine-grained aerial analysis, while LLaMA-4 Maverick remained suitable only for coarse descriptions.
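As an illustration of how structured CSV output from a language model could be checked for recall and hallucinated classes, the following is a minimal sketch and not the evaluation procedure used in this work. The column names (`class`, `x`, `y`), the label set in `ALLOWED_CLASSES`, and both function names are hypothetical. The scoring is count-based per image and ignores object positions, so it cannot capture spatial imprecision.

```python
import csv
import io
from collections import Counter

# Hypothetical label set: the dataset has eight classes, whose names are
# not given here, so these three are placeholders for illustration only.
ALLOWED_CLASSES = {"greenhouse", "field", "orchard"}


def parse_detections(csv_text):
    """Parse CSV text with assumed columns class,x,y; skip malformed rows."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    detections = []
    for row in reader:
        try:
            label = row["class"].strip().lower()
            detections.append((label, float(row["x"]), float(row["y"])))
        except (KeyError, ValueError, AttributeError, TypeError):
            # Missing column, non-numeric coordinate, or short row.
            continue
    return detections


def count_based_scores(detections, ground_truth_counts):
    """Compare predicted per-class counts with annotated counts for one image."""
    predicted = Counter(label for label, _, _ in detections)
    # Labels outside the known set are treated as hallucinated classes.
    unknown = sorted(label for label in predicted if label not in ALLOWED_CLASSES)
    # A prediction counts as correct only up to the annotated count per class.
    true_pos = sum(
        min(predicted[c], ground_truth_counts.get(c, 0)) for c in ALLOWED_CLASSES
    )
    total_pred = sum(predicted.values())
    total_gt = sum(ground_truth_counts.values())
    return {
        "precision": true_pos / total_pred if total_pred else 0.0,
        "recall": true_pos / total_gt if total_gt else 0.0,
        "unknown_classes": unknown,
    }


if __name__ == "__main__":
    sample = "class,x,y\ngreenhouse,0.1,0.2\nufo,0.3,0.3\n"
    print(count_based_scores(parse_detections(sample), {"greenhouse": 2}))
```

A position-aware evaluation would additionally need to match each predicted location to an annotated polygon, which this sketch deliberately leaves out.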