As multimodal models grow more sophisticated, they need benchmarks that rigorously evaluate understanding and reasoning in complex, safety-relevant, open-world scenarios. This study introduces M4R (Measuring Massive Multimodal Understanding and Reasoning), a large-scale benchmark designed to assess reasoning across open spaces spanning land, air, and water environments. M4R comprises approximately 2,000 videos and over 19,000 human-annotated question-answer pairs. The videos vary in length (short, medium, long), and the tasks are tiered by difficulty (interval-based choices and accuracy-based choices). Each environment covers a distinct operational domain: land-based scenarios focus on traffic environments, particularly collisions and accident cases; air-based scenarios center on airplane navigation; and water-based scenarios involve ship movements. M4R systematically evaluates models on temporal-causal reasoning, spatial understanding, and intent and goal planning within these dynamic contexts. By providing a unified platform across these domains, M4R aims to drive the development of more robust and generalizable AI systems. Benchmarking state-of-the-art multimodal models on the dataset shows that even leading models, such as GPT-4o and Gemini, achieve only around a 20% success rate, highlighting the significant challenges that remain in open-space multimodal reasoning.
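The evaluation described above is organized by difficulty tier, video length, and reasoning category. As a minimal sketch of how per-group accuracy could be aggregated from graded answers, the snippet below assumes a hypothetical `QAResult` record and a hypothetical `accuracy_by_group` helper; neither the field names nor the function come from the M4R release, and the actual evaluation code may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAResult:
    """One graded model answer (hypothetical schema, not the M4R format)."""
    difficulty: str  # e.g. "easy", "medium", "hard"
    length: str      # e.g. "short", "medium", "long"
    category: str    # e.g. "temporal", "spatial", "intent"
    correct: bool


def accuracy_by_group(results):
    """Return percent correct for each (difficulty, length, category) group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.difficulty, r.length, r.category)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}
```

Grouping on the full triple keeps each cell independent, so coarser averages (per length, or overall) can be derived afterwards from these cells.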

| Difficulty | Model | Size | Overall Avg. | Short Video Avg. | Short Video Temporal | Short Video Spatial | Short Video Intent | Medium Video Avg. | Medium Video Temporal | Medium Video Spatial | Medium Video Intent | Long Video Avg. | Long Video Temporal | Long Video Spatial | Long Video Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard | GPT 4o | - | 21.26 | 24.05 | 25.32 | 30.34 | 16.5 | 25.19 | 27.57 | 31.07 | 16.91 | 14.66 | 5.5 | 30.5 | 8 |
| Hard | Gemini 2.5 Pro 🥇 | - | 30.58 | 33.84 | 36.32 | 34.70 | 30.50 | 30.22 | 38.55 | 26.18 | 25.91 | 27.67 | 20.00 | 21.50 | 41.50 |
| Hard | Gemini 1.5 Pro | - | 20.55 | 23.2 | 23.88 | 23.2 | 22.5 | 21.61 | 26.66 | 19.04 | 19.12 | 16.84 | 6 | 25.5 | 19 |
| Hard | Claude 3.5 | - | 26.46 | 29.88 | 26.82 | 31.82 | 31.0 | 26.10 | 28.63 | 31.86 | 17.82 | 19.66 | 11.0 | 33.0 | 15.0 |
| Hard | InternVL2.5 | 26B | 20.55 | 20.33 | 25 | 28.5 | 8.5 | 25.66 | 31 | 32 | 14 | 15.66 | 13 | 17 | 17 |
| Hard | InternVL2.5 | 8B | 20.45 | 19.34 | 19 | 30.5 | 8.5 | 24.66 | 31 | 30 | 13 | 17.34 | 10.5 | 31.5 | 10 |
| Hard | InternVL2.5 | 4B | 17.45 | 17 | 16 | 19 | 15 | 21 | 25 | 21 | 17 | 14.34 | 11.5 | 26 | 5.5 |
| Hard | LLaVA Next | 32B | 17.05 | 19.67 | 15 | 33 | 11 | 14 | 9 | 22 | 11 | 17.5 | 7.5 | 35 | 10 |
| Hard | LLaVA Video | 7B | 17.28 | 18 | 13 | 31.5 | 9.5 | 18.67 | 16 | 26 | 14 | 15.16 | 7.5 | 29 | 9 |
| Hard | LLaVA OneVision | 7B | 14.67 | 15.16 | 8.5 | 27.5 | 9.5 | 15.34 | 15 | 17 | 14 | 13.5 | 8 | 23.5 | 9 |
| Hard | Qwen2.5 VL | 32B | 19.44 | 19.66 | 8.5 | 35 | 15.5 | 25.33 | 25 | 24 | 27 | 13.33 | 2 | 28 | 10 |
| Hard | Qwen2.5 VL | 7B | 19.72 | 22.66 | 8.5 | 30 | 29.5 | 22.66 | 21 | 31 | 16 | 13.84 | 3.5 | 30 | 8 |
| Medium | GPT 4o | - | 37.72 | 42.08 | 43.24 | 55.5 | 27.5 | 31.95 | 39.84 | 30.34 | 25.66 | 39 | 44.5 | 37 | 35.5 |
| Medium | Gemini 2.5 Pro 🥇 | - | 39.78 | 43.73 | 40.19 | 49.5 | 41.5 | 32.63 | 36.79 | 31.44 | 29.66 | 43.0 | 44.0 | 39.5 | 45.5 |
| Medium | Gemini 1.5 Pro | - | 36.34 | 38.73 | 37.21 | 45 | 34 | 35.09 | 33.66 | 47.11 | 24.5 | 35.17 | 21 | 53.5 | 31 |
| Medium | Claude 3.5 | - | 37.51 | 39.89 | 30.68 | 45.0 | 44.0 | 35.80 | 35.79 | 48.11 | 23.5 | 36.84 | 33.0 | 39.5 | 38.0 |
| Medium | InternVL2.5 | 26B | 31.89 | 33.66 | 33.5 | 54 | 13.5 | 30.67 | 31 | 43 | 18 | 31.34 | 27.5 | 42.5 | 24 |
| Medium | InternVL2.5 | 8B | 34.49 | 33.66 | 31.5 | 57.5 | 12 | 35 | 37 | 48 | 20 | 34.83 | 33 | 44.5 | 27 |
| Medium | InternVL2.5 | 4B | 33.05 | 34.5 | 33 | 48.5 | 22 | 33.34 | 37 | 41 | 22 | 31.33 | 25.5 | 43 | 25.5 |
| Medium | LLaVA Next | 32B | 23.05 | 26 | 17 | 44.5 | 16.5 | 18 | 16 | 25 | 13 | 25.16 | 20.5 | 38 | 17 |
| Medium | LLaVA Video | 7B | 24.84 | 25.16 | 22 | 35 | 21 | 24.34 | 26 | 27 | 20 | 25 | 14.5 | 42.5 | 18 |
| Medium | LLaVA OneVision | 7B | 20.17 | 19.66 | 23 | 32 | 16 | 18.67 | 19 | 20 | 17 | 22.16 | 16 | 32.5 | 18 |
| Medium | Qwen2.5 VL | 32B | 30.95 | 30.5 | 16.5 | 46 | 29 | 32 | 31 | 40 | 25 | 30.34 | 14 | 50 | 27 |
| Medium | Qwen2.5 VL | 7B | 28.95 | 31.84 | 26.5 | 33 | 36 | 28.34 | 28 | 33 | 24 | 26.66 | 25.5 | 23 | 31.5 |
| Easy | GPT 4o | - | 41.42 | 43.84 | 44.5 | 37.53 | 49.5 | 41.91 | 39.45 | 41.45 | 44.84 | 38.5 | 44.5 | 27.5 | 43.5 |
| Easy | Gemini 2.5 Pro 🥇 | - | 53.56 | 59.48 | 65.00 | 51.94 | 61.50 | 47.36 | 46.47 | 47.59 | 48.04 | 53.84 | 57.50 | 44.50 | 59.50 |
| Easy | Gemini 1.5 Pro | - | 44.5 | 48.33 | 48 | 47 | 50 | 39.46 | 48.51 | 34.36 | 35.5 | 45.84 | 46.5 | 47 | 44 |
| Easy | Claude 3.5 | - | 45.52 | 49.16 | 47.5 | 44.0 | 56.0 | 39.51 | 32.64 | 53.51 | 32.36 | 48.0 | 52.0 | 44.5 | 47.5 |
| Easy | InternVL2.5 | 26B | 44.33 | 48.16 | 49 | 51.5 | 44 | 40 | 43 | 45 | 32 | 44.83 | 46 | 51 | 37.5 |
| Easy | InternVL2.5 | 8B | 44.27 | 46.17 | 41.5 | 53 | 44 | 40 | 45 | 42 | 33 | 46.66 | 57 | 52 | 31 |
| Easy | InternVL2.5 | 4B | 42.61 | 48.33 | 44 | 55 | 46 | 38.33 | 39 | 41 | 35 | 41.16 | 39.5 | 54 | 30 |
| Easy | LLaVA Next | 32B | 32.23 | 37.34 | 35.5 | 43.5 | 33 | 26.33 | 24 | 23 | 32 | 33.17 | 27.5 | 40 | 32 |
| Easy | LLaVA Video | 7B | 32.33 | 33.16 | 32 | 34.5 | 33 | 34 | 36 | 37 | 29 | 29.84 | 25.5 | 31 | 33 |
| Easy | LLaVA OneVision | 7B | 31.5 | 32.66 | 32.5 | 35.5 | 30 | 29.34 | 30 | 34 | 24 | 32.5 | 31.5 | 33 | 33 |
| Easy | Qwen2.5 VL | 32B | 47.84 | 50.5 | 46 | 53 | 52.5 | 46 | 43 | 46 | 49 | 47 | 43.5 | 52 | 45.5 |
| Easy | Qwen2.5 VL | 7B | 40.28 | 42.33 | 41.5 | 30 | 55.5 | 37 | 40 | 29 | 42 | 41.5 | 44.5 | 29 | 51 |

| Difficulty | Model | Size | Overall Avg. | Short Video Avg. | Short Video Temporal | Short Video Spatial | Short Video Intent | Medium Video Avg. | Medium Video Temporal | Medium Video Spatial | Medium Video Intent | Long Video Avg. | Long Video Temporal | Long Video Spatial | Long Video Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard | GPT 4o | - | 24.41 | 26.78 | 34.65 | 34.69 | 11 | 35.70 | 43.14 | 32.14 | 31.82 | 11.00 | 6 | 26 | 1 |
| Hard | Gemini 2.5 Pro 🥇 | - | 29.76 | 34.84 | 36.63 | 44.90 | 23.0 | 35.76 | 45.10 | 30.36 | 31.82 | 18.67 | 10.0 | 28.0 | 18.0 |
| Hard | Gemini 1.5 Pro | - | 18.76 | 19.72 | 23.76 | 20.41 | 15 | 24.55 | 33.33 | 16.07 | 24.24 | 12.00 | 2 | 26 | 8 |
| Hard | Claude 3.5 | - | 28.71 | 33.76 | 35.64 | 31.63 | 34.0 | 28.87 | 37.26 | 35.71 | 13.63 | 16.0 | 12.0 | 26.0 | 10.0 |
| Hard | InternVL2.5 | 26B | 23.78 | 21.33 | 26.0 | 31.0 | 7.0 | 32.00 | 46.0 | 32.0 | 18.0 | 18.00 | 16.0 | 24.0 | 14.0 |
| Hard | InternVL2.5 | 8B | 22.67 | 20.00 | 18.0 | 33.0 | 9.0 | 30.00 | 46.0 | 30.0 | 14.0 | 18.00 | 16.0 | 28.0 | 10.0 |
| Hard | InternVL2.5 | 4B | 19.56 | 18.67 | 18.0 | 28.0 | 8.0 | 28.00 | 34.0 | 24.0 | 26.0 | 12.00 | 8.0 | 22.0 | 6.0 |
| Hard | LLaVA Next | 32B | 16.22 | 20.67 | 16.0 | 32.0 | 14.0 | 11.33 | 12.0 | 12.0 | 10.0 | 16.67 | 10.0 | 30.0 | 10.0 |
| Hard | LLaVA Video | 7B | 19.78 | 19.33 | 12.0 | 35.0 | 11.0 | 24.67 | 26.0 | 30.0 | 18.0 | 15.33 | 10.0 | 28.0 | 8.0 |
| Hard | LLaVA OneVision | 7B | 13.67 | 14.33 | 5.0 | 27.0 | 11.0 | 14.67 | 18.0 | 8.0 | 18.0 | 12.0 | 6.0 | 22.0 | 8.0 |
| Hard | Qwen2.5 VL | 32B | 22.66 | 19.33 | 11.0 | 34.0 | 13.0 | 35.33 | 46.0 | 24.0 | 36.0 | 13.33 | 4.0 | 26.0 | 10.0 |
| Hard | Qwen2.5 VL | 7B | 22.89 | 26.00 | 17.0 | 30.0 | 31.0 | 30.00 | 40.0 | 32.0 | 18.0 | 12.67 | 2.0 | 30.0 | 6.0 |
| Medium | GPT 4o 🥇 | - | 36.99 | 45.49 | 48.48 | 55 | 33 | 33.89 | 41.67 | 26.67 | 33.33 | 31.33 | 24 | 44 | 26 |
| Medium | Gemini 2.5 Pro | - | 36.46 | 42.79 | 38.38 | 59.0 | 31.0 | 33.93 | 39.58 | 28.89 | 33.33 | 32.67 | 28.0 | 44.0 | 26.0 |
| Medium | Gemini 1.5 Pro | - | 33.89 | 39.47 | 42.42 | 42 | 34 | 33.52 | 33.33 | 42.22 | 25 | 28.67 | 12 | 52 | 22 |
| Medium | Claude 3.5 | - | 35.35 | 41.78 | 35.35 | 50.0 | 40.0 | 35.60 | 39.58 | 42.22 | 25.0 | 28.67 | 16.0 | 44.0 | 26.0 |
| Medium | InternVL2.5 | 26B | 35.11 | 36.00 | 39.0 | 50.0 | 19.0 | 36.67 | 50.0 | 36.0 | 24.0 | 32.67 | 30.0 | 40.0 | 28.0 |
| Medium | InternVL2.5 | 8B | 34.66 | 37.33 | 43.0 | 57.0 | 12.0 | 35.33 | 42.0 | 46.0 | 18.0 | 31.33 | 26.0 | 44.0 | 24.0 |
| Medium | InternVL2.5 | 4B | 33.89 | 39.67 | 38.0 | 53.0 | 28.0 | 32.67 | 44.0 | 28.0 | 26.0 | 29.33 | 16.0 | 46.0 | 26.0 |
| Medium | LLaVA Next | 32B | 20.0 | 27.33 | 16.0 | 49.0 | 17.0 | 10.67 | 14.0 | 10.0 | 8.0 | 22.0 | 16.0 | 36.0 | 14.0 |
| Medium | LLaVA Video | 7B | 25.67 | 25.00 | 20.0 | 34.0 | 26.0 | 28.67 | 36.0 | 28.0 | 22.0 | 23.33 | 14.0 | 40.0 | 16.0 |
| Medium | LLaVA OneVision | 7B | 16.67 | 16.00 | 26.0 | 30.0 | 16.0 | 14.67 | 18.0 | 8.0 | 18.0 | 19.33 | 12.0 | 30.0 | 16.0 |
| Medium | Qwen2.5 VL | 32B | 28.55 | 28.33 | 21.0 | 44.0 | 20.0 | 33.33 | 40.0 | 30.0 | 30.0 | 24.00 | 8.0 | 40.0 | 24.0 |
| Medium | Qwen2.5 VL | 7B | 29.89 | 39.00 | 37.0 | 42.0 | 38.0 | 30.67 | 32.0 | 40.0 | 20.0 | 20.00 | 16.0 | 26.0 | 18.0 |
| Easy | GPT 4o | - | 42.17 | 52.35 | 59 | 47.06 | 51 | 47.16 | 54.9 | 44.9 | 41.67 | 27.00 | 44 | 5 | 32 |
| Easy | Gemini 2.5 Pro 🥇 | - | 54.56 | 62.96 | 70.0 | 55.88 | 63.0 | 54.73 | 52.94 | 59.18 | 52.08 | 46.00 | 40.0 | 54.0 | 44.0 |
| Easy | Gemini 1.5 Pro | - | 46.00 | 51.33 | 60 | 50 | 44 | 36.92 | 49.02 | 36.73 | 25 | 50.00 | 58 | 44 | 48 |
| Easy | Claude 3.5 | - | 48.59 | 60.33 | 61.0 | 50.0 | 70.0 | 36.35 | 35.29 | 51.02 | 22.73 | 49.33 | 64.0 | 44.0 | 40.0 |
| Easy | InternVL2.5 | 26B | 52.55 | 61.00 | 62.0 | 59.0 | 62.0 | 45.33 | 58.0 | 44.0 | 34.0 | 51.33 | 62.0 | 62.0 | 30.0 |
| Easy | InternVL2.5 | 8B | 50.11 | 55.67 | 55.0 | 60.0 | 52.0 | 44.67 | 58.0 | 42.0 | 34.0 | 50.00 | 54.0 | 64.0 | 32.0 |
| Easy | InternVL2.5 | 4B | 44.89 | 53.33 | 46.0 | 60.0 | 54.0 | 37.33 | 48.0 | 38.0 | 26.0 | 44.00 | 44.0 | 48.0 | 40.0 |
| Easy | LLaVA Next | 32B | 31.25 | 38.00 | 35.0 | 45.0 | 34.0 | 21.33 | 12.0 | 14.0 | 38.0 | 34.67 | 20.0 | 50.0 | 34.0 |
| Easy | LLaVA Video | 7B | 31.44 | 33.00 | 30.0 | 31.0 | 38.0 | 33.33 | 38.0 | 36.0 | 26.0 | 28.00 | 16.0 | 32.0 | 36.0 |
| Easy | LLaVA OneVision | 7B | 29.78 | 32.00 | 31.0 | 33.0 | 32.0 | 24.00 | 26.0 | 30.0 | 16.0 | 33.33 | 28.0 | 36.0 | 36.0 |
| Easy | Qwen2.5 VL | 32B | 43.22 | 51.00 | 58.0 | 50.0 | 45.0 | 41.33 | 46.0 | 38.0 | 40.0 | 37.33 | 32.0 | 44.0 | 36.0 |
| Easy | Qwen2.5 VL | 7B | 40.67 | 51.33 | 55.0 | 42.0 | 57.0 | 36.00 | 32.0 | 42.0 | 34.0 | 34.67 | 34.0 | 28.0 | 42.0 |

| Difficulty | Model | Size | Overall Avg. | Short Video Avg. | Short Video Temporal | Short Video Spatial | Short Video Intent | Medium Video Avg. | Medium Video Temporal | Medium Video Spatial | Medium Video Intent | Long Video Avg. | Long Video Temporal | Long Video Spatial | Long Video Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard | GPT 4o | - | 18.11 | 21.33 | 16.00 | 26.00 | 22.00 | 14.67 | 12.00 | 30.00 | 2.00 | 18.33 | 5.00 | 35.00 | 15.00 |
| Hard | Gemini 2.5 Pro 🥇 | - | 31.39 | 32.83 | 36.0 | 24.49 | 38.0 | 24.67 | 32.0 | 22.0 | 20.0 | 36.67 | 30.0 | 15.0 | 65.0 |
| Hard | Gemini 1.5 Pro | - | 22.34 | 26.67 | 24.00 | 26.00 | 30.00 | 18.67 | 20.00 | 22.00 | 14.00 | 21.67 | 10.00 | 25.00 | 30.00 |
| Hard | Claude 3.5 | - | 24.22 | 26.00 | 18.0 | 32.0 | 28.0 | 23.33 | 20.0 | 28.0 | 22.0 | 23.33 | 10.0 | 40.0 | 20.0 |
| Hard | InternVL2.5 | 26B | 17.33 | 19.33 | 24.00 | 26.00 | 10.00 | 19.33 | 16.00 | 32.00 | 10.00 | 13.33 | 10.00 | 10.00 | 20.00 |
| Hard | InternVL2.5 | 8B | 18.22 | 18.67 | 20.00 | 28.00 | 8.00 | 19.33 | 16.00 | 30.00 | 12.00 | 16.67 | 5.00 | 35.00 | 10.00 |
| Hard | InternVL2.5 | 4B | 15.33 | 15.33 | 14.00 | 10.00 | 22.00 | 14.00 | 16.00 | 18.00 | 8.00 | 16.67 | 15.00 | 30.00 | 5.00 |
| Hard | LLaVA Next | 32B | 17.89 | 18.67 | 14.0 | 34.0 | 8.0 | 16.67 | 6.0 | 32.0 | 12.0 | 18.33 | 5.0 | 40.0 | 10.0 |
| Hard | LLaVA Video | 7B | 14.78 | 16.67 | 14.00 | 28.00 | 8.00 | 12.67 | 6.00 | 22.00 | 10.00 | 15.00 | 5.00 | 30.00 | 10.00 |
| Hard | LLaVA OneVision | 7B | 15.67 | 16.00 | 12.00 | 28.00 | 8.00 | 16.00 | 12.00 | 26.00 | 10.00 | 15.00 | 10.00 | 25.00 | 10.00 |
| Hard | Qwen2.5 VL | 32B | 16.22 | 20.00 | 6.00 | 36.00 | 18.00 | 15.33 | 4.00 | 24.00 | 18.00 | 13.33 | 0.00 | 30.00 | 10.00 |
| Hard | Qwen2.5 VL | 7B | 16.55 | 19.33 | 0.00 | 30.00 | 28.00 | 15.33 | 2.00 | 30.00 | 14.00 | 15.00 | 5.00 | 30.00 | 10.00 |
| Medium | GPT 4o | - | 38.45 | 38.67 | 38.00 | 56.00 | 22.00 | 30.00 | 38.00 | 34.00 | 18.00 | 46.67 | 65.00 | 30.00 | 45.00 |
| Medium | Gemini 2.5 Pro 🥇 | - | 43.11 | 44.67 | 42.0 | 40.0 | 52.0 | 31.33 | 34.0 | 34.0 | 26.0 | 53.33 | 60.0 | 35.0 | 65.0 |
| Medium | Gemini 1.5 Pro | - | 38.78 | 38.00 | 32.00 | 48.00 | 34.00 | 36.67 | 34.00 | 52.00 | 24.00 | 41.67 | 30.00 | 55.00 | 40.00 |
| Medium | Claude 3.5 | - | 39.67 | 38.00 | 26.0 | 40.0 | 48.0 | 36.00 | 32.0 | 54.0 | 22.0 | 45.00 | 50.0 | 35.0 | 50.0 |
| Medium | InternVL2.5 | 26B | 28.67 | 31.33 | 28.00 | 58.00 | 8.00 | 24.67 | 12.00 | 50.00 | 12.00 | 30.00 | 25.00 | 45.00 | 20.00 |
| Medium | InternVL2.5 | 8B | 34.33 | 30.00 | 20.00 | 58.00 | 12.00 | 34.67 | 32.00 | 50.00 | 22.00 | 38.33 | 40.00 | 45.00 | 30.00 |
| Medium | InternVL2.5 | 4B | 32.22 | 29.33 | 28.00 | 44.00 | 16.00 | 34.00 | 30.00 | 54.00 | 18.00 | 33.33 | 35.00 | 40.00 | 25.00 |
| Medium | LLaVA Next | 32B | 26.11 | 24.67 | 18.0 | 40.0 | 16.0 | 25.33 | 18.0 | 40.0 | 18.0 | 28.33 | 25.0 | 40.0 | 20.0 |
| Medium | LLaVA Video | 7B | 24.00 | 25.33 | 24.00 | 36.00 | 16.00 | 20.00 | 16.00 | 26.00 | 18.00 | 26.67 | 15.00 | 45.00 | 20.00 |
| Medium | LLaVA OneVision | 7B | 23.67 | 23.33 | 20.00 | 34.00 | 16.00 | 22.67 | 20.00 | 32.00 | 16.00 | 25.00 | 20.00 | 35.00 | 20.00 |
| Medium | Qwen2.5 VL | 32B | 33.34 | 32.67 | 12.00 | 48.00 | 38.00 | 30.67 | 22.00 | 50.00 | 20.00 | 36.67 | 20.00 | 60.00 | 30.00 |
| Medium | Qwen2.5 VL | 7B | 28.00 | 24.67 | 16.00 | 24.00 | 34.00 | 26.00 | 24.00 | 26.00 | 28.00 | 33.33 | 35.00 | 20.00 | 45.00 |
| Easy | GPT 4o | - | 40.67 | 35.33 | 30.00 | 28.00 | 48.00 | 36.67 | 24.00 | 38.00 | 48.00 | 50.00 | 45.00 | 50.00 | 55.00 |
| Easy | Gemini 2.5 Pro 🥇 | - | 52.56 | 56.00 | 60.0 | 48.0 | 60.0 | 40.00 | 40.0 | 36.0 | 44.0 | 61.67 | 75.0 | 35.0 | 75.0 |
| Easy | Gemini 1.5 Pro | - | 43.00 | 45.33 | 36.00 | 44.00 | 56.00 | 42.00 | 48.00 | 32.00 | 46.00 | 41.67 | 35.00 | 50.00 | 40.00 |
| Easy | Claude 3.5 | - | 42.45 | 38.00 | 34.0 | 38.0 | 42.0 | 42.67 | 30.0 | 56.0 | 42.0 | 46.67 | 40.0 | 45.0 | 55.0 |
| Easy | InternVL2.5 | 26B | 36.11 | 35.33 | 36.00 | 44.00 | 26.00 | 34.67 | 28.00 | 46.00 | 30.00 | 38.33 | 30.00 | 40.00 | 45.00 |
| Easy | InternVL2.5 | 8B | 38.44 | 36.67 | 28.00 | 46.00 | 36.00 | 35.33 | 32.00 | 42.00 | 32.00 | 43.33 | 60.00 | 40.00 | 30.00 |
| Easy | InternVL2.5 | 4B | 40.33 | 43.33 | 42.00 | 50.00 | 38.00 | 39.33 | 30.00 | 44.00 | 44.00 | 38.33 | 35.00 | 60.00 | 20.00 |
| Easy | LLaVA Next | 32B | 33.22 | 36.67 | 36.00 | 42.0 | 32.0 | 31.33 | 36.0 | 32.0 | 26.0 | 31.67 | 35.0 | 30.0 | 30.0 |
| Easy | LLaVA Video | 7B | 33.22 | 33.33 | 34.00 | 38.00 | 28.00 | 34.67 | 34.00 | 38.00 | 32.00 | 31.67 | 35.00 | 30.00 | 30.00 |
| Easy | LLaVA OneVision | 7B | 33.22 | 33.33 | 34.00 | 38.00 | 28.00 | 34.67 | 34.00 | 38.00 | 32.00 | 31.67 | 35.00 | 30.00 | 30.00 |
| Easy | Qwen2.5 VL | 32B | 52.45 | 50.00 | 34.00 | 56.00 | 60.00 | 50.67 | 40.00 | 54.00 | 58.00 | 56.67 | 55.00 | 60.00 | 55.00 |
| Easy | Qwen2.5 VL | 7B | 39.89 | 33.33 | 28.00 | 18.00 | 54.00 | 38.00 | 48.00 | 16.00 | 50.00 | 48.33 | 55.00 | 30.00 | 60.00 |
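
For the two Hard-tier rows checked here from the table immediately above, each per-length "Avg." appears to equal the unweighted mean of the Temporal, Spatial, and Intent scores, and the overall average appears to equal the mean of the three per-length averages. This is an observation from the reported numbers, not a definition stated in the source, and other tables may be aggregated differently. A minimal sketch that re-derives the averages (the `check_row` helper and the 0.02 tolerance are illustrative assumptions):

```python
from statistics import mean

# Values copied from the Hard rows of the table above.
# Each entry is (reported average, [temporal, spatial, intent]).
rows = {
    "GPT 4o": {
        "overall": 18.11,
        "short": (21.33, [16.00, 26.00, 22.00]),
        "medium": (14.67, [12.00, 30.00, 2.00]),
        "long": (18.33, [5.00, 35.00, 15.00]),
    },
    "Gemini 2.5 Pro": {
        "overall": 31.39,
        "short": (32.83, [36.0, 24.49, 38.0]),
        "medium": (24.67, [32.0, 22.0, 20.0]),
        "long": (36.67, [30.0, 15.0, 65.0]),
    },
}


def check_row(row, tol=0.02):
    """Return True if the reported averages match unweighted means within tol."""
    lengths = ("short", "medium", "long")
    for name in lengths:
        reported, parts = row[name]
        if abs(mean(parts) - reported) > tol:
            return False
    # Overall average recomputed as the mean of the three per-length means.
    recomputed = mean(mean(row[name][1]) for name in lengths)
    return abs(recomputed - row["overall"]) <= tol


for model, row in rows.items():
    print(model, "consistent" if check_row(row) else "inconsistent")
```

The tolerance absorbs two-decimal rounding in the reported figures.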

| Difficulty | Model | Size | Overall Avg. | River Avg. | River Temporal | River Spatial | River Intent | Ocean Avg. | Ocean Temporal | Ocean Spatial | Ocean Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard | GPT 4o | - | 22.10 | 28.20 | 38.46 | 26.92 | 19.23 | 16.00 | 18.00 | 18.00 | 12.00 |
| Hard | Gemini 2.5 Pro 🥇 | - | 29.64 | 34.62 | 23.08 | 34.62 | 46.15 | 24.67 | 38.0 | 16.0 | 20.0 |
| Hard | Gemini 1.5 Pro | - | 26.02 | 26.92 | 23.08 | 30.77 | 26.92 | 25.11 | 34.00 | 20.93 | 20.41 |
| Hard | Claude 3.5 | - | 25.44 | 28.20 | 19.23 | 19.23 | 46.15 | 22.67 | 26.0 | 22.0 | 20.0 |
| Hard | InternVL2.5 | 26B | 22.54 | 23.08 | 15.38 | 19.23 | 34.62 | 22.00 | 18.00 | 28.00 | 20.00 |
| Hard | InternVL2.5 | 8B | 21.90 | 21.79 | 7.69 | 26.92 | 30.77 | 22.00 | 16.00 | 28.00 | 22.00 |
| Hard | InternVL2.5 | 4B | 20.92 | 20.51 | 19.23 | 19.23 | 23.08 | 21.33 | 16.00 | 26.00 | 22.00 |
| Hard | LLaVA Next | 32B | 14.39 | 11.54 | 7.69 | 19.23 | 7.69 | 15.33 | 8.0 | 30.0 | 8.0 |
| Hard | LLaVA Video | 7B | 14.00 | 16.67 | 15.38 | 23.08 | 11.54 | 11.33 | 8.00 | 20.00 | 6.00 |
| Hard | LLaVA OneVision | 7B | 15.67 | 16.67 | 11.54 | 26.92 | 11.54 | 14.67 | 8.00 | 28.00 | 8.00 |
| Hard | Qwen2.5 VL | 32B | 13.39 | 14.10 | 7.69 | 23.08 | 11.54 | 12.67 | 8.0 | 24.0 | 6.0 |
| Hard | Qwen2.5 VL | 7B | 14.67 | 16.67 | 7.69 | 30.77 | 11.54 | 12.67 | 6.00 | 24.00 | 8.00 |
| Medium | GPT 4o | - | 38.49 | 42.31 | 50.00 | 53.85 | 23.08 | 34.67 | 36.00 | 48.00 | 20.00 |
| Medium | Gemini 2.5 Pro | - | 41.77 | 44.87 | 30.77 | 61.54 | 42.31 | 38.67 | 48.0 | 46.0 | 22.0 |
| Medium | Gemini 1.5 Pro 🥇 | - | 46.31 | 53.84 | 46.15 | 65.38 | 50.00 | 38.78 | 34.00 | 49.02 | 33.33 |
| Medium | Claude 3.5 | - | 38.62 | 35.90 | 34.62 | 50.0 | 23.08 | 41.33 | 42.0 | 54.0 | 28.0 |
| Medium | InternVL2.5 | 26B | 41.77 | 44.87 | 30.77 | 57.69 | 46.15 | 38.67 | 24.00 | 62.00 | 30.00 |
| Medium | InternVL2.5 | 8B | 41.08 | 46.15 | 34.62 | 61.54 | 42.31 | 36.00 | 34.00 | 60.00 | 14.00 |
| Medium | InternVL2.5 | 4B | 44.36 | 48.72 | 23.08 | 65.38 | 57.69 | 40.00 | 28.00 | 60.00 | 32.00 |
| Medium | LLaVA Next | 32B | 20.88 | 23.08 | 11.54 | 38.46 | 19.23 | 18.67 | 10.00 | 30.00 | 16.00 |
| Medium | LLaVA Video | 7B | 21.92 | 20.51 | 19.23 | 26.92 | 15.38 | 23.33 | 20.00 | 30.00 | 20.00 |
| Medium | LLaVA OneVision | 7B | 22.54 | 23.08 | 19.23 | 30.77 | 19.23 | 22.00 | 14.00 | 34.00 | 18.00 |
| Medium | Qwen2.5 VL | 32B | 33.31 | 34.62 | 19.23 | 50.00 | 34.62 | 32.00 | 20.00 | 50.00 | 26.00 |
| Medium | Qwen2.5 VL | 7B | 24.08 | 29.49 | 19.23 | 30.77 | 38.46 | 18.67 | 18.00 | 26.00 | 12.00 |
| Easy | GPT 4o | - | 50.51 | 57.69 | 57.69 | 50.00 | 65.38 | 43.33 | 66.00 | 34.00 | 30.00 |
| Easy | Gemini 2.5 Pro 🥇 | - | 61.05 | 64.10 | 57.69 | 57.69 | 76.92 | 58.00 | 72.0 | 50.0 | 52.0 |
| Easy | Gemini 1.5 Pro | - | 50.69 | 52.56 | 42.31 | 61.54 | 53.85 | 48.81 | 50.00 | 46.43 | 50.00 |
| Easy | Claude 3.5 | - | 49.39 | 47.44 | 50.0 | 53.85 | 38.46 | 51.33 | 62.0 | 52.0 | 40.0 |
| Easy | InternVL2.5 | 26B | 55.05 | 64.10 | 65.38 | 57.69 | 69.23 | 46.00 | 50.00 | 50.00 | 38.00 |
| Easy | InternVL2.5 | 8B | 53.47 | 60.26 | 69.23 | 46.15 | 65.38 | 46.67 | 46.00 | 54.00 | 40.00 |
| Easy | InternVL2.5 | 4B | 53.87 | 56.41 | 53.85 | 57.69 | 57.69 | 51.33 | 52.00 | 56.00 | 46.00 |
| Easy | LLaVA Next | 32B | 35.59 | 37.18 | 26.92 | 53.85 | 30.77 | 34.00 | 30.00 | 38.00 | 34.00 |
| Easy | LLaVA Video | 7B | 31.03 | 32.05 | 30.77 | 34.62 | 30.77 | 30.00 | 22.00 | 38.00 | 30.00 |
| Easy | LLaVA OneVision | 7B | 33.00 | 33.33 | 34.62 | 34.62 | 30.77 | 32.67 | 28.00 | 38.00 | 32.00 |
| Easy | Qwen2.5 VL | 32B | 52.77 | 61.54 | 53.85 | 61.54 | 69.23 | 44.00 | 40.00 | 54.00 | 38.00 |
| Easy | Qwen2.5 VL | 7B | 31.31 | 34.62 | 38.46 | 19.23 | 46.15 | 28.00 | 36.00 | 22.00 | 26.00 |

@article{gu2025m4r,
title={Measuring Massive Multimodal Understanding and Reasoning in Open Space},
author={Gu, Shangding and Wang, Xiaohan and Ying, Donghao and Zhao, Haoyu and Yang, Runing and Li, Boyi and Jin, Ming and Pavone, Marco and Yeung-Levy, Serena and Wang, Jun and Song, Dawn and Spanos, Costas},
journal={GitHub},
year={2025}
}