New Paper: Who gets to use the street? Evaluate the utilization and inclusiveness using crowdsourced videos and vision-language models

Our new study is published in Sustainable Cities and Society. This study introduces a scalable framework combining crowdsourced street videos from delivery riders (EgoCity dataset) and a custom vision-language model (LLaVA-PI) to evaluate public street space (PSS) through two indicators: utilization and inclusiveness. Results show a structural mismatch: while adults dominate high-activity streets, children and elderly residents are underrepresented, even in neighborhoods where they live in large numbers. Only 4.9% of streets achieve both high utilization and high inclusiveness. This framework offers a new AI-powered lens for diagnosing spatial inequality and supporting just urban design.

Nature AI Lab
2025-10-27

We are glad to share our new paper: Zhang, X., Chen, M., & Huang, Y. (2025). Who gets to use the street? Evaluate the utilization and inclusiveness using crowdsourced videos and vision-language models. Sustainable Cities and Society, 134, 106906. https://doi.org/10.1016/j.scs.2025.106906

📢 Background:
Public street space (PSS) plays a vital role in promoting urban vitality and social equity. However, current evaluation frameworks often overlook demographic differences and lack fine-grained data, which limits our understanding of how different groups actually use urban streets. This study addresses these gaps by proposing a scalable, low-cost data collection method that relies on videos recorded by food delivery riders, paired with a vision-language model, to evaluate both the utilization and inclusiveness of street spaces.

🔍 Research Objectives and Questions:
This study aims to develop a scalable and low-cost framework to evaluate the real-world performance of public street spaces (PSS) by integrating crowdsourced street videos and a high-accuracy vision-language model. Specifically, it seeks to: (1) propose a low-cost and scalable framework for data collection and analysis, and develop a corresponding high-accuracy hybrid vision-language model (VLM); (2) analyze the distribution of pedestrian volume and activities in street videos using the VLM; and (3) evaluate the utilization and inclusiveness of PSS across different age groups.

🔍 Key Highlights:

This study proposes a novel AI-driven framework to evaluate public street space (PSS) using crowdsourced delivery rider videos and a vision-language model. It introduces two core indicators, Utilization and Inclusiveness, to assess how actively and fairly street spaces are used. The findings reveal a systemic mismatch: adults dominate vibrant streets, while children and the elderly are underrepresented, even in neighborhoods where they reside. Only 4.9% of areas achieve both high utilization and inclusiveness, highlighting spatial inequalities in everyday urban life.
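For readers who want a feel for how two such indicators could be combined, here is a minimal Python sketch. It is not the paper's method: the indicator formulas, the `StreetObservation` structure, and the thresholds are all assumptions made for illustration. Utilization is taken here as pedestrians observed per 100 m of street, and inclusiveness as one minus the total variation distance between the observed age mix and the age mix of nearby residents.

```python
from dataclasses import dataclass

AGE_GROUPS = ("children", "adults", "elderly")


@dataclass
class StreetObservation:
    """Hypothetical per-street record; field names are illustrative only."""
    street_id: str
    length_m: float
    observed: dict        # pedestrians counted per age group
    resident_share: dict  # share of each age group among nearby residents


def utilization(obs: StreetObservation) -> float:
    """Assumed definition: pedestrians observed per 100 m of street."""
    return 100.0 * sum(obs.observed.values()) / obs.length_m


def inclusiveness(obs: StreetObservation) -> float:
    """Assumed definition: 1 minus the total variation distance between the
    observed age mix and the resident age mix (1.0 means a perfect match)."""
    total = sum(obs.observed.values())
    if total == 0:
        return 0.0  # no pedestrians observed, so no group is represented
    gap = sum(
        abs(obs.observed.get(g, 0) / total - obs.resident_share.get(g, 0.0))
        for g in AGE_GROUPS
    )
    return 1.0 - gap / 2.0


def share_high_high(streets, util_threshold: float, incl_threshold: float) -> float:
    """Fraction of streets at or above both (assumed) thresholds."""
    if not streets:
        return 0.0
    hits = sum(
        1
        for s in streets
        if utilization(s) >= util_threshold and inclusiveness(s) >= incl_threshold
    )
    return hits / len(streets)
```

Under these assumed definitions, a busy street used only by adults scores high on utilization but low on inclusiveness whenever children and older residents live nearby, which is the kind of mismatch the study reports.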
