Best Paper in Cognitive Robotics at the International Conference on Robotics and Automation (ICRA), 2024
Workshop on Language and Robot Learning @ CoRL, 2023
Text prompt: "Seems like there's a microwave ahead."
Text prompt: "Potted plant in the sun by the stairs."
Text prompt: "Seems like there is a toilet ahead."
Text prompt: "Seems like there's a toilet ahead."
Text prompt: "Seems like there's a bed ahead."
Text prompt: "Seems like there's a chair ahead."
VLFM does not yet filter its detections using other visual cues from the environment, and is thus still sensitive to false positives produced by the detector.
Text prompt: "Seems like there's a tv ahead."
Text prompt: "Seems like there's a bed ahead."
We have found that VLFM can also be sensitive to environments that lack visual semantic cues related to the target object, such as homogeneous office environments, where it may be difficult to find a toilet from a distance.
@inproceedings{yokoyama2024vlfm,
  title={VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation},
  author={Naoki Yokoyama and Sehoon Ha and Dhruv Batra and Jiuguang Wang and Bernadette Bucher},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2024},
}