Naoki Yokoyama



Ph.D. Student in Robotics at Georgia Tech
Atlanta, GA

Northeastern University
B.S. and M.S. Electrical Engineering
Concn. in Machine Learning & Computer Vision



About Me

I am currently a 4th-year Robotics Ph.D. student at Georgia Tech, advised by Dhruv Batra and Sehoon Ha. Previously, I received my B.S. and M.S. from Northeastern University. My research focuses on scalable learning methods that teach robots to perceive and interact effectively in diverse real-world environments: robots are first trained in realistic simulators, and the learned skills are then transferred to reality.

During my Ph.D., I've interned at the Boston Dynamics AI Institute with Bernadette Bucher and Jiuguang Wang (Summer 2023), at Amazon with Gaurav Sukhatme on deep reinforcement learning for robotics with reward decomposition (Summer 2022), and at Meta AI with Akshara Rai on mobile manipulation for object rearrangement (Summer 2021).

Previously, I also worked with Taskin Padir in the Robotics and Intelligent Vehicles Research (RIVeR) lab at Northeastern University. There, I led Team Northeastern in multiple international robotics competitions, including RoboCup@Home 2019 in Sydney, Australia; the 2018 World Robot Summit in Tokyo, Japan; and RoboCup@Home 2018 in Montreal, Canada, where we placed 4th internationally and 1st in the USA.

I have also had the pleasure of mentoring other students, such as Qian Luo (MS@GT), Simar Kareer (MS@GT), and Marco Delgado (BS@GT) in research projects.

One of my hobbies is taking photos. You can see some here.


Boston Dynamics AI Institute
Summer 2023

Amazon Science
Summer 2022

Facebook AI Research
Summer 2021

RIVeR Research Lab
2017 - 2019



Awards

  • Achievement Rewards for College Scientists (ARCS) Fellowship 2022, 2023
  • Adobe Research Fellowship 2022 Finalist
  • iGibson Dynamic Visual Navigation Challenge 2021 1st Place
  • RoboCup@Home 2019 1st Place in USA, 2018 1st Place in USA
  • Northeastern Senior Capstone Design 2018 1st Place
  • Joseph Spear Scholarship 2017
  • SASE Kellogg Scholarship 2016
  • Clara & Joseph Ford Scholarship 2016
  • HackMIT “Best NativeScript App for IoT” Winner 2016
  • SASE InnoService Competition 3rd Place 2014-15, 3rd Place 2013-14
  • Karen T. Rigg Scholarship 2014
  • Gordon CenSSIS Scholar 2013
  • George Alden and Amelia Peabody Scholarship 2013-18
  • Dean's Scholarship 2013-18


Research


VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

ICRA 2024
Workshop on Language and Robot Learning at CoRL 2023

Project Page Paper Code

State-of-the-art ObjectNav performance using vision-language foundation models.



LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place

Tsung-Yen Yang, Sergio Arnaud, Kavit Shah, Naoki Yokoyama, Alexander Clegg, Joanne Truong, Eric Undersander, Oleksandr Maksymets, Sehoon Ha, Mrinal Kalakrishnan, Roozbeh Mottaghi, Dhruv Batra, Akshara Rai

CVPR 2023 Demo Track
CVPR 2023 Meta AI Booth

Project Page

Open-vocabulary mobile manipulation using LLMs to generate plans from natural language commands.



Adaptive Skill Coordination for Robotic Mobile Manipulation

Naoki Yokoyama, Alexander Clegg, Joanne Truong, Eric Undersander, Tsung-Yen Yang, Sergio Arnaud, Sehoon Ha, Dhruv Batra, Akshara Rai

RA-L 2023
ICRA 2024

Project Page Paper

Near-perfect mobile pick-and-place in diverse unseen real-world environments without obstacle maps or precise object locations.



OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

Karmesh Yadav*, Arjun Majumdar*, Ram Ramrakhya, Naoki Yokoyama, Aleksei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

Paper


ViNL: Visual Navigation and Locomotion Over Obstacles

Simar Kareer*, Naoki Yokoyama*, Dhruv Batra, Sehoon Ha, Joanne Truong

ICRA 2023
Best Paper Award at Learning for Agile Robotics Workshop at CoRL 2022

Project Page Paper

Trained vision-based locomotion and navigation policies using Learning by Cheating, enabling quadruped robots to navigate unfamiliar, cluttered environments by stepping over obstacles.



Rethinking Sim2Real: Lower Fidelity Simulation Leads to Higher Sim2Real Transfer in Navigation

Joanne Truong, Max Rudolph, Naoki Yokoyama, Sonia Chernova, Dhruv Batra, Akshara Rai

CoRL 2022

Project Page Paper


Benchmarking Augmentation Methods for Learning Robust Navigation Agents: The Winning Entry of the 2021 iGibson Challenge

Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha

IROS 2022
Embodied AI Workshop at CVPR 2022

Challenge Page Workshop Page Paper

Achieved 1st place in the 2021 iGibson Visual Navigation Challenge using data augmentation methods coupled with deep reinforcement learning (PPO).




Is Mapping Necessary for Realistic PointGoal Navigation?

Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv Batra, Oleksandr Maksymets

CVPR 2022

Project Page Paper Code

Can an autonomous agent navigate in a new environment without ever building an explicit map?


Success Weighted By Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation

Naoki Yokoyama, Sehoon Ha, Dhruv Batra

IROS 2021


Project Page Video Paper Code

Dynamics-aware training and evaluation for navigation. Demonstrated that the trained agents exploit the robot's dynamics to reach goals faster than agents from previous work, both in simulation and in the real world.
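A minimal Python sketch of how an SCT-style score could be computed, for readers who want an intuition for the metric. It assumes SCT mirrors the familiar SPL form, with path lengths replaced by completion times: each successful episode contributes C / max(C, A), where C is an estimate of the fastest completion time the robot's dynamics allow and A is the agent's actual completion time, and failed episodes contribute 0. The `Episode` record and the `sct` function are hypothetical names used only for illustration; the paper's exact definition, and how the fastest time is estimated, may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical record format)."""
    success: bool        # did the agent reach the goal?
    agent_time: float    # agent's actual completion time, in seconds
    fastest_time: float  # assumed estimate of the fastest feasible time under the robot's dynamics


def sct(episodes: Sequence[Episode]) -> float:
    """SPL-style Success weighted by Completion Time (assumed form):
    mean over episodes of S * C / max(C, A)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.fastest_time <= 0:
            raise ValueError("fastest_time must be positive")
        if ep.success:
            # 1.0 when the agent matches (or beats) the fastest time, smaller when slower.
            total += ep.fastest_time / max(ep.fastest_time, ep.agent_time)
    return total / len(episodes)


if __name__ == "__main__":
    eps = [
        Episode(success=True, agent_time=20.0, fastest_time=10.0),   # contributes 0.5
        Episode(success=True, agent_time=10.0, fastest_time=10.0),   # contributes 1.0
        Episode(success=False, agent_time=30.0, fastest_time=12.0),  # failure contributes 0.0
    ]
    print(round(sct(eps), 4))  # 0.5
```

Because the weight depends on time rather than path length, two agents that trace the same path can receive different scores if one exploits the robot's dynamics to move faster, which is the distinction a path-length metric cannot capture.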



System Architecture for Autonomous Mobile Manipulation of Everyday Objects in Domestic Environments

Tarik Kelestemur, Naoki Yokoyama, Joanne Truong, Anas Abou Allaban, Taskin Padir

PETRA 2019


Video Paper Code


RoboCup@Home 2019 in Sydney, Australia

Finished 1st place among US teams.


World Robot Summit 2018 in Tokyo, Japan

Competition with mobile manipulation and perception tasks, held in Odaiba's Tokyo Big Sight.




RoboCup@Home 2018 in Montreal, Canada

Object Segmentation Person Description

Finished 4th internationally and 1st among US teams. Completed various mobile manipulation and human-robot interaction tasks using deep learning and computer vision.