Amir Bar

I am a Postdoctoral Researcher at Meta AI (FAIR), working with Yann LeCun. Prior to this, I was a Ph.D. candidate at Tel Aviv University and UC Berkeley, advised by Amir Globerson and Trevor Darrell.

My goal is to teach computers to perceive, reason, and act in the world from visual data, using little to no supervision.

I am on the 2024-2025 academic job market.

Selected Publications

Navigation World Models
Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun
Technical Report, 2024
Project Page

Planning trajectories by simulating them using a video world model. Our conditional diffusion transformer (CDiT) generates videos that enable planning, counterfactual reasoning, and action.

Task Vectors are Cross-Modal
Grace Luo, Trevor Darrell, Amir Bar
Technical Report, 2024
Project Page | Code

Vision-Language Models (VLMs) answer questions by mapping inputs to shared cross-modal task representations, an important inductive bias that helps explain VLMs' success.

EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell
ECCV, 2024
Project Page | Data | Code

We present EgoPet, a new egocentric video dataset of animals.

Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
ECCV, 2024
Code

We discover Task Vectors, latent representations in a model's activation space that encode task information. We propose a method to find and use them to guide models for performing tasks.

Stochastic Positional Embeddings Improve Masked Image Modeling
Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun
ICML, 2024
Code

Modeling location uncertainties via stochastic positional embeddings (StoP) improves masked image modeling.

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang
TMLR, 2024
Project Page | Code/Data | Demo

IMProv performs multimodal in-context learning to solve computer vision tasks.

Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros
CVPR, 2024
Project Page | Code

A Large Vision Model (LVM) trained on 420B tokens that exhibits in-context learning capabilities.

Visual Prompting via Image Inpainting
Amir Bar*, Yossi Gandelsman*, Trevor Darrell, Amir Globerson, Alexei A. Efros
NeurIPS, 2022
Project Page | Code/Data

Adapt a pre-trained visual model to novel downstream tasks without task-specific fine-tuning or any model modification.

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
NeurIPS, 2022
Project Page | Code

Incorporating image-level scene structure during training improves video transformers.

Object-Region Video Transformers
Roei Herzig, Elad Ben-Avraham, Karttikeya Mangalam, Amir Bar, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
CVPR, 2022
Project Page | Code

Incorporating objects into transformer layers improves video transformers.

DETReg: Unsupervised Pretraining with Region Priors for Object Detection
Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
CVPR, 2022
Project Page | Code | Video | Demo

Pretraining transformers to localize potential objects improves object detection.

Compositional Video Synthesis with Action Graphs
Amir Bar*, Roei Herzig*, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson
ICML, 2021
Project Page | Code | Video

We introduce Action Graphs, a structure that can better capture the compositional and hierarchical nature of actions.

Learning Canonical Representations for Scene Graph to Image Generation
Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson
ECCV, 2020
Project Page | Code | Video

We present a model for Scene Graph to Image generation that is more robust to complex input scene graphs.

Learning Individual Styles of Conversational Gesture
Shiry Ginosar*, Amir Bar*, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
CVPR, 2019
Project Page | Code | Data

We predict plausible gestures to go along with someone's speech.

Sample:

The man allowed that about health captain played that alleged to Marks live up in the club comes the handed up moved to a brief

Language Generation with Recurrent Generative Adversarial Networks without Pre-training
Ofir Press*, Amir Bar*, Ben Bogin*, Jonathan Berant, Lior Wolf
1st Workshop on Learning to Generate Natural Language at ICML, 2017
Code

We show that recurrent neural networks can be trained to generate text with GANs from scratch and vastly improve the quality of generated sequences compared to a convolutional baseline.

Medical Papers

3D Convolutional Sequence to Sequence Model for Vertebral Compression Fractures Identification in CT
David Chettrit, Tomer Meir, Hila Lebel, Mila Orlovsky, Ronen Gordon, Ayelet Akselrod-Ballin, Amir Bar
MICCAI, 2020
Press: 1, 2

We present a novel architecture for detecting vertebral compression fractures in chest and abdomen CT.

Automated Opportunistic Osteoporotic Fracture Risk Assessment Using Computed Tomography Scans to Aid in FRAX Underutilization
Noa Dagan, Eldad Elnekave, Noam Barda, Orna Bregman-Amitai, Amir Bar, Mila Orlovsky, Eitan Bachmat, Ran D. Balicer
Nature Medicine, 2020
Press: 1, 2

We demonstrate that it is feasible to automatically evaluate osteoporotic fracture risk from routine abdomen or chest computed tomography (CT) scans.

Improved ICH Classification Using Task-Dependent Learning
Amir Bar, Michal Mauda-Havakuk, Yoni Turner, Michal Safadi, Eldad Elnekave
ISBI, 2019
Press

Intracranial hemorrhage (ICH) is among the most critical and time-sensitive findings to be detected on Head CT. We present a new architecture designed for optimal triaging of Head CTs, with the goal of decreasing the time from CT acquisition to accurate ICH detection.

Simulating Dual-Energy X-Ray Absorptiometry in CT Using Deep-Learning Segmentation Cascade
Arun Krishnaraj, Spencer Barrett, Orna Bregman-Amitai, Michael Cohen-Sfady, Amir Bar, David Chettrit, Mila Orlovsky, Eldad Elnekave
Journal of the American College of Radiology, 2019

Osteoporosis is an underdiagnosed condition despite effective screening modalities. This study describes a method to simulate lumbar DEXA scores from routine CT studies using a machine-learning algorithm.

PHT-bot: A Deep Learning-Based System for Automatic Risk Stratification of COPD Patients Based Upon Signs of Pulmonary Hypertension
David Chettrit, Orna Bregman Amitai, Itamar Tamir, Amir Bar, Eldad Elnekave
SPIE, 2019

Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality worldwide. We apply a deep learning algorithm to stratify COPD patients by risk, based on signs of pulmonary hypertension.

Compression Fractures Detection on CT
Amir Bar, Lior Wolf, Orna Bregman Amitai, Eyal Toledano, Eldad Elnekave
SPIE, 2017
Press: 1, 2

The presence of a vertebral compression fracture is highly indicative of osteoporosis and represents the single most robust predictor for development of a second osteoporotic fracture. We present an automated method for detecting spine compression fractures in CT scans.

Undergraduate and MA Collaborators

If you are a student interested in collaborating on research projects or looking for advice, please reach out.