Vikas Dhiman

PhD Student
Vision and Perceptual Machines Lab (with Dr. Jason Corso)
University of Michigan, Ann Arbor



Paper "A Continuous Occlusion Model for Road Scene Understanding" accepted to CVPR 2016

A Continuous Occlusion Model for Road Scene Understanding (May 2014-)

Paper Supplementary Bibtex

We present a physically interpretable, continuous three-dimensional (3D) model for handling occlusions with applications to road scene understanding. We probabilistically assign each point in space to an object with a theoretical modeling of the reflection and transmission probabilities for the corresponding camera ray. Our modeling is unified in handling occlusions across a variety of scenarios, such as associating structure from motion (SFM) point tracks with potentially occluding objects or modeling object detection scores in applications such as 3D localization. For point track association, our model uniformly handles static and dynamic objects, which is an advantage over motion segmentation approaches traditionally used in multibody SFM. Detailed experiments on the KITTI raw dataset show the superiority of the proposed method over both state-of-the-art motion segmentation and a baseline that heuristically uses detection bounding boxes for resolving occlusions. We also demonstrate how our continuous occlusion model may be applied to the task of 3D localization in road scenes.

V. Dhiman, Q. Tran, J. J. Corso, and M. Chandraker. A Continuous Occlusion Model for Road Scene Understanding. CVPR, June 2016.
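To give a flavor of the reflection/transmission idea in the abstract, here is a toy one-dimensional sketch of my own (object shapes, densities, and all numbers are invented for illustration; the paper's actual model is 3D and physically derived): each object contributes a soft occupancy density along a camera ray, and the probability that the ray is reflected by an object at a given depth is that object's density there times the probability that the ray was transmitted that far.

```python
import numpy as np

def soft_occupancy(t, center, extent):
    """Smooth occupancy of one object along the ray (a bump around its center)."""
    return np.exp(-0.5 * ((t - center) / extent) ** 2)

depths = np.linspace(0.0, 30.0, 3000)        # sample points along the ray (meters)
dt = depths[1] - depths[0]
objects = [(8.0, 0.8), (15.0, 1.2)]          # (center depth, extent) per object

# Total density along the ray, and the probability of reaching each depth
# without being absorbed (a discrete approximation of exp(-integral of density)).
density = sum(soft_occupancy(depths, c, e) for c, e in objects)
transmission = np.exp(-np.cumsum(density) * dt)

# Per-object reflection probability: its own density times the transmission.
for k, (c, e) in enumerate(objects):
    p_reflect = soft_occupancy(depths, c, e) * transmission
    print(f"object {k}: most likely reflection depth = "
          f"{depths[np.argmax(p_reflect)]:.2f} m")
```

Note how the second object's reflection peak shifts toward the camera: the ray is likely to be absorbed by the nearer object first, which is exactly the occlusion effect the model captures.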

Pinhole camera workshop for middle school students


2015 Xplore/Friday - Computer Imaging Session 2

Modern MAP algorithms for occupancy grid mapping (May 2013-Sept 2013)

Paper Bibtex Code

The inverse sensor model has been popular in occupancy grid mapping. However, it is widely known that applying the inverse sensor model to mapping requires certain assumptions that are not necessarily true. Even the works that use forward sensor models have relied on methods like expectation maximization or Gibbs sampling, which have since been superseded by more effective methods for maximum a posteriori (MAP) inference over graphical models. In this paper, we propose the use of modern MAP inference methods along with the forward sensor model. Our implementation and experimental results demonstrate that these modern inference methods deliver more accurate maps more efficiently than previously used methods.

V. Dhiman, A. Kundu, F. Dellaert, and J. J. Corso. Modern MAP inference methods for accurate and faster occupancy grid mapping on higher order factor graphs.
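As a rough illustration of what "MAP inference with a forward sensor model" means (a toy setup of my own, not the paper's higher-order factor graph formulation), the following brute-forces the MAP occupancy map of a tiny 1-D grid, scoring each candidate map by the forward likelihood of first-hit range readings instead of inverting the sensor:

```python
import itertools

N = 6                      # grid cells along the beam
P_HIT = 0.9                # prob. an occupied cell stops the beam
P_PRIOR = 0.3              # prior prob. that a cell is occupied

def forward_model(z, m):
    """P(beam reports first hit at cell z | occupancy map m)."""
    p_pass = 1.0
    for j in range(z):                       # beam must pass cells 0..z-1
        p_pass *= (1 - P_HIT) if m[j] else 1.0
    return p_pass * (P_HIT if m[z] else 0.0) # ...and stop at cell z

measurements = [3, 3, 5, 3]                  # noisy first-hit readings

def posterior(m):
    p = 1.0
    for c in m:                              # independent Bernoulli prior
        p *= P_PRIOR if c else (1 - P_PRIOR)
    for z in measurements:                   # forward likelihood per reading
        p *= forward_model(z, m)
    return p

# MAP estimate: exhaustive search over all 2^N occupancy maps.
map_star = max(itertools.product([0, 1], repeat=N), key=posterior)
print("MAP occupancy map:", map_star)
```

Exhaustive search only works for toy grids, of course; the point of modern MAP methods is to perform this maximization tractably on real-sized factor graphs.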

Voxel Planes (Feb 2013-May 2013)

Paper Bibtex Presentation Code

Conversion of unorganized point clouds to surface reconstructions is increasingly required in the mobile robotics perception processing pipeline, particularly with the rapid adoption of RGB-D (color and depth) image sensors. Many contemporary methods stem from the work in the computer graphics community in order to handle the point clouds generated by tabletop scanners in a batch-like manner. The requirements for mobile robotics are different and include support for real-time processing, incremental update, localization, mapping, path planning, obstacle avoidance, ray-tracing, terrain traversability assessment, grasping/manipulation and visualization for effective human-robot interaction.

We carry out a quantitative comparison of Greedy Projection and Marching Cubes along with our voxel planes method, assessing execution speed, error, compression, and visualization appearance. Our voxel planes approach first computes the PCA over the points inside a voxel, combining these PCA results across 2x2x2 voxel neighborhoods in a sliding window. Second, the smallest eigenvector and the voxel centroid define a plane, which is intersected with the voxel to reconstruct the surface patch (a 3-6 sided convex polygon) within that voxel. By the nature of their construction, these surface patches tessellate to produce a surface representation of the underlying points.

In experiments on public datasets, the voxel planes method is 3 times faster than Marching Cubes, offers 300 times better compression than Greedy Projection, and yields 10-fold lower error than Marching Cubes, whilst allowing incremental map updates.

J. Ryde, V. Dhiman, and R. Platt. Voxel planes: Rapid visualization and meshification of point cloud ensembles. IROS, November 2013.
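The per-voxel plane-fitting step described above can be sketched as follows (toy data of my own; the actual method additionally merges PCA statistics across 2x2x2 neighborhoods and clips the plane against the voxel to produce the polygon):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake points inside one unit voxel: a noisy planar patch near z = 0.5.
pts = rng.uniform(0.0, 1.0, size=(200, 3))
pts[:, 2] = 0.5 + 0.01 * rng.standard_normal(200)

# PCA over the points in the voxel: centroid + 3x3 covariance.
centroid = pts.mean(axis=0)
cov = np.cov((pts - centroid).T)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order

# The eigenvector of the smallest eigenvalue is the plane normal;
# together with the centroid it defines the plane normal . x = d.
normal = eigvecs[:, 0]
d = normal @ centroid

print("plane normal (abs):", np.round(np.abs(normal), 2))
```

For this synthetic patch the recovered normal is the z-axis (up to sign), matching the plane the points were sampled from; the smallest eigenvalue also gives a cheap planarity check for rejecting voxels that are not well explained by a single plane.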

Kinfu based localization for Augmented Reality (March 2013)

Used Kinect Fusion-based camera localization from the PCL library to render augmented reality (using VTK), enabling the user to add markers that describe arbitrary objects in 3D.

Head tracking using RGBD at Hackathon (April 2013)


We used the OpenCV face detector, a KLT feature tracker, and the VTK visualizer to cook up a head tracker at UB Hackathon 2013 within 24 hours.

Mutual Localization (June - Sept 2012)

Paper Bibtex Presentation Code

Concurrently estimating the 6-DOF pose of multiple cameras or robots (cooperative localization) is a core problem in contemporary robotics. Current works focus on a set of mutually observable world landmarks and often require inbuilt egomotion estimates; situations in which both assumptions are violated often arise, for example, robots with erroneous low-quality odometry and IMU exploring an unknown environment. In contrast to these existing works in cooperative localization, we propose a cooperative localization method, which we call mutual localization, that uses reciprocal observations of camera-fiducials to obviate the need for egomotion estimates and mutually observable world landmarks. We formulate and solve an algebraic system for the pose of the two-camera mutual localization setup under these assumptions. Our experiments demonstrate the capabilities of our proposed egomotion-free cooperative localization method: for example, it achieves 2 cm translation and 0.7 degree rotation accuracy for 6-DOF pose at a 2 m sensing range. To demonstrate the applicability of the proposed work, we deploy our method on TurtleBots and compare our results with ARToolKit and Bundler, over which our method achieves a 10-fold improvement in translation estimation accuracy.

V. Dhiman, J. Ryde, and J. J. Corso. Mutual localization: Two camera relative 6-dof pose estimation from reciprocal fiducial observation. IROS, November 2013.
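A much-simplified sketch of the reciprocal-observation idea (my own toy example: it assumes each camera measures the full 6-DOF pose of the other robot's fiducial, whereas the paper solves the harder algebraic problem from image observations; all poses and offsets below are made up):

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 rigid transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Ground-truth relative pose of camera B in camera A's frame.
T_A_B = make_T(rot_z(0.4), [2.0, 0.3, 0.1])

# Known mounting of each fiducial on its own robot.
T_A_FA = make_T(rot_z(0.1), [0.05, 0.0, 0.1])   # fiducial on robot A
T_B_FB = make_T(rot_z(-0.2), [0.0, 0.05, 0.1])  # fiducial on robot B

# Simulated reciprocal measurements: each camera sees the other's fiducial.
T_A_FB = T_A_B @ T_B_FB                          # A observes B's fiducial
T_B_FA = np.linalg.inv(T_A_B) @ T_A_FA           # B observes A's fiducial

# Either observation recovers the relative pose by transform composition;
# reciprocity lets the two estimates cross-check each other.
est_from_A = T_A_FB @ np.linalg.inv(T_B_FB)
est_from_B = np.linalg.inv(T_B_FA @ np.linalg.inv(T_A_FA))

print(np.allclose(est_from_A, T_A_B), np.allclose(est_from_B, T_A_B))
```

In the noise-free case the two estimates agree exactly; with real fiducial detections the reciprocal observations instead constrain a joint estimate, which is where the algebraic formulation in the paper comes in.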


Multi-Resolution Occupied Voxel Lists (Jan - Jun 2012)

Added features such as ROS support and colored-voxel support to the MROL library by Dr. Julian Ryde.

Work Experience (2008-2011)

At DE Shaw, I worked on various small automation projects such as web robots, web scrapers, and feed parsers. Most of these applications were written in Perl and a few in Python.

Apart from this, I also built a simple web application using J2EE, with Hibernate, Struts 2, and Spring.

Undergrad Projects (2004-2008)

Biped Robot (July 2007-May 2008)

Our aim for the final-year project was to design and fabricate a 12-DOF biped that could be used for further development and testing of walking robots. However, due to unforeseen financial constraints, we fell back to fabricating a 4-DOF biped.

I was responsible for the mechanical design and simulation of the robot.

More ...

Spherical Stepper Motor (June-August 2008)

A motor that could rotate around any axis of rotation passing through its center.

The aim of this project was to fabricate a spherical stepper motor using strategic positioning of electromagnets on the rotor and permanent magnets on the stator.

The strategy of pole placement was inspired by the paper "Kinematic Design and Commutation of a Spherical Stepper Motor" by Gregory S. Chirikjian and David Stein.

More ...

Refreshable Braille Display (July-December 2005)

An 8-character electronic Braille display that could be controlled from a PC. This project applied knowledge gained while fabricating the "Moving message display" in the first year of my B.Tech.

The idea was to replace the LEDs of the "Moving message display" with electromechanical tangible "points" that could be turned on and off to form braille characters. Each character had 6 points, and the unit was designed to have 6 characters. The complete unit was controlled through the parallel port of a PC.

Grid Crossing Robot (December 2005)

A machine for a robotics competition called GRIP at TECHFEST'06, the annual technical festival of IIT Bombay.

The task in the competition was to build a wired/wireless remote-controlled machine that could traverse a grid 1250 mm high, collect a cylindrical object located at a specific point below the grid, and then place the cylinder in the hollow cylindrical container provided.

More ...

Moving message display (February 2005)

A PC-controlled moving message display. This project was my first fabrication of an electronic circuit. The circuit design and construction were taken directly from an Electronics For You construction article in the December 2004 issue, titled "Moving Message Over Dot-Matrix Display".


You can find me on the following websites:

  1. Picasa
  2. Facebook
  3. Blogger