\(
% Vector
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\hvect}[1]{\bar{\vect{#1}}}
% unit vector
\newcommand{\uvect}[1]{\hat{\vect{#1}}}
% Field
\newcommand{\field}[1]{\mathbb{#1}}
% Trace
\def\act{a}
\def\actt{\act_t}
\newcommand\acttT{\vect{\act}_{t:\epiT}}
\newcommand{\epiT}{T}
\newcommand{\spawnP}{P_{\text{spawn}}}
% Map
\def\mapsym{x}
\def\map{\vect{\mapsym}}
\newcommand{\est}[1]{\hat{#1}}
\newcommand{\mapest}{\est{\vect{x}}}
% Measurement or observation
\newcommand{\meas}{z}
\newcommand{\measurements}{\vect{\meas}_{1:t}}
\newcommand{\meast}[1][t]{\meas_{#1}}
% Pose
\newcommand{\pose}{g}
\newcommand{\poset}[1][t]{\pose_{#1}}
\newcommand{\posetn}{\pose_{t+1}}
\newcommand{\posetp}{\pose_{t-1}}
\newcommand{\poses}{\vect{\pose}_{1:t}}
\newcommand{\posest}{\est{\pose}}
\newcommand{\posestt}{\est{\pose}_t}
% Pose goal
\newcommand{\posegoal}{d}
\newcommand{\unaryminus}{\scalebox{0.5}[0.5]{\( - \)}}
\newcommand{\prevtime}{1:t\unaryminus1}
\newcommand{\pastposes}{\vect{\pose}_{\prevtime}}
\newcommand{\pastmeas}{\vect{\meas}_{\prevtime}}
\newcommand{\pastobs}{\pastmeas, \pastposes}
\newcommand{\mutlocest}{f_{\text{mutloc}}}
\newcommand{\param}{\theta}
\newcommand{\policy}{\chi}
% CVPR16
%\DeclareMathSymbol{\Tangent}
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\sech}{sech}
\DeclareMathOperator{\poly}{poly}
%\DeclareMathOperator*{\argmin}{\arg\min}
\newcommand{\ptD}{\posDp} % FIXME: conflicts with \map, this should be \posDp
\newcommand{\Obj}{C} % In mutloc C_p and C_q used for robots
\newcommand{\ObjMu}{\vect{m}} % FIXME: in mutloc used as 3D pos of markers in frame 1, this should \pose
\newcommand{\ObjR}{\bm{\omega}}
\newcommand{\ObjSigma}{\bm{\Sigma}}
\newcommand{\Logst}{\mathcal{L}}
\newcommand{\OccF}{f_{\text{occ}}}
\newcommand{\gOccF}{\nabla \OccF}
\newcommand{\gOccFi}{\nabla \OccF^i}
\newcommand{\impt}{\imgp{\ptD}}
% Mahalanobis distance between two points
\newcommand{\mahaF}{\zeta} % FIXME: use some greek letter
\newcommand{\LogstIm}{\Logst_u}
\newcommand{\tImMu}{\bm{\mu}}
\newcommand{\tImS}{\bm{\Gamma}}
\newcommand{\LogstD}{\Logst_{\dist}}
\newcommand{\tDMu}{\nu}
\newcommand{\meandepth}[1]{\tDMu_{#1}}
\newcommand{\BB}{\mathbf{B}}
\newcommand{\imBB}{\imgp{\BB}}
\newcommand{\camO}{\mathbf{p}^c}
\newcommand{\camR}{\bm{\omega}^c}
\newcommand{\bepsilon}{\bm{\epsilon}}
% 3D pose of the cars and ego motion
\newcommand{\pos}[2]{\ObjMu^{#1}({#2})}
\newcommand{\ori}[2]{\ObjR^{#1}(#2)}
\newcommand{\state}[2]{\mathbf{s}^{#1}(#2)}
% ego pose
\newcommand{\egop}[1][t]{\pos{c}{#1}}
\newcommand{\egoo}[1][t]{\ori{c}{#1}}
\newcommand{\egos}[1][t]{\state{c}{#1}}
% relative pose between camera and car $i$
\newcommand{\posealt}{\pose} % FIXME: Consider unifying with \pose
\newcommand{\relp}[2]{\posealt^{#1}(#2)}
\newcommand{\relpz}[2]{\posealt_z^{#1}(#2)}
% 3D tracks on car $i$ in its own coordinate frame
\newcommand{\tracklets}{\mathbf{X}^{i}_o}
\newcommand{\tracklet}[1]{\ptD^{i}_{#1}}
% track projections on camera
\newcommand{\trackpit}[2]{\impt_{#1}(#2)}
\newcommand{\trackp}[1]{\impt_j(#1)}
\newcommand{\trackpj}[1]{\impt_j(#1)}
% Unclassified track point projected on camera
\newcommand{\ucTrackp}{\impt(t)}
% dimensions of car $i$
\newcommand{\dimsn}[1]{\BB^{#1}}
\newcommand{\expDimsn}{\hat{\BB}}
% projection function
\newcommand{\projectionOf}[1]{\proj_{\relp{i}{t}}\left(#1\right)}
\newcommand{\projectionOft}[1]{\proj_{\relp{i}{t+1}}\left(#1\right)}
\newcommand{\centerProj}{\bar{\proj}_{\relp{i}{t}}(\dimsn{i})}
\newcommand{\cornerProj}[1]{\proj^{#1}_{\relp{i}{t}}(\dimsn{i})}
\newcommand{\triangleProj}[1]{\triangle^{#1}_{\relp{i}{t}}(\dimsn{i})}
% bounding box corners on image
\newcommand{\bbt}[2]{\imBB^{#1}({#2})}
\newcommand{\bb}[1]{\imBB^{#1}(t)}
\newcommand{\bbMu}{\bm{\mu}} % FIXME: conflicts with logistic mean of trans prob
\newcommand{\bbS}{\bm{\Lambda}}
\newcommand{\Energy}[1]{\mathcal{E}^{it}_{\text{#1}}}
\newcommand{\pEnergy}[1]{\mathcal{E}^{ijt}_{\text{#1}}}
% Weighted energy
\newcommand{\WEnergy}[1]{\lambda_{\text{#1}}\Energy{#1}}
\newcommand{\WpEnergy}[1]{\lambda_{\text{#1}}\pEnergy{#1}}
\newcommand{\EnergyCol}{\mathcal{E}^{ijt}_{\text{col}}}
\newcommand{\WEnergyCol}{\lambda_{\text{col}}\EnergyCol}
\newcommand{\EnergyBBoxNoOcc}{\Energy{detectNoOcc}}
\newcommand{\EnergyBBox}{\Energy{detect}}
\newcommand{\EnergyTrack}{ \pEnergy{track}}
\newcommand{\EnergyTrackNoOcc}{\pEnergy{trackNoOcc}}
\newcommand{\EnergyLane}{\Energy{lane}}
\newcommand{\EnergySize}{\Energy{size}}
\newcommand{\EnergyDyn}{\Energy{dyn}}
\newcommand{\EnergyDynHol}{\Energy{dyn-hol}}
\newcommand{\EnergyDynOri}{\Energy{dyn-ori}}
\newcommand{\EnergyDynVel}{\Energy{dyn-vel}}
\newcommand{\occFreeProj}[1]{\Pi_{\relp{i}{t}}(#1)}
\newcommand{\minx}{x_{\text{min}}}
\newcommand{\miny}{y_{\text{min}}}
\newcommand{\maxx}{x_{\text{max}}}
\newcommand{\maxy}{y_{\text{max}}}
\newcommand{\frontface}{F^i_\text{FF}(t)}
\newcommand{\occ}[1]{o({#1})}
\newcommand{\face}{F^i_k(t)}
\newcommand{\invProjectionOf}[1]{\pi^{-1}_{\relp{i}{t}}\left(#1\right)}
\newcommand{\invProjectionOftm}[1]{\pi^{-1}_{\relp{i}{t-1}}\left(#1\right)}
\newcommand{\occf}{\OccF^i(\ptD_j)}
\newcommand{\occftot}{\OccF(\ptD_j)}
\newcommand{\occft}[1]{\OccF(#1)}
\newcommand{\ray}{\uvect{\ptD}_j}
\newcommand{\occfray}{f_{occ}(\lambda\ray)}
\newcommand{\lambdadist}{f_{\lambda}(\trackpj{t-1}, \lambda)}
\newcommand{\occfxi}{L(\mathbf{x}; \pos{i}{t-1}, \Sigma_i)}
\newcommand{\occfi}{L(\lambda \ray; \pos{i}{t-1}, \Sigma_i)}
\newcommand{\PreflR}{P^{ij}_{\text{reflection}}}
\newcommand{\Pobs}{P^{ij}_{\text{observation}}}
\newcommand{\PtransR}{P^{j}_{\text{transmission}}}
\newcommand{\PassocR}{P^{ij}_{\text{assoc}}}
\newcommand{\assoc}{a^{ij}}
\newcommand{\assocP}{\assoc(\lambda)}
\newcommand{\assocPk}{\assoc(\lambda_k)}
\newcommand{\Ereproj}{E^{ij}_{\text{reproj}}}
\newcommand{\Ptransarg}[1]{\PtransR(#1)}
\newcommand{\Ptrans}{\Ptransarg{\lambda}}
\newcommand{\Ptransmud}{\Ptransarg{\mu^{i}_d}}
\newcommand{\Prefl}{\PreflR(\lambda)}
\newcommand{\Zrefl}{Z_{\text{refl}}}
\newcommand{\dishort}{d_i(\mathbf{x})}
\newcommand{\Lu}{\LogstIm(\impt, \tImMu^i,\tImS^i)}
\newcommand{\Llambda}{\LogstD(\impt, \dist; \tDMu^i)}
\newcommand{\Gauss}{\mathcal{N}}
\newcommand{\PropDist}{\mathcal{W}_j}
\newcommand{\muijl}{\mu^{ij}_{\dist}}
\newcommand{\sigmaijl}{\sigma^{ij}_{\dist}}
\newcommand{\Sigmait}{\bSigma^{i^{-1}}_o}
\newcommand{\muit}{\bmu^{i}_o}
\newcommand{\Sigmaic}{\bSigma'^{i^{-1}}_c}
\newcommand{\muic}{{\bmu^{i}_c}}
\newcommand{\Sigmaicf}{\bSigma^{i^{-1}}_c}
\newcommand{\Sigmaicfinv}{\Sigma^{i}_c}
\newcommand{\muiu}{\mu^{i}_t}
\newcommand{\Sigmaiu}{\Sigma^{i^{-1}}_u}
\newcommand{\uv}[1]{\hat{\mathbf{#1}}}
\newcommand{\Trr}[3]{{}^{#1}{#2}_{#3}}
\newcommand{\xymin}[1]{#1_{\text{min}}}
\newcommand{\xymax}[1]{#1_{\text{max}}}
\newcommand{\xt}{\mathbf{x}_t}
\newcommand{\xc}{\mathbf{x}_c}
\newcommand{\Rctot}{\bR}
\newcommand{\tctot}{\bt}
\newcommand{\tcmut}{\bt'}
\newcommand{\Beizer}{B\'ezier }
\newcommand{\LaneUncertainty}[1]{\Sigma_{L_m}(#1)}
\newcommand{\projOnLane}[1]{\pi_{L_m(k)}(#1)}
\newlength{\eqs}
\setlength{\eqs}{-0.04in}
\setitemize[0]{leftmargin=15pt}
\newenvironment{tight_itemize}{
\begin{itemize}[leftmargin=10pt]
\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
}{\end{itemize}}
% DRL
\def\state{s}
\newcommand{\statet}[1][t]{\state_{#1}}
\def\statetp{\state_{t-1}}
\def\statehist{\state_{1:t-1}}
\def\statetn{\state_{t+1}}
\def\obs{\meas}
\def\obst{\obs_t}
\renewcommand{\actt}[1][t]{\act_{#1}}
\def\acttp{\act_{t-1}}
\def\acttn{\act_{t+1}}
\def\Obs{\mathcal{O}}
\def\ObsEnc{E_o}
\def\ObsProb{P_o}
\def\ObsFunc{C}
\def\ObsFuncFull{\ObsFunc(\statet, \actt) \rightarrow \obst}
\def\ObsFuncInv{\ObsFunc^{-1}}
\def\ObsFuncInvFull{\ObsFuncInv(\obst, \statetp, \actt) \rightarrow \statet}
\def\State{\mathcal{S}}
\def\Action{\mathcal{A}}
\def\TransP{P_{T}}
\def\Trans{T}
\def\TransFull{\Trans(\statet, \actt) \rightarrow \statetn}
\def\TransObs{T_c}
\def\Rew{R}
\def\rew{r}
\def\rewards{\vect{r}_{1:t}}
\def\rewt{\rew_t}
\def\rewtp{\rew_{t-1}}
\def\rewtn{\rew_{t+1}}
\def\RewFull{\Rew(\statet, \actt) \rightarrow \rewtn}
\def\TransObsFull{\TransObs(\statet, \obst, \actt, \rewt; \theta_T) \rightarrow \statetn}
\def\Value{V}
\def\pit{\pi_t}
\def\piDef{\pi(\acttn|\statet, \obst, \actt, \rewt; \theta_\pi) \rightarrow \pit(\acttn ; \theta_\pi)}
\def\Valuet{\Value_t}
\def\ValueDef{\Value(\statet, \obst, \actt, \rewt; \theta_\Value) \rightarrow \Valuet(\theta_\Value)}
\def\R{\mathbb{R}}
\def\E{\mathbb{E}}
\newcommand{\LatencyOneGtOne}{Latency $1:>1$}
\newcommand{\NavAiiiCDiDiiL}{NavA3C+D\textsubscript{1}D\textsubscript{2}L}
\newcounter{Benchmark}
\newcounter{BenchmarkB}[Benchmark]
\newcommand{\ditem}[1]{\refstepcounter{Benchmark}\item[\arabic{Benchmark}. {#1}]}
\newcommand{\dditem}[1]{\refstepcounter{BenchmarkB}\item[\arabic{Benchmark}.\alph{BenchmarkB}. {#1}]}
\newcommand{\DistanceInefficiency}{Distance-inefficiency}
% Observation probability
% forward sensor model
% Differentiable Floyd Warshall algorithm
\newcommand{\graph}{G}
\newcommand{\vtces}{V}
\newcommand{\edges}{E}
%\newcommand{\policy}{\Pi}
\newcommand{\wts}{W}
\newcommand{\distM}{D}
\newcommand{\nil}{\emptyset}
\newcommand{\qValue}{Q}
\newcommand{\Q}{\qValue}
\newcommand{\goal}{g}
\newcommand{\Goal}{\mathcal{G}}
\newcommand{\discount}{\gamma}
\newcommand{\fwcost}{Q}
\newcommand{\fw}{\fwcost}
\newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#1}, {#2}, {#3}\right)}
\newcommand{\minedgecost}{\fwcost_0}
\newcommand{\Rgoal}{R_{\text{goal}}}
\newcommand{\Loo}{Latency-1:\textgreater1}
\newcommand{\Loss}{\mathcal{L}}
\newcommand{\LossText}[1]{\Loss_{\text{#1}}}
\newcommand{\LossDDPG}{\LossText{ddpg}}
\newcommand{\LossStep}{\LossText{step}}
\newcommand{\LossLo}{\LossText{lo}}
\newcommand{\LossUp}{\LossText{up}}
\newcommand{\LossTrieq}{\LossText{trieq}}
\newcommand{\tgt}{\text{tgt}}
\newcommand{\Qstar}{\Q_{*}}
\newcommand{\Qtgt}{\Q_{\text{tgt}}}
\newcommand{\ytgt}{y_t}
\newcommand{\bpmsg}[4]{\mu^{#4}_{#1\rightarrow#2}(#3)}
\newcommand{\msg}[3]{\mu_{#1#2}(#3)}
\newcommand{\problemobjective}{\begin{align}
\vect{\act}_{1:\epiT} &= \arg \max_{\vect{\act}_{1:\epiT}} \E \left[\left.
\discount^{k}\mathbb{1}_{\|\posegoal - \pose_k\|_2 \le \epsilon}
\right|
\vect{\meas}_{1:\epiT} \right]
\end{align}}
\)
Optimizing mapping, localization and planning algorithms
Hi everyone, I am Vikas, and welcome to my thesis defence.
Today I am going to talk about improving
navigation algorithms by optimizing mapping, localization and path-planning.
Navigation with autonomous cars
Google trends for 'Autonomous cars'
- While talking about navigation, we cannot ignore self-driving cars.
- Self-driving cars have been getting a lot of attention from the public, media
and entrepreneurs.
- Self-driving cars have covered millions of miles.
- States have been legalizing self-driving cars.
- The predicted arrival of self-driving cars has been getting closer.
- This is a wonderful time to be working on the problem of navigation and
to see the technology being applied in real life.
Potential benefits of self-driving cars
For elderly and disabled
Roads will become safer
More energy efficiency
Fewer parking lots
- And the interest is justified because of the benefits that self-driving cars bring.
- The elderly, the young and the disabled get increased mobility.
- Roads become safer.
- More energy efficiency.
- Fewer parking lots.
- However, these potentials can be realized only if self-driving cars and
their navigation algorithms are safe.
Navigation needs to be fast for safer cars
Reaction time in driving is a matter of life and death
Perception-reaction time (PRT) is positively correlated with crashes
"PRT has a cause-effect relationship in deceleration rate and crash severity."
"PRT has a direct effect on (crash) event and severity."
- It is clear that reaction time in driving can be the difference between life
and death.
- Driving safety research has clearly established the relationship between perception-reaction time and the probability and severity of crashes.
- Even half a second of difference in perception-reaction time (PRT) can
be the difference between crash and non-crash events.
- Here, perception-reaction time is measured as the time elapsed between
an observation and the action taken, for example the foot touching the pedal.
- In this talk, I will discuss how different parts of the navigation
pipeline can be optimized or made faster.
Outline
Mapping
Localization
Goal conditioned RL
- Navigation is the problem of converting a sequence of observations into a
sequence of actions for the purpose of going from one place to another.
- It is often addressed in three parts.
- Mapping---which is the estimation of the static part of the environment.
- Localization---which is the estimation of the dynamic state of the
environment, such as the agent's location in the map.
- and Planning---which is the estimation of the sequence of actions that
moves the agent from the current state to the desired goal.
+ Most of my work has been focused on mapping, with some work on
localization and planning.
+ Today I am going to talk about three of my works.
+ First I am going to talk about my work on making mapping faster by using
modern inference methods on factor graphs.
+ Then I am going to talk about mutual localization, which in turn enables
faster mapping by allowing robots to divide and conquer the environment.
+ In the end I will talk about making goal conditioned reinforcement
learning faster by removing redundant computation.
Mapping
Mapping
Localization
Goal conditioned RL
- First let's talk about mapping using modern inference algorithms in
factor graphs.
Occupancy grid mapping
Estimate map:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x} \in \Omega} p(\mathbf{x} | \mathbf{z}_{1:T}, \mathbf{y}_{1:T})
\]
\( O(|\Omega|) = O(2^{\dim(\mathbf{x})}) \)
- Occupancy grid mapping is a type of mapping where the space is divided
into grid cells.
- Each grid cell can either be occupied or free.
- On the left you can see a simulation of a robot with a laser scanner
estimating an occupancy grid map.
- Assume a robot or a self-driving car with a laser scanner is moving
through the environment while observing it.
- Let y_1 be the pose of the laser scanner and z_1 be the range measurement
from the laser scanner.
- The laser scan will pass through a subset of cells x_1 of the map.
- Similarly, the robot collects more observations over time 1 to T.
- Our objective is to estimate the map of the environment using the
observations and locations of the robot over time.
- Note that the maximization is over the map space, which in the case of
occupancy grid maps is exponential in the size of the map.
- A 100x100 map has \(2^{10000} \approx 10^{3010}\) possible configurations.
- To make the problem tractable, we factorize the probability distribution.
That brings us to the tool of our choice: factor graphs.
Factor graphs represent factorizations
Example:
\[
P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
Factorization:
\[
P(\mathbf{x}) = \prod_{f \in F} P_f(\mathbf{x}^f)
\]
Factor graph \(G = (V, F, E)\)
\(V = \) variable nodes. Example \(x_1, x_2, \dots, x_5 \)
\(F = \) factor nodes. Example \(f_A, f_B, \dots, f_E \)
\(E = \{(i, f) : i \in V, f \in F, \text{ if }P_f(.) \text{ depends upon }x_i\}\)
- Let us understand factor graphs through an example.
- A probability distribution can be factorized into factors that depend upon
only a small subset of the random variables.
- This factorization can be represented by a factor graph as shown.
- Each factor forms a factor node, shown as a square.
- It is connected only to the random variables on which that factor depends.
- In general, if the probability of a random vector x is factorized into f
factors, then it can be represented as a graph G.
- The only edges are between variable nodes and factor nodes, and a factor is
connected to a variable node if and only if the factor depends upon the
variable.
- The factorization of the probability depends upon modeling assumptions and
can differ widely.
Mapping using Inverse sensor models
Independent cell assumption:
\[ p(\mathbf{x} | \mathbf{z}, \mathbf{y}) = \prod_{i = 1}^{N} p(x_i | \mathbf{z}, \mathbf{y}) \]
Mapping:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x}} \eta \prod_{i=1}^N \prod_{f=1}^{t} p( x_i | z_f, y_f)
\]
Mapping using Forward sensor models
Mapping:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x}} \eta\, p(\mathbf{x}) \prod_{f=1}^t
p( z_f | \mathbf{x}_f, y_f)
\]
where \( p( z_f | \mathbf{x}_f, y_f) \) is the Forward sensor model.
Inverse sensor models
\(
p(\mapsym_i|\meas_f, y_f)
\) is the inverse sensor model
Drawbacks
Cell Independence assumption is limiting
Inverse sensor models are often inaccurate
Advantages
Inference is fast: each cell is estimated independently
Forward sensor models
\( p(\meas_f | \map, y_f) \) is the forward sensor model
Advantages
No independent cell assumption.
Forward sensor models are easy to compute
Drawbacks
Exact inference is expensive
Modern inference methods
Metropolis Hastings for mapping
Belief propagation
Dual decomposition
Metropolis Hastings for mapping
Current sample \(\map^r\)
Proposed sample \(\map'\)
Acceptance ratio
\begin{align}
a = \frac{P(\map')Q(\map^r| \map')}
{P(\map^r)Q(\map'| \map^r)}
\end{align}
\begin{align}
Q(\map'| \map^r) = \begin{cases}
\frac{1}{N} & \text{if $\|\map' - \map^r\|_1 = 1$}\\
0 & \text{ otherwise}
\end{cases}
\end{align}
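As a sketch of how this proposal plays out in code (assuming some `log_posterior` function that scores a candidate map against the measurements; the name is invented here, not from the talk): flipping one of \(N\) cells gives \(Q(\map'|\map^r) = Q(\map^r|\map') = 1/N\), so the proposal terms cancel and only the posterior ratio remains.

```python
import math
import random

def metropolis_hastings(log_posterior, n_cells, n_iters=10000, seed=0):
    """Sample binary occupancy maps; proposal flips one uniformly chosen cell."""
    rng = random.Random(seed)
    map_cur = [0] * n_cells             # current sample: all cells free
    logp_cur = log_posterior(map_cur)
    for _ in range(n_iters):
        i = rng.randrange(n_cells)      # Q(x'|x) = Q(x|x') = 1/N, so the
        map_cur[i] ^= 1                 # proposal terms cancel and the
        logp_new = log_posterior(map_cur)   # acceptance ratio is P(x')/P(x)
        if math.log(rng.random()) < logp_new - logp_cur:
            logp_cur = logp_new         # accept the flip
        else:
            map_cur[i] ^= 1             # reject: undo the flip
    return map_cur
```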
Belief propagation intuition
Example:
\[
P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
Want:
\[
P(x_3) = \sum_{\mathbf{x} \setminus x_3} P(x_1, x_2, x_3, x_4, x_5)
\]
\[
\newcommand{\bpop}[1]{\sum_{#1}}
P(x_3) = \bpop{x_1, x_2, x_4, x_5}
f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
\[
P(x_3) =
\underbrace{
\bpop{x_1,x_2} \left( f_A(x_1)
f_B(x_2)
f_C(x_1, x_2, x_3) \right)
}_{\bpmsg{f_C}{x_3}{x_3}{}}
\underbrace{
\bpop{x_4}f_D(x_3, x_4)
}_{\bpmsg{f_D}{x_3}{x_3}{}}
\underbrace{
\bpop{x_5}f_E(x_3, x_5)
}_{\bpmsg{f_E}{x_3}{x_3}{}}
\]
Belief propagation
\[
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\\
\bpmsg{i}{f}{x_i = l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r}
\end{align}
\]
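A minimal Python sketch of these messages on the running example (same invented factor tables as the earlier sketch); because \(x_1, x_2, x_4, x_5\) each touch only one other factor, the variable-to-factor messages collapse into the unary factors:

```python
# Same made-up factor tables as the earlier factor-graph sketch.
fA = lambda x1: 0.6 if x1 else 0.4
fB = lambda x2: 0.3 if x2 else 0.7
fC = lambda x1, x2, x3: 0.9 if x3 == (x1 and x2) else 0.1
fD = lambda x3, x4: 0.8 if x3 == x4 else 0.2
fE = lambda x3, x5: 0.8 if x3 == x5 else 0.2

def msg_fC_to_x3(x3):
    # Messages from x1 and x2 into fC are just fA and fB here, because
    # x1 and x2 have no other neighbouring factors.
    return sum(fA(x1) * fB(x2) * fC(x1, x2, x3)
               for x1 in (0, 1) for x2 in (0, 1))

def msg_fD_to_x3(x3):
    return sum(fD(x3, x4) for x4 in (0, 1))

def msg_fE_to_x3(x3):
    return sum(fE(x3, x5) for x5 in (0, 1))

belief = [msg_fC_to_x3(v) * msg_fD_to_x3(v) * msg_fE_to_x3(v) for v in (0, 1)]
Z = sum(belief)
print([b / Z for b in belief])   # matches the brute-force marginal above
```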
Dual decomposition
Example:
\[ P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
\[ x_3^{C*}, \dots = \arg~\max f_C(\dots) \qquad
x_3^{D*}, \dots = \arg~\max f_D(\dots) \qquad \dots
\]
Solve for all \(f \in F \):
\( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \)
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
Dual decomposition contd.
Solve for all \(f \in F \):
\( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \)
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
For disagreeing nodes \(i \in V \):
\[ \msg{i}{f}{x^f_i} \leftarrow \msg{i}{f}{x^f_i} + \frac{\alpha}{r} \]
where \( r \) is the iteration count and \(\alpha\) is the step size.
\[\map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right)\]
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
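A simplified Python sketch of this iteration (names invented; a full implementation keeps one copy of each shared variable per factor and projects the final per-factor solutions to a consensus): each factor minimizes its reparameterized local cost, and the current label of every disagreeing copy is penalized with a diminishing step \(\alpha/r\).

```python
import itertools

def solve_subproblem(scope, theta, msgs_f):
    """argmin over the factor's own copy x^f of theta_f(x^f) + sum_i mu_if(x_i)."""
    best, best_val = None, float("inf")
    for values in itertools.product([0, 1], repeat=len(scope)):
        val = theta(*values) + sum(msgs_f[(i, v)] for i, v in zip(scope, values))
        if val < best_val:
            best, best_val = dict(zip(scope, values)), val
    return best

def dual_decomposition(scopes, thetas, n_iters=50, alpha=1.0):
    # msgs[f][(i, l)] is the reparameterization mu_if(x_i = l) for factor f.
    msgs = {f: {(i, l): 0.0 for i in scope for l in (0, 1)}
            for f, scope in scopes.items()}
    local = {}
    for r in range(1, n_iters + 1):
        local = {f: solve_subproblem(scope, thetas[f], msgs[f])
                 for f, scope in scopes.items()}
        for i in {v for scope in scopes.values() for v in scope}:
            labels = {sol[i] for sol in local.values() if i in sol}
            if len(labels) > 1:                  # disagreeing node: penalize
                for f, sol in local.items():     # each copy's current label
                    if i in sol:                 # with a diminishing step
                        msgs[f][(i, sol[i])] += alpha / r
    return local                                 # per-factor solutions
```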
Modern inference methods
Metropolis Hastings for mapping
Belief propagation
\[
\begin{align}
\bpmsg{f}{i}{l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\label{eq:factor2node}
\\
\bpmsg{i}{f}{l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r}
\label{eq:node2factor}
\end{align}
\]
Dual decomposition
\[\begin{align}
\map^f &\leftarrow \arg \min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right)
\\
\msg{i}{f}{x^f_i} &\leftarrow \msg{i}{f}{x^{f'}_i} + \frac{\alpha}{r}
& \text{for disagreeing nodes \(i\)}
\end{align}\]
Forward sensor models are still expensive
Belief propagation
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{{\color{red}\map_f \in \Omega_f}: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\end{align}
Complexity = \(O(|\Omega_f|) = O(2^{|n(f)|}) \)
Exponential in the neighborhood size
Piecewise constant sensor model
\begin{align}
P(z_f|\map_f, y_f) &= \frac{1}{Z}\begin{cases}
1 & \text{ if } \map_f = [0, \dots 0,
1, \dots, *]^\top\\
\exp(-900) & \text{ if } \map_f = [0, \dots 0,
0, \dots, *]^\top\\
\exp(-1000) & \text{ otherwise}
\end{cases}
\end{align}
Generalization:
\begin{align}
P^{PCSM}_f(\map_f) = \begin{cases}
\psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\
\vdots& \\
\psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\
\psi_{\text{min}} & \text{ otherwise}
\end{cases}
\end{align}
Efficient belief propagation for PCSM
\begin{align}
P^{PCSM}_f(\map_f) = \begin{cases}
\psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\
\vdots& \\
\psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\
\psi_{\text{min}} & \text{ otherwise}
\end{cases}
\end{align}
\begin{align}
p(\vect{x_f} \sim \vect{R}_m| x_i = l_i)
&= \begin{cases}
\prod\limits_{j \in n(f) \setminus i, \vect{R}_m[j] \ne *}\bpmsg{j}{f}{x_j = \vect{R}_m[j]}{r} &
\text{ if $\vect{R}_m$ is compatible with $x_i = l_i$ }
\\
0 & \text{ otherwise }
\end{cases}
\end{align}
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} = \sum_{m = 1}^M
\psi_m p(\vect{x_f} \sim \vect{R}_m | x_i = l_i)
+ \psi_{\text{min}} p_{\text{otherwise}}
\end{align}
where $p_{\text{otherwise}} = 1 - \sum_{m = 1}^M p(\vect{x_f} \sim \vect{R}_m | x_i = l_i)$.
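A minimal Python sketch of this message (assuming, as the derivation above does, that the patterns are mutually exclusive and the incoming messages are normalized); the cost is \(O(M \cdot |n(f)|)\) rather than \(O(2^{|n(f)|})\):

```python
def pcsm_message(patterns, psis, psi_min, mu_in, i, l_i):
    """Factor-to-variable message mu_{f->i}(x_i = l_i) for a PCSM factor.

    patterns : list of dicts mapping variable -> 0, 1 or '*' (one per R_m)
    psis     : list of weights psi_m; psi_min is the catch-all weight
    mu_in    : dict variable -> [mu(x_j = 0), mu(x_j = 1)], normalized,
               for every j in n(f) other than i
    """
    total_matched, msg = 0.0, 0.0
    for pattern, psi in zip(patterns, psis):
        if pattern[i] not in (l_i, "*"):   # pattern incompatible with x_i = l_i
            continue
        p = 1.0
        for j, r in pattern.items():       # wildcard entries contribute 1
            if j != i and r != "*":
                p *= mu_in[j][r]
        total_matched += p
        msg += psi * p
    # Everything that matches no pattern falls into the "otherwise" case.
    return msg + psi_min * (1.0 - total_matched)
```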
Occupancy grid mapping using Factor graphs
Conclusion
Mapping
Localization
Goal conditioned RL
Communicating cars with mutual observation
- Now that we have a faster mapping algorithm, can we make it even faster?
- Imagine that we have multiple robots that can divide and conquer the mapping task.
- To share the estimated map, the robots have to know their precise
locations with respect to each other.
- That brings us to the next part of the talk, which is mutual localization.
Mutual localization
Mapping
Localization
Goal conditioned RL
Mutual localization
Find \(R, t\) such that
\begin{align}
q_i &= Rp_i + t
\\
\text{where } q_i &\in \mathbb{R}^3,\,
p_i \in \mathbb{R}^3,\,
R \in \mathrm{SO}(3),\,
t \in \mathbb{R}^3
\end{align}
Realtime mutual localization
Mutual localization conclusion
Mapping
Localization
Goal conditioned RL
Goal conditioned RL
Mapping
Localization
Goal conditioned RL
Goal conditioned reinforcement learning
Goal is "specified"
State is fully specified (no partial observation)
Manipulation is navigation in configuration space
Conventional RL vs Goal conditioned RL
Q-Learning
Define \[ \Q^*(s_t, a_t) = \max_{\policy} \E\left[ \sum_{k=t}^\infty \discount^{k-t} R(s_k, a_k) \right] \]
Bellman equation \[ \Q^*(s_t, a_t) = R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a)\]
Temporal Difference Update for tabular Q-functions \[ \Q^*(s_t, a_t)
\leftarrow
\alpha \Q^*(s_t, a_t)
+
(1-\alpha)( R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a) )
\]
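A minimal tabular sketch of this update, keeping the slide's convention that \(\alpha\) weights the old estimate:

```python
from collections import defaultdict

def td_update(Q, s, a, r, s_next, actions, gamma=0.99, alpha=0.9):
    """Q(s,a) <- alpha * Q(s,a) + (1 - alpha) * (r + gamma * max_a' Q(s',a'))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = alpha * Q[(s, a)] + (1.0 - alpha) * target

Q = defaultdict(float)                 # tabular Q-function, default 0
td_update(Q, s=0, a=1, r=-1.0, s_next=1, actions=[0, 1])
```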
Deep Q Learning
\begin{align}
R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t)
\end{align}
Naive Loss:
\[ \Loss = \left( R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) - \Q(s_t, a_t) \right)^2 \]
Problem:
Leads to unstable updates
Solution Strategy:
Target network \(\Qtgt(\dots) = \Q(\dots; \Theta_{\text{tgt}}) \)
Main network \(\Q_m(\dots) = \Q(\dots; \Theta_{m}) \)
Update target weights: \( \Theta_{\text{tgt}} \leftarrow \alpha \Theta_{\text{tgt}} + (1-\alpha) \Theta_{m} \)
where \(\alpha \approx 0.9\)
\(\nabla_{\Theta_m} \Qtgt = 0 \)
Solution:
\[
\Loss_{\text{DQN}} = \begin{cases}
\left( R(s_t, a_t) - {\color{red}\Q_{m}}(s_t, a_t) \right)^2 & \text{ if \(s_{t+1}\) is terminal }
\\
\left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2
& \text{ otherwise }
\end{cases}
\]
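A minimal sketch of the two pieces of target-network machinery (plain numpy arrays stand in for the network weights; the Q-function itself is elided):

```python
import numpy as np

def polyak_update(theta_tgt, theta_m, alpha=0.9):
    """theta_tgt <- alpha * theta_tgt + (1 - alpha) * theta_m; no gradient
    flows into the target weights, matching grad_{theta_m} Q_tgt = 0."""
    return alpha * theta_tgt + (1.0 - alpha) * theta_m

def dqn_target(r, q_tgt_next, gamma=0.99, terminal=False):
    """TD target: r if s_{t+1} is terminal, else r + gamma * max_a Q_tgt(s', a)."""
    return r if terminal else r + gamma * np.max(q_tgt_next)
```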
Floyd-Warshall
\[ \Q^P(s_t, g^*) := \max_{\policy} \E\left[
\sum_{k = t}^{\color{blue}g_{k+1} = g^*} R(s_k, s_{k+1}) \right] \]
Floyd-Warshall RL
\[
\newcommand{\redat}{{\color{red}a_t}}
\newcommand{\reda}{{\color{red}a}}
\newcommand{\redmaxa}{{\color{red}\max_a}}
\Q^P(s_t, \redat, g^*) := \max_{\policy} \E\left[
\sum_{k = t}^{g_{k+1} = g^*}
R(s_k, \redat)
\right] \]
\[ \Q^P(s_t, g_w) + \Q^P(s_w, g^*) \le \Q^P(s_t, g^*) \]
\[ \Q^P(s_t, g_{t+1}) = R(s_t, {\color{red}s_{t+1}}) \]
\[
\Q^P(s_t, \redat, g_w) + \redmaxa \Q^P(s_w, \reda, g^*) \le \Q^P(s_t, \redat, g^*) \]
\[
\Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]
Deep Floyd-Warshall RL
\begin{align}
\Q^P(s_t, a_t, g_w) + \max_a \Q^P(s_w, a, g^*) \le \Q^P(s_t, a_t, g^*)
\end{align}
\begin{align}
R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t)
\end{align}
\[
\Loss_{\text{DQN}} =
\left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2
\]
\[ \LossLo = \left( \Qtgt^P(s_t, a_t, g_w) + \max_a \Qtgt^P(s_w, a, g^*)
- {\color{red}\Q^P_{m}}(s_t, a_t, g^*) \right)^2_+ \]
\[ \LossUp = \left( {\color{red}\Q^P_{m}}(s_t, a_t, g_w)
+ \max_a \Qtgt^P(s_w, a, g^*) - \Qtgt^P(s_t, a_t, g^*) \right)^2_+ \]
\[
\Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]
\[ \LossStep = \left( R(s_t, a_t) - {\color{red}\Q^P_{m}}(s_t, a_t, g_{t+1}) \right)^2 \]
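A minimal sketch of the three losses with the hinge \((\cdot)^2_+\) made explicit (the inner \(\max_a\) is assumed precomputed as `a_w_best`; `q_m` and `q_tgt` are illustrative stand-ins for the main and target networks):

```python
def hinge_sq(x):
    """The slide's (.)_+^2: squared penalty only when the inequality is violated."""
    return max(x, 0.0) ** 2

def fwrl_losses(q_m, q_tgt, s_t, a_t, s_w, a_w_best, g_w, g_star, r_t, g_next):
    lo = hinge_sq(q_tgt(s_t, a_t, g_w) + q_tgt(s_w, a_w_best, g_star)
                  - q_m(s_t, a_t, g_star))
    up = hinge_sq(q_m(s_t, a_t, g_w) + q_tgt(s_w, a_w_best, g_star)
                  - q_tgt(s_t, a_t, g_star))
    step = (r_t - q_m(s_t, a_t, g_next)) ** 2   # one-step path reward, no goal reward
    return lo, up, step
```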
Hindsight Experience Replay (HER)
What does it mean to re-imagine a Virtual Goal
The desired goal was \( g^* \), but the achieved goal was \( g_f \)
The loss function used is
\[
\newcommand{\cred}[1]{{\color{red}{#1}}}
\Loss = \left( R(s_t, a_t, \cred{g_f}) +
\max_a \discount \Qtgt(s_{t+1}, a, \cred{g_f}) - \Q_m(s_t, a_t, \cred{g_f}) \right)^2 \]
instead of
\[ \Loss = \left( R(s_t, a_t, \cred{g^*}) +
\max_a \discount \Qtgt(s_{t+1}, a, \cred{g^*}) -
\Q_m(s_t, a_t, \cred{g^*}) \right)^2 \]
This requires re-computing the reward function \( R(s_t, a_t, g_f) \) for the re-imagined goal
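A minimal sketch of this relabeling step (illustrative names, not the paper's code), showing where the extra reward evaluation happens:

```python
import numpy as np

def sparse_reward(g_t, g, eps=0.05):
    """Goal-conditioned reward: 0 within eps of the goal g, else -1."""
    return 0.0 if np.linalg.norm(g_t - g) <= eps else -1.0

def her_relabel(s, a, s_next, g_t, g_f):
    """Replay a stored transition with the achieved goal g_f substituted for
    g*; the reward must be re-evaluated against the new goal, which is the
    re-computation cost a goal-independent reward avoids."""
    return (s, a, sparse_reward(g_t, g_f), s_next, g_f)
```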
Ablation with FWRL-inspired \(\LossUp\) and \(\LossLo\)
\begin{align}
\LossLo &=
\left( \Qtgt(\dots) + \Qtgt(\dots) - \Q_m(\dots) \right)_+^2
\end{align}
\begin{align}
\LossUp &=
\left( \Q_m(\dots) + \Qtgt(\dots) - \Qtgt(\dots) \right)_+^2
\end{align}
Ablation with Step Loss
\begin{align}
\LossStep &= \left(
\fwargs{\state_{t}}{\act_{t}}{\goal_{t+1}}{*}{P} - \Rew(\state_{t}, \act_{t})
\right)^2
\end{align}
\begin{align}
\Loss_{\text{HER}} = \Loss_{\text{bellman}} &=
\left( R(s_t, a_t, g_f) +
\max_a \discount \Qtgt(s_{t+1}, a, g_f) - \Q_m(s_t, a_t, g_f) \right)^2
\end{align}
HER modeling has duplicate information
Hindsight Experience Replay
Claim: Goal conditioned reinforcement learning without goal rewards is possible
Step Loss does not need Goal Rewards
Why does duplicate information hurt?
HER-like methods need to re-imagine new goal specifications.
If rewards depend upon the goal specification, the rewards need to be re-sampled.
Re-computing rewards can be costly.
Cost of algorithms in Reward samples
Ours : HER - Goal rewards \( + \LossStep \)
FWRL : HER \( + \LossLo + \LossUp \)
Fetch Push
Fetch Pick And Place
Effect of distance threshold
HER Rewards:
\[ R(s_t, a_t, g^*) = \begin{cases} 0 &\text{ if } \|g_t - g^*\| \le \epsilon
\\
-1 &\text{ otherwise}
\end{cases} \]
Our Rewards: \[ R(s_t, a_t) = -1 \]
Conclusion
Mapping
Localization
Goal conditioned RL
Language driven localization
Acknowledgements
Thanks to my colleagues and collaborators
Prof. Jason Corso (Advisor)
Prof. Frank Dellaert
Prof. Manmohan Chandraker
Prof. Jeff Siskind
Prof. Chenliang Xu
Dr. Julian Ryde
Dr. Brent Griffin
Dr. Suren Kumar
Shurjo Banerjee
Madan Ravi Ganesh
Bibliography
Andrychowicz, Marcin, et al. "Hindsight Experience Replay." Advances in Neural Information Processing Systems, 2017.
Dhiman, V., J. Ryde, and J. J. Corso. "Mutual Localization: Two Camera Relative 6-DOF Pose Estimation from Reciprocal Fiducial Observation." IROS, 2013.
Kumar, S., V. Dhiman, and J. J. Corso. "Learning Compositional Sparse Models of Bimodal Percepts." AAAI, 2014.
Ryde, J., V. Dhiman, and R. Platt. "Voxel Planes: Rapid Visualization and Meshification of Point Cloud Ensembles." IROS, 2013.
Dhiman, V., A. Kundu, F. Dellaert, and J. J. Corso. "Modern MAP Inference Methods for Accurate and Fast Occupancy Grid Mapping on Higher Order Factor Graphs." ICRA, 2014.
Chandraker, M., and V. Dhiman. "Continuous Occlusion Models for Road Scene Understanding." US Patent 9,821,813, 2017.
Dhiman, V., Q.-H. Tran, J. J. Corso, and M. Chandraker. "A Continuous Occlusion Model for Road Scene Understanding." CVPR, 2016.
Dhiman, V., S. Banerjee, B. Griffin, J. M. Siskind, and J. J. Corso. "A Critical Investigation of Deep Reinforcement Learning for Navigation." NIPS Deep RL Workshop, 2017.
Kumar, S., V. Dhiman, P. A. Koch, and J. J. Corso. "Learning Compositional Sparse Bimodal Models." PAMI, 2017.
Mirowski, Piotr, et al. "Learning to Navigate in Complex Environments." ICLR, 2017.
Plappert, Matthias, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, Vikash Kumar, and Wojciech Zaremba. "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research." arXiv:1802.09464, 2018.
Kaelbling, Leslie Pack. "Learning to Achieve Goals." IJCAI, 1993.
Dhiman, V., S. Banerjee, J. M. Siskind, and J. J. Corso. "Learning Goal-Conditioned Value Functions with One-Step Path Rewards Rather than Goal-Rewards." Submitted to ICLR, 2019. Under review.
Zachariou, Peter, et al. "Speeding: Effects on Hazard Perception and Reaction Time." 2011.
Mnih, Volodymyr, et al. "Human-Level Control through Deep Reinforcement Learning." Nature 518.7540 (2015): 529.
Watkins, Christopher J. C. H., and Peter Dayan. "Q-Learning." Machine Learning 8.3-4 (1992): 279-292.
Pearl, Judea. "Fusion, Propagation, and Structuring in Belief Networks." Artificial Intelligence 29.3 (1986): 241-288.
Jojic, Vladimir, Stephen Gould, and Daphne Koller. "Accelerated Dual Decomposition for MAP Inference." ICML, 2010.
Merali, Rehman S., and Timothy D. Barfoot. "Occupancy Grid Mapping with Markov Chain Monte Carlo Gibbs Sampling." ICRA, 2013.