\( % Vector \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\hvect}[1]{\bar{\vect{#1}}} % unit vector \newcommand{\uvect}[1]{\hat{\vect{#1}}} % Field \newcommand{\field}[1]{\mathbb{#1}} % Trace \def\act{a} \def\actt{\act_t} \newcommand\acttT{\vect{\act}_{t:\epiT}} \newcommand{\epiT}{T} \newcommand{\spawnP}{P_{\text{spawn}}} % Map \def\mapsym{x} \def\map{\vect{\mapsym}} \newcommand{\est}[1]{\hat{#1}} \newcommand{\mapest}{\est{\vect{x}}} % Measurement or observation \newcommand{\meas}{z} \newcommand{\measurements}{\vect{\meas}_{1:t}} \newcommand{\meast}[1][t]{\meas_{#1}} % Pose \newcommand{\pose}{g} \newcommand{\poset}[1][t]{\pose_{#1}} \newcommand{\posetn}{\pose_{t+1}} \newcommand{\posetp}{\pose_{t-1}} \newcommand{\poses}{\vect{\pose}_{1:t}} \newcommand{\posest}{\est{\pose}} \newcommand{\posestt}{\est{\pose}_t} % Pose goal \newcommand{\posegoal}{d} \newcommand{\unaryminus}{\scalebox{0.5}[0.5]{\( - \)}} \newcommand{\prevtime}{1:t\unaryminus1} \newcommand{\pastposes}{\vect{\pose}_{\prevtime}} \newcommand{\pastmeas}{\vect{\meas}_{\prevtime}} \newcommand{\pastobs}{\pastmeas, \pastposes} \newcommand{\mutlocest}{f_{\text{mutloc}}} \newcommand{\epiT}{T} \newcommand{\spawnP}{P_{\text{spawn}}} \newcommand{\param}{\theta} \newcommand{\policy}{\chi} % CVPR16 %\DeclareMathSymbol{\Tangent} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\sech}{sech} \DeclareMathOperator{\poly}{poly} %\DeclareMathOperator*{\argmin}{\arg\min} \newcommand{\ptD}{\posDp} % FIXME: conflicts with \map, this should be \posDp \newcommand{\Obj}{C} % In mutloc C_p and C_q used for robots \newcommand{\ObjMu}{\vect{m}} % FIXME: in mutloc used as 3D pos of markers in frame 1, this should \pose \newcommand{\ObjR}{\bm{\omega}} \newcommand{\ObjSigma}{\bm{\Sigma}} \newcommand{\Logst}{\mathcal{L}} \newcommand{\OccF}{f_{\text{occ}}} \newcommand{\gOccF}{\nabla \OccF} \newcommand{\gOccFi}{\nabla \OccF^i} \newcommand{\impt}{\imgp{\ptD}} % Mahalonobis distance between to points \newcommand{\mahaF}{\zeta} % FIXME: use some greek letter \newcommand{\LogstIm}{\Logst_u} \newcommand{\tImMu}{\bm{\mu}} \newcommand{\tImS}{\bm{\Gamma}} \newcommand{\LogstD}{\Logst_{\dist}} \newcommand{\tDMu}{\nu} \newcommand{\meandepth}[1]{\tDMu_{#1}} \newcommand{\BB}{\mathbf{B}} \newcommand{\imBB}{\imgp{\BB}} \newcommand{\camO}{\mathbf{p}^c} \newcommand{\camR}{\bm{\omega}^c} \newcommand{\bepsilon}{\bm{\epsilon}} % 3D pose of the cars and ego motion \newcommand{\pos}[2]{\ObjMu^{#1}({#2})} \newcommand{\ori}[2]{\ObjR^{#1}(#2)} \newcommand{\state}[2]{\mathbf{s}^{#1}(#2)} % ego pose \newcommand{\egop}[1][t]{\pos{c}{#1}} \newcommand{\egoo}[1][t]{\ori{c}{#1}} \newcommand{\egos}[1][t]{\state{c}{#1}} % relative pose between camera and car $i$ \newcommand{\posealt}{\pose} % FIXME: Consider unifying with \pose \newcommand{\relp}[2]{\posealt^{#1}(#2)} \newcommand{\relpz}[2]{\posealt_z^{#1}(#2)} % 3D tracks on car $i$ in its own coordinate frame \newcommand{\tracklets}{\mathbf{X}^{i}_o} \newcommand{\tracklet}[1]{\ptD^{i}_{#1}} % track projections on camera \newcommand{\trackpit}[2]{\impt_{#1}(#2)} \newcommand{\trackp}[1]{\impt_j(#1)} \newcommand{\trackpj}[1]{\impt_j(#1)} % Unclassified track point projected on camera \newcommand{\ucTrackp}{\impt(t)} % dimensions of car $i$ \newcommand{\dimsn}[1]{\BB^{#1}} \newcommand{\expDimsn}{\hat{\BB}} % projection function \newcommand{\projectionOf}[1]{\proj_{\relp{i}{t}}\left(#1\right)} \newcommand{\projectionOft}[1]{\proj_{\relp{i}{t+1}}\left(#1\right)} \newcommand{\centerProj}{\bar{\proj}_{\relp{i}{t}}(\dimsn{i})} 
\newcommand{\cornerProj}[1]{\proj^{#1}_{\relp{i}{t}}(\dimsn{i})} \newcommand{\triangleProj}[1]{\triangle^{#1}_{\relp{i}{t}}(\dimsn{i})} % bounding box corners on image \newcommand{\bbt}[2]{\imBB^{#1}({#2})} \newcommand{\bb}[1]{\imBB^{#1}(t)} \newcommand{\bbMu}{\bm{\mu}} % FIXME: conflicts with logistic mean of trans prob \newcommand{\bbS}{\bm{\Lambda}} \newcommand{\Energy}[1]{\mathcal{E}^{it}_{\text{#1}}} \newcommand{\pEnergy}[1]{\mathcal{E}^{ijt}_{\text{#1}}} % Weighted energy \newcommand{\WEnergy}[1]{\lambda_{\text{#1}}\Energy{#1}} \newcommand{\WpEnergy}[1]{\lambda_{\text{#1}}\pEnergy{#1}} \newcommand{\EnergyCol}{\mathcal{E}^{ijt}_{\text{col}}} \newcommand{\WEnergyCol}{\lambda_{\text{col}}\EnergyCol} \newcommand{\EnergyBBoxNoOcc}{\Energy{detectNoOcc}} \newcommand{\EnergyBBox}{\Energy{detect}} \newcommand{\EnergyTrack}{ \pEnergy{track}} \newcommand{\EnergyTrackNoOcc}{\pEnergy{trackNoOcc}} \newcommand{\EnergyLane}{\Energy{lane}} \newcommand{\EnergySize}{\Energy{size}} \newcommand{\EnergyDyn}{\Energy{dyn}} \newcommand{\EnergyDynHol}{\Energy{dyn-hol}} \newcommand{\EnergyDynOri}{\Energy{dyn-ori}} \newcommand{\EnergyDynVel}{\Energy{dyn-vel}} \newcommand{\occFreeProj}[1]{\Pi_{\relp{i}{t}}(#1)} \newcommand{\minx}{x_{\text{min}}} \newcommand{\miny}{y_{\text{min}}} \newcommand{\maxx}{x_{\text{max}}} \newcommand{\maxy}{y_{\text{max}}} \newcommand{\frontface}{F^i_\text{FF}(t)} \newcommand{\occ}[1]{o({#1})} \newcommand{\face}{F^i_k(t)} \newcommand{\invProjectionOf}[1]{\pi^{-1}_{\relp{i}{t}}\left(#1\right)} \newcommand{\invProjectionOftm}[1]{\pi^{-1}_{\relp{i}{t-1}}\left(#1\right)} \newcommand{\occf}{\OccF^i(\ptD_j)} \newcommand{\occftot}{\OccF(\ptD_j)} \newcommand{\occft}[1]{\OccF(#1)} \newcommand{\ray}{\uvect{\ptD}_j} \newcommand{\occfray}{f_{occ}(\lambda\ray)} \newcommand{\lambdadist}{f_{\lambda}(\trackpj{t-1}, \lambda)} \newcommand{\occfxi}{L(\mathbf{x}; \pos{i}{t-1}, \Sigma_i)} \newcommand{\occfi}{L(\lambda \ray; \pos{i}{t-1}, \Sigma_i)} \newcommand{\PreflR}{P^{ij}_{\text{reflection}}} \newcommand{\Pobs}{P^{ij}_{\text{observation}}} \newcommand{\PtransR}{P^{j}_{\text{transmission}}} \newcommand{\PassocR}{P^{ij}_{\text{assoc}}} \newcommand{\assoc}{a^{ij}} \newcommand{\assocP}{\assoc(\lambda)} \newcommand{\assocPk}{\assoc(\lambda_k)} \newcommand{\Ereproj}{E^{ij}_{\text{reproj}}} \newcommand{\Ptransarg}[1]{\PtransR(#1)} \newcommand{\Ptrans}{\Ptransarg{\lambda}} \newcommand{\Ptransmud}{\Ptransarg{\mu^{i}_d}} \newcommand{\Prefl}{\PreflR(\lambda)} \newcommand{\Zrefl}{Z_{\text{refl}}} \newcommand{\dishort}{d_i(\mathbf{x})} \newcommand{\Lu}{\LogstIm(\impt, \tImMu^i,\tImS^i)} \newcommand{\Llambda}{\LogstD(\impt, \dist; \tDMu^i)} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\PropDist}{\mathcal{W}_j} \newcommand{\muijl}{\mu^{ij}_{\dist}} \newcommand{\sigmaijl}{\sigma^{ij}_{\dist}} \newcommand{\Sigmait}{\bSigma^{i^{-1}}_o} \newcommand{\muit}{\bmu^{i}_o} \newcommand{\Sigmaic}{\bSigma'^{i^{-1}}_c} \newcommand{\muic}{{\bmu^{i}_c}} \newcommand{\Sigmaicf}{\bSigma^{i^{-1}}_c} \newcommand{\Sigmaicfinv}{\Sigma^{i}_c} \newcommand{\muiu}{\mu^{i}_t} \newcommand{\Sigmaiu}{\Sigma^{i^{-1}}_u} \newcommand{\uv}[1]{\hat{\mathbf{#1}}} \newcommand{\Trr}[3]{{}^{#1}{#2}_{#3}} \newcommand{\xymin}[1]{#1_{\text{min}}} \newcommand{\xymax}[1]{#1_{\text{max}}} \newcommand{\xt}{\mathbf{x}_t} \newcommand{\xc}{\mathbf{x}_c} \newcommand{\Rctot}{\bR} \newcommand{\tctot}{\bt} \newcommand{\tcmut}{\bt'} \newcommand{\Beizer}{B\'eizer } \newcommand{\LaneUncertainty}[1]{\Sigma_{L_m}(#1)} \newcommand{\projOnLane}[1]{\pi_{L_m(k)}(#1)} 
\newlength{\eqs} \setlength{\eqs}{-0.04in} \setitemize[0]{leftmargin=15pt} \newenvironment{tight_itemize}{ \begin{itemize}[leftmargin=10pt] \setlength{\topsep}{0pt} \setlength{\itemsep}{0pt} \setlength{\parskip}{0pt} \setlength{\parsep}{0pt} }{\end{itemize}} % DRL \def\state{s} \newcommand{\statet}[1][t]{\state_{#1}} \def\statetp{\state_{t-1}} \def\statehist{\state_{1:t-1}} \def\statetn{\state_{t+1}} \def\obs{\meas} \def\obst{\obs_t} \def\act{a} \newcommand{\actt}[1][t]{\act_{#1}} \def\acttp{\act_{t-1}} \def\acttn{\act_{t+1}} \def\Obs{\mathcal{O}} \def\ObsEnc{E_o} \def\ObsProb{P_o} \def\ObsFunc{C} \def\ObsFuncFull{\ObsFunc(\statet, \actt) \rightarrow \obst} \def\ObsFuncInv{\ObsFunc^{-1}} \def\ObsFuncInvFull{\ObsFuncInv(\obst, \statetp, \actt) \rightarrow \statet} \def\State{\mathcal{S}} \def\Action{\mathcal{A}} \def\TransP{P_{T}} \def\Trans{T} \def\TransFull{\Trans(\statet, \actt) \rightarrow \statetn} \def\TransObs{T_c} \def\Rew{R} \def\rew{r} \def\rewards{\vect{r}_{1:t}} \def\rewt{\rew_t} \def\rewtp{\rew_{t-1}} \def\rewtn{\rew_{t+1}} \def\RewFull{\Rew(\statet, \actt) \rightarrow \rewtn} \def\TransObsFull{\TransObs(\statet, \obst, \actt, \rewt; \theta_T) \rightarrow \statetn} \def\Value{V} \def\pit{\pi_t} \def\piDef{\pi(\acttn|\statet, \obst, \actt, \rewt; \theta_\pi) \rightarrow \pit(\acttn ; \theta_\pi)} \def\Valuet{\Value_t} \def\ValueDef{\Value(\statet, \obst, \actt, \rewt; \theta_\Value) \rightarrow \Valuet(\theta_\Value)} \def\R{\mathbb{R}} \def\E{\mathbb{E}} \newcommand{\LatencyOneGtOne}{Latency $1:>1$} \newcommand{\NavAiiiCDiDiiL}{NavA3C+D\textsubscript{1}D\textsubscript{2}L} \newcounter{Benchmark} \newcounter{BenchmarkB}[Benchmark] \newcommand{\ditem}[1]{\refstepcounter{Benchmark}\item[\arabic{Benchmark}. {#1}]} \newcommand{\dditem}[1]{\refstepcounter{BenchmarkB}\item[\arabic{Benchmark}.\alph{BenchmarkB}. {#1}]} \newcommand{\DistanceInefficiency}{Distance-inefficiency} % Observation probability % forward sensor model % Differentiable Floyd Warshall algorithm \newcommand{\graph}{G} \newcommand{\vtces}{V} \newcommand{\edges}{E} %\newcommand{\policy}{\Pi} \newcommand{\wts}{W} \newcommand{\distM}{D} \newcommand{\nil}{\emptyset} \newcommand{\qValue}{Q} \newcommand{\Q}{\qValue} \newcommand{\goal}{g} \newcommand{\Goal}{\mathcal{G}} \newcommand{\discount}{\gamma} \newcommand{\fwcost}{Q} \newcommand{\fw}{\fwcost} \newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#1}, {#2}, {#3}\right)} \newcommand{\minedgecost}{\fwcost_0} \newcommand{\Rgoal}{R_{\text{goal}}} \newcommand{\Loo}{Latency-1:\textgreater1} \newcommand{\Loss}{\mathcal{L}} \newcommand{\LossText}[1]{\Loss_{\text{#1}}} \newcommand{\LossDDPG}{\LossText{ddpg}} \newcommand{\LossStep}{\LossText{step}} \newcommand{\LossLo}{\LossText{lo}} \newcommand{\LossUp}{\LossText{up}} \newcommand{\LossTrieq}{\LossText{trieq}} \newcommand{\tgt}{\text{tgt}} \newcommand{\Qstar}{\Q_{*}} \newcommand{\Qtgt}{\Q_{\text{tgt}}} \newcommand{\ytgt}{y_t} \newcommand{\bpmsg}[4]{\mu^{#4}_{#1\rightarrow#2}(#3)} \newcommand{\msg}[3]{\mu_{#1#2}(#3)} \newcommand{\discount}{\gamma} \newcommand{\problemobjective}{\begin{align} \vect{\act}_{1:\epiT} &= \arg \max_{\vect{\act}_{1:\epiT}} \E \left[\left. \discount^{k}\mathbb{1}_{\|\posegoal - \pose_k\|_2 \le \epsilon} \right| \vect{\meas}_{1:\epiT} \right] \end{align}} \)

Optimizing mapping, localization and planning algorithms

Navigation with autonomous cars

Google trends for 'Autonomous cars'

Potential benefits of self-driving cars

Mobility for the elderly and disabled
Roads will become safer
Better energy efficiency
Fewer parking lots

Navigation needs to be fast for safer cars

  • Reaction time in driving is a matter of life and death
  • Perception-reaction time (PRT) is positively correlated with crashes
  • "PRT has a cause-effect relationship in deceleration rate and crash severity."
    "PRT has a direct effect on (crash) event and severity."

Outline

Mapping Localization Goal conditioned RL

Mapping

Mapping Localization Goal conditioned RL

Occupancy grid mapping

Estimate map: \[ \mathbf{x}^* = \arg \max_{\mathbf{x} \in \Omega} p(\mathbf{x} | \mathbf{z}_{1:T}, \mathbf{y}_{1:T}) \]
Exhaustive search over all maps is exponential in the number of cells: \( O(|\Omega|) = O(2^{\dim(\mathbf{x})}) \)
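
To make the intractability concrete, here is a minimal Python sketch that finds \(\mathbf{x}^*\) by enumerating every map in \(\Omega\); the two-term `log_likelihood` is a hypothetical stand-in for the measurement log-likelihood, not any particular sensor model.

```python
import itertools

def log_likelihood(grid, measurements):
    # Hypothetical stand-in for log p(z_{1:T} | x, y_{1:T}): reward maps that
    # agree with each (cell index, observed occupancy) pair.
    return sum(0.0 if grid[i] == z else -1.0 for i, z in measurements)

def brute_force_map(num_cells, measurements):
    # Enumerate every candidate map x in Omega = {0, 1}^num_cells and keep the
    # maximizer -- O(2^num_cells), which is why exhaustive search is hopeless.
    best, best_ll = None, float("-inf")
    for grid in itertools.product((0, 1), repeat=num_cells):
        ll = log_likelihood(grid, measurements)
        if ll > best_ll:
            best, best_ll = grid, ll
    return best

print(brute_force_map(4, [(0, 1), (2, 0)]))  # already 2^4 = 16 candidates for 4 cells
```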

Factor graphs represent factorizations

  • Example: \[ P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
  • Factorization: \[ P(\mathbf{x}) = \prod_{f \in F} P_f(\mathbf{x}^f) \]
  • Factor graph \(G = (V, F, E)\)
    • \(V = \) variable nodes. Example \(x_1, x_2, \dots, x_5 \)
    • \(F = \) factor nodes. Example \(f_A, f_B, \dots, f_E \)
    • \(E = \{(i, f) : i \in V,\, f \in F,\, P_f(\cdot) \text{ depends on } x_i\}\)
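
As an illustration only (all factor tables below are made up), the example graph can be written down directly as variable nodes, factor nodes with their scopes, and the induced edge set:

```python
# The example factor graph G = (V, F, E): factors map an assignment dict to a
# nonnegative value; all tables below are made up for illustration.
factors = {
    "f_A": (("x1",), lambda v: 0.6 if v["x1"] else 0.4),
    "f_B": (("x2",), lambda v: 0.5),
    "f_C": (("x1", "x2", "x3"), lambda v: 0.9 if v["x3"] == (v["x1"] and v["x2"]) else 0.1),
    "f_D": (("x3", "x4"), lambda v: 0.7 if v["x3"] == v["x4"] else 0.3),
    "f_E": (("x3", "x5"), lambda v: 0.8 if v["x3"] == v["x5"] else 0.2),
}
variables = ["x1", "x2", "x3", "x4", "x5"]                            # V
edges = {(i, f) for f, (scope, _) in factors.items() for i in scope}  # E

def unnormalized_p(assignment):
    # P(x) is proportional to the product of the factor values.
    p = 1.0
    for scope, fn in factors.values():
        p *= fn(assignment)
    return p

print(sorted(edges))
print(unnormalized_p(dict(x1=1, x2=1, x3=1, x4=1, x5=1)))
```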

Mapping using Inverse sensor models

  • Independent cell assumption: \[ p(\mathbf{x} | \mathbf{z}, \mathbf{y}) = \prod_{i = 1}^{N} p(x_i | \mathbf{z}, \mathbf{y}) \]
  • Mapping: \[ \mathbf{x}^* = \arg \max_{\mathbf{x}} \eta \prod_{i=1}^N \prod_{f=1}^{t} p( x_i | z_f, y_f) \]
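
A minimal sketch of this pipeline, assuming a hypothetical 1-D beam model for \( p(x_i | z_f, y_f) \): under the independent-cell assumption the product over measurements turns into a per-cell log-odds sum.

```python
import numpy as np

def inverse_sensor_model(cell, z, pose):
    # Hypothetical 1-D beam model for p(x_i = occupied | z_f, y_f): cells near
    # the measured range are likely occupied, cells before it likely free.
    d = abs(cell - pose)
    if abs(d - z) < 0.5:
        return 0.7      # probably occupied at the hit
    if d < z:
        return 0.3      # probably free along the beam
    return 0.5          # behind the hit: no information

def log_odds_mapping(num_cells, scans):
    # Under the independent-cell assumption, the product over measurements f
    # of p(x_i | z_f, y_f) becomes a per-cell sum in log-odds space.
    l = np.zeros(num_cells)
    for pose, z in scans:
        for i in range(num_cells):
            p = inverse_sensor_model(i, z, pose)
            l[i] += np.log(p / (1 - p))
    return 1 / (1 + np.exp(-l))   # back to occupancy probabilities

print(log_odds_mapping(10, [(0, 4.0), (0, 4.2)]).round(2))
```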

Mapping using Forward sensor models

  • Mapping: \[ \mathbf{x}^* = \arg \max_{\mathbf{x}} \eta\, p(\mathbf{x}) \prod_{f=1}^t p( z_f | \mathbf{x}_f, y_f) \] where \( p( z_f | \mathbf{x}_f, y_f) \) is the forward sensor model.

Inverse sensor models

  • \( p(\mapsym_i|\meas_f, y_f) \) is the inverse sensor model

Drawbacks

  • Cell Independence assumption is limiting
  • Inverse sensor models are often inaccurate

Advantages

  • Fast

Forward sensor models

  • \( p(\meas_f | \map, y_f) \) is the forward sensor model

Advantages

  • No independent cell assumption.
  • Forward sensor models are easy to specify accurately

Drawbacks

  • Slow

Modern inference methods

  • Metropolis Hastings for mapping
  • Belief propagation
  • Dual decomposition

Metropolis Hastings for mapping

  • Current sample \(\map^r\)
  • Potential sample \(\map'\)
  • Acceptance ratio \begin{align} a = \frac{P(\map')Q(\map^r| \map')} {P(\map^r)Q(\map'| \map^r)} \end{align}
  • \begin{align} Q(\map'| \map^r) = \begin{cases} \frac{1}{N} & \text{if $\|\map' - \map^r\|_1 = 1$}\\ 0 & \text{ otherwise} \end{cases} \end{align}
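
A minimal sketch of this sampler, with a made-up unnormalized target `log_p` standing in for \(P(\map \mid \vect{\meas})\); because the single-cell-flip proposal is symmetric, the \(Q\) terms cancel in the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(grid):
    # Hypothetical unnormalized log target log P(x | z, y); any forward sensor
    # model could be plugged in here.
    target = np.zeros_like(grid)
    target[3:6] = 1
    return -np.sum(np.abs(grid - target))

def metropolis_hastings_mapping(num_cells, iters=5000):
    # Single-cell-flip proposal: pick one of the N cells uniformly and toggle
    # it.  The proposal is symmetric, so Q cancels in the acceptance ratio.
    x = rng.integers(0, 2, size=num_cells)
    for _ in range(iters):
        x_new = x.copy()
        x_new[rng.integers(num_cells)] ^= 1
        a = np.exp(log_p(x_new) - log_p(x))   # acceptance ratio
        if rng.random() < min(1.0, a):
            x = x_new
    return x

print(metropolis_hastings_mapping(10))
```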

Belief propagation intuition

  • Example: \[ P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
  • Want: \[ P(x_3) = \sum_{\mathbf{x} \setminus x_3} P(x_1, x_2, x_3, x_4, x_5) \]
  • \[ \newcommand{\bpop}[1]{\sum_{#1}} P(x_3) = \bpop{x_1, x_2, x_4, x_5} f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
  • \[ P(x_3) = \underbrace{ \bpop{x_1,x_2} \left( f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) \right) }_{\bpmsg{f_C}{x_3}{x_3}{}} \underbrace{ \bpop{x_4}f_D(x_3, x_4) }_{\bpmsg{f_D}{x_3}{x_3}{}} \underbrace{ \bpop{x_5}f_E(x_3, x_5) }_{\bpmsg{f_E}{x_3}{x_3}{}} \]

Belief propagation

  • \[ \begin{align} \bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r} \\ \bpmsg{i}{f}{x_i = l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r} \end{align} \]
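
The two message equations translate almost line for line into code. The sketch below assumes binary variables and dictionary-keyed messages; the factor `f_D` and the uniform incoming message are illustrative only.

```python
import itertools
import numpy as np

def factor_to_variable(factor_fn, scope, i, messages_in):
    # mu_{f->i}(x_i = l): sum over all scope assignments with x_i clamped to l,
    # weighted by the product of incoming variable-to-factor messages.
    others = [j for j in scope if j != i]
    out = np.zeros(2)
    for l in (0, 1):
        for vals in itertools.product((0, 1), repeat=len(others)):
            v = dict(zip(others, vals))
            v[i] = l
            out[l] += factor_fn(v) * np.prod([messages_in[j][v[j]] for j in others])
    return out

def variable_to_factor(incoming, exclude):
    # mu_{i->f}(x_i): product of messages from all neighbouring factors except f.
    out = np.ones(2)
    for h, msg in incoming.items():
        if h != exclude:
            out *= msg
    return out

# Example message from f_D(x3, x4) to x3, with a uniform message from x4.
f_D = lambda v: 0.7 if v["x3"] == v["x4"] else 0.3
print(factor_to_variable(f_D, ["x3", "x4"], "x3", {"x4": np.array([0.5, 0.5])}))
```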

Dual decomposition

  • Example:
    \[ P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
  • \[ x_3^{C*}, \dots = \arg~\max f_C(\dots) \qquad x_3^{D*}, \dots = \arg~\max f_D(\dots) \qquad \dots \]
  • Solve for all \(f \in F \):
    \( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \) where \(\theta_f(\map^f) = -\log P_f(\map^f)\)

Dual decomposition contd.

  • Solve for all \(f \in F \):
    \( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \) where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
  • For disagreeing nodes \(i \in V \): \[ \msg{i}{f}{x^f_i} \leftarrow \msg{i}{f}{x^f_i} + \frac{\alpha}{r} \] where \( r \) is the iteration count
    and \(\alpha\) is the step size.
  • \[\map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right)\] where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
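
A self-contained toy sketch of this loop on two factors of the running example; the \(\theta_f\) cost tables, the step size \(\alpha\), and the hard-coded shared variable \(x_3\) are all hypothetical choices.

```python
import itertools

# Two factors of the running example; theta_f = -log P_f is given by made-up
# cost tables, and binary variables are assumed throughout.
factors = {
    "f_C": (("x1", "x2", "x3"), lambda v: 0.0 if v["x3"] == (v["x1"] and v["x2"]) else 2.0),
    "f_D": (("x3", "x4"), lambda v: 0.0 if v["x3"] == v["x4"] else 1.0),
}
messages = {(i, f): {0: 0.0, 1: 0.0} for f, (scope, _) in factors.items() for i in scope}

def local_argmin(f):
    # Each factor independently minimizes theta_f(x^f) + sum_i mu_{if}(x^f_i).
    scope, theta = factors[f]
    best, best_cost = None, float("inf")
    for vals in itertools.product((0, 1), repeat=len(scope)):
        v = dict(zip(scope, vals))
        cost = theta(v) + sum(messages[(i, f)][v[i]] for i in scope)
        if cost < best_cost:
            best, best_cost = v, cost
    return best

alpha = 1.0
for r in range(1, 20):
    solutions = {f: local_argmin(f) for f in factors}
    # x3 is the only variable shared by f_C and f_D in this toy problem; if the
    # local solutions disagree on it, nudge the corresponding messages by alpha/r.
    if len({solutions[f]["x3"] for f in factors}) > 1:
        for f in factors:
            messages[("x3", f)][solutions[f]["x3"]] += alpha / r

print({f: local_argmin(f) for f in factors})
```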

Modern inference methods

  • Metropolis Hastings for mapping
  • Belief propagation
    \[ \begin{align} \bpmsg{f}{i}{l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r} \label{eq:factor2node} \\ \bpmsg{i}{f}{l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r} \label{eq:node2factor} \end{align} \]
  • Dual decomposition \[\begin{align} \map^f &\leftarrow \arg \min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right) \\ \msg{i}{f}{x^f_i} &\leftarrow \msg{i}{f}{x^f_i} + \frac{\alpha}{r} & \text{for disagreeing nodes \(i\)} \end{align}\]

Forward sensor models are still expensive

  • Belief propagation \begin{align} \bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{{\color{red}\map_f \in \Omega_f}: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r} \end{align}
  • Complexity = \(O(|\Omega_f|) = O(2^{|n(f)|}) \)

Piecewise constant sensor model

  • \begin{align} P(z_f|\map_f, y_f) &= \frac{1}{Z}\begin{cases} 1 & \text{ if } \map_f = [0, \dots 0, 1, \dots, *]^\top\\ \exp(-900) & \text{ if } \map_f = [0, \dots 0, 0, \dots, *]^\top\\ \exp(-1000) & \text{ otherwise} \end{cases} \end{align}
  • Generalization:
    \begin{align} P^{PCSM}_f(\map_f) = \begin{cases} \psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\ \vdots& \\ \psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\ \psi_{\text{min}} & \text{ otherwise} \end{cases} \end{align}

Efficient belief propagation for PCSM

  • \begin{align} P^{PCSM}_f(\map_f) = \begin{cases} \psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\ \vdots& \\ \psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\ \psi_{\text{min}} & \text{ otherwise} \end{cases} \end{align}
  • \begin{align} p(\vect{x_f} \sim \vect{R}_m| x_i = l_i) &= \begin{cases} \prod\limits_{j \in n(f) \setminus i, \vect{R}_m[j] \ne *}\bpmsg{j}{f}{x_j = \vect{R}_m[j]}{r} & \text{ if $x_i = l_i \in \vect{R}_m$ } \\ 0 & \text{ otherwise } \end{cases} \end{align}
  • \begin{align} \bpmsg{f}{i}{x_i = l_i}{r+1} = \sum_{m = 1}^M \psi_m p(\vect{x_f} \sim \vect{R}_m, x_i = l_i) + \psi_{\text{min}} p_{\text{otherwise}} \end{align} where $p_{\text{otherwise}} = 1 - \sum_{m = 1}^M p(\vect{x_f} \sim \vect{R}_m, x_i = l_i)$.
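
A sketch of the pattern-based message computation, under the assumptions that incoming messages are normalized, the patterns are mutually exclusive, and `None` plays the role of the wildcard \(*\):

```python
import numpy as np

# Patterns R_m use 0/1 for fixed cells and None for the wildcard *; psi_min
# covers every assignment that matches no pattern.
patterns = [([0, 0, 1, None], 1.0), ([0, 0, 0, None], np.exp(-900))]
psi_min = np.exp(-1000)

def pcsm_message(scope_size, i, l_i, incoming):
    # incoming[j]: current variable-to-factor message for cell j (length 2).
    p_match_total, value = 0.0, 0.0
    for pattern, psi in patterns:
        if pattern[i] is not None and pattern[i] != l_i:
            continue                      # x_i = l_i is inconsistent with R_m
        p = 1.0
        for j in range(scope_size):
            if j == i or pattern[j] is None:
                continue                  # wildcard cells contribute a factor of 1
            p *= incoming[j][pattern[j]]
        p_match_total += p
        value += psi * p
    # everything else falls into the psi_min "otherwise" bucket
    return value + psi_min * max(0.0, 1.0 - p_match_total)

incoming = {j: np.array([0.5, 0.5]) for j in range(4)}
print(pcsm_message(scope_size=4, i=2, l_i=1, incoming=incoming))
```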

Occupancy grid mapping using Factor graphs

Results

Maps estimated on the Cave, Hospital Section, Hospital, and Albert-B datasets

Conclusion

Mapping Localization Goal conditioned RL

Communicating cars with mutual observation

Mutual localization

Mapping Localization Goal conditioned RL

Mutual localization

Find \(R,t\) such that \begin{align} q_i &= Rp_i + t \\ \text{where }&\\ q_i \in&\, \mathbb{R}^3, p_i \in \mathbb{R}^3, \\ R \in& \text{ SO(3), } t \in \mathbb{R}^3 \end{align}
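
For matched 3-D point pairs this least-squares problem has the classical SVD-based (Kabsch/Procrustes) solution. The sketch below shows only that generic alignment step; it is not the reciprocal-fiducial mutual-localization algorithm itself.

```python
import numpy as np

def rigid_align(p, q):
    # Least-squares R, t with q_i ~ R p_i + t (Kabsch / Procrustes); p and q
    # are Nx3 arrays of matched points.
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                              # closest rotation in SO(3)
    t = q_mean - R @ p_mean
    return R, t

# quick self-check on random data
rng = np.random.default_rng(0)
p = rng.normal(size=(5, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true = -R_true                                # make it a proper rotation
q = p @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_align(p, q)
print(np.allclose(q, p @ R.T + t, atol=1e-6))
```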

Realtime mutual localization

Mutual localization conclusion

Mapping Localization Goal conditioned RL

Goal conditioned RL

Mapping Localization Goal conditioned RL

Goal conditioned reinforcement learning

  • Goal is "specified"
  • State is fully specified (no partial observation)

Manipulation is navigation in configuration space

Conventional RL vs Goal conditioned RL

Q-Learning

  • Define \[ \Q^*(s_t, a_t) = \max_{\policy} \E\left[ \sum_{k=t}^\infty \discount^{k-t} R(s_k, a_k) \right] \]
  • Bellman equation \[ \Q^*(s_t, a_t) = R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a)\]
  • Temporal Difference Update for tabular Q-functions \[ \Q^*(s_t, a_t) \leftarrow \alpha \Q^*(s_t, a_t) + (1-\alpha)( R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a) ) \]
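
A tabular sketch on a toy 5-state chain MDP, keeping the slide's convention that \(\alpha\) weights the old value; the chain environment, the \(\epsilon\)-greedy exploration, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain: reaching state 4 ends the episode with reward 0, every
# other step costs -1.  Following the slide, alpha weights the OLD Q value.
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (0.0 if s_next == n_states - 1 else -1.0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        target = r if s_next == n_states - 1 else r + gamma * Q[s_next].max()
        Q[s, a] = alpha * Q[s, a] + (1 - alpha) * target   # TD update from the slide
        s = s_next

print(Q.round(2))
```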

Deep Q Learning

  • \begin{align} R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t) \end{align}
  • Naive Loss:
    \[ \Loss = \left( R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) - \Q(s_t, a_t) \right)^2 \]

Problem:

Leads to unstable updates

Solution Strategy:

  • Target network \(\Qtgt(\dots) = \Q(\dots; \Theta_{\text{tgt}}) \)
  • Main network \(\Q_m(\dots) = \Q(\dots; \Theta_{m}) \)
  • Update target weights: \( \Theta_{\text{tgt}} \leftarrow \alpha \Theta_{\text{tgt}} + (1-\alpha) \Theta_{m} \) where \(\alpha \approx 0.9\)
  • \(\nabla_{\Theta_m} \Qtgt = 0 \)

Solution:

  • \[ \Loss_{\text{DQN}} = \begin{cases} \left( R(s_t, a_t) - {\color{red}\Q_{m}}(s_t, a_t) \right)^2 & \text{ if \(s_{t+1}\) is terminal } \\ \left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2 & \text{ otherwise } \end{cases} \]
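
The same machinery in a compact sketch, with tabular arrays standing in for the networks: the target side never receives an update inside the loss, and the Polyak step is applied separately.

```python
import numpy as np

# Tabular arrays stand in for the networks Theta_m (main) and Theta_tgt (target).
n_states, n_actions, gamma, alpha = 6, 3, 0.99, 0.9
Q_m = np.random.default_rng(0).normal(size=(n_states, n_actions))
Q_tgt = Q_m.copy()

def dqn_loss(batch):
    # batch: (s, a, r, s_next, terminal).  The target uses Q_tgt only, which in
    # this sketch simply means Q_tgt is never modified by the loss.
    loss = 0.0
    for s, a, r, s_next, terminal in batch:
        target = r if terminal else r + gamma * Q_tgt[s_next].max()
        loss += (target - Q_m[s, a]) ** 2
    return loss / len(batch)

def update_target():
    # Polyak averaging of the target parameters, as on the slide.
    global Q_tgt
    Q_tgt = alpha * Q_tgt + (1 - alpha) * Q_m

batch = [(0, 1, -1.0, 1, False), (4, 2, 0.0, 5, True)]
print(dqn_loss(batch))
update_target()
```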

Floyd-Warshall

Floyd-Warshall

  • \[ \Q^P(s_t, g^*) := \max_{\policy} \E\left[ \sum_{k = t}^{\color{blue}g_{k+1} = g^*} R(s_k, s_{k+1}) \right] \]

Floyd-Warshall RL

  • \[ \newcommand{\redat}{{\color{red}a_t}} \newcommand{\reda}{{\color{red}a}} \newcommand{\redmaxa}{{\color{red}\max_a}} \Q^P(s_t, \redat, g^*) := \max_{\policy} \E\left[ \sum_{k = t}^{g_{k+1} = g^*} R(s_k, \redat) \right] \]
  • \[ \Q^P(s_t, g_w) + \Q^P(s_w, g^*) \le \Q^P(s_t, g^*) \]
  • \[ \Q^P(s_t, g_{t+1}) = R(s_t, {\color{red}s_{t+1}}) \]
  • \[ \Q^P(s_t, \redat, g_w) + \redmaxa \Q^P(s_w, \reda, g^*) \le \Q^P(s_t, \redat, g^*) \]
  • \[ \Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]

Deep Floyd-Warshall RL

  • \begin{align} \Q^P(s_t, a_t, g_w) + \max_a \Q^P(s_w, a, g^*) \le \Q^P(s_t, a_t, g^*) \end{align}
  • \begin{align} R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t) \end{align} \[ \Loss_{\text{DQN}} = \left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2 \]
  • \[ \LossLo = \left( \Qtgt^P(s_t, a_t, g_w) + \max_a \Qtgt^P(s_w, a, g^*) - {\color{red}\Q^P_{m}}(s_t, a_t, g^*) \right)^2_+ \]
  • \[ \LossUp = \left( {\color{red}\Q^P_{m}}(s_t, a_t, g_w) + \max_a \Qtgt^P(s_w, a, g^*) - \Qtgt^P(s_t, a_t, g^*) \right)^2_+ \]
  • \[ \Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]
  • \[ \LossStep = \left( R(s_t, a_t) - {\color{red}\Q^P_{m}}(s_t, a_t, g_{t+1}) \right)^2 \]
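
A toy sketch of the three losses with a goal-conditioned tabular \(\Q^P\); states double as goal indices here, and the sampled waypoint, batch values, and array sizes are all illustrative.

```python
import numpy as np

# Goal-conditioned tabular Q^P; states double as goal indices in this toy, and
# s_w is a sampled intermediate (waypoint) state.
n_states, n_actions, n_goals = 5, 2, 5
rng = np.random.default_rng(0)
Qp_m = rng.normal(size=(n_states, n_actions, n_goals))
Qp_tgt = Qp_m.copy()

def relu_sq(x):
    # (.)_+^2 : only violations of the inequality constraints are penalized.
    return max(0.0, x) ** 2

def fwrl_losses(s_t, a_t, r_t, s_next, s_w, g_star):
    lo = relu_sq(Qp_tgt[s_t, a_t, s_w] + Qp_tgt[s_w].max(axis=0)[g_star]
                 - Qp_m[s_t, a_t, g_star])
    up = relu_sq(Qp_m[s_t, a_t, s_w] + Qp_tgt[s_w].max(axis=0)[g_star]
                 - Qp_tgt[s_t, a_t, g_star])
    step = (r_t - Qp_m[s_t, a_t, s_next]) ** 2   # Q^P(s_t, a_t, g_{t+1}) = R(s_t, a_t)
    return lo, up, step

print(fwrl_losses(s_t=0, a_t=1, r_t=-1.0, s_next=1, s_w=2, g_star=4))
```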

Hindsight Experience Replay (HER)

What does it mean to re-imagine a Virtual Goal

  • The desired goal was \( g^* \), but the achieved goal was \( g_f \)
  • The loss function used is
    \[ \newcommand{\cred}[1]{{\color{red}{#1}}} \Loss = \left( R(s_t, a_t, \cred{g_f}) + \max_a \Qtgt(s_{t+1}, a, \cred{g_f}) - \Q_m(s_t, a_t, \cred{g_f}) \right)^2 \]
    instead of
    \[ \Loss = \left( R(s_t, a_t, \cred{g^*}) + \max_a \Qtgt(s_{t+1}, a, \cred{g^*}) - \Q_m(s_t, a_t, \cred{g^*}) \right)^2 \]
  • This requires re-computing the reward function \( R(s_t, a_t, g_f) \) for the virtual goal (see the sketch below)
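
A sketch of the "future"-strategy relabeling described above, in a hypothetical 1-D toy where the achieved goal is simply the next position; note that every relabeled transition evaluates `reward_fn` again, which is exactly the recomputation cost at issue.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=np.random.default_rng(0)):
    # "Future" relabeling: store each transition with its original goal g*, plus
    # k virtual goals g_f taken from goals achieved later in the same episode.
    # Every relabeled transition calls reward_fn again -- the recomputation cost.
    relabeled = []
    for t, (s, a, s_next, achieved, g_star) in enumerate(episode):
        relabeled.append((s, a, reward_fn(s, a, g_star), s_next, g_star))
        future = np.arange(t, len(episode))
        for idx in rng.choice(future, size=min(k, len(future)), replace=False):
            g_f = episode[idx][3]                    # an achieved goal
            relabeled.append((s, a, reward_fn(s, a, g_f), s_next, g_f))
    return relabeled

# Toy 1-D setting: the achieved goal is the next position s + a, and the sparse
# reward is 0 within a small threshold of the goal, -1 otherwise.
reward_fn = lambda s, a, g: 0.0 if abs(s + a - g) <= 0.05 else -1.0
episode = [(0.0, 0.1, 0.1, 0.1, 1.0), (0.1, 0.2, 0.3, 0.3, 1.0)]
print(len(her_relabel(episode, reward_fn)))
```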

Ablation with FWRL-inspired \(\LossUp\) and \(\LossLo\)

  • \begin{align} \LossLo &= \left( \Qtgt(\dots) + \Qtgt(\dots) - \Q_m(\dots) \right)_+^2 \end{align}
  • \begin{align} \LossUp &= \left( \Q_m(\dots) + \Qtgt(\dots) - \Qtgt(\dots) \right)_+^2 \end{align}

Ablation with Step Loss

  • \begin{align} \LossStep &= \left( \fwargs{\state_{t}}{\act_{t}}{\goal_{t+1}}{*}{P} - \Rew(\state_{t}, \act_{t}) \right)^2 \end{align}
  • \begin{align} \Loss_{\text{HER}} = \Loss_{\text{bellman}} &= \left( R(s_t, a_t, g_f) + \max_a \Qtgt(s_{t+1}, a, g_f) - \Q_m(s_t, a_t, g_f) \right)^2 \end{align}

HER modeling has duplicate information

Hindsight Experience Replay

Can this work?

Claim: Goal conditioned reinforcement learning without goal rewards is possible

Step Loss does not need Goal Rewards

Why does duplicate information hurt?

  • HER-like methods require re-imagining a new goal specification.
  • If the rewards depend on the goal specification, they need to be re-sampled.
  • Re-computing rewards can be costly.

Cost of algorithms in Reward samples

  • Ours : HER - Goal rewards \( + \LossStep \)
  • FWRL : HER \( + \LossLo + \LossUp \)

Fetch Push

Fetch Pick And Place

Effect of distance threshold

  • HER Rewards: \[ R(s_t, a_t, g^*) = \begin{cases} 0 &\text{ if } \|g_t - g^*\| \le \epsilon \\ -1 &\text{ otherwise} \end{cases} \]
  • Our Rewards: \[ R(s_t, a_t) = -1 \]
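
The two reward functions side by side in a short sketch; the threshold \(\epsilon\) and the test values are arbitrary.

```python
import numpy as np

def her_reward(g_t, g_star, epsilon):
    # Sparse goal-conditioned reward: needs the distance threshold epsilon.
    return 0.0 if np.linalg.norm(np.asarray(g_t) - np.asarray(g_star)) <= epsilon else -1.0

def our_reward(s_t, a_t):
    # Constant one-step path cost, independent of the goal and of epsilon.
    return -1.0

print(her_reward([0.01, 0.0, 0.0], [0.0, 0.0, 0.0], epsilon=0.05), our_reward(None, None))
```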

Conclusion

Mapping Localization Goal conditioned RL

Conclusion

Mapping Localization Goal conditioned RL

Future work

End to end navigation

Language driven localization

Acknowledgements

Thanks to my colleagues and collaborators

  • Prof. Jason Corso (Advisor)
  • Prof. Frank Dellaert
  • Prof. Manmohan Chandraker
  • Prof. Jeff Siskind
  • Prof. Chenliang Xu
  • Dr. Julian Ryde
  • Dr. Brent Griffin
  • Dr. Suren Kumar
  • Shurjo Banerjee
  • Madan Ravi Ganesh

Bibliography

  • Andrychowicz, Marcin, et al. "Hindsight experience replay." Advances in Neural Information Processing Systems, 2017.
  • Dhiman, V., Ryde, J., and Corso, J. J. "Mutual localization: Two camera relative 6-DoF pose estimation from reciprocal fiducial observation." IROS, 2013.
  • Kumar, S., Dhiman, V., and Corso, J. J. "Learning compositional sparse models of bimodal percepts." AAAI, 2014.
  • Ryde, J., Dhiman, V., and Platt, R. "Voxel planes: Rapid visualization and meshification of point cloud ensembles." IROS, 2013.
  • Dhiman, V., Kundu, A., Dellaert, F., and Corso, J. J. "Modern MAP inference methods for accurate and fast occupancy grid mapping on higher order factor graphs." ICRA, 2014.
  • Chandraker, M., and Dhiman, V. "Continuous occlusion models for road scene understanding." US Patent 9,821,813, 2017.
  • Dhiman, V., Tran, Q.-H., Corso, J. J., and Chandraker, M. "A continuous occlusion model for road scene understanding." CVPR, 2016.
  • Dhiman, V., Banerjee, S., Griffin, B., Siskind, J. M., and Corso, J. J. "A critical investigation of deep reinforcement learning for navigation." NIPS Deep RL Workshop, 2017.
  • Kumar, S., Dhiman, V., Koch, P. A., and Corso, J. J. "Learning compositional sparse bimodal models." IEEE TPAMI, 2017.
  • Mirowski, P., et al. "Learning to navigate in complex environments." ICLR, 2017.
  • Plappert, M., Andrychowicz, M., et al. "Multi-goal reinforcement learning: Challenging robotics environments and request for research." arXiv:1802.09464, 2018.
  • Kaelbling, Leslie Pack. "Learning to achieve goals." IJCAI, 1993.
  • Dhiman, V., Banerjee, S., Siskind, J. M., and Corso, J. J. "Learning goal-conditioned value functions with one-step path rewards rather than goal-rewards." Submitted to ICLR, 2019. Under review.
  • Zachariou, Peter, et al. "Speeding effects on hazard perception and reaction time." 2011.
  • Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
  • Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992): 279-292.
  • Pearl, Judea. "Fusion, propagation, and structuring in belief networks." Artificial Intelligence 29.3 (1986): 241-288.
  • Jojic, Vladimir, Stephen Gould, and Daphne Koller. "Accelerated dual decomposition for MAP inference." ICML, 2010.
  • Merali, Rehman S., and Timothy D. Barfoot. "Occupancy grid mapping with Markov chain Monte Carlo Gibbs sampling." ICRA, 2013.

Questions?