\(
% Vector
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\hvect}[1]{\bar{\vect{#1}}}
% unit vector
\newcommand{\uvect}[1]{\hat{\vect{#1}}}
% Field
\newcommand{\field}[1]{\mathbb{#1}}
% Trace
\def\act{a}
\def\actt{\act_t}
\newcommand\acttT{\vect{\act}_{t:\epiT}}
\newcommand{\epiT}{T}
\newcommand{\spawnP}{P_{\text{spawn}}}
% Map
\def\mapsym{x}
\def\map{\vect{\mapsym}}
\newcommand{\est}[1]{\hat{#1}}
\newcommand{\mapest}{\est{\vect{x}}}
% Measurement or observation
\newcommand{\meas}{z}
\newcommand{\measurements}{\vect{\meas}_{1:t}}
\newcommand{\meast}[1][t]{\meas_{#1}}
% Pose
\newcommand{\pose}{g}
\newcommand{\poset}[1][t]{\pose_{#1}}
\newcommand{\posetn}{\pose_{t+1}}
\newcommand{\posetp}{\pose_{t-1}}
\newcommand{\poses}{\vect{\pose}_{1:t}}
\newcommand{\posest}{\est{\pose}}
\newcommand{\posestt}{\est{\pose}_t}
% Pose goal
\newcommand{\posegoal}{d}
\newcommand{\unaryminus}{\scalebox{0.5}[0.5]{\( - \)}}
\newcommand{\prevtime}{1:t\unaryminus1}
\newcommand{\pastposes}{\vect{\pose}_{\prevtime}}
\newcommand{\pastmeas}{\vect{\meas}_{\prevtime}}
\newcommand{\pastobs}{\pastmeas, \pastposes}
\newcommand{\mutlocest}{f_{\text{mutloc}}}
\newcommand{\param}{\theta}
\newcommand{\policy}{\chi}
% CVPR16
%\DeclareMathSymbol{\Tangent}
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\sech}{sech}
\DeclareMathOperator{\poly}{poly}
%\DeclareMathOperator*{\argmin}{\arg\min}
\newcommand{\ptD}{\posDp} % FIXME: conflicts with \map, this should be \posDp
\newcommand{\Obj}{C} % In mutloc C_p and C_q used for robots
\newcommand{\ObjMu}{\vect{m}} % FIXME: in mutloc used as 3D pos of markers in frame 1, this should \pose
\newcommand{\ObjR}{\bm{\omega}}
\newcommand{\ObjSigma}{\bm{\Sigma}}
\newcommand{\Logst}{\mathcal{L}}
\newcommand{\OccF}{f_{\text{occ}}}
\newcommand{\gOccF}{\nabla \OccF}
\newcommand{\gOccFi}{\nabla \OccF^i}
\newcommand{\impt}{\imgp{\ptD}}
% Mahalanobis distance between two points
\newcommand{\mahaF}{\zeta} % FIXME: use some greek letter
\newcommand{\LogstIm}{\Logst_u}
\newcommand{\tImMu}{\bm{\mu}}
\newcommand{\tImS}{\bm{\Gamma}}
\newcommand{\LogstD}{\Logst_{\dist}}
\newcommand{\tDMu}{\nu}
\newcommand{\meandepth}[1]{\tDMu_{#1}}
\newcommand{\BB}{\mathbf{B}}
\newcommand{\imBB}{\imgp{\BB}}
\newcommand{\camO}{\mathbf{p}^c}
\newcommand{\camR}{\bm{\omega}^c}
\newcommand{\bepsilon}{\bm{\epsilon}}
% 3D pose of the cars and ego motion
\newcommand{\pos}[2]{\ObjMu^{#1}({#2})}
\newcommand{\ori}[2]{\ObjR^{#1}(#2)}
\newcommand{\state}[2]{\mathbf{s}^{#1}(#2)}
% ego pose
\newcommand{\egop}[1][t]{\pos{c}{#1}}
\newcommand{\egoo}[1][t]{\ori{c}{#1}}
\newcommand{\egos}[1][t]{\state{c}{#1}}
% relative pose between camera and car $i$
\newcommand{\posealt}{\pose} % FIXME: Consider unifying with \pose
\newcommand{\relp}[2]{\posealt^{#1}(#2)}
\newcommand{\relpz}[2]{\posealt_z^{#1}(#2)}
% 3D tracks on car $i$ in its own coordinate frame
\newcommand{\tracklets}{\mathbf{X}^{i}_o}
\newcommand{\tracklet}[1]{\ptD^{i}_{#1}}
% track projections on camera
\newcommand{\trackpit}[2]{\impt_{#1}(#2)}
\newcommand{\trackp}[1]{\impt_j(#1)}
\newcommand{\trackpj}[1]{\impt_j(#1)}
% Unclassified track point projected on camera
\newcommand{\ucTrackp}{\impt(t)}
% dimensions of car $i$
\newcommand{\dimsn}[1]{\BB^{#1}}
\newcommand{\expDimsn}{\hat{\BB}}
% projection function
\newcommand{\projectionOf}[1]{\proj_{\relp{i}{t}}\left(#1\right)}
\newcommand{\projectionOft}[1]{\proj_{\relp{i}{t+1}}\left(#1\right)}
\newcommand{\centerProj}{\bar{\proj}_{\relp{i}{t}}(\dimsn{i})}
\newcommand{\cornerProj}[1]{\proj^{#1}_{\relp{i}{t}}(\dimsn{i})}
\newcommand{\triangleProj}[1]{\triangle^{#1}_{\relp{i}{t}}(\dimsn{i})}
% bounding box corners on image
\newcommand{\bbt}[2]{\imBB^{#1}({#2})}
\newcommand{\bb}[1]{\imBB^{#1}(t)}
\newcommand{\bbMu}{\bm{\mu}} % FIXME: conflicts with logistic mean of trans prob
\newcommand{\bbS}{\bm{\Lambda}}
\newcommand{\Energy}[1]{\mathcal{E}^{it}_{\text{#1}}}
\newcommand{\pEnergy}[1]{\mathcal{E}^{ijt}_{\text{#1}}}
% Weighted energy
\newcommand{\WEnergy}[1]{\lambda_{\text{#1}}\Energy{#1}}
\newcommand{\WpEnergy}[1]{\lambda_{\text{#1}}\pEnergy{#1}}
\newcommand{\EnergyCol}{\mathcal{E}^{ijt}_{\text{col}}}
\newcommand{\WEnergyCol}{\lambda_{\text{col}}\EnergyCol}
\newcommand{\EnergyBBoxNoOcc}{\Energy{detectNoOcc}}
\newcommand{\EnergyBBox}{\Energy{detect}}
\newcommand{\EnergyTrack}{ \pEnergy{track}}
\newcommand{\EnergyTrackNoOcc}{\pEnergy{trackNoOcc}}
\newcommand{\EnergyLane}{\Energy{lane}}
\newcommand{\EnergySize}{\Energy{size}}
\newcommand{\EnergyDyn}{\Energy{dyn}}
\newcommand{\EnergyDynHol}{\Energy{dyn-hol}}
\newcommand{\EnergyDynOri}{\Energy{dyn-ori}}
\newcommand{\EnergyDynVel}{\Energy{dyn-vel}}
\newcommand{\occFreeProj}[1]{\Pi_{\relp{i}{t}}(#1)}
\newcommand{\minx}{x_{\text{min}}}
\newcommand{\miny}{y_{\text{min}}}
\newcommand{\maxx}{x_{\text{max}}}
\newcommand{\maxy}{y_{\text{max}}}
\newcommand{\frontface}{F^i_\text{FF}(t)}
\newcommand{\occ}[1]{o({#1})}
\newcommand{\face}{F^i_k(t)}
\newcommand{\invProjectionOf}[1]{\pi^{-1}_{\relp{i}{t}}\left(#1\right)}
\newcommand{\invProjectionOftm}[1]{\pi^{-1}_{\relp{i}{t-1}}\left(#1\right)}
\newcommand{\occf}{\OccF^i(\ptD_j)}
\newcommand{\occftot}{\OccF(\ptD_j)}
\newcommand{\occft}[1]{\OccF(#1)}
\newcommand{\ray}{\uvect{\ptD}_j}
\newcommand{\occfray}{f_{occ}(\lambda\ray)}
\newcommand{\lambdadist}{f_{\lambda}(\trackpj{t-1}, \lambda)}
\newcommand{\occfxi}{L(\mathbf{x}; \pos{i}{t-1}, \Sigma_i)}
\newcommand{\occfi}{L(\lambda \ray; \pos{i}{t-1}, \Sigma_i)}
\newcommand{\PreflR}{P^{ij}_{\text{reflection}}}
\newcommand{\Pobs}{P^{ij}_{\text{observation}}}
\newcommand{\PtransR}{P^{j}_{\text{transmission}}}
\newcommand{\PassocR}{P^{ij}_{\text{assoc}}}
\newcommand{\assoc}{a^{ij}}
\newcommand{\assocP}{\assoc(\lambda)}
\newcommand{\assocPk}{\assoc(\lambda_k)}
\newcommand{\Ereproj}{E^{ij}_{\text{reproj}}}
\newcommand{\Ptransarg}[1]{\PtransR(#1)}
\newcommand{\Ptrans}{\Ptransarg{\lambda}}
\newcommand{\Ptransmud}{\Ptransarg{\mu^{i}_d}}
\newcommand{\Prefl}{\PreflR(\lambda)}
\newcommand{\Zrefl}{Z_{\text{refl}}}
\newcommand{\dishort}{d_i(\mathbf{x})}
\newcommand{\Lu}{\LogstIm(\impt, \tImMu^i,\tImS^i)}
\newcommand{\Llambda}{\LogstD(\impt, \dist; \tDMu^i)}
\newcommand{\Gauss}{\mathcal{N}}
\newcommand{\PropDist}{\mathcal{W}_j}
\newcommand{\muijl}{\mu^{ij}_{\dist}}
\newcommand{\sigmaijl}{\sigma^{ij}_{\dist}}
\newcommand{\Sigmait}{\bSigma^{i^{-1}}_o}
\newcommand{\muit}{\bmu^{i}_o}
\newcommand{\Sigmaic}{\bSigma'^{i^{-1}}_c}
\newcommand{\muic}{{\bmu^{i}_c}}
\newcommand{\Sigmaicf}{\bSigma^{i^{-1}}_c}
\newcommand{\Sigmaicfinv}{\Sigma^{i}_c}
\newcommand{\muiu}{\mu^{i}_t}
\newcommand{\Sigmaiu}{\Sigma^{i^{-1}}_u}
\newcommand{\uv}[1]{\hat{\mathbf{#1}}}
\newcommand{\Trr}[3]{{}^{#1}{#2}_{#3}}
\newcommand{\xymin}[1]{#1_{\text{min}}}
\newcommand{\xymax}[1]{#1_{\text{max}}}
\newcommand{\xt}{\mathbf{x}_t}
\newcommand{\xc}{\mathbf{x}_c}
\newcommand{\Rctot}{\bR}
\newcommand{\tctot}{\bt}
\newcommand{\tcmut}{\bt'}
\newcommand{\Beizer}{B\'ezier }
\newcommand{\LaneUncertainty}[1]{\Sigma_{L_m}(#1)}
\newcommand{\projOnLane}[1]{\pi_{L_m(k)}(#1)}
\newlength{\eqs}
\setlength{\eqs}{-0.04in}
\setitemize[0]{leftmargin=15pt}
\newenvironment{tight_itemize}{
\begin{itemize}[leftmargin=10pt]
\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
}{\end{itemize}}
% DRL
\def\state{s}
\newcommand{\statet}[1][t]{\state_{#1}}
\def\statetp{\state_{t-1}}
\def\statehist{\state_{1:t-1}}
\def\statetn{\state_{t+1}}
\def\obs{\meas}
\def\obst{\obs_t}
\renewcommand{\actt}[1][t]{\act_{#1}}
\def\acttp{\act_{t-1}}
\def\acttn{\act_{t+1}}
\def\Obs{\mathcal{O}}
\def\ObsEnc{E_o}
\def\ObsProb{P_o}
\def\ObsFunc{C}
\def\ObsFuncFull{\ObsFunc(\statet, \actt) \rightarrow \obst}
\def\ObsFuncInv{\ObsFunc^{-1}}
\def\ObsFuncInvFull{\ObsFuncInv(\obst, \statetp, \actt) \rightarrow \statet}
\def\State{\mathcal{S}}
\def\Action{\mathcal{A}}
\def\TransP{P_{T}}
\def\Trans{T}
\def\TransFull{\Trans(\statet, \actt) \rightarrow \statetn}
\def\TransObs{T_c}
\def\Rew{R}
\def\rew{r}
\def\rewards{\vect{r}_{1:t}}
\def\rewt{\rew_t}
\def\rewtp{\rew_{t-1}}
\def\rewtn{\rew_{t+1}}
\def\RewFull{\Rew(\statet, \actt) \rightarrow \rewtn}
\def\TransObsFull{\TransObs(\statet, \obst, \actt, \rewt; \theta_T) \rightarrow \statetn}
\def\Value{V}
\def\pit{\pi_t}
\def\piDef{\pi(\acttn|\statet, \obst, \actt, \rewt; \theta_\pi) \rightarrow \pit(\acttn ; \theta_\pi)}
\def\Valuet{\Value_t}
\def\ValueDef{\Value(\statet, \obst, \actt, \rewt; \theta_\Value) \rightarrow \Valuet(\theta_\Value)}
\def\R{\mathbb{R}}
\def\E{\mathbb{E}}
\newcommand{\LatencyOneGtOne}{Latency $1:>1$}
\newcommand{\NavAiiiCDiDiiL}{NavA3C+D\textsubscript{1}D\textsubscript{2}L}
\newcounter{Benchmark}
\newcounter{BenchmarkB}[Benchmark]
\newcommand{\ditem}[1]{\refstepcounter{Benchmark}\item[\arabic{Benchmark}. {#1}]}
\newcommand{\dditem}[1]{\refstepcounter{BenchmarkB}\item[\arabic{Benchmark}.\alph{BenchmarkB}. {#1}]}
\newcommand{\DistanceInefficiency}{Distance-inefficiency}
% Observation probability
% forward sensor model
% Differentiable Floyd Warshall algorithm
\newcommand{\graph}{G}
\newcommand{\vtces}{V}
\newcommand{\edges}{E}
%\newcommand{\policy}{\Pi}
\newcommand{\wts}{W}
\newcommand{\distM}{D}
\newcommand{\nil}{\emptyset}
\newcommand{\qValue}{Q}
\newcommand{\Q}{\qValue}
\newcommand{\goal}{g}
\newcommand{\Goal}{\mathcal{G}}
\newcommand{\discount}{\gamma}
\newcommand{\fwcost}{Q}
\newcommand{\fw}{\fwcost}
\newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#1}, {#2}, {#3}\right)}
\newcommand{\minedgecost}{\fwcost_0}
\newcommand{\Rgoal}{R_{\text{goal}}}
\newcommand{\Loo}{Latency-1:\textgreater1}
\newcommand{\Loss}{\mathcal{L}}
\newcommand{\LossText}[1]{\Loss_{\text{#1}}}
\newcommand{\LossDDPG}{\LossText{ddpg}}
\newcommand{\LossStep}{\LossText{step}}
\newcommand{\LossLo}{\LossText{lo}}
\newcommand{\LossUp}{\LossText{up}}
\newcommand{\LossTrieq}{\LossText{trieq}}
\newcommand{\tgt}{\text{tgt}}
\newcommand{\Qstar}{\Q_{*}}
\newcommand{\Qtgt}{\Q_{\text{tgt}}}
\newcommand{\ytgt}{y_t}
\newcommand{\bpmsg}[4]{\mu^{#4}_{#1\rightarrow#2}(#3)}
\newcommand{\msg}[3]{\mu_{#1#2}(#3)}
\newcommand{\problemobjective}{\begin{align}
\vect{\act}_{1:\epiT} &= \arg \max_{\vect{\act}_{1:\epiT}} \E \left[\left.
\discount^{k}\mathbb{1}_{\|\posegoal - \pose_k\|_2 \le \epsilon}
\right|
\vect{\meas}_{1:\epiT} \right]
\end{align}}
\)
Optimizing mapping, localization and planning algorithms
Hi everyone, I am Vikas, and welcome to my thesis defence.
Today I am going to talk about improving
navigation algorithms by optimizing mapping, localization and path-planning.
Navigation with autonomous cars
Google trends for 'Autonomous cars'
- While talking about navigation, we cannot ignore self-driving cars.
- Self-driving cars have been getting a lot of attention from the public, media
and entrepreneurs.
- Self-driving cars have covered millions of miles.
- States have been legalizing self-driving cars.
- The predicted arrival of self-driving cars has been getting closer.
- This is a wonderful time to be working on the problem of navigation and
to see the technology being applied in real life.
Potential benefits of self-driving cars
For elderly and disabled
Roads will become safer
More energy efficiency
Fewer parking lots
- And the interest is justified because of the benefits that self-driving cars bring.
- The elderly, the young and the disabled get increased mobility.
- Roads become safer.
- More energy efficiency.
- Fewer parking lots.
- However, these potentials can be realized only if self-driving cars and
their navigation algorithms are safe.
Navigation needs to be fast for safer cars
Reaction time in driving is a matter of life and death
Perception-reaction time (PRT) is positively correlated with crashes
"PRT has a cause-effect relationship in deceleration rate and crash severity."
"PRT has a direct effect on (crash) event and severity."
- It is clear that reaction time in driving can be the difference between life
and death.
- Driving safety research has clearly established the relationship between perception-reaction time and the probability and severity of crashes.
- Even half a second of difference in perception-reaction time (PRT) can
be the difference between crash and non-crash events.
- Here, perception-reaction time is measured as the time elapsed between
an observation and the action taken, for example the foot touching the pedal.
- In this talk, I will discuss how different parts of the navigation
pipeline can be optimized or made faster.
Outline
Mapping
Localization
Goal conditioned RL
- Navigation is the problem of converting a sequence of observations into a
sequence of actions for the purpose of going from one place to another.
- It is often addressed in three parts.
- Mapping---which is the estimation of the static part of the environment.
- Localization---which is the estimation of the dynamic state of the
environment, such as the agent's location in the map.
- and Planning---which is the estimation of the sequence of actions that
moves the agent from the current state to the desired goal.
+ Most of my work has been focused on mapping, with some work on
localization and planning.
+ Today I am going to talk about three of my works.
+ First I am going to talk about my work on making mapping faster by using
modern inference methods on factor graphs.
+ Then I am going to talk about mutual localization, which in turn enables
faster mapping by allowing robots to divide and conquer the environment.
+ In the end I will talk about making goal conditioned reinforcement
learning faster by removing redundant computation.
Mapping
Mapping
Localization
Goal conditioned RL
- First let's talk about mapping using modern inference algorithms in
factor graphs.
Occupancy grid mapping
Estimate map:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x} \in \Omega} p(\mathbf{x} | \mathbf{z}_{1:T}, \mathbf{y}_{1:T})
\]
\( O(|\Omega|) = O(2^{\dim(\mathbf{x})}) \)
- Occupancy grid mapping is a type of mapping where the space is divided
into grid cells.
- Each grid cell can either be occupied or free.
- On the left you can see a simulation of a robot with a laser scanner
estimating an occupancy grid map.
- Assume a robot or a self-driving car with a laser scanner is moving
through the environment while observing it.
- Let y_1 be the pose of the laser scanner and z_1 be the range measurement
from the laser scanner.
- The laser scan will pass through a subset of cells x_1 of the map.
- Similarly, the robot collects more observations over time 1 to T.
- Our objective is to estimate the map of the environment using the
observations and locations of the robot over time.
- Note that the maximization is over the map space, which in the case of
occupancy grid maps is exponential in the size of the map.
- A 100x100 map has \(2^{10000} \approx 10^{3010}\) possible configurations.
- To make the problem tractable, we factorize the probability distribution.
That brings us to the tool of our choice: factor graphs.
Factor graphs represent factorizations
Example:
\[
P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
Factorization:
\[
P(\mathbf{x}) = \prod_{f \in F} P_f(\mathbf{x}^f)
\]
Factor graph \(G = (V, F, E)\)
\(V = \) variable nodes. Example \(x_1, x_2, \dots, x_5 \)
\(F = \) factor nodes. Example \(f_A, f_B, \dots, f_E \)
\(E = \{(i, f) : i \in V, f \in F, \text{ if }P_f(.) \text{ depends upon }x_i\}\)
- Let us understand factor graphs through an example.
- A probability distribution can be factorized into factors that depend upon
only a small subset of the random variables.
- This factorization can be represented by a factor graph as shown.
- Each factor forms a factor node, shown as a square.
- It is connected only to the random variables on which that factor depends.
- In general, if the probability of a random vector x is factorized into f
factors, then it can be represented as a graph G.
- The only edges are between variable nodes and factor nodes, and a factor is
connected to a variable node if and only if the factor depends upon the
variable.
- The factorization of the probability depends upon modeling assumptions and
can differ widely.
Mapping using Inverse sensor models
Independent cell assumption:
\[ p(\mathbf{x} | \mathbf{z}, \mathbf{y}) = \prod_{i = 1}^{N} p(x_i | \mathbf{z}, \mathbf{y}) \]
Mapping:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x}} \eta \prod_{i=1}^N \prod_{f=1}^{t} p( x_i | z_f, y_f)
\]
Mapping using Forward sensor models
Mapping:
\[
\mathbf{x}^* = \arg \max_{\mathbf{x}} \eta\, p(\mathbf{x}) \prod_{f=1}^t
p( z_f | \mathbf{x}_f, y_f)
\]
where \( p( z_f | \mathbf{x}_f, y_f) \) is the Forward sensor model.
Inverse sensor models
\(
p(\mapsym_i|\meas_f, y_f)
\) is the inverse sensor model
Drawbacks
Cell Independence assumption is limiting
Inverse sensor models are often inaccurate
Advantages
Inference is fast: each cell is estimated independently
Forward sensor models
\( p(\meas_f | \map, y_f) \) is the forward sensor model
Advantages
No independent cell assumption.
Forward sensor models are easy to compute
Drawbacks
Exact inference is expensive
Modern inference methods
Metropolis Hastings for mapping
Belief propagation
Dual decomposition
Metropolis Hastings for mapping
Current sample \(\map^r\)
Proposed sample \(\map'\)
Acceptance ratio
\begin{align}
a = \frac{P(\map')Q(\map^r| \map')}
{P(\map^r)Q(\map'| \map^r)}
\end{align}
\begin{align}
Q(\map'| \map^r) = \begin{cases}
\frac{1}{N} & \text{if $\|\map' - \map^r\|_1 = 1$}\\
0 & \text{ otherwise}
\end{cases}
\end{align}
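As a sketch of how this proposal plays out in code (assuming some `log_posterior` function that scores a candidate map against the measurements; the name is invented here, not from the talk): flipping one of \(N\) cells gives \(Q(\map'|\map^r) = Q(\map^r|\map') = 1/N\), so the proposal terms cancel and only the posterior ratio remains.

```python
import math
import random

def metropolis_hastings(log_posterior, n_cells, n_iters=10000, seed=0):
    """Sample binary occupancy maps; proposal flips one uniformly chosen cell."""
    rng = random.Random(seed)
    map_cur = [0] * n_cells             # current sample: all cells free
    logp_cur = log_posterior(map_cur)
    for _ in range(n_iters):
        i = rng.randrange(n_cells)      # Q(x'|x) = Q(x|x') = 1/N, so the
        map_cur[i] ^= 1                 # proposal terms cancel and the
        logp_new = log_posterior(map_cur)   # acceptance ratio is P(x')/P(x)
        if math.log(rng.random()) < logp_new - logp_cur:
            logp_cur = logp_new         # accept the flip
        else:
            map_cur[i] ^= 1             # reject: undo the flip
    return map_cur
```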
Belief propagation intuition
Example:
\[
P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
Want:
\[
P(x_3) = \sum_{\mathbf{x} \setminus x_3} P(x_1, x_2, x_3, x_4, x_5)
\]
\[
\newcommand{\bpop}[1]{\sum_{#1}}
P(x_3) = \bpop{x_1, x_2, x_4, x_5}
f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5)
\]
\[
P(x_3) =
\underbrace{
\bpop{x_1,x_2} \left( f_A(x_1)
f_B(x_2)
f_C(x_1, x_2, x_3) \right)
}_{\bpmsg{f_C}{x_3}{x_3}{}}
\underbrace{
\bpop{x_4}f_D(x_3, x_4)
}_{\bpmsg{f_D}{x_3}{x_3}{}}
\underbrace{
\bpop{x_5}f_E(x_3, x_5)
}_{\bpmsg{f_E}{x_3}{x_3}{}}
\]
Belief propagation
\[
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\\
\bpmsg{i}{f}{x_i = l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r}
\end{align}
\]
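A minimal Python sketch of these messages on the running example (same invented factor tables as the earlier sketch); because \(x_1, x_2, x_4, x_5\) each touch only one other factor, the variable-to-factor messages collapse into the unary factors:

```python
# Same made-up factor tables as the earlier factor-graph sketch.
fA = lambda x1: 0.6 if x1 else 0.4
fB = lambda x2: 0.3 if x2 else 0.7
fC = lambda x1, x2, x3: 0.9 if x3 == (x1 and x2) else 0.1
fD = lambda x3, x4: 0.8 if x3 == x4 else 0.2
fE = lambda x3, x5: 0.8 if x3 == x5 else 0.2

def msg_fC_to_x3(x3):
    # Messages from x1 and x2 into fC are just fA and fB here, because
    # x1 and x2 have no other neighbouring factors.
    return sum(fA(x1) * fB(x2) * fC(x1, x2, x3)
               for x1 in (0, 1) for x2 in (0, 1))

def msg_fD_to_x3(x3):
    return sum(fD(x3, x4) for x4 in (0, 1))

def msg_fE_to_x3(x3):
    return sum(fE(x3, x5) for x5 in (0, 1))

belief = [msg_fC_to_x3(v) * msg_fD_to_x3(v) * msg_fE_to_x3(v) for v in (0, 1)]
Z = sum(belief)
print([b / Z for b in belief])   # matches the brute-force marginal above
```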
Dual decomposition
Example:
\[ P(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)f_B(x_2)f_C(x_1, x_2, x_3)f_D(x_3, x_4)f_E(x_3, x_5) \]
\[ x_3^{C*}, \dots = \arg~\max f_C(\dots) \qquad
x_3^{D*}, \dots = \arg~\max f_D(\dots) \qquad \dots
\]
Solve for all \(f \in F \):
\( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \)
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
Dual decomposition contd.
Solve for all \(f \in F \):
\( \map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) \right) \)
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
For disagreeing nodes \(i \in V \):
\[ \msg{i}{f}{x^f_i} \leftarrow \msg{i}{f}{x^f_i} + \frac{\alpha}{r} \]
where \( r \) is the iteration count and \(\alpha\) is the step size.
\[\map^f \leftarrow \arg~\min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right)\]
where \(\theta_f(\map^f) = -\log P_f(\map^f)\)
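A simplified Python sketch of this iteration (names invented; a full implementation keeps one copy of each shared variable per factor and projects the final per-factor solutions to a consensus): each factor minimizes its reparameterized local cost, and the current label of every disagreeing copy is penalized with a diminishing step \(\alpha/r\).

```python
import itertools

def solve_subproblem(scope, theta, msgs_f):
    """argmin over the factor's own copy x^f of theta_f(x^f) + sum_i mu_if(x_i)."""
    best, best_val = None, float("inf")
    for values in itertools.product([0, 1], repeat=len(scope)):
        val = theta(*values) + sum(msgs_f[(i, v)] for i, v in zip(scope, values))
        if val < best_val:
            best, best_val = dict(zip(scope, values)), val
    return best

def dual_decomposition(scopes, thetas, n_iters=50, alpha=1.0):
    # msgs[f][(i, l)] is the reparameterization mu_if(x_i = l) for factor f.
    msgs = {f: {(i, l): 0.0 for i in scope for l in (0, 1)}
            for f, scope in scopes.items()}
    local = {}
    for r in range(1, n_iters + 1):
        local = {f: solve_subproblem(scope, thetas[f], msgs[f])
                 for f, scope in scopes.items()}
        for i in {v for scope in scopes.values() for v in scope}:
            labels = {sol[i] for sol in local.values() if i in sol}
            if len(labels) > 1:                  # disagreeing node: penalize
                for f, sol in local.items():     # each copy's current label
                    if i in sol:                 # with a diminishing step
                        msgs[f][(i, sol[i])] += alpha / r
    return local                                 # per-factor solutions
```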
Modern inference methods
Metropolis Hastings for mapping
Belief propagation
\[
\begin{align}
\bpmsg{f}{i}{l_i}{r+1} &= \sum_{\map_f \in \Omega_f: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\label{eq:factor2node}
\\
\bpmsg{i}{f}{l_i}{r+1} &= \prod_{h \in n(i) \setminus f}\bpmsg{h}{i}{l_i}{r}
\label{eq:node2factor}
\end{align}
\]
Dual decomposition
\[\begin{align}
\map^f &\leftarrow \arg \min\limits_{\map^f} \left( \theta_f(\map^f) + \sum\limits_{i \in n(f)}\msg{i}{f}{x^f_i} \right)
\\
\msg{i}{f}{x^f_i} &\leftarrow \msg{i}{f}{x^{f'}_i} + \frac{\alpha}{r}
& \text{for disagreeing nodes \(i\)}
\end{align}\]
Forward sensor models are still expensive
Belief propagation
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} &= \sum_{{\color{red}\map_f \in \Omega_f}: x_i = l_i}P_f(\map_f)\prod_{j \in n(f) \setminus i}\bpmsg{j}{f}{x_j}{r}
\end{align}
Complexity = \(O(|\Omega_f|) = O(2^{|n(f)|}) \)
Exponential in the neighborhood size
Piecewise constant sensor model
\begin{align}
P(z_f|\map_f, y_f) &= \frac{1}{Z}\begin{cases}
1 & \text{ if } \map_f = [0, \dots 0,
1, \dots, *]^\top\\
\exp(-900) & \text{ if } \map_f = [0, \dots 0,
0, \dots, *]^\top\\
\exp(-1000) & \text{ otherwise}
\end{cases}
\end{align}
Generalization:
\begin{align}
P^{PCSM}_f(\map_f) = \begin{cases}
\psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\
\vdots& \\
\psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\
\psi_{\text{min}} & \text{ otherwise}
\end{cases}
\end{align}
Efficient belief propagation for PCSM
\begin{align}
P^{PCSM}_f(\map_f) = \begin{cases}
\psi_{1} & \text{ if $\map_f$ matches pattern $\vect{R}_1$} \\
\vdots& \\
\psi_{M} & \text{ if $\map_f$ matches pattern $\vect{R}_M$} \\
\psi_{\text{min}} & \text{ otherwise}
\end{cases}
\end{align}
\begin{align}
p(\vect{x_f} \sim \vect{R}_m| x_i = l_i)
&= \begin{cases}
\prod\limits_{j \in n(f) \setminus i, \vect{R}_m[j] \ne *}\bpmsg{j}{f}{x_j = \vect{R}_m[j]}{r} &
\text{ if $\vect{R}_m$ is compatible with $x_i = l_i$ }
\\
0 & \text{ otherwise }
\end{cases}
\end{align}
\begin{align}
\bpmsg{f}{i}{x_i = l_i}{r+1} = \sum_{m = 1}^M
\psi_m p(\vect{x_f} \sim \vect{R}_m | x_i = l_i)
+ \psi_{\text{min}} p_{\text{otherwise}}
\end{align}
where $p_{\text{otherwise}} = 1 - \sum_{m = 1}^M p(\vect{x_f} \sim \vect{R}_m | x_i = l_i)$.
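A minimal Python sketch of this message (assuming, as the derivation above does, that the patterns are mutually exclusive and the incoming messages are normalized); the cost is \(O(M \cdot |n(f)|)\) rather than \(O(2^{|n(f)|})\):

```python
def pcsm_message(patterns, psis, psi_min, mu_in, i, l_i):
    """Factor-to-variable message mu_{f->i}(x_i = l_i) for a PCSM factor.

    patterns : list of dicts mapping variable -> 0, 1 or '*' (one per R_m)
    psis     : list of weights psi_m; psi_min is the catch-all weight
    mu_in    : dict variable -> [mu(x_j = 0), mu(x_j = 1)], normalized,
               for every j in n(f) other than i
    """
    total_matched, msg = 0.0, 0.0
    for pattern, psi in zip(patterns, psis):
        if pattern[i] not in (l_i, "*"):   # pattern incompatible with x_i = l_i
            continue
        p = 1.0
        for j, r in pattern.items():       # wildcard entries contribute 1
            if j != i and r != "*":
                p *= mu_in[j][r]
        total_matched += p
        msg += psi * p
    # Everything that matches no pattern falls into the "otherwise" case.
    return msg + psi_min * (1.0 - total_matched)
```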
Occupancy grid mapping using Factor graphs
Conclusion
Mapping
Localization
Goal conditioned RL
Communicating cars with mutual observation
- Now that we have a faster mapping algorithm, can we make it even faster?
- Imagine that we have multiple robots that can divide and conquer the mapping task.
- To share the estimated map, the robots have to know their precise
locations with respect to each other.
- That brings us to the next part of the talk, which is mutual localization.
Mutual localization
Mapping
Localization
Goal conditioned RL
Mutual localization
Find \(R, t\) such that
\begin{align}
q_i &= Rp_i + t
\\
\text{where } q_i &\in \mathbb{R}^3,\,
p_i \in \mathbb{R}^3,\,
R \in \mathrm{SO}(3),\,
t \in \mathbb{R}^3
\end{align}
Realtime mutual localization
Mutual localization conclusion
Mapping
Localization
Goal conditioned RL
Goal conditioned RL
Mapping
Localization
Goal conditioned RL
Goal conditioned reinforcement learning
Goal is "specified"
State is fully specified (no partial observation)
Manipulation is navigation in configuration space
Conventional RL vs Goal conditioned RL
Q-Learning
Define \[ \Q^*(s_t, a_t) = \max_{\policy} \E\left[ \sum_{k=t}^\infty \discount^{k-t} R(s_k, a_k) \right] \]
Bellman equation \[ \Q^*(s_t, a_t) = R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a)\]
Temporal Difference Update for tabular Q-functions \[ \Q^*(s_t, a_t)
\leftarrow
\alpha \Q^*(s_t, a_t)
+
(1-\alpha)( R(s_t, a_t) + \gamma \max_{a} Q^*(s_{t+1}, a) )
\]
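A minimal tabular sketch of this update, keeping the slide's convention that \(\alpha\) weights the old estimate:

```python
from collections import defaultdict

def td_update(Q, s, a, r, s_next, actions, gamma=0.99, alpha=0.9):
    """Q(s,a) <- alpha * Q(s,a) + (1 - alpha) * (r + gamma * max_a' Q(s',a'))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = alpha * Q[(s, a)] + (1.0 - alpha) * target

Q = defaultdict(float)                 # tabular Q-function, default 0
td_update(Q, s=0, a=1, r=-1.0, s_next=1, actions=[0, 1])
```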
Deep Q Learning
\begin{align}
R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t)
\end{align}
Naive Loss:
\[ \Loss = \left( R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) - \Q(s_t, a_t) \right)^2 \]
Problem:
Leads to unstable updates
Solution Strategy:
Target network \(\Qtgt(\dots) = \Q(\dots; \Theta_{\text{tgt}}) \)
Main network \(\Q_m(\dots) = \Q(\dots; \Theta_{m}) \)
Update target weights: \( \Theta_{\text{tgt}} \leftarrow \alpha \Theta_{\text{tgt}} + (1-\alpha) \Theta_{m} \)
where \(\alpha \approx 0.9\)
\(\nabla_{\Theta_m} \Qtgt = 0 \)
Solution:
\[
\Loss_{\text{DQN}} = \begin{cases}
\left( R(s_t, a_t) - {\color{red}\Q_{m}}(s_t, a_t) \right)^2 & \text{ if \(s_{t+1}\) is terminal }
\\
\left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2
& \text{ otherwise }
\end{cases}
\]
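A minimal sketch of the two pieces of target-network machinery (plain numpy arrays stand in for the network weights; the Q-function itself is elided):

```python
import numpy as np

def polyak_update(theta_tgt, theta_m, alpha=0.9):
    """theta_tgt <- alpha * theta_tgt + (1 - alpha) * theta_m; no gradient
    flows into the target weights, matching grad_{theta_m} Q_tgt = 0."""
    return alpha * theta_tgt + (1.0 - alpha) * theta_m

def dqn_target(r, q_tgt_next, gamma=0.99, terminal=False):
    """TD target: r if s_{t+1} is terminal, else r + gamma * max_a Q_tgt(s', a)."""
    return r if terminal else r + gamma * np.max(q_tgt_next)
```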
Floyd-Warshall
\[ \Q^P(s_t, g^*) := \max_{\policy} \E\left[
\sum_{k = t}^{\color{blue}g_{k+1} = g^*} R(s_k, s_{k+1}) \right] \]
Floyd-Warshall RL
\[
\newcommand{\redat}{{\color{red}a_t}}
\newcommand{\reda}{{\color{red}a}}
\newcommand{\redmaxa}{{\color{red}\max_a}}
\Q^P(s_t, \redat, g^*) := \max_{\policy} \E\left[
\sum_{k = t}^{g_{k+1} = g^*}
R(s_k, \redat)
\right] \]
\[ \Q^P(s_t, g_w) + \Q^P(s_w, g^*) \le \Q^P(s_t, g^*) \]
\[ \Q^P(s_t, g_{t+1}) = R(s_t, {\color{red}s_{t+1}}) \]
\[
\Q^P(s_t, \redat, g_w) + \redmaxa \Q^P(s_w, \reda, g^*) \le \Q^P(s_t, \redat, g^*) \]
\[
\Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]
Deep Floyd-Warshall RL
\begin{align}
\Q^P(s_t, a_t, g_w) + \max_a \Q^P(s_w, a, g^*) \le \Q^P(s_t, a_t, g^*)
\end{align}
\begin{align}
R(s_t, a_t) + \max_a \discount \Q(s_{t+1}, a) = \Q(s_t, a_t)
\end{align}
\[
\Loss_{\text{DQN}} =
\left( R(s_t, a_t) + \max_a \discount {\color{red}\Qtgt}(s_{t+1}, a) - {\color{red}\Q_m}(s_t, a_t) \right)^2
\]
\[ \LossLo = \left( \Qtgt^P(s_t, a_t, g_w) + \max_a \Qtgt^P(s_w, a, g^*)
- {\color{red}\Q^P_{m}}(s_t, a_t, g^*) \right)^2_+ \]
\[ \LossUp = \left( {\color{red}\Q^P_{m}}(s_t, a_t, g_w)
+ \max_a \Qtgt^P(s_w, a, g^*) - \Qtgt^P(s_t, a_t, g^*) \right)^2_+ \]
\[
\Q^P(s_t, \redat, g_{t+1}) = R(s_t, \redat) \]
\[ \LossStep = \left( R(s_t, a_t) - {\color{red}\Q^P_{m}}(s_t, a_t, g_{t+1}) \right)^2 \]
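A minimal sketch of the three losses with the hinge \((\cdot)^2_+\) made explicit (the inner \(\max_a\) is assumed precomputed as `a_w_best`; `q_m` and `q_tgt` are illustrative stand-ins for the main and target networks):

```python
def hinge_sq(x):
    """The slide's (.)_+^2: squared penalty only when the inequality is violated."""
    return max(x, 0.0) ** 2

def fwrl_losses(q_m, q_tgt, s_t, a_t, s_w, a_w_best, g_w, g_star, r_t, g_next):
    lo = hinge_sq(q_tgt(s_t, a_t, g_w) + q_tgt(s_w, a_w_best, g_star)
                  - q_m(s_t, a_t, g_star))
    up = hinge_sq(q_m(s_t, a_t, g_w) + q_tgt(s_w, a_w_best, g_star)
                  - q_tgt(s_t, a_t, g_star))
    step = (r_t - q_m(s_t, a_t, g_next)) ** 2   # one-step path reward, no goal reward
    return lo, up, step
```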
Hindsight Experience Replay (HER)
What does it mean to re-imagine a Virtual Goal
The desired goal was \( g^* \), but the achieved goal was \( g_f \)
The loss function used is
\[
\newcommand{\cred}[1]{{\color{red}{#1}}}
\Loss = \left( R(s_t, a_t, \cred{g_f}) +
\max_a \discount \Qtgt(s_{t+1}, a, \cred{g_f}) - \Q_m(s_t, a_t, \cred{g_f}) \right)^2 \]
instead of
\[ \Loss = \left( R(s_t, a_t, \cred{g^*}) +
\max_a \discount \Qtgt(s_{t+1}, a, \cred{g^*}) -
\Q_m(s_t, a_t, \cred{g^*}) \right)^2 \]
This requires re-computing the reward function \( R(s_t, a_t, g_f) \) for the re-imagined goal
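A minimal sketch of this relabeling step (illustrative names, not the paper's code), showing where the extra reward evaluation happens:

```python
import numpy as np

def sparse_reward(g_t, g, eps=0.05):
    """Goal-conditioned reward: 0 within eps of the goal g, else -1."""
    return 0.0 if np.linalg.norm(g_t - g) <= eps else -1.0

def her_relabel(s, a, s_next, g_t, g_f):
    """Replay a stored transition with the achieved goal g_f substituted for
    g*; the reward must be re-evaluated against the new goal, which is the
    re-computation cost a goal-independent reward avoids."""
    return (s, a, sparse_reward(g_t, g_f), s_next, g_f)
```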
Ablation with FWRL-inspired \(\LossUp\) and \(\LossLo\)
\begin{align}
\LossLo &=
\left( \Qtgt(\dots) + \Qtgt(\dots) - \Q_m(\dots) \right)_+^2
\end{align}
\begin{align}
\LossUp &=
\left( \Q_m(\dots) + \Qtgt(\dots) - \Qtgt(\dots) \right)_+^2
\end{align}
Ablation with Step Loss
\begin{align}
\LossStep &= \left(
\fwargs{\state_{t}}{\act_{t}}{\goal_{t+1}}{*}{P} - \Rew(\state_{t}, \act_{t})
\right)^2
\end{align}
\begin{align}
\Loss_{\text{HER}} = \Loss_{\text{bellman}} &=
\left( R(s_t, a_t, g_f) +
\max_a \discount \Qtgt(s_{t+1}, a, g_f) - \Q_m(s_t, a_t, g_f) \right)^2
\end{align}
HER modeling has duplicate information
Hindsight Experience Replay
Claim: Goal conditioned reinforcement learning without goal rewards is possible
Step Loss does not need Goal Rewards
Why does duplicate information hurt?
HER-like methods need to re-imagine new goal specifications.
If rewards depend upon the goal specification, the rewards need to be re-sampled.
Re-computing rewards can be costly.
Cost of algorithms in Reward samples
Ours : HER - Goal rewards \( + \LossStep \)
FWRL : HER \( + \LossLo + \LossUp \)
Fetch Push
Fetch Pick And Place
Effect of distance threshold
HER Rewards:
\[ R(s_t, a_t, g^*) = \begin{cases} 0 &\text{ if } \|g_t - g^*\| \le \epsilon
\\
-1 &\text{ otherwise}
\end{cases} \]
Our Rewards: \[ R(s_t, a_t) = -1 \]
Conclusion
Mapping
Localization
Goal conditioned RL
Language driven localization
Acknowledgements
Thanks to my colleagues and collaborators
Prof. Jason Corso (Advisor)
Prof. Frank Dellaert
Prof. Manmohan Chandraker
Prof. Jeff Siskind
Prof. Chenliang Xu
Dr. Julian Ryde
Dr. Brent Griffin
Dr. Suren Kumar
Shurjo Banerjee
Madan Ravi Ganesh
Bibliography
Andrychowicz, Marcin, et al. "Hindsight Experience Replay." Advances in Neural Information Processing Systems, 2017.
Dhiman, V., J. Ryde, and J. J. Corso. "Mutual Localization: Two Camera Relative 6-DOF Pose Estimation from Reciprocal Fiducial Observation." IROS, 2013.
Kumar, S., V. Dhiman, and J. J. Corso. "Learning Compositional Sparse Models of Bimodal Percepts." AAAI, 2014.
Ryde, J., V. Dhiman, and R. Platt. "Voxel Planes: Rapid Visualization and Meshification of Point Cloud Ensembles." IROS, 2013.
Dhiman, V., A. Kundu, F. Dellaert, and J. J. Corso. "Modern MAP Inference Methods for Accurate and Fast Occupancy Grid Mapping on Higher Order Factor Graphs." ICRA, 2014.
Chandraker, M., and V. Dhiman. "Continuous Occlusion Models for Road Scene Understanding." US Patent 9,821,813, 2017.
Dhiman, V., Q.-H. Tran, J. J. Corso, and M. Chandraker. "A Continuous Occlusion Model for Road Scene Understanding." CVPR, 2016.
Dhiman, V., S. Banerjee, B. Griffin, J. M. Siskind, and J. J. Corso. "A Critical Investigation of Deep Reinforcement Learning for Navigation." NIPS Deep RL Workshop, 2017.
Kumar, S., V. Dhiman, P. A. Koch, and J. J. Corso. "Learning Compositional Sparse Bimodal Models." PAMI, 2017.
Mirowski, Piotr, et al. "Learning to Navigate in Complex Environments." ICLR, 2017.
Plappert, Matthias, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, Vikash Kumar, and Wojciech Zaremba. "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research." arXiv:1802.09464, 2018.
Kaelbling, Leslie Pack. "Learning to Achieve Goals." IJCAI, 1993.
Dhiman, V., S. Banerjee, J. M. Siskind, and J. J. Corso. "Learning Goal-Conditioned Value Functions with One-Step Path Rewards Rather than Goal-Rewards." Submitted to ICLR, 2019. Under review.
Zachariou, Peter, et al. "Speeding: Effects on Hazard Perception and Reaction Time." 2011.
Mnih, Volodymyr, et al. "Human-Level Control through Deep Reinforcement Learning." Nature 518.7540 (2015): 529.
Watkins, Christopher J. C. H., and Peter Dayan. "Q-Learning." Machine Learning 8.3-4 (1992): 279-292.
Pearl, Judea. "Fusion, Propagation, and Structuring in Belief Networks." Artificial Intelligence 29.3 (1986): 241-288.
Jojic, Vladimir, Stephen Gould, and Daphne Koller. "Accelerated Dual Decomposition for MAP Inference." ICML, 2010.
Merali, Rehman S., and Timothy D. Barfoot. "Occupancy Grid Mapping with Markov Chain Monte Carlo Gibbs Sampling." ICRA, 2013.