Towards better navigation

$ % Vector \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\hvect}[1]{\bar{\vect{#1}}} % unit vector \newcommand{\uvect}[1]{\hat{\vect{#1}}} % Field \newcommand{\field}[1]{\mathbb{#1}} % Trace \def\act{a} \def\actt{\act_t} \newcommand\acttT{\vect{\act}_{t:\epiT}} \newcommand{\epiT}{T} \newcommand{\spawnP}{P_{\text{spawn}}} % Map \def\mapsym{x} \def\map{\vect{\mapsym}} \newcommand{\est}[1]{\hat{#1}} \newcommand{\mapest}{\est{\vect{x}}} % Measurement or observation \newcommand{\meas}{z} \newcommand{\measurements}{\vect{\meas}_{1:t}} \newcommand{\meast}[1][t]{\meas_{#1}} % Pose \newcommand{\pose}{g} \newcommand{\poset}[1][t]{\pose_{#1}} \newcommand{\posetn}{\pose_{t+1}} \newcommand{\posetp}{\pose_{t-1}} \newcommand{\poses}{\vect{\pose}_{1:t}} \newcommand{\posest}{\est{\pose}} \newcommand{\posestt}{\est{\pose}_t} % Pose goal \newcommand{\posegoal}{d} \newcommand{\unaryminus}{\scalebox{0.5}[0.5]{\( - $}} \newcommand{\prevtime}{1:t\unaryminus1} \newcommand{\pastposes}{\vect{\pose}_{\prevtime}} \newcommand{\pastmeas}{\vect{\meas}_{\prevtime}} \newcommand{\pastobs}{\pastmeas, \pastposes} \newcommand{\mutlocest}{f_{\text{mutloc}}} \newcommand{\epiT}{T} \newcommand{\spawnP}{P_{\text{spawn}}} \newcommand{\param}{\theta} \newcommand{\policy}{\chi} % CVPR16 %\DeclareMathSymbol{\Tangent} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\sech}{sech} \DeclareMathOperator{\poly}{poly} %\DeclareMathOperator*{\argmin}{\arg\min} \newcommand{\ptD}{\posDp} % FIXME: conflicts with \map, this should be \posDp \newcommand{\Obj}{C} % In mutloc C_p and C_q used for robots \newcommand{\ObjMu}{\vect{m}} % FIXME: in mutloc used as 3D pos of markers in frame 1, this should \pose \newcommand{\ObjR}{\bm{\omega}} \newcommand{\ObjSigma}{\bm{\Sigma}} \newcommand{\Logst}{\mathcal{L}} \newcommand{\OccF}{f_{\text{occ}}} \newcommand{\gOccF}{\nabla \OccF} \newcommand{\gOccFi}{\nabla \OccF^i} \newcommand{\impt}{\imgp{\ptD}} % Mahalonobis distance between to points \newcommand{\mahaF}{\zeta} % FIXME: use some greek letter \newcommand{\LogstIm}{\Logst_u} \newcommand{\tImMu}{\bm{\mu}} \newcommand{\tImS}{\bm{\Gamma}} \newcommand{\LogstD}{\Logst_{\dist}} \newcommand{\tDMu}{\nu} \newcommand{\meandepth}[1]{\tDMu_{#1}} \newcommand{\BB}{\mathbf{B}} \newcommand{\imBB}{\imgp{\BB}} \newcommand{\camO}{\mathbf{p}^c} \newcommand{\camR}{\bm{\omega}^c} \newcommand{\bepsilon}{\bm{\epsilon}} % 3D pose of the cars and ego motion \newcommand{\pos}[2]{\ObjMu^{#1}({#2})} \newcommand{\ori}[2]{\ObjR^{#1}(#2)} \newcommand{\state}[2]{\mathbf{s}^{#1}(#2)} % ego pose \newcommand{\egop}[1][t]{\pos{c}{#1}} \newcommand{\egoo}[1][t]{\ori{c}{#1}} \newcommand{\egos}[1][t]{\state{c}{#1}} % relative pose between camera and car $i$ \newcommand{\posealt}{\pose} % FIXME: Consider unifying with \pose \newcommand{\relp}[2]{\posealt^{#1}(#2)} \newcommand{\relpz}[2]{\posealt_z^{#1}(#2)} % 3D tracks on car $i$ in its own coordinate frame \newcommand{\tracklets}{\mathbf{X}^{i}_o} \newcommand{\tracklet}[1]{\ptD^{i}_{#1}} % track projections on camera \newcommand{\trackpit}[2]{\impt_{#1}(#2)} \newcommand{\trackp}[1]{\impt_j(#1)} \newcommand{\trackpj}[1]{\impt_j(#1)} % Unclassified track point projected on camera \newcommand{\ucTrackp}{\impt(t)} % dimensions of car $i$ \newcommand{\dimsn}[1]{\BB^{#1}} \newcommand{\expDimsn}{\hat{\BB}} % projection function \newcommand{\projectionOf}[1]{\proj_{\relp{i}{t}}\left(#1\right)} \newcommand{\projectionOft}[1]{\proj_{\relp{i}{t+1}}\left(#1\right)} \newcommand{\centerProj}{\bar{\proj}_{\relp{i}{t}}(\dimsn{i})} \newcommand{\cornerProj}[1]{\proj^{#1}_{\relp{i}{t}}(\dimsn{i})} \newcommand{\triangleProj}[1]{\triangle^{#1}_{\relp{i}{t}}(\dimsn{i})} % bounding box corners on image \newcommand{\bbt}[2]{\imBB^{#1}({#2})} \newcommand{\bb}[1]{\imBB^{#1}(t)} \newcommand{\bbMu}{\bm{\mu}} % FIXME: conflicts with logistic mean of trans prob \newcommand{\bbS}{\bm{\Lambda}} \newcommand{\Energy}[1]{\mathcal{E}^{it}_{\text{#1}}} \newcommand{\pEnergy}[1]{\mathcal{E}^{ijt}_{\text{#1}}} % Weighted energy \newcommand{\WEnergy}[1]{\lambda_{\text{#1}}\Energy{#1}} \newcommand{\WpEnergy}[1]{\lambda_{\text{#1}}\pEnergy{#1}} \newcommand{\EnergyCol}{\mathcal{E}^{ijt}_{\text{col}}} \newcommand{\WEnergyCol}{\lambda_{\text{col}}\EnergyCol} \newcommand{\EnergyBBoxNoOcc}{\Energy{detectNoOcc}} \newcommand{\EnergyBBox}{\Energy{detect}} \newcommand{\EnergyTrack}{ \pEnergy{track}} \newcommand{\EnergyTrackNoOcc}{\pEnergy{trackNoOcc}} \newcommand{\EnergyLane}{\Energy{lane}} \newcommand{\EnergySize}{\Energy{size}} \newcommand{\EnergyDyn}{\Energy{dyn}} \newcommand{\EnergyDynHol}{\Energy{dyn-hol}} \newcommand{\EnergyDynOri}{\Energy{dyn-ori}} \newcommand{\EnergyDynVel}{\Energy{dyn-vel}} \newcommand{\occFreeProj}[1]{\Pi_{\relp{i}{t}}(#1)} \newcommand{\minx}{x_{\text{min}}} \newcommand{\miny}{y_{\text{min}}} \newcommand{\maxx}{x_{\text{max}}} \newcommand{\maxy}{y_{\text{max}}} \newcommand{\frontface}{F^i_\text{FF}(t)} \newcommand{\occ}[1]{o({#1})} \newcommand{\face}{F^i_k(t)} \newcommand{\invProjectionOf}[1]{\pi^{-1}_{\relp{i}{t}}\left(#1\right)} \newcommand{\invProjectionOftm}[1]{\pi^{-1}_{\relp{i}{t-1}}\left(#1\right)} \newcommand{\occf}{\OccF^i(\ptD_j)} \newcommand{\occftot}{\OccF(\ptD_j)} \newcommand{\occft}[1]{\OccF(#1)} \newcommand{\ray}{\uvect{\ptD}_j} \newcommand{\occfray}{f_{occ}(\lambda\ray)} \newcommand{\lambdadist}{f_{\lambda}(\trackpj{t-1}, \lambda)} \newcommand{\occfxi}{L(\mathbf{x}; \pos{i}{t-1}, \Sigma_i)} \newcommand{\occfi}{L(\lambda \ray; \pos{i}{t-1}, \Sigma_i)} \newcommand{\PreflR}{P^{ij}_{\text{reflection}}} \newcommand{\Pobs}{P^{ij}_{\text{observation}}} \newcommand{\PtransR}{P^{j}_{\text{transmission}}} \newcommand{\PassocR}{P^{ij}_{\text{assoc}}} \newcommand{\assoc}{a^{ij}} \newcommand{\assocP}{\assoc(\lambda)} \newcommand{\assocPk}{\assoc(\lambda_k)} \newcommand{\Ereproj}{E^{ij}_{\text{reproj}}} \newcommand{\Ptransarg}[1]{\PtransR(#1)} \newcommand{\Ptrans}{\Ptransarg{\lambda}} \newcommand{\Ptransmud}{\Ptransarg{\mu^{i}_d}} \newcommand{\Prefl}{\PreflR(\lambda)} \newcommand{\Zrefl}{Z_{\text{refl}}} \newcommand{\dishort}{d_i(\mathbf{x})} \newcommand{\Lu}{\LogstIm(\impt, \tImMu^i,\tImS^i)} \newcommand{\Llambda}{\LogstD(\impt, \dist; \tDMu^i)} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\PropDist}{\mathcal{W}_j} \newcommand{\muijl}{\mu^{ij}_{\dist}} \newcommand{\sigmaijl}{\sigma^{ij}_{\dist}} \newcommand{\Sigmait}{\bSigma^{i^{-1}}_o} \newcommand{\muit}{\bmu^{i}_o} \newcommand{\Sigmaic}{\bSigma'^{i^{-1}}_c} \newcommand{\muic}{{\bmu^{i}_c}} \newcommand{\Sigmaicf}{\bSigma^{i^{-1}}_c} \newcommand{\Sigmaicfinv}{\Sigma^{i}_c} \newcommand{\muiu}{\mu^{i}_t} \newcommand{\Sigmaiu}{\Sigma^{i^{-1}}_u} \newcommand{\uv}[1]{\hat{\mathbf{#1}}} \newcommand{\Trr}[3]{{}^{#1}{#2}_{#3}} \newcommand{\xymin}[1]{#1_{\text{min}}} \newcommand{\xymax}[1]{#1_{\text{max}}} \newcommand{\xt}{\mathbf{x}_t} \newcommand{\xc}{\mathbf{x}_c} \newcommand{\Rctot}{\bR} \newcommand{\tctot}{\bt} \newcommand{\tcmut}{\bt'} \newcommand{\Beizer}{B\'eizer } \newcommand{\LaneUncertainty}[1]{\Sigma_{L_m}(#1)} \newcommand{\projOnLane}[1]{\pi_{L_m(k)}(#1)} \newlength{\eqs} \setlength{\eqs}{-0.04in} \setitemize[0]{leftmargin=15pt} \newenvironment{tight_itemize}{ \begin{itemize}[leftmargin=10pt] \setlength{\topsep}{0pt} \setlength{\itemsep}{0pt} \setlength{\parskip}{0pt} \setlength{\parsep}{0pt} }{\end{itemize}} % DRL \def\state{s} \newcommand{\statet}[1][t]{\state_{#1}} \def\statetp{\state_{t-1}} \def\statehist{\state_{1:t-1}} \def\statetn{\state_{t+1}} \def\obs{\meas} \def\obst{\obs_t} \def\act{a} \newcommand{\actt}[1][t]{\act_{#1}} \def\acttp{\act_{t-1}} \def\acttn{\act_{t+1}} \def\Obs{\mathcal{O}} \def\ObsEnc{E_o} \def\ObsProb{P_o} \def\ObsFunc{C} \def\ObsFuncFull{\ObsFunc(\statet, \actt) \rightarrow \obst} \def\ObsFuncInv{\ObsFunc^{-1}} \def\ObsFuncInvFull{\ObsFuncInv(\obst, \statetp, \actt) \rightarrow \statet} \def\State{\mathcal{S}} \def\Action{\mathcal{A}} \def\TransP{P_{T}} \def\Trans{T} \def\TransFull{\Trans(\statet, \actt) \rightarrow \statetn} \def\TransObs{T_c} \def\Rew{R} \def\rew{r} \def\rewards{\vect{r}_{1:t}} \def\rewt{\rew_t} \def\rewtp{\rew_{t-1}} \def\rewtn{\rew_{t+1}} \def\RewFull{\Rew(\statet, \actt) \rightarrow \rewtn} \def\TransObsFull{\TransObs(\statet, \obst, \actt, \rewt; \theta_T) \rightarrow \statetn} \def\Value{V} \def\pit{\pi_t} \def\piDef{\pi(\acttn|\statet, \obst, \actt, \rewt; \theta_\pi) \rightarrow \pit(\acttn ; \theta_\pi)} \def\Valuet{\Value_t} \def\ValueDef{\Value(\statet, \obst, \actt, \rewt; \theta_\Value) \rightarrow \Valuet(\theta_\Value)} \def\R{\mathbb{R}} \def\E{\mathbb{E}} \newcommand{\LatencyOneGtOne}{Latency $1:>1$} \newcommand{\NavAiiiCDiDiiL}{NavA3C+D\textsubscript{1}D\textsubscript{2}L} \newcounter{Benchmark} \newcounter{BenchmarkB}[Benchmark] \newcommand{\ditem}[1]{\refstepcounter{Benchmark}\item[\arabic{Benchmark}. {#1}]} \newcommand{\dditem}[1]{\refstepcounter{BenchmarkB}\item[\arabic{Benchmark}.\alph{BenchmarkB}. {#1}]} \newcommand{\DistanceInefficiency}{Distance-inefficiency} % Observation probability % forward sensor model % Differentiable Floyd Warshall algorithm \newcommand{\graph}{G} \newcommand{\vtces}{V} \newcommand{\edges}{E} %\newcommand{\policy}{\Pi} \newcommand{\wts}{W} \newcommand{\distM}{D} \newcommand{\nil}{\emptyset} \newcommand{\qValue}{Q} \newcommand{\Q}{\qValue} \newcommand{\goal}{g} \newcommand{\Goal}{\mathcal{G}} \newcommand{\discount}{\gamma} \newcommand{\fwcost}{Q} \newcommand{\fw}{\fwcost} \newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#1}, {#2}, {#3}\right)} \newcommand{\minedgecost}{\fwcost_0} \newcommand{\Rgoal}{R_{\text{goal}}} \newcommand{\Loo}{Latency-1:\textgreater1} \newcommand{\Loss}{\mathcal{L}} \newcommand{\LossText}[1]{\Loss_{\text{#1}}} \newcommand{\LossDDPG}{\LossText{ddpg}} \newcommand{\LossStep}{\LossText{step}} \newcommand{\LossLo}{\LossText{lo}} \newcommand{\LossUp}{\LossText{up}} \newcommand{\LossTrieq}{\LossText{trieq}} \newcommand{\tgt}{\text{tgt}} \newcommand{\Qstar}{\Q_{*}} \newcommand{\Qtgt}{\Q_{\text{tgt}}} \newcommand{\ytgt}{y_t} \newcommand{\bpmsg}[4]{\mu^{#4}_{#1\rightarrow#2}(#3)} \newcommand{\msg}[3]{\mu_{#1#2}(#3)} \newcommand{\discount}{\gamma} \newcommand{\problemobjective}{\begin{align} \vect{\act}_{1:\epiT} &= \arg \max_{\vect{\act}_{1:\epiT}} \E \left[\left. \discount^{k}\mathbb{1}_{\|\posegoal - \pose_k\|_2 \le \epsilon} \right| \vect{\meas}_{1:\epiT} \right] \end{align}} \)

Efficient Dual decomposition

Find minimum of each pattern by taking the minimum message
Finding minimum of otherwise case, there are two cases
- If the otherwise can be enumerated efficienty, find the minimum over all enumerations
- Otherwise assume that the fixed patterns covered by the enumerated patterns are sparse. Find n-best maximas (or minimas) till the assignment do not coincides with enumerated patterns.
Find minimum among all the patterns

Related work

Intention-net by Wei et al. (2017) uses planner for goal directed navigation when the map is given. They do not address the problem of mapping in a hierarchical setup.
Pfeiffer et al. (2016) use laser scans and are able to transfer to previously unseen real-world environments. However, the goal is usually visible and no long term path planning is required.

Occupancy grid to factor graph

$V$ = Map cells are variable nodes
$F$ = Laser measurements are factor nodes
$ E = \{(i, f) : i \in V, f \in F, \text{laser $f$ passes through cell $i$}\}$
Factor graph $G = (V, F, E) $
Represents probability factorization $ P(X_V) = \prod_{f \in F}\Psi(X_{\text{Nbrs}(f)}) $

Problem formulation

\begin{align} \|s_1\uvect{p}_1-s_2\uvect{p}_2\| &=\| \vect{q_1}- \vect{q_2}\| \\ \end{align}

\begin{align} \|s_2\uvect{p}_2- \vect{p_3}\| &=\| \vect{q_2}-s_3\uvect{q}_3\| \\ \| \vect{p_3}-s_1\uvect{p}_1\| &=\|s_3\uvect{q}_3- \vect{q_1}\| \end{align}

Solving for $s_1$ $s_2$ and $s_3$

\begin{align} s_1^2 + s_2^2 - 2s_1s_2\uvect{p}_1^\top\uvect{p}_2 - \|\vect{q}_1 - \vect{q}_2\|^2 &= 0 \\ % s_2^2 - s_3^2 - 2s_2\uvect{p}_2^\top\vect{p}_3 + 2s_3\vect{q}_2^\top\uvect{q}_3 + \|\vect{p}_3\|^2 - \|\vect{q}_2\|^2 &= 0 \\ % s_1^2 - s_3^2 - 2s_1\uvect{p}_1^\top\vect{p}_3 + 2s_3\vect{q}_1^\top\uvect{q}_3 + \|\vect{p}_3\|^2 - \|\vect{q}_1\|^2 &= 0 \label{eq31} \end{align}

Eliminating $s_2$ and $s_3$

\begin{align} r_2(s_1) = \begin{vmatrix} c_4 & c_3 & c_2 & c_1 & c_0 & 0 \\ 0 & c_4 & c_3 & c_2 & c_1 & c_0\\ 1 & d_1 & d_0 & 0 & 0 & 0 \\ 0 & 1 & d_1 & d_0 & 0 & 0 \\ 0 & 0 & 1 & d_1 & d_0 & 0 \\ 0 & 0 & 0 & 1 & d_1 & d_0 \end{vmatrix} = 0. \label{eq:finalsylvester} \end{align}

This is a 8-degree polynomial with 8-roots. Which one is the right solution?

Filtering roots

\begin{align} \| \vect{p_3}-s_1\uvect{p}_1\| &=\|s_3\uvect{q}_3- \vect{q_1}\| \end{align}

\begin{align} %(s_1 + s_2)^2 - 2s_1s_2(1 + \uvect{p}_1^\top\uvect{p}_2) - \|\vect{q}_1 - %\vect{q}_2\|^2 &= 0 % \\ %(s_2 - \uvect{p}_2^\top\vect{p}_3)^2 %- (s_3 - \vect{q}_2^\top\uvect{q}_3)^2 %+ (\vect{p}_3 - \uvect{p}_2)^\top\vect{p}_3 %- \vect{q}_2^\top(\vect{q}_2 - \uvect{q}_3) &= 0\\ (s_1 - \uvect{p}_1^\top\vect{p}_3)^2 - (s_3 - \vect{q}_1^\top\uvect{q}_3)^2 + (\vect{p}_3 - \uvect{p}_1)^\top\vect{p}_3 - \vect{q}_1^\top(\vect{q}_1 - \uvect{q}_3) &= 0 \end{align}

$\uvect{p}_1^\top\vect{p}_3 $

$s_1 - \uvect{p}_1^\top\vect{p}_3 $

\begin{align*} s_2 - \uvect{p}_2^\top\vect{p}_3 &> 0\\ s_3 - \vect{q}_2^\top\uvect{q}_3 &> 0\\ s_1 - \uvect{p}_1^\top\vect{p}_3 &> 0\\ \end{align*}

Rewards

End-to-End navigation using RL
      
     Mapping 
      Localization 
      End-to-End RL
     Goal conditioned RL

Learning to navigate in complex environments (Mirowski et al. 2017)

(Mirowski et al. ICLR 2017).

Do DRL algorithms really learn to navigate?

Differences in setup from Mirowski et al. (2017)

We use small wall penality (-0.2).
We use a 4-action space while Mirowski el. (2017) use 8-action space.

Experimental Setup

Episode (1200 steps):
- New maze : Random or Static
- New goal location: Random or Static
- On finding the goal.
  - New spawn location: Random or Static
Goal: Maximize number of times the goal found per episode.

Latency metric

\[ \text{Latency }1:>1 = \frac{\text{Time taken to find goal first time}} {\text{Average time taken to find goal thereafter}} \]

Distance inefficiency

\[ \text{Distance inefficiency (after first goal hit)} = \frac{\text{Distance traveled}} {\text{Approx. shortest path}} \]

Qualitative results seen maps

Seen env.

Qualitative results unseen maps

Unseen env.

Square map

Wrench map

Goal map

Take aways

The algorithm learns:

to follow walls and explore.
better than random but not optimal policy (random goal, static maps).
no better than random policy (random goal, random map).
DRL holds potential for faster navigation but we are not there yet.

Goal conditioned RL
      
     Mapping 
      Localization 
      End-to-End RL
     Goal conditioned RL

All metrics

Goal conditioned reinforcement Learning

Formulation

Goal based rewards

\begin{align} \Q_{*}(\state_t, \act_t, \goal) &= \begin{cases} \Rew(\state_t, \act_t, \goal) + \discount \max_{\act \in \Action} \Q_{*}(\state_{t+1}, \act, \goal) & \text{ if } t < T \\ \Rew(\state_T, \act_T, \goal) & \text{ if } t = T \end{cases}. \end{align}

Path based rewards

\begin{align} \fwargs{\state_t}{\act_t}{\goal^*}*P &= \begin{cases} \Rew^P(\state_t, \act_t) + \discount \max_{\act \in \Action} \fwargs{\state_{t+1}}\act{\goal^*}*P &\text{ if } t < l-1, \goal_l = \goal^* \\ \Rew^P(\state_{l-1}, \act_{l-1}) & \text{ if } t = l-1, \goal_l = \goal^* \end{cases} \end{align}

Navigation with autonomous cars

Google trends for 'Autonomous cars'

Autonomous cars need detailed maps

Source:Waymo blog (Medium.com, March 2016)

Navigation without provided maps

Maps change

Some areas may not be mapped

Dependence on other mapping agents

Indoor navigation for mobile robots