
$
\require{cancel}
\require{color}
\newcommand{\p}[1]{\left(#1\right)}
\newcommand{\bk}[1]{\lbrack#1\rbrack}
\newcommand{\of}[1]{\!\!\,\p{#1}\!\mathinner{}}
\newcommand{\norm}[1]{\lVert#1\rVert}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\argmax}{arg\,max}
\newcommand{\given}{\,\middle|\,}
\renewcommand{\Pr}[1]{\mathop{{}\mathbb{P}\mathopen{}\left(#1\right)\mathclose{}}}
%\newcommand{\deriv}[1]{#1^{\prime}}\newcommand{\dderiv}[1]{#1^{\prime}}
\newcommand{\deriv}[1]{\dot{#1}}\newcommand{\dderiv}[1]{\ddot{#1}}
\renewcommand{\eval}[1]{\left.#1\right|}
$

Why the Shortest Path is a Line Segment

We know that light takes the shortest path, but why is the shortest path through space always a straight line?

Most people take this for granted, but it can actually be proven with the calculus of variations.
Unfortunately, many texts on the subject rely on advanced concepts and are hard to follow.

Well, it turns out you don't need a cannon to kill a fly.

Here's a 10-minute proof of the idea that relies on introductory calculus (and a bit of ingenuity).

Problem (Fermat's principle)

The trajectory $\vec{r}$ of a ray of light traveling from $\vec{r}_1$ to $\vec{r}_2$ always lies on the shortest path connecting $\vec{r}\of{t_1} = \vec{r}_1$ to $\vec{r}\of{t_2} = \vec{r}_2$.
Prove that, if light has constant speed $c$, then the ray's trajectory lies on a line.

Solution Attempt

Initially, one may be tempted to write something like: \begin{align*} \frac{d}{d\vec{r}} \operatorname{Length}\of{\vec{r}} = 0 \end{align*} While the intention here is clear enough, the expression itself is the problem: $\vec{r}$ is an entire function of $t$ rather than a single variable, so introductory calculus gives us no way to differentiate with respect to it.
A better formulation requires indirect thinking and the introduction of an auxiliary scalar variable, $\epsilon$, that we *can* differentiate with respect to.

Solution

First, let us make a couple of critical (though unintuitive) observations:
Observation 1: If $\vec{p}$ is the shortest path and is traversed at constant speed, then the trajectory of $\vec{p}$ lies on a line if and only if $\dderiv{\vec{p}}\of{t} = \vec{0}$ for all $t$.
Observation 2: Any path $\vec{r}$ with the same endpoints can be written as $\vec{r} = \vec{p} + \epsilon\,\vec{\delta}$ where $\epsilon$ is a real number (of either sign, so that we can differentiate at $\epsilon = 0$) and $\vec{\delta}\of{t_1} = \vec{\delta}\of{t_2} = \vec{0}$.
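Observation 2 is easy to play with numerically. The following is only a rough sketch, not part of the proof: the endpoints, the perturbation $4t(1-t)$, the sampling grid, and the helper names `path_length` and `length_slope` are all illustrative choices made here. It estimates $\frac{d}{d\epsilon}\operatorname{Length}$ at $\epsilon = 0$ with a central difference, once around a straight path and once around a curved path with the same endpoints.

```python
import numpy as np


def path_length(points: np.ndarray) -> float:
    """Length of the polyline through the sampled points (one point per row)."""
    segments = np.diff(points, axis=0)
    return float(np.linalg.norm(segments, axis=1).sum())


def length_slope(p: np.ndarray, delta: np.ndarray, h: float = 1e-4) -> float:
    """Central-difference estimate of d/d(eps) Length(p + eps * delta) at eps = 0."""
    return (path_length(p + h * delta) - path_length(p - h * delta)) / (2.0 * h)


# Sample t in [0, 1]; the endpoints r1 and r2 are arbitrary illustrative values.
t = np.linspace(0.0, 1.0, 2001)
r1, r2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])

# Straight path from r1 to r2, traversed at constant speed.
straight = r1 + t[:, None] * (r2 - r1)

# A perturbation that is exactly zero at both endpoints (assumed shape: 4 t (1 - t)).
delta = np.column_stack([np.zeros_like(t), 4.0 * t * (1.0 - t)])

# A path with the same endpoints that is not straight.
curved = straight + 0.5 * delta

slope_straight = length_slope(straight, delta)
slope_curved = length_slope(curved, delta)

print(f"straight: length={path_length(straight):.6f}, slope={slope_straight:.2e}")
print(f"curved:   length={path_length(curved):.6f}, slope={slope_curved:.2e}")
```

Around the straight path the estimated slope should come out as zero up to rounding error, while around the curved path it is clearly nonzero, meaning the curved path can still be shortened by nudging $\epsilon$.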

Now, here's the proof. The symbol $\star$ indicates my estimate of the ingenuity required for each step.

Notice that, although the first step is trivially true by definition of $\vec{p}$, it's also one of the least intuitive!
\begin{align*}
\frac{d}{d\epsilon} \eval{\operatorname{Length}\of{\vec{r}}}_{\epsilon=0} &= 0 && \tag{$\star$$\star$$\star$} \\
\frac{d}{d\epsilon} \eval{\int_{t_1}^{t_2} \norm{\deriv{\vec{r}}}\,dt}_{\epsilon=0} &= 0 && \text{(definition of arc length)} \\
\int_{t_1}^{t_2} \eval{\frac{d\norm{\deriv{\vec{r}}}}{d\epsilon}}_{\epsilon=0}\,dt &= 0 && \text{(differentiate under the integral)} \\
\int_{t_1}^{t_2} \deriv{\vec{\delta}}\of{t} \cdot \frac{\deriv{\vec{p}}\of{t}}{\norm{\deriv{\vec{p}}\of{t}}} \,dt &= 0 && \text{(chain rule; $\vec{r} = \vec{p}$ when $\epsilon = 0$)} \\
\frac{1}{\norm{\deriv{\vec{p}}}}\int_{t_1}^{t_2} \deriv{\vec{\delta}}\of{t} \cdot \deriv{\vec{p}}\of{t} \,dt &= 0 && \text{($\norm{\deriv{\vec{p}}} = c$ is constant)} \tag{$\star$} \\
\cancelto{0}{\eval{\vec{\delta}\of{t} \cdot \frac{\deriv{\vec{p}}\of{t}}{\norm{\deriv{\vec{p}}\of{t}}}}_{t_1}^{t_2}} - \frac{1}{\norm{\deriv{\vec{p}}}}\int_{t_1}^{t_2} \vec{\delta}\of{t} \cdot \dderiv{\vec{p}}\of{t} \,dt &= 0 && \text{(integrate by parts)} \tag{$\star$$\star$} \\
\therefore\ \ \ \forall t \in \bk{t_1, t_2}\ \ \ \ \dderiv{\vec{p}}\of{t} &= \vec{0} && \tag{$\star$$\star$$\star$}
\end{align*}
The boundary term cancels because $\vec{\delta}\of{t_1} = \vec{\delta}\of{t_2} = \vec{0}$. The last step is subtle: since $\vec{\delta}$ is arbitrary and may be nonzero on any interval we like, $\dderiv{\vec{p}}$ must be zero everywhere in order for the equality to hold for every choice of $\vec{\delta}$.
(The letter $\vec{\delta}$ was intentionally chosen to remind familiar readers of the Dirac delta at the last step.)
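For readers who want the last step spelled out, here is one way to do it, sketched under the extra assumption that $\vec{p}$ is twice continuously differentiable. The weight $\phi$ is a helper introduced only for this illustration. Take $\phi\of{t} = \p{t - t_1}\p{t_2 - t}$, which is zero at both endpoints and positive in between, and choose $\vec{\delta}\of{t} = \phi\of{t}\,\dderiv{\vec{p}}\of{t}$. This $\vec{\delta}$ vanishes at $t_1$ and $t_2$, so it is an allowed perturbation, and the remaining integral becomes: \begin{align*} \int_{t_1}^{t_2} \vec{\delta}\of{t} \cdot \dderiv{\vec{p}}\of{t} \,dt &= \int_{t_1}^{t_2} \phi\of{t}\, \norm{\dderiv{\vec{p}}\of{t}}^2 \,dt = 0 \end{align*} The integrand is continuous and never negative, so its integral can only be zero if the integrand is zero for every $t$. Since $\phi\of{t} > 0$ strictly between the endpoints, that forces $\dderiv{\vec{p}}\of{t} = \vec{0}$ there, and continuity extends this to $t_1$ and $t_2$.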

A potential objection is that we assumed $\norm{\deriv{\vec{p}}} = c$ is a constant without proof.
But actually, if $\norm{\deriv{\vec{p}}}$ is not constant, the result is no longer true... which we know from physics: light bends where its speed changes, as in refraction.