\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{bbm}
\usepackage{graphicx}
\usepackage{epsfig}
\usepackage[right=0.8in, top=1in, bottom=1.2in, left=0.8in]{geometry}
\usepackage{setspace}
\usepackage[linewidth=1pt]{mdframed}
\usepackage{lipsum}
\usepackage{float}
\spacing{1.06}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{\vspace{0.25cm}
\hbox to 5.78in { {COMS 4995-2:\hspace{0.12cm}Advanced
Algorithms (Spring'20)} \hfill #2 }
\vspace{0.48cm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{0.42cm}
\hbox to 5.78in { {#3 \hfill #4} }\vspace{0.25cm}
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribes:\hspace{0.08cm}#4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{example}[theorem]{Example}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
\newcommand{\E}{{\mathbb E}}
\DeclareMathOperator{\var}{Var}
\begin{document}
\lecture{21: Large Scale Models}{Apr
14, 2020}{Instructor:\hspace{0.08cm}\emph{Alex Andoni }}{\emph{Yotam Alexander}}
\section{Recap: I/O model (External Memory Model)}
\bigskip
\bigskip
\begin{figure}[H]
\centering
\includegraphics[scale = 0.8]{EMM}
\caption{I/O Model}
\label{fig1}
\end{figure}
\bigskip
We briefly recall the I/O model:
\begin{itemize}
\item We have a cache of size \(M\), and an external memory unit with unbounded storage space.
\item Both are composed of lines of length \(B\).
\item To access a location in external memory, we must bring its entire line into the cache.
\item We define the ``time''/``cost'' of an algorithm to be the total number of lines brought into the cache (cache lines, abbreviated CLs below).
\end{itemize}
\bigskip
\textbf{Problem \(0\)}: Searching for \(x\) in an (unsorted) array \(A\) of size \(N\) via linear scan. This can be done in ``time'' \(O(\frac{N}{B}+1)\), which is unsurprising because an array of size \(N\) occupies \(\approx\frac{N}{B}\) cache lines.
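The cost accounting can be simulated directly. Below is a minimal Python sketch (the function names are ours, purely for illustration) that scans an array and counts the distinct cache lines touched, assuming the array is stored contiguously in lines of \(B\) elements:

```python
import math

def scan_cost(N, B):
    """Cache lines occupied by N contiguous elements in lines of B elements."""
    return math.ceil(N / B)

def linear_search(A, x, B):
    """Scan A for x; return (index or -1, number of distinct CLs touched)."""
    lines = set()
    for i, a in enumerate(A):
        lines.add(i // B)          # the cache line holding index i
        if a == x:
            return i, len(lines)
    return -1, len(lines)
```

A full scan of \(N\) elements touches exactly \(\lceil N/B\rceil\) lines, matching the \(O(\frac{N}{B}+1)\) bound.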
\bigskip
\textbf{Problem \(1\)}: Searching for \(x\) in a sorted array of size \(N\) via binary search. We know that in the standard (RAM) model, this can be done in \(O(\log N)\) time. In the I/O model the analysis proceeds as follows:
\begin{enumerate}
\item If \(N \leq B\), the whole array fits in at most 2 CLs, so 2 CLs suffice.
\item If \(N\gg B\): at the beginning of each iteration we search in a range \([l,r]\subseteq\{1,2,\dots,N\}\). As long as \(r-l>B\), each iteration (halving of the range) may require bringing in a new CL. Once \(r-l\leq B\), the previous case applies.
\end{enumerate}
Thus the overall cost is \(O(\log(\frac{N}{B})+1)\).
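This line-counting argument is easy to check empirically. The following Python sketch (names are illustrative) runs standard binary search while recording the distinct cache lines probed:

```python
def binary_search_io(A, x, B):
    """Binary search a sorted array A for x; count the distinct cache
    lines (blocks of B consecutive indices) touched by the probes."""
    lines = set()
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        lines.add(mid // B)        # line brought in for this probe
        if A[mid] == x:
            break
        elif A[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return len(lines)
```

For instance, with \(N=1024\) and \(B=16\), searching for the minimum element touches \(6=\log_2(N/B)\) distinct lines: the early probes each land on a fresh line, and only the last few probes share one.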
\bigskip
Note that compared to the standard model, we achieved only an additive improvement (\(O(\log N-\log B+1)\) vs \(O(\log N)\)), whereas in the unsorted search we achieved a multiplicative speedup (\(O(\frac{N}{B})\) vs \(O(N)\)). Thus we would like to design an algorithm that achieves an analogous speedup for the sorted array problem as well.
\bigskip
\textbf{Problem \(2\)}: Searching for \(x\) in a Binary Search Tree:
\bigskip
\begin{figure}[H]
\centering
\includegraphics[scale = 0.8]{Tree1}
\caption{BST, with each node storing an element of A}
\label{fig2}
\end{figure}
Here \(a\) is the current node, and the binary tree is ordered in the sense that \(u\leq a\leq v\) for any elements \(u,v\) in the subtrees \(b,c\) respectively.
\bigskip
\textbf{Analysis}:
\begin{itemize}
\item If \(N \leq 2B\), we need at most 3 CLs.
\item If \(N>2B\): as long as the subtree rooted at the current node has size \(\geq 2B\), we need a new CL for each ``split'' (descent to a child).
\end{itemize}
Thus overall we again require \(O(\log(\frac{N}{B})+1)\) CLs, which is not an improvement over binary search on the sorted array.
\bigskip
\textbf{Problem \(3\)}:
\bigskip
\textbf{Idea}: To get a multiplicative speedup, we re-organize the tree to utilize the CLs more efficiently.
\bigskip
\textbf{B-Tree}
\begin{figure}[H]
\centering
\includegraphics[scale = 0.8]{B-Tree}
\caption{B-Tree}
\label{fig3}
\end{figure}
\bigskip
Each node now stores \(B-1\) elements of \(A\) and has \(B\) children, and the tree satisfies the sortedness property:
\[A_{1}\leq a_{1}\leq A_{2}\leq a_{2}\leq\dots\leq a_{B-1}\leq A_{B}\]
Note that if \(B=2\) we recover the definition of a BST.
\bigskip
The search proceeds as before.
\bigskip
\textbf{Analysis}: Each node requires at most 2 CLs (it stores \(B-1\) numbers and \(B\) pointers to its children). This implies that the number of CLs we need to bring in is \(\leq 2\times\)(height of the tree). The height of the tree is \(\lceil \log_{B}N\rceil\), so overall we require \(O(\log_{B}N+1)=O(\frac{\log N}{\log B}+1)\) CLs (where, as before, the additive constant handles the case \(N\leq B\)).
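The search itself is the obvious generalization of BST search. Below is a hedged Python sketch (the node class and function names are ours; in a real implementation each node would occupy one or two disk blocks, whereas here everything sits in memory and we merely count nodes visited):

```python
import bisect

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # up to B-1 sorted keys
        self.children = children or []    # up to B children (empty at leaves)

def btree_search(node, x, counter):
    """Search for x; counter[0] tallies nodes visited, each costing O(1) CLs."""
    counter[0] += 1
    i = bisect.bisect_left(node.keys, x)  # binary search within the node (cf. Remark 1)
    if i < len(node.keys) and node.keys[i] == x:
        return True
    if not node.children:
        return False
    return btree_search(node.children[i], x, counter)
```

Since each visited node costs \(O(1)\) CLs, the total I/O cost is proportional to the height \(\log_{B}N\).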
\bigskip
\textbf{Example}: If \(N=2^{20}\) and \(B=2^{10}\), binary search would require \(20-10=10\) CLs, whereas a B-tree requires only \(\frac{20}{10}=2\).
\bigskip
\textbf{Remark 1}: The runtime (number of CPU operations), once we take into account operations within each node, is \(\log_{B}(N)\times B\) using a naive scan of the elements in each node, but using binary search within each node instead reduces this to \(\log_{B}(N)\times \log(B)=\log(N)\).
\bigskip
\textbf{Remark 2}: Sorting can be implemented at cost \(O(\frac{N}{B}\times \log_{\frac{M}{B}}(\frac{N}{B}))\), using an approach analogous to an \(\frac{M}{B}\)-way merge sort.
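The merge-sort analogy can be sketched in a few lines of Python. This is only a memory-resident illustration of the recursion pattern, not an actual external-memory implementation: sort runs of size \(M\) (one ``cacheful'' each), then repeatedly merge up to \(M/B\) runs at a time.

```python
import heapq

def external_sort(A, M, B):
    """Sketch of (M/B)-way merge sort: sort M-sized runs 'in cache',
    then merge up to k = M/B runs per pass until one run remains."""
    k = max(2, M // B)                                  # merge fan-in
    runs = [sorted(A[i:i + M]) for i in range(0, len(A), M)]
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + k]))       # k-way merge of runs
                for i in range(0, len(runs), k)]
    return runs[0] if runs else []
```

Each pass reads and writes every element once, i.e.\ \(O(\frac{N}{B})\) line transfers, and there are \(O(\log_{M/B}\frac{N}{B})\) passes, giving the stated bound.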
\section{Cache Oblivious Model}
In the I/O model, using the above algorithms requires knowledge of \(B,M\). This is problematic for the following reasons:
\begin{enumerate}
\item We would like to design algorithms that are hardware independent.
\item The values of the parameters \(B,M\) could change over time, e.g.\ because multiple processors share the same cache.
\item In realistic applications there are multiple caches with different parameters.
\end{enumerate}
Thus we would like to design efficient algorithms that do not require knowledge of \(B,M\).
\begin{theorem} We can achieve the same performance, i.e.\ \(O(\frac{\log N}{\log B})\), in the cache-oblivious model.
\end{theorem}
\textbf{First Approach -- Heaps}: A heap layout is a (balanced) BST stored in an array, where if a node is stored in the \(i\)-th cell, its children are stored in cells \(2i+1\) and \(2i+2\). This approach does not work, however, because searching for \(x\) in this layout requires:
\begin{itemize}
\item A single CL for the first \(B\) cells, which contain the top \(\log B\) levels of the tree.
\item Another CL for each additional level, of which \(\log(N)-\log(B)\) remain.
\end{itemize}
Thus \(\log(N)-\log(B)+1\) CLs overall, no better than before.
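This count can be verified with a small simulation. The sketch below (a hypothetical helper, not part of the lecture) walks a root-to-leaf path in the implicit heap layout and counts the distinct cache lines touched; the answer is the same up to constants for every path:

```python
def heap_path_lines(N, B, go_left):
    """Follow a root-to-leaf path in a heap layout of N cells
    (children of cell i at 2i+1, 2i+2) and count the distinct
    cache lines of B cells touched. go_left(i) picks the branch."""
    lines, i = set(), 0
    while i < N:
        lines.add(i // B)                      # line holding cell i
        i = 2 * i + 1 if go_left(i) else 2 * i + 2
    return len(lines)
```

With \(N=2^{20}\) and \(B=2^{10}\) the path touches \(11=\log N-\log B+1\) lines: one line covers the top \(\log B\) levels, and each of the remaining \(\log N-\log B\) levels lands on a fresh line.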
\begin{figure}[H]
\centering
\includegraphics[scale = 0.8]{van-emde-boas-layout}
\caption{VEB Layout}
\label{fig4}
\end{figure}
\textbf{New Data Structure}: We store the BST using the Van Emde Boas layout. To store a tree \(T\) with \(N\) nodes, we proceed as follows:
\begin{itemize}
\item If \(N\leq 16\), we store \(T\) in an arbitrary (contiguous) way.
\item Otherwise, we store \(T_{0},T_{1},\dots,T_{\sqrt N}\) consecutively, where \(T_{0}\) consists of the upper half of \(T\) (namely all nodes at depth \(\leq \frac{\log N}{2}\) from the root), and \(T_{1},\dots,T_{\sqrt N}\) are the subtrees rooted just below the leaves of \(T_{0}\). Each of the trees \(T_{0},T_{1},\dots,T_{\sqrt N}\) is stored recursively in the same way.
\end{itemize}
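The recursive layout can be made concrete with a short Python sketch (hypothetical helper names; base case height \(1\) instead of the lecture's \(N\leq 16\), which does not affect the idea). It lists the heap-array indices of a complete binary tree in van Emde Boas order:

```python
def veb_order(root, height):
    """Heap-array indices of a complete binary tree of the given height
    (root at index `root`, children of i at 2i+1 and 2i+2) in van Emde
    Boas order: the top half-height tree first, then each bottom
    subtree, all laid out recursively in the same order."""
    if height <= 1:
        return [root]
    top_h = height // 2
    order = veb_order(root, top_h)             # top tree, recursively
    for leaf in _level(root, top_h - 1):       # leaves of the top tree
        order += veb_order(2 * leaf + 1, height - top_h)
        order += veb_order(2 * leaf + 2, height - top_h)
    return order

def _level(root, d):
    """Heap indices at depth d below `root`, left to right."""
    idx = [root]
    for _ in range(d):
        idx = [c for i in idx for c in (2 * i + 1, 2 * i + 2)]
    return idx
```

For a tree of height 3, for example, the order is root, then the left subtree's three nodes, then the right subtree's three nodes; the key point is that every recursive subtree occupies a contiguous block of the array.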
We will prove in the next lecture that searching a BST stored in this layout costs \(O(\frac{\log N}{\log B})\) CLs.
\end{document}