126 lines
7.8 KiB
TeX
126 lines
7.8 KiB
TeX
\documentclass{article}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[margin=1in]{geometry}
|
|
\usepackage{amsfonts, amsmath, amssymb}
|
|
\usepackage[none]{hyphenat}
|
|
\usepackage{fancyhdr}
|
|
\usepackage{titling}
|
|
\usepackage{changepage}
|
|
\usepackage{graphicx}
|
|
\usepackage{wrapfig}
|
|
|
|
\renewcommand\maketitlehooka{\null\mbox{}\vfill}
|
|
\renewcommand\maketitlehookd{\vfill\null}
|
|
|
|
\pagestyle{fancy}
|
|
\fancyhead[L]{\slshape Algorithm Details}
|
|
\fancyhead[R]{Kenneth Jao}
|
|
\fancyfoot[C]{\thepage}
|
|
|
|
\title{Lossy Compression through Multi-dimensional Fourier Transforms }
|
|
\author{Kenneth Jao}
|
|
\date{February 2019}
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
\pagebreak
|
|
|
|
\section{Introduction}
|
|
This document covers the details of the mathematics of the compression algorithm. Each main mathematical application will be explained in detail. Firstly an overview of the algorithm as a whole will be given.
|
|
|
|
\subsection{Notation}
|
|
\begin{adjustwidth}{.5in}{}
|
|
Data Matrix: \[ \left[ {\begin{array}{cccc} A & B & C & \dots \end{array}} \right] or \left[\ A,\ B,\ C,\ \dots \ \right]\]
|
|
Will mean that the dimensions 0, 1, 2, \dots will correspond to A, B, C, \dots and represents an array of data, not a mathematical matrix.\\
|
|
\newline
|
|
N-Dimensional Polynomial:\\
|
|
\begin{adjustwidth}{.5in}{}
|
|
$\vec{e_i}=(e_1,\ e_2,\ e_3, \dots e_n) \in \mathbb{N}^n$ will represent the i-th permutation of $\vec{e}$.\\
|
|
$\vec{x}^{\vec{e}} = x_1^{e_1}\cdot x_2^{e_2}\cdot \dots x_n^{e_n}$ where $\vec{x}=(x_1,\ x_2,\ x_3, \dots x_n) \in \mathbb{R}^n$\\
|
|
An arbitrary n-dimensional polynomial of degree m $P(\ x_1,\ x_2,\ x_3,\dots,\ x_n)$ will equal: \[ a_1x^{\vec{e_1}} + a_2x^{\vec{e_2}} + a_3x^{\vec{e_3}} + \dots + a_{max}x^{\vec{e_{max}}}\]
|
|
Where $max=(m+1)^n$
|
|
\end{adjustwidth}
|
|
|
|
\end{adjustwidth}
|
|
|
|
\subsection{Algorithm Overview}
|
|
|
|
\begin{adjustwidth}{.5in}{}
|
|
A set of video data in given in the format: \[ \left[ {\begin{array}{cccc} Frame & Y Pixel & X Pixel & RGB \end{array}} \right] \]
|
|
After receiving this video data, it will be processed into square blocks of dimension $N$, denoted by $Block_{w_1,w_2,w_3,...}$ where ${w_1, w_2, w_3,...}$ represents the sorting of $N$ dimensional blocks in a format that correspond to the location of the data in the ordered video. For instance, given a data in format $[\ x,\ y,\ z\ ]$, square blocks of dimension $N$ can be extracted such that the new data structure is now in the form $[\frac{x}{N},\frac{y}{N},\frac{z}{N}, N, N, N]$, and thus, in this case, $w_1 = \frac{x}{N},\ w_2 = \frac{y}{N},\ w_3 = \frac{z}{N}$. Afterwards, there are two possible steps.
|
|
|
|
\begin{enumerate}
|
|
\item The first option is to purely fit a polynomial to the data of each block. That is, find a polynomial of degree $N-1$ and of dimension $N$ where each $Block_{w_1,w_2,w_3,...}[\ x_1,\ x_2,\ x_3,\ \dots \ ]$ will correspond to some $P(\ x_1,\ x_2,\ x_3,\ \dots,\ x_n)$. The output will be the coefficients of the new polynomial.
|
|
\item The second option is to attempt to create a polynomial of degree $N-1$ and of dimension $N$ where its coefficients match first $N$ terms of the Taylor Polynomial of $cos(x_1+x_2+x_3+\dots +x_n)$. The coefficients of the new polynomial will attempt to as close to the Taylor Polynomial. The output will be the coefficients of the new polynomial along with a separate function, its usage explained later in the methodology.
|
|
\end{enumerate}
|
|
|
|
\noindent An $N$ dimensional Discrete Fourier Transform (DFT) will now be performed on these blocks. The output is the first $k$ coefficients, where $k$ is determined when the sum of a sufficient $k$ coefficients is greater than 90\% of the convergence of the $N$ dimensional Fourier Series. The 'first' components will be determined by a counting method discussed later.\\
|
|
\newline
|
|
Now, these coefficients corresponding to each $Block_{w_1,w_2,w_3,...}$ will be flattened to 2 dimensions. A Discrete Cosine Transform (DCT) will now be applied onto these coefficients, with the rounding threshold higher, for the sake of precision on coefficients.\\
|
|
\newline
|
|
The algorithm is now complete. The final products will include: a 2 dimensional DCT of the coefficients of an $N$ dimensional DFT, function for each block. The raveling process will be standard for a given dimensional size, so this needs not be stored.
|
|
\end{adjustwidth}
|
|
|
|
\pagebreak
|
|
|
|
\section{Changing Dimensions}
|
|
|
|
Firstly, the video is turned into a basic 3 dimensions. This is done by stacking the Frame and RGB dimensions and swapping the Y Pixel and X Pixel to make it more intuitively oriented. The new form becomes: \[ \left[ {\begin{array}{ccc} X Pixel & Y Pixel & Frame+RGB \end{array}} \right] \]
|
|
That is to say, $[\ x,\ y,\ 3f\ ]$, $[\ x,\ y,\ 3f+1\ ]$, $[\ x,\ y,\ 3f+2\ ]$ would lie in frame $f$, and represent the R, G, and B values, respectively. From here, we can start slicing the data in certain ways to obtain our desired block dimension.
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=2in]{images/RGB.png}
|
|
\caption{Visualization of flattened video data.}
|
|
\end{figure}
|
|
|
|
\subsection{3-Dimensional Block}
|
|
|
|
\begin{adjustwidth}{.5in}{}
|
|
To obtain 3-Dimensional blocks, in this case, we simply can just extract cubes from the already flattened video data. (See Figure 2.) This arises in a data format of $[\frac{x}{N},\frac{y}{N},\frac{z}{N}, N, N, N]$. In this algorithm, $N=6$ was chosen.
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=3in]{images/3DBlocks.png}
|
|
\caption{3D Blocked Video Data}
|
|
\end{figure}
|
|
|
|
\end{adjustwidth}
|
|
|
|
\subsection{4-Dimensional Block}
|
|
To obtain 4-Dimensional blocks, in this case, we extract a $NxNxN$ cube and then take $N$ cubes downward. (See Figure 3.) This arises in a data format of $[\frac{x}{N},\frac{y}{N^2},\frac{z}{N}, N, N, N, N]$. In this algorithm, $N=6$ was chosen.
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=3in]{images/4DBlocks.png}
|
|
\caption{4D Blocked Video Data}
|
|
\end{figure}
|
|
|
|
\section{Polynomial Fitting}
|
|
There are two potential outputs of polynomial fitting. FINISH
|
|
\subsection{Pure Polynomial Fitting}
|
|
The given data $D$ is in format: \[ \left[ \begin{array}{ccccc} x_1 & x_2 & x_3 & \dots & x_n \end{array} \right] \]
|
|
Where its size is $(m+1,\ m+1,\, m+1,\ \dots,\ m+1)$. To find an n-dimensional polynomial of degree m, all that is necessary is find a solution to this system of linear equations: \
|
|
\[ \left[ \begin{array}{ccccc}
|
|
\vec{e_1}^{\vec{e_1}} & \vec{e_1}^{\vec{e_2}} & \vec{e_1}^{\vec{e_3}} & \dots & \vec{e_1}^{\vec{e_{max}}}\\
|
|
\vec{e_2}^{\vec{e_1}} & \vec{e_2}^{\vec{e_2}} & \vec{e_2}^{\vec{e_3}} & \dots & \vec{e_2}^{\vec{e_{max}}}\\
|
|
\vec{e_3}^{\vec{e_1}} & \vec{e_3}^{\vec{e_2}} & \vec{e_3}^{\vec{e_3}} & \dots & \vec{e_3}^{\vec{e_{max}}}\\
|
|
\vdots & \vdots & \vdots & \vdots & \vdots \\
|
|
\vec{e_{max}}^{\vec{e_1}} & \vec{e_{max}}^{\vec{e_2}} & \vec{e_{max}}^{\vec{e_3}} & \dots & \vec{e_{max}}^{\vec{e_{max}}}
|
|
\end{array} \right]
|
|
\left[ \begin{array}{c} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{max} \end{array} \right] =
|
|
\left[ \begin{array}{c} D[\vec{e_1}] \\ D[\vec{e_2}] \\ D[\vec{e_3}] \\ \vdots \\ D[\vec{e_{max}}] \end{array} \right]\]
|
|
Thus, to find the coefficients, all that is necessary is to compute:
|
|
\[ {\left[ \begin{array}{ccccc}
|
|
\vec{e_1}^{\vec{e_1}} & \vec{e_1}^{\vec{e_2}} & \vec{e_1}^{\vec{e_3}} & \dots & \vec{e_1}^{\vec{e_{max}}}\\
|
|
\vec{e_2}^{\vec{e_1}} & \vec{e_2}^{\vec{e_2}} & \vec{e_2}^{\vec{e_3}} & \dots & \vec{e_2}^{\vec{e_{max}}}\\
|
|
\vec{e_3}^{\vec{e_1}} & \vec{e_3}^{\vec{e_2}} & \vec{e_3}^{\vec{e_3}} & \dots & \vec{e_3}^{\vec{e_{max}}}\\
|
|
\vdots & \vdots & \vdots & \vdots & \vdots \\
|
|
\vec{e_{max}}^{\vec{e_1}} & \vec{e_{max}}^{\vec{e_2}} & \vec{e_{max}}^{\vec{e_3}} & \dots & \vec{e_{max}}^{\vec{e_{max}}}
|
|
\end{array} \right]}^{-1}
|
|
\left[ \begin{array}{c} D[\vec{e_1}] \\ D[\vec{e_2}] \\ D[\vec{e_3}] \\ \vdots \\ D[\vec{e_{max}}] \end{array} \right]\]
|
|
|
|
\noindent Thus, $a_1,\ a_2,\ a_3, \dots,\ a_3$ has now been found.
|
|
|
|
\subsection{Cosine Taylor Polynomial Fitting}
|
|
|
|
\end{document}
|