Chapter 5: Overview of the Decoding Process / 5章復号処理の概要

A VP8 decoder needs to maintain four YUV frame buffers whose resolutions are at least equal to that of the encoded image. These buffers hold the current frame being reconstructed, the immediately previous reconstructed frame, the most recent golden frame, and the most recent altref frame.

　VP8復号器は4枚のYUVフレームのバッファを維持する必要がある．そのフレームの解像度は少なくとも符号化画像のそれと同じである．このバッファは以下のフレームを保持している．現在再構成しているフレーム，直前に再構成されたフレーム，最近のゴールデンフレーム，そして最近の代替参照フレームである．

Most implementations will wish to “pad” these buffers with “invisible” pixels that extend a moderate number of pixels beyond all four edges of the visible image. This simplifies interframe prediction by allowing all (or most) prediction blocks, which are not guaranteed to lie within the visible area of a prior frame, to address usable image data.

　たいていの実装は見えない画素を使ってこれらのバッファを埋めたいだろう．それらの見えない画素は，見えている画像の四辺全ての領域を超えてある程度拡張する．全て（若しくは大部分）の予測ブロックを利用可能な画像データに位置づけることを許容することでインターフレーム予測を単純化する．事前フレームの見えている領域の中に位置づけることを保証はしない．

Regardless of the amount of padding chosen, the invisible rows above (below) the image are filled with copies of the top (bottom) row of the image; the invisible columns to the left (right) of the image are filled with copies of the leftmost (rightmost) visible column; and the four invisible corners are filled with copies of the corresponding visible corner pixels. The use of these prediction buffers (and suggested sizes for the halo) will be elaborated on in the discussion of motion vectors, interframe prediction, and sub-pixel interpolation later in this document.

　埋め込み量の選択によらず，画像の上部（下部）の不可視行群は画像の上部（下部）行のコピーにより充填され，画像の左部（右部）の不可視列群は画像の左部（右部）のコピーにより充填され，四つの不可視コーナーは対応する角の画素により充填される．これらの予測バッファの利用（とハロとして示唆されたサイズ）は後で本稿の動きベクトル，インターフレーム予測，サブピクセル補間の議論の中で詳しく述べる．
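The fill rule above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name `extend_borders`, its parameters, and the layout (one plane stored row by row with `border` extra pixels on every side) are hypothetical and are not taken from the reference decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill the invisible border of one plane by replicating edge pixels.
 * `plane` points at the first byte of the padded allocation, `stride` is
 * the padded row length, and `border` is the halo width in pixels.
 * All names here are hypothetical. */
static void extend_borders(unsigned char *plane, size_t stride,
                           size_t width, size_t height, size_t border)
{
    assert(width > 0 && height > 0 && stride >= width + 2 * border);
    unsigned char *visible = plane + border * stride + border;

    /* Left and right: replicate the first and last visible pixel of each row. */
    for (size_t y = 0; y < height; y++) {
        unsigned char *row = visible + y * stride;
        memset(row - border, row[0], border);
        memset(row + width, row[width - 1], border);
    }

    /* Above and below: copy the already-widened top and bottom rows.
     * Because those rows include the side borders, the four corners end up
     * holding the corresponding visible corner pixels. */
    unsigned char *top = visible - border;
    unsigned char *bottom = top + (height - 1) * stride;
    for (size_t y = 1; y <= border; y++) {
        memcpy(top - y * stride, top, width + 2 * border);
        memcpy(bottom + y * stride, bottom, width + 2 * border);
    }
}
```

Extending the sides first and then copying whole padded rows is one convenient ordering; it handles the corners without a separate step.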

As will be seen in the description of the frame header, the image dimensions are specified (and can change) with every key frame. These buffers (and any other data structures whose size depends on the size of the image) should be allocated (or re-allocated) immediately after the dimensions are decoded.

　フレームヘッダの詳述で見るように，画像の次元数は全てのキーフレームで特定される（変更も可能）．これらのバッファ（と画像のサイズに依存した他のデータ構造のサイズ）は次元が復号されると同時に確保（若しくは再確保）されるべきである．
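A minimal sketch of the (re)allocation step, assuming a hypothetical `plane_t` descriptor and a hypothetical halo width `HALO` of 32 pixels (the text defers the suggested halo size to a later chapter). It would be called once per plane whenever a key frame header carries new dimensions; a real decoder would also size the chroma planes and round dimensions to whole macroblocks, which is left out here.

```c
#include <assert.h>
#include <stdlib.h>

#define HALO 32 /* hypothetical border width, not a value from the text */

typedef struct {
    unsigned char *base;   /* start of the padded allocation (NULL before first use) */
    unsigned char *pixels; /* top-left visible pixel */
    int width, height, stride;
} plane_t;

/* Allocate or re-allocate one plane for the given visible size.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int plane_resize(plane_t *p, int width, int height)
{
    assert(p != NULL && width > 0 && height > 0);
    int stride = width + 2 * HALO;
    size_t rows = (size_t)height + 2 * HALO;
    unsigned char *mem = realloc(p->base, (size_t)stride * rows);
    if (mem == NULL)
        return -1; /* the previous buffer, if any, is left untouched */
    p->base = mem;
    p->pixels = mem + (size_t)HALO * (size_t)stride + HALO;
    p->width = width;
    p->height = height;
    p->stride = stride;
    return 0;
}
```

The plane must be zero-initialized before the first call so that `realloc` behaves like `malloc`; the contents after a resize are unspecified, which is acceptable because a key frame rebuilds every pixel.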

Leaving most of the details for later elaboration, the following is an outline of the decoding process. First, the frame header (beginning of the first data partition) is decoded. Altering or augmenting the maintained state of the decoder, this provides the context in which the per-macroblock data can be interpreted.

　この後の労作のために大部分の詳細は残しておいて，以下では復号処理の流れを述べる．最初にフレームヘッダ（最初のデータ区分から始まる）が復号される．復号器の維持状態を代替または改善するために，マクロブロックごとのデータを解釈出来るコンテキストを提供する．

The macroblock data occurs (and must be processed) in raster-scan order. This data comes in two or more parts. The first (prediction or mode) part comes in the remainder of the first data partition. The other parts comprise the data partition(s) for the DCT/WHT coefficients of the residue signal. For each macroblock, the prediction data must be processed before the residue.

マクロブロックデータはラスタスキャン順に発生し（そして処理される）．このデータは二つ，またはそれ以上の区分へ入る．最初の（予測かモードの）領域は最初のデータ区分の残りに入っている．他の領域は差分信号のDCT/WHT係数のためにデータ領域を構成する．それぞれのマクロブロックについて，予測データは差分の前に処理されなければならない．

Each macroblock is predicted using one (and only one) of four possible frames. All macroblocks in a key frame, and all intra-coded macroblocks in an interframe, are predicted using the already-decoded macroblocks in the current frame. Macroblocks in an interframe may also be predicted using the previous frame, the golden frame or the altref frame. Such macroblocks are said to be inter-coded.

　それぞれのマクロブロックは4つの利用可能なフレームの一つ（一つだけ）を用いて予測される．キーフレーム内の全マクロブロック，およびインターフレーム内の全イントラ符号化マクロブロックは現在フレームの中のですでに復号されたマクロブロックを使って予測される．インターフレーム内のマクロブロックは直前のフレーム，ゴールデンフレーム，代替参照フレームのいずれかを用いて予測されるかもしれない．このようなマクロブロックはインター符号化されていると説明される．
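The choice among the four frames can be pictured as an index into a small table of buffers. The sketch below assumes hypothetical types `ref_frame_t`, `frame_t`, and `decoder_t`; it only encodes the two rules stated above (exactly one source per macroblock, and key frames restricted to the current frame).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four possible prediction sources. */
typedef enum { REF_CURRENT = 0, REF_LAST, REF_GOLDEN, REF_ALTREF, REF_COUNT } ref_frame_t;

typedef struct { unsigned char *y, *u, *v; } frame_t; /* hypothetical frame descriptor */

typedef struct { frame_t frames[REF_COUNT]; } decoder_t;

/* A macroblock is inter-coded when it is predicted from any frame other
 * than the one currently being reconstructed. */
static int is_inter_coded(ref_frame_t ref)
{
    return ref != REF_CURRENT;
}

/* Return the single frame a macroblock is predicted from. */
static const frame_t *prediction_source(const decoder_t *d, ref_frame_t ref,
                                        int is_key_frame)
{
    assert(d != NULL);
    assert((int)ref >= 0 && (int)ref < REF_COUNT);
    /* Every macroblock of a key frame is intra-coded. */
    assert(!is_key_frame || ref == REF_CURRENT);
    return &d->frames[ref];
}
```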

The purpose of prediction is to use already-constructed image data to approximate the portion of the original image being reconstructed. The effect of any of the prediction modes is then to write a macroblock-sized prediction buffer containing this approximation.

　予測の目的は再構成されつつある原画像の一部を近似するためにすでに構成された画像データをに使うことにある．いずれの予測モードにおける効果は，この近似を含むマクロブロックサイズの予測バッファに書き込むことである．

Regardless of the prediction method, the residue DCT signal is decoded, dequantized, reverse-transformed, and added to the prediction buffer to produce the (almost final) reconstruction value of the macroblock, which is stored in the correct position of the current frame buffer.

　予測方法によらず，差分DCT信号は復号され，逆量子化され，逆変換され，予測バッファに加算されて，マクロブロックの再構成値（たいていは最終的な値）を生成する．これらは現在フレームバッファの該当する位置に蓄積される．
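The final addition step can be sketched for a single 4x4 subblock as follows. The function name and buffer layout are assumptions made for illustration; the residue is taken to be already decoded, dequantized, and inverse-transformed, and results are clamped to the 8-bit pixel range.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static unsigned char clamp255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add a 4x4 block of inverse-transformed residue to the prediction and
 * store the result at its position in the current frame buffer.
 * All names are hypothetical. */
static void reconstruct_4x4(const unsigned char *pred, int pred_stride,
                            const short residue[16],
                            unsigned char *dst, int dst_stride)
{
    assert(pred != NULL && residue != NULL && dst != NULL);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[r * dst_stride + c] =
                clamp255(pred[r * pred_stride + c] + residue[r * 4 + c]);
}
```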

The residue signal consists of 24 (sixteen Y, four U, and four V) 4x4 quantized and losslessly-compressed DCT transforms approximating the difference between the original macroblock in the uncompressed source and the prediction buffer. For most prediction modes, the zeroth coefficients of the sixteen Y subblocks are expressed via a 25th WHT of the second-order virtual Y2 subblock discussed above.

　差分信号は24個（16個の輝度，4個の色差U，4個の色差V）の4x4の量子化され歪みのある圧縮をされたDCT変換から構成される．DCT変換は非圧縮ソースと予測バッファに含まれるマクロブロックの差分を近似している．たいていの予測モードのために，16個のYサブブロックの0番目の係数群は，上記で述べた25番目のWHTである第2仮想Y2サブブロックとしして表される．
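One way to picture the role of the Y2 block: once its sixteen values have been inverse-transformed, each one becomes the zeroth (DC) coefficient of one Y subblock. The sketch below assumes a hypothetical coefficient layout (blocks 0 to 15 are Y in raster order, 16 to 19 are U, 20 to 23 are V, 24 is Y2) and does not show the inverse WHT itself.

```c
#include <assert.h>
#include <stddef.h>

/* Place the sixteen outputs of the inverse WHT into the DC position of the
 * sixteen Y subblocks. `has_y2` is 0 for the modes that code each DC value
 * directly, in which case nothing is changed.
 * Layout and names are assumptions for illustration. */
static void scatter_y2_dc(short coeffs[25][16], const short y2_out[16], int has_y2)
{
    assert(coeffs != NULL && y2_out != NULL);
    if (!has_y2)
        return;
    for (int i = 0; i < 16; i++)
        coeffs[i][0] = y2_out[i];
}
```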

Intra-prediction exploits the spatial coherence of frames. The 16x16 luma (Y) and 8x8 chroma (UV) components are predicted independently of each other using one of four simple means of pixel propagation, starting from the already-reconstructed (16-pixel long luma, 8-pixel long chroma) row above and column to the left of the current macroblock. The four methods are:

　イントラ予測はフレームの空間的な一貫性を利用する．16x16の輝度（Y)と8x8の色差（UV）成分は4種類の単純な意味での画素伝搬手法を用いてそれぞれ独立に予測される．現在のマクロブロックからみてすでに再構成された行方向の上と列方向の左から始まる．4つの手法は以下の通り．

1. Copying the row from above throughout the prediction buffer.
2. Copying the column from left throughout the prediction buffer.
3. Copying the average value of the row and column throughout the prediction buffer.
4. Extrapolation from the row and column using the (fixed) second difference (horizontal and vertical) from the upper left corner.

1. 予測バッファを通じて上部から行をコピーする
2. 予測バッファを通じて左部から列をコピーする
3. 予測バッファを通じて行と列の値の平均値をコピーする
4. （固定の）2次微分（水平と垂直方向）を用いて左上のコーナーからはじめて行と列に対して外挿する
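The four methods listed above can be sketched in one routine. The mode names, the function signature, and the rounding of the average are assumptions for illustration; the sketch also assumes that the row above, the column to the left, and the pixel above-left are all available, so the special handling needed at frame edges is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four whole-block intra modes. */
enum { PRED_VERTICAL, PRED_HORIZONTAL, PRED_DC, PRED_TM };

static unsigned char clamp_pixel(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Predict an n x n block (n = 16 for luma, 8 for chroma) from the row above,
 * the column to the left, and the pixel above-left of the block.
 * `pred` receives n * n pixels in row-major order. */
static void intra_predict(int mode, int n,
                          const unsigned char *above, const unsigned char *left,
                          unsigned char top_left, unsigned char *pred)
{
    assert(n == 8 || n == 16);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += above[i] + left[i];
    int dc = (sum + n) / (2 * n); /* rounded average of the 2n edge pixels */

    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) {
            int v;
            switch (mode) {
            case PRED_VERTICAL:   v = above[c]; break;  /* copy the row down */
            case PRED_HORIZONTAL: v = left[r];  break;  /* copy the column across */
            case PRED_DC:         v = dc;       break;  /* flat average */
            default:              v = left[r] + above[c] - top_left; break; /* extrapolate */
            }
            pred[r * n + c] = clamp_pixel(v);
        }
    }
}
```

In the fourth mode every horizontal step reproduces the differences seen along the row above and every vertical step those seen along the left column, which is why the result has to be clamped to the pixel range.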

Additionally, the sixteen Y subblocks may be predicted independently of each other using one of ten different modes, four of which are 4x4 analogs of those described above, augmented with six “diagonal” prediction methods. There are two types of predictions, one intra and one inter (among all the modes), for which the residue signal does not use the Y2 block to encode the DC portion of the sixteen 4x4 Y subblock DCTs. This “independent Y subblock” mode has no effect on the 8x8 chroma prediction.

　加えて，16個のYサブブロックは10個の異なるモードの一つを用いてそれぞれ独立に予測されるかもしれない．そのうち4つは上記の4x4ブロックと類似しており，さらに6方向の予測方法が追加されている．二種類の予測方法と一種類のイントラと（全てのモードで共通の）一種類の予測方法があり，16個の4x4YサブブロックDCTのDC部分を符号化するために，差分信号はY2ブロックを使わない．独立したYサブブロックモードは8x8色差予測にはなんら影響を与えない．

Inter-prediction exploits the temporal coherence between nearby frames. Except for the choice of the prediction frame itself, there is no difference between inter-prediction based on the previous frame and that based on the golden frame or altref frame.

　インター予測は近傍フレーム間における時間的な一貫性を利用する．予測フレームの種類の選択を除いて，前フレームに基づくインター予測とゴールデンフレームや代替参照フレームに基づくインター予測にはなんら違いがない．

Inter-prediction is conceptually very simple. While, for reasons of efficiency, there are several methods of encoding the relationship between the current macroblock and corresponding sections of the prediction frame, ultimately each of the sixteen Y subblocks is related to a 4x4 subblock of the prediction frame, whose position in that frame differs from the current subblock position by a (usually small) displacement. These two-dimensional displacements are called motion vectors.

　インター予測は概念としては非常に単純である．効果の理由のために，現在マクロブロックと予測フレームの対応する領域との関係を符号化する方法がいくつかあるのだが，究極的には16個それぞれのYサブブロックは予測フレームの4x4サブブロックと関係づけられ，フレーム内のその位置は，現在サブブロックからの（たいていは小さな）ずれ量によって異なる．

The motion vectors used by VP8 have quarter-pixel precision. Prediction of a subblock using a motion vector that happens to have integer (whole number) components is very easy: the 4x4 block of pixels from the displaced block in the previous, golden, or altref frame are simply copied into the correct position of the current macroblock’s prediction buffer.

　VP8で利用されている動きベクトルは1/4画素精度である．整数成分を持っている場合の動きベクトルを用いたサブブロックの予測は非常に簡単である．直前，ゴールデン，代替参照フレームに含まれる置換ブロックからの4x4ブロックは単に，現在マクロブロックの予測バッファの正しい場所へコピーされるだけである．
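For the whole-pixel case, prediction reduces to a displaced copy. The sketch below assumes a hypothetical `mv_t` type holding the vector in quarter-pixel units and a reference plane whose padding is large enough for the displaced block to be addressable.

```c
#include <assert.h>
#include <string.h>

typedef struct { int row, col; } mv_t; /* quarter-pixel units; hypothetical type */

/* A vector is whole-pixel when both components are multiples of four. */
static int mv_is_integer(mv_t mv)
{
    return (mv.row % 4) == 0 && (mv.col % 4) == 0;
}

/* Copy the displaced 4x4 block from a reference plane into the prediction
 * buffer. `ref` points at visible pixel (0, 0) of the reference plane and
 * (x, y) is the position of the current subblock within the frame. */
static void predict_4x4_integer(const unsigned char *ref, int ref_stride,
                                int x, int y, mv_t mv,
                                unsigned char *pred, int pred_stride)
{
    assert(mv_is_integer(mv));
    const unsigned char *src =
        ref + (y + mv.row / 4) * ref_stride + (x + mv.col / 4);
    for (int r = 0; r < 4; r++)
        memcpy(pred + r * pred_stride, src + r * ref_stride, 4);
}
```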

Fractional displacements are conceptually and implementationally more complex. They require the inference (or synthesis) of sample values that, strictly speaking, do not exist. This is one of the most basic problems in signal processing and readers conversant with that subject will see that the approach taken by VP8 provides a good balance of robustness, accuracy, and efficiency.

　分数のずれは概念的にも実装的にもよりより複雑になる．厳密に言えば存在しないサンプル値の推定（若しくは合成）を必要する．これは信号処理における最も基本的な問題の一つであり，この問題に精通している読者はVP8が採用した方針は頑健性と正確性と効率性の良いバランスを提供していることが分かるだろう．

Leaving the details for the implementation discussion below, the pixel interpolation is calculated by applying a kernel filter (using reasonable-precision integer math) three pixels on either side, both horizontally and vertically, of the pixel to be synthesized. The resulting 4x4 block of synthetic pixels is then copied into position exactly as in the case of integer displacements.

　以下の実装に関する議論の詳述を離れる前に，画素補間はカーネルフィルタ（現実的な精度の整数計算を利用）を合成される画素の水平と垂直方向の3画素に適用して計算される．次に，得られた合成画素の4x4ブロックは，整数精度の置換と同様に，正しい位置へコピーされる．
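As a rough illustration of such a kernel, the sketch below synthesizes one half-pixel sample in one dimension from three pixels on either side. The tap values are the half-pixel kernel as recalled from the reference decoder's filter table and should be checked against the implementation discussion; the function names are hypothetical. A full two-dimensional interpolation would apply a pass like this horizontally and then vertically.

```c
#include <assert.h>
#include <stddef.h>

/* Half-pixel taps; they sum to 128. Assumed values, to be verified against
 * the filter table given later in the specification. */
static const int HALF_PEL_TAPS[6] = { 3, -16, 77, 77, -16, 3 };

/* Round a filter sum (scaled by 128) and clamp it to the pixel range. */
static unsigned char filter_round(int sum)
{
    int v = sum + 64;   /* round to nearest */
    if (v < 0)
        return 0;       /* avoids right-shifting a negative value */
    v >>= 7;            /* divide by 128 */
    return (unsigned char)(v > 255 ? 255 : v);
}

/* Synthesize the sample halfway between src[0] and src[step], reading the
 * six pixels src[-2 * step] through src[3 * step]. Use step = 1 along a row
 * and step = stride along a column. */
static unsigned char half_pel_sample(const unsigned char *src, int step)
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += HALF_PEL_TAPS[k] * src[(k - 2) * step];
    return filter_round(sum);
}
```

Integer taps with a power-of-two scale keep the computation in ordinary integer arithmetic, which matches the "reasonable-precision integer math" mentioned above.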

Each of the eight chroma subblocks is handled similarly. Their motion vectors are never specified explicitly; instead, the motion vector for each chroma subblock is calculated by averaging the vectors of the four Y subblocks that occupy the same area of the frame. Since chroma pixels have twice the diameter (and four times the area) of luma pixels, the calculated chroma motion vectors have 1/8 pixel resolution, but the procedure for copying or generating pixels for each subblock is essentially identical to that done in the luma plane.

　8個の色差サブブロックはそれぞれ同様に扱われる．それらの動きベクトルは陽には定義されない．その代わりに，それぞれの色差サブブロックの動きベクトルは，フレームの同じ領域を占めているYサブブロックの平均化によって算出される．色差画素は輝度画素の2倍の直径（4倍の面積）を持っており，算出された色差動きベクトルは1/8画素精度であるが，それぞれのサブブロックのためにコピーや生成される画素を生み出すことは，輝度成分で行われる事と本質的に同一である．
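The averaging can be sketched as follows. The `mv_t` type, the raster ordering of the sixteen luma vectors, and the rounding rule (nearest, with halves rounded away from zero) are assumptions made for illustration. Because the four luma vectors are in quarter-pixel luma units, their average is numerically a vector in eighth-pixel chroma units.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int row, col; } mv_t; /* hypothetical motion vector type */

/* Average of four values, rounded to nearest with halves away from zero.
 * The rounding rule is an assumption of this sketch. */
static int average4(int a, int b, int c, int d)
{
    int sum = a + b + c + d;
    return (sum >= 0 ? sum + 2 : sum - 2) / 4;
}

/* Derive the vector of one chroma subblock from the luma vectors.
 * `luma_mvs` holds the sixteen Y subblock vectors in raster order; chroma
 * subblock (cr, cc), each index 0 or 1, covers the 2x2 group of Y subblocks
 * that occupies the same area of the frame. */
static mv_t chroma_mv(const mv_t luma_mvs[16], int cr, int cc)
{
    assert(cr >= 0 && cr < 2 && cc >= 0 && cc < 2);
    const mv_t *a = &luma_mvs[(2 * cr) * 4 + 2 * cc];
    mv_t out;
    out.row = average4(a[0].row, a[1].row, a[4].row, a[5].row);
    out.col = average4(a[0].col, a[1].col, a[4].col, a[5].col);
    return out;
}
```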

After all the macroblocks have been generated (predicted and corrected with the DCT/WHT residue), a filtering step (the loop filter) is applied to the entire frame. The purpose of the loop filter is to reduce blocking artifacts at the boundaries between macroblocks and between subblocks of the macroblocks. The term loop filter is used because this filter is part of the “coding loop,” that is, it affects the reconstructed frame buffers that are used to predict ensuing frames. This is distinguished from the postprocessing filters discussed earlier which affect only the viewed video and do not “feed into” subsequent frames.

　全てのマクロブロックが作成（予測と差分のDCT/WHTによる修正）された後，フィルタリング段階（ループフィルタ）がフレーム全体に適用される．ループフィルタの目的はマクロブロック間やマクロブロックのサブブロック間の境界に発生するブロックノイズを提言することである．最初に議論した事後処理フィルタとは全く異なる．事後処理フィルタは見えている動画像にのみ影響を与え，一連のフレームに反映されることはない．

Next, if signaled in the data, the current frame (or individual macroblocks within the current frame) may replace the golden frame prediction buffer and/or the altref frame buffer.

　次に，もしデータ内に信号があれば，現在フレーム（もしくは現在フレーム内の個別マクロブロック）はゴールデンフレームの予測バッファか代替参照フレームのバッファを置換するかもしれない．

The halos of the frame buffers are next filled as specified above. Finally, at least as far as decoding is concerned, (references to) the “current” and “last” frame buffers should be exchanged in preparation for the next frame.

　そして，フレームバッファの穴は上で述べたように埋められる．最後に，少なくとも復号に着目する限りにおいて，（言及している）現在フレームと最終フレームは，次フレームのために準備として交換されるべきである．
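The end-of-frame bookkeeping might look like the sketch below. It assumes hypothetical `frame_t` and `refs_t` types with one contiguous buffer per frame, handles only whole-frame golden and altref updates, and omits the halo fill. Copying pixels into the golden and altref buffers while merely exchanging the current and last pointers is one possible design; it keeps the four buffers distinct so that the next frame never overwrites a buffer still in use as a reference.

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned char *data; size_t size; } frame_t; /* hypothetical */

typedef struct { frame_t *current, *last, *golden, *altref; } refs_t;

/* Finish a decoded frame: optionally refresh the golden and altref buffers
 * from the current frame, then exchange the current and last references. */
static void finish_frame(refs_t *r, int update_golden, int update_altref)
{
    assert(r != NULL && r->current != r->last);
    if (update_golden)
        memcpy(r->golden->data, r->current->data, r->current->size);
    if (update_altref)
        memcpy(r->altref->data, r->current->data, r->current->size);

    /* Exchange references, not pixels: the frame just decoded becomes
     * "last", and the old "last" buffer is reused for the next frame. */
    frame_t *tmp = r->current;
    r->current = r->last;
    r->last = tmp;
}
```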

Various processes may be required (or desired) before viewing the generated frame. As discussed in the frame dimension information below, truncation and/or upscaling of the frame may be required. Some playback systems may require a different frame format (RGB, YUY2, etc.). Finally, as mentioned in the introduction, further postprocessing or filtering of the image prior to viewing may be desired. Since the primary purpose of this document is a decoding specification, the postprocessing is not specified in this document.

　様々な処理が生成されたフレームを表示する前に必要とされる（または，求められる）．以下のフレームの次元数情報で議論されるように，フレームの切り捨てや拡大が必要かもしれない．いくつかの再生システムはことなるフレーム形式（RGB，YUY2，など）が必要である．最後に，はじめにで注意したように，表示に先だってさらなる画像の事後処理やフィルタリングが求められるだろう．本稿の主目的は復号方法の明細化にあるため，本稿では事後処理は規定しない．

While the basic ideas of prediction and correction used by VP8 are straightforward, many of the details are quite complex. The management of probabilities is particularly elaborate. Not only do the various modes of intra-prediction and motion vector specification have associated probabilities but they, together with the coding of DCT coefficients and motion vectors, often base these probabilities on a variety of contextual information (calculated from what has been decoded so far), as well as on explicit modification via the frame header.

　VP8で利用されている予測と補正の基本的なアイディアは素直だが，それらの細部の多くは極めて複雑である．確率モデルの構成はとりわけ精巧である．イントラ予測と動きベクトルの規定の様々なモードが結びつけられた確率モデルを有しているが，それだけでなくDCT計数と動きベクトルの符号化とともに，それらは様々なコンテキスト情報（これまでに何が復号されたかを用いて計算する）にもとづく確率モデルを基礎においてあり，同様にフレームヘッダを通じて陽な変更も基礎においてある．

The “top-level” of decoding and frame reconstruction is implemented in the reference decoder files onyxd_if.c and decodframe.c.

　復号処理とフレーム再構成のトップレベルは参照復号器のファイルonyxd_if.cとdecodframe.cに実装されている．

This concludes our summary of decoding and reconstruction; we continue by discussing the individual aspects in more depth.

　これが復号処理と再構成のまとめであると結論づける．個々の様子のより深い議論を続ける．

A reasonable “divide and conquer” approach to implementation of a decoder is to begin by decoding streams composed exclusively of key frames. After that works reliably, interframe handling can be added more easily than if complete functionality were attempted immediately. In accordance with this, we first discuss components needed to decode key frames (most of which are also used in the decoding of interframes) and conclude with topics exclusive to interframes.

　適切な分割統治による復号器の実装方針は，キーフレームから成り立つ排他的なストリームの復号処理から始めることである．正確にこれらが動作した後，インターフレームの処理は，もし同時に全ての機能に注力するとした場合よりも，簡単に追加可能である．これに従って，キーフレームを復号するのに必要な構成要素の議論を最初に行う（大部分の要素はインターフレームの復号にも利用される）．そして，インターフレームをのぞく話題について結論づける．