https://www./hierarchical-b-frames-or-b-pyramid

Hierarchical B-Frames or B-Pyramid

General

What is Hierarchical B-Frame mode, or B-pyramid (notice that, in my opinion, B-pyramid is a bad term)? If there is a run of B-frames and some B-frames in the run are used as backward references by other B-frames, the mode is called Hierarchical B-Frame coding, or B-pyramid.

The following figure, taken from the paper “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF” by Heiko Schwarz, Detlev Marpe and Thomas Wiegand, illustrates the concept of the B-pyramid:

Let’s display the first GOP from the above figure slightly differently:

So some geometric form is revealed, but not a pyramid. Therefore, in my opinion, B-pyramid is not a good choice of term.

To exploit the B-pyramid feature fully, it is necessary to set the GOP size (in frames) to a dyadic number (2^n), e.g. 16 or 32 frames. According to the results of the above-mentioned article “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF”, hierarchical B-frames commonly improve coding efficiency (e.g. on Football CIF 30 Hz the improvement is about 0.5 dB in Y-PSNR).

Pros and Cons of Hierarchical B-Frames

Pros: better exploitation of temporal redundancy.
Cons: long coding latency (not suitable for low-latency applications).

How to Detect Hierarchical B-Frames or B-Pyramid?

For each frame we check whether all of the following four conditions hold:
If all of the above conditions are met, a B-pyramid is detected.

If the elementary stream is encapsulated in an MPEG-TS container, we can use PTS instead of POC. It is worth mentioning that PTS values are easy to pick up, whereas deriving POC can be complicated. To parse the POC value, one has to dive into the SPS and pick log2_max_pic_order_cnt_lsb_minus4, or a dozen other parameters in the case of pic_order_cnt_type=1.

B-Pyramid versus Non-Reference B-Frames

What is the gain of the B-pyramid GOP structure IPbBbPbBb... over IPbbbPbbb... (three consecutive non-reference B-frames)? Here 'B' denotes a B-frame used for reference and 'b' denotes a B-frame not used for reference.

I use x264 in constant QP mode (QP=25) with a closed GOP of 30 frames:
On the test YUV sequence "container" (384x320, 300 frames), the bit-size saving is ~0.7%.
On the test YUV sequence "akiyo" (384x320, 300 frames), the bit-size saving is ~1.7%.

IPbbbPbbb...:
x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid none --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibbb.h264 container_384x320.yuv

IPbBbPbBb...:
x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid strict --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibBb.h264 container_384x320.yuv

How to Detect B-Pyramid if the Elementary Stream Is Encapsulated in an MPEG-TS or MPEG-4 Container?

MPEG-TS Container

When the elementary stream is encapsulated in an MPEG-TS container, we look for video frame boundaries to pick up the PTS. The PTS is taken from the PES header, and the start of a frame is mandatorily indicated by an AUD (nal_type=9) in the transport packet payload. Notice that if the PES header carries only a PTS (no DTS), then DTS=PTS. That means there is no frame reordering, so no B-pyramid can exist in such a case. Picture data (or slice data, in the case of multiple slices per picture) is contained in NAL units with nal_type = 1 or 5 (IDR).
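The two MPEG-TS steps described above can be sketched in Python as follows. The sketch assumes the PES packet has already been reassembled from its transport packets, with the TS headers and adaptation fields stripped. pes_timestamps reads the 33-bit PTS/DTS from the PES header, and nal_units splits the payload on start codes so that AUDs (nal_type 9) and slice NAL units (nal_type 1 or 5) can be recognized. The function names are illustrative, not taken from any library.

```python
from typing import Iterator, Optional, Tuple


def decode_timestamp(b: bytes) -> int:
    """Decode a 33-bit PTS or DTS from its 5-byte PES encoding, dropping marker bits."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22) | ((b[2] >> 1) << 15)
            | (b[3] << 7) | (b[4] >> 1))


def pes_timestamps(pes: bytes) -> Tuple[Optional[int], Optional[int]]:
    """Return (pts, dts) from a PES header that carries the optional header fields
    (as video stream_ids 0xE0-0xEF do). If only a PTS is present, DTS = PTS."""
    if pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix")
    pts_dts_flags = pes[7] >> 6
    pts = dts = None
    if pts_dts_flags & 0b10:      # '10' or '11': the PTS starts at byte 9
        pts = dts = decode_timestamp(pes[9:14])
    if pts_dts_flags == 0b11:     # '11': a DTS follows the PTS
        dts = decode_timestamp(pes[14:19])
    return pts, dts


def nal_units(es: bytes) -> Iterator[bytes]:
    """Split an Annex B byte stream on 00 00 01 start codes. A 4-byte start code
    leaves a trailing zero byte on the previous NAL unit, which is stripped."""
    starts = []
    i = es.find(b"\x00\x00\x01")
    while i != -1:
        starts.append(i + 3)
        i = es.find(b"\x00\x00\x01", i + 3)
    for k, s in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(es)
        nal = es[s:end].rstrip(b"\x00")
        if nal:
            yield nal
```

For example, `[nal[0] & 0x1F for nal in nal_units(payload)]` lists the nal_type of every NAL unit in a PES payload. If every frame has DTS equal to PTS, the stream contains no reordering.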
Slice data may be absent from the current transport packet and appear only in the next video packet or the one after it (e.g. if the SPS is long). Once a NAL unit with nal_type 1 or 5 is found, we need to extract nal_ref_idc from the NAL header and the first two parameters of the slice header: first_mb_in_slice and slice_type. In the byte stream, each slice consists of a start code (000001 or 00000001), the NAL header (1 byte), the slice header and the slice data.

nalType = nal_header & 0x1f
nal_ref_idc = ( nal_header & 0x60 ) >> 5

To determine first_mb_in_slice and slice_type, we read the first byte of the slice header, slh[0], and execute the following operations:
Hence, suppose the current slice is the first slice of a picture. Then first_mb_in_slice=0, whose Exp-Golomb code is the single bit '1', so the MSbit is '1'. If, in addition, the picture type is B (slice_type 1 or 6), the first byte slh[0] of the slice header starts with one of the following two bit patterns: 1010 or 100111.

Based on these patterns, we derive the following rules to determine whether the picture type is B:
if (slh[0] >> 4) == 0xA, the current slice is the first slice and the picture type is B;
if (slh[0] & 0xFC) == 0x9C, the current slice is the first slice and the picture type is B.

For each frame we check whether all of the following four conditions hold:
If all of the above conditions are met, a B-pyramid is detected.

MPEG-4 Container (non-fragmented)

The 'stco' and 'stsz' tables in the metadata, together with 'stsc', which maps samples to chunks, let us access all access units successively in decoding order. For each access unit, we skip over non-VCL NAL units (e.g. SEI) until the first slice data NAL unit (nal_type=1 or 5) is found. Inside MP4 samples, NAL units are preceded by a length field instead of a start code. Then we read the NAL header (to determine nal_ref_idc) and the following byte (the first byte of the slice header) to determine the slice type (B or not B). Slice type and nal_ref_idc are determined exactly as described in the previous section. Alternatively, the reference flag can be derived from the 'sdtp' box, provided that this box is present in the metadata (notice that signaling 'sdtp' is not mandatory). With the 'ctts' table in the metadata, we derive the PTS of each access unit as DTS plus the composition offset, where DTS is accumulated from the 'stts' sample deltas. If 'ctts' is not present, then PTS = DTS and no B-pyramid can exist in such a stream.

For each frame we check whether all of the following four conditions hold:
If all of the above conditions are met, a B-pyramid is detected.
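The remaining per-frame steps can be sketched in Python as follows. All names are illustrative, not taken from any library:

- nal_header_fields and is_first_b_slice apply the bit masks given in the MPEG-TS section.
- mp4_sample_nal_units walks the length-prefixed NAL units of an MP4 sample. The length size comes from 'avcC' and is assumed to be 4 by default.
- sample_times derives (DTS, PTS) pairs from the run-length 'stts' and 'ctts' entries.
- looks_like_b_pyramid is a simplified heuristic assumed for illustration, not the four-condition check described in this article. It flags a reference B-frame that is followed, within the same run of B-frames in decoding order, by a B-frame with an earlier PTS.

```python
from typing import Iterator, List, NamedTuple, Sequence, Tuple


def nal_header_fields(header: int) -> Tuple[int, int]:
    """Return (nal_type, nal_ref_idc) from the one-byte NAL header."""
    return header & 0x1F, (header & 0x60) >> 5


def is_first_b_slice(slh0: int) -> bool:
    """first_mb_in_slice = 0 ('1') followed by slice_type 1 ('010') or 6 ('00111')."""
    return (slh0 >> 4) == 0xA or (slh0 & 0xFC) == 0x9C


def mp4_sample_nal_units(sample: bytes, length_size: int = 4) -> Iterator[bytes]:
    """Split an MP4 AVC sample into NAL units; each one is preceded by a big-endian
    length of length_size bytes (lengthSizeMinusOne + 1 from 'avcC')."""
    pos = 0
    while pos + length_size <= len(sample):
        n = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        yield sample[pos:pos + n]
        pos += n


def sample_times(stts, ctts=None) -> List[Tuple[int, int]]:
    """Expand stts (count, delta) and ctts (count, offset) entries into (dts, pts)
    per sample in decoding order; without ctts, PTS = DTS."""
    dts, t = [], 0
    for count, delta in stts:
        for _ in range(count):
            dts.append(t)
            t += delta
    offsets = [off for count, off in (ctts or []) for _ in range(count)]
    offsets = offsets or [0] * len(dts)
    return [(d, d + o) for d, o in zip(dts, offsets)]


class Frame(NamedTuple):
    is_b: bool         # the first slice of the picture is a B slice
    nal_ref_idc: int   # from the NAL header (0 = not used for reference)
    pts: int           # presentation time (or POC)


def looks_like_b_pyramid(frames: Sequence[Frame]) -> bool:
    """Hypothetical heuristic: a reference B-frame followed, in decoding order and
    within the same run of B-frames, by a B-frame with an earlier PTS."""
    for i, f in enumerate(frames):
        if not (f.is_b and f.nal_ref_idc != 0):
            continue
        for g in frames[i + 1:]:
            if not g.is_b:
                break  # the run of consecutive B-frames has ended
            if g.pts < f.pts:
                return True
    return False
```

Consider the x264 structure IPbBb, decoded as I0 P4 B2 b1 b3. The reference frame B2 is followed by b1 with an earlier PTS, so the heuristic fires. For IPbbb, every B-frame has nal_ref_idc = 0, so it does not.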