x264 git snapshot proxy

cached from : http://x264.nl/x264/changelog.txt
git-id : a3ac64b8b467eea1264c0053022893bc84b2e9a2
revision : r2334
Author : Anton Mitrofanov
Date: Mon May 6 22:51:11 2013 +0400

OpenCL support improvement/refactoring

Autoload the OpenCL library so that it's not required to run an OpenCL-enabled
build of x264.

Update X264_BUILD, which should have been changed with the first patch.

git-id : c47347c01eb4d9933e2d9705f44707dbb396f611
revision : r2333
Author : Jason Garrett-Glaser
Date: Thu May 16 13:51:37 2013 -0700

x86: shave a few instructions off AVX deblock

git-id : b547a4ea1169411610855002db9a8182b1e73314
revision : r2332
Author : Henrik Gramner
Date: Tue May 14 18:57:40 2013 +0200

x86: AVX2 dequant_4x4_dc

git-id : 907573d3f7873b7600cc94d1e287d52628e11766
revision : r2331
Author : Henrik Gramner
Date: Tue May 14 18:53:12 2013 +0200

x86: AVX2 high bit-depth dequant

git-id : 442c6a420f8727d2f4087e9f3f317fb1774b9262
revision : r2330
Author : Jason Garrett-Glaser
Date: Thu May 9 17:20:05 2013 -0700

x86-64: 64-bit variant of AVX2 hpel_filter

~5% faster than 32-bit.

git-id : 26a6451591cd7cd25fcfeeacee3850e5dd7a7f7e
revision : r2329
Author : Henrik Gramner
Date: Mon May 6 18:41:24 2013 +0200

x86: AVX2 high bit-depth denoise_dct

28->15 cycles

Also reorder instructions to use fewer registers, 3 cycles faster on Ivy Bridge with 64-bit Windows.

git-id : db95d6af63bec7839b3d3e1f2eb67b8689dc8170
revision : r2328
Author : Henrik Gramner
Date: Sat May 4 18:48:58 2013 +0200

x86: AVX2 high bit-depth quant

quant_4x4: 13->6 cycles
quant_4x4_dc: 14->8 cycles
quant_8x8: 47->24 cycles
quant_4x4x4: 48->25 cycles

git-id : 327386f70836507cb44266e5d71bd1d744fe3d78
revision : r2327
Author : Jason Garrett-Glaser
Date: Wed May 1 14:32:11 2013 -0700

x86: AVX2 add16x16_idct_dc

27 -> 19 cycles

git-id : c82db4ed07d4a69a84ac99d5e79e32f61141494f
revision : r2326
Author : Jason Garrett-Glaser
Date: Mon Apr 29 16:16:54 2013 -0700

x86: faster AVX2 quant_4x4x4

10->9 cycles

git-id : b79f4a6e460b00c85f0ee67b03299bf1d15dd48c
revision : r2325
Author : Jason Garrett-Glaser
Date: Sat Apr 27 21:03:32 2013 -0700

x86: AVX2 intra_sad_x3_8x8c

30->22 cycles

git-id : 2c0bca3f798e20133f61c3517202942e873e00d6
revision : r2324
Author : Henrik Gramner
Date: Sun Apr 28 11:11:03 2013 +0200

x86: AVX2 high bit-depth intra_sad_x3_8x8

43->24 cycles

git-id : b2c30e1a470181b591619b211ae0342e9cc8aac9
revision : r2323
Author : Jason Garrett-Glaser
Date: Wed Apr 24 14:22:15 2013 -0700

x86: AVX2 deblock strength

30->18 cycles

git-id : 37edf16c1955cfc9d2843024af0fa7aa6268ad90
revision : r2322
Author : Henrik Gramner
Date: Wed May 1 17:42:48 2013 +0200

x86: Faster high bit-depth intra_sad_x3_4x4

20->16 cycles on Ivy Bridge

git-id : a9ed051f2bc73c9bfeff006d7328bd2bc99ce147
revision : r2321
Author : Jason Garrett-Glaser
Date: Tue Apr 30 17:36:46 2013 -0700

x86: faster SSSE3 hpel

~7% faster using the pmulhrsw trick from mc_chroma.

git-id : 9373d5fa6e7a5cc5bcc756125cbc2e7fe058ea43
revision : r2320
Author : Jason Garrett-Glaser
Date: Mon Apr 29 14:22:23 2013 -0700

x86-64: faster SSSE3 trellis

~2% faster trellis.

git-id : 2a716040eb8b89efd92ea61ab08ecc41bf0b8623
revision : r2319
Author : Jason Garrett-Glaser
Date: Thu May 2 17:10:26 2013 -0700

x86: 32-byte align the stack if possible

Avoids the need for manual 32 byte array alignment on compilers that support

git-id : eefaff1128ea9eb8dcd6796957ca5e56727337b8
revision : r2318
Author : Henrik Gramner
Date: Sat May 11 23:39:09 2013 +0200

x86inc: Utilize the shadow space on 64-bit Windows

Store XMM6 and XMM7 in the shadow space in functions that clobber them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.

git-id : b4be6e56629cf8fdcf53adc6b879969d8f6760b3
revision : r2317
Author : Henrik Gramner
Date: Fri May 3 23:06:10 2013 +0200

x86: Don't use explicitly aligned versions of SAD on AVX CPUs

On modern CPUs movdqu isn't slower than movdqa when used on aligned data and using the same code in both cases saves cache.

This was already done for the high bit-depth AVX2 implementation but the aligned version still exists as dead code so remove that.

git-id : 99f553ec300d928d23522304ebf4818574b85ed3
revision : r2316
Author : Henrik Gramner
Date: Fri May 3 20:18:03 2013 +0200

x86: Add missing initializations for high bit-depth sad_aligned

git-id : 42f2f78a05985a49fea0fb1bff050c95257810bb
revision : r2315
Author : Jason Garrett-Glaser
Date: Mon May 13 16:52:18 2013 -0700

x86: add Jaguar CPU detection

git-id : f12a17f5ecde41148256cb0c132cb31ac6602f3e
revision : r2314
Author : Henrik Gramner
Date: Tue May 7 17:21:03 2013 +0200

x86inc: Remove .rodata kludges

The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old.

a.out was superseded by ELF on sane systems a few decades ago.

git-id : c3b166a6cf55afaeea5bbc94ebb275b92efbd3d8
revision : r2313
Author : Henrik Gramner
Date: Sat May 4 16:21:32 2013 +0200

checkasm: Use 64-bit cycle counters

Prevents overflows that can occur in some cases.
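
As a rough illustration of the kind of overflow meant here (a sketch with hypothetical helper names, not code from checkasm): summing many cycle samples into a 32-bit accumulator wraps modulo 2^32, while a 64-bit accumulator keeps the true total.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from checkasm: add `runs` samples of
 * `cycles_per_run` cycles each into accumulators of different widths. */
static uint32_t total_cycles_32(uint32_t cycles_per_run, uint32_t runs)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < runs; i++)
        total += cycles_per_run;      /* wraps modulo 2^32 */
    return total;
}

static uint64_t total_cycles_64(uint32_t cycles_per_run, uint32_t runs)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < runs; i++)
        total += cycles_per_run;      /* widened to 64 bits before adding */
    return total;
}
```

With 2000 runs of 3,000,000 cycles each, the 64-bit total is 6,000,000,000, which does not fit in 32 bits.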

git-id : e943696e98ba9a75f5100c5692e39708ff2cc422
revision : r2312
Author : Henrik Gramner
Date: Fri May 10 13:55:32 2013 +0200

checkasm: Fix stack alignment bug

git-id : b1749e204d14087a768990e8bfe964d343e0b9a9
revision : r2311
Author : Jason Garrett-Glaser
Date: Wed May 8 10:48:41 2013 -0700

Fix invalid memcpy in sliced-threads

Likely didn't actually break in practice, but memcpy with src==dst
is incorrect.
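
A minimal sketch of one way to avoid such a call, assuming a hypothetical wrapper (the actual fix in x264 may look different): memcpy requires that source and destination do not overlap, so the copy is skipped when both pointers are the same.

```c
#include <string.h>

/* Hypothetical wrapper, not x264 code: memcpy has undefined behavior when
 * the regions overlap, so skip the call if src and dst are identical. */
static void *copy_unless_same(void *dst, const void *src, size_t n)
{
    if (dst != src)
        memcpy(dst, src, n);
    return dst;
}
```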

git-id : 76a5c3a19f97cd34b65aeff050de4042b054bc65
revision : r2310
Author : Jason Garrett-Glaser
Date: Mon Apr 29 12:14:01 2013 -0700

Fix two bugs in slice-min-mbs and slices-max

Slices-max broke slice-max-size when slice-max wasn't used.
Slice-min-mbs broke in rare cases near the end of a threadslice.

git-id : 3b1f1f71459b54b976588b871edc7f459b4d0434
revision : r2309
Author : Jason Garrett-Glaser
Date: Thu Apr 4 18:00:23 2013 -0700

x86: SSSE3 LUT-based faster coeff_level_run

~2x faster coeff_level_run.
Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.

git-id : c05bf544b659510b9008c1037fd8887e8917d30c
revision : r2308
Author : Jason Garrett-Glaser
Date: Mon Mar 25 14:03:37 2013 -0700

x86-64: BMI2 cabac_residual functions

git-id : 437f808579754b5674fb6183331e8ca9bcf53647
revision : r2307
Author : Jason Garrett-Glaser
Date: Wed Mar 20 15:08:35 2013 -0700

x86: SSSE3 ads_mvs

~55% faster ads in benchasm, ~15-30% in real encoding.
~4% faster "placebo" preset overall.

git-id : 2ad961f2d6fc681db6fc87f2c0ca68ff2a00e65e
revision : r2306
Author : Henrik Gramner
Date: Tue Apr 16 23:27:53 2013 +0200

x86: AVX2 pixel_ssd_nv12_core

git-id : 40406b804105964d6b5abea38833d69f6d617815
revision : r2305
Author : Henrik Gramner
Date: Tue Apr 16 23:27:50 2013 +0200

x86: AVX2 high bit-depth pixel_ssd

git-id : c2852be748c66f1ff25f38133d5efbd6059bed6c
revision : r2304
Author : Henrik Gramner
Date: Tue Apr 16 23:27:46 2013 +0200

x86: AVX2 high bit-depth pixel_sad_x3/pixel_sad_x4

Also reduce the number of xmm registers used by sse2/ssse3 pixel_sad_x3.

git-id : fa9dcd02ea386e46314eb0c518b0b5763ef73c80
revision : r2303
Author : Henrik Gramner
Date: Tue Apr 16 23:27:43 2013 +0200

x86: AVX2 high bit-depth vsad

git-id : 567d03619b0af415362454eb20066e0167266a43
revision : r2302
Author : Henrik Gramner
Date: Tue Apr 16 23:27:39 2013 +0200

x86: AVX2 high bit-depth pixel_sad

Also use loops instead of duplicating code; reduces code size by ~10kB with
negligible effect on performance.

git-id : 6cc9f169844cc84a7da8cc4fbf08a3f5dea86c63
revision : r2301
Author : Henrik Gramner
Date: Tue Apr 16 23:27:35 2013 +0200

x86: AVX2 high_bit_depth pixel_avg2, get_ref, mc_copy_w16, mc_luma

Also reduce the number of xmm registers used by mc_copy_* to avoid
saving and restoring xmm6 and xmm7 on 64-bit Windows.

git-id : c3711285a6dd1343197ac3e53bb95acf99c6cb42
revision : r2300
Author : Henrik Gramner
Date: Tue Apr 16 23:27:32 2013 +0200

x86: AVX2 nal_escape

Also rewrite the entire function to be faster and drop the AVX version which is no longer useful.

git-id : 255271fd7999b6b7ff7d65b7b8de1a2dc8919b1a
revision : r2299
Author : Henrik Gramner
Date: Tue Apr 16 23:27:29 2013 +0200

x86: AVX memzero_aligned

git-id : 43632cc8a9115c076204f46e31a5d5c3e58bf934
revision : r2298
Author : Henrik Gramner
Date: Tue Apr 16 23:27:25 2013 +0200

x86: AVX2 predict_16x16_dc

git-id : dcad117131f0e0b5032bf5ca8c27def7fcdce17f
revision : r2297
Author : Henrik Gramner
Date: Tue Apr 16 23:27:22 2013 +0200

x86: AVX2 predict_8x8c_p/predict_8x16c_p

git-id : f5bff68b16e3125dc95705d060c89935a298f0ff
revision : r2296
Author : Henrik Gramner
Date: Tue Apr 16 23:27:18 2013 +0200

x86: AVX2 predict_16x16_p

Also fix the AVX implementation to correctly use the SSSE3 inline asm
instead of SSE2.

git-id : 92eb201b65cb9338500135bda1e2ee4d6861727c
revision : r2295
Author : Henrik Gramner
Date: Tue Apr 16 23:27:14 2013 +0200

x86: AVX high bit-depth predict_16x16_v

Also restructure some code to reduce code size of various functions,
especially in high bit-depth.

git-id : 16f3261076c7159aeea902e68ca064c6d0a2cfd8
revision : r2294
Author : Henrik Gramner
Date: Tue Apr 16 23:27:08 2013 +0200

x86: AVX2 high bit-depth predict_4x4_h

git-id : a38b5fc6ec7348342d8ee4ff21abf3e82c5f7bbf
revision : r2293
Author : Henrik Gramner
Date: Tue Apr 16 23:27:04 2013 +0200

x86: AVX2 high bit-depth predict_16x16_h

git-id : 89f8263b141492a3b45274616fa0327289329c26
revision : r2292
Author : Henrik Gramner
Date: Tue Apr 16 23:27:00 2013 +0200

x86: AVX2 high bit-depth predict_8x8c_h/predict_8x16c_h

git-id : 78b8af872f49aeaa3727ac4e0c8d3b53f0716f51
revision : r2291
Author : Henrik Gramner
Date: Tue Apr 16 23:26:47 2013 +0200

x86util: Support ymm registers in HADD macros

git-id : d07b421cf19fc4d77f0bff9d4d6b11db27d81374
revision : r2290
Author : Jason Garrett-Glaser
Date: Tue Feb 26 16:26:34 2013 -0800

x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks

AVX2 functions:
zigzag interleave

git-id : e228f65488b02967bc450bbe3b92ac44eb0088d7
revision : r2289
Author : Loren Merritt
Date: Mon Feb 25 21:16:45 2013 +0000

x86inc: create xm# and ym#, analogous to m#

For when we want to mix simd sizes within one function.

git-id : e916dfb774059bc2b63dfe88e32fa21f51abd2b7
revision : r2288
Author : Jason Garrett-Glaser
Date: Fri Apr 5 16:08:35 2013 -0700

x86inc: fix AVX emulation of cmp(p|s)(s|d)

git-id : 5b01ce105051144c4dd91866e1642cc8d7926c89
revision : r2287
Author : Jason Garrett-Glaser
Date: Tue Feb 5 17:15:00 2013 -0800

x86-64: cabac_block_residual assembly

RDO: ~20% faster than C
Bitstream: ~50% faster than C
1-2% faster overall, highest on preset superfast/fast/medium.

git-id : 3a5f6c0aeacfcb21e7853ab4879f23ec8ae5e042
revision : r2286
Author : Steve Borho
Date: Thu Feb 21 12:48:40 2013 -0600

OpenCL lookahead

OpenCL support is compiled in by default, but must be enabled at runtime by an
--opencl command line flag. Compiling OpenCL support requires perl. To avoid
the perl requirement use: configure --disable-opencl.

When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
device. Lowres intra cost prediction, lowres motion search (including subpel)
and bidir cost predictions are all done on the GPU. MB-tree and final slice
decisions are still done by the CPU. Presets which do not use a threaded
lookahead will not use OpenCL at all (superfast, ultrafast).

Because of data dependencies, the GPU must use an iterative motion search which
performs more total work than the CPU would do, so this is not work efficient
or power efficient. But if there are GPU cycles to spare, it can often
speed up the encode. Output quality when OpenCL lookahead is enabled is often
very slightly worse in quality than the CPU quality (because of the same data

x264 must compile its OpenCL kernels for your device before running them, and in
order to avoid doing this every run it caches the compiled kernel binary in a
file named x264_lookahead.clbin (--opencl-clbin FNAME to override). The cache
file will be ignored if the device, driver, or OpenCL source are changed.

x264 will use the first GPU device which supports the cl_image
features required by its kernels. Most modern discrete GPUs and all AMD
integrated GPUs will work. Intel integrated GPUs (up to IvyBridge) do not
support those necessary features. Use --opencl-device N to specify a number of
capable GPUs to skip during device detection.

Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
as some have bugs in their OpenCL drivers that cause output to be silently

Developed by MulticoreWare with support from AMD and Telestream.

git-id : e436158903c6171ce4abe78f03f013fe04f193bd
revision : r2285
Author : Jason Garrett-Glaser
Date: Mon Mar 4 15:19:47 2013 -0800

weightp: improve scale/offset search, chroma

Rescale the scale factor if the offset clips. This makes weightp more effective
in fades to/from white (and any other situation that requires big offsets).

Search more than 1 scale factor and more than 1 offset, depending on --subme.

Try to find the optimal chroma denominator instead of hardcoding it.

Overall improvement: a few percent in fade-heavy clips, such as a sample from
Avatar: TLA.

git-id : 389d06e8f93916b4fe5766ee4503380f2632ef79
revision : r2284
Author : Jason Garrett-Glaser
Date: Tue Feb 19 13:48:44 2013 -0800

Add slices-max feature

The H.264 spec technically has limits on the number of slices per frame. x264
normally ignores this, since most use-cases that require large numbers of
slices prefer that it does. However, certain decoders may break with extremely large
numbers of slices, as can occur with some slice-max-size/mbs settings.

When set, x264 will refuse to create any slices beyond the maximum number,
even if slice-max-size/mbs requires otherwise.

git-id : f546e98eb8f9afd15fb7e8f95ec02fcf65155079
revision : r2283
Author : Jason Garrett-Glaser
Date: Thu Feb 14 17:22:02 2013 -0800

Add slice-min-mbs feature

Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
small slices.
Useful with certain decoders that barf on extremely small slices.

If slice-min-mbs would be violated as a result of slice-max-size, x264 will
exceed slice-max-size and print a warning.
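
The precedence described above can be sketched as follows. This is only an illustration under stated assumptions: the struct, the enum and the function name are hypothetical and do not come from x264, and the real encoder decides this inside its macroblock loop.

```c
/* Hypothetical limits; 0 means "not set". */
typedef struct {
    int slice_max_size;   /* bytes */
    int slice_min_mbs;    /* macroblocks */
} slice_limits;

typedef enum {
    SLICE_CONTINUE,           /* next macroblock still fits */
    SLICE_END,                /* close the slice before the next macroblock */
    SLICE_CONTINUE_OVERSIZE   /* exceed slice-max-size; caller should warn */
} slice_action;

/* Decide what to do before adding one more macroblock to a slice that
 * already holds `mbs_in_slice` macroblocks and would grow to
 * `bytes_if_next_added` bytes. */
static slice_action next_slice_action(const slice_limits *lim,
                                      int mbs_in_slice,
                                      int bytes_if_next_added)
{
    if (!lim->slice_max_size || bytes_if_next_added <= lim->slice_max_size)
        return SLICE_CONTINUE;
    /* The size limit would be exceeded: the minimum takes priority. */
    if (lim->slice_min_mbs && mbs_in_slice < lim->slice_min_mbs)
        return SLICE_CONTINUE_OVERSIZE;
    return SLICE_END;
}
```

With slice-max-size 1500 and slice-min-mbs 4, a slice holding only 2 macroblocks keeps growing past 1500 bytes, while one holding 10 macroblocks is closed.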

git-id : 1db46210d525856a8f9e59944913127287d956c5
revision : r2282
Author : Anton Mitrofanov
Date: Tue Mar 26 18:56:21 2013 +0400

Disable mbtree asm with cpu-independent option

Results vary between versions because of different rounding results.

git-id : fceb3b197f5fcaded3943718c162b662b52b208f
revision : r2281
Author : Anton Mitrofanov
Date: Tue Mar 26 18:30:00 2013 +0400

Show "avs: no" with --disable-avs option instead of empty string

git-id : 68ee80a51f6f1de78877a9907e3efcbb1fe13ac6
revision : r2280
Author : Tim Walker
Date: Tue Mar 19 23:42:43 2013 +0100

lavf input: don't use deprecated AVStream fields

Fixes building against newer libavcodecs from the Libav project.

git-id : 5980580d5a4d32eebf32b2f274807dd4aa68836b
revision : r2279
Author : Anton Mitrofanov
Date: Tue Mar 26 19:54:36 2013 +0400

Fix y4m input with C420paldv colorspace

git-id : 580cc69707f6996dad8544d6ef0d5a8bbc1b5864
revision : r2278
Author : Jason Garrett-Glaser
Date: Sat Mar 2 01:22:29 2013 -0800

x86: correctly check stack alignment for Atom hadamard_ac

Regression in r2265 (only affected compilers with broken stack alignment,
like ICL on win32).

git-id : b3c15fcf677a4ceb59c8f4adc39dc93ecd06ff8a
revision : r2277
Author : Loren Merritt
Date: Mon Feb 25 21:23:55 2013 +0000

x86inc: fix some corner cases of SWAP

SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

git-id : 89aecb440e2939be7fb72d8362eb12504711b94f
revision : r2276
Author : Jason Garrett-Glaser
Date: Wed Feb 27 13:30:22 2013 -0800

Fix array overreads that caused miscompilation in gcc 4.8

git-id : e355b0e12d6cb380c13cdce15b42093eb8eeef44
revision : r2275
Author : Jason Garrett-Glaser
Date: Thu Feb 28 13:32:37 2013 -0800

Fix undefined behavior in x264_ratecontrol_mb

git-id : c832fe995bf3d41cae1d3d22e10cb2288e8a650a
revision : r2274
Author : Stefan Groenroos
Date: Fri Mar 1 22:35:34 2013 +0200

ARM: Fix bug in x264_quant_4x4x4_neon

Regression in r2273.

git-id : b3065e660df391168067f13216d99825260939d4
revision : r2273
Author : Stefan Groenroos
Date: Mon Feb 25 23:43:09 2013 +0200

ARM: update NEON mc_chroma to work with NV12 and re-enable it

Up to 10-15% faster overall.

git-id : e82cf2c8e3bc0d7623f3e8ed9a4684bc3dc40b91
revision : r2272
Author : Jason Garrett-Glaser
Date: Thu Feb 14 15:00:48 2013 -0800

CABAC/CAVLC: use the new bit-iterating macro here too

git-id : 253e2c3f7eab79d74450de4f88a8bf451fd01be4
revision : r2271
Author : Jason Garrett-Glaser
Date: Fri Feb 8 15:34:38 2013 -0800

quant_4x4x4: quant one 8x8 block at a time

This reduces overhead and lets us use less branchy code for zigzag, dequant,
decimate, and so on.
Reorganize and optimize a lot of macroblock_encode using this new function.
~1-2% faster overall.

Includes NEON and x86 versions of the new function.
Using larger merged functions like this will also make wider SIMD, like
AVX2, more effective.

git-id : eaae05ea3f104dc9fa948327e10649ec693adf0e
revision : r2270
Author : Stephen Hutchinson
Date: Tue Feb 12 21:55:43 2013 -0500

Add AvxSynth support to the AviSynth input module.

Uses dlopen to load AvxSynth on Linux and OS X.

Allows the use of --demuxer avs for AvxSynth, though the only source filter it
can currently use is FFMS2.

Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that
users don't need to actually have AvxSynth development headers installed to
enable support for it (mirroring the AviSynth behavior).

Based on a patch by 0x09 (tab@lavabit.com)

git-id : b2c70f6548a68b874006a176d48cd0ca4e03859a
revision : r2269
Author : Jason Garrett-Glaser
Date: Fri Feb 8 00:13:15 2013 -0800

Eliminate some branchiness in ME/analysis

Faster, fewer branch mispredictions.

git-id : 9d600d64194e0b2a77a8d9aa3f05b141cf473af0
revision : r2268
Author : Jason Garrett-Glaser
Date: Wed Feb 6 16:55:39 2013 -0800

Fix some store forwarding stalls
There's quite a few others, but most of them don't help to fix or there's no
easy way to avoid them.

git-id : 9fe40b1e0db6cd93652e3a45dbbd8f24dbe0b70e
revision : r2267
Author : Jason Garrett-Glaser
Date: Tue Feb 5 01:23:23 2013 -0800

x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac

Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
do movddup in the load unit, so it's totally free this way.

On Sandy Bridge:
~6% faster sa8d_satd
~5% faster hadamard_ac
~9% faster 32-bit satd
~2% faster sa8d

git-id : 4f24bb34453fdedefd161063e20516d148b80f8b
revision : r2266
Author : Jason Garrett-Glaser
Date: Sat Feb 2 12:37:08 2013 -0800

x86: detect Bobcat, improve Atom optimizations, reorganize flags

The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
and apply the appropriate flags.

It also has an extremely slow palignr instruction; create a flag for this to
avoid massive penalties on palignr-heavy functions.

Improve Atom function selection and document exactly what the SLOW_ATOM flag

Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
Atom along with other SIMD multiplies.

Drop TBM detection; it'll probably never be useful for x264.

Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).

Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.

git-id : d556d5540ab90b2c89a5ba0cd6ce393f87c19faf
revision : r2265
Author : Oskar Arvidsson
Date: Sat Jan 19 01:47:09 2013 +0100

x86: combined SA8D/SATD dsp function

Speedup is most apparent for 8-bit (~30%), but gives some improvements
for 10-bit too (~12%).
64-bit only for now.

git-id : 5c2ca5dee339a215cb331c426d40fa548675f088
revision : r2264
Author : Oskar Arvidsson
Date: Tue Jan 29 23:44:32 2013 +0100

x86: port SSE2+ SATD functions to high bit depth

Makes SATD 20-50% faster across all partition sizes but 4x4.

git-id : b09bc0cc936751f6ad1f20f5e11f523f6051ebc3
revision : r2263
Author : Oskar Arvidsson
Date: Wed Feb 6 02:07:53 2013 +0100

x86: faster high bit depth ssd

About 15% faster on average.

git-id : 91049858f8a051e87efcbe97285657fa3ef9a639
revision : r2262
Author : Jason Garrett-Glaser
Date: Fri Jan 18 22:55:46 2013 -0800

x86: optimize and clean up predictor checking
Branchlessly handle elimination of candidates in MMX roundclip asm.
Add a new asm function, similar to roundclip, except without the round part.
Optimize and organize the C code, and make both subme>=3 and subme<3 consistent.
Add lots of explanatory comments and try to make things a little more understandable.
~5-10% faster with subme>=3, ~15-20% faster with subme<3.

git-id : a216e5c92a1543e5d748928f7531cfd771739cbf
revision : r2261
Author : Jason Garrett-Glaser
Date: Tue Jan 22 12:31:55 2013 -0800

Fix two bugs in predictor checking
pmv wasn't checked properly in some cases, as well as zero vector.
Output-changing portion of the following patch.

git-id : 4d220bc18cb177b6812c381e7fb808f9ae3189e1
revision : r2260
Author : Jason Garrett-Glaser
Date: Thu Jan 10 13:15:52 2013 -0800

Improve lookahead-threads auto selection
Smarter decision to improve fast-first-pass performance in 2-pass encodes.
Dramatically improves CPU utilization on multi-core systems.

Tested on a quad-core Ivy Bridge (12 threads, 1080p):
Fast first pass:
veryfast: ~7% faster
faster: ~11% faster
fast/medium: ~15% faster
slow/slower: ~42% faster
veryslow: ~55% faster
veryfast: ~9% faster
(all others remained the same)

git-id : c63a518d43bb3822342513eb4af109551e86fbd2
revision : r2259
Author : Henrik Gramner
Date: Sun Jan 27 23:01:59 2013 +0100

x86: Use SSE instead of SSE2 for copying data

Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
Also merge MMX and SSE versions of memcpy_aligned into a single macro.
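
The one-byte difference comes from the mandatory prefix on the SSE2 integer moves. The sketch below lists the legacy (non-VEX) encodings for a load into xmm0 from [rax]; the array names are hypothetical and only serve to compare lengths.

```c
#include <stddef.h>

/* Legacy encodings of "load xmm0 from [rax]" (ModRM byte 0x00). */
static const unsigned char enc_movaps[] = { 0x0F, 0x28, 0x00 };        /* movaps */
static const unsigned char enc_movups[] = { 0x0F, 0x10, 0x00 };        /* movups */
static const unsigned char enc_movdqa[] = { 0x66, 0x0F, 0x6F, 0x00 };  /* movdqa */
static const unsigned char enc_movdqu[] = { 0xF3, 0x0F, 0x6F, 0x00 };  /* movdqu */

/* Bytes saved per instruction by using the SSE form. */
static size_t saved_bytes_aligned(void)
{
    return sizeof enc_movdqa - sizeof enc_movaps;
}

static size_t saved_bytes_unaligned(void)
{
    return sizeof enc_movdqu - sizeof enc_movups;
}
```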

git-id : 0ce5b431b94f3934a7229ab264c12f1106e4330d
revision : r2258
Author : Henrik Gramner
Date: Sun Jan 13 18:27:08 2013 +0100

64-bit cabac optimizations

~4% faster PIC

~3% faster and 16 byte shorter cabac_encode_bypass
~8% faster cabac_encode_terminal
Benchmarked on Ivy Bridge

One instruction less in cabac_encode_bypass

git-id : 51a5976144d80d9dc178fcaba2da5224809ee6ba
revision : r2257
Author : Mike Gorchak
Date: Sat Feb 2 23:35:00 2013 -0800

configure: add QNX support

git-id : 486ff39f398401d126fbf0379287b1a7ca7fae6e
revision : r2256
Author : Henrik Gramner
Date: Sun Jan 20 19:35:06 2013 +0100

Windows: Enable DEP and ASLR

git-id : 8da42b78154304ef194747a375a7e1ff3021d0a9
revision : r2255
Author : Henrik Gramner
Date: Thu Jan 17 19:17:24 2013 +0100

x86inc: Set ELF hidden visibility for global constants

git-id : 989019209b2ccc828480f0e1f506747703134db3
revision : r2254
Author : Diego Biurrun
Date: Thu Jan 17 11:18:31 2013 +0100

x86inc: Add cvisible macro for C functions with public prefix

This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun

git-id : 91b0f0e6415b9cc56b625eb77dd5b471a59d3230
revision : r2253
Author : Diego Biurrun
Date: Thu Jan 17 11:30:37 2013 -0800

x86inc: rename program_name to private_prefix
Synced from libav.
The new name is more descriptive and will allow defining a separate public
prefix for externally visible library symbols.

git-id : a4e77598d2e1e55483bf0918f6ec2fda51ee9507
revision : r2252
Author : Jason Garrett-Glaser
Date: Mon Jan 14 05:35:30 2013 -0800

x264.h: improve x264_encoder_reconfig documentation

git-id : ce9efeafaad38bc6795d4469c952af2d5bb75a84
revision : r2251
Author : Henrik Gramner
Date: Sat Feb 16 19:36:50 2013 +0100

Cosmetics: stricter definition of parameterless functions

git-id : e403db4f9079811f5a1f9a1339e7c85b41800ca7
revision : r2250
Author : Neil
Date: Mon Jan 28 10:47:38 2013 +0800

Update "Install and compile x264" in doc/regression_test.txt

git-id : c13fbaf279d41e6bb8db09e95aec1b638ff026e8
revision : r2249
Author : Anton Mitrofanov
Date: Thu Jan 24 12:11:26 2013 +0400

Fix possible non-determinism with mbtree + open-gop + sync-lookahead

Code assumed keyframe analysis would only pull one frame off the list; this
isn't true with open-gop.

git-id : 736d69b5875587b61c03aa45438e19ddba1f7035
revision : r2248
Author : Anton Mitrofanov
Date: Mon Feb 25 19:28:19 2013 +0400

x86: don't use the red zone on win64

git-id : 637005ebef9f36b816a9777183660ea17f5b249d
revision : r2247
Author : Jason Garrett-Glaser
Date: Sun Feb 10 16:12:34 2013 -0800

x86-64: fix trellis asm with interlacing

Regression in r2145.
Assembly assumed array was [2][64] when it was actually [2][63].
Tiny (~0.1%) compression improvement.
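
Why the row length matters can be shown with a small sketch (the helper name is hypothetical, and which array the assembly indexes is not stated here): the first row is addressed identically either way, but every element of the second row sits one position off when 64 is assumed instead of 63.

```c
#include <stddef.h>

/* Flat element offset of a[row][col] in a two-dimensional array whose rows
 * hold `row_len` elements. */
static size_t flat_offset(size_t row, size_t col, size_t row_len)
{
    return row * row_len + col;
}
```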

git-id : ba5ce76f7506b5f3d083a9eda8c4705e192f15ff
revision : r2246
Author : Ronald S. Bultje
Date: Wed Jan 30 09:48:14 2013 -0800

x86-32: use simple nop codes for <= sse

The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags:
fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng
rng_en ace ace_en) SIGILLs on long nop codes.

git-id : bc13772e21d0e61dea6ba81d0d387604b5b360df
revision : r2245
Author : Loren Merritt
Date: Tue Jan 8 21:30:57 2013 +0000

Bump dates to 2013

git-id : 3508f4a1446c408dcc0febe1a349ad303ae6628c
revision : r2244
Author : Henrik Gramner
Date: Mon Dec 17 21:54:00 2012 +0100

x86inc: Drop tzcnt workaround

It is no longer needed now that we've bumped the version requirement of yasm to 1.2.0.

git-id : b924133cabd125286488e16cfa71488ad4105d63
revision : r2243
Author : Jason Garrett-Glaser
Date: Mon Nov 12 10:28:53 2012 -0800

AVX2/FMA3 version of mbtree_propagate
First AVX2 function for testing.
Bump yasm version to 1.2.0 for AVX2 support.

git-id : d967c09cd93a230e03ec1e0f0f696975d15a01c0
revision : r2242
Author : Henrik Gramner
Date: Tue Dec 11 16:05:34 2012 +0100

x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that were missing before.

git-id : f6c628650558803ed65cb15c1853113cc589ae4a
revision : r2241
Author : Loren Merritt
Date: Sun Dec 2 15:56:30 2012 +0000

x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition.
REP_RET is still needed manually when it's a branch target, but that's much rarer.
The implementation involves lots of spurious labels, but that's ok because we strip them.

git-id : 755fece365c14135c2621585e761f5dfeedefc74
revision : r2240
Author : Ronald S. Bultje
Date: Thu Dec 6 15:40:13 2012 -0800

x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in 8-bit loopfilter functions so they can be used if
there is no aligned stack (e.g. x86-32 MSVC or ICC 10.x).

git-id : c69e2d02c4a2ee171b6b8ca0a2e1032213e561bc
revision : r2239
Author : Henrik Gramner
Date: Mon Dec 17 22:15:02 2012 +0100

Update config.guess and config.sub

git-id : 9c4ba4bde8965571159eae2d79f85cabbb47416c
revision : r2238
Author : Anton Mitrofanov
Date: Tue Jan 8 13:29:49 2013 -0800

Fix crash if the first frame is forced to a non-keyframe
This is obviously bad user input, but x264 shouldn't crash if it happens.

git-id : 593e8cc0b374aa7b20d3d961c57feb9bab508979
revision : r2237
Author : Bernhard Rosenkränzer
Date: Sun Dec 30 12:18:00 2012 -0800

Fix build on ARM with binutils >=
GAS doesn't seem to like spaces in vld1 anymore, so remove those.

git-id : 55b5162d7ad9a70e2b6ae5ba3f743a35c2135aaf
revision : r2236
Author : Anton Mitrofanov
Date: Fri Nov 23 18:26:53 2012 +0400

Fix pthread_join emulation on win32 and BeOS
Doesn't actually affect x264, but it's more correct.

git-id : 0059dcf938451134d8f9c8f1ad522a2c6071e7cd
revision : r2235
Author : Jason Garrett-Glaser
Date: Tue Nov 27 07:50:51 2012 -0800

Fix typo in r2222
Slightly wrong numbers in level table.

git-id : 28ee1f47ed4366351477065a0f794f05402e69a7
revision : r2234
Author : Sergio Basto
Date: Thu Nov 22 18:02:50 2012 -0800

configure: fix gpac detection with -Wp,-D_FORTIFY_SOURCE=2

git-id : 67a69c06d7bd7907b5d1e058a26284c06baa93d1
revision : r2233
Author : Sean McGovern
Date: Thu Nov 22 18:01:16 2012 -0800

Solaris: use sysconf to get processor count
Solaris responds correctly to the same value as Cygwin, so let's use that.
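
A minimal sketch of such a query, assuming the value in question is _SC_NPROCESSORS_ONLN (the entry does not name it) and using a hypothetical function name:

```c
#include <unistd.h>

/* Returns the number of online processors, or 1 if the system cannot
 * report it. _SC_NPROCESSORS_ONLN is an assumption, see above. */
static int online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}
```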

git-id : 6e68ab73908f339cdd91c40943fef46fd1f832fa
revision : r2232
Author : Anton Khirnov
Date: Tue Nov 13 21:01:24 2012 +0100

lavf input: allocate AVFrame correctly
Allocate AVFrames correctly with avcodec_alloc_frame().
This caused crashes with newer libavcodecs that try to free frame extradata.

git-id : a632fe1a57baccdf1bcb340197fe48281cd3117f
revision : r2231
Author : Anton Mitrofanov
Date: Sun Nov 11 03:44:02 2012 +0400

Fix crash when using libx264.dll compiled with ICL for X86_64

git-id : 1cffe9f406cc54f4759fc9eeb85598fb8cae66c7
revision : r2230
Author : Anton Mitrofanov
Date: Fri Nov 9 02:31:10 2012 +0400

Fix possible issues with out-of-spec QP values
Fixes a possible regression in r2228.

git-id : 349b9bdefae84b006c4bdb7e07290b88a18bbbb2
revision : r2229
Author : Jason Garrett-Glaser
Date: Wed Sep 26 13:49:02 2012 -0700

Attempt to optimize PPS pic_init_qp in 2-pass mode
Small compression improvement; up to ~0.5% in extreme cases.
Helps more with small slice sizes (tiny resolutions or slice-max-size).
Note that this changes the 2-pass stats file format.
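
Some background on why this can save bits: in H.264 each slice header codes its QP as a signed Exp-Golomb delta (slice_qp_delta) relative to the PPS pic_init_qp, so a pic_init_qp close to the typical slice QP makes every slice header shorter. The sketch below is a hypothetical brute-force search over that cost, not x264's actual selection code.

```c
#include <limits.h>
#include <stddef.h>

/* Bits used by the signed Exp-Golomb code se(v) for value v. */
static int se_bits(int v)
{
    /* Map to the unsigned code number: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4. */
    unsigned code_num = v > 0 ? 2u * (unsigned)v - 1u : 2u * (unsigned)(-v);
    int bits = 1;
    for (unsigned x = code_num + 1; x > 1; x >>= 1)
        bits += 2;   /* one prefix zero and one suffix bit per doubling */
    return bits;
}

/* Hypothetical search: the pic_init_qp in [qp_min, qp_max] that minimizes
 * the total size of slice_qp_delta over the given slice QPs. */
static int best_pic_init_qp(const int *slice_qp, size_t n,
                            int qp_min, int qp_max)
{
    int best_qp = qp_min;
    long best_bits = LONG_MAX;
    for (int qp = qp_min; qp <= qp_max; qp++) {
        long bits = 0;
        for (size_t i = 0; i < n; i++)
            bits += se_bits(slice_qp[i] - qp);
        if (bits < best_bits) {
            best_bits = bits;
            best_qp = qp;
        }
    }
    return best_qp;
}
```

Since the saving is per slice header, it grows with the number of slices, which fits the note that small slice sizes benefit more.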

git-id : 8437d0db5de43cf9cd11e02444c80984935e25dc
revision : r2228
Author : Jason Garrett-Glaser
Date: Wed Sep 26 13:05:00 2012 -0700

Improve slice header QP selection
Use the first macroblock of each slice instead of the last of the previous.
Lets us pick a reasonable initial QP for the first slice too.
Slightly improved compression.

git-id : d2d8364ff48f789ef92135d24c6f185c4eccbeba
revision : r2227
Author : Jason Garrett-Glaser
Date: Thu Oct 11 13:27:48 2012 -0700

Update level dpb size calculation to match newer H.264 spec
Doesn't actually change encoding behavior, but makes it more correct.
Warning messages should now be accurate at higher bit depths and non-4:2:0.
Technically, since it redefines x264_level_t, this is an API version increment.

git-id : 28ddb0dd533154b58f9147932fb1dec4c74127c8
revision : r2226
Author : Jan Ekström
Date: Sun Oct 7 21:12:05 2012 +0300

Add support for the ffmpeg/vapoursynth high bit depth y4m extensions

git-id : 64cbe75cf3cae2bbc8fb34bcda5a9742d22f83f2
revision : r2225
Author : Diego Biurrun
Date: Tue Nov 6 14:48:56 2012 +0100

x86inc: Rename 3dnow2 to 3dnowext
The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.

git-id : f418867a8b76f31acf3a965eed34c5587294e948
revision : r2224
Author : Diego Biurrun
Date: Wed Oct 31 12:23:54 2012 -0700

x86inc: only define program_name if the macro is unset.
This allows overriding the value from outside the file.
This can be useful if x86inc.asm is used outside of x264.

git-id : f6a8615ab0c922ac2cb5c82c9824f6f4742b1725
revision : r2223
Author : David Wolstencroft
Date: Mon Oct 29 09:07:39 2012 -0700

Disable ARM NEON MRC CPU test for Apple devices
The Apple A6 CPU doesn't support performance counters, so this test caused a crash.

git-id : 6889f2cee49314aa380d4803991d645659efc01f
revision : r2222
Author : Jason Garrett-Glaser
Date: Tue Nov 6 12:03:20 2012 -0800

Fix crash with no-scenecut + mbtree

git-id : 4dbfcd462ccdf065654d17c47e1d05d53f213bf1
revision : r2221
Author : Anton Mitrofanov
Date: Fri Oct 12 23:43:40 2012 +0400

Fix reconfiguring to crf=0
Lossless mode can't currently be enabled mid-stream.

git-id : 2ec8c64580efc10bbfc343d4bec2cf6bbb7d68c7
revision : r2220
Author : Derek Buitenhuis
Date: Mon Sep 17 11:09:20 2012 -0700

ICL's preprocessor doesn't handle it correctly.
This fix is similar to libav's fix in 0db2d9.

git-id : 2f154ac1652000afe16140cb12c35d777f0c60c8
revision : r2219
Author : Jason Martens
Date: Thu Sep 13 11:20:40 2012 -0700

Fix use of deprecated av_close_input_file call

git-id : 9fc00654018ff9f8a13dbe66785e31568a0c3229
revision : r2218
Author : Brad Smith
Date: Wed Sep 26 14:13:27 2012 -0700

Fix pkg-config for dynamic vs static linking

git-id : b22f22fdb1f6d61ccc7b0c867b530322ea681133
revision : r2217
Author : Brad Smith
Date: Mon Sep 10 17:52:04 2012 -0700

Set libm in the configure script if the OS has libm
Prerequisite for another configure patch after this.
Idea copied from libpthread.

git-id : 198a7ea13ccb727d4ea24b29f5da9b0292387309
revision : r2216
Author : Jason Garrett-Glaser
Date: Thu Aug 16 13:40:32 2012 -0700

Enhance mb_info: add mb_info_update
This feature lets the callee know which decoded macroblocks have changed.

git-id : f6e9002dd03329b69ea56391b3f4197efca7a690
revision : r2215
Author : Jason Garrett-Glaser
Date: Thu Aug 16 13:01:17 2012 -0700

Fix mb_info_free with sliced threads
x264 would free mb_info before it was completely done using it.

git-id : de725e98eb87198542aae5b8c5ebab4f6c06446e
revision : r2214
Author : Jason Garrett-Glaser
Date: Tue Aug 7 12:43:26 2012 -0700

Enhance nalu_process
Add the input frame opaque pointer to the arguments.
This makes it easier to use with multiple simultaneous x264 encodes.

git-id : 174cfac6344a9fad1577cd1f449b7d0e625d6e28
revision : r2213
Author : Jason Garrett-Glaser
Date: Mon Aug 6 14:55:35 2012 -0700

Improve mb_info constant mb optimization
Allow fast skipping even if the pskip MV isn't zero.

git-id : f57e7070d949b02e1a548382a549c34cf491e05e
revision : r2212
Author : Jason Garrett-Glaser
Date: Mon Jul 30 12:58:34 2012 -0700

Export the average effective CRF of each frame
Useful to judge the resulting quality of a frame when VBV is enabled.

git-id : d7fd6cc060b6ae3f3bcb9e09fc8bf532a8ed3a82
revision : r2211
Author : Brad Smith
Date: Mon Aug 20 23:58:19 2012 -0700

Remove special-casing for OpenBSD pthread handling
Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread.
It's been libpthread anyway, and policy has changed to stop using -pthread.

git-id : 8f7644865010385efcb4cb5bd239b28edb4b49e2
revision : r2210
Author : Ronald S. Bultje
Date: Thu Jul 26 18:01:49 2012 -0700

x86inc: automatically insert vzeroupper for YMM functions
Backported from libav.

git-id : 68dfb7b352c4d273e44668c1f6e4a9a283a37e84
revision : r2209
Author : Kieran Kunhya
Date: Tue Jul 24 08:47:45 2012 -0700

Free user supplied data when deleting a frame
This eliminates a memory leak when calling x264_encoder_close.

git-id : d9d2288cabcfd1a90f29f2f11c8cce5450a08ffa
revision : r2208
Author : Jason Garrett-Glaser
Date: Wed Jul 18 08:33:41 2012 -0700

Revert r2204
People don't seem to like this so I'm just going to get rid of it.

git-id : 5f615f7f93d830e55e6fe4f04d214b93d8cb4b53
revision : r2207
Author : Jason Garrett-Glaser
Date: Tue Jul 10 14:10:44 2012 -0700

Faster predictor checking with subme<3
Fix a typo that made an early-skip less effective.
Avoid a relatively unpredictable branch.
Slightly changed output due to the typo-fix.
~50 cycles faster on Core i7.

git-id : 5af86bedd71c89fc48b50bbb7e8a8bec3d360d3a
revision : r2206
Author : Jason Garrett-Glaser
Date: Mon Jun 25 18:01:29 2012 -0700

Try 8x8 transform analysis even when sub8x8 partitions are present
Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help.
Small compression improvement with p4x4 on (~0.1-0.5%).
Also update related comments.

git-id : 913485d26b19dddb6340f7115843d63cde8bb836
revision : r2205
Author : Jason Garrett-Glaser
Date: Fri Jun 8 18:19:59 2012 -0700

Support changing resolutions between passes with macroblock-tree
Implement a basic separable bilinear filter to rescale the quantizer offsets.
Structure inspired by swscale, but floating-point instead of fixed-point.
Not as optimized as it could be, but it's quite fast already.

Example compression penalties on a 720p video game recording:
First pass with 720p and second as 480p: ~-1.5% (vs. same res)
First pass with 480p and second as 720p: ~-3% (vs. same res)
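
The rescaling step described above can be sketched as follows. This is only a minimal illustration of a separable bilinear filter in floating point, not x264's actual code: the function names, the center-aligned sample positions, and the edge clamping are assumptions made for the example.

```python
def resample_1d(src, dst_len):
    """Linearly resample a list of floats to dst_len samples.

    Assumption: samples are center-aligned and edges are clamped.
    """
    src_len = len(src)
    scale = src_len / dst_len
    out = []
    for i in range(dst_len):
        # Center of output sample i, expressed in source coordinates.
        pos = (i + 0.5) * scale - 0.5
        pos = min(max(pos, 0.0), src_len - 1.0)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append(src[lo] * (1.0 - frac) + src[hi] * frac)
    return out


def rescale_offsets(grid, dst_w, dst_h):
    """Separable rescale: filter each row horizontally, then each column vertically."""
    rows = [resample_1d(row, dst_w) for row in grid]
    cols = [resample_1d([r[x] for r in rows], dst_h) for x in range(dst_w)]
    # cols is column-major; transpose back to a row-major grid.
    return [[cols[x][y] for x in range(dst_w)] for y in range(dst_h)]
```

Running the horizontal and vertical passes separately keeps the work per output value to two taps per pass, which is the usual reason to prefer a separable filter.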

git-id : 8b535d9006d87e32c4ff939691b920da823ae85a
revision : r2204
Author : Alexander Prikhodko
Date: Tue Jun 12 20:21:35 2012 +0300

Print elapsed time in encoding progress indicator

git-id : 11e32c534a213168d8f466fb64bee75e1534d7af
revision : r2203
Author : Anton Mitrofanov
Date: Sat Jun 2 21:27:50 2012 +0400

Cap ratecontrol predictor parameters
Limits VBV mispredictions after long periods of relatively constant video.

git-id : e21e9c972ed830ac7ad264912b41543adf7e720f
revision : r2202
Author : Loren Merritt
Date: Tue Jul 3 14:38:04 2012 -0700

x86inc: import patches from libav
Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX
SSE version of mova is movaps rather than movdqa.
YMM version of movnta.
Add mp size for named arguments.
Fix DEFINE_ARGS when used outside of a cglobal.
Define a few more cpuflags.
3-argument wrappers for a few more instructions.

git-id : 37be55213a39db40cf159ada319bd482a1b00680
revision : r2201
Author : Anton Mitrofanov
Date: Fri Jun 22 22:02:24 2012 +0400

Fix crash with --fps 0
Fix some integer overflows and check input parameters better.
Also fix incorrect type specifiers for demuxer info printing.

git-id : 999b753ff0f4dc872077f4fa90d465e948cbe656
revision : r2200
Author : Jason Garrett-Glaser
Date: Tue May 8 15:42:56 2012 -0700

Threaded lookahead

Split each lookahead frame analysis call into multiple threads. Has a small
impact on quality, but does not seem to be consistently any worse.

This helps alleviate bottlenecks with many cores and frame threads. In many
cases, this massively increases performance on many-core systems. For example,
over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
Realtime 1080p30 at --preset slow should now be feasible on real systems.

For sliced-threads, this patch should be faster regardless of settings (~10%).

By default, lookahead threads are 1/6 of regular threads. This isn't exacting,
but it seems to work well for all presets on real systems. With sliced-threads,
it's the same as the number of encoding threads.
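
The default described above can be sketched as follows. The function name, the integer division, and the floor of one thread are assumptions for illustration; any additional caps the encoder may apply are not modeled.

```python
def default_lookahead_threads(encoding_threads, sliced_threads=False):
    """Pick a lookahead thread count from the number of encoding threads."""
    if sliced_threads:
        # With sliced-threads, use as many lookahead threads as encoding threads.
        return encoding_threads
    # Otherwise roughly one sixth of the regular threads, but at least one.
    return max(1, encoding_threads // 6)
```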

git-id : ecfbf9d8025e39783bc4262dc1972ca742d8a993
revision : r2199
Author : Anton Mitrofanov
Date: Fri May 4 17:18:12 2012 +0400

Add support for RGB formats in bit-depth conversion filter

git-id : 1c97f3570fba02f768fbf649b9f7d48beb720048
revision : r2198
Author : Anton Mitrofanov
Date: Sat May 12 13:57:49 2012 +0400

Fix some bugs in mb_info code

git-id : 69a0443e7d8ab032a7f3c3468a42177d5e64daa2
revision : r2197
Author : Jason Garrett-Glaser
Date: Thu Mar 29 14:14:07 2012 -0700

Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame.
Sometimes, the caller knows which areas these are, and can tell x264.
This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
This is really only suitable without B-frames.
An example use-case would be using x264 for VNC.

git-id : df6252cfed7c23fbe883456f4e0607a7f8e91ad8
revision : r2196
Author : Henrik Gramner
Date: Sat Apr 7 00:40:09 2012 +0200

Faster chroma weight cost calculation

New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.

git-id : cce88ebc9e517b0fa8735b81ac30b4e6a79c8154
revision : r2195
Author : Lucien
Date: Sat Mar 31 13:42:49 2012 +0100

Add Level 5.2 support

git-id : ee30c84e38b30896ffa6ddc417f3b4c281a86d1a
revision : r2194
Author : Henrik Gramner
Date: Thu Apr 12 19:14:43 2012 +0200

Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.

git-id : 8ca49cc5c40813d8b98544989eb684e167b06aa0
revision : r2193
Author : Anton Mitrofanov
Date: Tue Apr 3 21:46:52 2012 +0400

Fix disabling of mbtree when using 2pass encoding and zones

git-id : 3691332c0b33a68f9d6f519edaa2b848ed34a38c
revision : r2192
Author : Alexander Prikhodko
Date: Sat Mar 31 12:06:21 2012 +0300

configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.

git-id : e1ccbf9bb3abdd25d3f0c76682926ec49f3f8001
revision : r2191
Author : Rafaël Carré
Date: Sun Apr 15 21:20:14 2012 -0400

Update config.guess and config.sub
Adds support for a bunch of targets, including:
aarch64 (armv8)

git-id : f87619768dba73c1effbcfb08875d096575e079e
revision : r2190
Author : Alexander Prikhodko
Date: Sat Mar 31 11:33:41 2012 +0300

configure: correct use of RC variable and add --extra-rcflags

git-id : 35cf912671fddcb3e701bf667a75f77dd8b28264
revision : r2189
Author : Steven Walters
Date: Wed Mar 28 21:15:04 2012 -0400

ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport.
This does not fix x264 cli being unable to use a shared library built by ICL however.

git-id : 259a6e57ae25c71acc1669e0aefde7ffe7e235ec
revision : r2188
Author : Kieran Kunhya
Date: Tue Mar 27 17:38:56 2012 +0100

Fix intra-refresh + hrd

git-id : e0351cdfeb45bf7f891eeb1dc475292154bb9d82
revision : r2187
Author : Anton Mitrofanov
Date: Sun Mar 25 17:34:24 2012 +0400

Fix frame input colorspace check

git-id : 7392c8c31f791e9b4c10e4959f8715c8a8233d25
revision : r2186
Author : Jason Garrett-Glaser
Date: Thu Mar 22 13:56:50 2012 -0700

Fix comment in deblock.c
The code does, in fact, handle CAVLC+8x8dct correctly already.

git-id : 6979713216d792e44e3cbaeeba74b455e0a07c62
revision : r2185
Author : Jason Garrett-Glaser
Date: Tue Mar 13 14:37:26 2012 -0700

Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NaNs (not to mention less accurate results).
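
To show why the two quantities are not interchangeable, here is a small sketch of the relationship between qp and qscale. It assumes the 8-bit mapping qscale = 0.85 * 2^((qp - 12) / 6); the function names mirror that mapping but the code is illustrative, not taken from the patch.

```python
import math


def qp2qscale(qp):
    """Convert a quantizer parameter to the linear quantizer scale (assumed 8-bit mapping)."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 6.0)


def qscale2qp(qscale):
    """Inverse of qp2qscale; only defined for qscale > 0."""
    return 12.0 + 6.0 * math.log2(qscale / 0.85)
```

Under this mapping the scale is exponential in qp, so passing one where the other is expected gives values that are off by a wide margin.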

git-id : 5c85e0a2b7992fcaab09418e3fcefc613cffc743
revision : r2184
Author : Anton Mitrofanov
Date: Sun Mar 11 23:08:18 2012 -0700

Fix clobbering of mutex/cvs
Regression in r2183.
Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.

git-id : c522ad1fed167d0e985e4f9dcdee042473cf74db
revision : r2183
Author : Jason Garrett-Glaser
Date: Fri Feb 24 13:34:39 2012 -0800

Sliced-threads: do hpel and deblock after returning
Lowers encoding latency around 14% in sliced threads mode with preset superfast.
Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (singlethreaded) lookahead.
For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.

git-id : 6a27a481d4c3508ce778a61a139a4734bb8126f7
revision : r2182
Author : Jason Garrett-Glaser
Date: Fri Feb 24 13:16:52 2012 -0800

Add full-recon API option
Fully reconstruct frames even without dump-yuv.

git-id : e856755d2a67f45249c24cb51aa38fc4fa192321
revision : r2181
Author : Jason Garrett-Glaser
Date: Wed Feb 22 13:33:36 2012 -0800

x86inc: switch to amdnops
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes).
Won't affect much, since we don't use ALIGN much.

git-id : 5a242c5862baaa4bd5829bd1b43dc11cf5c86344
revision : r2180
Author : Jason Garrett-Glaser
Date: Tue Feb 14 16:54:03 2012 -0800

BMI1 decimate functions
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible.
This means we don't actually have to add new functions to make it work.

git-id : ac31c59a98c6c690894670b9c9af2612f799d85b
revision : r2179
Author : Jason Garrett-Glaser
Date: Tue Feb 14 15:07:10 2012 -0800

Minor asm changes

git-id : 83561e55dde06f3247aa9b99fa62ead38d7a406e
revision : r2178
Author : Jason Garrett-Glaser
Date: Thu Feb 9 14:23:52 2012 -0800

Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs).
Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs).
Still inaccurate with sliced threads, but better than before.

git-id : 5a69f8e105663497794d4bb4e58cf7baa5cd29cb
revision : r2177
Author : Jason Garrett-Glaser
Date: Thu Feb 9 12:38:44 2012 -0800

Abstract bitstream backup/restore functions
Required for row re-encoding.

git-id : 037d123cf62c4af2dc13742b8606882b6d0d3d9e
revision : r2176
Author : Anton Mitrofanov
Date: Thu Feb 9 15:27:53 2012 -0800

Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs.
One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch), no cost penalty gets factored in for this, because anything times zero is zero.

git-id : de5a0adca1a7d08b1233b317ec092dbf19263d2f
revision : r2175
Author : Jason Garrett-Glaser
Date: Mon Feb 13 18:31:51 2012 -0800

Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.

git-id : 9f1ac3b36eb2666e9d2ec4b859f3b63f60827bf0
revision : r2174
Author : Jason Garrett-Glaser
Date: Mon Feb 13 13:20:06 2012 -0800

Export PSNR/SSIM in x264 API

git-id : 7e85ec036df4290697239f5dc9f4a793313ceebc
revision : r2173
Author : Ronald S. Bultje
Date: Wed Feb 8 13:10:31 2012 -0800

x86inc: support yasm -f win64
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.

git-id : 02c3d5ec58d6bcbc5e22715ae80d53d8556f3c8f
revision : r2172
Author : Henrik Gramner
Date: Wed Feb 1 23:52:48 2012 +0100

Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
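
The failure mode can be modeled with plain integers. In this sketch a 64-bit register holds a 32-bit "int" in its low half and arbitrary bits in its high half; the helper names and the example values are hypothetical, and sign_extend_32 stands in for what movsxd does.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(reg):
    """Model of movsxd: interpret the low 32 bits of a register as a signed int."""
    low = reg & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def address_raw(base, reg):
    """Buggy model: uses the whole 64-bit register as the offset."""
    return (base + reg) & MASK64


def address_fixed(base, reg):
    """Fixed model: extends the 32-bit value before using it as the offset."""
    return (base + sign_extend_32(reg)) & MASK64
```

For example, with a stride of 16 stored under garbage high bits (0xDEADBEEF00000010), address_fixed(0x1000, ...) yields 0x1010 while address_raw points somewhere else entirely. A negative stride of -16 stored as 0x00000000FFFFFFF0 shows the same problem even when the high half is zero.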

git-id : 01f7a333e6c6a6d91a7fe977b491a448ddf4c117
revision : r2171
Author : Jason Garrett-Glaser
Date: Thu Feb 23 09:11:23 2012 -0800

Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.

git-id : b17c247178a24c218843639c3f46bcfde0edab0a
revision : r2170
Author : Anton Mitrofanov
Date: Tue Feb 21 12:58:22 2012 -0800

Fix rare overflow in 10-bit intra_satd_x3_16x16 asm

git-id : 1446fe7c47cf660d764b4cbf53694bc3df9b04de
revision : r2169
Author : Steven Walters
Date: Sat Feb 11 22:56:43 2012 -0500

ICL: fix out of tree building and resource file usage on Windows

git-id : d3efb00abbedd2bbb70156bd989beefe06468116
revision : r2168
Author : Oka Motofumi
Date: Mon Feb 6 06:07:34 2012 +0900

Add error handling for out-of-tree build

git-id : ec41b19edc67ee4eca09c0e3b37e6290844c5e1f
revision : r2167
Author : Anton Mitrofanov
Date: Tue Mar 6 17:34:02 2012 +0400

Fix RGB colorspace input
BGR/BGRA input was correct.

git-id : 39a4c6fecaaa0d6cde8d89d31ef6cd1d25ab802b
revision : r2166
Author : Jason Garrett-Glaser
Date: Mon Feb 13 16:40:32 2012 -0800

Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.

git-id : 3f72c99a15a07511b758d9e94217223480865124
revision : r2165
Author : Henrik Gramner
Date: Sun Feb 5 20:43:09 2012 +0100

Fix regression in r2141
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv.
Did not cause any problems.

git-id : da19765d723b06a1fa189478e9da61a1c18490f8
revision : r2164
Author : Jason Garrett-Glaser
Date: Thu Jan 19 14:56:54 2012 -0800

TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver.
The others (and BMI1) will probably appear in Intel's upcoming Haswell.
Also update x86inc with AVX2 stuff.

git-id : efef20090a06a38f9d95755588d7830fb92a2a02
revision : r2163
Author : Loren Merritt
Date: Fri Feb 3 06:27:18 2012 +0000

x86inc: add TAIL_CALL macro to abstract a common asm idiom

git-id : a7e6e1793b4d2b49c9449d767320c71daa855cb6
revision : r2162
Author : Jason Garrett-Glaser
Date: Wed Jan 25 16:44:38 2012 -0800

Minor asm optimizations/cleanup

git-id : 56ba096141d16ffcbabd805e2d27014f62f0d722
revision : r2161
Author : Jason Garrett-Glaser
Date: Tue Jan 24 19:03:58 2012 -0800

Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.

git-id : 961a278e0123eb662b46a6f136a48a43f6a2d427
revision : r2160
Author : Jason Garrett-Glaser
Date: Mon Jan 23 18:57:58 2012 -0800

XOP frame_init_lowres
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.

git-id : c5809994990df6c63b4250546844dc77181fee0f
revision : r2159
Author : Jason Garrett-Glaser
Date: Tue Jan 17 15:25:10 2012 -0800

XOP 8x8 zigzags
Field: 35(mmx)   -> 16(xop) cycles
Frame: 32(ssse3) -> 20(xop) cycles

git-id : 14dc11f7c52fa29576e0003c8c16857a78bf5fbf
revision : r2158
Author : Jason Garrett-Glaser
Date: Mon Jan 23 15:09:38 2012 -0800

AVX 32-bit hpel_filter_h
Faster on Sandy Bridge.
Also add details on unsuccessful optimizations in these functions.

git-id : 2fcd0446b5d91ae52e143682c30000a49441e4a1
revision : r2157
Author : Jason Garrett-Glaser
Date: Fri Jan 27 16:29:30 2012 -0800

x86inc: add high halfword register support
Might be useful in a few cases.

git-id : 5c4b8484ea9aaabfb70523ba1f9c4d8343ad3221
revision : r2156
Author : Ronald S. Bultje
Date: Wed Jan 25 13:53:59 2012 +0800

Change %ifdef directives to %if directives in *.asm files
This allows combining multiple conditionals in a single statement.

git-id : 1b558de42dc08a303c2faf79fc9999b48a876370
revision : r2155
Author : Anton Mitrofanov
Date: Sun Jan 22 22:13:52 2012 +0400

Use TV range algorithm for bit-depth conversions
Such sources are more common, so better to be correct for the common case.
This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
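
The difference between the two approaches can be sketched as follows. This assumes the simplest form of each conversion (a power-of-two shift for TV range, a rescale of the maximum code value for full range); the function names and the rounding are illustrative and the filter's real implementation may differ, for instance by dithering.

```python
def convert_tv_range(value, in_depth, out_depth):
    """Power-of-two rescale, which maps TV-range levels exactly."""
    if out_depth >= in_depth:
        return value << (out_depth - in_depth)
    shift = in_depth - out_depth
    # Round to nearest, then clamp to the output maximum.
    return min((value + (1 << (shift - 1))) >> shift, (1 << out_depth) - 1)


def convert_full_range(value, in_depth, out_depth):
    """Rescale so that the maximum input code maps to the maximum output code."""
    in_max = (1 << in_depth) - 1
    out_max = (1 << out_depth) - 1
    return (value * out_max + in_max // 2) // in_max
```

With these definitions, TV-range black and white (16 and 235 in 8-bit) land on 64 and 940 in 10-bit.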

git-id : 83c371deba853a4ebb28739e868df86b3153fb3e
revision : r2154
Author : Hii
Date: Wed Jan 25 16:29:22 2012 +0800

Bump dates to 2012

git-id : a2925c5a707e833c34fa0a64d497c02e6dcfe6e6
revision : r2153
Author : Henrik Gramner
Date: Sat Jan 28 21:38:27 2012 +0100

Add Windows resource file
Displays version info in Windows Explorer.

git-id : 98ade832d053f6bfca4d0dd2ab0cd1c88531721d
revision : r2152
Author : Sergey Radionov
Date: Mon Jan 16 13:22:44 2012 -0800

Fix win32 pthread_cond_signal
Isn't used by x264 currently, so didn't cause a problem.
Fix backported from libav.

git-id : a3f44077dc238dea92c0894d352b5a8723b9201b
revision : r2151
Author : Mans Rullgard
Date: Wed Feb 1 15:55:25 2012 -0800

ARM: align asm functions to 4 bytes.
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.

git-id : d3a39c92f5c130cad6d45e9daffa5a2beb145ebb
revision : r2150
Author : Anton Mitrofanov
Date: Sun Jan 22 13:00:23 2012 +0400

Fix normalization of colorspace when input is packed YUV 4:2:2

git-id : 236763e39d8a756db0e8179745396ed88c1bfc2d
revision : r2149
Author : Jason Garrett-Glaser
Date: Sat Jan 21 12:54:40 2012 -0800

Force keyint-min 1 with Blu-ray
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.

git-id : 8a1189abd1355c4cf6f786fbc2a4b8c22f398710
revision : r2148
Author : Oka Motofumi
Date: Sun Jan 29 20:34:41 2012 +0900

Fix crash in --demuxer y4m with unsupported colorspace

git-id : 1ab0877a40417a2f4f26ff0356e8b02182d9d996
revision : r2147
Author : Anton Mitrofanov
Date: Mon Jan 16 14:02:53 2012 -0800

Fix overread/possible crash with intra refresh + VBV

git-id : bcd41dbcaa4430b2118d9f6828c2b9635cf9d58d
revision : r2146
Author : Loren Merritt
Date: Wed Jan 18 15:47:07 2012 -0800

Fix trellis 2 + subme >= 8
Trellis didn't return a boolean value as it was supposed to.
Regression in r2143-5.

git-id : 748fe16c1303b89d2a1d0378addd83fb4198f51a
revision : r2145
Author : Loren Merritt
Date: Fri Jan 6 15:53:29 2012 +0000

CABAC trellis opts part 4: x86_64 asm
Another 20% faster.
18k->12k codesize.

This patch series may have a large impact on encoding speed.
For example, 24% faster at --preset slower --crf 23 with 720p parkjoy.
Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).

git-id : cfdb36ece729209631f7213506685ae733d7f5d4
revision : r2144
Author : Loren Merritt
Date: Fri Jan 6 15:53:04 2012 +0000

CABAC trellis opts part 3: make some arrays non-static

git-id : 65bd12ae875a768a06b67ec6297dec18323e0768
revision : r2143
Author : Loren Merritt
Date: Thu Dec 22 17:56:06 2011 +0000

CABAC trellis opts part 2: C optimizations

Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to lookup zigzag[] anyway.

60% faster on x86_64.
25k->18k codesize.

git-id : e176619d010fc32c970c7ab7a769bbfbe2665f61
revision : r2142
Author : Loren Merritt
Date: Thu Dec 22 17:55:06 2011 +0000

CABAC trellis opts part 1: minor change in output
Due to different tie-break order.

git-id : 4e87f36a0e1a78242f04db611e06f80b6b38d900
revision : r2141
Author : Henrik Gramner
Date: Sun Jan 8 04:14:10 2012 +0100

x86inc improvements for 64-bit

Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

git-id : 84a06e611aff1267a720bf9552b3bcf263bd83b5
revision : r2140
Author : Ilia Valiakhmetov
Date: Sun Jan 15 04:47:58 2012 -0600

High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.

git-id : c605e3174410ba5c7d1d0a777082e2397734d637
revision : r2139
Author : Edward Wang
Date: Wed Jan 4 15:35:54 2012 -0800

MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.

git-id : 6b06f6d3f7f800dca1a4ea154f54427d5b3cea2b
revision : r2138
Author : Jason Garrett-Glaser
Date: Thu Dec 22 14:03:15 2011 -0800

XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes. About a dozen cycles faster for 16x16.

git-id : c4b54c83629bb92af6c4836a8859e9432dc7333a
revision : r2137
Author : Cristian Militaru
Date: Wed Jan 4 12:38:08 2012 -0800

High bit depth intra_sad_x3_4x4
From Google Code-In.

git-id : c032fbaa3801fb4cf8dd1dd95a6479ca5bd262e2
revision : r2136
Author : Jason Garrett-Glaser
Date: Thu Dec 8 13:45:41 2011 -0800

Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing.
Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).

git-id : ebb1429e2d24f57aa4ea75284386a15f2eab553e
revision : r2135
Author : Matt Habel
Date: Fri Dec 16 23:16:09 2011 -0800

High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.

git-id : 6d921c5bdefae1a733a3a4c29d88ea15fcece76e
revision : r2134
Author : Shitiz Garg
Date: Sat Dec 3 15:34:57 2011 -0800

MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.

git-id : 47cdaa9c3d8197d4deb711d9bcc4af869ef8a426
revision : r2133
Author : Aaron Schmitz
Date: Wed Nov 30 00:15:45 2011 -0600

Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions.
From Google Code-In.

git-id : 027b05e0a22421e477847506a205a49b151ae5bf
revision : r2132
Author : George Stephanos
Date: Thu Dec 1 16:53:45 2011 -0800

More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu.
From Google Code-In.

git-id : 658a3585b74f77fd8f78588f3f39e0abefb104c4
revision : r2131
Author : Ilia
Date: Mon Nov 28 05:20:09 2011 -0800

More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422.
Regular and high bit depth versions of deblock_h_chroma_intra_422.
High bit depth pixel_vsad.
SSE2 high bit depth and MMX 8-bit predict_8x8_vl.
Our first GCI patch this year!

git-id : 978abe065737089913feccffece483bc69a9e5b0
revision : r2130
Author : Henrik Gramner
Date: Thu Dec 8 16:14:35 2011 +0100

SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc

git-id : 61a78a1b417595c4b5d7ef6831692904a243a9fc
revision : r2129
Author : Steven Walters
Date: Mon Dec 5 08:46:34 2011 -0500

Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format.
Fix deprecated use of av_set_int.
Now requires libavutil >= 51.19.0

git-id : bc6c98cf4f76c779c8c07f43aa97ac29b1150bc0
revision : r2128
Author : Oka Motofumi
Date: Thu Jan 5 14:23:50 2012 -0800

Add out-of-tree build support

git-id : f33c8cb0f8fff7a83100b3e9d15baba53c6f6a35
revision : r2127
Author : Anton Mitrofanov
Date: Fri Dec 16 18:17:00 2011 +0400

Limit SSIM to 100dB
Avoids floating point error for infinite SSIM (lossless).
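
A minimal sketch of the cap, assuming SSIM is reported in decibels as -10 * log10(1 - ssim). The function name and the 1e-10 threshold are assumptions, chosen so that the cutoff coincides with 100 dB.

```python
import math


def ssim_db(ssim, cap=100.0):
    """Convert an SSIM value in [0, 1] to decibels, capped for (near-)lossless input."""
    inv_ssim = 1.0 - ssim
    if inv_ssim <= 1e-10:
        # log10(0) is undefined, so report the cap instead of infinity.
        return cap
    return -10.0 * math.log10(inv_ssim)
```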

git-id : c0d698859c36be611d465f968762f042853be817
revision : r2126
Author : Reynaldo H. Verdejo Pinochet
Date: Wed Jan 4 13:16:12 2012 -0300

Fix wrong conditional inclusion of inttypes.h
inttypes.h is required by encoder/ratecontrol.c for SCNxxx macros, and HAVE_STDINT_H does not imply having inttypes.h.
stdint.h is a subset of inttypes.h, but this isn't enough for x264.
This change fixes building x264 with Android's toolchain.

git-id : b081d179e741ceffee2217f6fda06779693dce56
revision : r2125
Author : Anton Mitrofanov
Date: Wed Dec 21 11:08:56 2011 +0400

Fix crash with sliced threads and input height <= 112

git-id : 64da5f9df46ac33a5a6b56ca1510d2082e6fbb62
revision : r2124
Author : Phillip Blucas
Date: Mon Dec 19 17:43:41 2011 -0600

Fix loading custom 8x8 chroma quant matrices in 4:4:4

git-id : 4c08e42504af81cdbe5789a309e868ca8eda2c1f
revision : r2123
Author : Anton Mitrofanov
Date: Fri Dec 16 01:48:07 2011 +0400

Fix PCM cost overflow

git-id : 489a9b2d04c4828877930d2a9104ce93dde8cb85
revision : r2122
Author : Anton Mitrofanov
Date: Fri Dec 9 01:54:22 2011 +0400

Fix overflow in 8-bit x86 vsad asm function

git-id : c291a9d09263708e9d9f02e28f8442fdbe46bb06
revision : r2121
Author : Anton Mitrofanov
Date: Wed Dec 7 19:14:52 2011 +0400

Fix crash in --fullhelp when compiled against recent ffmpeg
Don't assume all pixel formats have a description.

git-id : 0c7dab9c2a106ce3ee5d6ad7282afb49e1cc3954
revision : r2120
Author : Jason Garrett-Glaser
Date: Tue Dec 6 14:39:21 2011 -0800

Fix regression in r2118
Broke trellis with i16x16 macroblocks.

git-id : 0637cd67cb245fce5ba190fa4b9c341319ea2b37
revision : r2119
Author : Jason Garrett-Glaser
Date: Wed Nov 30 13:02:12 2011 -0800

Modify MBAFF chroma deblock functions to handle U/V at the same time
Allows for more convenient asm implementations.

git-id : 67f1fdc4d9c030568eac8cf9ab9d0bb249f520db
revision : r2118
Author : Jason Garrett-Glaser
Date: Thu Nov 10 16:16:13 2011 -0800

CABAC trellis optimizations: use SIMD quant
Significant speed increase, minor change in output due to rounding.

git-id : e047b3c475cd42b6647397a244e239ebfca53bf6
revision : r2117
Author : Steven Walters
Date: Sun Nov 6 09:48:30 2011 -0800

YUV range detection and support for x264CLI
Two new options: --input-range and --range.
--input-range forces the range of the input in case of misdetection; auto by default.
--range sets the range of the output; x264cli will convert if necessary, TV by default.
--fullrange is now removed as a CLI option (but the libx264 API is unchanged).

git-id : 00df989cc06208050230756525633438d76b5a6a
revision : r2116
Author : Kieran Kunhya
Date: Fri Nov 4 20:09:13 2011 +0000

Pass through user data

git-id : 04a0aeefd2f5b152c5dbca4a1c6569bd27c9f721
revision : r2115
Author : Jason Garrett-Glaser
Date: Thu Oct 27 14:05:56 2011 -0700

Remove unpredictable branch in CABAC dqp

git-id : 4185ee883b04d9cee57a64fdebd153830b7b27ba
revision : r2114
Author : Loren Merritt
Date: Sun Oct 23 23:15:11 2011 +0000

x86inc: AVX symmetry optimization
3-arg AVX ops with a memory arg can only have it in src2,
whereas SSE emulation of 3-arg prefers to have it in src1 (i.e. the move).
So, if the op is symmetric and the wrong one is memory, swap them.
Eliminates redundant moves in some cases when using 3-operand without AVX with memory arguments.
Also fix movss and movsd in some cases, and flag shufps correctly as float.

git-id : cc129adcaaf5604f3d4fea9ebcb289403192a741
revision : r2113
Author : Anton Mitrofanov
Date: Tue Nov 29 13:45:13 2011 -0800

checkasm: shut up gcc warnings, fix some naming of functions in results

git-id : f0ccc98bb747b8ee0fe9329f4205cf382788bb89
revision : r2112
Author : Mans Rullgard
Date: Mon Nov 28 16:29:12 2011 -0800

checkasm: fix build on ARM
Because of how ALIGNED_ARRAY_16 is defined on ARM, array initialisers cannot be used here. Use memset() instead.

git-id : d8d8e756b1fee72b4771761d6aa4cfb31edc0b67
revision : r2111
Author : Anton Mitrofanov
Date: Sat Nov 12 01:31:49 2011 +0400

Improve makefile rules
Remove the need for "make clean" after most reconfigures.

git-id : e6d33a931c08918e78dcae97e4d80d0c3411bf2c
revision : r2110
Author : Anton Mitrofanov
Date: Sat Nov 12 00:47:48 2011 +0400

Mark some local functions as static, cosmetics

git-id : e0c11dc6e283569606aaa97767401c6a13c2529d
revision : r2109
Author : Anton Mitrofanov
Date: Fri Nov 11 23:19:02 2011 +0400

Fix crash if timecode file opening fails

git-id : a14db080c3fdba4cadc38152a292bb1fa216d50e
revision : r2108
Author : Fabian Greffrath
Date: Fri Nov 11 13:25:43 2011 -0800

Configure: force PIC for shared build on PARISC and MIPS

git-id : 6a0bd421bf5fd006012ddcd1be2072a8736b2d27
revision : r2107
Author : Anton Mitrofanov
Date: Sat Oct 22 19:41:07 2011 +0400

Improve yasm version check
Previous check allowed certain earlier versions that weren't fully compatible.

git-id : 07efeb45db224b7757880d4d63bb549fb454f6db
revision : r2106
Author : Jason Garrett-Glaser
Date: Tue Oct 18 14:30:26 2011 -0700

Add fenc prefetching to adaptive quant
Many fewer cache misses, faster adaptive quant.

git-id : 81a99842b76834c11a46438f354d7f2a9e89752a
revision : r2105
Author : Jason Garrett-Glaser
Date: Tue Oct 18 14:14:03 2011 -0700

Split prefetch_fenc between colorspaces
Add 4:2:2 version.

git-id : 9f872e137c16e8ee0a46d8ca00ac5d670c219d5f
revision : r2104
Author : Jason Garrett-Glaser
Date: Tue Oct 11 17:04:32 2011 -0700

Some more 4:2:2 x86 asm
coeff_last8, coeff_level_run8, var2_8x16, predict_8x16c_dc, satd_4x16, intra_mbcmp_8x16c_x3, deblock_h_chroma_422

git-id : f52aa86c184d69b4e97b0f63f5f27166b19aa280
revision : r2103
Author : Loren Merritt
Date: Tue Oct 11 18:12:43 2011 +0000

Remove obsolete versions of intra_mbcmp_x3
intra_mbcmp_x3 is unnecessary if x9 exists (SSSE3 and onwards).

git-id : 2f0384dcd68bb85f98fb566b70b863b40082c83e
revision : r2102
Author : Loren Merritt
Date: Mon Oct 10 05:42:36 2011 +0000

SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sa8d_x9)
x86_64 only for now, due to register requirements (like sa8d_x3).

i8x8 analysis cycles (per partition):
penryn    sandybridge  bulldozer
616->600  482->374     418->356   preset=faster
892->632  725->387     598->373   preset=medium
948->650  789->409     673->383   preset=slower

git-id : 46d1f3ab24e8aead7ccb3f89a382e7c92721ba96
revision : r2101
Author : Jason Garrett-Glaser
Date: Fri Sep 30 19:09:19 2011 -0700

SSSE3/SSE4/AVX 9-way fully merged i8x8 analysis (sad_x9)
~3 times faster than current analysis, plus (like intra_sad_x9_4x4) analyzes all modes without shortcuts.

git-id : 077d4532c9d9c7914e31ef9250096cc379042bcb
revision : r2100
Author : Loren Merritt
Date: Wed Oct 5 13:29:21 2011 -0700

Merge i4x4 prediction with intra_mbcmp_x9_4x4
Avoids a redundant prediction after analysis.

git-id : 55a9d38348a1d0bee687293e194b018b21a6ad96
revision : r2099
Author : Jason Garrett-Glaser
Date: Wed Oct 5 13:17:31 2011 -0700

Inline i4x4/i8x8 encode into intra analysis
Larger code size, but faster.

git-id : afd9bc24823b0f9f0727c0332a0db24db66876d2
revision : r2098
Author : Jason Garrett-Glaser
Date: Wed Sep 21 17:12:10 2011 -0700

Initial XOP and FMA4 support on AMD Bulldozer
~10% faster Hadamard functions (SATD/SA8D/hadamard_ac) plus other improvements.

git-id : 8cf50493e5b80d9e33aaf0c9d55551d6411e1be4
revision : r2097
Author : Mans Rullgard
Date: Tue Sep 27 21:14:14 2011 +0400

ARM: update NEON chroma deblock functions to NV12 pixel format

git-id : f7e640a33fe66838ecece1da267b566342f3be24
revision : r2096
Author : Sean McGovern
Date: Mon Oct 17 12:45:15 2011 -0700

Add /usr/lib/{64/}values-xpg6.o to $LDFLAGS on Solaris
This is required for POSIX.1-2001 compliance.

git-id : a0ce295b33a1ba87f732e661e22dba1a307e3405
revision : r2095
Author : Sean McGovern
Date: Mon Oct 17 12:44:03 2011 -0700

Fix linker test for -Bsymbolic
The Solaris linker only accepts -Bsymbolic for objects compiled in dynamic mode (i.e. shared objects), so pass -shared to gcc.
Additionally, for x86_32 unresolved textrels cause a linker error so mark the .text section as 'impure'.

git-id : 9aa0f65f72514cfa8c478fbffdafd937c70b5f5d
revision : r2094
Author : Sean McGovern
Date: Mon Oct 17 12:43:28 2011 -0700

Add $SOFLAGS to exported SOFLAGS make variable

git-id : d32d091d519c5ff710b2fb7b2f255fd510e4a6d8
revision : r2093
Author : Henrik Gramner
Date: Sat Sep 24 15:56:08 2011 +0200

Allow setting a chroma format at compile time
Gives a slight speed increase and significant binary size reduction when only one chroma format is needed.

git-id : 6eac7c35a5da6c176cedc2644c53ff9d019f7fb0
revision : r2092
Author : Harfe Leier
Date: Fri Sep 30 12:49:33 2011 -0700

Improve profile help
List high422/high444 profiles, and don't show non-high-bit-depth profiles in high bit depth builds.

git-id : 896fff46dd9a0fba9bb5285d536d03e0d5f86da2
revision : r2091
Author : Yusuke Nakamura
Date: Thu Oct 20 03:09:51 2011 +0900

Fix infinite loop parsing TDecimate Mode 3 timecode v1 files

git-id : 2697313a6f223f0b270ba9533c6b47967fa7d246
revision : r2090
Author : Jason Garrett-Glaser
Date: Mon Oct 10 17:44:31 2011 -0700

Fix some integer overflows/signedness errors found by IOC
The only real bug here is in slicetype.c, which may or may not affect real encodes.

git-id : d2594831dd858d6ed8efcfd4160ea5ac7f1357c7
revision : r2089
Author : Jason Garrett-Glaser
Date: Wed Oct 12 09:16:32 2011 -0700

Fix pixel_var2 with 4:2:2 encoding
Might have caused artifacts or suboptimal chroma compression.

git-id : 1fe87df5857266f0099a473962e7f32a89d9b701
revision : r2088
Author : Anton Mitrofanov
Date: Sun Oct 9 19:14:16 2011 +0400

Fix chroma intra analysis in 4:4:4 lossless mode

git-id : c4644d878dc82f8812482f660f651948d53d4b43
revision : r2087
Author : Anton Mitrofanov
Date: Sun Oct 9 01:13:29 2011 +0400

Fix use of uninitialized MVs in sub8x8 RDO

git-id : f8825a4a6f827bb28fffb75a7cc1a6c386088828
revision : r2086
Author : Fabian Greffrath
Date: Fri Oct 7 19:04:17 2011 -0700

Fix detection of Alpha CPU arch on alphaev67

git-id : 8a62835b0b669e79c75b6522b1f7632fe16105d9
revision : r2085
Author : Jason Garrett-Glaser
Date: Wed Sep 14 14:53:04 2011 -0700

Optimize x86 asm for Intel macro-op fusion
That is, place all loop counter tests right before their conditional jumps.

git-id : f7cd45b9bbc1f7f5bfd2df6421e79895655552ab
revision : r2084
Author : Jason Garrett-Glaser
Date: Mon Sep 12 11:51:23 2011 -0700

CAVLC: clean up and restructure
Somewhat faster CAVLC and RD bit-counting.

git-id : 4a89c200a2c50f17bcf657f3254f6f05b2c0df41
revision : r2083
Author : Jason Garrett-Glaser
Date: Thu Sep 8 17:27:02 2011 -0700

CABAC: clean up and restructure
Somewhat faster CABAC and RD bit-counting.

git-id : 62fc472989765a6bea4485c8988d7b246e7ceeb5
revision : r2082
Author : Jason Garrett-Glaser
Date: Sun Sep 4 11:31:29 2011 +0200

Some initial 4:2:2 x86 asm

git-id : bb9216dc319a39eb6f2a5508a98e36d6492ffa7e
revision : r2081
Author : Henrik Gramner
Date: Fri Aug 26 15:57:04 2011 +0200

4:2:2 encoding support

git-id : b7fa2ff50ef74eb8a27e675f8e418754965115e2
revision : r2080
Author : Loren Merritt
Date: Mon Aug 15 18:18:55 2011 +0000

SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)

i4x4 analysis cycles (per partition):
penryn    sandybridge
184-> 75  157-> 54     preset=superfast (sad)
281->165  225->124     preset=faster (satd with early termination)
332->165  263->124     preset=medium
379->165  297->124     preset=slower (satd without early termination)

This is the first code in x264 that intentionally produces different behavior
on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
directions, whereas the old code (on fast presets) may early terminate after
checking only some of them. There is no systematic difference on slow presets,
though they still occasionally disagree about tiebreaks.

For ease of debugging, add an option "--cpu-independent" to disable satd_x9
and any analogous future code.
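
The selection rule described above can be sketched roughly as follows. This is an illustration only, not x264's actual dispatch code: the function name, the flag strings, and the "legacy" label are all hypothetical.

```python
def select_i4x4_analysis(cpu_flags: set, cpu_independent: bool) -> str:
    """Pick the i4x4 analysis path (hypothetical names, not x264's API).

    The merged 9-way path is used only when the CPU reports SSSE3 and
    the user has not asked for CPU-independent output.
    """
    if "ssse3" in cpu_flags and not cpu_independent:
        return "satd_x9"  # checks all nine intra directions
    return "legacy"       # older path, may early-terminate on fast presets


# With --cpu-independent, an SSSE3 machine falls back to the older path,
# so its output can be compared against a machine without SSSE3.
print(select_i4x4_analysis({"sse2", "ssse3"}, cpu_independent=False))
print(select_i4x4_analysis({"sse2", "ssse3"}, cpu_independent=True))
print(select_i4x4_analysis({"sse2"}, cpu_independent=False))
```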

git-id : b1bb23c7570bded53b698800e48c755240d4aa35
revision : r2079
Author : Loren Merritt
Date: Mon Aug 15 17:43:42 2011 +0000

Faster intra_mbcmp_x3 for versions without dedicated asm
Select asm subroutines more intelligently in the wrapper functions.

git-id : 265cfae4441ffe76cf6885e5a2448b945deb9b0c
revision : r2078
Author : Loren Merritt
Date: Sat Aug 13 19:01:22 2011 +0000

Optimize x86 intra_predict_4x4 and 8x8

High bit depth Penryn, Sandybridge cycles:
4x4_ddl:  11->10,  9-> 8
4x4_ddr:  15->13, 12->11
4x4_hd:         , 15->12
4x4_hu:         , 14->13
4x4_vr:   15->14, 14->12
8x8_ddl:  32->19, 19->14
8x8_ddr:  42->19, 21->14
8x8_hd:         , 15->13
8x8_hu:   21->17, 16->12
8x8_vr:   33->19,

8-bit Penryn, Sandybridge cycles:
4x4_ddr:  24->15,
4x4_hd:   24->16,
4x4_hu:   23->15,
4x4_vr:   23->16,
4x4_vl:   10-> 9,
8x8_ddl:  23->15,
8x8_hd:         , 17->14
8x8_hu:         , 15->14
8x8_vr:   20->16, 17->13

git-id : fe5729ddfc8e574f27d8678b992d4e356bab84c9
revision : r2077
Author : Loren Merritt
Date: Sat Aug 13 06:44:28 2011 +0000

Use realistic alignment for intra pred benchmarks in checkasm

git-id : 926a03a9c1f48d0fbd54b0e802d740774c100a78
revision : r2076
Author : Yusuke Nakamura
Date: Wed Sep 21 01:15:38 2011 +0900

Fix frame packing SEI with --frame-packing 0
According to the spec, when frame_packing_arrangement_type is equal to 0, quincunx_sampling_flag shall be equal to 1.

git-id : 03a542a6ca08ba3f96d9e1bf0e36fa21dc9e7762
revision : r2075
Author : Oka Motofumi
Date: Mon Sep 5 11:50:37 2011 +0900

Fix install/uninstall shared libs if SYS is WINDOWS/CYGWIN

git-id : 2641b9e20f5effc4e22ca600df00c79e5f60b446
revision : r2074
Author : Reinhard Tartler
Date: Wed Aug 10 00:16:46 2011 -0700

Add Hurd support to configure

git-id : dabc19ec94f56660fac10aae60da2b711e84368f
revision : r2073
Author : Loren Merritt
Date: Sat Aug 13 00:39:35 2011 +0000

Optimize x86 intra_satd_x3_*
~7% faster.

git-id : 419f1bf97acffc3c9a943d53cc67a7b282b02cce
revision : r2072
Author : Loren Merritt
Date: Fri Aug 12 19:13:07 2011 +0000

Optimize x86 intra_sa8d_x3_8x8
~40% faster.
Also some other minor asm cosmetics.

git-id : b581755785751a4c05db0dcb020d5daff278cfe0
revision : r2071
Author : Loren Merritt
Date: Fri Aug 12 02:15:46 2011 +0000

Scale interlaced refs/mvs for mvr predictors
Slightly improves compression and fixes a Valgrind error.

git-id : 0c2ddc1748f582333b9afcf097b9eb0cd99c8dff
revision : r2070
Author : Loren Merritt
Date: Thu Aug 11 15:03:12 2011 +0000

Optimize predict_8x8_filter and incidentally remove a valgrind false-positive

git-id : afeb24049e5dc3c88dc9aceb18be4ec3897476f7
revision : r2069
Author : Anton Mitrofanov
Date: Mon Aug 15 12:22:18 2011 +0400

Don't override flat SSE2 dequant functions with non-flat AVX ones
Slightly faster.

git-id : d89a7d5dd4496e38da657879574b4eb3fbde5071
revision : r2068
Author : Loren Merritt
Date: Mon Aug 8 13:40:53 2011 +0000

Shut up some valgrind false-positives

git-id : 6881f2475a98dca0de8b33e18223d9341f90fb12
revision : r2067
Author : Jason Garrett-Glaser
Date: Tue Aug 16 13:02:24 2011 -0700

Avoid some unnecessary allocations with B-frames/CABAC off

git-id : ddf82cdd4ab3f8172723201e9da22602e27e1204
revision : r2066
Author : Jason Garrett-Glaser
Date: Mon Aug 22 17:07:03 2011 -0700

Fix typo in p8x8 RD analysis
Passed wrong idx to trellis.

git-id : 14de3782abaa278a133887b8980ff7b716bfd159
revision : r2065
Author : Anton Mitrofanov
Date: Sun Aug 21 02:44:45 2011 +0400

Fix invalid memory accesses in x86 lowres_init when width <= 16

git-id : 83326434ca9a60b831cd4c7f15cf7bbc764b3200
revision : r2064
Author : Anton Mitrofanov
Date: Mon Aug 15 12:03:09 2011 +0400

Fix intermediate conversion for YUVJ* pixfmts with 4:4:4 encoding

git-id : 86c5510ea34314274431ff107d67eefc3c3753f8
revision : r2063
Author : Henrik Gramner
Date: Sun Aug 14 13:39:29 2011 +0200

Fix pic_out returned by x264_encoder_encode with 4:4:4

git-id : b232a9dcc5dacf80ad9498ce95e1192c324dcc8b
revision : r2062
Author : Loren Merritt
Date: Thu Aug 11 22:12:26 2011 +0000

Fix zeroing of mvr predictors in bskip blocks

git-id : fd3fb348f9bc1ae2ebac62315da8807554ff3a52
revision : r2061
Author : Loren Merritt
Date: Thu Aug 11 01:33:13 2011 +0000

Fix: chroma planes for weightp analysis were not initialized if U early-terminates and V doesn't.

git-id : ada894da37b88f9f92b8747d91245fe88944a8fb
revision : r2060
Author : Henrik Gramner
Date: Wed Aug 10 20:25:07 2011 +0200

Expand borders before chroma weightp analysis
Prevents mc from using uninitialized source pixels.

git-id : b21640e2fb7808f4b8e3184156be19ee58dccc53
revision : r2059
Author : Henrik Gramner
Date: Wed Aug 10 19:29:14 2011 +0200

Another 4:4:4 chroma weightp bug fix

git-id : 317d883819667eac3b2268cfb313ff177ffa7131
revision : r2058
Author : Jason Garrett-Glaser
Date: Wed Aug 10 00:17:26 2011 -0700

Fix typo in help

git-id : 0ba8a9c6973897ec35e1a5d241a71f4f5a4f81aa
revision : r2057
Author : Jason Garrett-Glaser
Date: Sat Aug 6 10:45:47 2011 -0700

Improve support for varying resolution between passes
Should give much better quality, but still doesn't support MB-tree yet.
Also check for the same interlaced options between passes.
Various minor ratecontrol cosmetics.

git-id : fd681ea8e719ab9a26a20220e5eece6651f93ebd
revision : r2056
Author : Loren Merritt
Date: Sun Aug 7 22:57:27 2011 +0000

asm cosmetics: base-4 constants for shuffles

git-id : a718aad0045b2930d871fd7b6bf33fc0192d526f
revision : r2055
Author : Loren Merritt
Date: Wed Aug 3 14:58:50 2011 +0000

Enable some existing asm functions that were missing function pointers
pixel_ads1_avx, predict_8x8_hd_avx
High bit depth mc_copy_w8_sse2, denoise_dct_avx, prefetch_fenc/ref, and several pixel*sse4.

git-id : 444dae123cc0f41508a0172e29e83327aaed47e6
revision : r2054
Author : Loren Merritt
Date: Wed Aug 3 14:57:06 2011 +0000

Remove some unused, broken, and/or useless functions
Unused frame_sort.
Unused x86_64 dequant_4x4dc_mmx2, predict_8x8_vr_mmx2.
Unused and broken high_depth integral_init*h_sse4, optimize_chroma_*, dequant_flat_*, sub8x8_dct_dc_*, zigzag_sub_*.
Useless high_depth dequant_sse4, dequant_dc_sse4.

git-id : 43f62a2f07a0f5d2667e6fa89f5de87b2e57ddd2
revision : r2053
Author : Loren Merritt
Date: Wed Aug 3 14:56:27 2011 +0000

asm cosmetics: merge all the variants of ABS macros

git-id : 4b429c69797ebb88c89fd3dfe7555f0db787d6dd
revision : r2052
Author : Loren Merritt
Date: Wed Aug 3 14:53:29 2011 +0000

asm cosmetics part 2
These changes were split out of the cpuflags commit because they change the output executable.

git-id : c0cda6aa0bf775f453e9cea319203975c2875773
revision : r2051
Author : Loren Merritt
Date: Wed Aug 3 14:46:41 2011 +0000

asm cosmetics: INIT_MMX/XMM/YMM now support a cpuflags argument

Reduces the number of macro args that need to be passed around.
Allows multiple implementations of a given macro (e.g. PALIGNR) to check
cpuflags at the location where the macro is defined, instead of having
to select implementations by %define at toplevel.
Remove INIT_AVX, as it's replaced by "INIT_XMM avx".

This commit does not change the stripped executable.

git-id : 12f12a268d47b12d9e26f57919c5022e2f234f9d
revision : r2050
Author : Loren Merritt
Date: Wed Aug 3 14:43:34 2011 +0000

Import x86inc.asm patches from libav

git-id : b78735138b32e667d5710dbd63aa93ea6498487f
revision : r2049
Author : Loren Merritt
Date: Wed Aug 3 14:42:12 2011 +0000

Cosmetics: s/mmxext/mmx2/

git-id : 59cb2eb53913ac463f7aef99332d3558a95ff040
revision : r2048
Author : Henrik Gramner
Date: Sun Aug 7 11:58:36 2011 +0200

Fix two bugs in 4:4:4 chroma weightp analysis
Caused slightly worse compression.

git-id : fafa6a2fca75967ff90f98a8acc59d3ffdd8bb7f
revision : r2047
Author : Loren Merritt
Date: Wed Aug 3 14:40:01 2011 +0000

Fix "--asm avx"
Previously required "--asm sse2fast,fastshuffle,sse4.2,avx".

git-id : 47f2263e0fea2f70d22b1cbef699e283d6063764
revision : r2046
Author : Anton Mitrofanov
Date: Fri Aug 5 15:59:20 2011 +0400

Re-add support for glibc <2.6, which doesn't have CPU_COUNT

git-id : 9949438453f2f11d1f4cb5d39cfc884c28c98042
revision : r2045
Author : Yasuhiro Ikeda
Date: Tue Aug 2 08:59:15 2011 +0900

Avoid using deprecated libavformat functions
Replace av_find_stream_info with avformat_find_stream_info.
Now requires libavformat 53.3.0 or newer.

git-id : 392e762151d1657abc8ae5d345c144c3ef280819
revision : r2044
Author : Henrik Gramner
Date: Wed Jul 27 02:23:12 2011 +0200

Use assembly versions of some deblocking functions in MBAFF

git-id : 1c7dbec5f17ee091bae445584a4f05783e4aae9e
revision : r2043
Author : Anton Mitrofanov
Date: Thu Jul 28 00:26:27 2011 +0400

Move X264_VERSION / X264_POINTVER from config.h to x264_config.h
This makes them available to external programs as part of the public API.

git-id : 178455cd4df5f5a36f39c49c8f7b03965b269a91
revision : r2042
Author : Henrik Gramner
Date: Fri Jul 29 20:15:52 2011 +0200

Fix padding bug in x264_expand_border_mbpair

git-id : e927cc334c74a3265e912bafaadb7009e44c27f8
revision : r2041
Author : Yusuke Nakamura
Date: Fri Jul 29 23:39:26 2011 +0900

Timecode parsing: Add missing initialization
Fix crash when parsing the timecode file fails before pts is allocated.
Fix detection of user timebase considered to be exceeding H.264 maximum.

git-id : 75eb1cde5b489a030e172ccc0b6724939c095865
revision : r2040
Author : Anton Mitrofanov
Date: Thu Jul 28 13:37:24 2011 +0400

Fix crash with high bitdepth 4:2:0 input

git-id : 45b6e459debd4644b0863511fd0c8f55549bc9d7
revision : r2039
Author : Daniel Kang
Date: Tue Jul 26 21:57:39 2011 -0400

x86 asm cosmetics
Use FDEC_STRIDEB where appropriate.

git-id : c28471e2500a6071aab8f9c3adac104f201e5f2a
revision : r2038
Author : Jason Garrett-Glaser
Date: Tue Jul 26 07:40:23 2011 -0700

Fix a bug in lossless sub-8x8 RD
Caused crashes in rare cases with lossless encoding. Regression in 4:4:4.

git-id : f8ebd4ab2679c6eedd47bb7f138533259020984b
revision : r2037
Author : Jason Garrett-Glaser
Date: Mon Jul 18 23:10:30 2011 -0700

Improved p8x4/4x8 search decision
Use the same thresholding as for p16x8/8x16.
Does p8x4/4x8 search more often, for a small compression improvement.

git-id : 9977b595fb7591e3616fa98677baf6e84e0f7029
revision : r2036
Author : Dan Larkin
Date: Wed Jul 13 12:45:23 2011 -0500

Add --subme 11, which disables all early terminations in analysis
Necessary for a future trellis mode decision/motion estimation patch.
Also add the slowest presets to the regression test.

git-id : 207ca3e95b38d734400e12f57faa16b778f0706c
revision : r2035
Author : Dan Larkin
Date: Wed Jul 13 11:33:48 2011 -0500

Some trivial changes to RD thresholds
The output-changing portion of the next patch.