The video encoder will accept N frames before producing any output. In some cases N will be 1, and you will see an output frame shortly after providing a single input frame. Other codecs will want to gather a fair amount of video data before they start producing output. It appears you have resolved your current situation by doubling up the frames and discarding half of the output, but you should be aware that different devices and different codecs will behave differently (assuming portability is a concern).
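The codec-specific data (CSD) is delivered in a buffer that has the BUFFER_FLAG_CODEC_CONFIG flag set. MediaCodec does not document whether such buffers will appear, or when. (In fact, with some codecs, such as VP8, none appears at all.) For AVC it arrives in the first buffer. If you are not interested in the CSD, just ignore any buffer that has this flag set.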
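Because the flags apply to the buffer as a whole, the API has no way to return a single buffer that holds both CSD and encoded frame data, so the two always arrive separately.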
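As a rough illustration of the flag check, here is a minimal plain-Java sketch. It does not use the Android classes, so it runs anywhere: `CsdFilter` and `OutputBuffer` are hypothetical names invented for this example, and the local constant is assumed to mirror `MediaCodec.BUFFER_FLAG_CODEC_CONFIG`. In a real drain loop you would test `MediaCodec.BufferInfo.flags` against the framework constant instead.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the CSD check; not the Android API itself.
final class CsdFilter {
    // Assumed to mirror MediaCodec.BUFFER_FLAG_CODEC_CONFIG (value 2).
    static final int BUFFER_FLAG_CODEC_CONFIG = 2;

    // Hypothetical stand-in for one dequeued encoder output buffer.
    static final class OutputBuffer {
        final int flags;
        final byte[] data;

        OutputBuffer(int flags, byte[] data) {
            this.flags = flags;
            this.data = data;
        }
    }

    // True when the buffer carries codec-specific data rather than a frame.
    static boolean isCodecConfig(int flags) {
        return (flags & BUFFER_FLAG_CODEC_CONFIG) != 0;
    }

    // Keeps only the buffers that hold encoded frame data.
    static List<OutputBuffer> dropCodecConfig(List<OutputBuffer> buffers) {
        List<OutputBuffer> frames = new ArrayList<>();
        for (OutputBuffer buffer : buffers) {
            if (!isCodecConfig(buffer.flags)) {
                frames.add(buffer);
            }
        }
        return frames;
    }
}
```

The check is a bit test rather than an equality test because the flags field is a bit mask and other flags may be set on the same buffer.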
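Note also that the encoder is allowed to reorder its output, so you might submit frames 0,1,2 and receive encoded data for 0,2,1. The easiest way to keep track is to supply a unique presentation time stamp with each frame, which then identifies it on the output side. Some codecs use the PTS to adjust encoding quality in order to hit the bit rate target, so you need to use reasonably "real" values, not a trivial integer counter.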
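The PTS bookkeeping can be sketched in plain Java as below. This is only an illustration under stated assumptions: `PtsFrameTracker` and its methods are hypothetical names, and the frame rate is assumed constant so that each PTS can be derived from the frame index in microseconds. On Android, the PTS would be passed with the input frame and read back from `MediaCodec.BufferInfo.presentationTimeUs`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that matches encoder output to input frames by PTS.
final class PtsFrameTracker {
    private final Map<Long, Integer> pending = new HashMap<>();
    private final long frameDurationUs;

    // Assumes a constant frame rate, e.g. 30 frames per second.
    PtsFrameTracker(int frameRate) {
        this.frameDurationUs = 1_000_000L / frameRate;
    }

    // Computes a realistic PTS (microseconds) for the frame and records it.
    long onFrameSubmitted(int frameIndex) {
        long ptsUs = frameIndex * frameDurationUs;
        pending.put(ptsUs, frameIndex);
        return ptsUs;
    }

    // Returns the index of the frame that was submitted with this PTS,
    // or -1 if the PTS is unknown. Works for any output order.
    int onOutputReceived(long ptsUs) {
        Integer frameIndex = pending.remove(ptsUs);
        return frameIndex == null ? -1 : frameIndex;
    }
}
```

Because lookups are keyed on the time stamp rather than on arrival order, the tracker gives the right frame index even when the encoder emits frames as 0,2,1.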