17. Real-time Video Encoding on FPGAs

High-quality, efficient video encoding is critical for modern video streaming services. Bodo and Xilinx are collaborating on a video encoding solution called Efficient Elastic Ensemble (E3) that provides high-quality encoding in real time at a fraction of the total cost of ownership (TCO) of existing solutions. E3 combines the simplicity of Python with the efficiency of FPGA encoding, using Bodo’s automatic parallelization, workload distribution, and accelerator management.

The figure below shows the components of this solution:

  1. The video application is written in standard Python code without any API changes.

  2. The Bodo compiler recognizes the operations that can be offloaded to FPGAs and generates optimized FPGA-enabled parallel code.

  3. The Bodo runtime manages data and computation across the FPGAs in the compute cluster.

  4. The compute cluster can have any number of FPGAs, all of which Bodo manages elastically.

Efficient Elastic Ensemble (E3) Architecture

A simple E3 program has the following structure:

  1. Load input video file

  2. Process and encode the video

  3. Write output video to file

Here is an example that uses NumPy to load an uncompressed video:

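import bodo
import numpy as np

# FRAME_SIZE (the number of bytes in one raw frame) must be defined
# for the video format in use before this function is compiled.

@bodo.jit
def process_video(raw_input_filename, encoded_output_filename):
    # load the raw video as a flat array of bytes
    data = np.fromfile(raw_input_filename, np.uint8)
    # reshape to a 2-D array with one row per frame
    data = data.reshape(len(data) // FRAME_SIZE, FRAME_SIZE)
    # compute output video
    result = ...
    # write output video
    result.tofile(encoded_output_filename)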

This program is written as regular sequential Python and is parallelized automatically by the Bodo JIT compiler. Bodo applies the following transformations, which make execution fully elastic and scalable:

  1. Bodo splits the input file read (np.fromfile) across processors to provide scalable I/O.

  2. The reshape operation (data.reshape) is performed in parallel while handling the frame boundaries properly.

  3. Computation is parallelized and offloaded to FPGA devices automatically.

  4. The output is written to a file in parallel (result.tofile), which essentially “stitches” the data chunks together.
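The splitting and stitching described in the list above can be illustrated with a minimal sequential sketch in plain NumPy. It is not Bodo's implementation: the block distribution, the helper names `frame_range_for_rank` and `read_frames_for_rank`, the tiny `FRAME_SIZE`, and the simulated "ranks" are all assumptions chosen for illustration. Each simulated rank reads only whole frames, so no frame straddles two chunks, and concatenating the chunks in rank order reproduces the original byte stream.

```python
import os
import tempfile

import numpy as np

FRAME_SIZE = 16  # hypothetical bytes per frame, kept tiny so the example runs instantly


def frame_range_for_rank(n_frames, n_ranks, rank):
    """Return (first_frame, frame_count) for one rank in a block distribution."""
    base, extra = divmod(n_frames, n_ranks)
    # The first `extra` ranks each take one additional frame.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def read_frames_for_rank(filename, n_ranks, rank):
    """Read only the frames owned by `rank` as a (frames, FRAME_SIZE) array."""
    n_frames = os.path.getsize(filename) // FRAME_SIZE
    start, count = frame_range_for_rank(n_frames, n_ranks, rank)
    chunk = np.fromfile(
        filename,
        dtype=np.uint8,
        count=count * FRAME_SIZE,   # number of bytes to read
        offset=start * FRAME_SIZE,  # byte offset of this rank's first frame
    )
    return chunk.reshape(count, FRAME_SIZE)


# Write a synthetic 10-frame "video", then read it back as 4 simulated ranks.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")
    original = (np.arange(10 * FRAME_SIZE) % 256).astype(np.uint8)
    original.tofile(path)
    chunks = [read_frames_for_rank(path, 4, r) for r in range(4)]

# "Stitch" the per-rank chunks back together in rank order.
stitched = np.concatenate(chunks).reshape(-1)
print([c.shape for c in chunks])
print(np.array_equal(stitched, original))
```

In a real E3 program none of this bookkeeping is written by hand; the sketch only shows why chunk boundaries need to fall on frame boundaries.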

To execute an E3 program, run your normal Python program under MPI and specify the number of processes you want.

For example, to use 8 processes, you would execute the command:

mpiexec -n 8 python -u E3.py video_args

The components of this command are:

  1. mpiexec -n 8 - Create 8 MPI processes.

  2. python -u E3.py video_args - Execute your Python program in each of the MPI processes.
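Every MPI process is launched with the same command line, so each one sees the same video_args. As a hedged sketch of how E3.py might consume them, the snippet below parses two positional arguments with the standard argparse module. The argument names mirror the parameters of process_video above, but the helper `parse_video_args` and the file names are hypothetical; the source does not specify what video_args contains.

```python
import argparse


def parse_video_args(argv):
    """Parse the (hypothetical) command-line arguments of E3.py."""
    parser = argparse.ArgumentParser(description="Encode a raw video file.")
    parser.add_argument("raw_input_filename",
                        help="path to the uncompressed input video")
    parser.add_argument("encoded_output_filename",
                        help="path for the encoded output video")
    return parser.parse_args(argv)


# Example: the arguments each process would see for a command such as
#   mpiexec -n 8 python -u E3.py input.yuv output.bin
args = parse_video_args(["input.yuv", "output.bin"])
print(args.raw_input_filename, args.encoded_output_filename)
```

In the actual script, the list passed to `parse_video_args` would be `sys.argv[1:]`, and the parsed file names would then be handed to the JIT-compiled function.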

This page will be updated with more details on supported APIs and operations as they are incorporated into future versions of Bodo.