I3D (RGB + Flow)

PWC-Net is deprecated

The default behavior has changed in the recent version: optical flow is now extracted with RAFT instead of PWC-Net, which is deprecated.

The Inflated 3D (I3D) features are extracted using a model pre-trained on Kinetics 400. The features are taken from the second-to-last layer of I3D, before the two streams are summed. The model therefore outputs two tensors of 1024-d features: one for the RGB stream and one for the flow stream. By default, it expects 64 RGB and flow frames (224x224) as input, which span 2.56 seconds of a video recorded at 25 fps. In the default case, the features have size Tv x 1024, where Tv = duration / 2.56 (duration in seconds).
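As a rough sanity check on the expected number of feature vectors, the relation above can be sketched in a few lines. This is an approximation that assumes a plain sliding window over the frames. The helper name `approx_num_features` is hypothetical, and the extractor's handling of the first and last frames may change the count by one.

```python
def approx_num_features(duration_s: float, fps: float = 25.0,
                        stack_size: int = 64, step_size: int = 64) -> int:
    """Approximate Tv for a sliding window of `stack_size` frames.

    Assumption: a window starts every `step_size` frames and incomplete
    trailing windows are dropped. This is not taken from the extractor's code.
    """
    num_frames = int(duration_s * fps)
    if num_frames < stack_size:
        return 0
    return (num_frames - stack_size) // step_size + 1


# A 10-second clip at 25 fps has 250 frames, which fits three full 64-frame windows.
print(approx_num_features(10.0))
```

With the defaults this agrees with Tv = duration / 2.56 rounded down.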

Please note that this implementation uses RAFT for optical flow extraction instead of the TV-L1 algorithm used in the original I3D paper, because TV-L1 hampers the speed. This substitution might lead to worse performance, although our tests show that the performance is reasonable. You may check whether the predicted distribution satisfies the requirements of your application. To see the predictions made by the classification head, set show_pred=true.
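One way to inspect a predicted distribution is to turn raw class scores into probabilities and look at the top entries. The sketch below assumes you already have a list of logits from a classification head. The function name and the toy values are illustrative only and are not part of this repository.

```python
import math


def top_k_probabilities(logits, k=5):
    """Return the k most likely (class_index, probability) pairs."""
    # Subtract the maximum for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy logits for four classes (a Kinetics 400 head would produce 400 values).
for class_idx, prob in top_k_probabilities([2.0, 0.5, -1.0, 3.0], k=2):
    print(f"class {class_idx}: {prob:.3f}")
```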

Supported Arguments

Argument	Default	Description
`stack_size`	`64`	The number of frames from which to extract features (or window size).
`step_size`	`64`	The number of frames to step before extracting the next features.
`streams`	`null`	I3D is a two-stream network. By default (`null` or omitted) both RGB and flow streams are used. To use RGB- or flow-only models use `rgb` or `flow`.
`flow_type`	`raft`	By default, the flow features of I3D are calculated from optical flow computed with RAFT (the original I3D used TV-L1).
`extraction_fps`	`null`	If specified (e.g. as `5`), the video will be re-encoded to the `extraction_fps` fps. Leave unspecified or `null` to skip re-encoding.
`device`	`"cuda:0"`	The device specification. It follows the PyTorch style. Use `"cuda:3"` for the 4th GPU on the machine or `"cpu"` for CPU-only.
`video_paths`	`null`	A list of videos for feature extraction. E.g. `"[./sample/v_ZNVhz7ctTq0.mp4, ./sample/v_GGSY1Qvo990.mp4]"` or just one path `"./sample/v_GGSY1Qvo990.mp4"`.
`file_with_video_paths`	`null`	A path to a text file with video paths (one path per line). Hint: given a folder `./dataset` with `.mp4` files one could use: `find ./dataset -name "*mp4" > ./video_paths.txt`.
`on_extraction`	`print`	If `print`, the features are printed to the terminal. If `save_numpy` or `save_pickle`, the features are saved to a `.npy` or a `.pkl` file, respectively.
`output_path`	`"./output"`	A path to a folder for storing the extracted features (if `on_extraction` is either `save_numpy` or `save_pickle`).
`keep_tmp_files`	`false`	If `true`, the re-encoded videos will be kept in `tmp_path`.
`tmp_path`	`"./tmp"`	A path to a folder for storing temporary files (e.g. re-encoded videos).
`show_pred`	`false`	If `true`, the script will print the predictions of the model on a down-stream task. It is useful for debugging.

Quick Start

Ensure that the environment is properly set up before proceeding. See Setup Environment for detailed instructions.

Activate the environment

conda activate video_features

and extract features from the ./sample/v_GGSY1Qvo990.mp4 video and show the predicted classes

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    video_paths="[./sample/v_GGSY1Qvo990.mp4]" \
    show_pred=true
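If you prefer to launch the extraction from a Python script, the command above can be assembled programmatically. This is only a sketch: `build_command` is a hypothetical helper, not part of the repository, and it simply reproduces the key=value syntax used in the examples on this page.

```python
import sys


def build_command(**overrides):
    """Assemble `python main.py key=value ...` from keyword arguments."""
    args = [sys.executable, "main.py"]
    for key, value in overrides.items():
        if isinstance(value, bool):
            value = str(value).lower()  # the examples use `true` / `false`
        elif isinstance(value, (list, tuple)):
            value = "[" + ", ".join(map(str, value)) + "]"
        args.append(f"{key}={value}")
    return args


cmd = build_command(
    feature_type="i3d",
    device="cuda:0",
    video_paths=["./sample/v_GGSY1Qvo990.mp4"],
    show_pred=True,
)
print(" ".join(cmd))
# To run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, no shell quoting is needed around values such as cuda:0 or the bracketed list of paths.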

Examples

Make sure the environment is set up correctly. For instructions, refer to Setup Environment.

Activate the environment

conda activate video_features

The following will extract I3D features for sample videos. The features are going to be extracted with the default parameters.

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    video_paths="[./sample/v_ZNVhz7ctTq0.mp4, ./sample/v_GGSY1Qvo990.mp4]"

The video paths can also be specified as a .txt file with one path per line

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    file_with_video_paths=./sample/sample_video_paths.txt
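Such a file of paths can also be generated from Python instead of `find`. The sketch below is a minimal equivalent built on the standard library. The function name and the folder names in the usage comment are placeholders.

```python
from pathlib import Path


def write_video_paths(folder, out_file, pattern="*.mp4"):
    """Write every file under `folder` matching `pattern` to `out_file`, one per line."""
    paths = sorted(str(p) for p in Path(folder).rglob(pattern))
    Path(out_file).write_text("\n".join(paths) + ("\n" if paths else ""))
    return paths


# Example usage (placeholder paths):
#   write_video_paths("./dataset", "./video_paths.txt")
# and then pass file_with_video_paths=./video_paths.txt to main.py.
```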

It is also possible to extract features from the RGB or flow modality individually (streams=rgb or streams=flow), which increases the speed

python main.py \
    feature_type=i3d \
    streams=flow \
    device="cuda:0" \
    file_with_video_paths=./sample/sample_video_paths.txt

The features can be saved as NumPy arrays or pickle files by specifying on_extraction=save_numpy or on_extraction=save_pickle. By default, a folder ./output is created and the features are stored there

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    on_extraction=save_numpy \
    file_with_video_paths=./sample/sample_video_paths.txt

You can change the output folder using the output_path argument.
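Features saved with on_extraction=save_numpy can be read back with NumPy. The sketch below assumes that each output is stored as `<video name>_<key>.npy` (for example with keys `rgb` and `flow`). That naming pattern is an assumption, not something stated on this page, and the files may sit in a subfolder of the output folder. Check what was actually written and adjust `feature_path` and `output_dir` accordingly.

```python
from pathlib import Path


def feature_path(video_path, key, output_dir="./output"):
    """Guess where the features for `video_path` were saved.

    Assumption: files are named `<video stem>_<key>.npy` inside `output_dir`.
    Verify this against the actual contents of your output folder.
    """
    return Path(output_dir) / f"{Path(video_path).stem}_{key}.npy"


def load_features(video_path, keys=("rgb", "flow"), output_dir="./output"):
    """Load the saved arrays for one video into a dict keyed by stream."""
    import numpy as np  # only needed when files are actually loaded

    return {k: np.load(feature_path(video_path, k, output_dir)) for k in keys}


print(feature_path("./sample/v_GGSY1Qvo990.mp4", "rgb"))
# feats = load_features("./sample/v_GGSY1Qvo990.mp4")
# feats["rgb"].shape is expected to be (Tv, 1024)
```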

You may also want to change the I3D window (stack_size) and step (step_size) sizes

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    stack_size=24 \
    step_size=24 \
    file_with_video_paths=./sample/sample_video_paths.txt

By default, the frames are extracted according to the original fps of a video. If you would like to extract frames at a certain fps, specify the extraction_fps argument.

python main.py \
    feature_type=i3d \
    device="cuda:0" \
    extraction_fps=25 \
    stack_size=24 \
    step_size=24 \
    file_with_video_paths=./sample/sample_video_paths.txt

A fun note: the time span of the I3D features in the last example matches the time span of VGGish features with default parameters (24/25 = 0.96 seconds).
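To align features with other modalities, it can help to list the time interval each feature covers. The following sketch assumes a simple sliding window in which feature i starts at frame i * step_size. The function name is hypothetical and the extractor's exact boundaries may differ slightly.

```python
def feature_time_spans(num_features, stack_size=64, step_size=64, fps=25.0):
    """Return (start_s, end_s) for each feature, assuming a sliding window."""
    spans = []
    for i in range(num_features):
        first_frame = i * step_size
        spans.append((first_frame / fps, (first_frame + stack_size) / fps))
    return spans


# Settings from the last example: 24-frame windows at 25 fps.
for start, end in feature_time_spans(3, stack_size=24, step_size=24, fps=25.0):
    print(f"{start:.2f}s - {end:.2f}s")
```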

If keep_tmp_files=true is specified, the temporary files are kept in tmp_path, which is ./tmp by default. Be careful with the keep_tmp_files argument when experimenting with extraction_fps, as it may mess up the frames you extracted before in the same folder.

Credits

License

The wrapping code and the port of I3D weights from TensorFlow to PyTorch are MIT-licensed. RAFT is licensed under BSD 3-Clause.