Demo for paper "First Order Motion Model for Image Animation"

Clone repository

!git clone https://github.com/AliaksandrSiarohin/first-order-model

Cloning into 'first-order-model'...
remote: Enumerating objects: 85, done.
remote: Total 85 (delta 0), reused 0 (delta 0), pack-reused 85
Unpacking objects: 100% (85/85), done.

cd first-order-model

/content/first-order-model

Mount your Google Drive folder in Colab

from google.colab import drive
drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/gdrive

Add the folder https://drive.google.com/drive/folders/1kZ1gCnpfU0BnpdU47pLM_TQ6RypDDqgw?usp=sharing to your Google Drive.

Load driving video and source image

import imageio
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage.transform import resize
from IPython.display import HTML
import warnings
warnings.filterwarnings("ignore")

source_image = imageio.imread('/content/gdrive/My Drive/first-order-motion-model/02.png')
driving_video = imageio.mimread('/content/gdrive/My Drive/first-order-motion-model/04.mp4')


#Resize image and video to 256x256

source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]

def display(source, driving, generated=None):
    fig = plt.figure(figsize=(8 + 4 * (generated is not None), 6))

    ims = []
    for i in range(len(driving)):
        cols = [source]
        cols.append(driving[i])
        if generated is not None:
            cols.append(generated[i])
        im = plt.imshow(np.concatenate(cols, axis=1), animated=True)
        plt.axis('off')
        ims.append([im])

    ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=1000)
    plt.close()
    return ani
    

HTML(display(source_image, driving_video).to_html5_video())

Create a model and load checkpoints

from demo import load_checkpoints
generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', 
                            checkpoint_path='/content/gdrive/My Drive/first-order-motion-model/vox-cpk.pth.tar')

Perform image animation

from demo import make_animation
from skimage import img_as_ubyte

predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True)

#save resulting video
imageio.mimsave('../generated.mp4', [img_as_ubyte(frame) for frame in predictions])
#video can be downloaded from /content folder

HTML(display(source_image, driving_video, predictions).to_html5_video())

100%|██████████| 211/211 [00:07<00:00, 29.03it/s]

In the cell above we use relative keypoint displacement to animate the object. We can use absolute coordinates instead, but then all of the object's proportions are inherited from the driving video. For example, Putin's haircut is stretched to match Trump's haircut.

predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=False, adapt_movement_scale=True)
HTML(display(source_image, driving_video, predictions).to_html5_video())

100%|██████████| 211/211 [00:07<00:00, 28.72it/s]
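
The difference between the two modes can be sketched with plain coordinates. The snippet below is a simplified illustration, not the repository's implementation: the function names `absolute_keypoints` and `relative_keypoints` and the toy coordinates are made up for this example, and it leaves out everything else that `make_animation` does, including `adapt_movement_scale`. In relative mode, each driving frame contributes only its displacement from the first driving frame, and that displacement is added to the source keypoints.

```python
def absolute_keypoints(kp_driving):
    """Absolute mode: use the driving keypoints as they are."""
    return [tuple(p) for p in kp_driving]


def relative_keypoints(kp_source, kp_driving, kp_driving_initial):
    """Relative mode: add the driving displacement to the source keypoints."""
    return [
        (sx + (dx - ix), sy + (dy - iy))
        for (sx, sy), (dx, dy), (ix, iy) in zip(kp_source, kp_driving, kp_driving_initial)
    ]


# Toy data (hypothetical): two keypoints per object, arbitrary units.
kp_source = [(0.0, 0.0), (1.0, 0.0)]            # narrow source object
kp_driving_initial = [(0.0, 0.0), (3.0, 0.0)]   # wider driving object, first frame
kp_driving = [(0.5, 0.0), (3.5, 0.0)]           # driving object shifted right by 0.5

print(absolute_keypoints(kp_driving))
print(relative_keypoints(kp_source, kp_driving, kp_driving_initial))
```

In this toy case the relative result keeps the source width of 1.0, while the absolute result takes on the driving width of 3.0, which is the proportion inheritance described above.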

Running on your data

First, we need to crop the face from both the source image and the driving video. A simple graphics editor such as Paint is enough for cropping an image. Cropping a video is more complicated; you can use ffmpeg for this.

!ffmpeg -i /content/gdrive/My\ Drive/first-order-motion-model/07.mkv -ss 00:08:57.50 -t 00:00:08 -filter:v "crop=600:600:760:50" -async 1 hinton.mp4

ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, matroska,webm, from '/content/gdrive/My Drive/first-order-motion-model/07.mkv':
  Metadata:
    ENCODER         : Lavf57.83.100
  Duration: 00:14:59.73, start: 0.000000, bitrate: 2343 kb/s
    Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, bt709), 1920x1080, SAR 1:1 DAR 16:9, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      DURATION        : 00:14:59.665000000
    Stream #0:1(eng): Audio: aac (LC), 44100 Hz, stereo, fltp (default)
    Metadata:
      HANDLER_NAME    : SoundHandler
      DURATION        : 00:14:59.727000000
Stream mapping:
  Stream #0:0 -> #0:0 (vp9 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
[libx264 @ 0x55cfa7862800] using SAR=1/1
[libx264 @ 0x55cfa7862800] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x55cfa7862800] profile High, level 3.1
[libx264 @ 0x55cfa7862800] 264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'hinton.mp4':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0(eng): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 600x600 [SAR 1:1 DAR 1:1], q=-1--1, 29.97 fps, 30k tbn, 29.97 tbc (default)
    Metadata:
      DURATION        : 00:14:59.665000000
      encoder         : Lavc57.107.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      HANDLER_NAME    : SoundHandler
      DURATION        : 00:14:59.727000000
      encoder         : Lavc57.107.100 aac
frame=  240 fps=2.9 q=-1.0 Lsize=    1301kB time=00:00:08.01 bitrate=1330.6kbits/s speed=0.0971x    
video:1166kB audio:125kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.761764%
[libx264 @ 0x55cfa7862800] frame I:1     Avg QP:22.44  size: 28019
[libx264 @ 0x55cfa7862800] frame P:62    Avg QP:23.31  size: 12894
[libx264 @ 0x55cfa7862800] frame B:177   Avg QP:28.63  size:  2068
[libx264 @ 0x55cfa7862800] consecutive B-frames:  0.8%  1.7%  2.5% 95.0%
[libx264 @ 0x55cfa7862800] mb I  I16..4: 12.7% 76.2% 11.1%
[libx264 @ 0x55cfa7862800] mb P  I16..4:  1.9%  8.9%  1.1%  P16..4: 35.3% 21.3% 10.8%  0.0%  0.0%    skip:20.7%
[libx264 @ 0x55cfa7862800] mb B  I16..4:  0.0%  0.1%  0.0%  B16..8: 39.1%  5.4%  1.0%  direct: 1.4%  skip:52.9%  L0:35.4% L1:48.5% BI:16.2%
[libx264 @ 0x55cfa7862800] 8x8 transform intra:75.2% inter:77.3%
[libx264 @ 0x55cfa7862800] coded y,uvDC,uvAC intra: 61.9% 52.1% 5.8% inter: 15.2% 6.9% 0.0%
[libx264 @ 0x55cfa7862800] i16 v,h,dc,p: 69%  8%  8% 15%
[libx264 @ 0x55cfa7862800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 10% 19%  5%  8% 11%  8%  9%  6%
[libx264 @ 0x55cfa7862800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23%  8% 11%  5% 12% 21%  7%  9%  4%
[libx264 @ 0x55cfa7862800] i8c dc,h,v,p: 53% 20% 19%  8%
[libx264 @ 0x55cfa7862800] Weighted P-Frames: Y:21.0% UV:1.6%
[libx264 @ 0x55cfa7862800] ref P L0: 57.9% 21.2% 14.0%  5.9%  1.1%
[libx264 @ 0x55cfa7862800] ref B L0: 93.5%  5.3%  1.2%
[libx264 @ 0x55cfa7862800] ref B L1: 97.4%  2.6%
[libx264 @ 0x55cfa7862800] kb/s:1192.28
[aac @ 0x55cfa7863700] Qavg: 534.430

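The crop command above can also be assembled programmatically. The sketch below is only an illustration: `build_crop_command` is a hypothetical helper, not part of the repository, and it omits the `-async 1` option used above. It relies on ffmpeg's `crop=width:height:x:y` filter syntax, where `x` and `y` give the top-left corner of the crop window, and it mirrors the `-ss` (start time) and `-t` (duration) options.

```python
import shlex


def build_crop_command(src, dst, start, duration, width, height, x, y):
    """Return an ffmpeg argument list that trims a clip and crops a window from it."""
    crop = f"crop={width}:{height}:{x}:{y}"  # width:height:x:y, (x, y) = top-left corner
    return [
        "ffmpeg", "-i", src,
        "-ss", start,        # start time of the clip
        "-t", duration,      # length of the clip
        "-filter:v", crop,
        dst,
    ]


cmd = build_crop_command(
    "/content/gdrive/My Drive/first-order-motion-model/07.mkv", "hinton.mp4",
    start="00:08:57.50", duration="00:00:08", width=600, height=600, x=760, y=50,
)
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

Passing the list to `subprocess.run(cmd, check=True)` avoids having to escape the space in `My Drive` by hand.
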
Another possibility is to use a screen recording tool. If you need to crop many images at once, use a face detector (https://github.com/1adrianb/face-alignment); see https://github.com/AliaksandrSiarohin/video-preprocessing for the preprocessing of VoxCeleb.
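
When many images have to be cropped, the crop window is typically derived from a detected face box. The sketch below assumes the detector yields a box as `(left, top, right, bottom)` in pixels; that format is an assumption made for this example, not the documented output of any particular library, and `square_crop_box` with its `scale` parameter is likewise hypothetical. It expands the box to a square around its center and shifts the square so that it stays inside the frame.

```python
def square_crop_box(box, frame_width, frame_height, scale=1.5):
    """Return (x, y, size) of a square crop window centered on a face box."""
    left, top, right, bottom = box
    # Side of the square: the larger box side, enlarged by `scale`,
    # but never larger than the frame itself.
    size = int(max(right - left, bottom - top) * scale)
    size = min(size, frame_width, frame_height)
    # Center the square on the box, then shift it back inside the frame.
    center_x = (left + right) // 2
    center_y = (top + bottom) // 2
    x = min(max(center_x - size // 2, 0), frame_width - size)
    y = min(max(center_y - size // 2, 0), frame_height - size)
    return x, y, size


# Hypothetical face box in a 1920x1080 frame.
x, y, size = square_crop_box((860, 150, 1260, 550), 1920, 1080)
print(f"crop={size}:{size}:{x}:{y}")  # prints crop=600:600:760:50
```

The printed string has the `width:height:x:y` form that ffmpeg's crop filter expects, so it can be used in a command like the one shown earlier.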

source_image = imageio.imread('/content/gdrive/My Drive/first-order-motion-model/09.png')
driving_video = imageio.mimread('hinton.mp4', memtest=False)


#Resize image and video to 256x256

source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]

predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True,
                             adapt_movement_scale=True)

HTML(display(source_image, driving_video, predictions).to_html5_video())

100%|██████████| 240/240 [00:08<00:00, 29.00it/s]
