video_scores: Run FER on a YouTube video using a Hugging Face CLIP model

View source: R/video_scores.R

video_scores {transforEmotion}    R Documentation

Run FER on a YouTube video using a Hugging Face CLIP model

Description

This function computes facial expression recognition (FER) scores for a specified number of frames extracted from a YouTube video, using a chosen Hugging Face CLIP model. It relies on Python libraries for facial recognition and emotion detection in images and video.

Usage

video_scores(
  video,
  classes,
  nframes = 100,
  face_selection = "largest",
  start = 0,
  end = -1,
  uniform = FALSE,
  ffreq = 15,
  save_video = FALSE,
  save_frames = FALSE,
  save_dir = "temp/",
  video_name = "temp",
  model = "oai-base",
  local_model_path = NULL
)
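
A minimal call might look like the sketch below. The video URL and emotion classes are placeholders, and the Python environment that the package relies on is assumed to be already set up (see the package documentation).

library(transforEmotion)

## Placeholder URL and example emotion labels -- replace with your own
result <- video_scores(
  video = "https://www.youtube.com/watch?v=XXXXXXXXXXX",
  classes = c("happy", "sad", "angry", "neutral"),
  nframes = 50
)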

Arguments

video

The URL of the YouTube video to analyze.

classes

A character vector specifying the classes to analyze.

nframes

The number of frames to analyze in the video. Default is 100.

face_selection

The method for selecting faces in the video. Options are "largest", "left", or "right". Default is "largest".

start

The start time of the video range to analyze. Default is 0.

end

The end time of the video range to analyze. Default is -1, which means the video will not be cut. If end is a positive number greater than start, the video is cut from start to end.

uniform

Logical indicating whether to uniformly sample frames from the video. Default is FALSE.

ffreq

The frame frequency for sampling frames from the video. Default is 15.

save_video

Logical indicating whether to save the analyzed video. Default is FALSE.

save_frames

Logical indicating whether to save the analyzed frames. Default is FALSE.

save_dir

The directory to save the analyzed frames. Default is "temp/".

video_name

The name of the analyzed video. Default is "temp".

model

A string specifying the CLIP model to use. Options are:

  • "oai-base": "openai/clip-vit-base-patch32" (default)

  • "oai-large": "openai/clip-vit-large-patch14"

  • "eva-8B": "BAAI/EVA-CLIP-8B-448" (quantized version for reduced memory usage)

  • "jina-v2": "jinaai/jina-clip-v2"

  • Any valid HuggingFace model ID

Note: Using custom HuggingFace model IDs beyond the recommended models is done at your own risk. Large models may cause memory issues or crashes, especially on systems with limited resources. The package has been optimized and tested with the recommended models listed above. Video processing is particularly memory-intensive, so use caution with large custom models.

local_model_path

Optional. Path to a local directory containing a pre-downloaded HuggingFace model. If provided, the model will be loaded from this directory instead of being downloaded from HuggingFace. This is useful for offline usage or for using custom fine-tuned models.

On Linux/Mac, look in the ~/.cache/huggingface/hub/ folder for downloaded models. Navigate to the snapshots folder for the relevant model and point to the directory that contains the config.json file. For example: "/home/username/.cache/huggingface/hub/models--cross-encoder--nli-distilroberta-base/snapshots/b5b020e8117e1ddc6a0c7ed0fd22c0e679edf0fa/"

On Windows, the base path is C:\Users\USERNAME\.cache\huggingface\transformers\

Warning: Using very large models from local paths may cause memory issues or crashes depending on your system's resources, especially when processing videos with many frames.
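
The sketch below illustrates loading a model from a local directory. The snapshot path is hypothetical; point local_model_path at the directory on your machine that contains the model's config.json.

## Hypothetical snapshot directory for a locally cached CLIP model
local_path <- "/home/username/.cache/huggingface/hub/models--openai--clip-vit-base-patch32/snapshots/<hash>/"

result <- video_scores(
  video = "https://www.youtube.com/watch?v=XXXXXXXXXXX",  # placeholder URL
  classes = c("happy", "sad", "angry", "neutral"),
  nframes = 50,
  model = "oai-base",
  local_model_path = local_path  # load the model from disk instead of downloading
)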

Value

An object containing the FER scores for the analyzed video frames.
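
The exact structure of the returned object is not specified here; the sketch below assumes it can be coerced to a data frame with one row per analyzed frame and one column per class.

## Assuming 'result' is the output of video_scores()
scores <- as.data.frame(result)  # coerce to a data frame (assumption)
head(scores)                     # scores for the first few analyzed frames
summary(scores)                  # distribution of scores for each class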

Data Privacy

All processing is done locally with the downloaded model, and your video frames are never sent to any remote server or third party.

Author(s)

Aleksandar Tomasevic <atomashevic@gmail.com>
