Yahoo Web Search

Search results

  1. Code for the paper "Unsupervised Learning of Video Representations using LSTMs" by Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov; ICML 2015. We use multilayer Long Short-Term Memory (LSTM) networks to learn representations of video sequences.

  2. We experiment with two kinds of input sequences: patches of image pixels, and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices, such as whether the decoder LSTM should condition on its generated output.
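The encoder-decoder LSTM idea described in these snippets can be sketched roughly as follows. This is a minimal illustrative sketch assuming PyTorch, not the authors' released code; the class and parameter names are invented for the example.

```python
import torch
import torch.nn as nn

class LSTMVideoAutoencoder(nn.Module):
    """Encoder LSTM compresses a frame sequence into its final state; a
    decoder LSTM reconstructs the frames from that state (sketch only)."""
    def __init__(self, frame_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.LSTM(frame_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(frame_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, frame_dim)

    def forward(self, frames):
        # frames: (batch, time, frame_dim) — flattened pixel patches or
        # conv-net "percepts", per the snippet above
        _, state = self.encoder(frames)
        # Unconditioned decoder variant: feed zeros rather than conditioning
        # on the decoder's own generated output (one of the design choices)
        zeros = torch.zeros_like(frames)
        out, _ = self.decoder(zeros, state)
        # Reconstruct the sequence in reverse order, as is common for
        # sequence-to-sequence autoencoders
        return self.readout(out).flip(dims=[1])

model = LSTMVideoAutoencoder(frame_dim=64, hidden_dim=128)
clip = torch.randn(2, 10, 64)          # 2 clips, 10 frames, 64-dim patches
recon = model(clip)
loss = nn.functional.mse_loss(recon, clip)
```

Training minimizes the reconstruction loss, so the encoder state must retain enough information about the clip to rebuild it.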

  3. May 4, 2015 · In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision.

    • Xiaolong Wang, Abhinav Gupta
    • 2015
  4. Abstract. Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the segmentation in video sequences.

  5. Unsupervised Learning of Visual Representations using Videos. Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs.
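The "tracking provides the supervision" idea from results 3 and 5 is often expressed as a ranking constraint: two patches linked by a tracker should embed closer together than a random patch. A minimal sketch with a triplet loss, assuming PyTorch; the embedding network here is a stand-in, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

# Toy embedding net (stand-in for a CNN): flatten a patch, project to 64-D.
embed = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 64))

anchor   = torch.randn(8, 3, 32, 32)   # patch at the start of a track
positive = torch.randn(8, 3, 32, 32)   # same object, tracked to a later frame
negative = torch.randn(8, 3, 32, 32)   # random patch from another video

# Normalize embeddings, then ask that anchor-positive distance be smaller
# than anchor-negative distance by a margin (triplet ranking loss).
a, p, n = (F.normalize(embed(x), dim=1) for x in (anchor, positive, negative))
loss = F.triplet_margin_loss(a, p, n, margin=0.5)
```

Minimizing this loss over many tracked-patch triplets trains the network without any semantic labels; the tracker alone supplies the positive pairs.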

  6. Jan 28, 2023 · Our unsupervised model treats video summarization as a sequence of decision problems. Given an input video, the probability that a frame is selected as part of the summary is produced by a non-local convolutional network, and a policy-gradient reinforcement learning algorithm is adopted for optimization in the training phase.
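The frame-selection-as-decision idea in result 6 can be sketched with a REINFORCE-style loop. This is an illustrative sketch only: the scorer below stands in for the paper's non-local convolutional network, and the toy reward is invented for the example (real systems use diversity/representativeness rewards).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Per-frame scorer (stand-in for the non-local network): frame feature -> logit.
scorer = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)

features = torch.randn(20, 32)          # 20 frames, 32-dim features each

def reward(mask):
    # Toy reward: prefer summaries that keep roughly 30% of the frames.
    return -(mask.mean() - 0.3).abs()

for step in range(50):
    probs = torch.sigmoid(scorer(features)).squeeze(-1)  # p(select frame)
    dist = torch.distributions.Bernoulli(probs)
    mask = dist.sample()                                  # sampled summary
    r = reward(mask)
    # Policy-gradient (REINFORCE) objective: scale log-likelihood by reward.
    loss = -(dist.log_prob(mask).sum() * r)
    opt.zero_grad()
    loss.backward()
    opt.step()

final = torch.sigmoid(scorer(features)).squeeze(-1)       # learned probabilities
```

Because frame selection is discrete, the gradient flows through the log-probability of the sampled summary rather than the sample itself, which is the core of the policy-gradient formulation the snippet mentions.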

  7. Unsupervised video summarization approaches overcome the need for ground-truth data (whose production requires time-consuming and laborious manual annotation), relying instead on learning mechanisms that need only a sufficiently large collection of original videos for training.