Sharing and Composing Video Viewing Experience

Akio Takashima; Yuzuru Tanaka

doi:10.5772/7731

Author Information

Akio Takashima*
- Tokyo University of Technology
Yuzuru Tanaka
- Hokkaido University, Japan

*Address all correspondence to:

1. Introduction

People often take similar actions when they find themselves in similar situations. Most of these actions are selected based on the individual’s empirical knowledge. In this study, we focus on such actions, which we call habitual behaviours, as they occur during the video viewing process. Video viewing is becoming increasingly popular among novices, and the relationship between videos and the users who watch them has been gradually changing. We used to watch TV programs passively (Figure 1 (1)) and then came to select videos on demand (Figure 1 (2)). Now, with the advancement in technology, we can watch videos more actively by skipping commercials, zooming into an important object in a certain video frame, examining a particular scene at various playing speeds, and so on (Figure 1 (3)). This notion, which is called “active watching,” allows users to experience videos from numerous viewpoints [Takashima04]. Many studies on video viewing have aimed at summarizing videos so that they can be watched briefly; however, summarization is just one type of video viewing style. This study focuses on a method that uses these video viewing styles to share video viewing experience and composes styles to create novel viewing experiences (Figure 1 (4)).

2. Related Work

This work relates to three research fields: information recommendation, video viewing experience, and knowledge media technology.

Information Recommendation

Methods for information recommendation are roughly divided into explicit and implicit approaches. In the explicit approach, a user inputs her/his preferences into a system directly. In the implicit approach, which is the one employed in this study, a system recommends what the user wants based on her/his action history, which is not directly related to the user’s preferences. For the implicit approach, several studies have been reported for the web browsing process. Seo et al. described a method for an information filtering agent to understand the users’ preferences by analyzing their web browsing behaviours, such as the time taken for reading, bookmarking, scrolling, and so on [Seo00]. Sakagami et al. developed a system that extracts user preferences during the reading of online news by monitoring ordinary user operations such as scrolling and enlarging articles in an Internet browser [Sakagami97]. In the field of video viewing, a few studies have been reported. Yu et al. proposed an algorithm named ShotRank, which is similar to the PageRank system developed by Google [Brin98]; ShotRank measures the interestingness and importance of segmented video scenes by using data on how many times the users selected and watched each video shot [Yu03]. Mahmood et al. modeled the users’ viewing behaviours using the hidden Markov model (HMM) and developed a system that generates video previews without any prior knowledge of the video content [Mahmood01]. Although both these studies employ the implicit method, they investigate only the summarization of a particular video; no study has attempted to combine the user preferences encountered during the video viewing process.

In this study, we employ the user’s behaviour analysis (and not semantic content analysis) and the implicit method for extracting the user’s preferences (and not the explicit method); we then assist the users in reusing their habitual behaviours and combine them.

Video Viewing Experience

Many studies on video viewing have aimed at summarizing videos so that they can be watched briefly. However, the way people watch videos is gradually changing: summarization is just one type of video viewing style. The manner in which people interact with videos in their everyday lives involves complex knowledge-construction processes rather than simple information-receiving processes. Further, we have a large number of opportunities to use videos to increase our knowledge, such as monitoring events, reflecting on physical performances, learning subject matter, or analyzing scientific experimental phenomena. In such ill-defined situations, the domain knowledge about the content is insufficient; hence, the users interact with videos according to their own viewing styles [Yamamoto05]. However, this type of tacit knowledge, which is acquired through user experiences [Polany99], has not been effectively managed.

Figure 2.
Video viewing style as meme medium.

Knowledge Media Technology

Many studies have been reported in the area of knowledge management systems [Alavi99]. As media for editing, distributing, and managing knowledge (called knowledge media), Tanaka introduced the meme media architecture and a framework for reusing and combining such knowledge media by means of direct manipulation [Tanaka03]. In this framework, however, the target objects for reusing or sharing have been limited to resources that are easily describable, such as functions used in software [Fujima07] or services provided by web applications [Sugibuchi04]. In this study, this approach is extended so that it is more user-friendly and covers resources that are hard to describe, such as the know-how or skills embodied in human behaviour (i.e., tacit knowledge).

In this study, we consider video viewing styles as a type of knowledge media. Video viewing styles, which we regard as habitual behaviours in video viewing, are used to externalize one’s viewing skills or know-how of video viewing; they allow users to experience videos in various viewing styles. “To experience videos in various viewing styles” means to watch videos that are played automatically based on a viewing skill (Figure 2). In this case, the experience is reproduced by watching automatically playing videos.

3. Reproducing Video Viewing Experience

3.1. Extracting video viewing style

To allow users to experience videos through video viewing styles, our approach extracts the relationship between a video and the person who views it. From this point of view, we assume the following characteristics of video viewing:

- People often browse through videos in consistent and specific patterns

- User interaction with videos can be associated with the low-level features of such videos

While the user’s manipulation of a particular video depends on the meaning of the content and the thought process of the user, it is difficult to observe these aspects. In this study, we therefore attempted to estimate the associations between video features and user manipulations, both of which can be easily observed. Low-level features (e.g., color distribution, optical flow, and sound level) are associated with user manipulations that change the playing speed (e.g., fast-forwarding, rewinding, and slow playing). Identifying associations among such easily observed aspects implies that a user can possess a video viewing style even without domain knowledge of the video content.

3.2. Scenario

In this section, we illustrate several examples for reusing and composing various video viewing styles.

3.2.1. Scenarios for reusing viewing style

Once a particular video viewing style is extracted, the corresponding video viewing experience can be reproduced. Here, we describe scenarios for reusing the user’s own video viewing style as well as those of others.

Reusing the user’s video viewing style

By using the user’s video viewing style, the user can experience unknown videos through personalized efficient playback.

If the user is the coach of a particular football team, the video viewing style of the coach for the videos of football games may be distinguishable from the styles of novices. When a coach analyzes a particular game video of an opponent team, the coach may try to determine the weaknesses of the team by judging the positioning of the players or the formation shift of the entire team. In order to achieve this, if the coach repeatedly watches zoomed-out scenes, this habitual behaviour will be included in the video viewing style. For a diligent football player, the video viewing style might include behaviours that explore scenes, including tricky techniques, at a slower speed. On the other hand, novices tend to skip to scenes that excite them, such as goal scenes, to save time. This is also a type of video viewing style.

Several sports (e.g., American football, golf, Japanese sumo wrestling, and so on) include frequent interval scenes (out-of-play scenes). Skipping such scenes is a practical video viewing style that ensures efficiency in time spending.

For checking the weather report just before leaving one’s home, weather forecasts from other areas might be a waste of time. If the weather report for a particular town is frequently watched (and the other areas are skipped), the weather information related to that particular town is included in the video viewing style.

Reusing others’ video viewing styles

In addition to reusing one’s own video viewing style, videos can be experienced through various types of video viewing styles produced by others.

The video viewing style of a particular famous head coach of a football team may have some special characteristics and a film reviewer may have a different video viewing style. By using such types of specialized video viewing styles, a particular video can be experienced in an unusual manner, leading to some serendipitous findings about the video.

3.2.2. Scenarios for composing viewing styles

One of the features of this study is the introduction of the notion of the composition of video viewing styles. These compositions will be utilized for experiencing both general and personal video viewing.

Composition for experiencing general video viewing

As described in the section on reusing scenarios, novices may tend to skip to the scenes that excite them, such as goal scenes, to save time. If most people skipped the same scenes based on their own respective video viewing style, these habitual skipping behaviours can be composed (i.e., pick up the same behaviours) and the scenes that are generally assumed to be attractive can be experienced. That is, making a digest video or summarizing the video is one of the practical applications of the composition of video viewing styles. Further, this composition can achieve social filtering without any domain knowledge about the video content.

Similarly, people can re-edit a lengthy home video that records a child playing. If the video is played while unwanted scenes are skipped, it is transformed into a digest version of the original video. If the video is viewed by many people, the traces they leave give later viewers clues for viewing the video effectively. This resembles a book that has been read repeatedly and has dog-ears, finger marks, and stains left by its readers.

Composition for comparing personal video viewing experiences

One practical application obtained by comparing a particular video viewing experience with others is collaborative filtering. People with similar video viewing styles can experience video viewing through the others’ video viewing styles, which may lead to a serendipitous way to watch the video. Collaborative filtering indicates, for instance, that “the viewers who have replayed this scene also replayed another particular scene,” which is similar to Amazon.com’s recommendation system in which “customers who bought this item also bought...” is mentioned.
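The “viewers who replayed this scene also replayed…” idea can be sketched with simple co-occurrence counts. This is only an illustration under assumptions: the scene identifiers, the viewer names, and the functions `co_replayed` and `also_replayed` are hypothetical and are not part of the system described in this chapter.

```python
from collections import Counter
from itertools import permutations


def co_replayed(replays_by_viewer):
    """Count, for every ordered scene pair (a, b), how many viewers replayed both."""
    counts = Counter()
    for scenes in replays_by_viewer.values():
        for a, b in permutations(sorted(scenes), 2):
            counts[(a, b)] += 1
    return counts


def also_replayed(counts, scene, top=3):
    """Scenes most often replayed by the viewers who replayed `scene`."""
    related = [(other, n) for (s, other), n in counts.items() if s == scene]
    related.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in related[:top]]


# Hypothetical replay logs: viewer -> set of scenes that the viewer replayed.
replays = {
    "viewer1": {"goal_1", "foul_2"},
    "viewer2": {"goal_1", "foul_2", "corner_3"},
    "viewer3": {"goal_1", "foul_2"},
    "viewer4": {"foul_2"},
}

counts = co_replayed(replays)
suggestions = also_replayed(counts, "goal_1")
```

Here `suggestions` lists the scenes ordered by how many viewers of `goal_1` also replayed them; ties are broken alphabetically, which is an arbitrary choice.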

Further, the user’s video viewing style can be compared with a particular person’s style, such as a famous head coach of a football team or a film reviewer, for learning their strategies in video viewing. These compositions help in the sophistication of the user’s video viewing style.

4. System

In this section, we describe a system that extracts the user’s video viewing style and allows the users to reuse and compose these styles.

4.1. System overview

To extract the associations between users’ manipulations and low-level video features and then reproduce the viewing styles for other videos, we have developed a system called the video viewing experience reproducer (VVER). The VVER consists of the “association extractor” and “behaviour applier” blocks Figure 3.

The association extractor block identifies the relationships between the low-level features of videos and the user manipulation of these videos. As input, this block requires several training videos and the viewing logs of a particular user for these videos. In order to record the viewing logs, a user views the training videos using a simple video browser, which enables the user to control the playing speed. We categorized the patterns of changing the playing speed into three types based on the patterns frequently used during an informal user observation [Takashima04]. The three types are (1) skip, (2) re-examine (rewind and then play at a speed less than the normal speed), and (3) others. The viewing logs consist of pairs of a video frame number and the pattern for changing the playing speed with which the user actually played that frame. As low-level features, the system analyzes more than eighty properties of each frame. These features are categorized into five aspects: (1) statistical data of colour values in a frame, (2) representative colour data, (3) optical flow data, (4) number of moving objects, and (5) sound levels. Then, the association extractor generates a classifier that determines the patterns that are associated with the low-level features. To generate the classifier, we use WEKA, a data mining software package [WEKA].

Figure 3.
Overview of the Video Viewing Experience Reproducer on reusing a user’s video viewing style.
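The training step can be sketched as follows. This is a minimal illustration, not the VVER implementation: the system uses WEKA, whereas the sketch swaps in a nearest-centroid rule written in plain Python, and the three-element feature vectors, the function names `train_centroids` and `classify`, and all sample values are hypothetical.

```python
from collections import defaultdict
from math import dist

# The three patterns recorded in a viewing log.
SKIP, REEXAMINE, OTHER = "skip", "re-examine", "other"


def train_centroids(features, labels):
    """Average the feature vectors of each pattern (nearest-centroid stand-in)."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids, vector):
    """Return the pattern whose centroid is closest to the frame's features."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical per-frame features: (mean colour value, optical flow, sound level).
training_features = [
    (0.20, 0.05, 0.10), (0.22, 0.04, 0.12),  # frames the user skipped
    (0.60, 0.90, 0.80), (0.58, 0.85, 0.90),  # frames the user re-examined
    (0.40, 0.40, 0.40), (0.42, 0.38, 0.45),  # frames played normally
]
training_labels = [SKIP, SKIP, REEXAMINE, REEXAMINE, OTHER, OTHER]

centroids = train_centroids(training_features, training_labels)
target_features = [(0.21, 0.05, 0.11), (0.59, 0.88, 0.85)]
predicted = [classify(centroids, f) for f in target_features]
```

The point of the sketch is the data flow: pairs of frame features and logged patterns go in, and a function from frame features to a pattern comes out, which can then be applied to frames of a video the user has never seen.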

The behaviour applier block creates a play schedule, which plays the frames of the target video automatically, each at the speed that corresponds to the pattern for changing the playing speed produced by the classifier. We designed the mapping from the three types of patterns to specific speeds. The skipping behaviour is reproduced by playing at a faster speed (5.0x). The other behaviour is reproduced by playing at normal speed. The re-examining behaviour is reproduced by playing at a slower speed (0.5x). The play schedule consists of pairs of a video frame number and the estimated speed for that frame. The behaviour applier can remove outliers from a sequence of frames that should be played at the same speed, and it can visualize all the behaviours applied to each frame of the video. In addition, the behaviour applier allows the users to compose several play schedules.

As described before, user manipulation is associated with the video features; in other words, the video viewing style is decomposed into rules. These rules can then be composed so that other video viewing styles are created. A composition is executed after the play schedules have been generated (Figure 4). As described earlier, a play schedule is a list of associations between each video frame in a video and the user’s behaviour. A single play schedule is generated by the behaviour applier from a single classifier; therefore, in the case of Figure 4, two classifiers are formulated in order to generate two play schedules.
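A play schedule of this kind can be sketched in a few lines. The speed mapping follows the values given in the text; the outlier removal is an assumption, since the chapter does not specify how it is done, and is implemented here as a sliding-window majority vote. The names `smooth` and `play_schedule` are hypothetical.

```python
from collections import Counter

# Mapping stated in the text: skip -> 5.0x, other -> normal speed, re-examine -> 0.5x.
SPEEDS = {"skip": 5.0, "other": 1.0, "re-examine": 0.5}


def smooth(patterns, window=3):
    """Replace each label by the majority label in a sliding window (assumed outlier filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(patterns)):
        neighbours = patterns[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbours).most_common(1)[0][0])
    return smoothed


def play_schedule(patterns):
    """Pair each frame number with the playing speed for its smoothed pattern."""
    return [(frame, SPEEDS[p]) for frame, p in enumerate(smooth(patterns))]


# Hypothetical classifier output for eight frames; frame 2 is an isolated outlier.
predicted = ["skip", "skip", "other", "skip", "skip",
             "re-examine", "re-examine", "re-examine"]
schedule = play_schedule(predicted)
```

In this example the single "other" label at frame 2 is absorbed into the surrounding run of skipped frames, so the player does not briefly drop to normal speed in the middle of a fast-forwarded segment.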

Figure 4.
Overview of the Video Viewing Experience Reproducer on composing two users’ video viewing styles.

To compose the video viewing styles, the system composes the play schedules via several operations. We defined a few simple operations, such as intersection $A \cap B := \{x \mid x \in A \text{ and } x \in B\}$, difference $A \setminus B := \{x \mid x \in A \text{ and } x \notin B\}$, and union $A \cup B := \{x \mid x \in A \text{ or } x \in B\}$, where A and B are sets of video frames that are associated with specific behaviours such as fast-forwarding or re-examining and x is a specific video frame.

Some examples made by using these operations are as follows:

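$$S_{\mathrm{USER1}} \cap S_{\mathrm{USER2}} \cap S_{\mathrm{YOU}} \qquad \text{(E1)}$$

$$(S_{\mathrm{USER1}} \cap S_{\mathrm{USER2}}) \setminus S_{\mathrm{YOU}} \qquad \text{(E2)}$$

$$(S_{\mathrm{USER1}} \cap S_{\mathrm{USER2}}) \cup S_{\mathrm{YOU}} \qquad \text{(E3)}$$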

Figure 5 illustrates these examples. The upper three belts in this figure indicate the behaviours that the VVER estimates for the video based on the video viewing styles of three persons. For instance, the first belt shows that User1 may initially browse through the video at normal speed and later skip (fast-forward) the second part, re-examine the subsequent scenes, skip the fourth part, and then normally browse through the last part. The second and third belts can be described in a similar manner. The lower three belts correspond to three examples of composed behaviour (i.e., composed play schedules). The details are as follows:

Figure 5.
Examples of estimated viewing behaviours and their composition.

Ex.1 describes the intersection of the three video viewing styles. In this case, the system estimates that these three persons will skip the earlier scenes. This operation detects meaningful manipulations for all the users. In other words, the operation functions as a social filtering system if the number of users is sufficiently large.

Ex.2 shows the intersection of S_USER1 and S_USER2 with the frames of S_YOU removed. This operation identifies the habitual behaviours of other users that a particular user does not tend to select. This operation can be regarded as an active help system [Fischer85].

Ex.3 describes the union of S_YOU and the intersection of S_USER1 and S_USER2. The user can experience his/her habitual behaviour while taking into account the habitual behaviours of other users. (It should be noted that this union operation needs to determine the priority in order to avoid conflicts between the behaviours.)

These examples show that the simple compositions of the associations between a video frame and the viewing behaviour can create other meaningful viewing styles.
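The three example compositions map directly onto Python's built-in set operators. The sketch below is illustrative only: the frame numbers are made up, each set holds the frames that a style associates with one behaviour (skipping), and the helper `to_labels`, including its rule that re-examining wins over skipping when a union produces a conflict, is an assumption rather than the priority rule used by the VVER.

```python
# Frame numbers that each style is predicted to skip (made-up values).
s_user1 = {0, 1, 2, 3, 10, 11}
s_user2 = {0, 1, 2, 5, 10, 11}
s_you = {0, 1, 2, 3, 20}

ex1 = s_user1 & s_user2 & s_you      # Ex.1: frames that all three styles skip
ex2 = (s_user1 & s_user2) - s_you    # Ex.2: shared by the others, not by you
ex3 = (s_user1 & s_user2) | s_you    # Ex.3: your frames plus those the others share


def to_labels(skip_frames, reexamine_frames, n_frames):
    """Turn two frame sets into per-frame labels; re-examine wins a conflict (assumed)."""
    labels = []
    for frame in range(n_frames):
        if frame in reexamine_frames:
            labels.append("re-examine")
        elif frame in skip_frames:
            labels.append("skip")
        else:
            labels.append("normal")
    return labels


# Frame 2 appears in both sets, so the assumed priority decides its label.
labels = to_labels(skip_frames={0, 1, 2}, reexamine_frames={2, 3}, n_frames=5)
```

Keeping one frame set per behaviour makes the operations trivial, but it also makes the conflict visible: after a union, the same frame can belong to both the skip set and the re-examine set, which is why some priority rule is needed.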

5. User study

We conducted two user studies on reusing and composing video viewing styles.

5.1. Setting

In both studies, we used three types of video content: broadcast football games, broadcast Japanese sumo wrestling matches, and recordings from a surveillance camera installed at the entrance of a building. For each type of video content, five 5-min videos were used as training videos for training the corresponding classifier, and three 5-min videos were used as target videos for applying the viewing style and automatic playing. The target videos were not included in the training videos. Four subjects were observed. Each subject was a typical computer user.

Subject	Football	Sumo	Surveillance camera
SubA	OP: skipped; RP: played at normal speed; IP (center area): skipped; IP (shoot scene): re-examined	OP: skipped; RP: played at normal speed	OP (no person present): skipped; IP (person present): played at normal speed
SubB	OP: skipped; RP: skipped (low frequency)	OP: skipped; RP: played at normal speed	OP (no person present): skipped; IP (person present): played at normal speed
SubC	OP: skipped; RP: played at normal speed; IP (center area): skipped; IP (goal scene): re-examined; IP (foul scene): re-examined	OP: skipped; RP: played at normal speed	OP (no person present): skipped; IP (person present): played at normal speed or re-examined
SubD	OP: skipped; RP: played at normal speed; IP (goal scene): re-examined (low frequency)	OP: skipped; RP: played at normal speed; OP (commentator): played at normal speed	OP (no person present): skipped; IP (person present): played at normal speed or re-examined

Table 1.
The video viewing style of each user. OP: out-of-play scene; IP: in-play scene; RP: replay scene.

5.2. Generating classifiers

In order to generate each subject’s classifier for each type of video content, the subjects were first asked to explore the five training videos and find interesting scenes. Table 1 shows the characteristics of the subjects’ video viewing styles. For sports videos, “in-play scene” indicates a scene in which the players are moving and the game is active. For surveillance videos, it means a scene that includes a person in the frame. “Out-of-play scene” indicates the opposite. In viewing sports videos, all subjects tended to skip the out-of-play scenes and play replay scenes at normal speed. In surveillance videos, all subjects tended to skip the frames that contain no person.

VVER generated their classifiers by using their video viewing logs and low-level features of the videos.

5.3. Reusing video viewing styles

The first study involved observing the users’ impression when they reuse their video viewing styles.

A few days after the subjects viewed the training videos, they viewed the target videos, which were played automatically through the VVER based on their own classifier (i.e., video viewing style). In interviews about their impressions, all subjects recognized that the automatic playback was similar to their own viewing style.

In addition to this interview, we evaluated the performance of the VVER quantitatively. This evaluation investigated how closely the automatic play result produced by the VVER resembles the result produced by a human using an “intentional viewing style”. An intentional viewing style here means a style that consists of some predefined viewing rules, for instance, skipping out-of-play scenes of a football game and re-examining goal scenes. In order to generate an intentional classifier, we first decided on the predefined rules, then checked all frames of the target videos manually, and then created viewing logs. The VVER generated an intentional classifier by using these viewing logs. We call a play schedule for a target video that is created with this intentional classifier ClassifierSet. In the same manner, we constrained the subjects to view the training videos with the intentional viewing style. The VVER generated an intentional classifier based on each subject’s viewing logs. A play schedule for the target video that is created with this intentional classifier is called HumanSet. To compare ClassifierSet and HumanSet, we defined AnswerSet, which is a play schedule produced manually by checking every frame of a target video. Ideally, both ClassifierSet and HumanSet should be the same as AnswerSet. To calculate the gap between AnswerSet and the other sets, we assign specific values to the user manipulations: Re-examine = 1, Normal = 0, Skip = -1. The gaps from AnswerSet to ClassifierSet and to HumanSet are defined as errors by the following expressions, respectively, where A, C, and H indicate AnswerSet, ClassifierSet, and HumanSet, n is the number of target videos, and m is the number of frames in each video.

$$\mathit{error}_{\mathrm{classifier}} = \left( \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} (A_{ij} - C_{ij})^{2} \right)^{\frac{1}{2}}, \qquad \mathit{error}_{\mathrm{human}} = \left( \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} (A_{ij} - H_{ij})^{2} \right)^{\frac{1}{2}} \qquad \text{(E4)}$$

Content	Video	SubA	SubB	SubC	SubD	VVER
Football	video01	0.397	0.519	0.050	0.057	0.119
	video02	0.332	0.445	0.005	0.032	0.116
	video03	0.154	0.482	0.023	0.050	0.200
	Avg.	0.294	0.482	0.026	0.046	0.145
Sumo	video01	0.033	0.028	0.020	0.036	0.240
	video02	0.026	0.022	0.026	0.042	0.172
	video03	0.037	0.061	0.029	0.030	0.189
	Avg.	0.032	0.037	0.025	0.036	0.200
Surveillance camera	video01	0.045	0.151	0.055	0.073	0.103
	video02	0.012	0.018	0.018	0.109	0.023
	video03	0.025	0.031	0.030	0.033	0.020
	Avg.	0.027	0.067	0.034	0.072	0.049

Table 2.
The gaps from AnswerSet to ClassifierSet (VVER) and HumanSet (SubA, SubB, SubC, SubD).

Table 2 shows the evaluation result, which was calculated with three target videos for each of the three types of video content. A value of 0 means that there is no gap between AnswerSet and the other set. As Table 2 shows, the gap between A and C is at almost the same level as that between A and H. This result suggests that when a user has a consistent viewing style, the VVER can reproduce viewing experiences about as well as the user does. Moreover, whereas humans view videos with an understanding of their content, the VVER reproduces such viewing experiences using only the low-level video features (i.e., without the domain knowledge of the video content).
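The gap computation can be sketched as below, using the value assignment from the text (Re-examine = 1, Normal = 0, Skip = -1). The sketch assumes the root-mean-square reading of the error expression, in which the mean squared difference is raised to the power 1/2; the function name `gap` and the sample schedules are hypothetical and do not reproduce the data behind Table 2.

```python
from math import sqrt

# Numeric values assigned to the user manipulations in the text.
VALUE = {"re-examine": 1, "normal": 0, "skip": -1}


def gap(answer, other):
    """Root-mean-square gap between two collections of play schedules.

    Both arguments are lists of videos; each video is a list of per-frame labels.
    """
    total = 0
    count = 0
    for answer_video, other_video in zip(answer, other):
        for a, o in zip(answer_video, other_video):
            total += (VALUE[a] - VALUE[o]) ** 2
            count += 1
    return sqrt(total / count)


# One hypothetical four-frame target video.
answer_set = [["skip", "skip", "normal", "re-examine"]]
classifier_set = [["skip", "normal", "normal", "normal"]]

error_classifier = gap(answer_set, classifier_set)
```

Two of the four frames differ by one step each, so the mean squared difference is 0.5 and the resulting gap is its square root. Note that confusing a skip with a re-examine is penalized four times as heavily as confusing either with normal playing, because the difference of 2 is squared.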

5.4. Composing video viewing styles

The second study involved the composition of several viewing styles. Its purpose was to investigate what kind of impression the compositions gave to users. We interviewed the four subjects about their impressions after they viewed the nine target videos, which were played automatically based on the three compositions described below.

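$$S_{\mathrm{SubA}} \cap S_{\mathrm{SubB}} \cap S_{\mathrm{SubC}} \cap S_{\mathrm{SubD}} \qquad \text{(E5)}$$

$$(S_{\mathrm{SubA}} \cap S_{\mathrm{SubB}} \cap S_{\mathrm{SubC}}) \setminus S_{\mathrm{SubD}} \qquad \text{(E6)}$$

$$S_{\mathrm{SubA}} \cup S_{\mathrm{SubB}} \cup S_{\mathrm{SubC}} \cup S_{\mathrm{SubD}} \qquad \text{(E7)}$$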

We refer to these three compositions as Cmp.1, Cmp.2, and Cmp.3, respectively. The subscripts in Cmp.2 assume that the user is SubD. The four subjects and nine target videos are the same as in the study on reusing.

If the predicted automatic playing speed for a frame is common to all users, Cmp.1 plays the frame at that speed. If not, Cmp.1 plays the frame at the normal speed. As shown in Table 1, the skipping behaviour in viewing sports videos was common. As a result, all subjects recognized that the automatic play through Cmp.1 was almost the same as their own viewing style.

Cmp.2 produces a viewing experience that is generally common but not shared by SubD. SubB, who does not usually re-examine videos, was interested in the scenes played at a slower speed because the other three subjects had re-examined them. On the other hand, all subjects complained that scenes they wanted to skip were not skipped. This is because, in the current setting, Cmp.2 plays frames at the normal speed if all predicted speeds are the same.

If even one subject shows a behaviour other than normal playing, Cmp.3 tries to reproduce that behaviour. Hence, this composition plays the videos at varied speeds. As a result, SubB and SubC said that the football game videos were played like digest versions of the games. SubA noticed the characteristic walk of one person and gazed at it.
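The per-frame behaviour of the three compositions, as described in the paragraphs above, can be sketched as follows. This is an interpretation under assumptions, not the VVER code: the function names are hypothetical, the per-frame predictions are made up, and the priority used to resolve conflicts in the third composition (re-examine before skip) is an assumed rule.

```python
NORMAL, SKIP, REEXAMINE = "normal", "skip", "re-examine"


def cmp1(labels):
    """Play at the common behaviour if every user agrees; otherwise play normally."""
    return labels[0] if len(set(labels)) == 1 else NORMAL


def cmp2(others, user):
    """Play a behaviour shared by all other users but not by `user`; otherwise normally."""
    if len(set(others)) == 1 and others[0] != user:
        return others[0]
    return NORMAL


def cmp3(labels, priority=(REEXAMINE, SKIP)):
    """Reproduce any non-normal behaviour; `priority` resolves conflicts (assumed order)."""
    for behaviour in priority:
        if behaviour in labels:
            return behaviour
    return NORMAL


# Hypothetical predictions for four frames, ordered (SubA, SubB, SubC, SubD).
frames = [
    (SKIP, SKIP, SKIP, SKIP),
    (REEXAMINE, REEXAMINE, REEXAMINE, NORMAL),
    (NORMAL, SKIP, NORMAL, NORMAL),
    (REEXAMINE, SKIP, NORMAL, NORMAL),
]

schedule1 = [cmp1(f) for f in frames]
schedule2 = [cmp2(f[:3], f[3]) for f in frames]   # the user is SubD
schedule3 = [cmp3(f) for f in frames]
```

The first frame shows why subjects complained about the second composition: every user, including SubD, is predicted to skip it, so the difference removes the frame from the composed set and it is played at normal speed.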

6. Discussion / Future work

6.1. Reusing video viewing styles

Through the user study on reusing video viewing styles, we found that the VVER allows users to experience the target videos in their own viewing style. In order to satisfy users, the genre of a video must be specified, and the behaviours of a user must be consistent. As an additional user study, we applied classifiers to target videos that were not of the same genre as the training videos. The video viewing style produced with football game videos did affect the playback of the sumo target videos, and vice versa; however, these styles did not produce meaningful experiences for the users. Moreover, the video viewing style produced with surveillance camera videos skipped almost all frames of the target videos of the two sports. In order to let a classifier learn a user’s viewing style, it might be necessary to specify the genre of the videos, for example by using metadata.

6.2. Composing video viewing styles

Through the user study on composing video viewing styles, several subjects had unexpected, serendipitous experiences. On the other hand, especially with Cmp.2, all subjects had a negative impression when a scene was not played at the speed they wanted. It is not easy to understand a user’s desire; however, a mechanism that allows users to understand the situation is needed. Visualizing the predicted speed for the video might be one solution.

7. Conclusion

In this chapter, we describe the notion of reusing the video viewing style, and present examples of composition using these styles to create new video viewing styles.

In contrast to studies that employ content-based domain knowledge, little has been reported in knowledge studies about the composition of tacit knowledge such as video viewing styles. It is well known that video data essentially have temporal aspects, which might make users view videos more passively than other media such as text or images. On the other hand, the fact that we have increasingly more opportunities to use videos to build our knowledge might facilitate the active browsing of videos. We believe that novices might be able to operate videos more freely and develop their own video viewing styles. To support these users, we need to clarify not only the semantic understanding of the video content but also the habitual behaviour of each user.

It has been suggested that the social navigation technique, which supports the user’s activity by using past information, is useful [Dieberger00]. The contributions of our study are not only the notion of reusing past information but also the creation of new video viewing styles. In this study, we presented three types of composition manipulations; such manipulations have considerable potential for generating new and meaningful video viewing styles. Refining the composition manipulations is left as future work.

Human-Computer Interaction
