doi:10.1016/j.eswa.2007.08.056



Expert Systems with Applications 35 (2008) 1444–1450

Constructing and application of multimedia TV-news archives ☆

H.T. Pao a,*, Y.H. Chen b, P.S. Lai b, Y.Y. Xu b, Hsin-Chia Fu b

a Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
b Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC

Abstract

This paper presents an integrated information mining technique for broadcast TV-news. It draws on techniques from acoustic, image, and video analysis to obtain news story titles and to identify newsmen and scenes. The goal is to construct a compact yet meaningful abstraction of broadcast TV-news, allowing users to browse through large amounts of data in a non-linear fashion with flexibility and efficiency. By adding acoustic analysis, a news program can be partitioned into news and commercial clips with 90% accuracy on a data set of 400 h of TV-news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, each news story can be segmented with a better accuracy of 95.92%. On-screen captions or subtitles are recognized by OCR techniques to produce the text title of each news story. The extracted title words can be used to link to, or navigate to, related news content on the WWW. In cooperation with facial and scene analysis and recognition techniques, OCR results can provide users with multimodal queries on specific news stories. Some experimental results are presented and discussed for system reliability, performance evaluation and comparison.
© 2007 Published by Elsevier Ltd.

Keywords: TV-news archives; Multimedia; Information mining; Multimodal query; Video OCR
1. Introduction

Among the major sources of news, TV has clearly had the dominant influence at least since the 1960s. Yet while it is easy to find old newspapers on microfilm in any public library, it is impossible to find old television news footage in the same library. A TV-news archive has existed in the United States for 35 years: Paul C. Simpson founded the Vanderbilt University Television News Archive in 1968. As reported in Huffman, Yang, Yan, and Sanders (1990), when a team at the University of Missouri-Columbia decided to do a content analysis of the three US networks' coverage of the 1989 Tiananmen Massacre, they located the news items in the Vanderbilt Archive Index (Vanderbilt television news archive). The Vanderbilt archive promptly provided 11 h of video clips, all related to the Tiananmen Massacre.

☆ This research was supported in part by the National Science Council under Grant NSC 94-2213-E009-139.

* Corresponding author.
E-mail address: htpao@cc.nctu.edu.tw (H.T. Pao).
At the same time, the Missouri team also planned a comparable study of Taiwanese reportage on the Tiananmen Massacre, but no equivalent of the Vanderbilt archive existed in Taiwan then. Therefore, that study contained only the US perspective on the Tiananmen Massacre. This paper proposes an integrated methodology for information mining on a multimedia TV-news archive in Taiwan. As described in Xu, Chen, Tseng, Lai, and Fu (2004) and Lai, Lai, Tseng, Chen, and Fu (2004), a fully automated Web-based TV-news system was implemented to achieve the following goals:

1. Academic and applied aspects: This archive is expected to improve the quality of TV-news. Dan Rather, the CBS anchorman, once mentioned that he lives with two burdens: the ratings and the Vanderbilt Television News Archive. Once the archive is there, researchers and the public can perform content analysis on the TV-news, and journalists will be more careful in what they report.



Fig. 1. The overall architecture and information processing flow of the
proposed fully automated web-based TV-news system.

Fig. 2. The flow diagram of TV news acquisition and content
segmentation.

2. Timing factor: The Vanderbilt archive started its project with Betacam videotapes in 1968. Preservation is a problem because these tapes deteriorate over the years. Today, all TV-news can be saved on hard disk, VCD or DVD.

Informedia (http://www.informedia.cs.cmu.edu/) is an integrated project launched at Carnegie Mellon University. Its overall goal is to use modern AI techniques to archive video and film media. VACE-II Informedia (http://www.informedia.cs.cmu.edu/arda/vaceii.htm), a sub-project of Informedia, automatically detects, extracts, and edits people of high interest, patterns, and story evolution and trends in visual content from news video.

This paper proposes an integrated information mining technique that automatically generates semantic labels from news video (Daniel & Daniel, 2002) and applies statistical methods to discover hidden information. We expect the work to be significant for the following reasons.

• Although web-news provides another efficient way to access news, watching TV-news has already become a habit for many people. Besides, most web-news systems can provide only text-based news.

• There are many channels providing TV-news. People need more information to find a like-minded channel.

• Although almost every channel claims to be dispassionate, real dispassion is hard to achieve with human editing. Some evaluation is needed to check whether a channel is really dispassionate.

The rest of this paper is organized as follows. Section 2 introduces the overall concepts of the multimedia TV-news archive. In Section 3, methods of generating the necessary semantic labels from the recorded TV-news video are presented. Section 4 focuses on information mining from these semantic labels. Finally, a summary and concluding remarks are given in Section 5.

2. TV-news archive

A fully automated Web-based TV-news system (Lai, Lai, Tseng, Chen, & Fu, 2004; Xu et al., 2004) consists of three modules: (1) TV-news video acquisition and content segmentation, (2) news content analysis, and (3) a user interface for news query, search and retrieval. Fig. 1 depicts the overall architecture and interaction of these three modules. The major tasks of the acquisition module are to record TV-news programs in a proper video format and to fetch related news text from Internet web sites. The content analysis module segments the recorded news video into story-based units and extracts the news title and keywords from each story unit. The most important task of the user interface module is to provide a friendly querying and browsing environment for retrieving news stories of interest.

The overall news video processing and content analysis flow is depicted in Fig. 2. At the beginning, a TV-news program is captured and encoded into a streaming video format (Iain & Richardson, 2003; Wang, Ostermann, & Zhang, 2002). The recorded streaming video is named by date and then stored in a database. In the meantime, a shot detector segments the streaming video into scene-based shots for news unit generation and key-frame extraction (Lee, Yoo, & Jang, 2006; Patel & Sethi, 1997). Within a scene shot, speaker identification techniques are then applied to detect anchor frames (Cheng, Wang, & Fu, 2004). The closed captions in the anchor frames are then extracted and recognized by video OCR techniques (Lin, Liu, & Chen, 2001) as candidates for the news title and keywords of each news unit. The extracted keywords can then be matched with (1) Internet news stories, to construct links between TV-news stories and Internet news, and (2) the users' query words, for retrieving news stories of interest and/or related news stories. In addition, the characters extracted by video OCR from each news unit can also be used as semantic labels of the news unit. More description of the semantic labels is presented in Section 3.3.

[Flow diagram text: News Program; Detect Anchor Video Clip; Detect Weather Report Shot; Detect Commercial; Extract News Stories (Story1, Story2, Story3). Legend: A: anchor, C: commercial, W: weathercast.]

3. News information tree generation

The most important elements in news story writing are what journalists commonly refer to as the 5 W's: who, what, when, where and why. These questions are crucial for catching a reader's attention and introducing the essential facts of the story. Based on this basic rule, the news archive system introduced in Section 2 is further improved to extract more information from a recorded news video. A news information tree (Feinstein & Morris, 1988) is suggested to structure the contents of recorded video clips, helping the user focus on specific news information as well as on information that is a little more general.

3.1. News information tree

Fig. 3 illustrates the hierarchical structure of a news information tree. The tree contains five types of video information records: (1) date (when), (2) channel (where), (3) title (what), (4) content (how), and (5) commercial. The title record contains the starting time, length, and a brief description of the corresponding video clips. The content record can be further divided into the following sub-records: (a) on-site locations, (b) interviews, and (c) tables or quoted words.
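The record types listed above can be sketched as nested data classes. This is only an illustrative sketch: the class names, field names, time unit (seconds), and the exact nesting (date, then channel, then title, then content, with commercials attached to the channel) are assumptions, since the text names the record types but does not spell out their fields beyond the title record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentRecord:
    """A 'how' record: one on-site scene inside a story."""
    kind: str            # assumed values: "location", "interview", "table_or_quote"
    start: float         # seconds from program start (assumed unit)
    length: float
    labels: List[str] = field(default_factory=list)


@dataclass
class TitleRecord:
    """A 'what' record: starting time, length and brief description of a story."""
    title: str
    start: float
    length: float
    description: str = ""
    contents: List[ContentRecord] = field(default_factory=list)


@dataclass
class CommercialRecord:
    start: float
    length: float
    labels: List[str] = field(default_factory=list)


@dataclass
class ChannelRecord:
    """A 'where' record: one channel's stories and commercials."""
    channel: str
    titles: List[TitleRecord] = field(default_factory=list)
    commercials: List[CommercialRecord] = field(default_factory=list)


@dataclass
class DateRecord:
    """A 'when' record: root of the tree for one day."""
    date: str
    channels: List[ChannelRecord] = field(default_factory=list)


def build_example() -> DateRecord:
    """Build a tiny hypothetical tree with one story and one commercial."""
    story = TitleRecord(
        title="Typhoon nears coast", start=120.0, length=95.0,
        description="Anchor briefing followed by on-site report",
        contents=[
            ContentRecord("location", 135.0, 30.0, ["coast"]),
            ContentRecord("interview", 165.0, 25.0, ["resident"]),
            ContentRecord("table_or_quote", 190.0, 15.0, ["rainfall table"]),
        ],
    )
    channel = ChannelRecord("Channel A", titles=[story],
                            commercials=[CommercialRecord(215.0, 30.0)])
    return DateRecord("2005-07-01", channels=[channel])
```

Walking from a `DateRecord` down to a `ContentRecord` mirrors the move from general information (which day, which channel) to specific information (which scene of which story).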

3.2. Analysis units

Usually, a TV-news program contains the following items: news stories, commercials and weather reports. A complete description of shot detection and scene segmentation can be found in Huang, Lai, and Fu (2004). A flow chart of TV-news program segmentation is shown in Fig. 4, and a brief introduction follows. Among the various scene shots, anchor video clips are detected first (Kim, Kim, Ra, & Choi, 1999).

Fig. 3. The data structure of a news information tree.

In general, anchor segments are the most frequently appearing video clips (Saracoglu, Tutuncu, & Allahverdi, 2007); thus we propose to use the Bayesian information criterion (BIC) (Fraley & Raftery, 1998), an unsupervised method, to separate anchor segments from the other clips. As shown in Fig. 5, this method consists of the following steps:

1. The MFCC audio feature sequence X is first generated from the input audio.
2. BIC segments X into segments X1, X2, ..., Xn.
3. These segments are then grouped into several clusters C1, C2, ..., Cm.
4. The cluster containing the most clip segments is taken as the set of anchor clips.
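Steps 2 and 4 above can be illustrated with a small sketch. The paper does not give its BIC formula, so the code below assumes the commonly used single-Gaussian ΔBIC change-point test, reduced to a one-dimensional feature for brevity (real MFCC vectors are multi-dimensional). The function names, the penalty weight `lam`, and the variance floor `eps` are all hypothetical; MFCC extraction and the clustering of step 3 are not shown.

```python
import math
from collections import Counter
from statistics import pvariance


def delta_bic(x, i, lam=1.0, eps=1e-6):
    """Delta-BIC for splitting the 1-D sequence x at index i.

    Positive values favour modelling x[:i] and x[i:] with two separate
    Gaussians over a single Gaussian (assumed criterion, not from the paper).
    """
    n, n1, n2 = len(x), i, len(x) - i

    def var(seg):
        # Floor the variance so log() stays finite on constant segments.
        return max(pvariance(seg), eps)

    gain = 0.5 * (n * math.log(var(x))
                  - n1 * math.log(var(x[:i]))
                  - n2 * math.log(var(x[i:])))
    # Penalty (lam / 2) * (d + d(d + 1) / 2) * log(n) with d = 1.
    penalty = lam * math.log(n)
    return gain - penalty


def best_change_point(x, min_len=2, lam=1.0):
    """Return the split index with the largest positive Delta-BIC, or None."""
    scored = [(delta_bic(x, i, lam), i)
              for i in range(min_len, len(x) - min_len + 1)]
    if not scored:
        return None
    score, idx = max(scored)
    return idx if score > 0 else None


def anchor_cluster(cluster_ids):
    """Step 4: the cluster id that owns the most segments."""
    return Counter(cluster_ids).most_common(1)[0][0]
```

For example, `best_change_point` applied to a sequence whose first half hovers near 0 and whose second half hovers near 5 returns the index where the level changes, and `anchor_cluster` applied to the per-segment cluster ids returns the most populous cluster.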
Fig. 4. The flow diagram of the proposed news story analysis and
information extraction processes.

Fig. 5. The flow diagram of a BIC-based audio segmentation method.


After locating each anchor shot, an SVM-based video classifier (Sun, Tseng, Chen, Chuang, & Fu, 2004) is used to detect weather report shots. Finally, commercials are detected and separated from on-site news stories. The following feature detection techniques are integrated to achieve a high-performance commercial detector:

• The variation rate of zero crossing rate.
• Short time energy.
• Shot change rate.
• Clip length.
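The two audio features in this list have standard textbook definitions, sketched below on plain lists of samples. The paper does not define "variation rate" precisely; the sketch assumes it is the mean absolute change in zero crossing rate between consecutive frames. Frame size, hop, and all function names are hypothetical, and the shot-change and clip-length features are not shown.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def split_frames(signal, size, hop):
    """Cut a signal into fixed-size frames, advancing by hop samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def zcr_variation(signal, size, hop):
    """Mean absolute frame-to-frame change of the zero crossing rate.

    This definition of 'variation rate' is an assumption.
    """
    rates = [zero_crossing_rate(f) for f in split_frames(signal, size, hop)]
    if len(rates) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(rates, rates[1:])]
    return sum(diffs) / len(diffs)
```

A classifier or threshold rule would then combine these values with the shot change rate and clip length; how the paper combines them is not stated.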

At this stage, the anchor's briefings, weather reports, commercials, and background stories are all separated and identified. For each on-site story, as shown in Fig. 6, we further segment and classify each on-site scene into three categories: locations, interview and tables or quoted words (what).

Fig. 7 shows how to partition an on-site news story into location, interview, and tables or quoted words scenes. In general, on-site narration is not active during an interview scene; therefore, as shown in Fig. 6, the interview scene can be distinguished from the location scene. By using the special characteristics of their character regions, tables or quoted words scenes can be distinguished from location scenes.

Fig. 6. The general structure of a news story. On-site scene story contains three major news contents: locations, interview and tables or quoted words.
[Diagram labels: Background Story; Scene Clip (Locality, Interview, Locality, Data Chart, Locality); The Speaker (Newshawk, Interviewee).]

Fig. 7. On-site scene segmentation flow. The narration periods can be detected by using speaker identification techniques to distinguish the narrator's voice from the rest of the scenes. Detecting the on-screen character regions finds the tables or quoted words scenes. The remaining scenes, which belong to neither interview nor tables or quoted words scenes, must be the location scenes.
[Diagram labels: News Story; Newshawk Voice Model; Mark Newshawk Speaking Periods; Mark the Rest as Interview; Mark Data Chart from Newshawk's Period; Mark the Rest of Newshawk Periods as Locality. Legend: N: newshawk, I: interview, C: data chart, L: locality.]

3.3. Semantic labels of units

This section describes how to assign semantic labels to each segmented unit. Basically, the text words for each label are extracted from the text streams in the closed captions.

A TV-news program usually gives the audience a quick overview of each news story in on-screen captions, such as names of locations and people, keywords of events, etc. In general, these texts provide enough information to label each segmented unit.

Fig. 8 shows how to establish a news information tree. The process contains two phases: the story phase and the scene phase.

In the story phase, all on-screen characters are first recognized by video OCR. Then, the recognized characters or words are matched against text-based news documents, which are usually retrieved from the Internet. The title and contents of the best-matched text-news document not only fill out the story information record of the news information tree, but are also used to pick label candidates (locations, people's names, event words, quoted phrases, and tabular data) for scene phase processing.
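The story-phase matching step can be illustrated with a simple overlap score. The paper does not say how the "best matched" document is chosen, so the sketch below assumes a character-bigram overlap, which tolerates some OCR errors and needs no word segmentation. The function names and the representation of documents as `(title, body)` pairs are hypothetical.

```python
def bigrams(text):
    """Set of adjacent character pairs, ignoring case and whitespace."""
    chars = [c for c in text.lower() if not c.isspace()]
    return {a + b for a, b in zip(chars, chars[1:])}


def match_score(ocr_text, document_text):
    """Fraction of the OCR text's bigrams that also occur in the document."""
    grams = bigrams(ocr_text)
    if not grams:
        return 0.0
    return len(grams & bigrams(document_text)) / len(grams)


def best_match(ocr_text, documents):
    """Return the (title, body) pair with the highest overlap score."""
    return max(documents,
               key=lambda doc: match_score(ocr_text, doc[0] + " " + doc[1]))
```

The title and body of the returned document would then supply the story record and the label candidates for the scene phase.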
Fig. 8. Information flow of the generation of a news information tree. For each story, the video OCR (optical character recognition) technique is applied to extract characters in the closed captions. The extracted characters are then matched against news documents retrieved from Internet web sites. From the matched documents, key information, such as associated people, event location, reporters' names etc., can be retrieved accordingly. Finally, video clips associated with location, interview, and tables or quoted words scenes can be labeled according to the extracted keywords.


In the scene phase, semantic labels are picked for each scene from the label candidates. As shown in Fig. 9, the on-screen captions of a locality scene provide the location name and event descriptions. Therefore, in a locality scene, the location and event words among the label candidates are searched for in the on-screen captions to find which ones actually appear.

Fig. 10 is an example of an interview scene. In an interview scene, the interviewee's name and points are always given by on-screen captions. Therefore, in an interview scene, we search for the people's names and event words among the label candidates instead.

An example of a data chart scene is shown in Fig. 11. In general, on-screen captions fill a data chart scene. These captions may present quoted sentences or tabular data. Therefore, searching for quoted sentences or tabular data is the major task in a data chart scene.

Fig. 9. An example of locality scene frame. The locality scene is used to show where and what news occurred. Thus, the location information and event description can be retrieved from the closed captions of a locality scene.

Fig. 10. An example of interview scene frame. An interview scene is used to present a reporter's point of view. Normally, the reporter's name and opinions can be extracted from subtitles or closed captioning.

Fig. 11. An example of data chart scene frame. The data chart scene is used to present information in an organized manner. Additional information is also available from the on-screen characters.
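The scene-phase rules described in this section amount to a lookup from scene type to the kinds of label candidates to search for. A minimal sketch follows; the dictionary keys, the function name, and the use of exact substring matching are assumptions (noisy OCR output would likely need approximate matching instead).

```python
# Which candidate groups to look for in each scene type (names are hypothetical).
SCENE_FIELDS = {
    "locality": ("locations", "events"),
    "interview": ("people", "events"),
    "data_chart": ("quotes", "tables"),
}


def pick_labels(scene_type, caption_text, candidates):
    """Return the label candidates that actually appear in a scene's captions.

    candidates maps a group name ("locations", "people", ...) to a list of
    strings taken from the best-matched text-news document.
    """
    picked = []
    for group in SCENE_FIELDS[scene_type]:
        for candidate in candidates.get(group, []):
            if candidate in caption_text:
                picked.append(candidate)
    return picked
```

For instance, with candidates `{"locations": ["Taipei"], "people": ["Mayor Lin"], "events": ["flood"]}`, a locality scene captioned "Taipei flood update" would be labeled with "Taipei" and "flood", while the person's name is only searched for in interview scenes.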

4. Data mining on the news information tree

This section presents how and what to mine from a news information tree (NIT). In Section 4.1, we propose to mine the favored or preferred news contents of a TV-station. The news information tree can also be used to track the evolution of a series of news stories (see Section 4.2). In addition, the mining results from the NIT and real-time ratings can be combined to provide useful guidance to TV-news commercial buyers (see Section 4.3).

4.1. Mining the news preference of a TV-station

Generally speaking, a TV-station arranges the broadcasting sequence of the stories in a news program according to their impact and attractiveness to the audience. In fact, a preferred news story often gets more time on the air. By analyzing the sequence order and the length of stories, the preferred or favored news stories of a TV-station can be roughly estimated. Mining the NIT to extract the favored or preferred types of news story of a TV-station will help the audience find their favored news channel.

The proposed news mining method is described as follows. Given $N$ sets of keywords, $K_1, K_2, \ldots, K_i, \ldots, K_N$, which correspond to $N$ news topics (or subjects), let the following delta function $\delta(k, K_i)$ define the relation between a keyword $k$ and a keyword set $K_i$:

\[
\delta(k, K_i) =
\begin{cases}
1, & \text{if keyword } k \in K_i \\
0, & \text{otherwise.}
\end{cases}
\]

1. Extract keywords $\{ k^{l}_{s_j};\ l = 1, \ldots, L_j \}$ from a scene unit $s_j$ in a news program.

2. For each scene unit $s_j$, compute its association frequency $F(K_i \mid s_j)$ with respect to a subject $K_i$,

\[
F(K_i \mid s_j) = \frac{\sum_{l=1}^{L_j} \delta(k^{l}_{s_j}, K_i)}{\sum_{i=1}^{N} \sum_{l=1}^{L_j} \delta(k^{l}_{s_j}, K_i)}
\]

3. Compute the association frequency of the news program $F_d(t \mid K_i)$ at time $t$ and day $d$:

\[
F_d(t \mid K_i) =
\begin{cases}
F(K_i \mid s_1), & \text{for } s_1.\mathrm{start} \le t \le s_1.\mathrm{end} \\
F(K_i \mid s_2), & \text{for } s_2.\mathrm{start} \le t \le s_2.\mathrm{end} \\
\quad\vdots & \quad\vdots \\
F(K_i \mid s_M), & \text{for } s_M.\mathrm{start} \le t \le s_M.\mathrm{end},
\end{cases}
\]

where $s_j.\mathrm{start}$ and $s_j.\mathrm{end}$ are the start and end times of a scene unit $s_j$ in a news program at time $t$ of day $d$.

Fig. 13. The life-cycle of a specific news event over a period of time. [Plotted keyword series: "elect".]

The association frequency distribution from one segment of a news program is not enough to represent the overall preference or trend of a news channel; thus long-term statistics are needed.

By accumulating the association frequencies of news subjects over a longer period (say one month), the preference of a channel can be discovered. As shown in Fig. 12, keywords related to social news, political news, and entertainment news are associated with the scene units, and the frequencies of the news topics are accumulated. As can be seen in this example, the monitored news channel favors social and political news more than entertainment news.
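The delta function and the association frequencies of Section 4.1 translate directly into a few lines of code. In the sketch below, scenes are assumed to be dictionaries with `start`, `end` and `keywords` entries and topics a mapping from topic name to keyword set; these representations and the function names are hypothetical. The formula is undefined when a scene matches no topic at all (0/0); returning 0 in that case is also an assumption.

```python
def delta(keyword, keyword_set):
    """1 if the keyword belongs to the topic's keyword set, else 0."""
    return 1 if keyword in keyword_set else 0


def association_frequency(scene_keywords, topics):
    """F(K_i | s_j) for every topic K_i, given one scene's keywords."""
    counts = {name: sum(delta(k, kset) for k in scene_keywords)
              for name, kset in topics.items()}
    total = sum(counts.values())
    if total == 0:
        # No keyword matched any topic; the ratio is undefined, so use 0.
        return {name: 0.0 for name in topics}
    return {name: count / total for name, count in counts.items()}


def program_frequency(t, scenes, topics):
    """F_d(t | K_i): the frequencies of the scene that is on air at time t."""
    for scene in scenes:
        if scene["start"] <= t <= scene["end"]:
            return association_frequency(scene["keywords"], topics)
    return {name: 0.0 for name in topics}


def accumulate(scenes, topics):
    """Sum the per-scene frequencies over a collection of scenes."""
    totals = {name: 0.0 for name in topics}
    for scene in scenes:
        for name, value in association_frequency(scene["keywords"], topics).items():
            totals[name] += value
    return totals
```

Running `accumulate` over all scenes recorded from one channel during a month gives one number per topic, and the largest numbers indicate the topics that channel favors.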

4.2. The evolution of a series of news stories

The evolution of a news story can also be mined from the news information tree. By associating the keywords of a specific event with recorded news scenes over a period of days, the accumulated association frequency of matched scene units presents the overall development and progress of the specific news story. Fig. 13 shows the life-cycle of a particular news event. In addition, the spread of a specific event to other areas, e.g. cities, counties, countries, etc., can also be retrieved from the associated location names in the matched scene units.

For example, one can query news stories by a particular person's name; the person's daily schedule and/or whereabouts can then be retrieved from the recorded NIT.

Fig. 12. Three sets of (representative) keywords ("politics", "society", "sport") are used to associate the appearing frequency of social, political and sport news in a news program.
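The day-by-day accumulation described in this section can be sketched as follows. The sketch counts matched event keywords per day as a simple stand-in for the accumulated association frequency, and collects the location labels of matched scenes to show where the event was reported. The scene representation (dictionaries with `day`, `keywords` and `locations` entries) and both function names are assumptions.

```python
from collections import defaultdict


def event_timeline(scenes, event_keywords):
    """Number of event-keyword hits per day, ordered by day."""
    per_day = defaultdict(int)
    for scene in scenes:
        hits = sum(1 for k in scene["keywords"] if k in event_keywords)
        per_day[scene["day"]] += hits
    return dict(sorted(per_day.items()))


def event_locations(scenes, event_keywords):
    """Location labels of the scenes that mention the event, per day."""
    per_day = defaultdict(list)
    for scene in scenes:
        if any(k in event_keywords for k in scene["keywords"]):
            for location in scene["locations"]:
                if location not in per_day[scene["day"]]:
                    per_day[scene["day"]].append(location)
    return dict(sorted(per_day.items()))
```

Plotting the values returned by `event_timeline` against the day gives a life-cycle curve of the kind the section describes, and `event_locations` traces how the reported locations change over time.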

4.3. The mining on TV commercial

Besides background stories, commercial records are also valuable information. Huang et al. (2004) proposed methods for detecting and identifying commercials in TV video clips. When a video frame of a commercial contains on-screen keywords, video OCR techniques can be used to extract the keywords and label the corresponding video clips. Otherwise, keyblock-based image retrieval methods (Zhu, Rao, & Zhang, 2002) may be used to represent and identify each commercial clip; however, manual annotation is then needed to label the keyblocks. By gathering statistical information on these labels and keywords in news programs, cross relationships between TV commercials, real-time ratings, and news stories can be observed and analyzed to build a useful marketing database. Two example areas of database marketing, customer modeling and cross-selling, are discussed in the following.

4.3.1. Customer modeling

The basic idea behind customer modeling (customers here being the commercial buyers and the news audience) is to improve audience response rates by targeting prospects that are predicted to be most likely to respond to a particular advertisement or promotion. This is achieved by building a model that predicts the likelihood that groups of news audience will respond, based on news type, viewing time and news channel as well as previous viewing behavior. In addition, by targeting prospects and existing commercial buyers more effectively, TV-station operators can improve and strengthen customer relationships. Customers can perceive more value in TV-news and commercials (i.e. both commercial buyers and news audience receive only products and/or services of interest to them).

4.3.2. Cross-selling

The basic idea behind cross-selling is to leverage the existing customer base by selling them additional products (commercial time slots) and/or news services. By analyzing the groups of products or services that are commonly purchased together and predicting each customer's affinity towards different products using historical data, a TV-station can maximize its selling potential to existing customers. Cross-selling is one of the important areas in database marketing where predictive data mining techniques can be successfully applied. Using historical purchase data of different products from the customer database, along with news type, viewing time and news channel, commercial buyers can identify the products that are most likely to be of interest to the targeted news audience. Similarly, for each type of product (i.e. a commercial or a group of commercials), a ranked list can be produced of the types of news or groups of audience that are most likely to be attracted to that product. Commercials can then be arranged with matched types of news to achieve a high audience response rate.
5. Conclusion

This paper addresses techniques and possible applications of fully automated information mining on a multimedia TV-news archive. The proposed automated information mining consists of the following processes: (1) segmenting a TV-news program video recording into scene clips, (2) using video OCR to extract and recognize closed-caption and/or image characters as keywords for each scene, (3) using keywords to generate semantic labels for each scene, and (4) segmenting commercial video clips from news clips. Information associated with the various labels and scenes (e.g. the starting and ending time of a scene) is stored in the proposed news information tree. Performing statistical analysis on the data items in the news information tree can reveal hidden information, such as popular channels and the evolution of hot news stories. This information can help the general public find their favored or desired news channels, search for persons of interest, track hot news stories, and so on.
References

Cheng, S.-S., Wang, H.-M., & Fu, H.-C. (2004). A model-selection-based self-splitting Gaussian mixture learning with application to speaker identification. EURASIP Journal on Applied Signal Processing, 17, 2626–2639.

Daniel, Gildea, & Daniel, Jurafsky (2002). Automatic labeling of semantic roles. Computational Linguistics, 28(3), 245–288.

Feinstein, C. D., & Morris, P. A. (1988). Information tree: A model of information flow in complex organizations. Systems, Man and Cybernetics, IEEE Transactions, 18(3), 390–401.

Fraley, C., & Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal, 41, 578–588.

Huang, T.-Y., Lai, P.-S., & Fu, H.-C. (2004). A shot-based video clip search method. In Proceedings of CVGIP2004, Taipei, Hualien, ROC, August 2004.

Huffman, S., Yang, T., Yan, L., & Sanders, K. (1990). Genie out of the bottle: Three US networks report Tiananmen Square. In Proceedings of the annual meeting of association for education in journalism and mass communication, Minneapolis, Minnesota, USA.

Iain, E. G., & Richardson, H. (2003). 264 and mpeg-4 video compression. Wiley Press.

Kim, D.-W., Kim, J.-T., Ra, I.-H., & Choi, Y.-S. (1999). A new video interpolation technique based on motion-adaptive subsampling. IEEE Transactions on Consumer Electronics, 45(3), 782–787.

Lai, P. S., Lai, L. Y., Tseng, T. C., Chen, Y. H., & Fu, H. C. (2004). A fully automated web-based TV-news system. In Proceedings of PCM 2004, Tokyo, Japan, Dec. 2004.

Lee, M. H., Yoo, H. W., & Jang, D. S. (2006). Video scene change detection using neural network: Improved ART2. Expert Systems with Applications, 31(1), 13–25.

Lin, C.-J., Liu, C.-C., & Chen, H.-H. (2001). A simple method for Chinese video OCR and its application to question answering. International Journal of Computational Linguistics and Chinese Language Processing, 6(2), 11–30.

Patel, N. V., & Sethi, I. K. (1997). Video shot detection and characterization for video databases. Pattern Recognition, 30(4), 583–592.

Saracoglu, R., Tutuncu, K., & Allahverdi, N. (2007). A fuzzy clustering approach for finding similar documents using a novel similarity measure. Expert Systems with Applications, 33(3), 600–605.

Sun, S.-Y., Tseng, C. L., Chen, Y. H., Chuang, S. C., & Fu, H. C. (2004). Cluster-based support vector machine in text-independent speaker identification. In Proceedings of international joint conference on neural networks IJCNN 2004, Budapest, Hungary; 2004.

Vanderbilt television news archive, http://www.vanderbilt.edu/vtna.

Wang, Y., Ostermann, J., & Zhang, Y.-Q. (2002). Video processing and communications. Prentice Hall Press.

Xu, Y. Y., Chen, Y. H., Tseng, C. L., Lai, P. S., & Fu, H. C. (2004). Multimedia TV-news browsing system. In Proceedings of IEEE international conference on multimedia and expo (ICME), Taipei, Taiwan, ROC; June 2004.

Zhu, L., Rao, A., & Zhang, A. (2002). Theory of keyblock-based image retrieval. ACM Transactions on Information Systems, 20(2), 224–257.
