Multimedia Users Group

Oklahoma State University

July 18, 2000

Jane Carpenter, Bill Elliot, John Gelder

An Introduction To Real and SMIL

Video is just now becoming a hot feature on the Internet. There are two, maybe three, environments for video over the Internet: Real (RealNetworks), QuickTime (Apple) and Media Player (Microsoft).

Historically, video on the web was pretty simple. Click on a link and, as long as you had the player, the clip would load and play. Here are two examples, one using the Real player, the other the QuickTime player.

Real Example

QuickTime Example

But the sophistication of the players and the 'movies' has increased tremendously in the past year. Look at these examples,

Take 5


The current use of video on the Internet is more interactive than ever before. More and more tools are becoming available that allow this interactivity, which can make your video more user friendly for your audience.


SMIL is an acronym for Synchronized Multimedia Integration Language. SMIL is a markup language, based on XML, that uses tags to synchronize different media streams on a 'page'. The page could be displayed in the RealNetworks player, embedded within a web page or even played in QuickTime. The markup language provides the author with more control over placement of media streams on the screen and interactivity with those streams. SMIL 1.0 is the current version of this powerful yet simple language, with SMIL 2.0 slated for release later this year. While you may feel like SMIL is something totally new and you've never seen it in action, you may in fact have used it without knowing. Many news and entertainment services such as CNN, ABCNews and Take 5 use SMIL to display their multimedia and provide interactivity on their sites.

So how do we create a SMIL file, and what does a SMIL file look like? Since SMIL is a markup language, you can create SMIL files using a text editor. Notepad, SimpleText, BBEdit or any other text editor can be used; even Microsoft Word works, as long as the file is saved as plain text. A familiarity with the tags/elements is useful to get started, but I think in our case we'll just learn those as we go along. Perhaps the best and most efficient way to learn SMIL is to look at SMIL files created by others. Once you've created a SMIL file you can use RealPlayer (RealPlayer Basic) by RealNetworks to view the results locally. If the media files are not particularly large you can actually serve up your materials from an HTTP server. But for larger files which need to be accessed by many users, a RealServer is important.

So let's look at the four important tags that should appear in your SMIL file: <smil>, <head>, <layout> and <body>.





The basic organization of these four tags is,
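As a minimal sketch, assuming the standard SMIL 1.0 nesting (a <smil> root holding a <head> with the <layout>, followed by a <body>), the skeleton of a file looks something like this. The comments only mark where content would go; they are not part of any required syntax.

```xml
<smil>
  <head>
    <layout>
      <!-- root-layout and region definitions go here -->
    </layout>
  </head>
  <body>
    <!-- media elements (ref, video, img, ...) go here -->
  </body>
</smil>
```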









Specific information about the design of your screen is placed between the <layout> tags, and the actual media elements are placed within the <body> tag. The file is saved with an '.smi' or '.smil' extension.

The layout tag has several important features. Within the basic layout tag we can define the root window size which will contain the media elements. An example is,


<root-layout background-color="black" width="600" height="500"/>


The root-layout element defines the window size for the player; in this case the player window will be 600 pixels wide and 500 pixels high. Within that window a different element is used to place the media elements. This is the region element, and each region is named by its id attribute. The region element controls the position, size and scaling of the media elements. For example, if within the window defined by the root-layout element there are two regions for media streams and a region for a background image to play the media elements over, they would be defined in the following way,


<root-layout background-color="black" width="500" height="500"/>

<region id="backregion" top="0" left="0" width="500" height="500" z-index="0"/>

<region id="videoregion" top="176" left="252" width="160" height="120" z-index="1"/>

<region id="textregion" top="355" left="159" width="325" height="129" z-index="1"/>


The top, left, width and height attributes define the position and size of the regions. Notice the videoregion is 160 pixels by 120 pixels. That is the size of the video clip I captured. The textregion is 325 pixels wide and 129 pixels high. Those dimensions were arrived at on the basis of the information I display within that region: I use the textregion to display slides showing information that I'm discussing in my presentation. The backregion has the same dimensions as the root-layout. In it I place a background image which I created in Fireworks 3.0. The background image looks like this:

Within this background image you can see the embossed region for the video and the slide region (in gray). The z-index defines the relative layer for each region. The higher number region is on top of the lower number region.

Now that we've defined the regions in our file we can associate the different media streams with these regions. So the following should be placed within the <body> </body> tags.




<par>

<ref src="xxx.rm" region="videoregion"/>

<ref src="xxx.gif" region="backregion" fill="freeze"/>

</par>





The <seq> and <par> elements define how the media elements contained within them are synchronized when they are played. In this particular case all of the media streams are played in parallel. They begin together and end according to the length of the longest stream. If one of the streams is shorter, the fill="freeze" attribute freezes its last frame until the others finish. If two media streams are to be played one after the other, the streams would be placed within a <seq> element. For example,



<seq>

<ref src="xxx.rm" region="videoregion"/>

<ref src="xxx.gif" region="backregion" fill="freeze"/>

</seq>




This arrangement would display the background image then play the video file on top of the image. When the video was over the slides would begin to play.
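One way to get the behavior just described is to nest a <seq> inside a <par>. This is only a sketch under stated assumptions: slide1.gif is a hypothetical file name for a single slide, and the dur="6s" value is an arbitrary illustration.

```xml
<par>
  <!-- background image appears at once and is frozen for the whole presentation -->
  <ref src="xxx.gif" region="backregion" fill="freeze"/>
  <seq>
    <!-- the video plays first, on top of the background -->
    <ref src="xxx.rm" region="videoregion"/>
    <!-- hypothetical slide, shown only after the video has finished -->
    <img src="slide1.gif" region="textregion" dur="6s"/>
  </seq>
</par>
```

The outer <par> starts the background and the <seq> together, while the <seq> makes its own children wait for one another.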

Note each media stream is assigned a particular region id based on those defined within the <layout> element.

You've already noticed that the background image has several 'buttons'. These areas can be defined as buttons to advance the video, or to load a different SMIL file. Let's add some additional elements within the <body> element to enhance the interactivity.



<par>

<ref src="xxx.rm" region="videoregion"/>

<ref src="xxx.gif" region="backregion" fill="freeze">

<anchor href="command:seek(0:0.0)" target="_player" coords="21,174,121,204"/>

<anchor href="command:seek(0:20.0)" target="_player" coords="21,224,121,254"/>

<anchor href="command:seek(0:30.0)" target="_player" coords="21,274,121,304"/>

<anchor href="command:seek(0:40.0)" target="_player" coords="21,324,121,354"/>

</ref>

</par>




The <anchor href> element allows the author to add a measure of interactivity to the file. Its coords attribute defines a rectangle (left, top, right, bottom, in pixels measured from the top-left corner of the media element it belongs to) so that a particular behavior occurs when the user clicks the mouse within it. Two different kinds of action are possible. The "command:seek(x:xx.x)" action causes the current video to jump to the particular time and continue to play. The <anchor href> element can also load a new file. In that case I've used relative addressing because the new SMIL file is within the same directory.
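The two kinds of anchor can sit side by side on the same image. In this sketch the file name next.smi and the second set of coordinates are hypothetical, chosen only to continue the spacing of the buttons shown earlier.

```xml
<ref src="xxx.gif" region="backregion" fill="freeze">
  <!-- jump to 20 seconds into the current presentation -->
  <anchor href="command:seek(0:20.0)" target="_player" coords="21,224,121,254"/>
  <!-- hypothetical button: load another SMIL file from the same directory -->
  <anchor href="next.smi" coords="21,374,121,404"/>
</ref>
```

Because next.smi carries no path, the player resolves it relative to the location of the current SMIL file.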

It is also possible to make a particular media stream 'hot'. If I wanted to allow the user to click the mouse within the streaming video media I would use a slight variation,

<a href="xxx.smi" show="new">

<video src="xxx.rm" region="videoregion"/>

</a>


Notice that 'video src' is used here in place of the 'ref src' I used earlier. These are examples of media object elements. SMIL allows several different media object elements: ref, animation, audio, img, video, text and textstream. As I understand it, 'ref' is the broadest and will handle any of the sources.

To add some slides into the textregion we need to create some GIF or JPEG images and place them into the same directory. We use the element,

<img src="test1.gif" region="textregion" begin="0min" end="0.1min"/>

The begin and end attributes define the timing of the slide. If additional slides are added they would look like this,

<img src="test1.gif" region="textregion" begin="0min" end="0.1min"/>
<img src="test2.gif" region="textregion" begin="0.1min" end="0.2min"/>
<img src="test3.gif" region="textregion" begin="0.2min" end="0.3min"/>
<img src="test4.gif" region="textregion" begin="0.3min" end="0.4min"/>
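Putting the pieces together, a complete file might look roughly like the following. This is one plausible assembly rather than the exact file behind this discussion: placing the slides directly inside the <par> is an assumption, and only two anchors and two slides are shown to keep it short.

```xml
<smil>
  <head>
    <layout>
      <root-layout background-color="black" width="500" height="500"/>
      <region id="backregion" top="0" left="0" width="500" height="500" z-index="0"/>
      <region id="videoregion" top="176" left="252" width="160" height="120" z-index="1"/>
      <region id="textregion" top="355" left="159" width="325" height="129" z-index="1"/>
    </layout>
  </head>
  <body>
    <par>
      <!-- streaming video -->
      <ref src="xxx.rm" region="videoregion"/>
      <!-- background image with clickable button areas -->
      <ref src="xxx.gif" region="backregion" fill="freeze">
        <anchor href="command:seek(0:0.0)" target="_player" coords="21,174,121,204"/>
        <anchor href="command:seek(0:20.0)" target="_player" coords="21,224,121,254"/>
      </ref>
      <!-- slides, timed from the start of the par -->
      <img src="test1.gif" region="textregion" begin="0min" end="0.1min"/>
      <img src="test2.gif" region="textregion" begin="0.1min" end="0.2min"/>
    </par>
  </body>
</smil>
```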

There are many additional attributes allowed for each of the elements I've discussed. These are described in the SMIL language specifications. If you are like me the details of these specifications are better understood by looking at examples of different SMIL files and analyzing what they do.

The particular example I've used in this discussion is one I currently favor for video captured in my lectures. It allows the students to advance to particular example problems or discussions of concepts, so the student does not have to guess where in a 50 minute video particular events occur. This means the author must define those particular time points by reviewing the video. I've found through experience I can capture a 50 minute video in...50 minutes. Using RealProducer Plus, the video is compressed while it is captured. A 50 minute lecture is approximately 100 MB in size. An additional 30 minutes are required to determine the exact time points within the video that are important to the student. Producing the slides is where I spend the majority of my time. These may take several hours, depending on how detailed you want them to be. After all of the video is captured, the time points determined and the slides created, I upload the files to the RealServer and generate a link on my web page to access the file.

One particular area I've not discussed is the issue of bitrate. This is very important for the user. It has to do with the speed at which the video, audio, text and pictures are delivered to the user, and that depends on the user's connection: 56K modem, ISDN, T1, etc. A SMIL file containing media elements encoded for streaming over a T1 line will not play very well on a 56K connection. I've not got all this figured out completely at this point, but hope to by the time this workshop is ready for the Fall semester.
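For a sense of scale, a 100 MB file lasting 50 minutes averages roughly 100 x 8 / 3000, or about 0.27 megabits per second (taking 1 MB as one million bytes), which is several times what a 56K modem can carry. One mechanism SMIL 1.0 itself offers for this is the <switch> element with the system-bitrate test attribute. The sketch below assumes that two encodings of the same clip exist; the file names and the two bitrate thresholds are hypothetical, and I have not verified how a particular player measures the available bitrate.

```xml
<par>
  <ref src="xxx.gif" region="backregion" fill="freeze"/>
  <switch>
    <!-- used when at least 250000 bits per second are available -->
    <ref src="xxx_t1.rm" region="videoregion" system-bitrate="250000"/>
    <!-- hypothetical lower-rate encoding for modem users -->
    <ref src="xxx_56k.rm" region="videoregion" system-bitrate="34000"/>
  </switch>
</par>
```

The player takes the first child of <switch> whose test it can satisfy, so the alternatives are listed from highest bitrate to lowest.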
