Because this type of mark-up was designed with textual documents in mind, it works fantastically for them, but it runs into problems when we try to extend it to media that aren't strictly textual, such as audio and video. Yet why limit ourselves to marking up transcriptions of lyrics and dialogue, or scripts?
Now, I'm sure you're wondering: how can you adequately mark up something you hear over time, or visuals that are constantly in motion? We experience these things as non-static, but they don't have to be non-static when we mark them up. For a model of how to treat them as static in a way that would let us adapt TEI to such media, I would suggest turning to non-linear editing.
For those unfamiliar with what non-linear editing looks like, here's a screencap from Adobe Premiere to help illustrate it:
The part to pay particular attention to is the timeline sequence along the bottom, on which clips of audio and video (those colorful boxes) are arranged. This timeline view is typical of non-linear editing software. (N.B. It's called non-linear because you can jump to and from any point on the timeline regardless of the original order, unlike in traditional, physical editing.)
Now, imagine an interface similar to this one, except that instead of loading multiple clips to edit, you have only one object on the timeline: whatever you are marking up. The upper right pane would keep its current role as a "monitor" through which you can view (or listen to) the content you are editing. The pane to the left of the monitor would then be used to edit the TEI tag(s) opened or closed at the point on the timeline where your cursor currently rests. The granularity of this temporal mark-up could be set to a specified interval, such as a tenth or a hundredth of a second, allowing each project to be as precise as it needs to be about when tags open and close.
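To make the granularity idea a bit more concrete, here is a minimal Python sketch of the timeline such an editor might keep internally. Everything in it is hypothetical: the `Timeline` class, its `add`, `events_at` and `listing` methods, and the choice to store tag events as ("open"/"close", tag name) pairs are my own illustration, not part of TEI or of any existing tool. Cursor positions are snapped to the nearest multiple of the chosen interval, so any two positions that round to the same multiple share the same tag events.

```python
from collections import defaultdict
from fractions import Fraction


class Timeline:
    """Hypothetical model of the proposed editor's timeline.

    Each tag event ("open" or "close" plus a TEI element name) is anchored
    to a time point snapped to a project-wide granularity.
    """

    def __init__(self, granularity="1/10"):
        # Granularity in seconds, e.g. "1/10" or "1/100"; Fraction keeps it exact.
        self.step = Fraction(granularity)
        if self.step <= 0:
            raise ValueError("granularity must be positive")
        self._events = defaultdict(list)  # tick index -> [(action, tag), ...]

    def _tick(self, seconds):
        # Snap a cursor position to the nearest multiple of the granularity.
        # Strings like "12.34" are parsed exactly by Fraction.
        return round(Fraction(seconds) / self.step)

    def add(self, seconds, action, tag):
        """Record that `tag` opens or closes at the given cursor position."""
        if action not in ("open", "close"):
            raise ValueError("action must be 'open' or 'close'")
        self._events[self._tick(seconds)].append((action, tag))

    def events_at(self, seconds):
        """What the tag-editing pane would show with the cursor at `seconds`."""
        return list(self._events.get(self._tick(seconds), []))

    def listing(self):
        """All events in timeline order, with ticks converted back to seconds."""
        return [(tick * self.step, action, tag)
                for tick in sorted(self._events)
                for action, tag in self._events[tick]]


tl = Timeline("1/100")                 # hundredth-of-a-second granularity
tl.add("12.344", "open", "said")       # snapped to 12.34 s
print(tl.events_at("12.34"))           # [('open', 'said')]
```

Using `Fraction` rather than floats keeps an interval like 1/100 s exact when times are entered as strings, so snapping doesn't drift because of binary rounding.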
By working with a timeline, you could also "wrap" portions of the file in the appropriate tags. Say a character gives a speech from 2h10m05s to 2h13m18s in the footage; you could highlight that portion of the file in this view and wrap the selected time segment in a <said> element, which would insert an opening tag at 2h10m05s and a closing tag at 2h13m18s.
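The wrap operation could be sketched the same way. The Python below is again purely illustrative: `parse_timestamp` and `wrap` are hypothetical helpers, and the "XhYmZs" timestamp format is just the notation used in the speech example, not a TEI convention. Wrapping a highlighted segment amounts to emitting an opening event at the segment's start and a matching closing event at its end.

```python
import re
from fractions import Fraction

# Hypothetical timestamp notation such as "2h10m05s"; every part is optional,
# but at least one must be present.
_TS = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_timestamp(text):
    """Convert a timestamp like '2h10m05s' into seconds (as a Fraction)."""
    m = _TS.fullmatch(text)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognised timestamp: {text!r}")
    h, mins, secs = m.groups()
    return int(h or 0) * 3600 + int(mins or 0) * 60 + Fraction(secs or 0)


def wrap(start, end, tag):
    """Wrap the selected time segment in `tag`: one open and one close event."""
    s, e = parse_timestamp(start), parse_timestamp(end)
    if s >= e:
        raise ValueError("segment must end after it starts")
    return [(s, "open", tag), (e, "close", tag)]


for t, action, tag in wrap("2h10m05s", "2h13m18s", "said"):
    marker = f"<{tag}>" if action == "open" else f"</{tag}>"
    print(f"{t!s}s {marker}")
# 7805s <said>
# 7998s </said>
```

Keeping the open and close events paired in one operation means a wrapped segment can never end up with an opening tag but no closing one.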
Of course, this is just a nascent idea, and there are technical specifics to be worked out, such as which multimedia formats to support, what software would do the editing, and how this could actually benefit the end user (e.g. how could it dynamically enrich the way a researcher studies the object?). Still, it would be wonderful if it could be made to work as a way to encode multimedia that has a temporal dimension. (Or, if something like it already exists, I'd love to know of it.)