Jan Newmarch, Linux Sound Programming, 10.1007/978-1-4842-2496-0_26

26. Subtitles and Closed Captions

Jan Newmarch¹

(1)Oakleigh, Victoria, Australia

Many karaoke systems use subtitles¹ superimposed over a movie of some kind. Programs like kmid and my Java programs play lyrics on some sort of canvas object, which gives a pretty boring background. Video CDs and MPEG-4 files have a nicer background but have the lyrics hard-coded onto the background video, so there is little chance of manipulating them. CD+G files keep the lyrics separate from the video, but there doesn’t seem to be any way of playing them directly under Linux. They can be converted to MP3+G, which VLC can play: it loads the MP3 file and picks up the corresponding .cdg file.

This chapter considers subtitles that can be created independently, combined with video and audio in some way, and then played. The current situation is not completely satisfactory.

Resources

Check out this resource:

“Subtitling with Linux Tutorial” ( http://sub.wordnerd.de/linux-subs.html )

Subtitle Formats

This chapter is concerned with what are called soft subtitles, where the subtitles are stored in a separate file from the video or audio and are combined during rendering. The Wikipedia page “Subtitle (captioning)” ( http://en.wikipedia.org/wiki/Subtitle_(captioning) ) is a long article going into many issues about subtitling. It also contains a list of subtitle formats, but the one that seems to be of most use in this context is SubStation Alpha.

MPlayer

According to the MPlayer page “Subtitles and OSD” ( www.mplayerhq.hu/DOCS/HTML/en/subosd.htm ), the following are the formats recognized by MPlayer:

VOBsub
OGM
CC (closed caption)
MicroDVD
SubRip
SubViewer
Sami
VPlayer
RT
SSA
PJS (Phoenix Japanimation Society)
MPsub
AQTitle
JACOsub

VLC

According to VLC ( www.videolan.org/vlc/features.php?cat=sub ), support under Linux includes the following subtitle formats:

DVD
Text files (MicroDVD, SubRIP, SubViewer, SSA1-5, SAMI, VPlayer)
Closed captions
Vobsub
Universal Subtitle Format (USF)
SVCD/CVD
DVB
OGM
CMML
Kate

If you play some sort of video file, say XYZ.mpg, and there is also a file with the same root name and appropriate extension such as XYZ.ass (the extension for SubStation Alpha), then VLC will automatically load the subtitles file and play it. If the subtitles file has a different name, then it can be loaded from the VLC menu Video ➤ Subtitles Track. However, this does not appear to be as reliable as sharing the name.
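The shared-root-name convention is easy to check for in a script before launching a player. The following is a minimal sketch, not VLC's own lookup logic: the function name find_sidecar_subtitles and the list of extensions are assumptions made for illustration.

```python
from pathlib import Path

# Extensions to try, in order of preference. This list is an assumption
# for illustration; it is not the list VLC itself uses.
SUBTITLE_EXTENSIONS = (".ass", ".ssa", ".srt", ".sub")


def find_sidecar_subtitles(video_path):
    """Return subtitle files sharing the video's root name, e.g. XYZ.ass for XYZ.mpg."""
    video = Path(video_path)
    candidates = (video.with_suffix(ext) for ext in SUBTITLE_EXTENSIONS)
    return [c for c in candidates if c.is_file()]


if __name__ == "__main__":
    # Prints a list of matching files, or an empty list if there are none.
    print(find_sidecar_subtitles("XYZ.mpg"))
```

If the list comes back empty, renaming the subtitle file to match the video is likely to be more dependable than loading it from the menu.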

Gnome Subtitles

See “Gnome Subtitles 1.3 is out!” ( http://gnome-subtitles.sourceforge.net/ ). Gnome Subtitles supports Adobe Encore DVD, Advanced Sub Station Alpha, AQ Title, DKS Subtitle Format, FAB Subtitler, Karaoke Lyrics LRC, Karaoke Lyrics VKT, MacSUB, MicroDVD, MPlayer, MPlayer 2, MPSub, Panimator, Phoenix Japanimation Society, Power DivX, Sofni, SubCreator 1.x, SubRip, Sub Station Alpha, SubViewer 1.0, SubViewer 2.0, and ViPlay Subtitle File.

SubStation Alpha

The SSA/ASS specification is at MooDub.free ( http://moodub.free.fr/video/ass-specs.doc ). It is brief and appears to contain some minor errors with respect to later specifications and implementations. For example, the time format is different. Or are the later ones all wrong?

SSA/ASS files can be used stand-alone. They can also be included in container formats such as Matroska files, discussed briefly in Chapter 3. When they are embedded into MKV files, some restrictions ( www.matroska.org/technical/specs/subtitles/ssa.html ) are made, such as the text being converted into UTF-8 Unicode.

ASS files are divided into several sections.

General information about the environment the subtitle file expects, such as the X and Y resolutions
Style information such as colors and fonts
Event information, which is where the subtitle text is given along with timing information and any special effects to be applied
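The section layout can be made concrete with a few lines of Python. This is a rough sketch under simple assumptions: section headers are lines of the form [Name], the helper name split_sections is made up for this example, and the sample text is a shortened fragment in the style of an Aegisub file. A real parser would need more care, for instance with subtitle text that happens to be wrapped in square brackets.

```python
def split_sections(ass_text):
    """Map each [Section] header to the list of non-blank lines under it."""
    sections = {}
    current = None
    for raw in ass_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new section starts; remember its name without the brackets.
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """[Script Info]
Title: Default Aegisub file
PlayResX: 640
PlayResY: 480

[V4+ Styles]
Style: Default,Arial,20,&H00FFFFFF,&H00B4FCFC,&H00000008,&H80000008,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun
"""

sections = split_sections(sample)
for name, lines in sections.items():
    print(name, len(lines))
```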

Under normal circumstances you would not directly create such files using a text editor. Instead, the program Aegisub gives you a GUI environment in which to create the files. Essentially, you just enter the text lines, plus the start and end times for each line to be displayed.

Figure 26-1 shows a screen dump.

Figure 26-1. Aegisub screenshot

Many special effects are possible. The video on Bill Creswell's blog ( https://billcreswell.wordpress.com/tag/aegisub/ ) is an excellent example. Here is the direct YouTube link: www.youtube.com/watch?v=0Z0dgdglrAo .

For completeness, here is part of an ASS file I created:

[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H00B4FCFC,&H00000008,&H80000008,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:24.61,0:00:28.24,Default,,0000,0000,0000,,I said it's alright
...
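To see how the Format line governs the Dialogue lines, the two can be paired up field by field. The sketch below is illustrative only: parse_time and parse_dialogue are hypothetical helper names, and it assumes that Text is the last field, so that any commas in the lyric itself are left alone.

```python
def parse_time(stamp):
    """Convert an ASS timestamp h:mm:ss.cc into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue(format_line, dialogue_line):
    """Pair the field names of a Format line with the values of a Dialogue line."""
    names = [n.strip() for n in format_line.split(":", 1)[1].split(",")]
    # Text is the last field and may contain commas, so limit the split.
    values = dialogue_line.split(":", 1)[1].strip().split(",", len(names) - 1)
    return dict(zip(names, values))


FORMAT = "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
LINE = "Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun"

event = parse_dialogue(FORMAT, LINE)
duration = parse_time(event["End"]) - parse_time(event["Start"])
print(event["Text"], round(duration, 2))
```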

Karaoke Effects in ASS Files

A line in an ASS file essentially consists of a time to start the display, a time to finish the display, and the text itself. However, karaoke users are accustomed to the text being highlighted as it is played.

ASS supports two major highlight styles.

Words are highlighted one at a time.
The text is highlighted by filling from the left.

These effects are done by embedding “karaoke overrides” into the text. These are enclosed in {}, begin with a backslash, and carry a duration in hundredths of a second.

The details are as follows:

Word highlighting
An override of the form {\k<time>} will highlight the following word for time hundredths of a second. An example is as follows:
```
{\k100}Here {\k150}comes {\k50}the {\k150}sun
```
Fill highlighting
An override of the form {\kf<time>} will progressively fill up the following word for time hundredths of a second. An example is as follows:
```
{\kf100}Here {\kf150}comes {\kf50}the {\kf150}sun
```
The three styles appear as follows:
Lines with no highlighting (see Figure 26-2)
Figure 26-2. Subtitles without highlighting
Word highlighting (see Figure 26-3)
Figure 26-3. Subtitles with word highlighting
Fill highlighting (see Figure 26-4)
Figure 26-4. Subtitles with fill highlighting
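A karaoke line can be taken apart again to check that the per-word durations add up to the time the line is on screen. The sketch below assumes the override tags are written with their leading backslash, as in {\kf100}, and handles only the two tags discussed here. The name parse_karaoke is made up for this example.

```python
import re

# Matches {\k<n>} or {\kf<n>} followed by the text up to the next override.
KARAOKE = re.compile(r"\{\\(kf|k)(\d+)\}([^{]*)")


def parse_karaoke(text):
    """Return (tag, hundredths of a second, syllable) triples from a karaoke line."""
    return [(tag, int(cs), syllable) for tag, cs, syllable in KARAOKE.findall(text)]


line = r"{\kf100}Here {\kf150}comes {\kf50}the {\kf150}sun"
parts = parse_karaoke(line)
total_seconds = sum(cs for _, cs, _ in parts) / 100

for tag, cs, syllable in parts:
    print(tag, cs, repr(syllable))
print(total_seconds)
```

Here the four durations sum to 450 hundredths, so the line needs to be displayed for at least 4.5 seconds for the fill to complete.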

Multiline Karaoke

Ideally, a karaoke system should have a “look-ahead” mechanism whereby you can see the next line before having to sing it. This can be done by showing two lines of text with overlapping times at different heights. The algorithm is as follows:

When line N with markup is shown,
    show line N+1 without markup
After line N is finished, continue showing line N+1
When line N+1 is due to show,
     finish showing unmarked line N+1
     show line N+1 with markup

Here is the song “Here Comes the Sun” with lyrics:

Here comes the sun
doo doo doo doo
Here comes the sun
I said it's alright

The resultant ASS file should look like this:

Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0100,,{\kf16}Here {\kf46}comes {\kf43}the {\kf67}sun
Dialogue: 0,0:00:18.22,0:00:20.19,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,{\kf17}doo {\kf25}doo {\kf21}doo {\kf92}doo
Dialogue: 0,0:00:20.19,0:00:22.16,Default,,0000,0000,0100,,Here comes the sun
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0100,,{\kf17}Here {\kf46}comes {\kf43}the {\kf97}sun
Dialogue: 0,0:00:22.16,0:00:24.61,Default,,0000,0000,0000,,I said it's alright

Figure 26-5 shows what it looks like.

Figure 26-5. Multiline subtitles
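The look-ahead algorithm above can be sketched as a small generator of Dialogue events. This is a minimal sketch rather than the script used for this chapter: the names ass_time and multiline_events and the (start, end, marked-up text, plain text) tuple layout are assumptions made for illustration, and the alternating vertical margins simply follow the example events.

```python
def ass_time(seconds):
    """Format seconds as the ASS timestamp h:mm:ss.cc."""
    cs = int(round(seconds * 100))  # whole hundredths of a second
    return "%d:%02d:%02d.%02d" % (cs // 360000, cs // 6000 % 60, cs // 100 % 60, cs % 100)


def multiline_events(lines):
    """lines: list of (start, end, marked_up_text, plain_text), times in seconds."""
    margins = ["0100", "0000"]  # alternate the vertical margin between lines
    template = "Dialogue: 0,%s,%s,Default,,0000,0000,%s,,%s"
    events = []
    for n, (start, end, marked, plain) in enumerate(lines):
        # Line N with markup, for as long as it is sung.
        events.append(template % (ass_time(start), ass_time(end), margins[n % 2], marked))
        if n + 1 < len(lines):
            # Line N+1 without markup, from the start of line N until line N+1 begins.
            next_start, next_plain = lines[n + 1][0], lines[n + 1][3]
            events.append(template % (ass_time(start), ass_time(next_start),
                                      margins[(n + 1) % 2], next_plain))
    return events


song = [
    (18.22, 19.94, r"{\kf16}Here {\kf46}comes {\kf43}the {\kf67}sun", "Here comes the sun"),
    (20.19, 21.75, r"{\kf17}doo {\kf25}doo {\kf21}doo {\kf92}doo", "doo doo doo doo"),
]
events = multiline_events(song)
print("\n".join(events))
```

Because each line keeps the same margin for its preview and its marked-up display, the text does not jump when the highlighting starts.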

libass

SubStation Alpha and its renderers appear to have been through a complex history. According to “The old and present: VSFilter” ( http://blog.aegisub.org/2010/02/old-and-present-vsfilter.html ), the ASS format was finalized in about 2004, and the renderer VSFilter was made open source at that time. However, around 2007 development of VSFilter ceased, and several forks were made. These introduced several extensions to the format, such as the blur tag by Aegisub. Some of these forks have since merged, some were abandoned, and for some there is still code in the wild.

libass ( http://code.google.com/p/libass/ ) is the main rendering library for Linux. An alternative, xy-vsfilter, claims to be faster, more reliable, and so on, but does not seem to have a Linux implementation. libass supports some of the later extensions. These seem to be the Aegisub 2008 extensions, according to “VSFilter hacks” ( http://blog.aegisub.org/2008/07/vsfilter-hacks.html ).

Converting KAR Files to MKV Files with ASS Subtitles

Follow these steps:

To pull out the lyrics from a KAR or MIDI file, use the Java DumpSequence given in Chapter 18, as follows, to get a dump of all events:
```
java DumpSequence  song.kar  > song.dump
```

For line-only display, use the following Python script to extract the lyrics from the dump and save them in ASS format:

#!/usr/bin/env python3

# Convert a DumpSequence dump into an ASS file,
# with one Dialogue event per lyric line.

import fileinput

TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,Karaoke,"

textStr = TEXT_STR
startTime = -1
endTime = -1

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # Convert a MIDI tick count into an ASS timestamp h:mm:ss.cc
    tf = float(s)
    tf /= 62.6  # ticks per sec

    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000

    # Work in whole hundredths so rounding can never produce ".100"
    t = int(round(tf * 100))
    hundredths = t % 100
    t //= 100
    secs = t % 60
    t //= 60
    mins = t % 60
    t //= 60
    hrs = t
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def doLyric(words):
    global textStr
    global startTime
    global endTime

    if words[1] == "0:":
        # skip text events at tick 0
        return

    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # a leading \ or / starts a new line: print the finished one
            if endTime != -1:
                print(textStr % (timeFormat(startTime), timeFormat(endTime)))
            textStr = TEXT_STR + words[4][1:]
            startTime = -1
        else:
            textStr += words[4]
    else:
        # it's a space, gets lost by the split
        textStr += ' '

    endTime = time

printPreface()

for line in fileinput.input():
    words = line.split()

    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration / numTicks

    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)

Here’s an example:

python lyric2ass4kar.py song.dump > song.ass

For fill lyrics display, use the following Python script to extract the lyrics and save them in ASS format:

#!/usr/bin/env python3

# Convert a DumpSequence dump into an ASS file,
# with a {\kf} fill override in front of each word.

import fileinput

TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,,"

textStr = "{\\kf%d}"
startTime = -1
startWordTime = -1
endTime = -1

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # Convert a MIDI tick count into an ASS timestamp h:mm:ss.cc
    tf = float(s)

    # frames per sec should be 60: 120 beats/min, 30 ticks per beat
    # but it is too slow on 54154
    tf /= 62.6  # ticks per sec

    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000

    # Work in whole hundredths so rounding can never produce ".100"
    t = int(round(tf * 100))
    hundredths = t % 100
    t //= 100
    secs = t % 60
    t //= 60
    mins = t % 60
    t //= 60
    hrs = t
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def durat(end, start):
    # Time between two tick counts, in hundredths of a second
    fend = float(end)
    fstart = float(start)
    d = (fend - fstart) / 62.9
    return round(d * 100)

def doLyric(words):
    global textStr
    global startTime
    global endTime
    global startWordTime

    if words[1] == "0:":
        # skip text events at tick 0
        return

    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
        startWordTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # a leading \ or / starts a new line: fill in the duration
            # of the last word and print the finished line
            dur = durat(time, startWordTime)
            textStr = textStr % (dur)
            if len(words[4]) == 1 and endTime != -1:
                print(TEXT_STR % (timeFormat(startTime),
                                  timeFormat(endTime)) + textStr)

            # next word
            textStr = "{\\kf%d}" + words[4][1:]
            startTime = -1
        else:
            textStr += words[4]
    else:
        # it's a space, gets lost by the split
        dur = durat(time, startWordTime)
        textStr = textStr % (dur) + " {\\kf%d}"
        startWordTime = time

    endTime = time

printPreface()

for line in fileinput.input():
    words = line.split()

    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration / numTicks

    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)

Here’s an example:

python lyric2karaokeass4kar.py song.dump > song.ass

For multiline lyrics display, use the following Python script to extract the lyrics and save them in ASS format:

#!/usr/bin/env python3

# Convert a DumpSequence dump into an ASS file that shows two lines at once:
# the line being sung, with {\kf} fill overrides, and a preview of the next.

import fileinput

# Successive lines alternate between the two vertical margins
START_EVENTS = ["Dialogue: 0,%s,%s,Default,,0000,0000,0000,,",
                "Dialogue: 0,%s,%s,Default,,0000,0000,0100,,"]

textStr = "{\\kf%d}"
plainTextStr = ""
startTime = -1
previousStartTime = -1
startWordTime = -1
endTime = -1
previousEndTime = -1
lineNum = 0

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # Convert a MIDI tick count into an ASS timestamp h:mm:ss.cc
    tf = float(s)

    # frames per sec should be 60: 120 beats/min, 30 ticks per beat
    # but it is too slow on 54154
    tf /= 62.6  # ticks per sec

    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000

    # Work in whole hundredths so rounding can never produce ".100"
    t = int(round(tf * 100))
    hundredths = t % 100
    t //= 100
    secs = t % 60
    t //= 60
    mins = t % 60
    t //= 60
    hrs = t
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def durat(end, start):
    # Time between two tick counts, in hundredths of a second
    fend = float(end)
    fstart = float(start)
    d = (fend - fstart) / 62.9
    return round(d * 100)

def doLyric(words):
    global textStr
    global plainTextStr
    global startTime
    global endTime
    global previousStartTime
    global previousEndTime
    global startWordTime
    global lineNum

    if words[1] == "0:":
        # skip text events at tick 0
        return

    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
        startWordTime = time
        previousEndTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # a leading \ or / starts a new line: fill in the duration
            # of the last word of the finished line
            dur = durat(time, startWordTime)
            textStr = textStr % (dur)

            if len(words[4]) == 1 and endTime != -1:
                if previousStartTime != -1:
                    # preview: the plain line, shown while the previous line is sung
                    print(START_EVENTS[lineNum % 2] % (timeFormat(previousStartTime),
                                                       timeFormat(previousEndTime)) +
                          plainTextStr)
                # the same line with markup, shown while it is sung
                print(START_EVENTS[lineNum % 2] % (timeFormat(startTime),
                                                   timeFormat(endTime)) +
                      textStr)

            # next word
            lineNum += 1
            textStr = "{\\kf%d}" + words[4][1:]
            plainTextStr = words[4][1:]
            previousStartTime = startTime
            startTime = -1
        else:
            textStr += words[4]
            plainTextStr += words[4]
    else:
        # it's a space, gets lost by the split
        dur = durat(time, startWordTime)
        textStr = textStr % (dur) + " {\\kf%d}"
        plainTextStr += ' '
        startWordTime = time

    endTime = time

printPreface()

for line in fileinput.input():
    words = line.split()

    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration / numTicks

    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)

Here is an example:

python lyric2karaokeass4kar.py song.dump > song.ass

Convert the MIDI sound file to a WAV file using fluidsynth.

fluidsynth -F song.wav /usr/share/sounds/sf2/FluidR3_GM.sf2 song.kar

Convert the WAV file to MP3.
```
lame song.wav song.mp3
```
Find a suitable video-only file for your background (I used one off my karaoke discs) and then merge them into an MKV file.
```
mkvmerge -o song.mkv song.mp3 song.ass BACK01.MPG
```
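The three conversion commands can be chained from one script so that a failing step stops the run. This is a sketch only: build_commands and run_pipeline are hypothetical names, the files are assumed to share one stem (song.kar, song.ass, and so on), and only the command-line arguments already shown in the steps above are used.

```python
import subprocess


def build_commands(stem, soundfont, background):
    """Return the three command lines for files named <stem>.kar, <stem>.ass, etc."""
    return [
        # MIDI to WAV
        ["fluidsynth", "-F", stem + ".wav", soundfont, stem + ".kar"],
        # WAV to MP3
        ["lame", stem + ".wav", stem + ".mp3"],
        # MP3 + ASS subtitles + background video into one MKV file
        ["mkvmerge", "-o", stem + ".mkv", stem + ".mp3", stem + ".ass", background],
    ]


def run_pipeline(stem, soundfont, background):
    """Run each step in order; check=True raises an error at the first failure."""
    for command in build_commands(stem, soundfont, background):
        subprocess.run(command, check=True)


# Show the commands without running them.
for command in build_commands("song", "/usr/share/sounds/sf2/FluidR3_GM.sf2", "BACK01.MPG"):
    print(" ".join(command))
```

Calling run_pipeline with the same arguments would execute the steps, provided the three tools are installed and the ASS file has already been generated.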

The resultant MKV file can then be played as a stand-alone file by MPlayer.

mplayer song.mkv

It can also be played by VLC, but only with the ASS file present.

vlc song.mkv

Screen captures were shown earlier in the chapter, depending on the karaoke effect chosen.

Timing is, however, an issue. The default MIDI tempo is 120 beats per minute, and a common tick rate is 30 ticks per beat. This leads to a rate of 60 MIDI ticks per second. However, you are now playing MP3 files and ASS files, neither of which are MIDI files anymore and which are not necessarily synchronized. With a rate of 60 ticks per second in converting from MIDI to ASS, the lyrics run too slowly. Experimentally I have found 62.9 to be a reasonable rate for at least some files.
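The arithmetic behind the tick rate, and the size of the drift between the nominal and the experimental rate, can be checked with a few lines. The function names below are made up for this sketch, and the tick value of 3600 is just a convenient example: one minute of music at the nominal rate.

```python
def ticks_per_second(beats_per_minute, ticks_per_beat):
    """Nominal MIDI tick rate for a fixed tempo."""
    return beats_per_minute / 60 * ticks_per_beat


def tick_to_seconds(tick, rate):
    """Convert a tick count to seconds at the given ticks-per-second rate."""
    return tick / rate


nominal = ticks_per_second(120, 30)  # 120 beats/min at 30 ticks/beat
empirical = 62.9                     # rate found by experiment, as described above

tick = 3600
print(nominal)
print(tick_to_seconds(tick, nominal))
print(tick_to_seconds(tick, empirical))
```

The last two values differ by a few seconds after only a minute of music, which is why a lyric timed at the nominal rate lags noticeably behind the audio.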

HTML5 Subtitles

HTML5 has support for video types, although exactly which video formats are supported by which browser varies. This includes support for subtitles and closed captions, using the HTML 5.1 track element. A search will turn up several articles discussing this in detail.

You need to prepare a file of timing and text instructions. The examples use a .vtt file, which can look like this:

WEBVTT

1
00:00:01.000 --> 00:00:30.000  D:vertical A:start
This is the first line of text, displaying from 1-30 seconds

2
00:00:35.000 --> 00:00:50.000
And the second line of text
separated over two lines from 35 to 50 seconds

Here the first line is WEBVTT, and blocks of text are separated by blank lines. The format of VTT files is specified at “WebVTT: The Web Video Text Tracks Format” ( http://dev.w3.org/html5/webvtt/ ).
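A file in this shape can be generated from a list of timed lines. The following is a minimal sketch that writes only numbered cues with start and end times, with no cue settings. The names vtt_time and make_vtt are assumptions for this example.

```python
def vtt_time(seconds):
    """Format seconds as the WebVTT timestamp hh:mm:ss.ttt."""
    ms = int(round(seconds * 1000))  # whole milliseconds
    return "%02d:%02d:%02d.%03d" % (ms // 3600000, ms // 60000 % 60,
                                    ms // 1000 % 60, ms % 1000)


def make_vtt(cues):
    """cues: list of (start, end, text) with times in seconds."""
    blocks = ["WEBVTT"]
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append("%d\n%s --> %s\n%s" % (number, vtt_time(start), vtt_time(end), text))
    # Blocks are separated by blank lines.
    return "\n\n".join(blocks) + "\n"


cues = [
    (1, 30, "This is the first line of text, displaying from 1-30 seconds"),
    (35, 50, "And the second line of text\nseparated over two lines from 35 to 50 seconds"),
]
vtt = make_vtt(cues)
print(vtt)
```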

The HTML then references the audio/video files and the subtitles file as follows:

    <video  controls>
      <source src="output.webm" type="video/webm">
      <track src="54154.vtt" kind="subtitles" srclang="en" label="English" default />
      <!-- fallback for rubbish browsers -->
    </video>

Figure 26-6 shows a screen capture.

Figure 26-6. HTML5 subtitles

There does not seem to be any mechanism for highlighting words progressively in a line. Possibly JavaScript may be able to do so, but after a cursory look, it doesn’t seem likely. This makes it not yet suitable for karaoke.

Conclusion

This chapter discussed methods for overlaying subtitle text onto a changing video image. It is feasible, but there are only a few viable mechanisms.

Footnotes

1 Rigorously, subtitles refer to what is spoken, while closed captions may include other sounds such as doors slamming. For karaoke, there is no need to distinguish them.
