©  Jan Newmarch 2017

Jan Newmarch, Linux Sound Programming, 10.1007/978-1-4842-2496-0_26

26. Subtitles and Closed Captions

Jan Newmarch

(1)Oakleigh, Victoria, Australia

Many karaoke systems use subtitles1 superimposed over a movie of some kind. Programs such as kmid, and my own Java programs, draw the lyrics on some sort of canvas object, which gives a rather boring background. Video CDs and MPEG-4 files have nicer backgrounds, but the lyrics are hard-coded into the background video, so there is little chance of manipulating them. CD+G files keep the lyrics separate from the video, but there doesn't seem to be any way of playing them directly on Linux. They can be converted to MP3+G, which VLC can play: it loads the MP3 file and picks up the corresponding .cdg file.

This chapter considers subtitles that can be created independently, combined with video and audio in some way, and then played. The current situation is not completely satisfactory.

Resources

Check out this resource:

Subtitle Formats

This chapter is concerned with what are called soft subtitles. Soft subtitles are stored in a file separate from the video or audio and are combined with them during rendering. The Wikipedia page "Subtitle (captioning)" ( http://en.wikipedia.org/wiki/Subtitle_(captioning) ) is a long article that covers many aspects of subtitling. It also contains a list of subtitle formats. Of these, SubStation Alpha seems to be the most useful in this context.

MPlayer

According to the MPlayer page “Subtitles and OSD” ( www.mplayerhq.hu/DOCS/HTML/en/subosd.htm ), the following are the formats recognized by MPlayer:

  1. VOBsub

  2. OGM

  3. CC (closed caption)

  4. MicroDVD

  5. SubRip

  6. SubViewer

  7. Sami

  8. VPlayer

  9. RT

  10. SSA

  11. PJS (Phoenix Japanimation Society)

  12. MPsub

  13. AQTitle

  14. JACOsub

VLC

According to VLC ( www.videolan.org/vlc/features.php?cat=sub ), support under Linux includes the following subtitle formats:

  1. DVD

  2. Text files (MicroDVD, SubRIP, SubViewer, SSA1-5, SAMI, VPlayer)

  3. Closed captions

  4. Vobsub

  5. Universal Subtitle Format (USF)

  6. SVCD/CVD

  7. DVB

  8. OGM

  9. CMML

  10. Kate

If you play some sort of video file, say XYZ.mpg, and there is also a file with the same root name and appropriate extension such as XYZ.ass (the extension for SubStation Alpha), then VLC will automatically load the subtitles file and play it. If the subtitles file has a different name, then it can be loaded from the VLC menu Video ➤ Subtitles Track. However, this does not appear to be as reliable as sharing the name.
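The same-root-name convention described above is easy to check before you start playback. The following is a minimal sketch under a few assumptions. The helper name `find_subtitles` and its list of extensions are hypothetical and chosen only for illustration. This is not VLC's own lookup code.

```python
from pathlib import Path

# Subtitle extensions to try; an assumption for illustration, not VLC's own list.
SUBTITLE_EXTENSIONS = (".ass", ".ssa", ".srt", ".sub", ".vtt")


def find_subtitles(video_path):
    """Return existing subtitle files that share the video's root name.

    For example, XYZ.mpg matches XYZ.ass in the same directory.
    """
    video = Path(video_path)
    return [video.with_suffix(ext) for ext in SUBTITLE_EXTENSIONS
            if video.with_suffix(ext).is_file()]
```

If XYZ.ass sits next to XYZ.mpg, `find_subtitles("XYZ.mpg")` returns a list containing that path. An empty list means that you will have to load the subtitles from the menu instead.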

Gnome Subtitles

See "Gnome Subtitles 1.3 is out!" ( http://gnome-subtitles.sourceforge.net/ ). Gnome Subtitles supports the following formats:

Adobe Encore DVD, Advanced Sub Station Alpha, AQ Title, DKS Subtitle Format, FAB Subtitler, Karaoke Lyrics LRC, Karaoke Lyrics VKT, MacSUB, MicroDVD, MPlayer, MPlayer 2, MPSub, Panimator, Phoenix Japanimation Society, Power DivX, Sofni, SubCreator 1.x, SubRip, Sub Station Alpha, SubViewer 1.0, SubViewer 2.0, and ViPlay Subtitle File.

SubStation Alpha

The SSA/ASS specification is at MooDub.free ( http://moodub.free.fr/video/ass-specs.doc ). It is brief and appears to contain some minor errors with respect to later specifications and implementations. For example, the time format is different. Or are the later ones all wrong?

SSA/ASS files can be used stand-alone. They can also be included in container formats such as Matroska files, discussed briefly in Chapter 3. Embedding them in MKV files imposes some restrictions ( www.matroska.org/technical/specs/subtitles/ssa.html ). For example, the text must be converted to UTF-8 Unicode.

ASS files are divided into several sections.

  1. General information about the environment the subtitle file expects, such as the X and Y resolutions

  2. Style information such as colors and fonts

  3. Event information, which is where the subtitle text is given along with timing information and any special effects to be applied
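Each section starts with a bracketed header such as [Script Info], [V4+ Styles] or [Events], so a file can be split into its sections with a few lines of code. The following is a minimal sketch. It assumes UTF-8 text and skips the ';' comment lines that Aegisub writes. The helpers `read_ass_sections` and `script_info` are hypothetical and are not part of any library.

```python
def read_ass_sections(text):
    """Split ASS text into {section name: [lines]}, skipping blanks and ';' comments."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # e.g. "Script Info", "V4+ Styles", "Events"
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def script_info(sections):
    """Turn the 'Key: value' lines of [Script Info] into a dictionary."""
    info = {}
    for line in sections.get("Script Info", []):
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

For example, you could read a file with `open("song.ass", encoding="utf-8-sig").read()`, which also tolerates a leading byte-order mark. Passing the result to `read_ass_sections` lets you look up `script_info(...)["PlayResX"]` for the expected X resolution.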

Under normal circumstances you would not directly create such files using a text editor. Instead, the program Aegisub gives you a GUI environment in which to create the files. Essentially, you just enter the text lines, plus the start and end times for each line to be displayed.

Figure 26-1 shows a screen dump.

Figure 26-1. Aegisub screenshot

Many special effects are possible. The video on Bill Cresswell's blog ( https://billcreswell.wordpress.com/tag/aegisub/ ) is an excellent example. Here is the direct YouTube link: www.youtube.com/watch?v=0Z0dgdglrAo .

For completeness, here is part of an ASS file I created:

[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0


[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H00B4FCFC,&H00000008,&H80000008,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1


[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:24.61,0:00:28.24,Default,,0000,0000,0000,,I said it's alright
...
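The Format line in [Events] names the fields of every Dialogue line. The last field, Text, may itself contain commas, so a reader should split each line only as many times as there are fields before Text. The following minimal sketch shows this. The helper names are hypothetical.

```python
def parse_events(lines):
    """Parse [Events] lines into dictionaries keyed by the Format field names."""
    fields = None
    events = []
    for line in lines:
        kind, sep, rest = line.partition(":")
        if not sep:
            continue
        if kind == "Format":
            fields = [f.strip() for f in rest.split(",")]
        elif kind == "Dialogue" and fields:
            # Split at most len(fields) - 1 times so that commas in Text survive.
            values = rest.lstrip().split(",", len(fields) - 1)
            events.append(dict(zip(fields, values)))
    return events


def ass_time_to_seconds(t):
    """Convert an ASS time such as '0:00:18.22' to seconds."""
    hours, minutes, seconds = t.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

When this is applied to the [Events] lines of the file above, each event becomes a dictionary, so `event["Start"]` and `event["Text"]` give the timing and the lyric.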

Karaoke Effects in ASS Files

A line in an ASS file essentially consists of a time to start the display, a time to finish the display, and the text itself. However, karaoke users are accustomed to the text being highlighted as it is played.

ASS supports two major highlight styles.

  1. Words are highlighted one at a time.

  2. The text is highlighted by filling from the left.

These effects are produced by embedding "karaoke overrides" into the text. An override is enclosed in {}, starts with a backslash, and gives a duration in hundredths of a second.

The details are as follows:

  1. Word highlighting

    An override of the form {\k<time>} gives the following word a duration of <time> hundredths of a second. When the word's turn comes, the whole word is highlighted at once. An example is as follows:

    {\k100}Here {\k150}comes {\k50}the {\k150}sun
  2. Fill highlighting

    An override of the form {\kf<time>} progressively fills the following word from the left over <time> hundredths of a second. An example is as follows:

    {\kf100}Here {\kf150}comes {\kf50}the {\kf150}sun

The three styles appear as follows:

  1. Lines with no highlighting (see Figure 26-2)

    Figure 26-2. Subtitles without highlighting
  2. Word highlighting (see Figure 26-3)

    Figure 26-3. Subtitles with word highlighting
  3. Fill highlighting (see Figure 26-4)

    Figure 26-4. Subtitles with fill highlighting
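Writing these overrides by hand is tedious. Once each word's duration is known, the text of a line can be generated. The following is a minimal sketch that assumes the durations are already in hundredths of a second. The helpers `karaoke_text` and `dialogue` are hypothetical. `dialogue` uses the field layout of the example file above.

```python
def karaoke_text(words, tag="kf"):
    """Build ASS karaoke text from (word, hundredths) pairs.

    tag="k" gives word highlighting; tag="kf" gives fill highlighting.
    """
    if tag not in ("k", "kf"):
        raise ValueError("tag must be 'k' or 'kf'")
    return " ".join("{\\%s%d}%s" % (tag, int(d), w) for w, d in words)


def dialogue(start, end, text, margin_v=0, style="Default"):
    """Format one Dialogue line with the same fields as the example file."""
    return "Dialogue: 0,%s,%s,%s,,0000,0000,%04d,,%s" % (start, end, style, margin_v, text)


example = [("Here", 100), ("comes", 150), ("the", 50), ("sun", 150)]
print(karaoke_text(example, "k"))  # {\k100}Here {\k150}comes {\k50}the {\k150}sun

# These durations sum to 172 hundredths, matching a line shown from 18.22 to 19.94.
timed = [("Here", 16), ("comes", 46), ("the", 43), ("sun", 67)]
print(dialogue("0:00:18.22", "0:00:19.94", karaoke_text(timed)))
```

The durations of a line should add up to roughly its display time. Otherwise the highlighting finishes early, or the line disappears before the last word is reached.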

Multiline Karaoke

Ideally, a karaoke system should have a “look-ahead” mechanism whereby you can see the next line before having to sing it. This can be done by showing two lines of text with overlapping times at different heights. The algorithm is as follows:

When line N with markup is shown,
    show line N+1 without markup
After line N is finished, continue showing line N+1
When line N+1 is due to show,
    finish showing unmarked line N+1
    show line N+1 with markup
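The look-ahead algorithm can be written directly as a function over the lyric lines. The following is a minimal sketch. It assumes that each line's start and end times and its highlighted and plain text are already known. The names `Line` and `lookahead_events` are hypothetical. The sketch follows the hand-written example later in this section. Consecutive lines alternate between MarginV 100 and MarginV 0, so they sit at different heights. The unmarked copy of line N+1 is shown from the start of line N until line N+1 itself is due.

```python
from collections import namedtuple

# One lyric line: ASS start/end times plus highlighted and plain versions of the text.
Line = namedtuple("Line", "start end marked plain")

EVENT = "Dialogue: 0,%s,%s,Default,,0000,0000,%04d,,%s"


def lookahead_events(lines):
    """Yield Dialogue lines that show line N+1 unmarked while line N is sung."""
    for n, line in enumerate(lines):
        margin = 100 if n % 2 == 0 else 0  # alternate the vertical position
        yield EVENT % (line.start, line.end, margin, line.marked)
        if n + 1 < len(lines):
            nxt = lines[n + 1]
            # Unmarked preview of the next line at the other height,
            # kept until the next line is due and shown with markup.
            yield EVENT % (line.start, nxt.start, 100 - margin, nxt.plain)


song = [
    Line("0:00:18.22", "0:00:19.94",
         "{\\kf16}Here {\\kf46}comes {\\kf43}the {\\kf67}sun", "Here comes the sun"),
    Line("0:00:20.19", "0:00:21.75",
         "{\\kf17}doo {\\kf25}doo {\\kf21}doo {\\kf92}doo", "doo doo doo doo"),
]
for event in lookahead_events(song):
    print(event)
```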

Here is the song “Here Comes the Sun” with lyrics:

Here comes the sun
doo doo doo doo
Here comes the sun
I said it's alright

The resultant ASS file should look like this:

Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0100,,{\kf16}Here {\kf46}comes {\kf43}the {\kf67}sun
Dialogue: 0,0:00:18.22,0:00:20.19,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,{\kf17}doo {\kf25}doo {\kf21}doo {\kf92}doo
Dialogue: 0,0:00:20.19,0:00:22.16,Default,,0000,0000,0100,,Here comes the sun
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0100,,{\kf17}Here {\kf46}comes {\kf43}the {\kf97}sun
Dialogue: 0,0:00:22.16,0:00:24.61,Default,,0000,0000,0000,,I said it's alright

Figure 26-5 shows what it looks like.

Figure 26-5. Multiline subtitles

libass

SubStation Alpha and its renderers appear to have been through a complex history. According to "The old and present: VSFilter" ( http://blog.aegisub.org/2010/02/old-and-present-vsfilter.html ), the ASS format was finalized in about 2004, and the renderer VSFilter was made open source at that time. However, development of VSFilter ceased around 2007, and several forks were made. These forks introduced several extensions to the format, such as the blur tag added by Aegisub. Some of the forks have since merged and some were abandoned, but code from some of them is still in the wild.

libass ( http://code.google.com/p/libass/ ) is the main rendering library for Linux. An alternative, xy-vsfilter, claims to be faster, more reliable, and so on, but does not seem to have a Linux implementation. libass supports some of the later extensions. These seem to be the Aegisub 2008 extensions, according to “VSFilter hacks” ( http://blog.aegisub.org/2008/07/vsfilter-hacks.html ).

Converting KAR Files to MKV Files with ASS Subtitles

Follow these steps:

  1. To pull out the lyrics from a KAR or MIDI file, use the Java program DumpSequence from Chapter 18 to get a dump of all events:

    java DumpSequence  song.kar  > song.dump
  2. For line-only display, use the following Python script to extract the lyrics and save them in ASS format. The ASS header that it writes was copied from a file generated by Aegisub 2.1.9.

    #!/usr/bin/env python3
    # Extract the lyrics from a DumpSequence dump of a KAR file and write
    # them as an ASS file with one plain Dialogue line per lyric line.

    import fileinput

    TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,Karaoke,"

    textStr = TEXT_STR
    startTime = -1
    endTime = -1
    microSecondsPerTick = None  # only used by the commented-out formula


    def printPreface():
        # ASS header copied from a file generated by Aegisub 2.1.9
        print('''[Script Info]
    ; Script generated by Aegisub 2.1.9
    ; http://www.aegisub.org/
    Title: Default Aegisub file
    ScriptType: v4.00+
    WrapStyle: 0
    PlayResX: 640
    PlayResY: 480
    ScaledBorderAndShadow: yes
    Video Aspect Ratio: 0
    Video Zoom: 6
    Video Position: 0

    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

    [Events]
    Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')


    def timeFormat(s):
        """Convert a MIDI tick count to an ASS time H:MM:SS.cc."""
        tf = float(s)
        tf /= 62.6  # ticks per sec

        # This should be right, but is too slow
        # tf = (tf * microSecondsPerTick) / 1000000

        # Work in whole hundredths so that rounding can never produce ".100"
        t, hundredths = divmod(int(round(tf * 100)), 100)
        t, secs = divmod(t, 60)
        hrs, mins = divmod(t, 60)
        return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)


    def doLyric(words):
        global textStr, startTime, endTime

        if words[1] == "0:":
            return  # skip events at tick 0

        time = words[1].rstrip(':')
        if startTime == -1:
            startTime = time
        if len(words) == 5:
            # In KAR files '\' starts a new paragraph and '/' a new line
            if words[4][0] == '\\' or words[4][0] == '/':
                print(textStr % (timeFormat(startTime), timeFormat(endTime)))
                textStr = TEXT_STR + words[4][1:]  # text after the '/' or '\'
                startTime = -1
            else:
                textStr += words[4]
        else:
            # a space syllable, which split() has thrown away
            textStr += ' '

        endTime = time


    printPreface()

    for line in fileinput.input():
        words = line.split()

        if len(words) >= 2:
            if words[0] == "Resolution:":
                ticksPerBeat = int(words[1])
            elif words[0] == "Length:":
                numTicks = int(words[1])
            elif words[0] == "Duration:":
                duration = int(words[1])
                microSecondsPerTick = duration / numTicks

        if len(words) >= 3 and words[2] == "Text":
            doLyric(words)

    Here’s an example:

    python lyric2ass4kar.py song.dump > song.ass
  3. For fill-highlighted lyrics display, use the following Python script to extract the lyrics and save them in ASS format:

    #!/usr/bin/env python3
    # Extract the lyrics from a DumpSequence dump of a KAR file and write
    # them as an ASS file that uses {\kf} fill highlighting for each word.

    import fileinput

    TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,,"
    KF_TAG = "{\\kf%d}"  # ASS fill override; %d is filled with hundredths

    textStr = KF_TAG
    startTime = -1
    startWordTime = -1
    endTime = -1
    microSecondsPerTick = None  # only used by the commented-out formula


    def printPreface():
        # ASS header copied from a file generated by Aegisub 2.1.9
        print('''[Script Info]
    ; Script generated by Aegisub 2.1.9
    ; http://www.aegisub.org/
    Title: Default Aegisub file
    ScriptType: v4.00+
    WrapStyle: 0
    PlayResX: 640
    PlayResY: 480
    ScaledBorderAndShadow: yes
    Video Aspect Ratio: 0
    Video Zoom: 6
    Video Position: 0

    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

    [Events]
    Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')


    def timeFormat(s):
        """Convert a MIDI tick count to an ASS time H:MM:SS.cc."""
        tf = float(s)

        # frames per sec should be 60: 120 beats/min, 30 ticks per beat
        # but it is too slow on 54154
        tf /= 62.6  # ticks per sec

        # This should be right, but is too slow
        # tf = (tf * microSecondsPerTick) / 1000000

        # Work in whole hundredths so that rounding can never produce ".100"
        t, hundredths = divmod(int(round(tf * 100)), 100)
        t, secs = divmod(t, 60)
        hrs, mins = divmod(t, 60)
        return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)


    def durat(end, start):
        """Time between two tick counts, in hundredths of a second."""
        d = (float(end) - float(start)) / 62.9
        return round(d * 100)


    def doLyric(words):
        global textStr, startTime, endTime, startWordTime

        if words[1] == "0:":
            return  # skip events at tick 0

        time = words[1].rstrip(':')
        if startTime == -1:
            startTime = time
            startWordTime = time
        if len(words) == 5:
            # In KAR files '\' starts a new paragraph and '/' a new line
            if words[4][0] == '\\' or words[4][0] == '/':
                # fill in the duration of the line's last word
                textStr = textStr % durat(time, startWordTime)
                if len(words[4]) == 1:
                    print(TEXT_STR % (timeFormat(startTime), timeFormat(endTime))
                          + textStr)

                # next word starts the new line
                textStr = KF_TAG + words[4][1:]
                startTime = -1
            else:
                textStr += words[4]
        else:
            # it's a space, which gets lost by the split
            textStr = textStr % durat(time, startWordTime) + " " + KF_TAG
            startWordTime = time

        endTime = time


    printPreface()

    for line in fileinput.input():
        words = line.split()

        if len(words) >= 2:
            if words[0] == "Resolution:":
                ticksPerBeat = int(words[1])
            elif words[0] == "Length:":
                numTicks = int(words[1])
            elif words[0] == "Duration:":
                duration = int(words[1])
                microSecondsPerTick = duration / numTicks

        if len(words) >= 3 and words[2] == "Text":
            doLyric(words)

    Here’s an example:

    python lyric2karaokeass4kar.py song.dump > song.ass
  4. For multiline lyrics display, use the following Python script to extract the lyrics and save them in ASS format:

    #!/usr/bin/env python3
    # Extract the lyrics from a DumpSequence dump of a KAR file and write
    # them as a two-line ASS file: each line is shown with {\kf} fill
    # highlighting, after first appearing unhighlighted as a look-ahead.

    import fileinput

    # Dialogue prefixes for the two screen positions (MarginV 0 and 100)
    START_EVENTS = ["Dialogue: 0,%s,%s,Default,,0000,0000,0000,,",
                    "Dialogue: 0,%s,%s,Default,,0000,0000,0100,,"]
    KF_TAG = "{\\kf%d}"  # ASS fill override; %d is filled with hundredths

    textStr = KF_TAG
    plainTextStr = ""
    startTime = -1
    previousStartTime = -1
    startWordTime = -1
    endTime = -1
    previousEndTime = -1
    lineNum = 0
    microSecondsPerTick = None  # only used by the commented-out formula


    def printPreface():
        # ASS header copied from a file generated by Aegisub 2.1.9
        print('''[Script Info]
    ; Script generated by Aegisub 2.1.9
    ; http://www.aegisub.org/
    Title: Default Aegisub file
    ScriptType: v4.00+
    WrapStyle: 0
    PlayResX: 640
    PlayResY: 480
    ScaledBorderAndShadow: yes
    Video Aspect Ratio: 0
    Video Zoom: 6
    Video Position: 0

    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

    [Events]
    Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')


    def timeFormat(s):
        """Convert a MIDI tick count to an ASS time H:MM:SS.cc."""
        tf = float(s)
        # frames per sec should be 60: 120 beats/min, 30 ticks per beat
        # but it is too slow on 54154
        tf /= 62.6  # ticks per sec

        # This should be right, but is too slow
        # tf = (tf * microSecondsPerTick) / 1000000

        # Work in whole hundredths so that rounding can never produce ".100"
        t, hundredths = divmod(int(round(tf * 100)), 100)
        t, secs = divmod(t, 60)
        hrs, mins = divmod(t, 60)
        return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)


    def durat(end, start):
        """Time between two tick counts, in hundredths of a second."""
        d = (float(end) - float(start)) / 62.9
        return round(d * 100)


    def doLyric(words):
        global textStr, plainTextStr, startTime, endTime
        global previousStartTime, previousEndTime, startWordTime, lineNum

        if words[1] == "0:":
            return  # skip events at tick 0

        time = words[1].rstrip(':')
        if startTime == -1:
            startTime = time
            startWordTime = time
            previousEndTime = time
        if len(words) == 5:
            # In KAR files '\' starts a new paragraph and '/' a new line
            if words[4][0] == '\\' or words[4][0] == '/':
                # fill in the duration of the line's last word
                textStr = textStr % durat(time, startWordTime)

                if len(words[4]) == 1:
                    if previousStartTime != -1:
                        # look-ahead: this line unhighlighted, shown while
                        # the previous line is being sung
                        print(START_EVENTS[lineNum % 2]
                              % (timeFormat(previousStartTime), timeFormat(previousEndTime))
                              + plainTextStr)
                    # this line with fill highlighting
                    print(START_EVENTS[lineNum % 2]
                          % (timeFormat(startTime), timeFormat(endTime))
                          + textStr)

                # next word starts the new line
                lineNum += 1
                textStr = KF_TAG + words[4][1:]
                plainTextStr = words[4][1:]
                previousStartTime = startTime
                startTime = -1
            else:
                textStr += words[4]
                plainTextStr += words[4]
        else:
            # it's a space, which gets lost by the split
            textStr = textStr % durat(time, startWordTime) + " " + KF_TAG
            plainTextStr += ' '
            startWordTime = time

        endTime = time


    printPreface()

    for line in fileinput.input():
        words = line.split()

        if len(words) >= 2:
            if words[0] == "Resolution:":
                ticksPerBeat = int(words[1])
            elif words[0] == "Length:":
                numTicks = int(words[1])
            elif words[0] == "Duration:":
                duration = int(words[1])
                microSecondsPerTick = duration / numTicks

        if len(words) >= 3 and words[2] == "Text":
            doLyric(words)

    Here is an example:

    python lyric2karaokeass4kar.py song.dump > song.ass
  5. Convert the MIDI sound file to a WAV file using fluidsynth.

    fluidsynth -F song.wav /usr/share/sounds/sf2/FluidR3_GM.sf2 song.kar
  6. Convert the WAV file to MP3.

    lame song.wav song.mp3
  7. Find a suitable video-only file for the background (I used one from my karaoke discs). Then merge it with the MP3 and ASS files into an MKV file.

    mkvmerge -o song.mkv song.mp3 song.ass BACK01.MPG
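Steps 5 to 7 can be chained in a small driver script. The following is a minimal sketch under several assumptions:

- fluidsynth, lame and mkvmerge are on the PATH.
- The song's .kar and .ass files share one root name.
- The SoundFont path is the one used in step 5.

The function names and the script name kar2mkv.py are hypothetical. The script runs the same commands as the steps and stops at the first failure.

```python
import shutil
import subprocess
import sys

SOUNDFONT = "/usr/share/sounds/sf2/FluidR3_GM.sf2"  # path used in step 5


def build_commands(stem, background, soundfont=SOUNDFONT):
    """Return the command lines of steps 5-7 for <stem>.kar and <stem>.ass."""
    return [
        ["fluidsynth", "-F", stem + ".wav", soundfont, stem + ".kar"],
        ["lame", stem + ".wav", stem + ".mp3"],
        ["mkvmerge", "-o", stem + ".mkv", stem + ".mp3", stem + ".ass", background],
    ]


def run_pipeline(stem, background):
    """Run each step in turn, stopping at the first missing tool or failure."""
    for cmd in build_commands(stem, background):
        if shutil.which(cmd[0]) is None:
            sys.exit("%s not found on PATH" % cmd[0])
        print("Running:", " ".join(cmd))
        # No stdin, so an interactive prompt cannot block the pipeline;
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)


if __name__ == "__main__" and len(sys.argv) == 3:
    run_pipeline(sys.argv[1], sys.argv[2])
```

A run would look like `python kar2mkv.py song BACK01.MPG`. Keeping the command construction separate from execution makes it easy to print or adjust the commands before anything is run.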

The resultant MKV file can then be played as a stand-alone file by MPlayer.

mplayer song.mkv

It can also be played by VLC, but only with the ASS file present.

vlc song.mkv

Screen captures for each karaoke effect were shown earlier in the chapter.

Timing is, however, an issue. The default MIDI tempo is 120 beats per minute, and a common resolution is 30 ticks per beat, which gives 60 MIDI ticks per second. However, you are now playing an MP3 file and an ASS file. Neither is a MIDI file, and the two are not necessarily synchronized. If you convert from MIDI to ASS times at 60 ticks per second, the lyrics run too slowly. Experimentally, I have found 62.9 to be a reasonable rate for at least some files. The scripts above use 62.9 for word durations and 62.6 for line times.
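You can check the arithmetic behind these numbers directly. At a constant tempo, the tick rate is the number of ticks per beat times the beats per minute, divided by 60. A larger rate maps the same MIDI event to an earlier ASS time, so raising the rate above 60 pulls the lyrics forward. The following minimal sketch compares the nominal rate with the experimental 62.9. The function names are hypothetical.

```python
def ticks_per_second(ticks_per_beat, beats_per_minute=120):
    """Nominal MIDI tick rate for a constant tempo."""
    return ticks_per_beat * beats_per_minute / 60.0


def ass_time(ticks, rate):
    """Convert a tick count to an ASS time H:MM:SS.cc at 'rate' ticks per second."""
    total = int(round(ticks / rate * 100))  # whole hundredths of a second
    secs, hundredths = divmod(total, 100)
    mins, secs = divmod(secs, 60)
    hrs, mins = divmod(mins, 60)
    return "%d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)


nominal = ticks_per_second(30)  # 30 ticks/beat at 120 bpm
for rate in (nominal, 62.9):
    print("%.1f ticks/s: tick 6000 -> %s" % (rate, ass_time(6000, rate)))
```

This prints `60.0 ticks/s: tick 6000 -> 0:01:40.00` and `62.9 ticks/s: tick 6000 -> 0:01:35.39`. Nearly five seconds separate the two results after less than two minutes of music.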

HTML5 Subtitles

HTML5 supports video, although exactly which video formats are supported varies from browser to browser. This support includes subtitles and closed captions, using the HTML 5.1 track element. A web search will turn up several articles that discuss this in detail.

You need to prepare a file of timing and text instructions. The examples use the WebVTT format, stored in a .vtt file like the following:

WEBVTT

1
00:00:01.000 --> 00:00:30.000  D:vertical A:start
This is the first line of text, displaying from 1-30 seconds


2
00:00:35.000 --> 00:00:50.000
And the second line of text
separated over two lines from 35 to 50 seconds

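Here the first line is WEBVTT. Each cue after it consists of the following parts:

- an optional identifier (1 and 2 here)
- a timing line of the form start --> end, optionally followed by cue settings
- one or more lines of text

Cues are separated by blank lines. The format of VTT files is specified at "WebVTT: The Web Video Text Tracks Format" ( http://dev.w3.org/html5/webvtt/ ).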
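If the lyrics already exist as an ASS file, the .vtt file can be derived from it. The following is a minimal sketch. It assumes Dialogue events in the ten-field layout used earlier in the chapter. Override tags such as {\kf16} are simply dropped, so any highlighting is lost, and the ASS hard line break \N becomes a real line break. The helper names are hypothetical.

```python
import re

OVERRIDE = re.compile(r"\{[^}]*\}")  # ASS override blocks such as {\kf16}


def vtt_time(ass_time):
    """'0:00:18.22' (ASS, hundredths) -> '00:00:18.220' (WebVTT, milliseconds)."""
    hours, minutes, seconds = ass_time.split(":")
    secs, _, hundredths = seconds.partition(".")
    return "%02d:%02d:%02d.%03d" % (int(hours), int(minutes), int(secs),
                                    int(hundredths.ljust(2, "0")[:2]) * 10)


def ass_to_vtt(dialogue_lines):
    """Turn ASS Dialogue lines into the text of a WebVTT file."""
    cues = ["WEBVTT", ""]
    number = 0
    for line in dialogue_lines:
        if not line.startswith("Dialogue:"):
            continue
        # Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
        fields = line[len("Dialogue:"):].lstrip().split(",", 9)
        start, end, text = fields[1], fields[2], fields[9]
        number += 1
        cues += [str(number),
                 "%s --> %s" % (vtt_time(start), vtt_time(end)),
                 OVERRIDE.sub("", text).replace("\\N", "\n"),
                 ""]
    return "\n".join(cues)
```

Writing the returned string to a file such as 54154.vtt produces a track that the HTML below can reference.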

The HTML then references the audio/video files and the subtitles file as follows:

    <video  controls>
      <source src="output.webm" type="video/webm">
      <track src="54154.vtt" kind="subtitles" srclang="en" label="English" default />
      <!-- fallback for rubbish browsers -->
    </video>

Figure 26-6 shows a screen capture.

Figure 26-6. HTML5 subtitles

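There does not seem to be any reliable mechanism for highlighting words progressively in a line. The WebVTT specification does allow timestamps inside the cue text, and CSS can style these through the :past and :future pseudo-classes. However, browser support for this appears to be patchy. After a cursory look, JavaScript does not seem likely to fill the gap either. This makes HTML5 subtitles not yet suitable for karaoke.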

Conclusion

This chapter discussed methods for overlaying subtitle text onto a changing video image. It is feasible, but there are only a few viable mechanisms.

Footnotes

1 Strictly speaking, subtitles represent what is spoken, while closed captions may also include other sounds, such as doors slamming. For karaoke, there is no need to distinguish them.
