Many karaoke systems use subtitles¹ imposed over a movie of some kind. Programs like kmid and my Java programs play lyrics on some sort of canvas object. This gives a pretty boring background. Video CDs or MPEG-4 files have a nicer background but have the lyrics hard-coded onto the background video, so there is little chance for manipulation of them. CD+G files keep the lyrics separate from the video, but there doesn’t seem to be any way of playing them directly from Linux. They can be converted to MP3+G, and they can be played by VLC, which loads the MP3 file and picks up the corresponding .cdg file.
This chapter considers subtitles that can be created independently, combined with video and audio in some way, and then played. The current situation is not completely satisfactory.
Resources
Check out this resource:
“Subtitling with Linux Tutorial” ( http://sub.wordnerd.de/linux-subs.html )
Subtitle Formats
This chapter is concerned with what are called soft subtitles, where the subtitles are stored in a separate file from the video or audio and are combined during rendering. The Wikipedia page “Subtitle (captioning)” ( http://en.wikipedia.org/wiki/Subtitle_(captioning) ) is a long article going into many issues about subtitling. It also contains a list of subtitle formats, but the one that seems to be of most use in this context is SubStation Alpha.
MPlayer
According to the MPlayer page “Subtitles and OSD” ( www.mplayerhq.hu/DOCS/HTML/en/subosd.htm ), the following are the formats recognized by MPlayer:
VOBsub
OGM
CC (closed caption)
MicroDVD
SubRip
SubViewer
Sami
VPlayer
RT
SSA
PJS (Phoenix Japanimation Society)
MPsub
AQTitle
JACOsub
VLC
According to VLC ( www.videolan.org/vlc/features.php?cat=sub ), support under Linux includes the following subtitle formats:
DVD
Text files (MicroDVD, SubRIP, SubViewer, SSA1-5, SAMI, VPlayer)
Closed captions
Vobsub
Universal Subtitle Format (USF)
SVCD/CVD
DVB
OGM
CMML
Kate
If you play some sort of video file, say XYZ.mpg, and there is also a file with the same root name and appropriate extension such as XYZ.ass (the extension for SubStation Alpha), then VLC will automatically load the subtitles file and play it. If the subtitles file has a different name, then it can be loaded from the VLC menu Video ➤ Subtitles Track. However, this does not appear to be as reliable as sharing the name.
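The name-matching rule can be sketched in a few lines of Python. This is only an illustration of the idea of "same root name, subtitle extension"; it is not VLC's actual lookup code, and the function name and the list of extensions are assumptions made for the example.

```python
import os

# Assumed list of subtitle extensions, for illustration only
SUBTITLE_EXTENSIONS = (".ass", ".ssa", ".srt", ".sub")

def matching_subtitles(video, candidates):
    """Return the candidate file names that share the video's root name."""
    root = os.path.splitext(os.path.basename(video))[0]
    matches = []
    for name in candidates:
        stem, ext = os.path.splitext(os.path.basename(name))
        if stem == root and ext.lower() in SUBTITLE_EXTENSIONS:
            matches.append(name)
    return matches

print(matching_subtitles("XYZ.mpg", ["XYZ.ass", "XYZ.txt", "other.srt"]))
```

Only XYZ.ass is reported here: XYZ.txt has the right root but not a subtitle extension, and other.srt has a different root.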
Gnome Subtitles
See “Gnome Subtitles 1.3 is out!” ( http://gnome-subtitles.sourceforge.net/ ). Gnome Subtitles supports Adobe Encore DVD, Advanced Sub Station Alpha, AQ Title, DKS Subtitle Format, FAB Subtitler, Karaoke Lyrics LRC, Karaoke Lyrics VKT, MacSUB, MicroDVD, MPlayer, MPlayer 2, MPSub, Panimator, Phoenix Japanimation Society, Power DivX, Sofni, SubCreator 1.x, SubRip, Sub Station Alpha, SubViewer 1.0, SubViewer 2.0, and ViPlay Subtitle File.
SubStation Alpha
The SSA/ASS specification is at MooDub.free ( http://moodub.free.fr/video/ass-specs.doc ). It is brief and appears to contain some minor errors with respect to later specifications and implementations. For example, the time format is different. Or are the later ones all wrong?
SSA/ASS files can be used stand-alone. They can also be included in container formats such as Matroska files, discussed briefly in Chapter 3. When they are embedded into MKV files, some restrictions ( www.matroska.org/technical/specs/subtitles/ssa.html ) are made, such as the text being converted into UTF-8 Unicode.
ASS files are divided into several sections.
General information about the environment the subtitle file expects, such as the X and Y resolutions
Style information such as colors and fonts
Event information, which is where the subtitle text is given along with timing information and any special effects to be applied
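As a rough sketch of how these sections are laid out, the following Python fragment splits the text of an ASS file into its bracketed sections. The function name and the abbreviated sample are made up for illustration (the Style line in particular is cut short); a real renderer such as libass does far more than this.

```python
def split_sections(text):
    """Map each [Section] name to the list of non-comment lines in it."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue              # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # a new section header
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

# Abbreviated sample, not a complete ASS file
SAMPLE = """[Script Info]
; Script generated by Aegisub 2.1.9
Title: Default Aegisub file
PlayResX: 640
[V4+ Styles]
Style: Default,Arial,20
[Events]
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun
"""

sections = split_sections(SAMPLE)
for name in sorted(sections):
    print(name, len(sections[name]))
```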
Under normal circumstances you would not directly create such files using a text editor. Instead, the program Aegisub gives you a GUI environment in which to create the files. Essentially, you just enter the text lines, plus the start and end times for each line to be displayed.
Figure 26-1 shows a screen dump.
Figure 26-1. Aegisub screenshot
Many special effects are possible. The video on Bill Creswell's blog ( https://billcreswell.wordpress.com/tag/aegisub/ ) is an excellent example. Here is the direct YouTube link: www.youtube.com/watch?v=0Z0dgdglrAo .
For completeness, here is part of an ASS file I created:
[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H00B4FCFC,&H00000008,&H80000008,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0000,,Here comes the sun
Dialogue: 0,0:00:24.61,0:00:28.24,Default,,0000,0000,0000,,I said it's alright
...
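To make the layout of an event line concrete, here is a small Python sketch that splits a Dialogue line into the fields named by the Format line and converts the times to seconds. The names Event, parse_time, and parse_dialogue are hypothetical, and the sketch assumes the ten-field Format line shown above.

```python
from collections import namedtuple

Event = namedtuple("Event", "start end style margin_v text")

def parse_time(stamp):
    """Convert an ASS timestamp such as 0:00:18.22 to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def parse_dialogue(line):
    """Split one Dialogue line into the fields named by the Format line."""
    prefix = "Dialogue:"
    if not line.startswith(prefix):
        raise ValueError("not a Dialogue line")
    # Text is the last field and may itself contain commas,
    # so split at most nine times.
    fields = line[len(prefix):].strip().split(",", 9)
    (layer, start, end, style, name,
     margin_l, margin_r, margin_v, effect, text) = fields
    return Event(parse_time(start), parse_time(end), style,
                 int(margin_v), text)

event = parse_dialogue(
    "Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0000,,"
    "Here comes the sun")
print(event)
```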
Karaoke Effects in ASS Files
A line in an ASS file essentially consists of a time to start the display, a time to finish the display, and the text itself. However, karaoke users are accustomed to the text being highlighted as it is played.
ASS supports two major highlight styles.
Words are highlighted one at a time.
The text is highlighted by filling from the left.
These effects are done by embedding “karaoke overrides” into the text. These are written in {}, start with a backslash, and carry a duration in hundredths of a second.
The details are as follows:
Word highlighting
An override of the form {\k<time>} will highlight the following word for time hundredths of a second. An example is as follows:
{\k100}Here {\k150}comes {\k50}the {\k150}sun
Fill highlighting
An override of the form {\kf<time>} will progressively fill up the following word for time hundredths of a second. An example is as follows:
{\kf100}Here {\kf150}comes {\kf50}the {\kf150}sun
The three styles appear as follows:
Lines with no highlighting (see Figure 26-2)
Figure 26-2. Subtitles without highlighting
Word highlighting (see Figure 26-3)
Figure 26-3. Subtitles with word highlighting
Fill highlighting (see Figure 26-4)
Figure 26-4. Subtitles with fill highlighting
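The override text for either style can be built mechanically from a list of words and durations. The following is a minimal sketch; the function name karaoke_text is hypothetical, and note that ASS override tags begin with a backslash inside the braces.

```python
def karaoke_text(words, tag="kf"):
    """Prefix each word with a karaoke override such as {\\kf100}.

    words is a list of (word, hundredths_of_a_second) pairs;
    tag is "k" for word highlighting or "kf" for fill highlighting.
    """
    if tag not in ("k", "kf"):
        raise ValueError("unsupported karaoke tag: " + tag)
    return " ".join("{\\%s%d}%s" % (tag, cs, word) for word, cs in words)

line = [("Here", 100), ("comes", 150), ("the", 50), ("sun", 150)]
print(karaoke_text(line, "k"))
print(karaoke_text(line, "kf"))
```

The two printed lines correspond to the word-highlighting and fill-highlighting examples given earlier.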
Multiline Karaoke
Ideally, a karaoke system should have a “look-ahead” mechanism whereby you can see the next line before having to sing it. This can be done by showing two lines of text with overlapping times at different heights. The algorithm is as follows:
When line N with markup is shown,
    show line N+1 without markup
After line N is finished, continue showing line N+1
When line N+1 is due to show,
    finish showing unmarked line N+1
    show line N+1 with markup
Here is the song “Here Comes the Sun” with lyrics:
Here comes the sun
doo doo doo doo
Here comes the sun
I said it's alright
The resultant ASS file should look like this:
Dialogue: 0,0:00:18.22,0:00:19.94,Default,,0000,0000,0100,,{\kf16}Here {\kf46}comes {\kf43}the {\kf67}sun
Dialogue: 0,0:00:18.22,0:00:20.19,Default,,0000,0000,0000,,doo doo doo doo
Dialogue: 0,0:00:20.19,0:00:21.75,Default,,0000,0000,0000,,{\kf17}doo {\kf25}doo {\kf21}doo {\kf92}doo
Dialogue: 0,0:00:20.19,0:00:22.16,Default,,0000,0000,0100,,Here comes the sun
Dialogue: 0,0:00:22.16,0:00:24.20,Default,,0000,0000,0100,,{\kf17}Here {\kf46}comes {\kf43}the {\kf97}sun
Dialogue: 0,0:00:22.16,0:00:24.61,Default,,0000,0000,0000,,I said it's alright
Figure 26-5 shows what it looks like.
Figure 26-5. Multiline subtitles
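The look-ahead scheme can be sketched independently of any MIDI parsing. The fragment below is an illustration rather than the conversion script used later in this chapter: it takes lines that already carry ASS start and end times, and alternates the vertical margin between 0100 and 0000 so that consecutive lines appear at different heights. The names TEMPLATE, MARGINS, and lookahead_events are assumptions made for the example.

```python
TEMPLATE = "Dialogue: 0,%s,%s,Default,,0000,0000,%s,,%s"
MARGINS = ["0100", "0000"]  # alternate between the upper and lower row

def lookahead_events(lines):
    """Build Dialogue events for a two-row look-ahead display.

    lines is a list of (start, end, marked_text, plain_text) tuples,
    with start and end already formatted as ASS times.
    """
    events = []
    for n, (start, end, marked, plain) in enumerate(lines):
        # line N with karaoke markup, shown while it is being sung
        events.append(TEMPLATE % (start, end, MARGINS[n % 2], marked))
        if n + 1 < len(lines):
            # line N+1 without markup, shown until its own start time
            next_start, _, _, next_plain = lines[n + 1]
            events.append(TEMPLATE % (start, next_start,
                                      MARGINS[(n + 1) % 2], next_plain))
    return events

song = [
    ("0:00:18.22", "0:00:19.94",
     "{\\kf16}Here {\\kf46}comes {\\kf43}the {\\kf67}sun",
     "Here comes the sun"),
    ("0:00:20.19", "0:00:21.75",
     "{\\kf17}doo {\\kf25}doo {\\kf21}doo {\\kf92}doo",
     "doo doo doo doo"),
]
for event in lookahead_events(song):
    print(event)
```

For these two lines the sketch produces three events: the marked first line, the unmarked second line overlapping it, and then the marked second line.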
libass
SubStation Alpha and its renderers appear to have been through a complex history. According to “The old and present: VSFilter” ( http://blog.aegisub.org/2010/02/old-and-present-vsfilter.html ), the ASS format was finalized in about 2004, and the renderer VSFilter was made open source at that time. However, around 2007 development of VSFilter ceased, and several forks were made. These introduced several extensions to the format, such as the blur tag by Aegisub. Some of these forks have since merged, some were abandoned, and code from others is still in the wild.
libass ( http://code.google.com/p/libass/ ) is the main rendering library for Linux. An alternative, xy-vsfilter, claims to be faster, more reliable, and so on, but does not seem to have a Linux implementation. libass supports some of the later extensions. These seem to be the Aegisub 2008 extensions, according to “VSFilter hacks” ( http://blog.aegisub.org/2008/07/vsfilter-hacks.html ).
Converting KAR Files to MKV Files with ASS Subtitles
Follow these steps:
To pull out the lyrics from a KAR or MIDI file, use the Java DumpSequence given in Chapter 18, as follows, to get a dump of all events:
java DumpSequence song.kar > song.dump
For line-only display, use the following Python script to extract the lyrics and save them in ASS format:
#!/usr/bin/python
# Runs under Python 2.6+ and Python 3
from __future__ import print_function
import fileinput

TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,Karaoke,"
textStr = TEXT_STR
startTime = -1
endTime = -1

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # convert a MIDI tick count into an ASS time H:MM:SS.CC
    tf = float(s)
    tf /= 62.6  # ticks per sec
    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000
    # work in whole hundredths so rounding can never give ".100"
    hundredths = int(round(tf * 100))
    secs, hundredths = divmod(hundredths, 100)
    mins, secs = divmod(secs, 60)
    hrs, mins = divmod(mins, 60)
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def doLyric(words):
    global textStr
    global startTime
    global endTime

    if words[1] == "0:":
        # skip events at tick 0
        return
    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # '\' or '/' starts a new line: emit the line collected so far
            print(textStr % (timeFormat(startTime), timeFormat(endTime)))
            textStr = TEXT_STR + words[4][1:]
            startTime = -1
        else:
            textStr += words[4]
    else:
        # the text was a space, which gets lost by the split
        textStr += ' '
    endTime = time

printPreface()
for line in fileinput.input():
    words = line.split()
    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration/numTicks
    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)
Here’s an example:
python lyric2ass4kar.py song.dump > song.ass
For fill lyrics display, use the following Python script to extract the lyrics and save them in ASS format:
#!/usr/bin/python
# Runs under Python 2.6+ and Python 3
from __future__ import print_function
import fileinput

TEXT_STR = "Dialogue: 0,%s,%s,Default,,0000,0000,0000,,"
textStr = "{\\kf%d}"
startTime = -1
startWordTime = -1
endTime = -1

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # convert a MIDI tick count into an ASS time H:MM:SS.CC
    tf = float(s)
    # frames per sec should be 60: 120 beats/min, 30 ticks per beat
    # but it is too slow on 54154
    tf /= 62.6  # ticks per sec
    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000
    # work in whole hundredths so rounding can never give ".100"
    hundredths = int(round(tf * 100))
    secs, hundredths = divmod(hundredths, 100)
    mins, secs = divmod(secs, 60)
    hrs, mins = divmod(mins, 60)
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def durat(end, start):
    # duration between two tick counts, in hundredths of a second
    fend = float(end)
    fstart = float(start)
    d = (fend - fstart) / 62.9
    return int(round(d * 100))

def doLyric(words):
    global textStr
    global startTime
    global endTime
    global startWordTime

    if words[1] == "0:":
        # skip events at tick 0
        return
    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
        startWordTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # '\' or '/' starts a new line: close off the last word
            dur = durat(time, startWordTime)
            textStr = textStr % (dur)
            if len(words[4]) == 1:
                print(TEXT_STR % (timeFormat(startTime),
                                  timeFormat(endTime)) + textStr)
            # next word
            textStr = "{\\kf%d}" + words[4][1:]
            startTime = -1
        else:
            textStr += words[4]
    else:
        # it's a space, gets lost by the split
        dur = durat(time, startWordTime)
        textStr = textStr % (dur) + " {\\kf%d}"
        startWordTime = time
    endTime = time

printPreface()
for line in fileinput.input():
    words = line.split()
    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration/numTicks
    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)
Here’s an example:
python lyric2karaokeass4kar.py song.dump > song.ass
For multiline lyrics display, use the following Python script to extract the lyrics and save them in ASS format:
#!/usr/bin/python
# Runs under Python 2.6+ and Python 3
from __future__ import print_function
import fileinput

# consecutive lines alternate between two vertical margins
START_EVENTS = ["Dialogue: 0,%s,%s,Default,,0000,0000,0000,,",
                "Dialogue: 0,%s,%s,Default,,0000,0000,0100,,"]
textStr = "{\\kf%d}"
plainTextStr = ""
startTime = -1
previousStartTime = -1
startWordTime = -1
endTime = -1
previousEndTime = -1
lineNum = 0

def printPreface():
    print('''[Script Info]
; Script generated by Aegisub 2.1.9
; http://www.aegisub.org/
Title: Default Aegisub file
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 640
PlayResY: 480
ScaledBorderAndShadow: yes
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,36,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text''')

def timeFormat(s):
    # convert a MIDI tick count into an ASS time H:MM:SS.CC
    tf = float(s)
    # frames per sec should be 60: 120 beats/min, 30 ticks per beat
    # but it is too slow on 54154
    tf /= 62.6  # ticks per sec
    # This should be right, but is too slow
    # tf = (tf * microSecondsPerTick) / 1000000
    # work in whole hundredths so rounding can never give ".100"
    hundredths = int(round(tf * 100))
    secs, hundredths = divmod(hundredths, 100)
    mins, secs = divmod(secs, 60)
    hrs, mins = divmod(mins, 60)
    return "%01d:%02d:%02d.%02d" % (hrs, mins, secs, hundredths)

def durat(end, start):
    # duration between two tick counts, in hundredths of a second
    fend = float(end)
    fstart = float(start)
    d = (fend - fstart) / 62.9
    return int(round(d * 100))

def doLyric(words):
    global textStr
    global plainTextStr
    global startTime
    global endTime
    global previousStartTime
    global previousEndTime
    global startWordTime
    global lineNum

    if words[1] == "0:":
        # skip events at tick 0
        return
    time = words[1].rstrip(':')
    if startTime == -1:
        startTime = time
        startWordTime = time
        previousEndTime = time
    if len(words) == 5:
        if words[4][0] == '\\' or words[4][0] == '/':
            # '\' or '/' starts a new line: close off the last word
            dur = durat(time, startWordTime)
            textStr = textStr % (dur)
            if len(words[4]) == 1:
                if previousStartTime != -1:
                    # unmarked look-ahead copy, shown while the
                    # previous line is being sung
                    print(START_EVENTS[lineNum % 2] %
                          (timeFormat(previousStartTime),
                           timeFormat(previousEndTime)) + plainTextStr)
                # the same line with karaoke markup
                print(START_EVENTS[lineNum % 2] %
                      (timeFormat(startTime),
                       timeFormat(endTime)) + textStr)
            # next word
            lineNum += 1
            textStr = "{\\kf%d}" + words[4][1:]
            plainTextStr = words[4][1:]
            previousStartTime = startTime
            startTime = -1
        else:
            textStr += words[4]
            plainTextStr += words[4]
    else:
        # it's a space, gets lost by the split
        dur = durat(time, startWordTime)
        textStr = textStr % (dur) + " {\\kf%d}"
        plainTextStr += ' '
        startWordTime = time
    endTime = time

printPreface()
for line in fileinput.input():
    words = line.split()
    if len(words) >= 2:
        if words[0] == "Resolution:":
            ticksPerBeat = words[1]
        elif words[0] == "Length:":
            numTicks = int(words[1])
        elif words[0] == "Duration:":
            duration = int(words[1])
            microSecondsPerTick = duration/numTicks
    if len(words) >= 3 and words[2] == "Text":
        doLyric(words)
Here is an example:
python lyric2karaokeass4kar.py song.dump > song.ass
Convert the MIDI sound file to a WAV file using fluidsynth.
fluidsynth -F song.wav /usr/share/sounds/sf2/FluidR3_GM.sf2 song.kar
Convert the WAV file to MP3.
lame song.wav song.mp3
Find a suitable video-only file for your background (I used one off my karaoke discs) and then merge them into an MKV file.
mkvmerge -o song.mkv song.mp3 song.ass BACK01.MPG
The resultant MKV file can then be played as a stand-alone file by MPlayer.
mplayer song.mkv
It can also be played by VLC, but only with the ASS file present.
vlc song.mkv
Screen captures were shown earlier in the chapter, depending on the karaoke effect chosen.
Timing is, however, an issue. The default MIDI tempo is 120 beats per minute, and a common tick rate is 30 ticks per beat. This leads to a rate of 60 MIDI ticks per second. However, you are now playing MP3 files and ASS files, neither of which are MIDI files anymore and which are not necessarily synchronized. With a rate of 60 ticks per second in converting from MIDI to ASS, the lyrics run too slowly. Experimentally I have found 62.9 to be a reasonable rate for at least some files.
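The arithmetic can be made explicit with a short Python sketch. The helper names ticks_per_second and ass_time are hypothetical; the tempo, the tick resolution, and the experimental rate of 62.9 are the values quoted above, and 3600 ticks is just a convenient test value (one minute at the nominal rate).

```python
def ticks_per_second(beats_per_minute, ticks_per_beat):
    """Nominal MIDI tick rate for a fixed tempo."""
    return beats_per_minute * ticks_per_beat / 60.0

def ass_time(ticks, rate):
    """Format a tick count as H:MM:SS.CC at the given ticks-per-second rate."""
    # Work in whole hundredths so that rounding cannot produce ".100"
    centis = int(round(ticks / rate * 100))
    secs, centis = divmod(centis, 100)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return "%d:%02d:%02d.%02d" % (hours, mins, secs, centis)

nominal = ticks_per_second(120, 30)
print(nominal)                  # 60.0
print(ass_time(3600, nominal))  # 0:01:00.00
print(ass_time(3600, 62.9))     # 0:00:57.23
```

After 3600 ticks the two rates already differ by nearly three seconds, which is why the choice of rate is so noticeable over a whole song.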
HTML5 Subtitles
HTML5 has support for video types, although exactly which video formats are supported by which browsers varies. This includes support for subtitles and closed captions, using the HTML 5.1 track element. A search will turn up several articles discussing this in detail.
You need to prepare a file of timing and text instructions. The format shown in examples is as a .vtt file and can be as follows:
WEBVTT

1
00:00:01.000 --> 00:00:30.000 D:vertical A:start
This is the first line of text, displaying from 1-30 seconds

2
00:00:35.000 --> 00:00:50.000
And the second line of text
separated over two lines from 35 to 50 seconds
Here the first line is WEBVTT, and blocks of text are separated by blank lines. The format of VTT files is specified at “WebVTT: The Web Video Text Tracks Format” ( http://dev.w3.org/html5/webvtt/ ).
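A file in this layout is easy to generate. The following sketch writes numbered cues from a list of (start, end, text) tuples with times in seconds; the function names vtt_time and make_vtt are made up for the example, and it deliberately leaves out cue settings such as those on the first cue above.

```python
def vtt_time(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    secs, millis = divmod(millis, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return "%02d:%02d:%02d.%03d" % (hours, mins, secs, millis)

def make_vtt(cues):
    """Return the text of a WebVTT file for (start, end, text) cues."""
    blocks = ["WEBVTT"]
    for number, (start, end, text) in enumerate(cues, 1):
        blocks.append("%d\n%s --> %s\n%s" %
                      (number, vtt_time(start), vtt_time(end), text))
    # blocks are separated by blank lines
    return "\n\n".join(blocks) + "\n"

print(make_vtt([(18.22, 19.94, "Here comes the sun"),
                (20.19, 21.75, "doo doo doo doo")]))
```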
The HTML then references the audio/video files and the subtitles file as follows:
<video controls>
  <source src="output.webm" type="video/webm">
  <track src="54154.vtt" kind="subtitles" srclang="en" label="English" default />
  <!-- fallback for rubbish browsers -->
</video>
Figure 26-6 shows a screen capture.
Figure 26-6. HTML5 subtitles
There does not seem to be any mechanism for highlighting words progressively in a line. Possibly JavaScript may be able to do so, but after a cursory look, it doesn’t seem likely. This makes it not yet suitable for karaoke.
Conclusion
This chapter discussed methods for overlaying subtitle text onto a changing video image. It is feasible, but there are only a few viable mechanisms.
Footnotes
1 Rigorously, subtitles refer to what is spoken, while closed captions may include other sounds such as doors slamming. For karaoke, there is no need to distinguish them.