Opened 12 years ago

Last modified 10 years ago

#2040 new defect

UNICODE chars in subtitles show as garbage

Reported by: yuri@…
Owned by: reimar
Priority: normal
Component: core
Version: HEAD
Severity: normal
Keywords:
Cc: mdop@…
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

When I play subtitles that contain non-ASCII Unicode characters (UTF-8 encoded), I see only garbage.
See the attached subtitle file and screenshot.
The screenshot was taken while playing with -font 'Andale Mono', and the 'Andale Mono' font does contain all of the required Unicode characters.

MPlayer SVN-r34449-snapshot-4.6.3
mplayer-1.0.r20111218_2

Attachments (2)

[DivX - ITA] - Adriano Celentano - Asso.srt (70.2 KB) - added by yuri@… 12 years ago.
sample subtitle file
mplayer-subtitle-unicode.png (38.7 KB) - added by yuri@… 12 years ago.
screenshot


Change History (9)

by yuri@…, 12 years ago

sample subtitle file

by yuri@…, 12 years ago

screenshot

comment:1 by yuri@…, 12 years ago

comment:2 by reimar, 12 years ago

For standalone subs you always need to tell MPlayer the encoding.
For UTF-8, that means passing the -utf8 option or renaming the subtitle file so that it has a .utf8 extension.
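
As an illustration of the first suggestion, here is a minimal sketch that assembles an MPlayer command line with -utf8 for an external subtitle file. The helper name `mplayer_command` and the file names `movie.avi` and `movie.srt` are hypothetical and used only for this example; -utf8 and -sub are the MPlayer options assumed here.

```python
import shlex


def mplayer_command(video, subtitle, utf8=True):
    """Build an MPlayer argument list that loads an external subtitle file.

    Hypothetical helper for illustration; it only assembles arguments
    and does not start MPlayer.
    """
    args = ["mplayer"]
    if utf8:
        # Tell MPlayer to treat the external subtitle file as UTF-8.
        args.append("-utf8")
    # -sub names the external subtitle file; the video file comes last.
    args += ["-sub", subtitle, video]
    return args


# Placeholder file names, not taken from the ticket.
cmd = mplayer_command("movie.avi", "movie.srt")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where MPlayer is installed.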

comment:3 by yuri@…, 12 years ago

So right now MPlayer essentially assumes no encoding and displays whatever glyphs the font has for those bytes.

In today's world, you could do what other programs do (vim, openoffice) and default to UTF-8 unless told otherwise. You can assume that most non-ASCII bytes occurring in subtitles are UTF-8.

comment:4 by yuri@…, 12 years ago

Actually, to be precise, OpenOffice asks for the encoding when you open this file, but the dialog defaults to UTF-8. Chrome, for example, just opens it as UTF-8.

They might use some kind of encoding-detection library. For example, ICU (http://site.icu-project.org) has such a function. This is another option for MPlayer.

comment:5 by reimar, 12 years ago

First, to be precise, MPlayer defaults to latin1. By my own very biased sample, that is still the most common encoding.
Second, MPlayer already supports libenca which is supposed to do auto-detection, but as far as I can tell it's completely useless and doesn't work for anything.
Last, UTF-8 can be detected fairly reliably for external subs since we read in the whole file anyway. It's just some implementation effort and it doesn't seem like enough of an issue to spend much time on.
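
A minimal sketch of the whole-file check described in the last point, written in Python for brevity rather than as MPlayer code. The function name `guess_subtitle_encoding` is hypothetical, and falling back to latin1 simply mirrors the default mentioned above. The idea is that latin1 text containing non-ASCII bytes almost never forms valid UTF-8 multi-byte sequences, so a strict decode of the entire buffer is a reasonable test.

```python
def guess_subtitle_encoding(data: bytes) -> str:
    """Return 'utf-8' if the whole buffer is valid UTF-8, else 'latin-1'.

    Hypothetical helper for illustration; 'latin-1' as the fallback is an
    assumption that mirrors the default discussed in this ticket.
    """
    try:
        # bytes.decode is strict by default: any invalid sequence raises.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"
    # Pure ASCII also ends up here, which is harmless because ASCII
    # decodes identically under both encodings.
    return "utf-8"


# The same Italian word encoded two ways.
print(guess_subtitle_encoding("però".encode("utf-8")))
print(guess_subtitle_encoding("però".encode("latin-1")))
```

The check needs the complete file contents, which matches the observation that external subtitles are read in full anyway.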

comment:6 by yuri@…, 12 years ago

I saw this problem with Italian subtitles too. I am sure it will show up with the newer Russian ones as well. Overall, UTF-8 is, or will become, prevalent.

comment:7 by Marcel Dopita, 10 years ago

Analyzed by developer: unset
Cc: mdop@… added
Reproduced by developer: unset