Opened 9 years ago
Closed 9 years ago
#2284 closed enhancement (worksforme)
Is it possible to override or ignore the “-subcp” parameter internally for Unicode subtitles with a BOM?
Reported by: | Hakan | Owned by: | beastd |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | 1.3 | Severity: | normal |
Keywords: | subtitle, encoding, subcp | Cc: | |
Blocked By: | Blocking: | ||
Reproduced by developer: | no | Analyzed by developer: | no |
Description
I am trying to detect the user's language codepage from the keyboard layout or system settings and use it as the preferred default codepage setting for MPlayer.
On my machine (Windows 7 64-bit, English) the default codepage is CP1254. It works most of the time with Turkish ANSI subtitles, but it does not work with Unicode subtitles (with or without a BOM).
I have to switch the default codepage back and forth between CP1254 and UTF-8, so is it possible to override or ignore “-subcp” for Unicode subtitles with a BOM?
MPlayer version: MPlayer-generic-r37653+g674cc26
So far I have tried:
1- Removed the -subcp parameter: works for Unicode subtitles but not for ANSI ones (this is OK; we must tell MPlayer the codepage of an ANSI subtitle).
2- Added -utf8 and -unicode, removed -subcp: works for Unicode subtitles but not for ANSI ones (this is OK; we must tell MPlayer the codepage of an ANSI subtitle).
3- -subcp Enca:tr:cp1254: falls back to cp1254, so Unicode files are not displayed correctly. (I know that Enca does not support my language, Turkish.)
4- -subcp Enca:none:cp1254: falls back to cp1254, so Unicode files are not displayed correctly.
5- -subcp Enca:be:cp1254: falls back to cp1254, so Unicode files are not displayed correctly.
When I do not use -subcp, MPlayer can handle Unicode subtitle files but cannot display Turkish ANSI subtitles properly.
When I use -subcp cp1254, it works for ANSI subtitles but not for Unicode subtitles, because according to the documentation -subcp “takes priority over both -utf8 and -unicode”.
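Forcing a single 8-bit codepage on a UTF-8 file usually produces garbled text (mojibake) rather than an error, which matches the symptom described above. The short Python illustration below (not MPlayer code) decodes the same Turkish line with the right and the wrong codec; the sample sentence is made up for the example.

```python
# Illustration only: the effect of decoding UTF-8 subtitle bytes as if they
# were Windows-1254, which is roughly what forcing -subcp cp1254 implies.
line = "Doğru şekilde göründü"  # hypothetical Turkish subtitle line

utf8_bytes = line.encode("utf-8")    # how a UTF-8 subtitle file stores it
ansi_bytes = line.encode("cp1254")   # how a Turkish "ANSI" file stores it

# Decoding each file with its own encoding round-trips cleanly.
print(utf8_bytes.decode("utf-8"))
print(ansi_bytes.decode("cp1254"))

# Forcing cp1254 on the UTF-8 file: every two-byte UTF-8 character becomes
# two cp1254 characters, e.g. "ğ" (bytes C4 9F) is shown as "ÄŸ".
print(utf8_bytes.decode("cp1254"))

# The opposite mistake: cp1254 bytes for Turkish letters are not valid UTF-8.
try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

This is why a single fixed codepage cannot serve both kinds of files.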
So, is there a way to ignore the -subcp parameter for Unicode subtitles that start with a BOM? Since the encoding of such a subtitle can be detected from its BOM, -subcp should not take priority over the detected encoding.
If not, what should I do?
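The behaviour requested here (trust a BOM when present, otherwise use the forced codepage) can be sketched in a few lines of Python. This is a hypothetical illustration of the proposal, not how MPlayer handles -subcp; the helper names `pick_encoding` and `decode_subtitle` are made up, and the developer's reply in this ticket explains why MPlayer deliberately does not check for a BOM.

```python
import codecs

# Hypothetical sketch of the proposal: a BOM, if present, decides the
# encoding; otherwise the user's forced codepage (the role -subcp plays) wins.
BOMS = [
    # UTF-32 BOMs are checked first because the UTF-32-LE BOM (FF FE 00 00)
    # begins with the UTF-16-LE BOM (FF FE).
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def pick_encoding(data, forced_codepage):
    """Return (encoding, number of BOM bytes to skip)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return forced_codepage, 0


def decode_subtitle(data, forced_codepage="cp1254"):
    encoding, skip = pick_encoding(data, forced_codepage)
    return data[skip:].decode(encoding)


print(decode_subtitle(codecs.BOM_UTF8 + "Günaydın".encode("utf-8")))
print(decode_subtitle("Günaydın".encode("cp1254")))
```

Both calls print the same text. The weak spot is the one the developer points out: a legacy-encoded file whose first bytes happen to match a BOM would be misdetected, with no way for the user to override it.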
Change History (3)
follow-up: 2 comment:1 by , 9 years ago
comment:2 by , 9 years ago
Replying to reimar:
> Enca not detecting UTF-8 is a problem and should probably be considered a bug.
> However, as to the BOM, that is non-standard and not really allowed for UTF-8 (considering that it stands for byte-order mark, it should be quite obvious why it is nonsense for UTF-8), which is part of the reason why we don't check it.
> There is also no guarantee that the sequence cannot appear at the start of a file in other encodings, so overriding -subcp in some cases might leave people without an option to force MPlayer to do the right thing.
> Unfortunately, I think the simple fact is that Enca got stuck in the past, and supporting e.g. uchardet would be a much better solution.
> However, your specific issue should work just fine with the following:
> -subcp enca:__:cp1254
> The "be" you tried will not detect UTF-8, and "none" is not supported by the library interface MPlayer uses. Well, it is supported, but the "name" is two underscores.
Thank you very much for your solution. It works great and even detects Unicode files without a BOM with MPlayer Redxii-SVN-r37734-4.9.3 (i686) (C) 2000-2016 MPlayer Team (I am going to use these Redxii builds).
However, your solution does not work with the latest MPlayer Windows builds (http://oss.netfarm.it/mplayer/):
1- Generic build (Intel 486 or better): MPlayer sherpya-r37653+g674cc26-5.3.1 (C) 2000-2016 MPlayer Team
2- MPlayer sherpya-r37653+g674cc26-5.3.1 (C) 2000-2016 MPlayer Team
Could you please suggest a link to official Windows builds? Thank you.
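The developer's point that the UTF-8 BOM bytes (EF BB BF) could also be the genuine start of a file in another encoding is easy to check: in Windows-1254 those three bytes are ordinary printable characters. A short Python check, for illustration only (the sample text is made up):

```python
import codecs

bom = codecs.BOM_UTF8             # the bytes EF BB BF
as_cp1254 = bom.decode("cp1254")  # decodes without any error
print(as_cp1254)                  # ï»¿

# A cp1254 file whose text really begins with "ï»¿" therefore starts with
# exactly the same bytes as a UTF-8 file carrying a BOM.
ansi_file = "ï»¿ örnek".encode("cp1254")  # hypothetical cp1254 content
print(ansi_file.startswith(bom))          # True
```

Such files are rare, but if a BOM silently overrode -subcp, the user would have no way to force the correct decoding for them.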
comment:3 by , 9 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
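With `-subcp enca:__:cp1254`, Enca guesses the encoding without a language hint, and MPlayer falls back to cp1254 when no guess is made. Enca's real heuristics are more involved; as a rough stand-in, the Python sketch below tries strict UTF-8 first and falls back to cp1254, which shows why this combination copes with UTF-8 files with or without a BOM as well as with Turkish ANSI files. The function `guess_and_decode` and the strict-UTF-8 heuristic are assumptions for illustration, not Enca's algorithm.

```python
def guess_and_decode(data, fallback="cp1254"):
    """Rough stand-in for "detect, else fall back" (not Enca's algorithm).

    UTF-8 has strict byte-sequence rules, so legacy 8-bit text containing
    non-ASCII letters (such as Turkish cp1254) almost never decodes as UTF-8.
    """
    try:
        # "utf-8-sig" also strips a leading UTF-8 BOM if one is present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback)


samples = {
    "UTF-8 with BOM": "\ufeffİyi akşamlar".encode("utf-8"),
    "UTF-8 without BOM": "İyi akşamlar".encode("utf-8"),
    "cp1254 (ANSI)": "İyi akşamlar".encode("cp1254"),
}
for label, raw in samples.items():
    print(f"{label}: {guess_and_decode(raw)}")
```

All three samples decode to the same text. Plain ASCII files also pass the UTF-8 check, which is harmless because ASCII is identical in cp1254.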