Opened 9 years ago
#2246 new enhancement
Using uchardet for encoding detection
Reported by: | Jehan | Owned by: | beastd |
---|---|---|---|
Priority: | normal | Component: | libass |
Version: | unspecified | Severity: | normal |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Reproduced by developer: | no | Analyzed by developer: | no |
Description
mplayer/ffmpeg have very limited encoding detection, based on ENCA (basically Latin/Cyrillic/Chinese only). So when you pass, for instance, a Japanese or Korean subtitle file that does not use UTF-8 (from experience, maybe about half of them; UTF-8 is gaining ground but is still far from the only encoding used in many regions), the text shows up garbled and you have to specify the encoding yourself. That means you have to know which one it is, which is mostly worked out by trial and error. Run "enca --list languages" to see the list of encodings supported by enca, and hence by mplayer/ffmpeg.
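To illustrate the trial-and-error step described above, here is a minimal Python sketch. It is not how mplayer, enca or uchardet work internally. The candidate list and the helper name `decode_by_trial` are made up for this example, and only Python's standard codecs are used.

```python
# Illustrative only: the candidate list and helper name are assumptions,
# not anything taken from mplayer, libass, enca or uchardet.
CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "euc_kr", "cp1251"]


def decode_by_trial(data, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to latin-1, which never fails but may be wrong.
    return None, data.decode("latin-1")


# A Japanese subtitle line stored as Shift_JIS rather than UTF-8.
sample = "こんにちは".encode("shift_jis")

# What a player shows when it assumes the wrong (Latin) encoding.
garbled = sample.decode("latin-1")

# Manual trial and error over the candidate list.
encoding, text = decode_by_trial(sample)
print(repr(garbled))
print(encoding, text)
```

A clean decode does not prove the guess is right, since some byte sequences are valid in several legacy encodings. That ambiguity is the reason a statistical detector such as uchardet is preferable to a fixed candidate list.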
There are a few ports, in various languages, of the Mozilla Firefox encoding detection algorithm. A C binding is uchardet: https://github.com/BYVoid/uchardet
mpv, the mplayer fork, now uses "uchardet" by default for encoding detection ("enca" is still available as an alternative but is no longer the default).
See: https://github.com/mpv-player/mpv/issues/908
and: https://github.com/mpv-player/mpv/pull/2193
I believe mplayer's encoding detection lives only in libass (otherwise this is the wrong component, I guess)? Could mplayer/ffmpeg/libass also add support for uchardet in order to improve support for Asian languages?
This would be great.