Opened 9 years ago

#2246 new enhancement

Using uchardet for encoding detection

Reported by: Jehan Owned by: beastd
Priority: normal Component: libass
Version: unspecified Severity: normal
Keywords: Cc:
Blocked By: Blocking:
Reproduced by developer: no Analyzed by developer: no

Description

mplayer/ffmpeg have very limited encoding detection, based on ENCA (basically Latin/Cyrillic/Chinese only). So when you pass, for instance, a Japanese or Korean subtitle file that is not in UTF-8 (from experience, maybe about half of them; UTF-8 is gaining ground but is still not the only encoding in use in many areas), the text shows up garbled and you have to specify the encoding yourself. That means you have to know which one it is, which is mostly found by trial and error. Run enca --list languages to see the list of encodings supported by enca, hence by mplayer/ffmpeg.
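To illustrate the trial-and-error step, here is a minimal Python sketch using only the standard library. The sample text, the candidate list and the helper name try_decodings are all made up for illustration. None of this is mplayer, ffmpeg or libass code.

```python
# Hypothetical sample: a Japanese subtitle line stored as Shift_JIS, not UTF-8.
original = "こんにちは、世界"
sample = original.encode("shift_jis")


def try_decodings(data, candidates):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Assumed candidate list; a real user would guess from the file's origin.
results = try_decodings(sample, ["utf-8", "shift_jis", "latin-1"])
for enc, text in results.items():
    print(enc, "->", repr(text))
```

Decoding as UTF-8 fails outright, while latin-1 "succeeds" but yields garbled text, so the lack of an error does not identify the right encoding. This is why guessing by hand is tedious and why a statistical detector would help.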

There are a few ports, in various languages, of Mozilla's charset detection algorithm (the one used in Firefox). A C binding is uchardet: https://github.com/BYVoid/uchardet

mpv, the mplayer fork, now uses "uchardet" as its default encoding detection ("enca" is still available as an alternative but is no longer the default).
See: https://github.com/mpv-player/mpv/issues/908
and: https://github.com/mpv-player/mpv/pull/2193

I believe mplayer's encoding detection lives only in libass (otherwise this is the wrong component, I guess)? Could mplayer/ffmpeg/libass also add support for uchardet, in order to improve support for Asian languages?
This would be great.

Change History (0)
