Bug 1099508 - Myspell-bg_BG package has wrong CP1251 encoding content in Leap & Tumbleweed
Summary: Myspell-bg_BG package has wrong CP1251 encoding content in Leap & Tumbleweed
Status: RESOLVED UPSTREAM
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Other
Version: Leap 15.0
Hardware: Other, OS: Other
Priority: P5 - None, Severity: Minor
Target Milestone: ---
Assignee: Tomáš Chvátal
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-28 13:26 UTC by Forgotten User loYrf3hott
Modified: 2023-04-12 00:36 UTC
CC List: 0 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Forgotten User loYrf3hott 2018-06-28 13:26:34 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Build Identifier: 

The files in the myspell-bg_BG package are encoded in CP1251. The problem shows up, for example, when spellchecking with the gedit editor in GNOME 3: the spellchecker misbehaves and marks every correctly spelled Bulgarian word as an error.

Reproducible: Always

Steps to Reproduce:
1. Install the myspell-bg_BG package;
2. Use 'Set Language' in gedit (GNOME 3) to select Bulgarian;
3. Enable 'Highlight Misspelled Words' in the gedit Tools menu;
4. Type a Bulgarian sentence to check, plus an English sentence as a control.
Actual Results:  
When 'Set Language' is set to English, only the words in the Bulgarian sentence are marked as misspelled, which is correct.
When 'Set Language' is set to Bulgarian, all words in both sentences are marked as misspelled: correct for the English text, but wrong for the Bulgarian text.


By contrast, the files in the English package myspell-en_US are UTF-8 or plain ASCII, while the files in myspell-bg_BG are CP1251 encoded:

$ head -2 /usr/share/hunspell/bg_BG.aff
SET microsoft-cp1251
TRY ������������������������������������������������������������

$ head -n 2 /usr/share/hunspell/bg_BG.dic
78238
��������


$ head -n 2 /usr/share/mythes/th_bg_BG_v2.dat
microsoft-cp1251
�|3

$ head -n 2 /usr/share/mythes/th_bg_BG_v2.idx
microsoft-cp1251
22889
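To check which dictionary files are affected without depending on how the terminal renders them, a small sketch like the following could be used. It assumes GNU/glibc iconv, which exits with a non-zero status when its input contains a byte sequence that is invalid in the source encoding; the function names is_valid_utf8 and check_dict_file are hypothetical and not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is valid UTF-8.
# glibc iconv exits non-zero on an invalid byte sequence, so a UTF-8 to UTF-8
# "conversion" whose output is discarded works as a validity check.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line for the given file.
check_dict_file() {
    if is_valid_utf8 "$1"; then
        echo "$1: valid UTF-8"
    else
        echo "$1: NOT valid UTF-8"
    fi
}
```

Calling check_dict_file on each of the four files listed above should report the bg_BG files as not valid UTF-8. To read the garbled lines shown above, the output can be piped through iconv, for example: head -n 2 /usr/share/hunspell/bg_BG.dic | iconv -f CP1251 -t UTF-8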


I have found the following manual workaround, which currently restores spellchecking in gedit (GNOME 3) after installing the myspell-bg_BG package:

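1. Convert the wrongly encoded files from CP1251 to UTF-8. iconv writes its result to standard output, so each file has to be converted into a temporary copy that then replaces the original: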

for f in /usr/share/hunspell/bg_BG.aff /usr/share/hunspell/bg_BG.dic \
         /usr/share/mythes/th_bg_BG_v2.dat /usr/share/mythes/th_bg_BG_v2.idx; do
    # iconv prints to stdout, so convert into a temporary file, then replace the original
    sudo sh -c 'iconv -f CP1251 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' sh "$f"
done

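2. Replace the microsoft-cp1251 encoding declaration with UTF-8 in the converted files: the SET line in bg_BG.aff and the first line of th_bg_BG_v2.dat and th_bg_BG_v2.idx (bg_BG.dic has no such declaration).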
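Step 2 of the workaround can also be scripted. The sketch below is hypothetical (set_utf8_declaration is a made-up name) and assumes the declarations sit exactly where the head output above shows them. One further caveat, which is an assumption about the MyThes file format rather than something stated in this report: th_bg_BG_v2.idx is understood to store byte offsets into th_bg_BG_v2.dat, and Cyrillic letters take two bytes in UTF-8 instead of one in CP1251, so after converting the .dat file the index probably needs to be regenerated (MyThes ships a th_gen_idx.pl script for this) rather than just re-encoded. This does not affect the hunspell files used for spellchecking.

```shell
#!/bin/sh
# Hypothetical helper for step 2: rewrite the encoding declaration of a file
# that has already been converted to UTF-8.
#   bg_BG.aff            declares "SET microsoft-cp1251"
#   th_bg_BG_v2.dat/.idx carry a bare "microsoft-cp1251" on line 1
#   bg_BG.dic            has no declaration and is left unchanged
set_utf8_declaration() {
    out="$1.tmp.$$"
    if ! sed -e 's/^SET microsoft-cp1251/SET UTF-8/' \
             -e '1s/^microsoft-cp1251/UTF-8/' "$1" > "$out"; then
        rm -f "$out"
        return 1
    fi
    # Copy the result back over the original so its owner and mode are kept.
    cat "$out" > "$1"
    status=$?
    rm -f "$out"
    return "$status"
}
```

The function has to run with write access to the files (for example from a root shell). With GNU sed the same edit can be done in place, for example: sudo sed -i -e 's/^SET microsoft-cp1251/SET UTF-8/' /usr/share/hunspell/bg_BG.aff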

Please review the problem and provide a permanent fix in the myspell-bg_BG package for future installs and updates.
Comment 1 Tomáš Chvátal 2018-06-28 13:57:37 UTC
The reason is simple: the files are in this encoding in the upstream repository:

https://cgit.freedesktop.org/libreoffice/dictionaries/tree/bg_BG

As you can see, the Bulgarian dictionary does not get much love:

https://cgit.freedesktop.org/libreoffice/dictionaries/log/bg_BG

How to submit a change upstream is described at:

https://wiki.documentfoundation.org/Development/Dictionaries
Comment 2 Tomáš Chvátal 2018-06-29 10:10:02 UTC
https://gerrit.libreoffice.org/#/c/56674/

I've sent the fix upstream.

It will be included in our next dictionary updates.
Comment 3 Swamp Workflow Management 2018-07-27 09:00:19 UTC
This is an autogenerated message for OBS integration:
This bug (1099508) was mentioned in
https://build.opensuse.org/request/show/625716 Factory / myspell-dictionaries
Comment 5 Swamp Workflow Management 2018-09-24 13:15:03 UTC
SUSE-RU-2018:2826-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 1099508,1102294
CVE References: 
Sources used:
SUSE Linux Enterprise Workstation Extension 15 (src):    myspell-dictionaries-20180704-3.3.2
SUSE Linux Enterprise Module for Basesystem 15 (src):    myspell-dictionaries-20180704-3.3.2
Comment 6 Swamp Workflow Management 2018-09-24 13:17:09 UTC
SUSE-RU-2018:2829-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 1099508,1102294
CVE References: 
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src):    myspell-dictionaries-20180704-16.12.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    myspell-dictionaries-20180704-16.12.1
Comment 7 Swamp Workflow Management 2018-09-27 22:27:04 UTC
openSUSE-RU-2018:2926-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 1099508,1102294
CVE References: 
Sources used:
openSUSE Leap 15.0 (src):    myspell-dictionaries-20180704-lp150.2.3.1
Comment 8 Swamp Workflow Management 2018-09-27 22:27:46 UTC
openSUSE-RU-2018:2927-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 1099508,1102294
CVE References: 
Sources used:
openSUSE Leap 42.3 (src):    myspell-dictionaries-20180704-10.1