Bug 1227316 - gettext-runtime: msgcat fails on certain files
Summary: gettext-runtime: msgcat fails on certain files
Status: IN_PROGRESS
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Basesystem
Version: Leap 15.6
Hardware: Other Other
Importance: P5 - None, Normal
Target Milestone: ---
Assignee: Stanislav Brabec
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-03 01:19 UTC by Stanislav Brabec
Modified: 2024-07-17 08:03 UTC
CC List: 5 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
sbrabec: needinfo? (meissner)



Description Stanislav Brabec 2024-07-03 01:19:02 UTC
While trying to concatenate two PO files with msgcat, I got:
msgcat: memory exhausted

Debugging in gdb, I found that one of the failing PO files has a non-standard header that contains an extra empty line:
msgid ""
msgstr ""
"Project-Id-Version: @PACKAGE@\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-09-16 17:13+0200\n"
"PO-Revision-Date: 2005-07-29 15:37+0530\n"
"Last-Translator: Priyavert Sharma<priyavert.sharma@agreeya.com>\n"
"Language-Team: AgreeYa Solutions <linux_team@agreeya.com>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"
"\n"

The bug is triggered in message_list_header_list() in msgl-header.c:

        while (*h != '\0')
          {
            char *enh = strchr (h, ':');
            enh++;
            char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);

If the line does not contain ":", strchr() returns NULL, the following enh++ turns enh into (char *) 1, and (enh - h) + 1 evaluates to an absurd (negative) number.

This is not an upstream bug. The smart header merge feature was introduced by the downstream patch:
0002-msgcat-Merge-headers-when-use-first.patch
Comment 1 Stanislav Brabec 2024-07-03 11:06:02 UTC
Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386	            char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x5555555622ef " 8bit\nPlural-Forms: nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622d5 "Content-Transfer-Encoding: 8bit\nPlural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
Continuing.

Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386	            char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622f5 "Plural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 14
(gdb) n
387	            memcpy (msgid, h, enh - h);
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622f5 "Plural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 14
(gdb) 
389	            (msgid)[enh-h] = '\0';
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb) 
392	            enh = strchr (h, '\n');
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 0
(gdb) 
393	            if (enh != NULL)
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
395	                char * msgstr = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
396	                memcpy (msgstr, h, enh - h);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
29	  return __builtin___memcpy_chk (__dest, __src, __len,
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
396	                memcpy (msgstr, h, enh - h);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb) 
398	                msgstr[enh-h] = '\0';
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb) 
399	                lex_pos_ty pos = {NULL, ctr++};
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb) 
400	                message_list_append (header, message_alloc (NULL, msgid, NULL, msgstr, enh - h, &pos));
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb) 
382	        while (*h != '\0')
6: h = 0x55555556231e "\n"
(gdb) 
384	            char *enh = strchr (h, ':');
5: enh = <optimized out>
6: h = 0x55555556231e "\n"
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb) 

Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386	            char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x1 <error: Cannot access memory at address 0x1>
6: h = 0x55555556231e "\n"
7: ((enh - h) + 1) = -93824992289564

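In the next step, this call is made:
XNMALLOC (-93824992289564, char)

The negative count is converted to a huge unsigned size, the allocation fails, and msgcat aborts with "memory exhausted".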
Comment 3 Stanislav Brabec 2024-07-03 11:56:15 UTC
Looking back through the history, these two feature patches were never upstreamed:
0001-msgcat-Add-feature-to-use-the-newest-po-file.patch
0002-msgcat-Merge-headers-when-use-first.patch

However, upstream would be happy to include them:
https://lists.gnu.org/archive/html/bug-gettext/2019-10/msg00006.html

The patches have been sitting forgotten for 5 years, waiting for the copyright assignment to the FSF from Markéta Machová (Calábková at the time the patch was written).
Comment 5 Marcus Meissner 2024-07-03 12:59:44 UTC
I currently cannot envision gettext's msgcat receiving untrusted input; it is used during builds, where source code is also involved.

So I would currently consider it a regular bug.
Comment 6 Stanislav Brabec 2024-07-03 17:24:49 UTC
Marcus Meissner: msgcat is a general-purpose tool, and it could be used to process uploaded files on translation servers. Hopefully, rubygem-gettext is more popular for this purpose.
Comment 7 Stanislav Brabec 2024-07-04 02:16:53 UTC
I did a thorough review of the header merge code, and it is affected by more bugs:
- If ":" is missing on the last line, it crashes.
- If ":" is missing on one of the previous lines, everything up to the next ":" (including the "\n") is taken as the tag name, which results in an invalid header with possibly duplicated entries.
- If the last line of the second file lacks a trailing "\n", its tag is lost completely and does not appear in the output file.

I am working on fixes.