Bugzilla – Bug 1227316
gettext-runtime: msgcat fails on certain files
Last modified: 2024-07-17 08:03:18 UTC
While trying to concatenate two PO files, I got:

msgcat: memory exhausted

Debugging in gdb, I realized that one of the failing PO files has a non-standard header that contains an extra empty line:

msgid ""
msgstr ""
"Project-Id-Version: @PACKAGE@\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-09-16 17:13+0200\n"
"PO-Revision-Date: 2005-07-29 15:37+0530\n"
"Last-Translator: Priyavert Sharma<priyavert.sharma@agreeya.com>\n"
"Language-Team: AgreeYa Solutions <linux_team@agreeya.com>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"
"\n"

The bug is triggered in msgl-header.c, message_list_header_list():

while (*h != '\0')
  {
    char *enh = strchr (h, ':');
    enh++;
    char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);

If the line does not contain ":", strchr() returns NULL, enh++ turns it into 0x1, and (enh - h) + 1 evaluates to an absurd (negative) number.

This is not an upstream bug. The smart header merge feature is introduced by:
0002-msgcat-Merge-headers-when-use-first.patch
Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386 char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x5555555622ef " 8bit\nPlural-Forms: nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622d5 "Content-Transfer-Encoding: 8bit\nPlural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
Continuing.

Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386 char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622f5 "Plural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 14
(gdb) n
387 memcpy (msgid, h, enh - h);
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x5555555622f5 "Plural-Forms: nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 14
(gdb)
389 (msgid)[enh-h] = '\0';
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb)
392 enh = strchr (h, '\n');
5: enh = 0x555555562302 " nplurals=2; plural=(n!=1);\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 0
(gdb)
393 if (enh != NULL)
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
395 char * msgstr = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
396 memcpy (msgstr, h, enh - h);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
29 return __builtin___memcpy_chk (__dest, __src, __len,
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
396 memcpy (msgstr, h, enh - h);
5: enh = 0x55555556231d "\n\n"
6: h = 0x555555562303 "nplurals=2; plural=(n!=1);\n\n"
7: ((enh - h) + 1) = 27
(gdb)
398 msgstr[enh-h] = '\0';
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb)
399 lex_pos_ty pos = {NULL, ctr++};
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb)
400 message_list_append (header, message_alloc (NULL, msgid, NULL, msgstr, enh - h, &pos));
5: enh = 0x55555556231d "\n\n"
6: h = <optimized out>
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb)
382 while (*h != '\0')
6: h = 0x55555556231e "\n"
(gdb)
384 char *enh = strchr (h, ':');
5: enh = <optimized out>
6: h = 0x55555556231e "\n"
7: ((enh - h) + 1) = <error: value has been optimized out>
(gdb)

Breakpoint 5, message_list_header_list (mlp=<optimized out>) at msgl-header.c:386
386 char * msgid = (char *)XNMALLOC (((enh - h) + 1), char);
5: enh = 0x1 <error: Cannot access memory at address 0x1>
6: h = 0x55555556231e "\n"
7: ((enh - h) + 1) = -93824992289564

A moment later, XNMALLOC (-93824992289564, char) is called, and it fails (hence "memory exhausted").
Looking back into the history, these two feature patches were never upstreamed:
0001-msgcat-Add-feature-to-use-the-newest-po-file.patch
0002-msgcat-Merge-headers-when-use-first.patch
However, upstream would be happy to include them:
https://lists.gnu.org/archive/html/bug-gettext/2019-10/msg00006.html
Upstreaming has been waiting, forgotten, for 5 years on the copyright assignment to the FSF from Markéta Machová (Calábková at the time the patches were written).
I currently cannot envision gettext's msgcat getting untrusted input; it is used during the build, where source code is also involved. So I would currently consider it a regular bug.
Marcus Meissner: msgcat is a general tool, and it could be used to process uploaded files on translation servers. Hopefully, rubygem-gettext is more popular for this purpose.
I did a deep review of the header merge code, and it is affected by more bugs:
- If ":" is missing on the last line, it crashes.
- If ":" is missing on one of the previous lines, everything up to the next ":" (including "\n") is treated as the tag name, which results in an invalid header with possibly duplicated entries.
- If the last line of the second file is missing the "\n" at the end, the tag is completely lost and does not appear in the output file.
I am working on fixes.