
Bug #12490 Wikis do not display since php5-5.2.5 upgrade but are editable.
Submitted: 2007-11-20 20:26 UTC
From: eculp
Assigned: justinpatrin
Status: Closed
Package: Text_Wiki (version CVS)
PHP Version: 5.2.5
OS: FreeBSD
Roadmaps: 1.3.0    


 [2007-11-20 20:26 UTC] eculp (Edwin Culp)
Description:
------------
Wikis do not display since the php5-5.2.5 upgrade, but they are still editable.

Using the Horde Wicked wiki with the PEAR modules:
Text_Wiki 1.2.0 stable
Text_Wiki_Cowiki 0.0.2 alpha
Text_Wiki_Creole 1.0.0 stable
Text_Wiki_Mediawiki 0.1.0 alpha
Text_Wiki_Tiki 0.0.1 alpha

This seems to be related to the following change in php5-5.2.5:
"Fixed htmlentities/htmlspecialchars not to accept partial multibyte sequences."

Thanks,
ed

Test script:
---------------
See the example at: http://intranet.encontacto.net/wicked/display.php?page=TestPage
It shows a blank page, but click on edit and you will see the text.

Comments

 [2007-12-01 16:48 UTC] noovea (Guillaume Lecanu)
I have the same problem under Ubuntu since my last upgrade. Can you make a patch quickly, please? In Ubuntu gutsy, this happened when upgrading from version 5.2.3-1ubuntu6 to 5.2.3-1ubuntu6.1.
 [2007-12-02 05:16 UTC] justinpatrin (Justin Patrin)
Well...Text_Wiki doesn't do editing, only rendering. On my machine with Gentoo's PHP 5.2.5 I'm seeing wiki rendering work just fine. Are you sure this isn't a problem with some custom rules you are using? http://pear.reversefold.com/text_wiki_sample.php
 [2007-12-02 14:47 UTC] eculp (Edwin Culp)
> [2007-12-02 05:16 UTC] justinpatrin (Justin Patrin)
>
> Well...Text_Wiki doesn't do editing, only rendering.
>
> On my machine with Gentoo's PHP 5.2.5 I'm seeing wiki rendering work
> just fine. Are you sure this isn't a problem with some custom rules you
> are using?

I'm not really sure, all is possible ;) I'm not the only one who has seen this, and it doesn't seem to be OS specific. Anyway, my problem is with rendering, which afaik is done by Text_Wiki. Is anyone else seeing this behavior?
 [2007-12-02 20:41 UTC] justinpatrin (Justin Patrin)
Guillaume Lecanu has confirmed that external rules are causing his problem. Please check with the maintainer of either the rules or the application you are using; this does not appear to be a problem with Text_Wiki. If you add a comment naming the maintainer of the rules or application, I'll see if I can contact them as well.
 [2007-12-03 17:21 UTC] yunosh (Jan Schneider)
It has nothing to do with external rules. The htmlspecialchars() call in Text_Wiki_Render_Xhtml::textEncode() is the culprit. If it's called with:

    $type = HTML_SPECIALCHARS
    $quotes = ENT_COMPAT
    $charset = 'UTF-8'

the result string is empty. No error message even without the silence operator.
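
The empty result can be reproduced outside Text_Wiki. The following is a minimal sketch, assuming PHP 5.2.5 or later and assuming the text being encoded still contains the default "\xFF" delimiter byte. The function name encode_like_text_wiki is hypothetical; it only mirrors the arguments listed above and is not the actual Text_Wiki code.

```php
<?php
// Hypothetical helper: mirrors the htmlspecialchars() arguments described
// above. It is an illustration, not the real textEncode() method.
function encode_like_text_wiki($text, $charset)
{
    return htmlspecialchars($text, ENT_COMPAT, $charset);
}

$delim = "\xFF";                           // assumed default delimiter byte
$text  = 'a < b ' . $delim . '0' . $delim; // text with a delimited token

// "\xFF" can never appear in valid UTF-8, so the whole result is empty.
var_dump(encode_like_text_wiki($text, 'UTF-8'));

// In a single-byte charset the same byte is an ordinary character,
// so the result is not empty.
var_dump(strlen(encode_like_text_wiki($text, 'ISO-8859-1')) > 0);
```

No warning is raised in the UTF-8 case, which matches the observation that the silence operator makes no difference.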
 [2007-12-06 08:23 UTC] noovea (Guillaume Lecanu)
In the end, my rules are not the problem. The problem is this line:

    $wiki->setFormatConf('Xhtml', 'charset', 'utf-8');

Before the upgrade this caused no problem. After the upgrade it returns nothing.
 [2007-12-07 06:27 UTC] saeven (Alexandre Lemaire)
I can confirm the bug. A stock Text_Wiki works just fine on PHP 5.2.4; however, on PHP 5.2.5 the handling of utf-8 charsets causes a miserable failure, where the textEncode method returns blank text. I'll add that we can confirm this behavior on a great number of systems. Lines 72-76 of Xhtml.php seem to be the culprit:

    $text = htmlentities( $text, $quotes, $charset );

Checking the ordinal value of each character, however, it would appear that the text is not being handled as utf-8 at all. I found that adding a call to mb_convert_encoding directly above those lines results in proper operation.
 [2007-12-07 06:32 UTC] saeven (Alexandre Lemaire)
// convert the entities. we silence the call here so that
// errors about charsets don't pop up, per counsel from
// Jan at Horde. (http://pear.php.net/bugs/bug.php?id=4474)
$text = mb_convert_encoding( $text, $charset );
$text = htmlentities( $text, $quotes, $charset );
 [2007-12-07 06:36 UTC] saeven (Alexandre Lemaire)
(cont'd) But as you can see, the delimiter will still exist in the output block, so I suspect this may be a charset issue?
 [2007-12-07 21:01 UTC] justinpatrin (Justin Patrin)
Perhaps one of you could try mb_convert_encoding() on your entire text before sending it into Text_Wiki.
 [2007-12-08 13:34 UTC] surfchen (Surf Chen)
The change "Fixed htmlentities/htmlspecialchars not to accept partial multibyte sequences." in php-5.2.5 causes this problem. In php-5.2.5, if any invalid or partial multibyte character is passed into html****, the two functions return an empty string.

Let me explain what an invalid multibyte character is. For example, below is a Unicode and UTF-8 table (the first field is the Unicode range, and the second is the UTF-8 byte structure, in binary, for that range):

0000-007F    | 0xxxxxxx
0080-07FF    | 110xxxxx 10xxxxxx
0800-FFFF    | 1110xxxx 10xxxxxx 10xxxxxx
10000-10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

This means that a byte beginning with 0, 110, 1110 or 11110 is considered the first byte of a character. If that byte is 110xxxxx, the next one must be 10xxxxxx; otherwise the html**** functions will just return an empty string.

Consider the code below:

    $text = 'a' . chr(220) . chr(127) . 'b';
    $text = htmlentities($text, ENT_COMPAT, 'UTF-8');
    var_dump($text);

Then, what happens if chr(127) is changed to chr(129)?

Now you should know what causes this Text_Wiki problem: var $delim = "\xFF" is to blame. If $wiki->setFormatConf('Xhtml', 'charset', 'utf-8') is set, htmlentities($text, ENT_COMPAT, 'UTF-8') will be used in Text_Wiki_Render_Xhtml::textEncode($text). Because "\xFF" is an invalid UTF-8 byte and $text includes this delimiter, the result is an empty string.

The solution for end users is to specify a $delim, like $wiki->delim = 'wsiukrifchen'. The solution for PEAR developers is to change the default $delim.
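
The byte patterns in the table can be checked by hand. Below is a rough sketch of such a check, applied to the strings from the example. The function name looks_like_utf8 is hypothetical, and the check is deliberately simplified: it only tests lead and continuation bit patterns and ignores overlong forms and surrogates, so it should not be taken as a description of what PHP does internally.

```php
<?php
// Simplified structural check based on the table above. Hypothetical
// helper for illustration; not a complete UTF-8 validator.
function looks_like_utf8($bytes)
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {                  // 0xxxxxxx
            $extra = 0;
        } elseif (($b & 0xE0) === 0xC0) { // 110xxxxx
            $extra = 1;
        } elseif (($b & 0xF0) === 0xE0) { // 1110xxxx
            $extra = 2;
        } elseif (($b & 0xF8) === 0xF0) { // 11110xxx
            $extra = 3;
        } else {
            return false;                 // e.g. 0xFF matches no lead pattern
        }
        for ($j = 1; $j <= $extra; $j++) {
            // Every following byte must look like 10xxxxxx.
            if ($i + $j >= $len || (ord($bytes[$i + $j]) & 0xC0) !== 0x80) {
                return false;
            }
        }
        $i += $extra + 1;
    }
    return true;
}

var_dump(looks_like_utf8('a' . chr(220) . chr(127) . 'b')); // bool(false)
var_dump(looks_like_utf8('a' . chr(220) . chr(129) . 'b')); // bool(true)
var_dump(looks_like_utf8("\xFF"));                          // bool(false)
```

chr(220) is 11011100 in binary, a two-byte lead, so it needs a 10xxxxxx byte after it. chr(127) is 01111111 and fails that test, while chr(129) is 10000001 and passes. "\xFF" is 11111111 and matches no row of the table at all.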
 [2007-12-08 14:43 UTC] surfchen (Surf Chen)
I found that $delim must be one byte because some regexes use it as a range pattern. So changing it to a string like "wsiukrifchen" will cause other problems. Unfortunately, in UTF-8 only the first 127 characters (0001-007F) are one byte. Which one is the most unique? It's hard to choose.
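
To illustrate why a character class needs a one-byte delimiter, here is a small sketch. The function text_before_first_token and its pattern are hypothetical and are not copied from Text_Wiki; they only assume that some rule uses the delimiter inside a negated class such as [^...].

```php
<?php
// Hypothetical example, not an actual Text_Wiki rule: return the plain
// text that precedes the first delimited token.
function text_before_first_token($text, $delim)
{
    $d = preg_quote($delim, '/');
    // Inside [^...] every byte of $delim is excluded individually, so a
    // multi-character delimiter excludes ordinary letters as well.
    preg_match('/^([^' . $d . ']*)/', $text, $m);
    return $m[1];
}

$one  = chr(31);
$many = 'wsiukrifchen';

var_dump(text_before_first_token('hello world' . $one . '0' . $one, $one));
var_dump(text_before_first_token('hello world' . $many . '0' . $many, $many));
```

With the one-byte delimiter the class stops at the first token, as intended. With 'wsiukrifchen' it stops at the very first letter, because "h" is one of the excluded characters.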
 [2007-12-08 15:58 UTC] saeven (Alexandre Lemaire)
I agree that a multibyte delimiter would be most ideal.
 [2007-12-10 13:52 UTC] yunosh (Jan Schneider)
Since Text_Wiki is processing text, most control characters from the ascii table should be fine. I usually use NUL as a delimiter.
 [2007-12-10 13:59 UTC] yunosh (Jan Schneider)
NUL doesn't work because it's considered a string terminator by PCRE. Thanks to the delimiter being a public property of Text_Wiki::, this is the workaround that we successfully use in Wicked now:

/* Create format-specific Text_Wiki object */
$class = 'Text_Wiki_' . $format;
require_once 'Text/Wiki/' . $format . '.php';
$this->_proc = new $class();

/* Use a non-printable delimiter character that is still a valid
 * UTF-8 character. See
 * http://pear.php.net/bugs/bug.php?id=12490. */
$this->_proc->delim = chr(1);
 [2007-12-10 15:46 UTC] surfchen (Surf Chen)
Currently I use $wiki->delim = chr(31); to solve this problem.
 [2007-12-11 00:25 UTC] justinpatrin (Justin Patrin)
Please confirm that this fixes the issue for both utf-8 and non-utf-8 wiki text on PHP5.2.5 and I'll make the change in CVS and make an emergency release: $wiki->delim = chr(31);
 [2007-12-11 01:54 UTC] saeven (Alexandre Lemaire)
I can vouch for 31 on this end. :)
 [2007-12-11 02:09 UTC] surfchen (Surf Chen)
Using a char <= 127 as $delim will fix this issue if the charset passed to html**** is ASCII-compatible. And it seems that currently all charsets that html**** supports are ASCII-compatible. But if html**** one day supports UTF-16, this $delim problem will come back, because UTF-16 is not ASCII-compatible: a UTF-16 char is always at least two bytes. As many regexes in packages related to Text_Wiki use $delim as a range pattern, using a two-byte delimiter will cause other problems. In any case, in php-5.2.5, chr(31) or another char <= 127 is better than chr(255). And before php-5.2.5, chr(31) is no worse than chr(255).
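
The candidates mentioned in this thread can be compared with a quick check. This is only a sketch: the function delimiter_is_safe is hypothetical and not part of Text_Wiki, and it tests just the two conditions discussed here (one byte in the range 0-127, and text containing it surviving htmlspecialchars()). It does not cover the PCRE problem reported for NUL.

```php
<?php
// Hypothetical check, not part of Text_Wiki: accept a delimiter candidate
// only if it is one byte <= 127 and sample text containing it comes back
// unchanged from htmlspecialchars() with the given charset.
function delimiter_is_safe($delim, $charset)
{
    if (strlen($delim) !== 1 || ord($delim) > 127) {
        return false;
    }
    $sample = 'text' . $delim . '0' . $delim;
    return htmlspecialchars($sample, ENT_COMPAT, $charset) === $sample;
}

foreach (array(chr(1), chr(31), chr(255)) as $candidate) {
    printf(
        "chr(%d): %s\n",
        ord($candidate),
        delimiter_is_safe($candidate, 'UTF-8') ? 'ok' : 'rejected'
    );
}
```

The equality test also rejects characters that htmlspecialchars() rewrites, such as "<" or "&", which would be poor delimiters for a different reason.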
 [2007-12-17 16:04 UTC] justinpatrin (Justin Patrin)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear on pear.php.net by the end of next Sunday (CET). If this was a problem with the pear.php.net website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better.