<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Bounty: PHP internationalisation ascii / UTF-8 problem</title>
	<atom:link href="http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/</link>
	<description>Share what you know, Learn what you don't.  This is the evolution of one Afrikan.</description>
	<pubDate>Fri, 09 Jan 2009 23:17:47 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Tom Barta</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-81923</link>
		<dc:creator>Tom Barta</dc:creator>
		<pubDate>Tue, 22 May 2007 21:12:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-81923</guid>
		<description>This comment may be far too late, but I just stumbled upon this blog entry looking around for info about securing against SQL injection with a UTF-8-encoded database.

Your clients may want a short URL, but you (and they) should be aware of all of the possible ramifications:

If the search form POSTs and then the POSTed page displays output:

1) The browser's back button will not work properly: if users back up through a search page, they will get a warning message asking them to re-submit the form.

2) HTTP-level caching will not work (either in the browser or in an intermediate HTTP cache such as Squid), since caches can't operate on POST data.

3) Search results cannot be bookmarked.  Not sure if this is a problem for you or not, but in general query-based form results should be bookmarkable.

If the search form POSTs, then the search results are stored somewhere intermediate, and the POSTed page redirects to something like "searchresults.php?searchid=13242&#38;page=1":

1) The URL is more likely to be acceptable to your clients.  You can also play with URL rewriting to get something like "/searchresults/13242/1".

2) The browser's back button will operate as expected

3) The URL will probably expire at some point, unless you keep search results around forever.  This will still prevent users from bookmarking search results, and will interfere with HTTP-level caching.
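
The POST-then-redirect flow above can be sketched roughly as follows. This is Python rather than PHP, and only a minimal illustration: the in-memory store and both function names are hypothetical, and a real site would keep saved searches in a database or cache with an expiry.

```python
import itertools
from urllib.parse import urlencode

# Hypothetical in-memory store standing in for a database or cache.
_saved_searches = {}
_next_id = itertools.count(1)

def handle_search_post(form_fields):
    """Store the POSTed search and return (status, location) for a redirect."""
    search_id = next(_next_id)
    _saved_searches[search_id] = dict(form_fields)
    query = urlencode({"searchid": search_id, "page": 1})
    # 303 See Other tells the browser to fetch the results page with GET.
    return 303, "searchresults.php?" + query

def handle_results_get(search_id):
    """Return the stored search, or None if it has expired or never existed."""
    return _saved_searches.get(search_id)
```

Because the browser ends up on a GET URL, backing up through the results page no longer triggers the re-submit warning.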

If the search form GETs instead of POSTs:

1) Your clients have to deal with a long URL

2) Your clients can bookmark search results indefinitely

3) The browser's back button will operate as expected

4) Intermediate caches (as well as your server) will include query terms in their logfiles.  For example, forms containing sensitive information like social security numbers or credit card numbers should always be POSTed, to prevent a disgruntled admin from hunting through logs for identity theft.  Having search terms in logfiles may be bad (if they're sensitive searches) or good (if you can mine popular topics from logs and improve your services with that knowledge).
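
For the GET case, here is a minimal sketch (again in Python, not PHP) of building and reading back a bookmarkable search URL. The "search.php" path, the "q" and "page" field names, and both function names are assumptions made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

def search_url(terms, page=1):
    """Build a bookmarkable GET URL for a search."""
    # urlencode percent-encodes non-ASCII text as UTF-8 by default.
    return "search.php?" + urlencode({"q": terms, "page": page})

def parse_search(url):
    """Recover the search terms and page number from such a URL."""
    query = url.split("?", 1)[1]
    fields = parse_qs(query)
    return fields["q"][0], int(fields["page"][0])
```

Everything needed to repeat the search lives in the URL itself, which is why it can be bookmarked and cached, and also why it ends up in logfiles.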

Anyway, this is a bit over the top and not really what you were asking for in your blog, but hopefully you, your other readers, or your clients will find something useful in it.</description>
		<content:encoded><![CDATA[<p>This comment may be far too late, but I just stumbled upon this blog entry looking around for info about securing against SQL injection with a UTF-8-encoded database.</p>
<p>Your clients may want a short URL, but you (and they) should be aware of all of the possible ramifications:</p>
<p>If the search form POSTs and then the POSTed page displays output:</p>
<p>1) The browser&#8217;s back button will not work properly: if users back up through a search page, they will get a warning message asking them to re-submit the form.</p>
<p>2) HTTP-level caching will not work (either in the browser or in an intermediate HTTP cache such as Squid), since caches can&#8217;t operate on POST data.</p>
<p>3) Search results cannot be bookmarked.  Not sure if this is a problem for you or not, but in general query-based form results should be bookmarkable.</p>
<p>If the search form POSTs, then the search results are stored somewhere intermediate, and the POSTed page redirects to something like &#8220;searchresults.php?searchid=13242&amp;page=1&#8221;:</p>
<p>1) The URL is more likely to be acceptable to your clients.  You can also play with URL rewriting to get something like &#8220;/searchresults/13242/1&#8221;.</p>
<p>2) The browser&#8217;s back button will operate as expected</p>
<p>3) The URL will probably expire at some point, unless you keep search results around forever.  This will still prevent users from bookmarking search results, and will interfere with HTTP-level caching.</p>
<p>If the search form GETs instead of POSTs:</p>
<p>1) Your clients have to deal with a long URL</p>
<p>2) Your clients can bookmark search results indefinitely</p>
<p>3) The browser&#8217;s back button will operate as expected</p>
<p>4) Intermediate caches (as well as your server) will include query terms in their logfiles.  For example, forms containing sensitive information like social security numbers or credit card numbers should always be POSTed, to prevent a disgruntled admin from hunting through logs for identity theft.  Having search terms in logfiles may be bad (if they&#8217;re sensitive searches) or good (if you can mine popular topics from logs and improve your services with that knowledge).</p>
<p>Anyway, this is a bit over the top and not really what you were asking for in your blog, but hopefully you, your other readers, or your clients will find something useful in it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gareth Knight</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-43997</link>
		<dc:creator>Gareth Knight</dc:creator>
		<pubDate>Mon, 02 Oct 2006 09:29:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-43997</guid>
		<description>@Jeff:

Thanks for the reply ;-)

Yup - we asked the hosting company to do a check while we were seeing the problems, and then when we came back to the problem it seemed to be the same problem, consistent across both boxes, so I guess they either fixed it or made the appropriate changes to make things consistent... Will drop them a mail to ask what it was they did.</description>
		<content:encoded><![CDATA[<p>@Jeff:</p>
<p>Thanks for the reply <img src='http://www.oneafrikan.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Yup - we asked the hosting company to do a check while we were seeing the problems, and then when we came back to the problem it seemed to be the same problem, consistent across both boxes, so I guess they either fixed it or made the appropriate changes to make things consistent&#8230; Will drop them a mail to ask what it was they did.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gareth Knight</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-43996</link>
		<dc:creator>Gareth Knight</dc:creator>
		<pubDate>Mon, 02 Oct 2006 09:24:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-43996</guid>
		<description>@Demian:

First, thanks for your reply dude ;-)

The bounty: I didn't have a fixed idea as to how much I wanted to offer, but did have a max figure that I wouldn't have gone higher than.  I was kinda hoping to either tie it to an hourly/daily rate; or just throw a figure out there and see what happened.

I'm not sure about never submitting a search form with POST...  I can see why it would be a bad idea, but in this case 
a) the client didn't want to see ugly or long URLs
b) we'd inherited a system from someone else and had to work within those boundaries; and 
c) we're only really doing select statements from the DB using predetermined dropdowns (I know that someone skilled enough probably could try a SQL injection attack, but we've tried to cover that by fixing it so that we're only returning results if the request comes from the same server).

What makes you think we've used htmlentities()?

Thanks for the heads up regarding the HTML and PHP charset stuff ;-)

Yup - somewhere along the line, the live server was spitting out correct Cyrillic, but the dev server was spitting out ASCII, so that was kinda weird - so you might have seen ASCII in the select statement if you'd viewed the source.

However, we tackled this issue late last week, and when we came back to the dev server, it appeared that both the dev and live servers are now spitting out the correct Cyrillic (no more ASCII as it was doing), which made the problem easier to solve 'cos we had a dev server that replicated the problem on the live server.</description>
		<content:encoded><![CDATA[<p>@Demian:</p>
<p>First, thanks for your reply dude <img src='http://www.oneafrikan.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
The bounty: I didn&#8217;t have a fixed idea as to how much I wanted to offer, but did have a max figure that I wouldn&#8217;t have gone higher than.  I was kinda hoping to either tie it to an hourly/daily rate; or just throw a figure out there and see what happened.</p>
<p>I&#8217;m not sure about never submitting a search form with POST&#8230;  I can see why it would be a bad idea, but in this case<br />
a) the client didn&#8217;t want to see ugly or long URLs<br />
b) we&#8217;d inherited a system from someone else and had to work within those boundaries; and<br />
c) we&#8217;re only really doing select statements from the DB using predetermined dropdowns (I know that someone skilled enough probably could try a SQL injection attack, but we&#8217;ve tried to cover that by fixing it so that we&#8217;re only returning results if the request comes from the same server).</p>
<p>What makes you think we&#8217;ve used htmlentities()?</p>
<p>Thanks for the heads up regarding the HTML and PHP charset stuff <img src='http://www.oneafrikan.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Yup - somewhere along the line, the live server was spitting out correct Cyrillic, but the dev server was spitting out ASCII, so that was kinda weird - so you might have seen ASCII in the select statement if you&#8217;d viewed the source.</p>
<p>However, we tackled this issue late last week, and when we came back to the dev server, it appeared that both the dev and live servers are now spitting out the correct Cyrillic (no more ASCII as it was doing), which made the problem easier to solve &#8216;cos we had a dev server that replicated the problem on the live server.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41397</link>
		<dc:creator>Jeff</dc:creator>
		<pubDate>Thu, 14 Sep 2006 21:48:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41397</guid>
		<description>You should never submit a search form with a post?  That's new to me -- if you're doing any .NET development, any form element runat="server" with an event handler will post back using a post (function __doPostBack(eventTarget, eventArgument)).

I think http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost
is simply saying: protect yourself against SQL injection attacks by not processing any potentially malicious GET data directly against your DB, which is certainly good practice, whether it's a POST or a GET.
The link off your link ( http://www.cs.tut.fi/~jkorpela/forms/methods.html ) seems to be saying the same thing: "one should normally use METHOD="POST" if and only if the form submission may cause changes." 

I don't see the difference between the pages either (FF and IE).  I'd start by diffing the php.ini's, any .htaccess, and httpd.conf's between the boxes once you're sure the webroots are identical.</description>
		<content:encoded><![CDATA[<p>You should never submit a search form with a post?  That&#8217;s new to me &#8212; if you&#8217;re doing any .NET development, any form element runat=&#8220;server&#8221; with an event handler will post back using a post (function __doPostBack(eventTarget, eventArgument)).</p>
<p>I think <a href="http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost" rel="nofollow">http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost</a><br />
is simply saying: protect yourself against SQL injection attacks by not processing any potentially malicious GET data directly against your DB, which is certainly good practice, whether it&#8217;s a POST or a GET.<br />
The link off your link ( <a href="http://www.cs.tut.fi/~jkorpela/forms/methods.html" rel="nofollow">http://www.cs.tut.fi/~jkorpela/forms/methods.html</a> ) seems to be saying the same thing: &#8220;one should normally use METHOD=&#8220;POST&#8221; if and only if the form submission may cause changes.&#8221; </p>
<p>I don&#8217;t see the difference between the pages either (FF and IE).  I&#8217;d start by diffing the php.ini&#8217;s, any .htaccess, and httpd.conf&#8217;s between the boxes once you&#8217;re sure the webroots are identical.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gareth Knight</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41330</link>
		<dc:creator>Gareth Knight</dc:creator>
		<pubDate>Thu, 14 Sep 2006 15:42:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41330</guid>
		<description>Thanks Demian ;-)

Have to dash off now (to Germany), so will get back to this asap!</description>
		<content:encoded><![CDATA[<p>Thanks Demian <img src='http://www.oneafrikan.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Have to dash off now (to Germany), so will get back to this asap!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Demian Turner</title>
		<link>http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41301</link>
		<dc:creator>Demian Turner</dc:creator>
		<pubDate>Thu, 14 Sep 2006 11:28:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.oneafrikan.com/archives/2006/09/14/bounty-php-internationalisation-ascii-utf-8-problem/#comment-41301</guid>
		<description>Hi Gareth - you don't mention how much the bounty is?

There are a few problems with this site/page, in order of graveness:

 - you should never submit a search form with POST (see http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost)

 - if you correctly set the character encoding you should not *also* run the output through htmlentities() as you've done

 - in the HTML head you correctly set the charset to utf-8, but you should also use the PHP header function to set the equivalent for the Content-Type header

 - modern browsers send x-www-form-urlencoded data to the server in the CHARSET that was determined to be that of the *form*, however that determination was made
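
A minimal sketch of the two encoding points above, written in Python rather than PHP. The function name build_response is hypothetical; in PHP the rough equivalents would be header('Content-Type: text/html; charset=utf-8') and htmlspecialchars() in place of htmlentities().

```python
import html

def build_response(text):
    """Return (headers, body_bytes) for a UTF-8 HTML response."""
    # Declare the charset in the HTTP header, not only in the HTML head.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # Escape only the markup-significant characters. Cyrillic text passes
    # through unchanged, so no entity conversion of non-ASCII is needed.
    safe = html.escape(text)
    return headers, safe.encode("utf-8")
```

The design choice here is to let the declared charset carry the non-ASCII text and to reserve escaping for characters that would otherwise be read as markup.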

I don't actually see the difference between the two pages you list, except that only the dev example shows the first SELECT query above the search form.  Both of the second queries successfully use Cyrillic text in the WHERE statement (I'm using FF).</description>
		<content:encoded><![CDATA[<p>Hi Gareth - you don&#8217;t mention how much the bounty is?</p>
<p>There are a few problems with this site/page, in order of graveness:</p>
<p> - you should never submit a search form with POST (see <a href="http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost" rel="nofollow">http://trac.seagullproject.org/wiki/Standards/CorrectUseOfGetAndPost</a>)</p>
<p> - if you correctly set the character encoding you should not *also* run the output through htmlentities() as you&#8217;ve done</p>
<p> - in the HTML head you correctly set the charset to utf-8, but you should also use the PHP header function to set the equivalent for the Content-Type header</p>
<p> - modern browsers send x-www-form-urlencoded data to the server in the CHARSET that was determined to be that of the *form*, however that determination was made</p>
<p>I don&#8217;t actually see the difference between the two pages you list, except that only the dev example shows the first SELECT query above the search form.  Both of the second queries successfully use Cyrillic text in the WHERE statement (I&#8217;m using FF).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
