Invalid UTF-8 Sitemap contains C0 or C1 control codes

I came across “Invalid UTF-8 Sitemap contains C0 or C1 control codes” in Webmaster Tools (WMT) the other day. The sitemaps had recently been changed from a plain www.sitename.com/sitemap.xml to www.sitename.com/sitemap.xml.gz (the compressed version).

I looked through the uncompressed version of the sitemap and didn’t see anything out of the ordinary characters-wise. When I ran it through some UTF-8 checks it came back as valid, so I was at a loss as to what was causing the error.
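For anyone who wants to repeat that kind of check in PHP, here is a minimal sketch. The function name sitemap_text_is_clean is made up for illustration, and this is not the checker I actually used. It assumes that "control codes" means the C0 range (U+0000 to U+001F) and the C1 range (U+0080 to U+009F), minus tab, line feed, and carriage return, which XML 1.0 allows.

```php
<?php
// Hypothetical helper (name is an assumption, not from any library).
// Returns true only when $text is valid UTF-8 and contains no C0 or C1
// control codes other than tab, line feed, and carriage return.
function sitemap_text_is_clean(string $text): bool
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8,
    // 1 when a control code is found, and 0 when nothing matches.
    $pattern = '/[\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{80}-\x{9F}]/u';

    return preg_match($pattern, $text) === 0;
}
```

Calling it on the uncompressed XML string should return true for a clean sitemap. A false result means either invalid UTF-8 or a stray control code somewhere in the text.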

What do you do when you need some answers? Google it! The only post I found with that error message had 0 replies, and it wasn’t posted by someone having a PHP problem. After broadening my search a little, I came across a useful Stack Overflow post suggesting the cause was that fopen/fclose were appending characters that were not being encoded. Switching to gzopen, gzwrite, and gzclose worked perfectly for me. gzwrite is binary-safe, so my best guess is that fwrite was including a PHP_EOL in the file, which resulted in the error.

$xml = '<?xml version="1.0" encoding="UTF-8"?>
<urlset
              xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
	<loc>http://www.sitename.com/</loc>
	<lastmod>2014-03-14T09:39:11-04:00</lastmod>
	<changefreq>daily</changefreq>
	<priority>1.0</priority>
</url>
</urlset>
';
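// Open the target in gzip write mode at maximum compression (level 9).
$gz = gzopen('sitemap.xml.gz', 'w9');
if ($gz === false) {
    die('Could not open sitemap.xml.gz for writing');
}
gzwrite($gz, $xml);
gzclose($gz);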
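
If you want some reassurance that the file on disk really is a gzip stream holding the XML you meant to write, a round-trip check along these lines may help. This is only a sketch: write_and_verify_gz is a hypothetical name, and it assumes the zlib extension is available (it provides gzopen, gzwrite, gzclose, and gzdecode).

```php
<?php
// Hypothetical helper (name is an assumption). Writes $xml to $path with
// the gz* functions, then reads the file back and confirms it decompresses
// to exactly the same string.
function write_and_verify_gz(string $path, string $xml): bool
{
    $gz = gzopen($path, 'w9');
    if ($gz === false) {
        return false;
    }
    gzwrite($gz, $xml);
    gzclose($gz);

    // A gzip stream starts with the two magic bytes 0x1F 0x8B.
    $raw = file_get_contents($path);
    if ($raw === false || strncmp($raw, "\x1f\x8b", 2) !== 0) {
        return false;
    }

    // gzdecode() returns false on corrupt input, so this also fails safely.
    return gzdecode($raw) === $xml;
}
```

A true result only tells you the write and the compression went through cleanly. It does not say anything about whether the XML itself is a valid sitemap.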
