I came across “Invalid UTF-8 Sitemap contains C0 or C1 control codes” in Google Webmaster Tools (WMT) the other day. The sitemap had recently been changed from a plain www.sitename.com/sitemap.xml to www.sitename.com/sitemap.xml.gz (the compressed version).
I looked through the uncompressed version of the sitemap and didn’t see any out-of-the-ordinary characters. When I ran it through some UTF-8 checks it came back as valid, so I was at a loss as to what was causing the error.
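A check along these lines can be sketched in PHP. This is only an illustration, not the tool used above: the function name find_control_codes is hypothetical, and it assumes the mbstring extension is loaded. It treats tab, line feed, and carriage return as fine (XML allows them) and reports every other code point in the C0 (U+0000 to U+001F) and C1 (U+0080 to U+009F) ranges.

```php
<?php
// Hypothetical helper: list control codes found in an uncompressed sitemap string.
// Returns ['invalid UTF-8'] if the bytes are not valid UTF-8 at all.
function find_control_codes(string $bytes): array
{
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return ['invalid UTF-8'];
    }

    $found = [];
    // C0 controls except tab (09), LF (0A), CR (0D), plus the C1 range U+0080..U+009F.
    $pattern = '/[\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{80}-\x{9F}]/u';

    if (preg_match_all($pattern, $bytes, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$char, $offset]) {
            // $offset is a byte offset into $bytes, not a character index.
            $found[] = sprintf('U+%04X at byte %d', mb_ord($char, 'UTF-8'), $offset);
        }
    }

    return $found;
}
```

Calling find_control_codes(file_get_contents('sitemap.xml')) on a clean sitemap should return an empty array, which matches what I saw: the XML itself was not the problem.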
What do you do when you need some answers? Google it! The only post I found with that error message had 0 replies, and it wasn’t posted by someone having a PHP problem. After broadening my search a little, I came across a useful Stack Overflow post suggesting that the cause was fopen/fclose appending characters that were not being encoded. Switching to gzopen, gzwrite, and gzclose worked perfectly for me. gzwrite is binary-safe, so my best guess is that fwrite was including a PHP_EOL in the file, which resulted in the error.
<?php
// Build the sitemap XML. The XML declaration must be the very first bytes of the string.
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.sitename.com/</loc>
    <lastmod>2014-03-14T09:39:11-04:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
';

// Write the compressed sitemap with the zlib functions ('w9' = write, maximum compression).
$gz = gzopen('sitemap.xml.gz', 'w9');
gzwrite($gz, $xml);
gzclose($gz);
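If you want some reassurance before resubmitting the sitemap, a quick round trip can confirm that the file really is gzip and decompresses back to the same XML. The sketch below is an assumption on my part rather than something from the Stack Overflow post: write_and_verify_gz is a hypothetical helper name, and it relies on gzdecode, which needs the zlib extension.

```php
<?php
// Hypothetical helper: write $xml as gzip to $path, then confirm it reads back unchanged.
function write_and_verify_gz(string $path, string $xml): bool
{
    $gz = gzopen($path, 'w9');
    if ($gz === false) {
        return false; // could not open the file for writing
    }
    gzwrite($gz, $xml);
    gzclose($gz);

    $raw = file_get_contents($path);
    // Every gzip stream starts with the magic bytes 0x1F 0x8B.
    if ($raw === false || strncmp($raw, "\x1f\x8b", 2) !== 0) {
        return false;
    }

    // gzdecode returns false on corrupt data, so the comparison fails in that case too.
    return gzdecode($raw) === $xml;
}
```

Usage would look like write_and_verify_gz('sitemap.xml.gz', $xml); a false return means the file on disk is not a gzip stream that decodes back to the original XML.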