How to stop non-UTF-8 characters from breaking your WordPress feeds

If non-UTF-8 characters are breaking your WordPress feeds (i.e. breaking XML parsers), one solution is to attach a filter to the output of the_excerpt_rss() that strips or converts the offending characters. I’m guessing the errant off-character characters (ahem) are the result of promiscuous copypasting.

  1. Grab Jason Judge’s self-contained function for limiting to valid UTF-8 characters (here’s a link to the source).
  2. Paste it into your theme’s functions.php file.
  3. Also add the following lines:
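    // Clean the feed excerpt so only valid UTF-8 characters reach the XML.
    function the_excerpt_rss_utf8($text) {
        return trim(clean_utf8_xml_string($text));
    }
    // Run the cleaner whenever WordPress outputs an excerpt in a feed.
    add_filter('the_excerpt_rss', 'the_excerpt_rss_utf8');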
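
For a rough idea of what a cleaning function like this does, here is a minimal sketch. It is not Jason Judge’s function: the name example_clean_utf8_for_xml is made up for illustration, and it assumes the mbstring extension is enabled. It drops invalid UTF-8 bytes first, then removes any characters XML 1.0 does not allow.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT Jason Judge's function.
// Assumes the mbstring extension is enabled.
function example_clean_utf8_for_xml($text) {
    // Tell mbstring to drop invalid bytes instead of replacing them with "?".
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    // Remove code points that XML 1.0 does not allow (most control characters, for example).
    $cleaned = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // preg_replace() returns null on failure; fall back to an empty string.
    return $cleaned === null ? '' : $cleaned;
}
```

This version only strips; it makes no attempt to convert characters from another encoding. Leave out the opening <?php tag if you paste it into a functions.php file that already has one.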
