<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>M. E. Patterson - Author, Geek &#187; sphinx</title>
	<atom:link href="http://mepatterson.net/tag/sphinx/feed/" rel="self" type="application/rss+xml" />
	<link>http://mepatterson.net</link>
	<description>bestselling author of Devil&#039;s Hand, a supernatural thriller; writer of fictions and web software</description>
	<lastBuildDate>Sun, 18 Dec 2011 23:19:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>Mongosphinx with MongoDB and MongoMapper</title>
		<link>http://mepatterson.net/2010/01/mongosphinx-with-mongodb-and-mongomapper/</link>
		<comments>http://mepatterson.net/2010/01/mongosphinx-with-mongodb-and-mongomapper/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 23:29:53 +0000</pubDate>
		<dc:creator>M. E. Patterson</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[sphinx]]></category>

		<guid isPermaLink="false">http://blog.digimonkey.com/?p=43</guid>
		<description><![CDATA[Can that title have the word &#8216;mongo&#8217; in it any more times? Well, fear not, I&#8217;m about to use it even more&#8230; So, I had to fool around a bit to get the so-called &#8220;Mongosphinx&#8221; gem working with my app architecture. Thought it might be helpful to others to demonstrate how I did it. I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p><em>Can that title have the word &#8216;mongo&#8217; in it any more times?  Well, fear not, I&#8217;m about to use it even more&#8230;</em></p>
<p>So, I had to fool around a bit to get the so-called &#8220;Mongosphinx&#8221; gem working with my app architecture.  Thought it might be helpful to others to demonstrate how I did it.  I&#8217;ll boil it down to a generic sort of implementation.  Hit the jump to see the whole bloody mess&#8230;<br />
<span id="more-43"></span></p>
<h2>app/models/document.rb</h2>
<div class="codecolorer-container ruby blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="ruby codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color:#9966CC; font-weight:bold;">class</span> Document<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">include</span> <span style="color:#6666ff; font-weight:bold;">MongoMapper::Document</span><br />
&nbsp; key <span style="color:#ff3333; font-weight:bold;">:title</span>, &nbsp;<span style="color:#CC0066; font-weight:bold;">String</span><br />
&nbsp; key <span style="color:#ff3333; font-weight:bold;">:content</span>, &nbsp;<span style="color:#CC0066; font-weight:bold;">String</span><br />
&nbsp; timestamps!<br />
<br />
&nbsp; <span style="color:#008000; font-style:italic;">#cached for the sphinx indexer</span><br />
&nbsp; key <span style="color:#ff3333; font-weight:bold;">:sphinx_tags</span>, <span style="color:#CC0066; font-weight:bold;">String</span><br />
<br />
&nbsp; <span style="color:#008000; font-style:italic;"># for mongosphinx</span><br />
&nbsp; fulltext_index <span style="color:#ff3333; font-weight:bold;">:title</span>, <span style="color:#ff3333; font-weight:bold;">:content</span><br />
&nbsp; REINDEX_INTERVAL = <span style="color:#006666;">10</span>.<span style="color:#9900CC;">minutes</span><br />
&nbsp; INDEXED_FIELDS = <span style="color:#996600;">'_sphinx_id, title, content, sphinx_tags'</span><br />
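&nbsp; before_save <span style="color:#ff3333; font-weight:bold;">:cache_for_indexer</span><br />
&nbsp; after_save <span style="color:#ff3333; font-weight:bold;">:reindex</span><br />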
<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">search</span><span style="color:#006600; font-weight:bold;">&#40;</span>query<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; <span style="color:#008000; font-style:italic;"># method returns a sphinx resultset object with its own each() method</span><br />
&nbsp; &nbsp; <span style="color:#008000; font-style:italic;"># iterate over that and pull each element out as an old-fashioned array of Documents</span><br />
&nbsp; &nbsp; by_fulltext_index<span style="color:#006600; font-weight:bold;">&#40;</span>query<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">each</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span><span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">|</span> <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#125;</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">xml_for_sphinx_pipe</span><br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#6666ff; font-weight:bold;">MongoSphinx::Indexer::XMLDocset</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>Document.<span style="color:#9900CC;">all</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:fields</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> INDEXED_FIELDS<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">to_s</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">def</span> cache_for_indexer<br />
&nbsp; &nbsp; <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">sphinx_tags</span> = tag_words.<span style="color:#9900CC;">join</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">' '</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; <span style="color:#0000FF; font-weight:bold;">true</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">def</span> reindex<br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'mongo/gridfs'</span><br />
&nbsp; &nbsp; <span style="color:#008000; font-style:italic;"># we run this method whenever a doc is saved, but we only actually reindex every 10 minutes, max</span><br />
&nbsp; &nbsp; <span style="color:#9966CC; font-weight:bold;">unless</span> RAILS_ENV == <span style="color:#996600;">'test'</span><br />
&nbsp; &nbsp; &nbsp; db = MongoMapper.<span style="color:#9900CC;">database</span><br />
&nbsp; &nbsp; &nbsp; file = <span style="color:#996600;">&quot;indexer_next_run&quot;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#6666ff; font-weight:bold;">GridFS::GridStore</span>.<span style="color:#9900CC;">exist</span>?<span style="color:#006600; font-weight:bold;">&#40;</span>db, file<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; line = <span style="color:#6666ff; font-weight:bold;">GridFS::GridStore</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>db, file, <span style="color:#996600;">'r'</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#CC0066; font-weight:bold;">readlines</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color:#0000FF; font-weight:bold;">return</span> <span style="color:#0000FF; font-weight:bold;">false</span> <span style="color:#9966CC; font-weight:bold;">unless</span> <span style="color:#CC00FF; font-weight:bold;">Time</span>.<span style="color:#9900CC;">now</span> <span style="color:#006600; font-weight:bold;">&gt;</span> <span style="color:#CC00FF; font-weight:bold;">Time</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
&nbsp; &nbsp; &nbsp; next_run = <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC00FF; font-weight:bold;">Time</span>.<span style="color:#9900CC;">now</span> <span style="color:#006600; font-weight:bold;">+</span> REINDEX_INTERVAL<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">to_s</span><br />
&nbsp; &nbsp; &nbsp; <span style="color:#6666ff; font-weight:bold;">GridFS::GridStore</span>.<span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span>db, file, <span style="color:#996600;">'w'</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>f<span style="color:#006600; font-weight:bold;">|</span> f.<span style="color:#CC0066; font-weight:bold;">puts</span> next_run <span style="color:#006600; font-weight:bold;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; logger.<span style="color:#9900CC;">info</span> <span style="color:#996600;">&quot;Running document re-indexer&quot;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color:#CC00FF; font-weight:bold;">Process</span>.<span style="color:#CC0066; font-weight:bold;">fork</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">`rake sphinx:index rotate=true`</span> <span style="color:#006600; font-weight:bold;">&#125;</span><br />
&nbsp; &nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<span style="color:#9966CC; font-weight:bold;">end</span></div></div>
<p>I&#8217;ll illuminate some of that for you.</p>
<p>If you&#8217;ve been using MongoMapper, most of the key stuff should be obvious already.  The &#8216;sphinx_tags&#8217; is a special trick I cooked up to make this work well with my <a href="http://github.com/mepatterson/acts_as_mongo_taggable">acts_as_mongo_taggable</a> plugin.  Basically, whenever a Document is saved/updated, I jam a single string into the sphinx_tags field in Mongo.  This lets sphinx index those tags easily.</p>
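<p>To see the sphinx_tags caching in isolation, here is a minimal sketch in plain Ruby.  It assumes nothing about MongoMapper: the TaggedDoc class, its tag_words list and its save method are hypothetical stand-ins that only mimic what the before_save callback does in the model above.</p>

```ruby
# Plain-Ruby stand-in for the model above; no MongoMapper required.
# TaggedDoc, tag_words and save are hypothetical names for illustration.
class TaggedDoc
  attr_accessor :title, :sphinx_tags
  attr_reader :tag_words

  def initialize(title, tag_words)
    @title = title
    @tag_words = tag_words
  end

  # Mirrors cache_for_indexer: flatten the tags into one space-separated string.
  def cache_for_indexer
    self.sphinx_tags = tag_words.join(' ')
    true # the original returns true too, presumably so the callback chain continues
  end

  # Simulates the before_save callback firing just before the record is stored.
  def save
    cache_for_indexer
    true
  end
end

doc = TaggedDoc.new('Mongo notes', %w[mongodb sphinx search])
doc.save
puts doc.sphinx_tags # prints "mongodb sphinx search"
```

<p>The point of the flat string is that the indexer only ever sees ordinary text fields, so the tags need no special handling on the Sphinx side.</p>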
<p>The search just takes advantage of Mongosphinx&#8217;s normal by_fulltext_index method.</p>
<p>The reindex thing, while probably a hacky solution, works well enough for now that I don&#8217;t need anything fancier.  It keeps the app reasonably quick to get newly-created documents into the index and available for search.  And I take advantage of GridFS (built into Mongo) to store the next-run timestamp (the indexer_next_run file), so I don&#8217;t even need a cron job for this.  If my app starts getting significant document-creation traffic, I might want to do something more sophisticated like delta indexing or whatnot, but for now this is fine for me.</p>
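<p>The throttling idea on its own can be sketched roughly like this.  It is an assumption-laden stand-in, not the model code: the ReindexThrottle class is hypothetical, and a plain Hash plays the role of the GridFS file so the sketch runs without MongoDB.</p>

```ruby
require 'time'

# Hypothetical stand-in for the GridFS-backed check in reindex; a plain Hash
# plays the role of the indexer_next_run file so this runs without MongoDB.
class ReindexThrottle
  KEY = 'indexer_next_run'

  def initialize(interval_seconds, store = {})
    @interval = interval_seconds
    @store = store
  end

  # True when a reindex should run now; also records the next allowed run.
  def due?(now = Time.now)
    stored = @store[KEY]
    if stored
      remaining = Time.parse(stored) - now
      return false unless remaining.negative?
    end
    @store[KEY] = (now + @interval).to_s
    true
  end
end

throttle = ReindexThrottle.new(600) # 600 seconds, like 10.minutes above
start = Time.now
puts throttle.due?(start)       # true: nothing stored yet
puts throttle.due?(start + 60)  # false: still inside the interval
puts throttle.due?(start + 601) # true: the stored time has passed
```

<p>Storing the next allowed run, rather than the last run, keeps the check to a single comparison against the clock on every save.</p>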
<h2>lib/tasks/sphinx.rake</h2>
<p>Okay, moving on to the aforementioned rake tasks&#8230; Here are the contents of my lib/tasks/sphinx.rake file:</p>
<div class="codecolorer-container ruby blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="ruby codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">namespace <span style="color:#ff3333; font-weight:bold;">:sphinx</span> <span style="color:#9966CC; font-weight:bold;">do</span><br />
&nbsp; desc <span style="color:#996600;">&quot;generate xml that is sphinx-friendly&quot;</span><br />
&nbsp; task <span style="color:#ff3333; font-weight:bold;">:genxml</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:environment</span> <span style="color:#9966CC; font-weight:bold;">do</span><br />
&nbsp; &nbsp; <span style="color:#008000; font-style:italic;"># this will just puts() to stdout; useful for debugging</span><br />
&nbsp; &nbsp; Document.<span style="color:#9900CC;">xml_for_sphinx_pipe</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
&nbsp; <br />
&nbsp; desc <span style="color:#996600;">&quot;start up the sphinx daemon&quot;</span><br />
&nbsp; task <span style="color:#ff3333; font-weight:bold;">:start</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:environment</span> <span style="color:#9966CC; font-weight:bold;">do</span><br />
&nbsp; &nbsp; cmd = <span style="color:#006600; font-weight:bold;">%</span><span style="color:#006600; font-weight:bold;">&#40;</span> searchd <span style="color:#006600; font-weight:bold;">--</span>config <span style="color:#996600;">&quot;#{Rails.root}/config/sphinx.conf&quot;</span> <span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">system</span>! cmd<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
&nbsp; <br />
&nbsp; desc <span style="color:#996600;">&quot;stop the sphinx daemon&quot;</span><br />
&nbsp; task <span style="color:#ff3333; font-weight:bold;">:stop</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:environment</span> <span style="color:#9966CC; font-weight:bold;">do</span><br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">system</span>! <span style="color:#006600; font-weight:bold;">%</span><span style="color:#006600; font-weight:bold;">&#40;</span> searchd <span style="color:#006600; font-weight:bold;">--</span>config <span style="color:#996600;">&quot;#{Rails.root}/config/sphinx.conf&quot;</span> <span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
&nbsp; <br />
&nbsp; desc <span style="color:#996600;">&quot;run the sphinx indexer&quot;</span><br />
&nbsp; task <span style="color:#ff3333; font-weight:bold;">:index</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:environment</span> <span style="color:#9966CC; font-weight:bold;">do</span><br />
&nbsp; &nbsp; cmd = <span style="color:#006600; font-weight:bold;">%</span><span style="color:#006600; font-weight:bold;">&#40;</span> indexer <span style="color:#006600; font-weight:bold;">--</span>config <span style="color:#996600;">&quot;#{Rails.root}/config/sphinx.conf&quot;</span> <span style="color:#006600; font-weight:bold;">--</span>all <span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; cmd <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#996600;">' --rotate'</span> <span style="color:#9966CC; font-weight:bold;">if</span> ENV<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'rotate'</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&amp;&amp;</span> ENV<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'rotate'</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">downcase</span> == <span style="color:#996600;">'true'</span><br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">system</span>! cmd<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<span style="color:#9966CC; font-weight:bold;">end</span><br />
<br />
<span style="color:#008000; font-style:italic;"># a fail-fast, hopefully helpful version of system</span><br />
<span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#CC0066; font-weight:bold;">system</span>!<span style="color:#006600; font-weight:bold;">&#40;</span>cmd<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">unless</span> <span style="color:#CC0066; font-weight:bold;">system</span><span style="color:#006600; font-weight:bold;">&#40;</span>cmd<span style="color:#006600; font-weight:bold;">&#41;</span><br />
&nbsp; &nbsp; <span style="color:#CC0066; font-weight:bold;">raise</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;-</span>SYSTEM_CALL_FAILED<br />
The following command failed:<br />
&nbsp; <span style="color:#008000; font-style:italic;">#{cmd}</span><br />
SYSTEM_CALL_FAILED<br />
&nbsp; <span style="color:#9966CC; font-weight:bold;">end</span><br />
<span style="color:#9966CC; font-weight:bold;">end</span></div></div>
<p>So this should be pretty self-explanatory, especially if you&#8217;ve already used acts_as_sphinx with a standard ActiveRecord-backed app.  After you&#8217;ve gotten everything (sphinx and the <a href="http://github.com/dacort/mongosphinx">mongosphinx gem</a>, specifically) installed, you should be able to use these rake tasks to start and stop the searchd daemon, as well as run the indexer.</p>
<p>(update 1/22/10: Note that I&#8217;m suggesting you get dacort&#8217;s fork of mongosphinx.  He&#8217;s done a nice job of adding excerpting, pagination, and better compatibility with the latest mongomapper.  He also pulled in my fix that makes it play nice with ruby 1.9.)</p>
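<p>If you want to poke at the rotate handling from the sphinx:index task without actually running the indexer, the command assembly can be pulled into a plain method.  This is only a sketch: indexer_command and the config path passed to it are hypothetical, and the environment is passed in as a Hash so it can be tried outside of rake.</p>

```ruby
# Hypothetical helper mirroring how the sphinx:index task builds its command.
# env defaults to the real ENV, but any Hash-like object works for experiments.
def indexer_command(config_path, env = ENV)
  cmd = %(indexer --config "#{config_path}" --all)
  # Only ask for a rotated index when rotate=true was passed (any letter case).
  cmd += ' --rotate' if env['rotate'].to_s.downcase == 'true'
  cmd
end

puts indexer_command('config/sphinx.conf', {})
puts indexer_command('config/sphinx.conf', { 'rotate' => 'true' })
```

<p>The second call is the shape of command the model&#8217;s reindex method ends up triggering through rake sphinx:index rotate=true.</p>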
<p>One nicety here is the sphinx:genxml task.  Running this is helpful when you&#8217;re trying to get everything set up, to prove that you&#8217;ve done things right.  It should output a big XML file of all the documents it would index.  If it doesn&#8217;t, or you get something weird, then <a href="http://icanhascheezburger.files.wordpress.com/2007/04/wrong-mike.jpg">ur doin&#8217; it wrong</a>.</p>
<h2>config/sphinx.conf</h2>
<p>Finally, to help you get up and running, here&#8217;s my config/sphinx.conf file.  Pretty standard:</p>
<div class="codecolorer-container bash blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">searchd <span style="color: #7a0874; font-weight: bold;">&#123;</span><br />
&nbsp; listen = 127.0.0.1<br />
&nbsp; port = <span style="color: #000000;">9312</span><br />
<br />
&nbsp; log = .<span style="color: #000000; font-weight: bold;">/</span>sphinx<span style="color: #000000; font-weight: bold;">/</span>searchd.log<br />
&nbsp; query_log = .<span style="color: #000000; font-weight: bold;">/</span>sphinx<span style="color: #000000; font-weight: bold;">/</span>searchd.query.log<br />
&nbsp; pid_file = .<span style="color: #000000; font-weight: bold;">/</span>sphinx<span style="color: #000000; font-weight: bold;">/</span>searchd.pid<br />
<span style="color: #7a0874; font-weight: bold;">&#125;</span><br />
<br />
<span style="color: #7a0874; font-weight: bold;">source</span> mongo_project <span style="color: #7a0874; font-weight: bold;">&#123;</span><br />
&nbsp; <span style="color: #7a0874; font-weight: bold;">type</span> = xmlpipe2<br />
<br />
&nbsp; xmlpipe_command = .<span style="color: #000000; font-weight: bold;">/</span>script<span style="color: #000000; font-weight: bold;">/</span>runner <span style="color: #ff0000;">&quot;Document.xml_for_sphinx_pipe&quot;</span><br />
<span style="color: #7a0874; font-weight: bold;">&#125;</span><br />
<br />
index mongo_project <span style="color: #7a0874; font-weight: bold;">&#123;</span><br />
&nbsp; <span style="color: #7a0874; font-weight: bold;">source</span> = mongo_project<br />
<br />
&nbsp; charset_type = utf-<span style="color: #000000;">8</span><br />
&nbsp; path = .<span style="color: #000000; font-weight: bold;">/</span>sphinx<span style="color: #000000; font-weight: bold;">/</span>sphinx_index_main<br />
<span style="color: #7a0874; font-weight: bold;">&#125;</span></div></div>
<p>Again, all of this may fall down spectacularly once you get up to some serious data being pushed from the app into the indexer.  At that point, do something else, something more awesome.  But consider this a basic start on jamming data from your mongo collection(s) right up sphinx&#8217;s pipe.</p>
]]></content:encoded>
			<wfw:commentRss>http://mepatterson.net/2010/01/mongosphinx-with-mongodb-and-mongomapper/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

