<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Column stores and row stores: should you care?</title>
	<atom:link href="http://lemire.me/blog/archives/2009/07/03/column-stores-and-row-stores-should-you-care/feed/" rel="self" type="application/rss+xml" />
	<link>http://lemire.me/blog/archives/2009/07/03/column-stores-and-row-stores-should-you-care/</link>
	<description>Computer Scientist and Open Scholar: Databases, Information Retrieval, Business Intelligence.</description>
	<lastBuildDate>Thu, 09 Feb 2012 11:13:29 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/07/03/column-stores-and-row-stores-should-you-care/comment-page-1/#comment-51204</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 03 Jul 2009 23:05:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2044#comment-51204</guid>
		<description>@Parand

Yes, but &quot;most DBs are IO bound&quot; is not the entire explanation. Here are two finer points:

A) It is not all that true. 

On this blog (search for it), I have run experiments showing that parsing CSV files was easily CPU bound. Of course, you have to define properly what &quot;parsing&quot; means... Here I mean finding the strings, then copying them into some data structure.

That is why databases use relatively &quot;cheap&quot; compression techniques. Going out of your way to squeeze the data down might be counterproductive. Compression is not everything.

Thankfully, column-oriented designs allow &quot;cheap&quot; compression techniques to work well. Basically, you sort the data (a relatively cheap operation) and then you apply run-length encoding.

B) Compression is not only about reducing IO costs.

As an example, is it faster to compute the sum of:

111122222
or
4x1, 5x2
?

Clearly, it is faster to compute the sum of the &quot;compressed&quot; array. So compression can also save CPU cycles when *you are operating directly over the compressed data stream*. Whenever you need to load the data in RAM, then uncompress it, and then work over the uncompressed data, you have to worry that you will overload your memory bandwidth.</description>
		<content:encoded><![CDATA[<p>@Parand</p>
<p>Yes, but &#8220;most DBs are IO bound&#8221; is not the entire explanation. Here are two finer points:</p>
<p>A) It is not all that true. </p>
<p>On this blog (search for it), I have run experiments showing that parsing CSV files was easily CPU bound. Of course, you have to define properly what &#8220;parsing&#8221; means&#8230; Here I mean finding the strings, then copying them into some data structure.</p>
<p>That is why databases use relatively &#8220;cheap&#8221; compression techniques. Going out of your way to squeeze the data down might be counterproductive. Compression is not everything.</p>
<p>Thankfully, column-oriented designs allow &#8220;cheap&#8221; compression techniques to work well. Basically, you sort the data (a relatively cheap operation) and then you apply run-length encoding.</p>
<p>B) Compression is not only about reducing IO costs.</p>
<p>As an example, is it faster to compute the sum of:</p>
<p>111122222<br />
or<br />
4&#215;1, 5&#215;2<br />
?</p>
<p>Clearly, it is faster to compute the sum of the &#8220;compressed&#8221; array. So compression can also save CPU cycles when *you are operating directly over the compressed data stream*. Whenever you need to load the data in RAM, then uncompress it, and then work over the uncompressed data, you have to worry that you will overload your memory bandwidth.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://lemire.me/blog/archives/2009/07/03/column-stores-and-row-stores-should-you-care/comment-page-1/#comment-51203</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Fri, 03 Jul 2009 22:36:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2044#comment-51203</guid>
		<description>Indeed, compressibility seemed to be a major advantage they observed in using a column store vs. Hadoop. It wasn&#039;t at all clear to me why that was / should be the case.</description>
		<content:encoded><![CDATA[<p>Indeed, compressibility seemed to be a major advantage they observed in using a column store vs. Hadoop. It wasn&#8217;t at all clear to me why that was / should be the case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Parand</title>
		<link>http://lemire.me/blog/archives/2009/07/03/column-stores-and-row-stores-should-you-care/comment-page-1/#comment-51202</link>
		<dc:creator>Parand</dc:creator>
		<pubDate>Fri, 03 Jul 2009 22:24:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=2044#comment-51202</guid>
		<description>One claim that caught my attention was compressibility in column-oriented databases: column stores tend to compress very well, significantly increasing effective IO bandwidth (x bytes read from disk translate to &gt;&gt; x bytes of actual data). Since most DBs are IO bound, this turns out to provide a big real-world performance advantage. What do you think?</description>
		<content:encoded><![CDATA[<p>One claim that caught my attention was compressibility in column-oriented databases: column stores tend to compress very well, significantly increasing effective IO bandwidth (x bytes read from disk translate to &gt;&gt; x bytes of actual data). Since most DBs are IO bound, this turns out to provide a big real-world performance advantage. What do you think?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

