bartuka http://bartuka.com Thoughts on computer programs Mon, 08 Jul 2019 09:07:31 -0300 clj-rss http://bartuka.com/posts-output/2019-07-08-memoization/ Smarter, not harder: Memoization<p>I'm reading Neal Ford's book <em>Functional Thinking</em>, and the whole idea of the functional paradigm is becoming clearer.</p><p>The whole idea of functional programming (FP) today is to leave <em>accidental complexities</em> up to the <em>language</em> or the <em>runtime</em> to solve for you. No one should have to manage memory as part of their daily work. This kind of problem is not related to your business at all! You are not being paid to solve it.</p><p>Luckily, most modern languages already handle this task without your direct intervention.</p><p>There are many situations where this kind of <em>nicety</em> shows up in the FP world. For example, if you express a computation with the <strong>map</strong> function in Clojure, you can often gain parallel execution by swapping it for <strong>pmap</strong>, with almost no other change to the code. If you use higher-order constructs, you enable the runtime to become smarter and even to rearrange execution in a more appropriate order.</p><p>This is not to say that you should not learn what is going on behind the scenes, but once you learn it you can use this knowledge in a very succinct way.</p><blockquote><p> Always learn one level of abstraction underneath you </p></blockquote><p>For now, let's explore the <em>memoization</em> mechanism for a second.</p><h2 id="what_is_memoization?">What is memoization?</h2><p>Imagine that you have a function that is very computationally intensive, meaning that most of your execution time is spent inside it. To make matters worse, the very same function is called several times inside your application. 
<strong>How can we speed this up?</strong></p><p>Let's consider the task of classifying numbers as <em>perfect</em>, <em>abundant</em> or <em>deficient</em>, according to the mathematical definition of perfect numbers: a number is perfect when the sum of its factors (including the number itself) equals twice the number, abundant when the sum is larger, and deficient when it is smaller.</p><p>Example <strong>Python</strong> code to classify numbers as <em>perfect</em>, <em>abundant</em> or <em>deficient</em>:</p><pre><code class="python">class Classifier:
    @staticmethod
    def sum_factors(number):
        # Sum of every factor of the number, including the number itself.
        return sum(Classifier.factors_of(number))

    @staticmethod
    def factors_of(number):
        # Brute force: test every candidate from 1 up to the number.
        filtered_list = filter(lambda x: (number % x == 0), range(1, number + 1))
        return list(filtered_list)

    @staticmethod
    def is_perfect(number):
        return Classifier.sum_factors(number) == 2 * number

    @staticmethod
    def is_abundant(number):
        return Classifier.sum_factors(number) > 2 * number

    @staticmethod
    def is_deficient(number):
        return Classifier.sum_factors(number) < 2 * number
</code></pre><p>Now, if you want to call these functions on the numbers <code>[6, 25, 15000, 56000, 110560]</code>:</p><pre><code class="python">for el in [6, 25, 15000, 56000, 110560]:
    print("The number {} is perfect? {}".format(el, Classifier.is_perfect(el)))
    print("The number {} is abundant? {}".format(el, Classifier.is_abundant(el)))
    print("The number {} is deficient? {}".format(el, Classifier.is_deficient(el)))
</code></pre><p>You will probably not like the result as the number you want to classify gets bigger and bigger: every call walks through all candidates from 1 up to the number, and each of the three predicates repeats that work.</p><p>The same functionality implemented in <strong>Clojure</strong>:</p><pre><code class="clojure">(defn- sum-of-factors [number]
  (->> (range 1 (+ number 1))
       (filter #(= 0 (rem number %)))
       (reduce +)))

(defn is-perfect [number]
  (= (sum-of-factors number) (* 2 number)))

(defn is-abundant [number]
  (> (sum-of-factors number) (* 2 number)))

(defn is-deficient [number]
  (< (sum-of-factors number) (* 2 number)))
</code></pre><p>And the same test calls:<pre><code class="clojure">(doseq [el [6, 25, 15000, 56000, 110560]]
  (println (str "The number " el " is perfect? " (is-perfect el)))
  (println (str "The number " el " is abundant? " (is-abundant el)))
  (println (str "The number " el " is deficient? " (is-deficient el))))
</code></pre></p><p>You will not like these results either. Since this is not a speed contest between languages, I will only show the time that the <em>Clojure</em> implementation took to perform these classifications.</p><p>Example of results, showing only the <em>is-perfect</em> test:<pre><code class="clojure">el - 6: "Elapsed time: 0.122637 msecs"
el - 25: "Elapsed time: 0.071203 msecs"
el - 15000: "Elapsed time: 12.583968 msecs"
el - 56000: "Elapsed time: 32.59634 msecs"
el - 110560: "Elapsed time: 47.281954 msecs"
</code></pre></p><p>In order to perform all the classifications, the code took <strong>170 msecs</strong>.</p><h2 id="implementing_memoization">Implementing memoization</h2><p>Ok, we understand the problem now. The whole idea of <strong>memoization</strong> is to <strong>cache</strong> results in order to gain speed when you need the same value again. This is very nice and simple: <strong>you trade memory space for speed</strong>. Instead of computing the whole thing again, you only perform a lookup in some data structure.</p><p>Here, functional programming concepts kick in again. In order to implement this sort of caching, your function or method needs to be <strong>pure</strong>; in other words, if you pass the same inputs you always get the same output!</p><p>Implementing <strong>caching</strong> in Python to simulate memoization:</p><pre><code class="python">class ClassifierCached:
    def __init__(self):
        # Maps a number to the already computed sum of its factors.
        self.sum_cache = {}

    def sum_factors(self, number):
        # Compute the sum only on the first request; later calls are lookups.
        if number not in self.sum_cache:
            self.sum_cache[number] = sum(self.factors_of(number))
        return self.sum_cache[number]

    def factors_of(self, number):
        filtered_list = filter(lambda x: (number % x == 0), range(1, number + 1))
        return list(filtered_list)

    def is_perfect(self, number):
        return self.sum_factors(number) == 2 * number

    def is_abundant(self, number):
        return self.sum_factors(number) > 2 * number

    def is_deficient(self, number):
        return self.sum_factors(number) < 2 * number
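</code></pre><p>As a quick check of the idea, here is a minimal, self-contained sketch that times two consecutive calls on the same number. It repeats a trimmed-down copy of the class so that it runs on its own; the timing code and the variable names <code>first</code> and <code>second</code> are assumptions added for illustration, and the measured times will vary from machine to machine.</p><pre><code class="python">import time


class ClassifierCached:
    """Trimmed-down copy of the cached classifier, repeated for illustration."""

    def __init__(self):
        self.sum_cache = {}

    def sum_factors(self, number):
        # Only the first request for a given number pays for the computation.
        if number not in self.sum_cache:
            self.sum_cache[number] = sum(
                x for x in range(1, number + 1) if number % x == 0
            )
        return self.sum_cache[number]

    def is_perfect(self, number):
        return self.sum_factors(number) == 2 * number


classifier = ClassifierCached()

start = time.perf_counter()
first = classifier.is_perfect(110560)   # computes the sum and stores it
middle = time.perf_counter()
second = classifier.is_perfect(110560)  # answered from the dictionary
end = time.perf_counter()

print("first call:  {:.6f} s".format(middle - start))
print("second call: {:.6f} s".format(end - middle))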
</code></pre><p>If you pay attention, you will see that I didn't have to do much to implement this simple caching mechanism. Still, look at what happened:</p><ul><li>I had to choose the data structure to store the data (a dictionary, <code>sum_cache</code>)</li><li>I had to manage the lookup into the data structure</li><li>I had to manage the population of the data structure</li><li>I had to change the <code>staticmethod</code> functions into instance methods, because now I have state to manage.</li></ul><p>Let's take a look at the <strong>Clojure</strong> version:</p><pre><code class="clojure">(defn- sum-of-factors-non-cached [number]
  (->> (range 1 (+ number 1))
       (filter #(= 0 (rem number %)))
       (reduce +)))

(def sum-of-factors
  (memoize sum-of-factors-non-cached))
</code></pre><p>I just renamed the previously implemented function to <code>sum-of-factors-non-cached</code> and created a new one using the function <code>memoize</code>. That's it.</p><p>Now, the whole run costs <strong>80 msecs</strong>. The more expensive your function becomes, the clearer the advantages of <em>memoization</em> become.</p><p>What's the beauty of the Clojure version? I delegate to the <em>language</em> all the choices that I had to make in the Python version. The runtime is taking care of all the details for me.</p><p>Let's keep focused on the business problems we want to solve!!</p>Mon, 08 Jul 2019 00:00:00 -0300
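<p>For readers who think in Python, the behaviour of a <code>memoize</code>-style function can be sketched as a small higher-order function. This is only an illustration under stated assumptions, not how Clojure implements it: the Python names <code>memoize</code>, <code>sum_of_factors_non_cached</code> and <code>calls</code> are hypothetical, and the sketch only handles hashable positional arguments.</p><pre><code class="python">import functools


def memoize(func):
    """Return a wrapper around func that caches results per argument tuple."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Run the wrapped function only for arguments not seen before.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []  # records every real computation, for illustration only


def sum_of_factors_non_cached(number):
    calls.append(number)
    return sum(x for x in range(1, number + 1) if number % x == 0)


# Same move as in the Clojure version: wrap the pure function once.
sum_of_factors = memoize(sum_of_factors_non_cached)


def is_perfect(number):
    return sum_of_factors(number) == 2 * number


def is_abundant(number):
    return sum_of_factors(number) > 2 * number


# Two predicates on the same number trigger a single computation.
print(is_perfect(6), is_abundant(6), len(calls))  # prints: True False 1
</code></pre><p>The wrapper is only safe because the wrapped function is pure, which is the same requirement discussed earlier. Python's standard library also ships a ready-made decorator in this spirit, <code>functools.lru_cache</code>.</p>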