10

Haskell Repa库使用心得

   Posted by: tangboyun   in Haskell

Repa库使用心得

1 Repa简介

Repa库是Haskell面向数值计算而精心设计的高性能数组库,其目的主要是为了便于多核并发,特别适合满矩阵(稠密矩阵)的算法设计。目前来看,该库为原生的Haskell代码写成,是矩阵计算类库中最值得推荐的。

2 ArrayVectorRepa的异同

对于数值计算而言,默认的Haskell设计理念(Purely functional,Lazy Evaluation),在大部分场合下,是个很糟糕的选择。
高性能的数值算法,大都需要这么几个特性: Unboxed数组,也就是内含原始数据(而非Reference)的连续内存区域。由于Unboxed数据无法像boxed对象一样便于GC进行垃圾回收,所以在Haskell中Unboxed容器内的元素一定是strict的(容器本身可以是个thunk)。

Haskell中的数值计算类相关类库,常见的有,Prelude自带的Data.ArrayHaskell platform中的Data.Vector模组。而Repa库的作者,事实上便是Data.Vector的作者。

ArrayVector模组之间的区别在于,Array的索引类支持多维索引,所以Array可以直接用于高维矩阵,不过Array库的缺点也很突出,也就是接口事实上并不完善,比之List类差了不少。Vector模组是面向高性能的一维向量运算,优点是接口相当完善,注释的也很棒,每个函数都给出了运行时间的上界。可惜的是并不支持矩阵运算。Repa则是对Vector的一个相当棒的补充。Repa自身直接支持并发,并且有良好的运行时支持。

3 ProjectEluer Problem 23的Repa高性能实现

以下代码是解ProjectEluer 的第23题,题目为:

A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.

A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.

As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.

Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.

 1:  {-
 2:     Copyright (C) 2011 Boyun Tang
 3:     Send email to: tangboyun@hotmail.com
 4:  
 5:     This program is free software: you can redistribute it and/or modify
 6:     it under the terms of the GNU General Public License as published by
 7:     the Free Software Foundation, either version 3 of the License, or
 8:     (at your option) any later version.
 9:  
10:     This program is distributed in the hope that it will be useful,
11:     but WITHOUT ANY WARRANTY; without even the implied warranty of
12:     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
13:     GNU General Public License for more details.
14:  
15:     You should have received a copy of the GNU General Public License
16:     along with this program.  If not, see <http://www.gnu.org/licenses/>.
17:  -}
18:  
19:  {-# LANGUAGE BangPatterns #-}
20:  module Main where
21:  
22:  import Prelude hiding (sum)
23:  import qualified Prelude as P
24:  import Data.Array.Repa
25:  import qualified Data.Vector.Unboxed as UV
26:  
27:  
28:  main = putStrLn $ show $ prob23
29:  
30:  decom x = let ls = [1..x `div` 2]
31:            in filter (\y -> (x `rem` y)
32:                             == 0 ) ls
33:  
34:  isAbNum x = P.sum (decom x) > x
35:  
36:  prob23 = let !aArray = toVector $ 
37:                         fromFunction 
38:                         (Z :. (28123::Int)) (\(_ :. s) -> if isAbNum (s+1)
39:                                                           then s + 1
40:                                                           else 0 )             
41:               abArray = UV.filter (/= 0)  aArray         
42:               xx = UV.map (/= 0) aArray
43:               result' = fromFunction 
44:                         (Z :. (28123::Int)) 
45:                         (\(_:. s) -> 
46:                           let v = s + 1
47:                               vA = (UV.takeWhile (< v) abArray)
48:                           in if UV.any (\y -> UV.unsafeIndex xx (y-1)) 
49:                                   $ UV.map (\x -> v - x) vA
50:                              then 0
51:                              else v
52:                         )
53:           in toScalar $ sum result' 
54:  

编译上述代码:

ghc -O2  -fllvm test.hs -threaded -rtsopts -fforce-recomp

在笔记本(Core2 T5600)上的运行结果:

tangboyun@tangboyun-MXC062:/tmp$ time ./test +RTS -N4 -RTS
4179871

real    0m3.310s
user    0m5.840s
sys     0m0.040s
tangboyun@tangboyun-MXC062:/tmp$ time ./test
4179871

real    0m5.413s
user    0m5.390s
sys     0m0.020s

4线程大约提升了1.6倍速度,还算不错。
值得注意的是:在刚撰写完该代码时,没并有添加BangPatterns(第36行的!aArray),结果出现了如下警告,代码无法并发:

Data.Array.Repa: Performing nested parallel computation sequentially.
 You've probably called the 'force' function while another instance was
 already running. This can happen if the second version was suspended due
 to lazy evaluation. Use 'deepSeqArray' to ensure that each array is fully
 evaluated before you 'force' the next one.

无论如何采用’force’或者’deepSeqArray’去force eval aArray或者result’都无法解决。fromFuntion产生的是一个delayed array,而事实上,在Repa库中,toVector和sum都会在运算之间,运行deepSeqArray将参数force eval,关键在于,代码中的abArray和xx并没有被eval到WHNF, 将aArray添加BangPatterns即表明需要先将aArray eval到WHNF后,问题解决。

4 总结

Haskell是门玄妙的语言,相对于ML和OCaml而言,要写出一个高性能的Haskell实现,更需要对Haskell语意和运行时行为的透彻了解。新手写的Haskell程序,很可能还跑不过相应的Python实现,可只短短修改几个词,就可以达到甚至超过C的速度。 类似例子,还真不少见。

Date: August 10, 2011

Author: Boyun Tang

Org version 7.7 with Emacs version 23

Validate XHTML 1.0

 

© 2011, tangboyun. 版权所有.

Share

Tags:

This entry was posted on 星期三, 八月 10th, 2011 at 上午 11:24 and is filed under Haskell. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a reply

Name (*)
Mail (will not be published) (*)
URI
Comment

本WordPress博客由爱写字提供技术支持