Real World Haskell - Chapter 4. Functional Programming

技术2026-05-09 25

Chapter 4. Functional Programming

A Simple Command-Line Framework

Haskell 中所有函数都仅接受一个参数。如果一个函数需要多个参数，则每给它一个参数，它就返回一个partial 函数。

组合使用库函数，如map，take，filter 来代替尾部递归和匿名函数，可以使得代码更可读，加快编码速度，bug 更少。

能使用库函数组合（用“.”操作符），就不要使用fold。用fold 代替hand-rolled 尾递归循环。

好的函数名就是一个tiny 文档。

使用ghc 在命令行中编译.hs 源文件

-- in.txt

hello,world!

-- InteractWith.hs

module Main where

import System.Environment (getArgs)

interactWith function inputFile outputFile = do

input <- readFile inputFile

writeFile outputFile (function input)

main = mainWith myFunction

where mainWith function = do

args <- getArgs

case args of

[input,output] -> interactWith function input output

_ -> putStrLn "error: exactly two arguments needed"

-- replace "id" with the name of our function below

myFunction = id

这是一个完整的文件处理程序，

在命令行中运行

ghc --make InteractWith

InteractWith in.txt out.txt

运行程序后生成out.txt，内容和in.txt 相同。

程序中出现了一些新的符号，“do” 关键字引入一个“action 块”，action 块会对现实世界造成某些影响，如读或写一个文件。

“<-”在do 里面是赋值运算符。

Warming Up: Portably Splitting Lines of Text

Haskell 内置的lines 函数用来分割字串。

:type lines

lines :: String -> [String]

lines "line 1/nline 2" -- > ["line 1","line 2"]

lines "foo/n/nbar/n" -- >["foo","","bar"]

lines "a/r/nb" -- >["a/r","b"]

Windows 中，当以文本模式读一个文件，I/O 库会把"/r/n" 转成"/n"；如果写以一个文件行为正好相反。Unix 系统文本模式不作任何转换。

实现可移值的行分割函数

splitLines :: String -> [String]

splitLines [] = []

splitLines cs =

let (pre, suf) = break isLineTerminator cs

in pre : case suf of

('/r':'/n':rest) -> splitLines rest

('/r':rest) -> splitLines rest

('/n':rest) -> splitLines rest

_ → []

isLineTerminator c = c == '/r' || c == '/n'

splitLines "foo" -- >["foo"]

break isLineTerminator "foo" -- >("foo","")

splitLines "foo/r/nbar" -- >["foo","bar"]

break isLineTerminator "foo/r/nbar" -- > ("foo","/r/nbar")

"foo" : ["bar"] -- > ["foo","bar"]

标准函数break 用来将list 分成两部分，接受一个返回bool 值的函数作为第一个参数，一旦函数返回true 就在那一点将list 一分为二。

break 函数返回一个pair

遇到奇数就将list 一分为二

break odd [2,4,5,6,8] -- >([2,4],[5,6,8])

遇到大写字母就将list 一分为二

:module +Data.Char

break isUpper "isUpper" -- >("is","Upper")

A Line-Ending Conversion Program

文件转换 - Unix 文本转Windows 文本

-- FixLines.hs

-- 编译运行 FixLines.hs 的方法

-- ghc --make FixLines

-- FixLines gpl-3.0.txt fixed-gpl-3.0.txt

-- gpl-3.0.txt

-- this file created by Unix system

-- http://www.gnu.org/licenses/gpl-3.0.txt

module Main where

import System.Environment (getArgs)

-- 文件处理函数

interactWith function inputFile outputFile = do

input <- readFile inputFile

writeFile outputFile (function input)

-- 可移值的行分割函数

splitLines :: String -> [String]

splitLines [] = []

splitLines cs =

let (pre, suf) = break isLineTerminator cs

in pre : case suf of

('/r':'/n':rest) -> splitLines rest

('/r':rest) -> splitLines rest

('/n':rest) -> splitLines rest

_ -> []

isLineTerminator c = c == '/r' || c == '/n'

fixLines :: String -> String

fixLines input = unlines (splitLines input)

main = mainWith myFunction

where mainWith function = do

args <- getArgs

case args of

[input,output] -> interactWith function input output

_ -> putStrLn "error: exactly two arguments needed"

-- replace "id" with the name of our function below

myFunction = fixLines

Infix Functions

中缀有助于提高代码可读性

中缀语法糖“``”

a `plus` b = a + b

data a `Pair` b = a `Pair` b

deriving (Show)

前缀和中缀构造都可以

True `Pair` "something" -- >True `Pair` "something"

Pair True "something" -- >True `Pair` "something"

前缀和中缀调用都可以

1 `plus` 2 -- >3

plus 1 2 -- >3

测试一个值是否是列表的一个元素

elem 'a' "camogie" -- >True

3 `elem` [1,2,4,8] -- >False

测试一个字串是否是另一个字串的前缀

"foo" `isPrefixOf` "foobar" -- >True

使用isPrefixOf 函数前用import 命令导入Data.List 模块

测试一个字串是否是另一个字串的中缀

"needle" `isInfixOf` "haystack full of needle thingies" -- >True

测试一个字串是否是另一个字串的后缀

"end" `isSuffixOf` "the end" -- >True

Working with Lists

列表作为函数编程的最佳搭档理应享受特殊待遇。标准库为处理list 定义了一打函数，其中一些是实际编程中不可或缺的。

以下将会介绍数量众多的list 函数。为什么要一次呈现如些之多的函数？因为它们易于学习，并且无处不在。如果我们不熟悉它们就会把时间浪费在编写标准库已经提供的函数上。

Data.List 是标准list 函数真实逻辑义意上的home，Prelude 仅仅导出了Data.List 的一个小子集，而且其中一些很有用的函数并不包括在内。

ghci 中加载Data.List 模块

:module +Data.List

Basic List Manipulation

head 和last，tail 和init 是两对效果相反的函数

length 函数取列表元素个数

length [] -- >0

length "strings are lists, too" -- >22

null 函数检查一个列表是否为空

null [] -- >True

null "plugh" -- >False

head 函数取列表的第一个元素

head [1,2,3] -- >1

last 函数取列表的最后一个元素

last "bar" -- >'r'

tail 函数取除掉head 后剩余的元素列表

tail "foo" -- >"oo"

init 函数取除掉last 后剩余的元素列表

init "bar" -- >"ba"

(注意，上面这些函数遇到空列表通常会出错，要小心处理。)

Safely and Sanely Working with Crashy Functions

使用lenght 函数来判断列表是否为空不是好办法

myDumbExample xs = if length xs > 0

then head xs

else 'Z'

list 不存储自身的长度信息，所以要获取列表的长度唯一方法是遍历整个列表。因此当我们想知道一个列表是否为空时使用lenght 函数进行判断不是好办法。lenght 遇到无限列表时就出现死循环了。

用null 函数来判断一个列表是否为空是较好的选择，此函数会在常量时间返回。

应该使用null 函数来判断一个列表是否为空

mySmartExample xs = if not (null xs)

then head xs

else 'Z'

-- 亮点

myOtherExample (x:_) = x

myOtherExample [] = 'Z'

Partial and Total Functions

形如head [] 这种“[]”输入合法却会引发错误的函数就称为“partial functions”，是不安全的函数。

Prelude 定义了一些不安全的“partial functions”，例如head 函数。

More Simple List Manipulations

“++” 运算符连接两个列表

"foo" ++ "bar" -- >"foobar"

[True] ++ [] -- >[True]

concat 函数将两个列表的列表合并成单个列表

concat [[1,2,3], [4,5,6]] -- >[1,2,3,4,5,6]

concat [[[1,2],[3]], [[4],[5],[6]]] -- >[[1,2],[3],[4],[5],[6]]

concat (concat [[[1,2],[3]], [[4],[5],[6]]]) -- >[1,2,3,4,5,6]

调用一次concat 就拆掉一层括号

reverse 函数将列表逆序

reverse "foo" -- >"oof"

and, or 函数对一list 的bool 值进行逻辑测试

and [] -- >True

or [] -- >False

and [True,False,True] -- >False

or [False,False,False,True,False] -- >True

all, any 函数是and, or 的非常有用的弟妹

用all 函数判断list 中元素是否全为奇数

all odd [] -- >True

all odd [1,3,5] -- >True

用any 函数判断list 中是否存在偶数

any even [] -- >False

any even [3,1,4,1,5,9,2,6,5] -- >True

空列表[] 中，全部是偶数(all)，并且不存在偶数(any)

Working with Sublists

take 函数取前n 个元素的列表

take 2 [1] -- >[1]

drop 函数取除掉前n 个元素后的列表

drop 3 "xyzzy" -- >"zy"

splitAt 函数在指定位置分割列表

splitAt 3 "foobar" -- >("foo","bar")

splitAt 组合了take 和drop，先take 得到前面的部分，再drop 得到后面的部分，然后组合成一个pair 返回。

takeWhile 接受一个布尔函数和一个列表，返回那些使得布尔函数为真的元素列表

takeWhile odd [1,3,5,6,8,9,11] -- >[1,3,5]

注意，一旦布尔函数返回false 就不往下取了

dropWhile 接受一个布尔函数和一个列表，返回除去那些使得布尔函数为真的元素列表

dropWhile even [2,4,6,7,9,10,12] -- >[7,9,10,12]

span 函数在第一次使得布尔函数为假的位置分割列表

span even [2,4,6,7,9,10,12] -- >([2,4,6],[7,9,10,12])

span 组合了takeWhile 和dropWhile，先takeWhile 得到前面的部分，再dropWhile 得到后面的部分，然后组合成一个pair 返回。

break 函数在第一次使得布尔函数为真的位置分割列表

break even [1,3,5,6,8,9,10] -- >([1,3,5],[6,8,9,10])

break 组合了takeWhile 和dropWhile，先takeWhile 得到前面的部分，再dropWhile 得到后面的部分，然后组合成一个pair 返回。

Searching Lists

elem 函数判断列表中是否存在某个元素

2 `elem` [5,3,2,1,1] -- >True

notElem 函数判断列表中是否不存在某个元素

2 `notElem` [5,3,2,1,1] -- >False

filter 函数返回所有使得布尔函数为真的元素列表

filter odd [2,4,1,3,6,8,5,7] -- >[1,3,5,7]

isPrefixOf, isInfixOf 和isSuffixOf 函数分别用来判断一个列表是否是另一个列表的前缀中缀或后缀

"foo" `isPrefixOf` "foobar" -- >True

[2,6] `isInfixOf` [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9] -- >True

".c" `isSuffixOf` "crashme.c" -- >True -- 亮点

Working with Several Lists at Once

zip 函数将两个列表的元素一一对应并配成pair 的列表其长度与最短的列表一至

zip [12,72,93] "zippity" -- >[(12,'z'),(72,'i'),(93,'p')]

zipWith 函数接受一个操作函数并接受两个列表作为输入，其生成的列表长度与最短的列表一至

zipWith (+) [1,2,3] [4,5,6] -- >[5,7,9]

zip3,zipWith3 到zip7,zipWith7 函数是上面函数的参数加长版，能将更多的列表zip 起来

zip3 [12,72,93] "zippity" [1, 2, 3] -- >[(12,'z',1),(72,'i',2),(93,'p',3)]

Special String-Handling Functions

lines 函数将含有换行符的字串处理成列表

lines "foo/nbar" -- >["foo","bar"]

unlines 函数将字串列表处理成含有换行符的列表

unlines ["foo", "bar"] -- >"foo/nbarλn"

words 函数提取字串的所有单词并生成列表

words "the /r quick /t brown/n/n/nfox" -- >["the","quick","brown","fox"]

unwords 函数将单词列表连成字串

unwords ["jumps", "over", "the", "lazy", "dog"] -- >"jumps over the lazy dog"

How to Think About Loops

遍历处理List 的关键是拆分(用模式匹配)，连接，终止。

和传统语言不同，Haskell 既没有for 循环也没有while 循环。

用什么来替代循环呢？这个问题有许多可能的答案。

Haskell 中用尾部递当来代替循环。

在命令式语言中，循环的执行花费常量空间开销。Haskell 的尾部递归中每次调用自已都要分配一些空间，让函数知道应该在哪里返回。所以要花费线性空间开销。但是，函数语言会检测尾部递归，并应用一种称为“尾部调用优化”技术使得尾部递归变成常量空间开销

Explicit Recursion

C 语言字符串转数字

#include "stdio.h"

int as_int(char *str)

{

int acc; /* accumulate the partial result */

for (acc = 0; isdigit(*str); str++) {

acc = acc * 10 + (*str - '0');

}

return acc;

}

int main()

{

char *strdigit = "678413";

int digit;

digit = as_int(strdigit);

printf("%d", digit);

}

如何用Haskell 直观的实现字串转整数呢？

数字字串转整数

import Data.Char (digitToInt) -- we'll need ord shortly

asInt :: String -> Int

loop :: Int -> String -> Int

asInt xs = loop 0 xs

loop acc [] = acc

loop acc (x:xs) = let acc' = acc * 10 + digitToInt x

in loop acc' xs

asInt "33" -- >33

asInt "" -- >0

函数签名可以不写，但是它有助于提醒我们正在做的事情

acc 变量用于“accumulate the partial result”，不断的用列表值构器“:” 进行模式匹配将字串列表拆成两部分来控制循环。

acc' 在变量名后加“'” 表示与原变量差别不大的变量（“'” 发音为“prime”）。

字串String 就是字符列表[Char]，可以这样考虑列表结构：它要不是空列表[] ，要不就是单个元素后接剩余的列表。

修复上面这段代码缺陷留作97 页的练习。

因为loop 函数最后做的一件事情是简单的调用自已，所以是一个尾部递归函数。

对列表分为空和非空两种种况进行处理这是一种称为结构递归(structural recursion)的方法。

structural recursion 作为一种有用的技术不限于只用在list 上次，也可以用于其它代数数据类型上。

Transforming Every Piece of Input

计算列表的乘方

square :: [Double] -> [Double]

square (x:xs) = x*x : square xs

square [] = []

square [2,3,4] -- >[4.0,9.0,16.0]

小写字串转大写字串

import Data.Char (toUpper)

upperCase :: String -> String

upperCase (x:xs) = toUpper x : upperCase xs

upperCase [] = []

upperCase "hello,world!" -- >"HELLO,WORLD!"

Mapping over a List

map 函数接受一个处理函数，并用它处理列表中的每一个元素

square2 xs = map squareOne xs

where squareOne x = x * x

upperCase2 xs = map toUpper xs

map negate [1,2,3] -- >[-1,-2,-3]

因为map 以另一个函数作为参数，所以我们将其称为“higher-order”函数

map 函数的实现

myMap :: (a -> b) -> [a] -> [b]

myMap f (x:xs) = f x : myMap f xs

myMap _ _ = []

myMap toUpper "hello,world!" -- >"HELLO,WORLD!"

Selecting Pieces of Input

过滤掉列表中不是奇数的元素

oddList :: [Int] -> [Int]

oddList (x:xs) | odd x = x : oddList xs

| otherwise = oddList xs

oddList _ = []

oddList [1,1,2,3,5,8,13,21,34] -- >[1,1,3,5,13,21]

其中“|” 是“guard expresses”

用filter 函数过滤掉列表中不是奇数的元素

filter odd [1,1,2,3,5,8,13,21,34] -- >[1,1,3,5,13,21]

Computing One Answer over a Collection

Data.List 模块定义了一个名为foldl' 的函数，与foldl 不同的是它不会创建thunks。

由于foldl 的thunking 行为，在实际编程中应避免使用这个函数，而应该用foldl' 替代。让Foldl 仅用于测试。

用foldl 代劳尾部递归

Haskell 中fold 函数极为常见，而且拥有规则，可预测的行为。

相比使用显示递归，使用fold 函数可以让代码更容易理解。

Foldr 是list编程工具箱的重要成员。

Foldr 可以增量式的消费和生产一个list ，这对编写lazy data-processing 非常有用。

对foldl我们只需要简单的考虑两件事：第一，accumulator 的初始值是什么。第二，如何更新accumulator(“+” 函数)。这样我们的代码更短了，也易于理解。

另一种考虑foldr 的方法：第一，它的前两个参数是“对列表的每一个head/tail 做什么？”。第二，“要把列表的end 替换成什么？”

列表求和（后面有改进版）

mySum xs = helper 0 xs

where helper acc (x:xs) = helper (acc + x) xs

helper acc _ = acc

helper 是尾部递归函数，因为它做的最后一做事情就是调用自已

Adler-32 checksum 算法（后面有改进版）

--import Data.Char (ord)

--import Data.Bits (shiftL, (.&.), (.|.))

base = 65521

adler32_try2 xs = helper (1,0) xs

where helper (a,b) (x:xs) =

let a' = (a + (ord x .&. 0xff)) `mod` base

b' = (a' + b) `mod` base

in helper (a',b') xs;

helper (a,b) _ = (b `shiftL` 16) .|. a

shiftL 是逻辑左移函数，“.&.” 是按位与运算符，“.|.” 是按位或运算符。

The Left Fold

foldl 函数的实现

myfoldl :: (a -> b -> a) -> a -> [b] -> a

myfoldl step zero (x:xs) = myfoldl step (step zero x) xs

myfoldl _ zero [] = zero

foldl (+) 0 (1:2:3:[])

== foldl (+) (0 + 1) (2:3:[])

== foldl (+) ((0 + 1) + 2) (3:[])

== foldl (+) (((0 + 1) + 2) + 3) []

== (((0 + 1) + 2) + 3)

列表求和（后面有改进版）

foldlSum xs = foldl step 0 xs

where step acc x = acc + x

列表求和（后面有改进版）

niceSum :: [Integer] -> Integer

niceSum xs = foldl (+) 0 xs

这里不再使用尾部递归，因为foldl 代劳了。对foldl我们只需要简单的考虑两件事：第一，accumulator 的初始值是什么。第二，如何更新accumulator(“+” 函数)。这样我们的代码更短了，也易于理解。

另一种考虑foldr 的方法：第一，它的前两个参数是“对列表的每一个head/tail 做什么？”。第二，“要把列表的end 替换成什么？”

Adler-32 checksum 算法（后面有改进版）

adler32_foldl xs = let (a, b) = foldl step (1, 0) xs

in (b `shiftL` 16) .|. a

where step (a, b) x = let a' = a + (ord x .&. 0xff)

in (a' `mod` base, (a' + b) `mod` base)

这里accumulator 是一个pair，所以foldl 也返回pair

Why Use Folds, Maps, and Filters?

前面的例子使用fold 函数没让代码变短多少，为什么要用fold？因为在Haskell 中fold 函数极为常见，而且拥有规则，可预测的行为。

相比使用显示递归，使用fold 函数可以让代码更容易理解。

Folding from the Right

foldr 的实现

myfoldr :: (a → b → b) → b → [a] → b

myfoldr step zero (x:xs) = step x (myfoldr step zero xs)

myfoldr _ zero [] = zero

foldr (+) 0 (1:2:3:[])

== 1 + foldr (+) 0 (2:3:[])

== 1 + (2 + foldr (+) 0 (3:[])

== 1 + (2 + (3 + foldr (+) 0 [])) ????

== 1 + (2 + (3 + 0))

看起来似乎foldr 比foldl 的用处小，但如果用显示递归实现filter 看起来像这样:

filter 实现（后面有改进版）

myfilter :: (a → Bool) → [a] → [a]

myfilter p [] = []

myfilter p (x:xs)

| p x = x : myfilter p xs

| otherwise = myfilter p xs

filter 实现

myFilter p xs = foldr step [] xs

where step x ys | p x = x : ys

| otherwise = ys

myFilter odd [1,2,3,4] -- >[1,3]

step 1 (foldr step [] 2:3:4:[])

step 1 (step 2 (foldr step [] 3:4:[]))

step 1 (step 2 (step 3 (foldr step [] 4:[])))

step 1 (step 2 (step 3 (step 4 (foldr step [] [])))) ????

这种定义初看起来让人头痛。

第一，accumulator 的初始值是[]。

第二，由step 函数来更新accumulator，如果一个元素让布尔函数为真就连入列表

map 的实现

myMap' :: (a → b) → [a] → [b]

myMap' f xs = foldr step [] xs

where step x ys = f x : ys

实际上foldl 可以用foldr 实现

myFoldl :: (a → b → a) → a → [b] → a

myFoldl f z xs = foldr step id xs z

where step x g a = g (f a x)

（这样看起来foldr 比foldl 更有用？）

复制一个列表

identity :: [a] -> [a]

identity xs = foldr (:) [] xs

identity [1,2,3] -- >[1,2,3]

1:(2:(3:[]))????

identity 返回列表的一个拷贝

另一种考虑foldr 的方法：第一，它的前两个参数是“对列表的每一个head/tail 做什么？”。第二，“要把列表的end 替换成什么？”

如果foldr 将列表的end 替换成其它的值，这提示我们对于Haskell 的“++” 运算符的另一种看法。

连接两个列表就是把前一个的end 替换成另一个

append :: [a] -> [a] -> [a]

append xs ys = foldr (:) ys xs

append [1,3,5] [2,4,6] -- >[1,3,5,2,4,6]

Foldr 是list编程工具箱的重要成员。

Left Folds, Laziness, and Space Leaks

习惯上将foldl 作为测试用，而且我们在实践中从不使用foldl。在需要结果前，foldl 会存储一个thunk，而thunk 比一个单值的开销要大。

ghc 为thunk 准备的stack 大小有一个最大值限制，超过这个大小就会引发错误。要感谢这种限制使得我们可以尝试非常大的thunk 而不用但心消耗掉所有内存。

foldl (+) 0 [1..1000] -- >500500

这里foldl 将创建一个thunk，这个thunk 含有1000 个整数，并且应用了999 次“+” 函数。为了表示一个单值花费了太多内存和effort。

foldl (+) 0 [1..100000] -- >Exception: stack overflow（书上是这样说，实际上好像是没反应）

在小的expressions 中，foldl 工作正常，但速度慢。我们这种把不可见的thunking 称为“space leak”，因为我们的代码操作正常，但是内存使用不在正常范围。

使用foldl 带来的“space leak” 是Haskell 新手的一个典型路障。幸运的是，这很容易避免。

Data.List 模块定义了一个名为foldl' 的函数，与foldl 不同的是它不会创建thunks。

由于foldl 的thunking 行为，在实际编程中应避免使用这个函数，而应该用foldl' 替代。