A distributed filesystem project on Unix for mass storage: mail, search, network disk, and similar services
: Google is currently the most influential Web search engine. Using more than ten thousand cheap PCs, it has built a giant Linux cluster that is high-performance, huge in storage capacity, stable, and practical. http://bbs.chinaunix.net/forum/viewtopic.php?t=390949&show_type=old The implementation of its distributed filesystem, achieving a highly available, high-performance cluster at low cost, is a successful model of parallel-machine design and development, and its strict pursuit of price/performance is worth learning from. Please join us in this work :)

From: Eric Anderson
To: FreeBSD Clustering List
Subject: FreeBSD Clustering wishlist - Was: Introduction & RE: Clustering with Freebsd
Date: Wed, 11 May 2005 22:45:55 -0500 (Thursday, 11:45 CST)
Mailer: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050504

Ok - I'm changing the subject here in an attempt to gather information. Here's my wishlist:

FreeBSD have a native clustered filesystem. This is different than shared media (we can already do that over fibre channel, ggated, soon iSCSI and AOE). This would allow multiple servers to access the same data read/write - highly important for load-balancing applications like web servers, mail servers, and NFS servers.

Online growable filesystem. I know I can growfs a filesystem now, but doing it online while the data is in use is *insanely* useful. ReiserFS and Polyserve's FS (a clustered filesystem, not open-source) do this well.

FreeBSD's UFS2 made to do journaling. There's already someone working on this.

I believe the above means we need a distributed lock manager too, so might as well add that to my wishlist.

Single-filesystem limits set very high - 16TB would be a good minimum.

Vinum/geom (?) made to allow adding a couple more disks - be it a real SCSI device or another vinum device - to existing vinums, so I can extend my vinum stripe, raid, concat, etc. to a larger volume size without worrying about which disk is where. I want to stripe mirrors of raids, and raid striped mirrors of stripes. I know it sounds crazy, but I really *do* have uses for all this. :)

We currently pay lots of money every year (enough to pay an engineer's salary) for support and maintenance with Polyserve. They make a good product (we need it for the clustered filesystem and NFS distributed lock manager stuff) - I'd much rather see that go to FreeBSD.

Eric
yftty replied on 2005-05-13 14:46:32: On Wed, 2005-05-11 at 22:45 -0500, Eric Anderson wrote:
> Ok - I'm changing the subject here in an attempt to gather information.
> Here's my wishlist:
> FreeBSD have a native clustered filesystem. This is different than
> shared media (we already can do that over fiber channel, ggated, soon
Yes, the clustered filesystem will not run on a SAN, since that would drive the cost up.
> iscsi and AOE). This would allow multiple servers to access the same
> data read/write - highly important for load balancing applications like
> web servers, mail servers, and NFS servers.
http://www.netapp.com/tech_library/3022.html <-- this article gives some info about small-file operations in web, mail, IM, netdisk, blog, etc. services, and that's what our DFS targets ;)
> Online growable filesystem. I know I can growfs a filesystem now, but
> doing it online while data is being used is *insanely* useful. ReiserFS
> and Polyserve's FS (a clustered filesystem, not open-source) do this well.
Yes, we also support that, with our own *insanely* useful mechanism. And as you know, current clustered filesystems such as GoogleFS, Lustre, etc. can be built on online growfs. That is our way to do it too.
> FreeBSD's UFS2 made to do journaling. There's already someone working
> on this.
Good news.
> I believe the above mean that we need a distributed lock manager too, so
> might as well add that to my wishlist.
For specific applications & services, we can easily do without the distributed lock manager by handling locking in an upper layer. You can read the GoogleFS paper for further info.
> Single filesystem limits set very high - 16TB would be a good minimum.
Those limits can be removed.
> Vinum/geom (?) made to allow adding a couple more disks - be it a real
> scsi device, or another vinum device - to existing vinums, so I can
> extend my vinum stripe, raid, concat, etc to a larger volume size,
> without worrying about which disk is where. I want to stripe mirrors of
> raids, and raid striped mirrors of stripes. I know it sounds crazy, but
> I really *do* have uses for all this. :)
Yes, that's Lustre's way, and we also add a logical-disk layer to support it.
> We currently pay lots of money every year (enough to pay an engineer's
> salary) for support and maintenance with Polyserve. They make a good
Would you like to persuade your company to sponsor the development? ;)
> product (we need it for the clustered filesystem and NFS distributed
> lock manager stuff) - I'd much rather see that go to FreeBSD.
Finally, any help & donations & contributions across the requirements & technical domains are greatly appreciated!
> Eric
-- yf-263 Unix-driver.org
chifeng replied on 2005-05-13 15:00:52: So yftty is willing to contribute to BSD... haha. And there's money in it too...
riverfor replied on 2005-05-13 15:05:58: I want to write an fs too!
thzjy replied on 2005-05-13 15:19:51: A large project
dtest replied on 2005-05-13 16:33:32: Though I cannot understand it completely, I think it's a good idea. :)
kofwang replied on 2005-05-14 11:22:43: I need to learn more advanced tech to understand this article.
yftty replied on 2005-05-15 21:54:46: On Wed, 2005-05-11 at 22:45 -0500, Eric Anderson wrote:
> Ok - I'm changing the subject here in an attempt to gather information.
> Here's my wishlist:
As for your wishlist items, how about MogileFS: http://www.danga.com/mogilefs/ And what do you think of our cluster FS with GoogleFS-like && MogileFS-like features? Any comments are quite welcome :)
yftty replied on 2005-05-16 13:11:12: It seems people here aren't that keen on English, so here is something in Chinese from our leader to enjoy:

I have been envisioning a distributed network storage system with an open protocol based on something like SMPP. As you may have seen, Google published a whitepaper on the Google FS. In essence it turns filesystem operations into operations of a network protocol. Recently I have been helping a friend finish the design considerations for a related system. I wonder whether you would be interested in completing such a project together and maintaining it over the long term; perhaps in the future there will be not only a Python implementation but also C and Java implementations. But I believe the Python implementation would be the best, just like BT today. Such distributed network storage has a great many uses: the large-capacity network disks everyone uses, large-capacity mail systems like Gmail, large-capacity information exchange systems like NNTP, and large-capacity information storage systems like blogs. Its characteristics: the stored content is diverse, the stored data cannot be centralized, and data is stored with users/groups/systems as its center. For background, see the Google FS paper; if you cannot find it I can provide the PDF whitepaper. P.S.: The project will be open source (GPL or BSD), and it will have real-world use to prove our ideas correct (I will take care of the test environment). ----HD
wheel replied on 2005-05-16 13:47:49: Why base it on an SMPP-like protocol instead of on BT?
yftty replied on 2005-05-16 13:53:24: [quote:af0423eec1=wheel]Why base it on an SMPP-like protocol instead of on BT?[/quote:af0423eec1] The concrete network abstraction layer (NAL) is still being selected; last quarter I made a demo with CURL. Later we may use a network-layer architecture similar to PVFS2's. File access supports TFTP, FTP, HTTP, NFS, etc. P.S.: It now looks like we will end up with something like Lustre's Portals :(
dtest replied on 2005-05-16 13:53:38: OK, I can take part in this project. How do we start it? If Python is to be used for development, I think most of us must learn it first.
yftty replied on 2005-05-16 23:10:45: Some good talk on Spotlight in Tiger (Mac OS X): http://www.kernelthread.com/software/fslogger/ This is also the goal our design pursues: a presentation layer (search-based directories, user files); a retrieval/search layer (search engine); a storage layer (distributed filesystem).
sttty replied on 2005-05-17 00:15:15: Good idea. I support it. A pity my ability falls short, or I would certainly sign up. Big bump.
ly_1979425 replied on 2005-05-17 09:18:18: Using optical discs as near-line storage media would exploit the cost advantage even better. If the existing optical-disc filesystem formats, such as ISO9660, UDF, and JOLIET, were presented to users as one unified network filesystem format, it would greatly increase the use of optical discs on the network, e.g. with optical jukebox devices. This lets everyone store very large amounts of data cheaply: the cost of optical discs is far below the cost of hard disks. I could cooperate with yftty on this front.
xuediao replied on 2005-05-17 09:28:56: Having read through, I basically understand the broad picture. But could the OP describe the future application scenarios of the DFS, and the thinking behind basing it on the SMPP protocol? That part I don't quite understand. However, my pleasure to join in this! :D
yftty replied on 2005-05-17 10:45:22: [quote:18810a6f1e=xuediao]Having read through, I basically understand the broad picture. But could the OP describe the future application scenarios of the DFS, and the thinking behind basing it on the SMPP protocol? That part I don't quite understand. However, my pleasure to join in this! :D[/quote:18810a6f1e] Sorry, please see the English part ;) As far as I know, none of the existing cluster filesystems is based on SMPP, and I have never read the SMPP protocol myself. The application scenario is that kind of mass storage: web, mail, VOD/IPTV, broadcasting, libraries, etc. Familiar deployed examples: Google's Linux cluster system and Yahoo's BSD server cluster system.
yftty replied on 2005-05-17 10:49:49: [quote:0336c3ab45=ly_1979425]Using optical discs as near-line storage media would exploit the cost advantage even better. If the existing optical-disc filesystem formats, such as ISO9660, UDF, and JOLIET, were presented to users as one unified network filesystem format, it would greatly increase the use of optical discs on the network..........[/quote:0336c3ab45] Yes, the design takes this into account, as you said. Storing the metadata of every disc's filesystem centrally in the MDS part, which performs namespace resolution, reduces the commands reaching the disc to Seek and Read/Write Stripe operations, and will greatly improve usability. Optical discs also greatly cut running costs such as floor space and electricity.
zhuwas replied on 2005-05-17 13:10:57: I can do it in my spare time. Support, support!!!
yftty replied on 2005-05-17 13:23:24: Or you can analyze it through this progression :) Ext3/UFS/ReiserFS; NFS; GlobalFS; OpenAFS (Arla), Coda, Inter-mezzo, Lustre, PVFS2, GoogleFS. Since our group's membership is growing, I keep thinking about how to make this as ordinary as the cabbages by the roadside, rather than a skyscraper that suddenly rises in front of you, too tall to see the top of.
javawinter replied on 2005-05-17 16:20:00: Friendly support :D
zl_vim replied on 2005-05-17 17:02:36: What exactly is this thing? How do I take part?
潇湘夜雨 replied on 2005-05-17 18:17:27: Supporting this. Might as well post one in my IT career too.
nemoliu replied on 2005-05-17 23:00:19: hehe, with Google's success, filesystems look even more attractive. If I had the skills I would really like to take part too.
javawinter replied on 2005-05-18 02:55:46: Everyone with the skills, come and join :)
citybugzzzz replied on 2005-05-18 08:45:05: UpUp! Still following... The project keeps me busy, but I would be glad to take part!
hdcola replied on 2005-05-18 09:04:40: I haven't been back for a while. Let me tell everyone why I originally considered an SMPP-like protocol as one protocol for a message-storage distributed filesystem. 1. SMPP is a fully asynchronous protocol. In theory the concurrency can be very high, but in typical deployments it processes requests concurrently through sixteen to thirty-two windows, so the next command on a connection can be handled while the server has not yet finished the previous work. This can greatly reduce the number of concurrent connections on the server side. 2. Message-type storage sees few modifications after a write, so a store-and-forward mechanism can be used at save time, handling message delivery when the server is slow to respond or has problems. This is just a suggestion, one more idea. ^_^
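The windowed, fully asynchronous exchange described above can be modeled in a few lines. This is a toy, in-process sketch (the class and method names are made up, not real SMPP PDUs): with a window of W, up to W requests can be outstanding on a single connection before the sender must wait for a response, which is what cuts the server's concurrent connection count.

```python
from collections import deque

class WindowedChannel:
    """Toy model of an SMPP-style windowed connection: up to `window`
    requests may be outstanding (sent but unanswered) on one
    connection before the sender has to wait for a response."""

    def __init__(self, window=16):
        self.window = window
        self.pending = deque()   # sequence numbers awaiting a response
        self.next_seq = 1

    def can_send(self):
        return len(self.pending) < self.window

    def send(self, payload):
        if not self.can_send():
            raise RuntimeError("window full: wait for a response first")
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append(seq)
        return seq  # caller matches the eventual response by sequence number

    def receive_response(self, seq):
        self.pending.remove(seq)  # frees one window slot

ch = WindowedChannel(window=4)
seqs = [ch.send(b"msg") for _ in range(4)]  # 4 requests in flight at once
assert not ch.can_send()                    # window exhausted
ch.receive_response(seqs[0])
assert ch.can_send()                        # one slot freed again
```

With one TCP connection carrying 16 to 32 in-flight requests, a server can serve the same request rate with far fewer open connections than a strictly request/response protocol would need.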
yftty replied on 2005-05-18 10:11:29: [quote:1c0f55756a=hdcola]I haven't been back for a while. Let me tell everyone why I originally considered an SMPP-like protocol as one protocol for a message-storage distributed filesystem. 1. SMPP is a fully asynchronous protocol. In theory the concurrency can be very high, but in typical deployments it..........[/quote:1c0f55756a] Everyone is welcome to offer opinions and suggestions >_> and we will evaluate and test each of them during selection :) The concrete work divides into several parts: client, data server, metadata server, namespace, datapath, log, recovery, networking (or on-wire protocol), migration/replication, utilities, etc. Everyone is welcome to join the work on whichever part interests them. Or we can open several topics and discuss the related technical areas separately; call it our experiment in distributed collaboration ;) Discussion of the open-source collaboration model itself is also welcome.
mozilla121 replied on 2005-05-18 15:15:28: Bump.
nizvoo replied on 2005-05-18 15:58:58: I wanna do some part!
yftty replied on 2005-05-18 16:13:51: [quote:abdc530327=nizvoo]I wanna do some part![/quote:abdc530327] If you say the great golden words "I wanna do some part!", please describe your technical background or domains of interest, so that I can give you more info and help you get into the work smoothly. To put it another way: does what I say, "Just do it!", make sense to you? ;)
uplooking replied on 2005-05-18 16:47:21: yftty's stuff deserves a bump. Besides, once you learn this it will serve you very well.
yftty replied on 2005-05-18 16:55:37: http://tech.sina.com.cn/it/2005-05-08/0920600573.shtml Xinhua, Beijing, May 7 (reporter Li Bin): A survey of more than 4,400 people on "the living conditions of Chinese software talent", recently conducted by the China Youth Software Revitalization Program working committee and other organizations, shows that Chinese software talent is not only short of successors but, owing to lack of training and the education model, the successors are also short of strength. Knowledge in the software industry turns over quickly, yet the survey found that 60% of domestic software companies provide employees no career planning, showing that domestic software companies pay too little attention to training. Although most practitioners hope to improve their abilities through training, society offers few such opportunities: their employers do not support it, and institutions that can deliver timely training in new technology are rare. 77% of software practitioners work more than 8 hours a day; the programmers in the middle ranks have no time to absorb new technologies and new ideas, no time to improve themselves. Most software graduates earn a monthly salary of around 2,000 yuan, and the software talent earning 100,000 yuan a year is estimated at under 5% of all practitioners. The survey found that a backward education system leaves software graduates without practical programming ability, unable to meet companies' real needs, while the companies in turn are unwilling to provide training, so the number of programmers is effectively in net decline. Meanwhile, China lacks dedicated institutions for training software development managers; only engineers or programmers with innate management talent are lucky enough to become development managers, producing the odd phenomenon of "software talent that cannot find jobs" alongside "software companies that cannot hire suitable staff". ------------- I hope Uplooking.com can train more systems-level developers for this industry :)
nizvoo replied on 2005-05-18 17:24:42: 3 years of C / Windows / OpenGL / DX
yftty replied on 2005-05-18 17:38:33: [quote:4397799f7e=nizvoo]3 years of C / Windows / OpenGL / DX[/quote:4397799f7e] This quarter is the gestation stage; at the end of the quarter I will report to the company or explore possible modes of operation. Please offer opinions and suggestions on this too, on how a project like this survives and develops, so that it becomes a successful industry-grade piece of software with strong vitality. And starting from this thread, let's explore how such a thing keeps its vitality ;) Young, beautiful, forever!
yftty replied on 2005-05-18 17:39:26: http://lists.danga.com/pipermail/mogilefs/2004-December/000018.html On Dec 20, 2004, at 11:50, Brad Fitzpatrick wrote: Excellent! I did a project implementing exactly the same idea two years ago, for storage of mail messages for a GSM carrier, and can appreciate the beauty of the solution! It is great to have such a product in open source.
uplooking replied on 2005-05-18 17:47:59: Are many people in China working on this kind of thing?
yftty replied on 2005-05-18 18:29:21: Not many. But think of Huawei when it started out (or even now) making telecom equipment: there weren't many people either, which is why it needs to train so many every year ;) People like to call business rules the Game Rules, and a game can be said to be a gamble, so to a certain extent a company is betting on mass psychology. Those who bet right live a little more comfortably. Where do you think the industry trend and the mass psychology lie? Does it sound attractive put this way? ;) http://www.blogchina.com/new/display/72595.html The biggest flaw of the "regrettable figures" is their weakness in using resources and consolidating the industry, along with mediocre management ability.
sttty replied on 2005-05-18 23:55:59: I will support this project to the end. When I get the chance, I will study it properly. Speaking of the uplooking courses: I attended an open class a few days ago. It felt good; the courses are practical. I found the attendees' level was quite high, which made me feel rather ashamed at the time. :oops:
yftty replied on 2005-05-19 09:36:25: [quote:7d30a9145e=sttty]I will support this project to the end. When I get the chance, I will study it properly. Speaking of the uplooking courses: I attended an open class a few days ago. It felt good; the courses are practical. I found the attendees' level was quite high, which made me feel rather ashamed at the time. :oops:[/quote:7d30a9145e] For a community, its value lies in this: first, it helps everyone grow; second, it brings everyone more opportunities. Please make that the starting point when posting promotional material like the above ;) heh
nizvoo replied on 2005-05-19 09:46:41: OK, I know. I need to learn more FS knowledge. Keep in touch. My mail: nizvooATgmail.com.
deltali replied on 2005-05-19 10:11:26: What's the role of locks in a distributed filesystem? Thanks!
yftty replied on 2005-05-19 11:03:55: [quote:042aeff932=deltali]What's the role of locks in a distributed filesystem? Thanks![/quote:042aeff932] The locks in a distributed filesystem are managed by a Distributed Lock Manager (DLM). A distributed filesystem needs to address the problem of delivering aggregate performance to a large number of clients, and the DLM is the basis of scalable clusters. In a DLM-based cluster, all nodes can write to all shared resources and coordinate their actions using the DLM. This sort of technology is mainly intended for CPU- and/or RAM-intensive processing, not for disk-intensive operations, nor for reliability. Digital > Compaq > HP... HP owns the Digital DLM technology, available in Tru64 Unix (formerly Digital Unix) and OpenVMS 8. Compaq/HP licensed the DLM technology to Oracle, who have based their cluster/grid software on the DLM. Sun Solaris also has a DLM-based cluster technology. Now Sun and HP are fighting blog wars... http://blogs.zdnet.com/index.php?p=661&tag=nl.e539 http://www.chillingeffects.org/responses/notice.cgi?NoticeID=1460 Where I see the DLM being good is for rendering and scientific calculation. These processes could really benefit from having a central data store, but will not put a huge load on the DLM hardware. Some more in-depth material: http://kerneltrap.org/mailarchive/1/message/56956/thread http://kerneltrap.org/mailarchive/1/message/66678/thread http://lwn.net/Articles/135686/ Clusters and distributed lock management: The creation of tightly-connected clusters requires a great deal of supporting infrastructure. One of the necessary pieces is a lock manager: a system which can arbitrate access to resources which are shared across the cluster. The lock manager provides functions similar to those found in the locking calls on a single-user system; it can give a process read-only or write access to parts of files.
The lock management task is complicated by the cluster environment, though; a lock manager must operate correctly regardless of network latencies, cope with the addition and removal of nodes, recover from the failure of nodes which hold locks, etc. It is a non-trivial problem, and Linux does not currently have a working distributed lock manager in the mainline kernel. David Teigland (of Red Hat) recently posted a set of distributed lock manager patches (called dlm), with a request for inclusion into the mainline. This code, which was originally developed at Sistina, is said to be influenced primarily by the venerable VMS lock manager. An initial look at the code confirms this statement: callbacks are called ASTs (asynchronous system traps, in VMS-speak), and the core locking call is an eleven-parameter monster:

    int dlm_lock(dlm_lockspace_t *lockspace,
                 int mode,
                 struct dlm_lksb *lksb,
                 uint32_t flags,
                 void *name,
                 unsigned int namelen,
                 uint32_t parent_lkid,
                 void (*lockast) (void *astarg),
                 void *astarg,
                 void (*bast) (void *astarg, int mode),
                 struct dlm_range *range);

Most of the discussion has not been concerned with the technical issues, however. There are some disagreements over issues like how nodes should be identified, but most of the developers who are interested in this area seem to think that this implementation is at least a reasonable starting point. The harder issue is figuring out just how a general infrastructure for cluster support can be created for the Linux kernel. At least two other projects have their own distributed lock managers and are likely to want to be a part of this discussion; an Oracle developer recently described the posting of dlm as a preemptive strike. Lock management is a function needed by most tightly-coupled clustering and clustered filesystem projects; wouldn't it be nice if they could all use the same implementation?
The fact is that the clustering community still needs to work these issues out; Andrew Morton doesn't want to have to make these decisions for them: "Not only do I not know whether this stuff should be merged: I don't even know how to find that out. Unless I'm prepared to become a full-on cluster/dlm person, which isn't looking likely. The usual fallback is to identify all the stakeholders and get them to say 'yes Andrew, this code is cool and we can use it', but I don't think the clustering teams have sufficient act-togetherness to be able to do that." Clustering will be discussed at the kernel summit in July. A month prior to that, there will also be a clustering workshop held in Germany. In the hopes that these two events will help bring some clarity to this issue, Andrew has said that he will hold off on any decisions for now.
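To make the arbitration role concrete, here is a toy, single-process sketch of the compatibility rule a DLM enforces cluster-wide: concurrent shared (read) locks coexist, while an exclusive (write) lock excludes everyone else. The class and method names are invented for illustration; a real DLM such as the dlm patches discussed above adds lockspaces, lock conversion, AST callbacks, and node-failure recovery on top of this rule.

```python
class ToyLockManager:
    """Minimal illustration of DLM-style lock compatibility:
    readers share a resource; a writer gets it exclusively."""

    def __init__(self):
        self.holders = {}  # resource -> (mode, set of owners)

    def lock(self, resource, owner, mode):
        assert mode in ("shared", "exclusive")
        held = self.holders.get(resource)
        if held is None:
            self.holders[resource] = (mode, {owner})
            return True
        held_mode, owners = held
        if mode == "shared" and held_mode == "shared":
            owners.add(owner)   # readers coexist
            return True
        return False            # conflict: caller must wait (or get a bast)

    def unlock(self, resource, owner):
        mode, owners = self.holders[resource]
        owners.discard(owner)
        if not owners:
            del self.holders[resource]

dlm = ToyLockManager()
assert dlm.lock("inode:42", "nodeA", "shared")
assert dlm.lock("inode:42", "nodeB", "shared")        # second reader admitted
assert not dlm.lock("inode:42", "nodeC", "exclusive") # writer must wait
dlm.unlock("inode:42", "nodeA")
dlm.unlock("inode:42", "nodeB")
assert dlm.lock("inode:42", "nodeC", "exclusive")     # now it can write
```

The hard part a real DLM solves is not this table but doing it correctly across nodes: surviving network latency, node joins and departures, and recovering locks held by failed nodes.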
wolfg replied on 2005-05-19 14:36:08: Following.
ufoor replied on 2005-05-19 23:38:16: Reading this makes me a bit dizzy; I still have a lot to learn. For the related material it's better to read the Chinese first; it's more efficient. If there is nothing in Chinese, then read the English.
Zer4tul replied on 2005-05-20 03:08:11: This seems to have been HD's idea, right? Not bad... sadly my level isn't high enough... I'll just cheer from the sidelines... and read the Google FS documentation carefully in the next couple of days.
yftty replied on 2005-05-20 08:15:56: [quote:89ea8253f6=ufoor]Reading this makes me a bit dizzy; I still have a lot to learn. For the related material it's better to read the Chinese first; it's more efficient. If there is nothing in Chinese, then read the English.[/quote:89ea8253f6] Reading in Chinese helps you build the relevant concepts quickly. But once a few concepts are in place, stop reading the Chinese material, or you will only grow more confused the more you read.
yftty replied on 2005-05-20 08:19:46: [quote:f7a8ff7b78=Zer4tul]This seems to have been HD's idea, right? Not bad... sadly my level isn't high enough... I'll just cheer from the sidelines... and read the Google FS documentation carefully in the next couple of days.[/quote:f7a8ff7b78] hehe, HD can be considered the Godfather of the project! And a great project needs great people. Would you let me know your brilliant ideas about what to do or how to do it, so we can merge them in? Let's inspire each other ;-)!
akadoc replied on 2005-05-20 13:17:40: Up, up, up. Following...
yftty replied on 2005-05-20 17:03:08: [quote:8e1053c4e6=akadoc]Up, up, up. Following...[/quote:8e1053c4e6] Which point or which part would you like to follow: the organization or the technology, or which part of the technology? :) Please look at the features of MogileFS, which is similar to what we are doing. http://www.danga.com/mogilefs/ MogileFS is our open source distributed filesystem. Its properties and features include:

* Application level -- no special kernel modules required.
* No single point of failure -- all three components of a MogileFS setup (storage nodes, trackers, and the tracker's database(s)) can be run on multiple machines, so there's no single point of failure. (You can run trackers on the same machines as storage nodes, too, so you don't need 4 machines...) A minimum of 2 machines is recommended.
* Automatic file replication -- files, based on their class, are automatically replicated between enough different storage nodes to satisfy the minimum replica count requested by their class. For instance, for a photo hosting site you can make original JPEGs have a minimum replica count of 3, but thumbnails and scaled versions only a replica count of 1 or 2. If you lose the only copy of a thumbnail, the application can just rebuild it. In this way, MogileFS (without RAID) can save money on disks that would otherwise be storing multiple copies of data unnecessarily.
* Better than RAID -- in a non-SAN RAID setup, the disks are redundant, but the host isn't. If you lose the entire machine, the files are inaccessible. MogileFS replicates the files between devices which are on different hosts, so files are always available.
* Transport neutral -- MogileFS clients can communicate with MogileFS storage nodes (after talking to a tracker) via either NFS or HTTP, but we strongly recommend HTTP.
* Flat namespace -- files are identified by named keys in a flat, global namespace. You can create as many namespaces as you'd like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.
* Shared-nothing -- MogileFS doesn't depend on a pricey SAN with shared disks. Every machine maintains its own local disks.
* No RAID required -- local disks on MogileFS storage nodes can be in a RAID, or not. It's cheaper not to, as RAID doesn't buy you any safety that MogileFS doesn't already provide.
* Local filesystem agnostic -- local disks on MogileFS storage nodes can be formatted with your filesystem of choice (ext3, ReiserFS, etc.). MogileFS does its own internal directory hashing so it doesn't hit filesystem limits such as max files per directory or max directories per directory. Use what you're comfortable with.

MogileFS is not:

* POSIX compliant -- you don't run regular Unix applications or databases against MogileFS. It's meant for archiving write-once files and doing only sequential reads. (Though you can modify a file by overwriting it with a new version.) Notes: o Yes, this means your application has to specifically use a MogileFS client library to store and retrieve files. The steps in general are: 1) talk to a tracker about what you want to put or get, 2) read/write to the NFS path for that storage node (the tracker will tell you where), or do an HTTP GET/PUT to the storage node if you're running with an HTTP transport instead of NFS (which is highly recommended). o We've briefly tinkered with using FUSE, which lets Linux filesystems be implemented in userspace, to provide a Linux filesystem interface to MogileFS, but we haven't worked on it much.
* Completely portable... yet -- we have some Linux-isms in our code, at least in the HTTP transport code. Our plan is to scrap that and make it portable, though.
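The two-step client flow in the notes above (first ask a tracker where a key lives, then PUT/GET the returned storage-node path) can be sketched with a stand-in tracker. Everything here is hypothetical: the class, methods, and path scheme are invented for illustration, and the real MogileFS wire protocol and client APIs differ.

```python
class ToyTracker:
    """Stand-in for a MogileFS-style tracker: maps flat keys to the
    storage-node locations holding each replica of a file."""

    def __init__(self, nodes, min_replicas=2):
        self.nodes = nodes
        self.min_replicas = min_replicas
        self.index = {}  # key -> list of (node, path)

    def paths_for_put(self, key):
        # Pick enough distinct nodes to satisfy the class's replica count.
        chosen = self.nodes[: self.min_replicas]
        self.index[key] = [(n, "/dev0/" + key) for n in chosen]
        return self.index[key]

    def paths_for_get(self, key):
        return self.index[key]

# Step 1: ask the tracker; step 2 (not shown) would be an HTTP PUT/GET
# of the returned path on each storage node.
tracker = ToyTracker(nodes=["node1", "node2", "node3"])
locations = tracker.paths_for_put("photo:123:orig")
assert len(locations) == 2                   # replica count honoured
assert len({n for n, _ in locations}) == 2   # replicas on distinct hosts
assert tracker.paths_for_get("photo:123:orig") == locations
```

The point of the design shows up in the assertions: redundancy lives at the host level (distinct nodes per key), which is why the feature list above can claim "better than RAID" without any RAID at all.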
scrazy77 replied on 2005-05-20 20:50:59: [quote:01064cbd68=yftty]Which point or which part would you like to follow: the organization or the technology, or which part of the technology? :) Please look at the features of MogileFS, which is similar to what we are doing. http://www.danga.com/mogilefs/ MogileFS is our open source distributed filesystem..........[/quote:01064cbd68] MogileFS can be seen as a simplified implementation of Google GFS; conceptually they are very close, except that its smallest unit is the file, while Google GFS's smallest unit is a chunk (64MB). But using MogileFS currently requires access through an application client, so it is not as convenient as distributed shared storage like Red Hat GFS, or a NetApp Filer... Of course, MogileFS may be the cheapest solution. I am already testing it on my internal cluster, using the PHP client, for a blog & album system accessed by multiple servers. To make it work as a POSIX filesystem, FUSE should let that be built quickly; danga seems to have such a plan too. Eric Chang
yftty replied on 2005-05-21 00:30:32:
> MogileFS can be seen as a simplified implementation of Google GFS;
> conceptually they are very close,
Yes, both belong to a subset of userspace implementations of asymmetric cluster filesystems. At the same time, they can be regarded as file-management library functions rather than filesystems.
> except that its smallest unit is the file, while Google GFS's smallest unit is a chunk (64MB).
MogileFS takes the file as its smallest unit of management, so it only needs to handle the file namespace, not disk-block space. GoogleFS lifts the original disk-block operations up to file-based chunk (64MB) operations, giving storage management a suitable minimum granularity and lowering the management overhead.
> But using MogileFS currently requires access through an application client,
> so it is not as convenient as distributed shared storage like Red Hat GFS,
GFS is a symmetric, SAN-based distributed filesystem.
> or a NetApp Filer...
A NetApp Filer is an optimized NFS server.
> Of course, MogileFS may be the cheapest solution.
> I am already testing it on my internal cluster,
Good job!
> using the PHP client, for a blog & album system accessed by multiple servers.
> To make it work as a POSIX filesystem, FUSE should let that be built quickly;
That describes the development approach ;) We started with this idea too, but the workload it added grew too large, so we no longer experiment inside FUSE.
> danga seems to have such a plan too.
> Eric Chang
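The chunk granularity discussed above is just integer arithmetic: a byte offset in a file maps to a 64 MB chunk index plus an offset within that chunk, so the metadata server tracks one entry per 64 MB instead of one per disk block. A minimal sketch (the helper name is ours, not GoogleFS's):

```python
CHUNK = 64 * 2**20  # GoogleFS-style 64 MB chunk size

def chunk_of(offset):
    """Map a byte offset within a file to (chunk index, offset inside
    that chunk). Coarse 64 MB units are what keep the metadata
    server's bookkeeping small compared to per-block management."""
    return offset // CHUNK, offset % CHUNK

assert chunk_of(0) == (0, 0)
assert chunk_of(CHUNK - 1) == (0, CHUNK - 1)   # last byte of chunk 0
assert chunk_of(CHUNK) == (1, 0)               # first byte of chunk 1
assert chunk_of(3 * CHUNK + 100) == (3, 100)
```

A 1 TB file thus needs only about 16,384 chunk records of metadata, which is why a single metadata server can describe a very large store.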
我菜我怕谁 replied on 2005-05-21 09:09:00: Hey, I haven't even figured out Unix itself yet; I'll just lurk!!
yftty replied on 2005-05-21 10:36:30: [quote:1f79b891a3=我菜我怕谁]Hey, I haven't even figured out Unix itself yet; I'll just lurk!![/quote:1f79b891a3] HOHO, to a certain extent this has nothing to do with Unix ;) I don't really understand Unix very well myself, heh. The IT industry, a consumption economy led by the US and launched by Silicon Valley elites, has always profited by using a dazzling parade of concepts as its selling point, pushing the public's purchasing power far beyond its means. They have built not only technical barriers and market barriers, but this psychological barrier as well. Don't be scared off by it. Big projects are paper tigers: despise them strategically, and then you can handle them tactically. However big a project is, each person takes part in only a small piece; but through that small piece, may I not say I am taking part in the progress of this field, or of society? ;) A journey feels long only because the goal is unclear :) P.S. Wang Guowei's three stages of accomplishing anything: 1. Last night the west wind withered the green trees; alone I climbed the high tower and gazed down every road to the horizon. 2. My clothes hang ever looser, yet I have no regrets; for her sake I gladly waste away. 3. A thousand times I searched the crowd for her; then, at a sudden turn of the head, there she was, where the lantern light burned low. (Is that you?)
kofwang replied on 2005-05-21 10:45:58: Makes sense, and you could say you have found the right direction. For ordinary people it goes: 1. Last night cheap liquor chilled an empty heart; I would climb the high tower, but can find no road to the horizon. 2. My strength is overdrawn and fails at last; my wallet stays as empty as a dry pond. 3. Three hundred years of fighting on the battlefield; I shed my armor and return to the fields, only to find no home to return to.
sttty replied on 2005-05-21 10:47:34: What a fine set: 1. Last night the west wind withered the green trees; alone I climbed the high tower and gazed down every road to the horizon. 2. My clothes hang ever looser, yet I have no regrets; for her sake I gladly waste away. 3. A thousand times I searched the crowd for her; then, at a sudden turn of the head, there she was, where the lantern light burned low. One sentence wakes the dreamer!
kofwang replied on 2005-05-21 10:53:52: "Profiting by using a dazzling parade of concepts as the selling point": this is the heyday of the concept economy. For Chinese people, "home theater", "self-drive tours", and the "Three Represents" have drawn plenty of eyeballs.
yftty replied on 2005-05-21 10:58:34: [quote:834d16ab05=kofwang]Makes sense, and you could say you have found the right direction. For ordinary people it goes: 1. Last night cheap liquor chilled an empty heart; I would climb the high tower, but can find no road to the horizon. 2. My strength is overdrawn and fails at last; my wallet stays as empty as a dry pond. 3. Three hundred years of fighting on the battlefield; I shed my armor and return to the fields, only to find no home to return to.[/quote:834d16ab05] Looking out from a prison cell, one man saw mud, the other saw stars ;) More often, people are looking at the smooth road that comes after the twists; that is why tragedies like "The Butterfly Lovers" spread more easily through the ages. Behind this stubborn persistence, do you ever have this feeling: always startled awake in the morning, yet not knowing what you are worried about, or should be worried about?
akadoc replied on 2005-05-21 14:23:17: [quote:153f9ab03f=yftty]For a community, its value lies in this: first, it helps everyone grow; second, it brings everyone more opportunities. Please make that the starting point when posting promotional material like the above ;) heh[/quote:153f9ab03f] Hoping to see a team as you say, in this project!
chifeng replied on 2005-05-21 22:37:24: I wonder whether a rookie like me can help out? Doing something concrete... :)
tclwp replied on 2005-05-22 17:25:14: If new pioneering technology is integrated, the future is bright.
yftty replied on 2005-05-22 20:56:29: [quote:1817ff4e36=akadoc]Hoping to see a team as you say, in this project![/quote:1817ff4e36] The team has been formed. There are two members at present, and the third will arrive in July ;) Both have experience with successful distributed-filesystem products. Of course, we hope more people will join our work and explore with us the technology and the related management and engineering experience.
yftty replied on 2005-05-22 20:59:56: [quote:b86bd17476=chifeng]I wonder whether a rookie like me can help out? Doing something concrete... :)[/quote:b86bd17476] Heh. People reach a certain level because of the work they do, not the other way around: you don't need to reach the level first before doing the work. Growth should be a lifelong pursuit, so we are always using the known to explore the unknown ;) We keep working at it!
sttty replied on 2005-05-22 22:45:32: Successful people all walked this road step by step. I hope that in a few years I will still be walking along it too.
yftty replied on 2005-05-22 23:46:03: [quote:fe07ef93f7=tclwp]If new pioneering technology is integrated, the future is bright.[/quote:fe07ef93f7] For a project like this, the risk on the research side (new technology) is relatively small; the greater risk is in engineering. Heh, doing this work has gradually made me understand why Google.com's two founders split the roles, one responsible for technology and one for engineering (though my understanding may be off). In a system like this, any single part taken on its own is fairly simple, and you can see its shadow in many other places. But when everything is integrated together, when it forms what we usually call a system, the technical complexity rises, and for business-critical systems the complexity is even more pronounced. For example: a large concurrent system has a great many corner cases, and so many parts are optimized that the root cause of a problem becomes hard to pin down. Yet performance is often the sole goal the engineering pursues. Please keep supporting and discussing :)
whoto replied on 2005-05-23 10:29:19: I don't understand Google FS; my understanding of yfttyFS (let's call it that for now) is: under one virtual yfttyFS root filesystem, provide the ability to attach (mount) storage space offered by many kinds of storage devices, filesystems, operating systems, and protocols, including yfttyFS itself, forming one unified storage system that provides storage services. Corrections from the experts welcome.

yfttyFS/
|-- yfttyFS/X1
|-- yfttyFS/X2
|-- yfttyFS/X...
|-- /Xdev/      -- HD, SCSI, CD, DVD, etc.
|-- /Xfs/       -- UFS, UFS2, Ext2, NTFS, ISO9660, etc.
|-- /Xsys/      -- BSD(s), Linux(s), Windows(s), UNIX(s), etc.
|-- /Xprotocol/ -- TFTP, FTP, HTTP, NFS, etc.
`-- /etc.

WEB      --|
MAIL     --|
VOD/IPTV --|-- based on the yfttyFS Library
etc.     --|
yftty replied on 2005-05-23 11:28:33: hehe, I never thought about it that way, and dare not name it xxxFS as you did. Most of the ideas are stolen from various sources, and there are members on our team much more intelligent than I am. I disclose it here just to invite more insight into our project, to benefit the project and the people who contribute. Yes, it seems you really know what we want to do ;) Yes, the storage is a pool, and it is always on-demand, like the air around you. And the trick in my nickname: here I can see your masterpiece saying, because now "yf" sits in front of a "tty" ;)
Solaris12 replied on 2005-05-25 18:43:30: [quote:03acc5f034=yftty]The team has been formed. There are two members at present, and the third will arrive in July ;) Both have experience with successful distributed-filesystem products. Of course, we hope more people will join our work and explore with us the technology and the related management..........[/quote:03acc5f034] How do I contact you? I am very interested in this project; we could exchange a lot on the technology and the engineering management.
yftty replied on 2005-05-27 00:24:38: [quote:066eb2232d=Solaris12]How do I contact you? I am very interested in this project; we could exchange a lot on the technology and the engineering management.[/quote:066eb2232d] For engineering management we plan to use PSP/TSPi and XP; everyone is welcome to discuss this. P.S. I bought the books but have not had time to read them yet.
javawinter replied on 2005-05-27 01:15:51: UP
Solaris12 replied on 2005-05-27 13:03:10: [quote:28316166f7=yftty]For engineering management we plan to use PSP/TSPi and XP; everyone is welcome to discuss this.[/quote:28316166f7] Forgive my ignorance: what is PSP/TSPi? Does XP mean extreme programming? As I understand it, XP suits projects with few developers, driven by customer requirements. An FS product need not force-fit XP. Of course, software development does have many best practices, and we can adjust them to our actual situation to find the balance point between efficiency and process:

1. On SCM: to build a good product, you must set a series of SCM policies and standards, mainly in version-control management and change-tracking management.

2. On process: you need standards for code integration. Development: concept documents --> development --> code review --> code integration. Testing: test plan --> test development --> testing --> test report.

For a small team with limited resources, SCM and process should not be made complicated: minimize development documents, and strengthen configuration management and code review. For testing, it is best to find open-source test tools, which in turn requires that the FS's programming interface not be proprietary; it should conform to some standard as far as possible.
yftty replied on 2005-05-27 13:48:07: (13:43:29) j-fox: Whatever management model you use, making good plans (of every kind, especially risk-response plans) and monitoring status are the most important things; first take a small task and use it to find a method that fits. (13:45:45) j-fox: Prepare the development documents first. (13:46:04) yftty -- A dream makes a team, and the team builds the dream!: Good, let me post yours first.
xuediao replied on 2005-05-27 14:10:04: [quote:4a2325b636=Solaris12]XP suits projects with few developers, driven by customer requirements.[/quote:4a2325b636] As Solaris12 says, XP emphasizes speed and flexibility, while PSP and TSPi are an extension of CMMi, emphasizing planning and process control. Although this is a large engineering project, developed mostly in a distributed fashion, implementing both methods at once is very hard. Finding the balance point between the two might just open up a new school of software engineering, heh :D
xuediao replied on 2005-05-27 14:16:26: [quote:0f936c6c7e=yftty](13:43:29) j-fox: Whatever management model you use, making good plans (of every kind, especially risk-response plans) and monitoring status are the most important things; first take a small task and use it to find a method that fits. (13:45:45) j-fox: Prepare the development documents first. (13:46:04) yftty -- ..........[/quote:0f936c6c7e] I rather agree with j-fox's view: monitoring development status and responding to risk matter most. For purely in-house development, implementing TSP would be much easier; for distributed development within China, consider this an attempt and a learning process.
mozilla121 replied on 2005-05-27 14:29:27: Strictly applying this process will be hard in practice. Only a team that deeply identifies with the process can carry it through.
yftty replied on 2005-05-27 14:51:16: [quote:9f96c85f21=xuediao]As Solaris12 says, XP emphasizes speed and flexibility, while PSP and TSPi are an extension of CMMi, emphasizing planning and process control. Although this is a large engineering project, developed mostly in a distributed fashion, implementing both methods at once is very hard. Finding the balance point between the two..........[/quote:9f96c85f21] Indeed: Western learning for practical use, Chinese learning as the foundation ;) For now we are only imitating a little, and our understanding of the work itself keeps deepening. We use XP's incremental model, plus TSP's monitoring and evaluation. I am learning as I sell, and may have made it a bit neither-fish-nor-fowl; hopefully that can be called innovation ;) Right now we are moving the current userspace implementation into the FreeBSD kernel. I am truly grateful for the sudden-enlightenment teaching of Huineng, the Sixth Patriarch of Chan.
yftty replied on 2005-05-27 14:54:38: [quote:c51d41c4cb=mozilla121]Strictly applying this process will be hard in practice. Only a team that deeply identifies with the process can carry it through.[/quote:c51d41c4cb] He who knows himself is enlightened; he who conquers himself is strong; he who knows contentment is rich; he who perseveres has will. -- Tao Te Ching
xuediao replied on 2005-05-27 14:54:46: Heh, this is the Doctrine of the Mean, or perhaps a new-style Self-Strengthening Movement. As Deng Xiaoping put it well: black cat or white cat, the one that catches mice is a good cat!
Solaris12 replied on 2005-05-28 21:03:45: [quote:6b9d40f3e2=xuediao]As Solaris12 says, XP emphasizes speed and flexibility, while PSP and TSPi are an extension of CMMi, emphasizing planning and process control. Although this is a large engineering project, developed mostly in a distributed fashion, implementing both methods at once is very hard. Finding the balance point between the two.........[/quote:6b9d40f3e2] Actually, CMM-type things suit outsourcing companies very well. The development team I am in uses neither XP nor CMM, yet it is very effective, and you will find the shadows of other software-engineering methods inside it. So no process matters in itself; what matters most is matching the resources you have. In my view, the biggest problems of many domestic software companies are these:

1. SCM (software configuration management): no competent release engineer; no real version management; no change-tracking system, so no way to capture every change to the system; no daily build, no automatic sanity test or system test. More importantly, many companies set no unified SCM policy at project start, such as code integration criteria.

2. Development process: no democratic authority to control changes in market and software-architecture requirements and features; no code review; no automatic regression test against each daily build.

Any software-engineering method costs extra resources, though; the key is that every software company recognizes this and invests. Look carefully at the development model of many well-known open-source projects and you will see all of the above satisfied well. For example: you can get a project's daily build or snapshot at any time and see whether that build passed testing. There is also a bugtraq system recording every change, including bug fixes and new features.
yftty replied on 2005-06-01 12:37:15: To Solaris12: we are implementing things step by step along the lines you describe, but it is not all in place yet. 1. SCM: for now, only simple commit rules (modeled on Lustre's process), also to match the resources we have. 2. Development process: for now, only design review; the rest needs people to build it. P.S. I suddenly feel I have somewhat lost the things that used to be so familiar.
james.liu replied on 2005-06-01 13:42:18: After reading this thread, my first impression is not the project or the technology involved, but that this yftty fellow can really talk. I don't understand it, but I want to watch. How can I observe this project from the sidelines?
yftty replied on 2005-06-01 14:05:20: [quote:9d7fb121e5=james.liu]After reading this thread, my first impression is not the project or the technology involved, but that this yftty fellow can really talk. I don't understand it, but I want to watch. How can I observe this project from the sidelines?[/quote:9d7fb121e5] I do answer clearly posed technical questions, such as the earlier one on distributed locks (DLM). How to observe or participate: that is exactly the intent of this thread. For systems and kernel development, at home or abroad, I don't feel there is any particularly good entry path. The kernel-mentors mailing list is one attempt in this direction and has shown initial results. Of course, I apologize if anything I said caused you or others to misunderstand; I believe everyone wants to give themselves and others a chance to grow. I also think a person's way of doing things has much to do with personality. When I haven't thought something through, and the risk is bearable, I throw it out first and improvise as things develop. As in football: when you cannot mount an attack, first pass the ball to the opposing striker.
风暴一族 replied on 2005-06-03 09:26:48: Not bad, I'd say~
yftty replied on 2005-06-07 09:41:22: Here are the current sanity tests & results:

[yf@yftty xxxfs]$ tests/xxxfs_sanity -v
000010:000001:1118108292.377965:4560:(socket.c:63:xxxfs_net_connect()) Process entered
config finished, ready to do the sanity testing !
xxxFS file creation testing succeeded !
xxxFS file read testing succeeded !
xxxFS file deletion testing succeeded !
xxxFS Sanity testing pid (4560) succeeded 1 !
[yf@yftty xxxfs]$
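The create/read/delete sequence in that output can be mirrored as a tiny standalone script, run here against an ordinary directory as a stand-in for a mounted xxxFS (the function name and file name are invented; the real tests/xxxfs_sanity is a C program against the filesystem itself):

```python
import os
import tempfile

def sanity(root):
    """Create/read/delete sequence mirroring the xxxfs_sanity
    output above, run against a plain directory under `root`."""
    path = os.path.join(root, "sanity.dat")
    data = b"hello, fs"
    with open(path, "wb") as f:      # file creation test
        f.write(data)
    with open(path, "rb") as f:      # file read test
        assert f.read() == data
    os.unlink(path)                  # file deletion test
    assert not os.path.exists(path)
    return True

with tempfile.TemporaryDirectory() as d:
    assert sanity(d)
```

The value of such a smoke test is its speed: it exercises the whole datapath (namespace, data server, networking) in seconds, so it can gate every commit.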
yftty replied on 2005-06-07 11:38:57: The project is now almost two quarters old. After this period of practice and reflection, my humble view is that, process-wise, the posts above already cover things fairly well. As for division of labor and organization, does the following look right to everyone?

 ______________________       ______________________
| theoretical guidance | <-> | development guidance |
 ----------------------       ----------------------
        |       /                  |
 __________       _____________       _________
| research | <-> | development | <-> | testing |
 ----------       -------------       ---------

P.S. There are still problems with this. Dizzy.
yftty replied on 2005-06-08 09:30:30: Dan Stromberg wrote:
> The lecturer at the recent NG storage talk at Usenix in Anaheim
> indicated that it was best to avoid active/active and get
> active/passive instead.
> Does anyone:
> 1) Know what these things mean?
In the clustering world, active/active means 2 or more servers are active at a time, either operating on separate data (and thus acting as passive failover partners to each other), or operating on the same data (which requires the use of a cluster filesystem or other similar mechanism to allow coherent simultaneous access to the data).
> 2) Know why active/passive might be preferred over active/active?
Well, if you're talking about active/passive vs. active/active with a cluster filesystem or such, active/passive is tons easier to implement and get right. Plus, depending on your application, the added complexity of a cluster filesystem might not actually buy you much more than you could get with, say, NFS or Samba (CIFS). -- Paul
yftty replied on 2005-06-08 11:00:35: http://tech.blogchina.com/53/2005-06-07/372338.html To understand Google's corporate culture, start with an episode from its founding: when Sergey Brin and Larry Page wanted to turn their web dream into reality, the biggest obstacle was that they did not have enough money to buy expensive equipment. So the two spent a few hundred dollars on some PCs in place of those multi-million-dollar servers. In practice, these ordinary PCs naturally failed more often than professional servers. They needed to ensure that the failure of any one PC would never keep users from getting search results, so Google decided to develop its own software tools to solve such problems, for example the Google File System. This filesystem not only handles large data efficiently but can also cope with storage failures at any moment. Combined with Google's triple-backup regime, a system built from these PCs could do the work of those servers. And this attitude of attacking every problem head-on deeply shaped Google's later culture. To this day Google keeps the look of a web startup: of the 2,700 employees at headquarters, 900 are technical staff, and there are few private offices. Below Schmidt's closet-sized office, Brin and Page share one office that looks like a college dorm room, scattered with hockey gear, skateboards, model aircraft, and beanbag chairs. ... No one questions that Google has magical technology and innovation, but no company becomes world-class on excellent technology alone; great companies need great management to climb higher. Who is Google's soul? Brin and Page plus Schmidt, as a trio, of course; but on the management side, the 49-year-old Schmidt has played the crucial role. Schmidt, formerly CTO of Sun and CEO of Novell, still clearly remembers the board's instructions when he arrived: "Don't mess the company up, Eric. Its starting point is very, very good; don't reform it too much." He fully understood the investors' worry: they did not want this highly creative company to become rigid. When Schmidt first arrived in 1999 there was hardly any management to speak of, but he did not want to copy the traditional big-company playbook either; he wanted Google to form its own management model based on its actual situation. Most of the time Schmidt acts together with the two founders in making decisions. Usually Schmidt chairs the management meetings while the two founders chair the staff meetings. When a major issue needs a decision, the Google trio decides by the basic rule of majority vote, and many decisions are reached in front of the employees. Management deliberately preserves the blunt, free engineer culture, which they see as a powerful weapon against companies of the scale of Yahoo and Microsoft. Harvard Business School professor David Yoffie is not so positive about this model: "If many people decide at the same time, it amounts to deciding nothing. At Google, thousands of plans are made simultaneously every day; someone has to make the final call." Schmidt says the role he actually plays leans toward COO. He cites Yahoo and eBay, where the founders set the long-range strategy even though they do not hold the CEO title. His supporters argue that this CEO's personal style masks his real position in the company. Page, once CEO, now serves as president of products; Brin, the former chairman, as president of technology; and Schmidt has spent the past four years building a complete structure for Google. Brin and Page's management philosophy comes straight from the Stanford computer-science lab they came from. Google's managers rarely order engineers to complete particular projects; instead, the company publishes a list of 100 priority projects, and engineers join fluid working groups according to their own preferences, completing work in units of weeks or months.
liuzhentaosoft replied on 2005-06-10 23:49:57:

openMosix:

5.1 What Is openMosix?

Basically, the openMosix software includes both a set of kernel patches and support tools. The patches extend the kernel to provide support for moving processes among machines in the cluster. Typically, process migration is totally transparent to the user. However, by using the tools provided with openMosix, as well as third-party tools, you can control the migration of processes among machines.

Let's look at how openMosix might be used to speed up a set of computationally expensive tasks. Suppose, for example, you have a dozen files to compress using a CPU-intensive program on a machine that isn't part of an openMosix cluster. You could compress each file one at a time, waiting for one to finish before starting the next. Or you could run all the compressions simultaneously, by starting each compression in a separate window or by running each compression in the background (ending each command line with an &). Of course, either way will take about the same amount of time and will load down your computer while the programs are running.

However, if your computer is part of an openMosix cluster, here's what will happen: First, you will start all of the processes running on your computer. With an openMosix cluster, after a few seconds, processes will start to migrate from your heavily loaded computer to other idle or less loaded computers in the cluster. (As explained later, because some jobs may finish quickly, it can be counterproductive to migrate too quickly.) If you have a dozen idle machines in the cluster, each compression should run on a different machine. Your machine will have only one compression running on it (along with a little added overhead), so you may still be able to use it. And the dozen compressions will take only a little longer than it would normally take to do a single compression.
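The "run them all at once with &" scenario described above can be sketched as a plain shell script. This is a minimal illustration, not from the book: the file names and sizes are made up, and on an ordinary machine all twelve jobs simply compete for the local CPU, whereas on an openMosix node the kernel would transparently migrate them to less-loaded nodes.

```shell
#!/bin/sh
# Create a scratch directory and a dozen sample files to stand in
# for the real data set (names and sizes are illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for i in $(seq 1 12); do
    head -c 100000 /dev/zero > "file$i.dat"
done

# Launch all twelve compressions at once, each in the background
# (the trailing '&' is exactly what the text above refers to).
for i in $(seq 1 12); do
    gzip "file$i.dat" &
done

# Block until every background job has finished before using the results.
wait

ls *.dat.gz | wc -l   # twelve compressed files
```

Without a cluster this finishes in roughly the same wall-clock time as running the jobs serially on one CPU; the point of openMosix is that the identical script, unchanged, spreads across the cluster.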
If you don't have a dozen computers, or some of your computers are slower than others, or some are otherwise loaded, openMosix will move the jobs around as best it can to balance the load. Once the cluster is set up, this is all done transparently by the system. Normally, you just start your jobs and openMosix does the rest. On the other hand, if you want to control the migration of jobs from one computer to the next, openMosix supplies you with the tools to do just that.

OSCAR:

Setting up a cluster can involve the installation and configuration of a lot of software, as well as reconfiguration of the system and previously installed software. OSCAR (Open Source Cluster Application Resources) is a software package that is designed to simplify cluster installation. A collection of open source cluster software, OSCAR includes everything that you are likely to need for a dedicated, high-performance cluster. OSCAR takes you completely through the installation of your cluster. If you download, install, and run OSCAR, you will have a completely functioning cluster when you are done.

The design goals for OSCAR include using best-of-class software, eliminating the downloading, installation, and configuration of individual components, and moving toward the standardization of clusters. OSCAR, it is said, reduces the need for expertise in setting up a cluster. In practice, it might be more fitting to say that OSCAR delays the need for expertise and allows you to create a fully functional cluster before mastering all the skills you will eventually need. In the long run, you will want to master those packages in OSCAR that you come to rely on. OSCAR makes it very easy to experiment with packages and dramatically lowers the barrier to getting started.

OSCAR was created and is maintained by the Open Cluster Group (http://www.openclustergroup.org), an informal group dedicated to simplifying the installation and use of clusters and broadening their use.
Over the years, a number of organizations and companies have supported the Open Cluster Group, including Dell, IBM, Intel, NCSA, and ORNL, to mention only a few.

Because OSCAR is an extensive collection of software, it is beyond the scope of this book to cover every package in detail. Most of the software in OSCAR is available as standalone versions, and many of the key packages included by OSCAR are described in later chapters in this book. Consequently, this chapter focuses on setting up OSCAR and on software unique to OSCAR. By the time you have finished this chapter, you should be able to judge whether OSCAR is appropriate for your needs and know how to get started.

Rocks:

NPACI Rocks is a collection of open source software for building a high-performance cluster. The primary design goal for Rocks is to make cluster installation as easy as possible. Unquestionably, they have gone a long way toward meeting this goal. To accomplish this, the default installation