WordPress SOC 2009
03/31/2009
Ideas
The current WordPress search functionality sucks. It sucks in that:
1. It doesn’t highlight the keywords in the results, so you have no idea where they are in a long post
2. It doesn’t excerpt
3. It doesn’t search in comments
4. It doesn’t search in tags, categories and authors
I already wrote a C++ program that solves some of these problems. However, a PHP rewrite is required, and the code has to be integrated properly into WordPress’ plugin system. In addition, a better administration back-end will be built to enable or disable parts of the search functionality, because searching is resource-intensive and blog owners may not need all the features.
Another new feature of the advanced search plugin is to detect the keywords when visitors are referred from a search engine, and to automatically recommend links to related posts to them.
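As a rough illustration of the referrer idea, here is a minimal Python sketch (the plugin itself would be PHP). The `SEARCH_PARAMS` table and the function name `keywords_from_referrer` are made up for this example, and the list of search engines is only an assumption about what might be worth supporting.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table: fragment of the referrer host -> query parameter
# that carries the search terms on that engine.
SEARCH_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def keywords_from_referrer(referrer):
    """Return the search keywords found in a referrer URL, or [] if there are none."""
    parts = urlparse(referrer)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            # parse_qs decodes "+" and percent-escapes for us.
            values = parse_qs(parts.query).get(param, [])
            return values[0].split() if values else []
    return []
```

The returned keywords could then be fed into the same search code that serves the blog's own search box, so related posts come for free.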
An advanced search function is also the core of an “automatic tagging system”, where posts are compared and ordered based on content similarity.
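One simple way to compare posts by content is cosine similarity over word counts. The proposal does not say which measure would be used, so the sketch below is only an assumption, written in Python for brevity, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lower-cased alphanumeric words in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_related(target, posts):
    """posts maps title -> text; return the titles ordered by decreasing similarity to target."""
    target_counts = term_counts(target)
    scored = [(cosine_similarity(target_counts, term_counts(text)), title)
              for title, text in posts.items()]
    return [title for _, title in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

A real tagging system would probably also weight rare words more heavily (for example with TF-IDF) and ignore stop words; that is left out here.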
When speed is a concern, as in blogs that have thousands of posts and hundreds of comments under each post, an alternative search “core” written in C++ (which I already have) can be used.
Schedule of Deliverables
I can put an estimated average of 10 hours per week into this project (8 hours on weekends and 2 on weekdays). The deliverables will be:
1. Mid-term deliverables by July 6
A draft version of the enhanced search plugin for WordPress. Its features should include:
- advanced search to allow searching by multiple metadata selections (such as category, tag, author field and content)
- searching keywords in both posts and comments
- properly excerpting context around keywords in results
- highlighting keywords
- suppressing HTML tags in search results
- displaying posts and comments in hierarchy
- paging of search results
- supporting various search ‘logic’ for multiple keywords
- an admin back-end
2. Final deliverables by August 17
I will improve the plugin and add new features based on the mid-term evaluation. The plugin should be released as a beta at the beginning of this period. Feedback will be collected and bugs eliminated. Version 1.0 is due for release by August 17.
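To make the excerpting, highlighting and HTML-suppression items in the mid-term list more concrete, here is a minimal Python sketch of how they could fit together. It is not the plugin's code: the function names, the `radius` parameter and the `<strong>` marker are assumptions chosen for illustration, and the regex-based tag stripping is deliberately crude.

```python
import html
import re


def strip_tags(markup):
    """Crudely drop HTML tags and decode entities; a real plugin would want a proper parser."""
    return html.unescape(re.sub(r"<[^>]+>", "", markup))


def excerpt(markup, keyword, radius=30):
    """Return the text around the first hit, with every hit in it highlighted, or None."""
    text = strip_tags(markup)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return None
    # Keep `radius` characters of context on each side of the first hit.
    start = max(first.start() - radius, 0)
    end = min(first.end() + radius, len(text))
    snippet = text[start:end]
    # Re-escape the plain text piece by piece so only our own markup is emitted.
    pieces, pos = [], 0
    for hit in pattern.finditer(snippet):
        pieces.append(html.escape(snippet[pos:hit.start()]))
        pieces.append("<strong>" + html.escape(hit.group(0)) + "</strong>")
        pos = hit.end()
    pieces.append(html.escape(snippet[pos:]))
    return ("..." if start > 0 else "") + "".join(pieces) + ("..." if end < len(text) else "")
```

The same function would work on comment text, so posts and comments can share one excerpting path.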
[Download] A good book on C# and .NET
07/24/2008

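Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition
by Andrew Troelsen
I have put this e-book on Megaupload (download link). Opening it requires the password c@sharp.com
The first half of the book covers the fundamentals of C# and .NET in four parts:
1. Basic .NET concepts (CLR/CTS/CLS and so on) and the C# build environment. Notably, it puts the emphasis on non-Visual Studio environments, including the command line, which helps a lot in understanding how C# is compiled.
2. Core C# language constructs, including class definitions, inheritance, polymorphism, member function overloading, virtual functions and abstract classes, and exception handling. There is also a whole chapter on garbage collection that is well worth reading.
3. Advanced C# constructs, including interfaces, collections, delegates and indexers, as well as the new features of C# 2008 and an introduction to LINQ.
4. An introduction to .NET assemblies, multithreading, CIL and so on.
The second half covers Microsoft's extension libraries for C# (such as ADO.NET, WCF, Windows Forms, WPF and ASP.NET).
Also worth a special mention: a chapter in the appendix is devoted to Mono, the platform-independent port of .NET.
The book is clearly organized and explains each concept in logical order, backed by examples. It works as an introduction for beginners, and at 1300 pages with a detailed index it also serves as a reference for more advanced readers, so I strongly recommend it!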
Is .NET faster than C/C++?
07/19/2008
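C#, like Java and Python, belongs to what is loosely called the interpreted languages. They differ from compiled languages such as C/C++/FORTRAN in that the latter compile source code to machine code for execution, while the former compile source code to platform-independent bytecode, which a virtual machine then turns into machine code at run time through just-in-time (JIT) compilation. The former have a clear advantage in portability and in ease of debugging and maintenance, and they often add key features such as garbage collection.
Compiled languages are usually faster than interpreted ones, but not always. The gap in execution speed depends entirely on the quality of the compiler and the virtual machine. The .NET platform is a good example.
.NET is Microsoft's application framework. Source code in C#, C++, VB and other languages is first compiled by Microsoft's compilers into a bytecode called CIL (Common Intermediate Language), which the .NET virtual machine then compiles to machine code and runs. C# is the default language of .NET. You might argue that C++ should be, because it gives more control, but that argument is wrong: C# and managed C++ both produce the same CIL bytecode, and C++ as a language plainly lacks the newer features of C#.
Even compared with unmanaged C++ (C++ that does not compile to CIL), C#/.NET has an edge in execution speed. Below I compare the running speed of C# and unmanaged C++ on a concrete example (a numerical solution of the N-body problem). I used the source code from the Computer Language Shootout with a few small modifications; the algorithm is the same in all versions. My modified code can be downloaded below: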
C# source file: nbody.cs
C source file: nbody.c
C++ source file: nbody.cpp
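The build environment:
CPU: Pentium 4 3.0 GHz
OS: Windows XP / SP2
Memory: 1 GB
Compilers:
Visual Studio 2008: .NET 3.5, C# compiler csc.exe (3.5.21022.8), C/C++ compiler cl.exe (15.00.21022.08)
gcc/g++ (3.4.4) from Cygwin
In addition, the bytecode in the .exe produced by the C# compiler can be precompiled to machine code with ngen.exe and placed in the Native Image Cache (C:\windows\assembly), in an attempt to speed up execution by skipping the JIT compilation of the CIL.
Both compilation and execution were done from the command line in the Cygwin shell. The compile commands and run times are listed below.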
C# (csc.exe)
$ csc /o nbody.cs
9.604s, 9.587s, 9.593s, mean 9.595s
C# (csc.exe with ngen.exe)
$ csc /o nbody.cs
$ ngen install nbody.exe
9.000s, 8.984s, 9.000s, mean 8.995s
C (gcc)
$ gcc -O3 nbody.c -o nbody
10.859s, 10.781s, 10.828s, mean 10.823s
C (cl.exe)
$ cl /Ox nbody.c
12.029s, 12.008s, 12.004s, mean 12.014s
C++ (g++)
$ g++ -O3 nbody.cpp -o nbody
10.938s, 11.016s, 11.015s, mean 10.990s
C++ (cl.exe)
$ cl /Ox nbody.cpp
12.584s, 12.572s, 12.528s, mean 12.561s
The command line used to run the program was the same in every case:
$ time ./nbody.exe 20000000
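The three runs per build were timed by hand with the shell's time command. For anyone repeating the measurement, a small script can automate the repetition. The sketch below is in Python and measures wall-clock time, which is an assumption: the post does not say which figure from time was recorded. The function name `time_runs` is made up for this example.

```python
import subprocess
import time


def time_runs(command, repeats=3):
    """Run `command` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        # check=True makes a failing benchmark run raise instead of being timed silently.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - started)
    return timings
```

For example, `time_runs(["./nbody.exe", "20000000"])` would return a list of three timings whose mean can be compared with the figures above.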
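The comparison leads to these conclusions:
1. The algorithm is king. The first factor that determines efficiency is the algorithm, not the language.
2. How fast interpreted and compiled languages run depends on the interpreter and the compiler, and .NET's execution really is highly optimized. The results above show that the .NET virtual machine runs this benchmark faster than the machine code generated by gcc or by Microsoft's own C++ compiler (9.595s vs 10.823s). You may say that the N-body problem is a special case and that C/C++ would beat C# on some other problems, but the two are certainly in the same order of magnitude.
3. Microsoft's C++ compiler is poor. With optimization flags on in both cases, cl.exe generates slower code than GCC for both the C and the C++ source: 11% slower for C and 14% slower for C++.
4. Comparing the runs with and without ngen, the latter is only 6.7% (0.6 seconds) slower than the former, which shows that .NET's run-time compilation of CIL to machine code is very efficient.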
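The percentages quoted in the conclusions can be rechecked from the raw timings. The Python sketch below copies the measured values from the listing above; the dictionary keys and the helper name `slowdown_percent` are just labels chosen for this example.

```python
from statistics import mean

# Run times in seconds, copied from the measurements listed above.
TIMINGS = {
    "csc": [9.604, 9.587, 9.593],
    "csc+ngen": [9.000, 8.984, 9.000],
    "gcc": [10.859, 10.781, 10.828],
    "cl (C)": [12.029, 12.008, 12.004],
    "g++": [10.938, 11.016, 11.015],
    "cl (C++)": [12.584, 12.572, 12.528],
}


def slowdown_percent(slow, fast):
    """How much slower (in percent) the mean of `slow` is than the mean of `fast`."""
    return (mean(TIMINGS[slow]) / mean(TIMINGS[fast]) - 1) * 100


for name, runs in TIMINGS.items():
    print(f"{name}: mean {mean(runs):.3f}s")
print(f"cl vs gcc (C): {slowdown_percent('cl (C)', 'gcc'):.0f}% slower")
print(f"cl vs g++ (C++): {slowdown_percent('cl (C++)', 'g++'):.0f}% slower")
print(f"JIT vs ngen: {slowdown_percent('csc', 'csc+ngen'):.1f}% slower")
```

With three runs per build on a single machine, differences of a few percent should be read with some caution.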
In short, C#/.NET has a bright future. Given that Windows has a 90% market share, the trend will be: C#/.NET beats Java and rules the world for ages to come.
