Schedule of conditions for the technical implementation of the MediaWiki parser in C#
Introduction
This is a small parser program for making a static dump (HTML output) of Wikipedia, which is based on MediaWiki (http://sourceforge.net/projects/wikipedia/).
The program is designed to extract MediaWiki tags (including the templates at http://en.wikipedia.org/wiki/Wikipedia:Template_messages/All) from text and transform them into HTML output. The HTML must comply with the W3C’s HTML specifications.
The parser must be written in C#. The main class should have an easy-to-use method for getting the parsed text.
I would like to use the API like this:
    String OriginalText = "wiki text";
    WikimediaParser parser = new WikimediaParser();
    String textParsed = parser.Parse(OriginalText);
You can take this site as a starting point: http://meta.wikimedia.org/wiki/Alternative_parsers
Job types • .NET C# • Regular expression • HTML/CSS
Summary
Wikitext language, or wiki markup, is a markup language that offers a simplified alternative to HTML and is used to write pages in wiki websites. Wikitext is text in this language. There is no commonly accepted standard wikitext language. The grammar, structure, features, keywords and so on depend on the particular wiki software used on the particular website. For example, all wikitext markup languages have a simple way of hyperlinking to other pages within the site, but there are several different syntax conventions for these links. Some wiki programs allow extensive optional use of HTML tags within wikitext, others allow a smaller subset, and still others allow no HTML at all. Other wiki programs let each site set its own restrictions on HTML. MediaWiki’s wikitext allows you to freely mix wiki format and HTML, but it also provides a simple, readable syntax, so users do not need to know HTML at all.
Project
• Wiki markup: I would like to translate all the wiki markup described on this page: http://en.wikipedia.org/wiki/Wikipedia:How_to_edit_a_page
• Wiki templates: I would like to translate the wiki markup templates listed on this page: http://en.wikipedia.org/wiki/Wikipedia:Template_messages/All I don’t need the "User talk namespace" templates.
• Log messages: I want to use log4net to log every error, and to log accurate debug messages when debug logging is enabled.
• Flexible code: I want flexible code, so that future wiki markup or wiki templates can be added easily. The code must be commented very clearly.
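One way to meet the "flexible code" requirement is to keep each piece of markup in its own rule object, so that new markup is added by registering a rule rather than by editing the parser. The sketch below is only an illustration under assumptions: the names IMarkupRule, RegexRule and RuleBasedParser are hypothetical, only bold and italic are handled, nothing is HTML-escaped, and it uses generics, so it assumes .NET 2.0 or later rather than 1.1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical names throughout: these types are illustrations,
// not part of the specification.
public interface IMarkupRule
{
    string Apply(string text);
}

// A rule that rewrites every match of one regular expression.
public sealed class RegexRule : IMarkupRule
{
    private readonly Regex pattern;
    private readonly string replacement;

    public RegexRule(string expression, string replacement)
    {
        this.pattern = new Regex(expression, RegexOptions.Compiled);
        this.replacement = replacement;
    }

    public string Apply(string text)
    {
        return pattern.Replace(text, replacement);
    }
}

public sealed class RuleBasedParser
{
    private readonly List<IMarkupRule> rules = new List<IMarkupRule>();

    public RuleBasedParser()
    {
        // Order matters: bold (''') must be handled before italic ('').
        rules.Add(new RegexRule("'''(.+?)'''", "<b>$1</b>"));
        rules.Add(new RegexRule("''(.+?)''", "<i>$1</i>"));
    }

    // Future markup or templates are supported by registering another rule.
    public void AddRule(IMarkupRule rule)
    {
        rules.Add(rule);
    }

    // Applies every registered rule in registration order.
    public string Parse(string wikitext)
    {
        string result = wikitext;
        foreach (IMarkupRule rule in rules)
        {
            result = rule.Apply(result);
        }
        return result;
    }
}
```

With this layout, a level-2 heading could be added later with a call such as parser.AddRule(new RegexRule("(?m)^==\\s*(.+?)\\s*==\\s*$", "<h2>$1</h2>")). A full parser would likely need more than regular expressions for nested templates, so this is a starting shape rather than a complete design.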
Platform
The API must run on Windows with the .NET Framework 1.1 or later. The API must be written in C#.
Budget
We pay only at the end of the project. Any payment method is accepted (PayPal, wire transfer, etc.).
Data I/O
• Input: it must be a string, a text file or an XML file.
• Output: it must comply with the HTML specifications.
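The three input kinds could be normalised to plain wikitext strings before parsing. The following is a minimal sketch, not a required design: the WikiInput class and its method names are hypothetical, UTF-8 is assumed for files, and the XML is assumed to store wikitext in elements named "text", as the Wikipedia dump does. It uses iterators, so it assumes .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: the class and method names are illustrations only.
public static class WikiInput
{
    // A text file is read whole; the UTF-8 encoding is an assumption.
    public static string FromTextFile(string path)
    {
        return File.ReadAllText(path, Encoding.UTF8);
    }

    // An XML file may hold many pages, so the texts are streamed one by one.
    public static IEnumerable<string> FromXmlFile(string path)
    {
        using (StreamReader file = new StreamReader(path, Encoding.UTF8))
        {
            foreach (string text in FromXml(file))
            {
                yield return text;
            }
        }
    }

    // Yields the content of every <text> element (assumed element name).
    public static IEnumerable<string> FromXml(TextReader xml)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "text")
                {
                    // Returns the text and moves past the closing </text>.
                    yield return reader.ReadElementContentAsString();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A plain string needs no conversion and can be passed to the parser directly. Streaming the XML keeps memory use flat, which matters if the same code is later pointed at a full database dump.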
Methods
I need two methods; you can implement this interface:
    public interface IWikiParser
    {
        string Parse(string wikitext);
        string Parse(string wikitext, int length);
    }
For the second method, be careful not to cut the output between an opening HTML tag and its closing tag.
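The length-limited method could be built as a truncation step applied to the HTML produced by the first method. The sketch below rests on assumptions the specification does not state: length is taken to mean the maximum number of visible text characters, and HtmlTruncator is a hypothetical helper name. It copies tags whole, counts only text characters, and closes any tags still open at the cut point. It does not protect character entities such as &amp;amp; from being cut, and it would be confused by a ">" inside an attribute value or comment.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: cuts HTML after `length` visible characters
// without leaving a tag half-written or unclosed.
public static class HtmlTruncator
{
    public static string Truncate(string html, int length)
    {
        StringBuilder output = new StringBuilder();
        Stack<string> openTags = new Stack<string>();
        int visible = 0;
        int i = 0;

        while (i < html.Length && visible < length)
        {
            if (html[i] == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break; // malformed tail: stop rather than emit half a tag
                string tag = html.Substring(i, end - i + 1);
                output.Append(tag); // tags are copied whole and do not count
                TrackTag(tag, openTags);
                i = end + 1;
            }
            else
            {
                output.Append(html[i]);
                visible++;
                i++;
            }
        }

        // Close whatever is still open, innermost first.
        while (openTags.Count > 0)
        {
            output.Append("</").Append(openTags.Pop()).Append('>');
        }
        return output.ToString();
    }

    private static void TrackTag(string tag, Stack<string> openTags)
    {
        // Comments, doctypes and self-closing tags never stay open.
        if (tag.StartsWith("<!", StringComparison.Ordinal)) return;
        if (tag.EndsWith("/>", StringComparison.Ordinal)) return;

        bool closing = tag.StartsWith("</", StringComparison.Ordinal);
        int start = closing ? 2 : 1;
        int stop = start;
        while (stop < tag.Length && char.IsLetterOrDigit(tag[stop])) stop++;
        string name = tag.Substring(start, stop - start).ToLowerInvariant();
        if (name.Length == 0) return;

        if (closing)
        {
            if (openTags.Count > 0 && openTags.Peek() == name) openTags.Pop();
        }
        else if (name != "br" && name != "hr" && name != "img")
        {
            // Void elements (br, hr, img) never get a closing tag.
            openTags.Push(name);
        }
    }
}
```

Under these assumptions, Parse(wikitext, length) could simply return HtmlTruncator.Truncate(Parse(wikitext), length). For example, truncating `<p>Hello <b>world</b></p>` to 8 visible characters gives `<p>Hello <b>wo</b></p>`.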
Test
You can test the parser against all Wikipedia articles using this database dump: http://download.wikimedia.org/wikipedia/en/pages_current.xml.bz2
I will also provide smaller files for testing the parser.
Release
The API must be ready for production release by mid-January. In the meantime, I would like to see a working parser every x days to check the quality of the dump.
Additional files submitted:
CDCF_Wikimedia.doc