From ac7c2e5d071151f69872f8e97dac414e41976168 Mon Sep 17 00:00:00 2001
From: ache
Date: Tue, 16 Jun 2020 17:37:32 +0200
Subject: Documentation README

---
 README.md | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 README.md

(limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6d9a041
--- /dev/null
+++ b/README.md
@@ -0,0 +1,102 @@
+Dicofr
+======
+
+A utility to create and query a French dictionary based on a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).
+
+
+Technically
+-----------
+
+A set of **Python** scripts that transform a Wiktionary archive dump into an SQLite database file.
+
+ - With a simple CLI.
+ - With a simple web UI (WUI), Flask-based.
+ - Regex support.
+
+
+How to create the database
+--------------------------
+
+First, you have to download a Wiktionary archive file.
+For example, the file `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.
+
+For now, you have to decompress it completely before you can process it.
+The use of [bz2](https://docs.python.org/3/library/bz2.html) may be considered in the future to make this step optional and thus reduce disk usage.
+
+~~~shell
+$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
+~~~
+
+Then, you will create an intermediary msgpack file containing all of the Wiktionary data.
+This file is of interest to developers, not to end users.
+It is a serialization of the Python dictionary used internally.
+
+~~~shell
+$ python dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
+~~~
+
+Then, you can create the SQLite database file.
+~~~shell
+$ python msgPack2sqlite_msgPack.py -i dicofr.msgpk -o dicofr.db
+~~~
+
+You can then use `dicofr.py` to look up a word from the CLI, or start the WUI with the command:
+~~~shell
+$ python web.py
+~~~
+
+How to use it
+-------------
+
+You can use the CLI.
+
+~~~shell
+$ dicofr -h
+usage: dicofr [-h] [--sql] [--matching] PATTERN
+
+Get a french word's definition.
+
+positional arguments:
+  PATTERN     the word or the pattern to match
+
+optional arguments:
+  -h, --help  show this help message and exit
+  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
+  --matching  search the french words that match the regex
+~~~
+
+For example:
+
+~~~shell
+$ dicofr julien
+ julien
+ /ʒy.ljɛ̃/, adjectif
+ (Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
+ * Calendrier *julien*.
+ * Année *julienne*.
+ * Correction *julienne*.
+~~~
+
+~~~shell
+$ dicofr --matching /julien/
+julienois
+juliennette
+juliennoises
+juliennes
+julien
+julienne
+julienoises
+juliennettes
+juliennoise
+julienoise
+juliennois
+juliens
+~~~
+
+How to contribute?
+------------------
+
+This project is free software; you are welcome to send me a PR to improve it.
+Respecting each other is the only rule.
+
+License: MIT-like.
--
cgit v1.2.3