author: ache <ache@ache.one> 2020-06-16 17:37:32 +0200
committer: ache <ache@ache.one> 2020-06-16 17:37:32 +0200
commit: ac7c2e5d071151f69872f8e97dac414e41976168
tree: e2f9c51b5f89e61ce4e08125160b46cd5994f9f2 /README.md
parent: Regex matching
Documentation README
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 102
1 file changed, 102 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6d9a041
--- /dev/null
+++ b/README.md
@@ -0,0 +1,102 @@
+Dicofr
+======
+
+A utility to create and query a French dictionary based on a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).
+
+
+Technically
+----------
+
+A set of **Python** scripts that transform a Wiktionary archive dump into an SQLite database file.
+
+ - With a simple CLI.
+ - With a simple web UI (WUI), based on Flask.
+ - With regex support.
+
+How to create the database
+--------------------------
+
+First, download a Wiktionary archive file,
+for example `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.
+
+For now, you have to decompress it completely before you can process it.
+Using the [bz2](https://docs.python.org/3/library/bz2.html) module may be considered in the future to make this step optional and thus reduce disk usage.
+
+~~~shell
+$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
+~~~
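
The streaming approach mentioned above could look roughly like the following. This is only a sketch of what reading the compressed dump with Python's standard `bz2` module might involve, not code from this project: the function names `count_pages` and `demo` are hypothetical, and counting `<page>` tags is just a stand-in for real processing.

```python
import bz2
import os
import tempfile


def count_pages(path):
    """Count lines containing <page> while decompressing on the fly.

    Hypothetical helper: the archive is never fully decompressed on disk.
    """
    pages = 0
    # bz2.open in text mode yields decoded lines one at a time.
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            if "<page>" in line:
                pages += 1
    return pages


def demo():
    """Build a tiny compressed sample (made-up data) and count its pages."""
    sample = (
        "<mediawiki>\n"
        "<page>\n<title>julien</title>\n</page>\n"
        "<page>\n<title>julienne</title>\n</page>\n"
        "</mediawiki>\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml.bz2")
        with bz2.open(path, mode="wt", encoding="utf-8") as out:
            out.write(sample)
        return count_pages(path)
```

Reading line by line keeps memory usage flat regardless of the dump size, at the cost of decompressing the archive again on every run.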
+
+Then, create an intermediate msgpack file containing all the data extracted from Wiktionary.
+This file is of interest to developers, not to end users.
+It is a serialization of the dictionary used internally (a Python dictionary).
+
+~~~shell
+$ python dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
+~~~
+
+Then, you can create the SQLite database file.
+~~~shell
+$ python msgPack2sqlite_msgPack.py -i dicofr.msgpk -o dicofr.db
+~~~
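
To give an idea of what this conversion step involves, here is a minimal sketch that loads a Python dictionary into SQLite with the standard `sqlite3` module. The table name `entry`, its columns, and the shape of the input dictionary are assumptions made for illustration; the actual schema produced by `msgPack2sqlite_msgPack.py` may well differ.

```python
import sqlite3


def build_database(entries, db_path=":memory:"):
    """Store {word: [(category, definition), ...]} in an SQLite database.

    Hypothetical schema, for illustration only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        " word TEXT NOT NULL,"
        " category TEXT,"
        " definition TEXT NOT NULL)"
    )
    # Flatten the nested dictionary into one row per definition.
    rows = [
        (word, category, definition)
        for word, senses in entries.items()
        for category, definition in senses
    ]
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", rows)
    # An index on the word column keeps exact lookups fast.
    conn.execute("CREATE INDEX IF NOT EXISTS entry_word ON entry (word)")
    conn.commit()
    return conn
```

Calling `build_database({"julien": [("adjectif", "Calendrier julien.")]})` returns a connection to an in-memory database; passing a file name as `db_path` would write it to disk instead.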
+
+You can then use `dicofr.py` to look up a word from the CLI, or start the WUI with the command:
+~~~shell
+$ python web.py
+~~~
+
+How to use it
+-------------
+
+You can use the CLI.
+
+~~~shell
+$ dicofr -h
+usage: dicofr [-h] [--sql] [--matching] PATTERN
+
+Get a french word's definition.
+
+positional arguments:
+ PATTERN the word or the pattern to match
+
+optional arguments:
+ -h, --help show this help message and exit
+ --sql search a definition using SQL regex, _ to match a letter, % to match a group of letters
+ --matching search the french words that match the regex
+~~~
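
The `_` and `%` wildcards described for `--sql` are those of the SQL `LIKE` operator: `_` matches exactly one character and `%` matches any run of characters, including none. The sketch below illustrates that behavior on an in-memory SQLite table. It assumes that `--sql` maps to a `LIKE` query, which the help text suggests but does not confirm; `like_search` and the table `word` are hypothetical names.

```python
import sqlite3


def like_search(words, pattern):
    """Return the words matching an SQL LIKE pattern, sorted by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word (name TEXT)")
    conn.executemany("INSERT INTO word VALUES (?)", [(w,) for w in words])
    rows = conn.execute(
        "SELECT name FROM word WHERE name LIKE ? ORDER BY name",
        (pattern,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


WORDS = ["julien", "juliens", "julienne", "jules"]

# "_" stands for exactly one character.
print(like_search(WORDS, "julien_"))  # ['juliens']
# "%" stands for zero or more characters.
print(like_search(WORDS, "julien%"))  # ['julien', 'julienne', 'juliens']
```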
+
+For example:
+
+~~~shell
+$ dicofr julien
+ julien
+ /ʒy.ljɛ̃/, adjectif
+ (Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
+ * Calendrier *julien*.
+ * Année *julienne*.
+ * Correction *julienne*.
+~~~
+
+~~~shell
+$ dicofr -m /julien/
+julienois
+juliennette
+juliennoises
+juliennes
+julien
+julienne
+julienoises
+juliennettes
+juliennoise
+julienoise
+juliennois
+juliens
+~~~
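
The matching shown above behaves like an unanchored regex search over the word list. Here is a rough sketch of that idea using Python's `re` module. It is not the project's implementation: `matching_words` is a hypothetical helper, and treating the surrounding slashes as optional delimiters is an assumption based on the example.

```python
import re


def matching_words(words, pattern):
    """Return the words in which the regex matches anywhere."""
    # Assumption: /.../ delimiters, as in the example, are optional.
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        pattern = pattern[1:-1]
    regex = re.compile(pattern)
    # re.search is unanchored, so "julien" also matches "juliennes".
    return [word for word in words if regex.search(word)]
```

With this sketch, `matching_words(["julien", "juliennes", "jules"], "/julien/")` keeps `julien` and `juliennes`, while the anchored pattern `^julien$` keeps only `julien`.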
+
+How to contribute?
+------------------
+
+This project is free software: feel free to send me a PR to improve it.
+The only rule is to respect each other.
+
+License: MIT-like.