Dfr
===

A utility to create and query a French dictionary based on a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).

Technically
-----------

A bunch of **Python** scripts to transform a Wiktionary archive dump into a SQLite database file.

- With a simple CLI.
- With a simple WUI[^WUI], Flask based.
- Regex support.

[^WUI]: Web User Interface

Install
-------

There is a Makefile to do that.

~~~shell
$ git clone https://git.ache.one/dfr/
$ cd dfr
$ sudo make install
~~~

It will download a Wiktionary dump, create the SQLite database, install dfr, and then delete the dump.

You can uninstall everything with the `uninstall` target.

~~~shell
$ sudo make uninstall
~~~

How to create the database
--------------------------

You can use the script `download/download.py` to automate this process and create a SQLite database based on the latest available dump of the French Wiktionary:

~~~shell
$ python download/download.py -d -o datadb.sqlite
~~~

It will download the dump and create the whole database.

To do it by hand instead, follow the steps below.

First, you have to download a Wiktionary archive file, for example `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.

For now, you have to decompress it completely before you can process it.
Using [bz2](https://docs.python.org/3/library/bz2.html) may be considered in the future to make this step optional and thus reduce disk usage (this is already possible with `download.py`).

~~~shell
$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
~~~

Then, create an intermediary msgpack file containing all the Wiktionary data.
This file is of interest to developers, not to end users: it is a serialization of the dictionary (a Python `dict`) used internally.

~~~shell
$ python download/dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
~~~

Then, you can create the SQLite database file.

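For developers, the step from the in-memory dictionary to a SQLite file can be pictured with the following minimal Python sketch. It is an illustration only, not dfr's actual code: the `entry` table, its columns, and the function names `build_database` and `lookup` are assumptions, and a plain `dict` of placeholder strings stands in for the data loaded from the msgpack file.

```python
import sqlite3


def build_database(entries, path=":memory:"):
    """Write a {word: [definition, ...]} mapping into a SQLite table.

    The table name `entry` and its columns are hypothetical;
    they are not dfr's real schema.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        "word TEXT NOT NULL, definition TEXT NOT NULL)"
    )
    # An index on the word column keeps exact-match lookups fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entry_word ON entry (word)")
    # Flatten the mapping into one row per (word, definition) pair.
    rows = [(word, d) for word, defs in entries.items() for d in defs]
    conn.executemany("INSERT INTO entry (word, definition) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return the definitions stored for `word`, in insertion order."""
    cur = conn.execute(
        "SELECT definition FROM entry WHERE word = ? ORDER BY rowid", (word,)
    )
    return [row[0] for row in cur]


# Placeholder data standing in for the deserialized msgpack dictionary.
sample = {
    "julien": ["first sample definition", "second sample definition"],
    "julienne": ["another sample definition"],
}
conn = build_database(sample)
print(lookup(conn, "julien"))
```

The sketch uses an in-memory database by default; passing a file path writes a database file instead. In the repository itself, this step is handled by the script invoked below.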
~~~shell
$ python download/msgPack2sqlite_msgPack.py -i dfr.msgpk
~~~

You can then use `dfr.py` to search for a word from the CLI, or start the WUI with the command:

~~~shell
$ python web.py
~~~

How to use it
-------------

You can use the CLI.

~~~shell
$ dfr -h
usage: dfr [-h] [--sql] [--matching] PATTERN

Get a french word's definition.

positional arguments:
  PATTERN     the word or the pattern to match

optional arguments:
  -h, --help  show this help message and exit
  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
  --matching  search the french words that match the regex
~~~

For example:

~~~shell
$ dfr julien
julien /ʒy.ljɛ̃/, adjectif

(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
  * Calendrier *julien*.
  * Année *julienne*.
  * Correction *julienne*.
~~~

~~~shell
$ dfr -m /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
~~~

Or the WUI:

~~~shell
$ python web.py
~~~

Why only French?
----------------

Because that's the only language I'm able to tackle. I can't verify anything about other languages.

**On the way to support other languages.**
Feel free to contribute.

How to contribute?
------------------

This project is free software; you are free to send me a PR to improve it.
Respecting each other is the only rule.

License: MIT-like.

License
-------

dfr is shared under the terms of the CeCILL-B license and is also distributed under the terms of the MIT license.
Please refer to the license file for detailed information.