Dfr
A utility to create and query a French dictionary built from a Wiktionary archive dump.
Technically
A bunch of Python scripts that transform a Wiktionary archive dump into an SQLite database file.
- With a simple CLI.
- With a simple WUI (web user interface), Flask-based.
- With regex support.
Install
A Makefile handles the installation.
$ git clone https://git.ache.one/dfr/
$ cd dfr
$ sudo make install
This downloads a Wiktionary dump, creates the SQLite database, installs dfr, and then deletes the dump.
You can uninstall everything with the uninstall target.
$ sudo make uninstall
How to create the database
You can use the script download/download.py to automate the creation of an SQLite database from the latest available dump of the French Wiktionary:
$ python download/download.py -d -o datadb.sqlite
It downloads the dump and builds the whole database.
To do the same by hand, follow the steps below.
First, download a Wiktionary archive file, for example frwiktionary-20200601-pages-articles.xml.bz2, which is a full dump of the current version of every page.
For now, you have to decompress it completely before the manual scripts can process it.
Reading the bz2 file directly may be supported in the future to make this step optional and thus reduce disk usage (this is already possible with download.py).
$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
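As an illustration of how the decompression step could be avoided, here is a minimal sketch that streams a compressed dump with Python's standard library. It is not the code used by download.py; the function name iter_titles is hypothetical, and the only assumption about the dump is the usual MediaWiki export layout (page elements containing a title element).

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(path):
    """Yield page titles from a .xml.bz2 dump without unpacking it to disk."""
    # bz2.open decompresses on the fly, so only a small buffer is held in memory.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # MediaWiki dumps use an XML namespace; keep only the local tag name.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
            elif tag == "page":
                # Drop the content of a finished page to limit memory growth.
                elem.clear()
```

The trade-off is CPU time for disk space: the archive is decompressed again on every run, but the uncompressed XML file never has to exist.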
Then create an intermediate msgpack file holding all the Wiktionary data. This file is of interest to developers rather than end users: it is a serialization of the Python dictionary used internally.
$ python download/dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
Then, you can create the SQLite database file.
$ python download/msgPack2sqlite_msgPack.py -i dfr.msgpk
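To give an idea of what this last step does, the sketch below writes a Python dictionary into an SQLite file using only the standard library. The table layout (definition with columns word, position, body) and the function name build_database are assumptions made for the example, not the schema the real script creates, and the msgpack loading step is left out.

```python
import sqlite3


def build_database(entries, db_path):
    """Write a {word: [definition, ...]} mapping into an SQLite file."""
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        # Hypothetical schema: one row per definition of a word.
        con.execute(
            "CREATE TABLE IF NOT EXISTS definition ("
            "word TEXT NOT NULL, position INTEGER NOT NULL, body TEXT NOT NULL)"
        )
        con.execute(
            "CREATE INDEX IF NOT EXISTS idx_definition_word ON definition(word)"
        )
        con.executemany(
            "INSERT INTO definition (word, position, body) VALUES (?, ?, ?)",
            (
                (word, position, body)
                for word, bodies in entries.items()
                for position, body in enumerate(bodies)
            ),
        )
    con.close()
```

Inserting everything inside a single transaction matters for a dictionary of this size, since committing row by row would be far slower.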
You can then use dfr.py to search for a word from the CLI, or start the WUI with the command:
$ python web.py
How to use it
You can use the CLI.
$ dfr -h
usage: dfr [-h] [--sql] [--matching] PATTERN
Get a french word's definition.
positional arguments:
PATTERN the word or the pattern to match
optional arguments:
-h, --help show this help message and exit
--sql search a definition using SQL regex, _ to match a letter, % to match a group of letters
--matching search the french words that match the regex
For example
$ dfr julien
julien
/ʒy.ljɛ̃/, adjectif
(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
* Calendrier *julien*.
* Année *julienne*.
* Correction *julienne*.
$ dfr -m /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
Or the WUI:
$ python web.py
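The two search modes of the CLI rely on different pattern languages: the _ and % wildcards of --sql are those of the SQL LIKE operator, while --matching takes a regular expression. The sketch below shows the difference on an in-memory SQLite database. It is only an illustration: the word table, the helper names and the way the REGEXP operator is wired to Python's re module are assumptions, not a description of how dfr implements its searches.

```python
import re
import sqlite3


def open_demo_db(words):
    """Create an in-memory database holding the given words."""
    con = sqlite3.connect(":memory:")
    # SQLite has no built-in REGEXP; "X REGEXP Y" calls regexp(Y, X).
    con.create_function(
        "regexp", 2,
        lambda pattern, value: int(re.search(pattern, value) is not None),
    )
    con.execute("CREATE TABLE word (name TEXT)")
    con.executemany("INSERT INTO word VALUES (?)", [(w,) for w in words])
    return con


def like_search(con, pattern):
    """SQL LIKE search: _ matches one character, % matches any run."""
    query = "SELECT name FROM word WHERE name LIKE ? ORDER BY name"
    return [row[0] for row in con.execute(query, (pattern,))]


def regex_search(con, pattern):
    """Regular-expression search through the REGEXP operator."""
    query = "SELECT name FROM word WHERE name REGEXP ? ORDER BY name"
    return [row[0] for row in con.execute(query, (pattern,))]


con = open_demo_db(["julien", "julienne", "juliens", "jules"])
print(like_search(con, "julien_"))          # ['juliens']
print(like_search(con, "julien%"))          # ['julien', 'julienne', 'juliens']
print(regex_search(con, r"^julien(ne)?$"))  # ['julien', 'julienne']
```

LIKE patterns always cover the whole word, whereas a regular expression matches anywhere unless it is anchored with ^ and $.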
Why only French?
Because it is the only language I am able to tackle. I cannot verify anything about other languages.
Support for other languages is on the way.
Feel free to contribute.
How to contribute?
This project is free software; you are welcome to send me a PR to improve it. Respecting each other is the only rule.
License: MIT-like.
License
dfr is shared under the terms of the Cecile-B license. Please refer to the license file for detailed information.
dfr is also distributed under the terms of the MIT license. Please refer to the license file for detailed information.