
Dfr

A utility to create and query a French dictionary based on a Wiktionary archive dump.

Technically

A bunch of Python scripts that transform a Wiktionary archive dump into an SQLite database file.

  • With a simple CLI.
  • With a simple WUI (*1), Flask-based.
  • Regex support.

*1: Web User Interface

Install

There is a Makefile to do that.

$ git clone https://git.ache.one/dfr/
$ cd dfr
$ make install

It will download a Wiktionary dump, create the SQLite database, install dfr, and then delete the dump.

You can uninstall everything with the uninstall target.

$ make uninstall

How to create the database

You can use the script download/download.py to automate the creation of an SQLite database based on the latest available dump of the French Wiktionary:

$ python download/download.py -d -o datadb.sqlite

It will download the dump and create the whole database.

By hand, you can do the following.

First, you have to download a Wiktionary archive file, for example frwiktionary-20200601-pages-articles.xml.bz2, which is a full dump of the current version of every page.

For now, you have to decompress it completely before you can process it by hand. Reading the bz2 file directly may be considered in the future to make this step optional and thus reduce disk usage (this is already possible with download.py).
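
As a rough illustration of how the decompression step could be skipped, the sketch below streams a bz2-compressed XML file with Python's standard bz2 and xml.etree.ElementTree modules. It is not taken from download.py: the function names iter_titles and demo and the tiny sample document are assumptions made for the example.

```python
import bz2
import os
import tempfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace that real MediaWiki dumps put in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(path):
    """Yield page titles from a bz2-compressed XML dump without unpacking it to disk."""
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            for child in elem:
                if local_name(child.tag) == "title" and child.text is not None:
                    yield child.text
            # Drop the page's content once handled to keep memory usage flat.
            elem.clear()


def demo():
    """Write a two-page sample archive (hypothetical data) and read it back."""
    sample = (
        b"<mediawiki>"
        b"<page><title>julien</title></page>"
        b"<page><title>julienne</title></page>"
        b"</mediawiki>"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml.bz2")
        with bz2.open(path, "wb") as out:
            out.write(sample)
        return list(iter_titles(path))


if __name__ == "__main__":
    print(demo())
```

Reading the archive this way trades some CPU time for disk space, since the uncompressed XML never has to be stored.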

$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2

Then create an intermediate msgpack file containing all the Wiktionary data. This file is of interest to developers, not end users: it is a serialization of the Python dictionary used internally.

$ python download/dump2msgp.py -i frwiktionary-20200601-pages-articles.xml

Then, you can create the SQLite database file.

$ python download/msgPack2sqlite_msgPack.py -i dfr.msgpk
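
The actual schema is defined by the scripts in download/. As a hedged sketch of what this conversion step amounts to, the following loads a Python dictionary into an SQLite table with the standard sqlite3 module. The table name entry, its columns, and the functions build_db and lookup are hypothetical.

```python
import sqlite3


def build_db(words, path=":memory:"):
    """Store a {word: [definitions]} mapping in an SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry (word TEXT NOT NULL, definition TEXT NOT NULL)"
    )
    rows = [
        (word, definition)
        for word, definitions in words.items()
        for definition in definitions
    ]
    conn.executemany("INSERT INTO entry (word, definition) VALUES (?, ?)", rows)
    # An index on the word column keeps exact lookups fast on a full dictionary.
    conn.execute("CREATE INDEX IF NOT EXISTS entry_word ON entry (word)")
    conn.commit()
    return conn


def lookup(conn, word):
    """Return every stored definition of a word, in insertion order."""
    cur = conn.execute(
        "SELECT definition FROM entry WHERE word = ? ORDER BY rowid", (word,)
    )
    return [row[0] for row in cur]
```

Passing a file name instead of the default ":memory:" would write the database to disk.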

You can then use dfr.py to search for a word from the CLI, or start the WUI with the command:

$ python web.py

How to use it

You can use the CLI.

$ dfr -h
usage: dfr [-h] [--sql] [--matching] PATTERN

Get a french word's definition.

positional arguments:
  PATTERN     the word or the pattern to match

optional arguments:
  -h, --help  show this help message and exit
  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
  --matching  search the french words that match the regex
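
The wildcards accepted by --sql are those of the SQL LIKE operator. A minimal sketch of such a lookup with Python's sqlite3 module follows, assuming a hypothetical table entry with a word column (not necessarily the schema dfr uses) and hypothetical helper names.

```python
import sqlite3


def search_like(conn, pattern):
    """Return the words matching an SQL LIKE pattern (_ = one character, % = any run)."""
    cur = conn.execute(
        "SELECT DISTINCT word FROM entry WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in cur]


def sample_db():
    """Build a tiny in-memory database for the demonstration (made-up content)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (word TEXT)")
    conn.executemany(
        "INSERT INTO entry VALUES (?)",
        [("julien",), ("julienne",), ("juliens",), ("jules",)],
    )
    return conn


if __name__ == "__main__":
    db = sample_db()
    print(search_like(db, "julien_"))  # exactly one character after "julien"
    print(search_like(db, "julien%"))  # any suffix, including none
```

Note that in SQLite, LIKE is case-insensitive for ASCII letters by default.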

For example:

$ dfr julien
   julien
   /ʒy.ljɛ̃/, adjectif
   	(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
   		 * Calendrier *julien*.
   		 * Année *julienne*.
   		 * Correction *julienne*.
$ dfr -m /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
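
SQLite does not provide a REGEXP implementation by default, so one way to support this kind of search is to register a Python function for it. The sketch below does that with sqlite3's create_function. The table entry and the helper names are assumptions, and dfr itself may proceed differently.

```python
import re
import sqlite3


def _regexp(pattern, value):
    """Back the SQL expression `value REGEXP pattern`; SQLite passes the pattern first."""
    return value is not None and re.search(pattern, value) is not None


def search_regex(conn, pattern):
    """Return the words in which the regular expression matches anywhere."""
    conn.create_function("REGEXP", 2, _regexp)
    cur = conn.execute(
        "SELECT DISTINCT word FROM entry WHERE word REGEXP ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in cur]


def sample_words():
    """Build a tiny in-memory database for the demonstration (made-up content)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (word TEXT)")
    conn.executemany(
        "INSERT INTO entry VALUES (?)",
        [("julien",), ("julienne",), ("juliennois",), ("jules",)],
    )
    return conn
```

Because the function is called once per row, a regex search scans the whole table, so it is slower than an indexed exact lookup.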

Or the WUI:

$ python web.py

Why only French?

Because that's the only language I'm able to tackle. I can't verify anything about other languages.

Support for other languages is on the way.

Feel free to contribute.

How to contribute?

This project is free, and you are free to send me a PR to improve this software. Respecting each other is the only rule.

License: MIT-like.