Dicofr
======

A utility to create and query a French dictionary based on a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).


Technically
----------

A set of **Python** scripts that transform a Wiktionary archive dump into a SQLite database file.

 - A simple CLI.
 - A simple web UI (WUI), based on Flask.
 - Regex support.

How to create the database
--------------------------

First, download a Wiktionary archive file.
For example, the file `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.

For now, you have to decompress it completely before it can be processed.
Using [bz2](https://docs.python.org/3/library/bz2.html) may be considered in the future to make this step optional and thus reduce disk usage.

~~~shell
$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
~~~
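
As a rough illustration of the streaming idea mentioned above, the sketch below reads pages straight from the compressed dump with `bz2.open` and `xml.etree.ElementTree.iterparse`, both from the standard library. It is not part of the project's scripts: `iter_pages` and `local_name` are hypothetical helper names, and the code assumes the usual MediaWiki export layout, where each `<page>` element contains a `<title>` and a `<text>`.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path):
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki dump.

    Hypothetical helper: the file is decompressed on the fly, so the
    full XML never has to be written to disk.
    """
    with bz2.open(path, "rb") as stream:
        title, text = None, None
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, None
                elem.clear()  # drop the finished page's children to save memory
```

A caller would loop over `iter_pages("frwiktionary-20200601-pages-articles.xml.bz2")` and handle one page at a time, trading some CPU time for disk space.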

Then create an intermediate msgpack file containing all the Wiktionary data.
This file is of interest to developers, not to end users.
It is a serialization of the Python dictionary used internally.

~~~shell
$ python dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
~~~

Then, you can create the SQLite database file.
~~~shell
$ python msgPack2sqlite_msgPack.py -i dicofr.msgpk -o dicofr.db
~~~
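
To give an idea of what this conversion step involves, here is a minimal sketch that writes a Python dictionary of words into a SQLite file with the standard `sqlite3` module. The table layout (`definitions` with `word` and `definition` columns), the shape of the input dictionary, and the `build_database` name are all assumptions made for illustration; the schema actually produced by the project's script may differ.

```python
import sqlite3


def build_database(entries, db_path):
    """Store {word: [definition, ...]} in a SQLite file.

    Hypothetical schema: one row per (word, definition) pair.
    Returns the open connection so the caller can query it.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS definitions ("
            "word TEXT NOT NULL, definition TEXT NOT NULL)"
        )
        # An index on the word column keeps exact lookups fast.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_word ON definitions (word)"
        )
        conn.executemany(
            "INSERT INTO definitions (word, definition) VALUES (?, ?)",
            ((word, d) for word, defs in entries.items() for d in defs),
        )
    return conn
```

For instance, `build_database({"chat": ["Félin."]}, "example.db")` would create `example.db` containing a single row.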

You can then use `dicofr.py` to look up a word from the CLI, or start the WUI with the command:
~~~shell
$ python web.py
~~~

How to use it
-------------

You can use the CLI.

~~~shell
$ dicofr -h
usage: dicofr [-h] [--sql] [--matching] PATTERN

Get a french word's definition.

positional arguments:
  PATTERN     the word or the pattern to match

optional arguments:
  -h, --help  show this help message and exit
  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
  --matching  search the french words that match the regex
~~~
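
The `_` and `%` wildcards described for `--sql` are the ones used by the SQL `LIKE` operator. The sketch below shows how they behave on a small in-memory SQLite table. The `words` table and the `like_search` helper are hypothetical and only serve the illustration; they are not taken from the project's code.

```python
import sqlite3

# Hypothetical in-memory table holding a few sample words.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("julien",), ("julienne",), ("juliens",), ("juillet",)],
)


def like_search(conn, pattern):
    """Return the words matching a SQL LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [word for (word,) in rows]


# "_" stands for exactly one character.
print(like_search(conn, "julien_"))  # ['juliens']
# "%" stands for any run of characters, including none.
print(like_search(conn, "julien%"))  # ['julien', 'julienne', 'juliens']
```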

For example, to look up the word `julien`:

~~~shell
$ dicofr julien
   julien
   /ʒy.ljɛ̃/, adjectif
   	(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
   		 * Calendrier *julien*.
   		 * Année *julienne*.
   		 * Correction *julienne*.
~~~

To list the words that match a regex:

~~~shell
$ dicofr -m /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
~~~
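
One possible way to get this kind of regex matching out of a SQLite database is to register a Python function behind SQLite's `REGEXP` operator, which has no built-in implementation. The sketch below assumes a hypothetical `words` table and hypothetical helper names (`regexp`, `matching_words`), and takes the bare pattern without the surrounding slashes; it is not necessarily how `dicofr` does it.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's `value REGEXP pattern`: true if the pattern occurs anywhere."""
    return value is not None and re.search(pattern, value) is not None


def matching_words(conn, pattern):
    """Return the words of the (hypothetical) `words` table matching a regex."""
    # SQLite rewrites `X REGEXP Y` as a call to regexp(Y, X).
    conn.create_function("REGEXP", 2, regexp)
    rows = conn.execute(
        "SELECT word FROM words WHERE word REGEXP ? ORDER BY word", (pattern,)
    )
    return [word for (word,) in rows]


# Small in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("julien",), ("julienne",), ("juliennois",), ("juillet",)],
)

print(matching_words(conn, "julien"))    # ['julien', 'julienne', 'juliennois']
print(matching_words(conn, "^julien$"))  # ['julien']
```

Because `re.search` is not anchored, a plain pattern such as `julien` matches every word that contains it, which is consistent with the output shown above.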

How to contribute
-----------------

This project is free software; feel free to send me a PR to improve it.
Respecting each other is the only rule.

License: MIT-like.