Dicofr
======
A utility to create and query a French dictionary built from a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).
Technically
----------
A set of **Python** scripts that transform a Wiktionary archive dump into an SQLite database file.
- With a simple CLI.
- With a simple Flask-based WUI.
- Regex support.
How to create the database
--------------------------
First, download a Wiktionary archive file,
for example `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.
For now, you have to decompress it completely before it can be processed.
Using the [bz2](https://docs.python.org/3/library/bz2.html) module may be considered in the future to make this step optional and thus reduce disk usage.
~~~shell
$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
~~~
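The streaming approach mentioned above could look roughly like the sketch below. It assumes the standard MediaWiki export layout (each `<page>` holds a `<title>` and a `<revision>` containing `<text>`) and is not part of the current scripts; the `iter_pages` and `local_name` helpers are hypothetical. `bz2.open` decompresses lazily and `xml.etree.ElementTree.iterparse` walks the XML incrementally, so the uncompressed file never has to be written to disk.

~~~python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{http://www.mediawiki.org/xml/export-0.10/}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path: str) -> Iterator[Tuple[Optional[str], str]]:
    """Hypothetical helper: yield (title, wikitext) for each <page> of a .xml.bz2 dump."""
    title: Optional[str] = None
    text = ""
    # bz2.open decompresses on the fly; iterparse consumes the stream chunk by chunk.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, ""  # avoid leaking values into the next page
                elem.clear()  # release the children of the finished <page>
~~~

Usage would be a plain loop such as `for title, text in iter_pages("frwiktionary-20200601-pages-articles.xml.bz2"): ...`, which would make the `bunzip2` step unnecessary.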
Next, create an intermediary msgpack file containing all the Wiktionary data.
This file is of interest to developers, not to end users:
it is a serialization of the internal Python dictionary.
~~~shell
$ python dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
~~~
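As a rough illustration of that intermediary file, the sketch below round-trips a dictionary through msgpack with the third-party `msgpack` package. The layout (each word mapped to a list of entries with `pron`, `pos` and `definitions` keys) and the `save_entries`/`load_entries` names are hypothetical; the real structure produced by `dump2msgp.py` may differ.

~~~python
import msgpack  # third-party: pip install msgpack


def save_entries(entries: dict, path: str) -> None:
    """Serialize the in-memory dictionary to a msgpack file."""
    with open(path, "wb") as f:
        # use_bin_type=True keeps str and bytes distinct in the encoded data
        msgpack.pack(entries, f, use_bin_type=True)


def load_entries(path: str) -> dict:
    """Read a msgpack file back into a Python dictionary."""
    with open(path, "rb") as f:
        # raw=False decodes msgpack strings back into Python str
        return msgpack.unpack(f, raw=False)
~~~

One design consequence worth knowing: msgpack has no tuple type, so tuples stored in the dictionary come back as lists.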
Then, you can create the SQLite database file.
~~~shell
$ python msgPack2sqlite_msgPack.py -i dicofr.msgpk -o dicofr.db
~~~
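The schema written by `msgPack2sqlite_msgPack.py` is not documented here, so the following is only a sketch under assumptions: a hypothetical `definitions` table with one row per definition, filled from the same hypothetical `{word: [entry, ...]}` layout using Python's built-in `sqlite3` module. The `build_db` name is also hypothetical.

~~~python
import sqlite3


def build_db(entries: dict, db_path: str) -> None:
    """Load a {word: [entry, ...]} dictionary into a hypothetical SQLite schema."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS definitions ("
                " word TEXT NOT NULL, pron TEXT, pos TEXT, definition TEXT)"
            )
            # An index on word keeps exact lookups fast on a large dictionary.
            con.execute("CREATE INDEX IF NOT EXISTS idx_word ON definitions(word)")
            # Flatten the nested structure into one row per definition.
            rows = (
                (word, e.get("pron"), e.get("pos"), d)
                for word, word_entries in entries.items()
                for e in word_entries
                for d in e.get("definitions", [])
            )
            con.executemany("INSERT INTO definitions VALUES (?, ?, ?, ?)", rows)
    finally:
        con.close()
~~~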
You can then search for a word from the CLI with `dicofr.py`, or start the WUI with:
~~~shell
$ python web.py
~~~
How to use it
-------------
You can use the CLI.
~~~shell
$ dicofr -h
usage: dicofr [-h] [--sql] [--matching] PATTERN

Get a french word's definition.

positional arguments:
  PATTERN     the word or the pattern to match

optional arguments:
  -h, --help  show this help message and exit
  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
  --matching  search the french words that match the regex
~~~
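The `_` and `%` wildcards of `--sql` are those of the SQL `LIKE` operator rather than regular expressions. A minimal sketch of such a lookup with Python's built-in `sqlite3` module, assuming a hypothetical `definitions` table with a `word` column (not necessarily the schema `dicofr` uses); `search_like` is a hypothetical name:

~~~python
import sqlite3
from typing import List


def search_like(con: sqlite3.Connection, pattern: str) -> List[str]:
    """Return distinct words matching a LIKE pattern ('_' = one letter, '%' = any run)."""
    # The pattern is a bound parameter, never pasted into the SQL string.
    cur = con.execute(
        "SELECT DISTINCT word FROM definitions WHERE word LIKE ? ORDER BY word",
        (pattern,),
    )
    return [row[0] for row in cur]
~~~

Note that SQLite's `LIKE` is case-insensitive for ASCII letters only by default, so accented letters such as `é` and `É` are not treated as equal.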
For example, to look up a word:
~~~shell
$ dicofr julien
julien
/ʒy.ljɛ̃/, adjectif
(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
* Calendrier *julien*.
* Année *julienne*.
* Correction *julienne*.
~~~
~~~shell
$ dicofr -m /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
~~~
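SQLite has no built-in regular expression engine: its `X REGEXP Y` operator calls an application-defined `regexp(Y, X)` function, pattern first. One way to back a regex search like the one above is to register Python's `re` module through `sqlite3`. This is a sketch, not necessarily how `dicofr` implements it, and it does not reproduce the tool's exact pattern syntax (for instance, how it treats the surrounding slashes in `/julien/`); the `definitions` table and the `search_regex` name are hypothetical.

~~~python
import re
import sqlite3
from typing import List, Optional


def regexp(pattern: str, value: Optional[str]) -> bool:
    """SQLite evaluates `value REGEXP pattern` as regexp(pattern, value)."""
    return value is not None and re.search(pattern, value) is not None


def search_regex(con: sqlite3.Connection, pattern: str) -> List[str]:
    """Return distinct words whose spelling matches a Python regular expression."""
    # Register the function so that the REGEXP operator becomes usable.
    con.create_function("REGEXP", 2, regexp)
    cur = con.execute(
        "SELECT DISTINCT word FROM definitions WHERE word REGEXP ? ORDER BY word",
        (pattern,),
    )
    return [row[0] for row in cur]
~~~

Because the filter runs in Python for every row, this scans the whole table; an index on `word` does not help a regex match.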
How to contribute?
------------------
This project is free; feel free to send me a PR to improve it.
Respecting each other is the only rule.
License: MIT-like.