Dfr
===

A utility to create and query a French dictionary based on a [Wiktionary archive dump](https://dumps.wikimedia.org/frwiktionary/20200601/).


Technically
----------

A set of **Python** scripts that transform a Wiktionary archive dump into an SQLite database file.

 - A simple CLI.
 - A simple WUI[^WUI], based on Flask.
 - Regex support.

[^WUI]: Web User Interface

Install
-------

A Makefile takes care of the installation.

~~~shell
$ git clone https://git.ache.one/dfr/
$ cd dfr
$ sudo make install
~~~

It downloads a Wiktionary dump, creates the SQLite database and installs dfr.
The dump is deleted afterwards.

You can uninstall everything with the `uninstall` target:
~~~shell
$ sudo make uninstall
~~~

How to create the database
--------------------------

The script `download/download.py` automates the whole process and creates an SQLite database from the latest available dump of the French Wiktionary:

~~~shell
$ python download/download.py -d -o datadb.sqlite
~~~

It downloads the dump and creates the whole database.

To do the same by hand, follow the steps below.

First, download a Wiktionary archive file, for example `frwiktionary-20200601-pages-articles.xml.bz2`, which is a full dump of the current version of every page.

For now, you have to decompress it completely before it can be processed.
Using [bz2](https://docs.python.org/3/library/bz2.html) may be considered in the future to make this step optional and thus reduce disk usage (this is already possible with `download.py`).

~~~shell
$ bunzip2 frwiktionary-20200601-pages-articles.xml.bz2
~~~
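
As an illustration of how the decompression step could be avoided, here is a minimal sketch that reads the compressed dump as a stream with the standard `bz2` module. It is not the code used by dfr: the function name `iter_page_titles` is hypothetical, and the sketch only assumes that the dump is a MediaWiki XML export made of `<page>` elements containing a `<title>`.

~~~python
import bz2
import xml.etree.ElementTree as ET


def iter_page_titles(path):
    """Yield page titles from a bz2-compressed MediaWiki XML dump."""
    # bz2.open decompresses lazily, so the full XML file is never written to disk.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags carry an XML namespace prefix such as "{http://...}title".
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
            elif tag == "page":
                # Release the content of pages that have already been processed.
                elem.clear()
~~~

Calling `next(iter_page_titles("frwiktionary-20200601-pages-articles.xml.bz2"))` would return the first title without decompressing the rest of the archive.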

Then, create an intermediate msgpack file containing all the Wiktionary data.
This file is of interest to developers rather than end users.
It is a serialization of the Python dictionary used internally.

~~~shell
$ python download/dump2msgp.py -i frwiktionary-20200601-pages-articles.xml
~~~

Then, you can create the SQLite database file.
~~~shell
$ python download/msgPack2sqlite_msgPack.py -i dfr.msgpk
~~~
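
To give an idea of what this conversion involves, here is a minimal sketch that stores a dictionary in an SQLite file with the standard `sqlite3` module. The real layout of the msgpack data and the real database schema are not described in this README, so everything here is an assumption: the `{word: [definition, ...]}` mapping, the `entry` table and the `build_database` function are hypothetical.

~~~python
import sqlite3


def build_database(entries, db_path):
    """Store a {word: [definition, ...]} mapping in an SQLite file."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        # Hypothetical schema: one row per (word, definition) pair.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entry "
            "(word TEXT NOT NULL, definition TEXT NOT NULL)"
        )
        # An index on the word column keeps exact lookups fast.
        conn.execute("CREATE INDEX IF NOT EXISTS entry_word ON entry (word)")
        conn.executemany(
            "INSERT INTO entry (word, definition) VALUES (?, ?)",
            ((word, d) for word, defs in entries.items() for d in defs),
        )
    conn.close()
~~~

For example, `build_database({"julien": ["Qui est lié à Jules César."]}, "test.sqlite")` creates a one-row database.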

You can then use `dfr.py` to look up a word from the CLI, or start the WUI with the command:
~~~shell
$ python web.py
~~~

How to use it
-------------

You can use the CLI.

~~~shell
$ dfr -h
usage: dfr [-h] [--sql] [--matching] PATTERN

Get a french word's definition.

positional arguments:
  PATTERN     the word or the pattern to match

optional arguments:
  -h, --help  show this help message and exit
  --sql       search a definition using SQL regex, _ to match a letter, % to match a group of letters
  --matching  search the french words that match the regex
~~~
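
The wildcards described for `--sql` are those of the SQL `LIKE` operator. The following sketch shows how they behave on a small in-memory database. It is only an illustration, not dfr's actual query: the `entry` table and the `words_like` function are hypothetical.

~~~python
import sqlite3


def words_like(conn, pattern):
    """Return the words matching an SQL LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT DISTINCT word FROM entry WHERE word LIKE ? ORDER BY word",
        (pattern,),
    )
    return [word for (word,) in rows]


# Small in-memory table standing in for the real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (word TEXT, definition TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [(w, "") for w in ("julien", "julienne", "juliens", "chat")],
)

print(words_like(conn, "julien_"))  # _ stands for exactly one character
print(words_like(conn, "julien%"))  # % stands for any run of characters, even empty
~~~

The first call prints `['juliens']` and the second prints `['julien', 'julienne', 'juliens']`.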

For example:

~~~shell
$ dfr julien
   julien
   /ʒy.ljɛ̃/, adjectif
   	(Chronologie) Qui est lié à Jules César et à sa décision d’instaurer l’alternance entre trois années de trois cent soixante-cinq jours et une année bissextile de trois cent soixante-six jours.
   		 * Calendrier *julien*.
   		 * Année *julienne*.
   		 * Correction *julienne*.
~~~

~~~shell
$ dfr --matching /julien/
julienois
juliennette
juliennoises
juliennes
julien
julienne
julienoises
juliennettes
juliennoise
julienoise
juliennois
juliens
~~~
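
SQLite has no built-in regular expression engine, so a regex search like the one above needs help from Python. Here is one possible way to do it, by registering a `REGEXP` function with `sqlite3`. This is a sketch under assumptions, not dfr's implementation: the `entry` table and the `words_matching` function are hypothetical, the pattern is passed without the surrounding slashes, and it is searched anywhere in the word.

~~~python
import re
import sqlite3


def _regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None


def words_matching(conn, pattern):
    """Return the distinct words in which the regular expression is found."""
    conn.create_function("REGEXP", 2, _regexp)
    rows = conn.execute(
        "SELECT DISTINCT word FROM entry WHERE word REGEXP ? ORDER BY word",
        (pattern,),
    )
    return [word for (word,) in rows]


# Small in-memory table standing in for the real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (word TEXT, definition TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [(w, "") for w in ("julien", "juliennette", "chat")],
)

print(words_matching(conn, "julien"))
print(words_matching(conn, r"^julien(ne)?s?$"))
~~~

The first call prints `['julien', 'juliennette']`; the second pattern is anchored, so it prints `['julien']` only.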

Or use the WUI:

~~~shell
$ python web.py
~~~

Why only French?
----------------

Because French is the only language I am able to work with.
I can't verify anything about other languages.

**Support for other languages is on the way.**

Feel free to contribute.

How to contribute?
------------------

This project is free software: you are welcome to send me a PR to improve it.
Respecting each other is the only rule.

License: MIT like.

License
-------

dfr is shared under the terms of the CeCILL-B license.
Please refer to the license file for detailed information.

dfr is also distributed under the terms of the MIT license.
Please refer to the license file for detailed information.