Quickstart
There are two things you can do right after cloning the repository:
- Run a demo pipeline to see how the pipeline in this project is
supposed to work.
- Build the project and play around with some toy data on the
frontend.
Running the demo pipeline
To run the demo pipeline, execute
python3 backend/bwg/demo_pipeline.py
Your terminal should show you the following on successful execution:
===== Luigi Execution Summary =====
Scheduled 4 tasks of which:
* 4 ran successfully:
- 1 DemoTask1(...)
- 1 DemoTask2(...)
- 1 DemoTask3(...)
- 1 SimpleReadingTask(...)
This progress looks :) because there were no failed tasks or missing external dependencies
===== Luigi Execution Summary =====
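The summary above lists four tasks that Luigi ran in dependency order. As a rough illustration of that idea in plain Python (this is not the project's actual task code, and the linear chain of DemoTasks is an assumption):

```python
# Sketch: Luigi schedules tasks so that each one runs only after its
# dependencies have completed. The dependency graph below is a guess
# at how the demo tasks relate; the real requires() may differ.
tasks = {
    "SimpleReadingTask": [],
    "DemoTask1": ["SimpleReadingTask"],
    "DemoTask2": ["DemoTask1"],
    "DemoTask3": ["DemoTask2"],
}

def run_order(tasks):
    """Return task names so that every dependency comes first."""
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for dep in tasks[name]:
            visit(dep)
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order

order = run_order(tasks)
print(order)
```

This is why SimpleReadingTask appears in the summary alongside the DemoTasks: it is the task that feeds the rest of the chain.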
To see the results of your pipeline, go to
backend/data/pipeline_demo. There you can see the output of the
different pipeline tasks. The output of the final task,
demo_corpus_replaced.json, is written in prettified JSON to
enhance readability.
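Because the final output is prettified JSON, it is easy to inspect with standard tools. A minimal sketch of what "prettified" means here; the record fields below are invented for illustration, and the real structure of demo_corpus_replaced.json may differ:

```python
import json

# Hypothetical record, loosely shaped like a pipeline output entry.
# The actual keys in demo_corpus_replaced.json are likely different.
record = {
    "meta": {"id": "demo-1", "type": "article"},
    "data": {"sentence": "Paris est la capitale de la France."},
}

# Prettified JSON: indented, human-readable, non-ASCII kept as-is.
pretty = json.dumps(record, indent=4, ensure_ascii=False)
print(pretty)
```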
If you want to know more about how the demo pipeline works, see the
corresponding page in the project's documentation.
Building the project with toy data
Building is handled by
Docker, so make
sure to have it installed beforehand.
Afterwards, the project setup is fairly simple. Just go to the root
directory of the project and execute the following command:
docker-compose build && docker-compose up
Building the docker images in this project might take a while,
especially the first time you use this project. Afterwards, you can
make requests to the API, which listens on port 6050
by default (see the documentation for
`bwg/run_api.py
<http://bigworldgraph.readthedocs.io/bwg.run_api.html>`__
for more information).
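Once the containers are up, you can check whether the API answers on port 6050. A small sketch using only the standard library; the root path "/" is an assumption, so consult the bwg/run_api.py documentation for the actual endpoints:

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical health check against the API container.
# The root path "/" is an assumption; the real endpoints are
# documented for bwg/run_api.py.
try:
    with urlopen("http://127.0.0.1:6050/", timeout=2) as resp:
        status = f"API reachable, HTTP status {resp.status}"
except (URLError, OSError):
    status = "API not reachable - are the containers running?"

print(status)
```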
Now you can play around on the frontend by visiting
127.0.0.1:8080 in your browser!
Getting serious: Running the real pipeline
First of all, make sure the corpus file is in its designated
directory (backend/data/corpora_french/
by default).
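A quick pre-flight check, sketched in Python, that the corpus directory actually contains a file. The directory path comes from this guide; the expected file name and format are up to your corpus:

```python
from pathlib import Path

# Pre-flight sketch: check that the designated corpus directory
# contains at least one file before starting the pipeline.
corpus_dir = Path("backend/data/corpora_french")
files = (
    [p for p in corpus_dir.glob("*") if p.is_file()]
    if corpus_dir.exists()
    else []
)

if files:
    msg = f"Found {len(files)} file(s) in {corpus_dir}"
else:
    msg = f"No corpus files found in {corpus_dir} - add one before running the pipeline."

print(msg)
```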
Set up the project like in the previous step with
docker-compose build && docker-compose up
After all containers are running, you can run the pipeline by executing
the following:
cd ./pipeline/
docker build . -t pipeline
docker run -v /var/run/docker.sock:/var/run/docker.sock -v `pwd`/stanford/models/:/stanford_models/ --name pipeline pipeline
If you are on a Windows system, replace `pwd` inside the -v flag
with the absolute path to the stanford/models directory.
First, all the necessary Stanford models will be downloaded from
a MAJ server to /pipeline/stanford/models if they are not already
present. This might take a while. Afterwards, the pipeline will be
started. Depending on the size of the corpus file and the tasks in
the pipeline, run time can also vary heavily.
The final output of the pipeline should look something like this:
===== Luigi Execution Summary =====
Scheduled 4 tasks of which:
* 3 present dependencies were encountered:
- 1 FrenchPipelineRunInfoGenerationTask(...)
- 1 FrenchServerPropertiesCompletionTask(...)
- 1 FrenchServerRelationMergingTask(...)
* 1 ran successfully:
- 1 FrenchRelationsDatabaseWritingTask(...)
This progress looks :) because there were no failed tasks or missing external dependencies
===== Luigi Execution Summary =====
Now go to 127.0.0.1:8080 again and marvel at your graph!