Draft: MR for evaluation -- Challenge 1 (!7)

lemeb requested to merge hackathon-2024/contributions/lemeb/platforms-doc-parsing:main into main Jan 22, 2024

Signed-off-by: Léopold Mebazaa <lemeb@users.noreply.github.com>

This parsing tool is relatively simple, but quite performant, and makes heavy use of OpenAI's API.

Configuration

You will need an OpenAI API key to run this. The key is stored in app/secret.py, in the following format:

OPENAI_API_KEY="sk-..."

To run this yourself, create an OpenAI API key at this address or reuse an existing one. You can then either:
- copy and paste it into app/secret.py, or
- define it as a CI/CD variable in GitLab (for this repository, it would likely be at this address), where .gitlab-ci.yml will pick it up. Warning: to avoid ambiguity, I named the CI/CD variable OPENAI_KEY, not OPENAI_API_KEY.
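Since the two options supply the same key under different names, a script needs to look in both places. Below is a minimal sketch of such a lookup, assuming the OPENAI_KEY environment variable (the CI/CD variable) should win over app/secret.py when both are set. The function `resolve_openai_key` and that precedence rule are hypothetical, not taken from this MR.

```python
import importlib
import os
from typing import Mapping, Optional


def resolve_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return an OpenAI API key, or raise RuntimeError if none is configured.

    Hypothetical lookup order (an assumption, not the repo's actual logic):
    1. the OPENAI_KEY environment variable (the GitLab CI/CD variable);
    2. OPENAI_API_KEY defined in app/secret.py.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_KEY")
    if key:
        return key
    try:
        # Assumes the script runs from the repository root, so that
        # app/secret.py is importable as the module `app.secret`.
        secret = importlib.import_module("app.secret")
        key = getattr(secret, "OPENAI_API_KEY", None)
    except ImportError:
        key = None
    if not key:
        raise RuntimeError(
            "No OpenAI key found: set OPENAI_KEY or define "
            "OPENAI_API_KEY in app/secret.py"
        )
    return key
```

Passing `env` explicitly keeps the lookup testable without touching the real environment; by default it reads `os.environ`.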

Execution

The CI/CD script parses the TikTok example and validates the resulting JSON with the following commands:

  script:
    - "export SPEC='TikTok_QueryVideos'"
    - "./scripts/generate_file_for_schema_validation ${SPEC}"
    - "./.venv/bin/check-jsonschema --schemafile schema.json examples/${SPEC}/output/${SPEC}_result.json"
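To reproduce these steps outside GitLab, the `script:` block can be mirrored locally. A minimal Python sketch, assuming it runs from the repository root with the same scripts/ and .venv/ layout as CI; `ci_commands` and `run_locally` are hypothetical helpers, not part of the repo.

```python
import os
import subprocess
from typing import List


def ci_commands(spec: str) -> List[List[str]]:
    """Return the two commands from the CI `script:` block for one SPEC."""
    result = f"examples/{spec}/output/{spec}_result.json"
    return [
        ["./scripts/generate_file_for_schema_validation", spec],
        ["./.venv/bin/check-jsonschema", "--schemafile", "schema.json", result],
    ]


def run_locally(spec: str, dry_run: bool = False) -> None:
    """Run the CI steps in order, stopping at the first failure."""
    # Mirror `export SPEC=...` so child processes also see SPEC in their environment.
    env = {**os.environ, "SPEC": spec}
    for cmd in ci_commands(spec):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # much like a failing step fails the CI job.
            subprocess.run(cmd, check=True, env=env)
```

For example, `run_locally("TikTok_QueryVideos", dry_run=True)` only prints the two commands; dropping `dry_run` executes them.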

Since the commands depend only on SPEC, if you have a directory with the same structure as TikTok_QueryVideos, you can substitute its name for SPEC and have the script run on it. The repo makes a few assumptions, though:

- Within the SPEC folder, there is an input subfolder containing the Markdown file to process.
- That Markdown file follows the ${SPEC}.md naming convention and references only one API endpoint.
- Within the SPEC folder, there is an output subfolder where the JSON file ${SPEC}_result.json can be written.
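The layout assumptions can be checked up front, before any OpenAI API call is made. A minimal sketch, where `check_spec_layout` is a hypothetical helper (not in the repo) and `examples_dir` is assumed to be the examples/ folder used by the CI script. The one-endpoint-per-file rule cannot be verified from the directory layout alone, so it is not checked.

```python
from pathlib import Path
from typing import List


def check_spec_layout(examples_dir: Path, spec: str) -> List[str]:
    """Return a list of layout problems for examples/<spec>/; empty means OK."""
    problems = []
    spec_dir = examples_dir / spec
    md_file = spec_dir / "input" / f"{spec}.md"  # input/<SPEC>.md must exist
    out_dir = spec_dir / "output"  # output/ must exist to receive <SPEC>_result.json
    if not md_file.is_file():
        problems.append(f"missing input Markdown file: {md_file}")
    if not out_dir.is_dir():
        problems.append(f"missing output directory: {out_dir}")
    return problems
```

For instance, `check_spec_layout(Path("examples"), "TikTok_QueryVideos")` should return an empty list for a directory laid out like the TikTok example.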

I will try to relax these assumptions later today, but since they hold for both examples, I thought they were fair game. Because each Markdown file in the input folder covers a single API, unlike the Markdown files in the versions repository, which each cover several APIs, I am assuming you split them up in the backend.
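The backend splitting mentioned above could look roughly like this. A sketch only, assuming (hypothetically) that each endpoint in a multi-API Markdown file starts at a second-level `## ` heading; the files in the versions repository may be structured differently, and `split_by_endpoint` is not part of this MR.

```python
import re
from typing import Dict

# Assumption: each endpoint section starts with a line like "## Some Endpoint".
HEADING = re.compile(r"^## +(.+?)[ \t]*$", re.MULTILINE)


def split_by_endpoint(markdown: str) -> Dict[str, str]:
    """Split a multi-endpoint Markdown document into one chunk per "## " heading.

    Text before the first heading (e.g. a shared introduction) is dropped.
    Each chunk keeps its own heading line so it can stand alone as an input file.
    If two headings share a title, the later chunk overwrites the earlier one.
    """
    matches = list(HEADING.finditer(markdown))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections[match.group(1)] = markdown[match.start():end].rstrip() + "\n"
    return sections
```

Each resulting chunk could then be saved as its own input/${SPEC}.md file, matching the one-endpoint-per-file assumption above.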

Edited Jan 22, 2024 by lemeb
