Project structure

GLOBALS

DATABASE: SQLite

SAMPLE_SIZE=size of dataframe * 5
PREPROCESSING (fixed in this version; users cannot tune it)
First, we encode discrete data, then we discretize continuous data.
The purpose of these steps and further details are described in the section “About BAMT algorithms”.
encoder = pp.LabelEncoder()
discretizer = pp.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')
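
To make the two preprocessing steps and SAMPLE_SIZE concrete, here is a minimal pure-Python sketch. It only approximates what the encoder and discretizer above do (integer codes for categories, 5 ordinal quantile bins) and is not scikit-learn's implementation; the helper names and the toy columns are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def label_encode(values):
    """Map each distinct category to an integer code (classes in sorted order)."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[value] for value in values], classes


def quantile_discretize(values, n_bins=5):
    """Assign each value an ordinal bin id in [0, n_bins - 1] using quantile cut points."""
    cuts = quantiles(values, n=n_bins, method="inclusive")  # n_bins - 1 inner edges
    return [bisect_right(cuts, value) for value in values]


# Hypothetical toy columns standing in for a dataframe.
lithology = ["sand", "clay", "sand", "shale"]
depth = list(range(1, 11))

codes, classes = label_encode(lithology)
bins = quantile_discretize(depth)

# SAMPLE_SIZE as defined above: size of the dataframe times 5.
sample_size = len(depth) * 5
```

With the toy data, each of the 5 bins receives two of the ten depth values, which is the point of the quantile strategy.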
EXAMPLE PARAMS
Geological dataset:
has_logit=0,
use_mixture=0,
score_function="K2"
Social dataset:
has_logit=1,
use_mixture=1,
score_function="MI" (different params to avoid isolated vertices)

API

There are 4 main modules and 1 additional module implemented. The additional module contains tests, which we use only at the dev stage. Each module follows this pattern:

  1. Controller. File with query

  2. Service. File with core functions

  3. Models. File with declarations of tables in database

  4. Schema. File with docs.

  5. Other elements for particular route group.

Quick API reference

Resource | Operation | Description
- | DELETE /api/data_manager/wipe_cache | -
- | GET /swagger.json | -
- | GET / | -
- | GET /swaggerui/(path:filename) | -
AuthToken | POST /api/auth/get_token | Get Token
BN | GET /api/experiment/(string:owner)/(string:name)/(string:dataset)/(bn_params) | Train Bayesian network
BNAnalyser | GET /api/bn_manager/get_equal_edges | Get difference between 2 networks
BNDownloader | GET /api/bn_manager/download_BN | Download BN
BNGetNamesManager | GET /api/bn_manager/get_BN_names/(string:owner) | Get list with names of BNs
BNManager | GET /api/bn_manager/get_BN/(string:owner) | Get dict with the user’s BNs and their info
BNRemover | DELETE /api/bn_manager/remove/(string:owner)/(string:name) | Remove BN
CheckFullness | GET /api/data_manager/check_fullness | Check database fullness
DataUploader | POST /api/data_manager/upload | Upload dataset
DatasetObserver | GET /api/data_manager/get_datasets | Get dataset description
DatasetRemover | DELETE /api/data_manager/remove_dataset | Remove dataset
Models | GET /api/experiment/get_models | Get available models
Register | POST /api/auth/signup | Registration
RootNodes | GET /api/data_manager/get_root_nodes | Get root nodes from dataset
Sampler | GET /api/bn_manager/get_display_data/(string:owner)/(string:net_name)/(string:dataset_name)/(string:node) | Get data to display on x- and y-axis and metrics
SignIn | PUT /api/auth/signin | Sign in

AuthMod

This module provides communication between the user and the auth system.

Controller

POST /api/auth/get_token

Authorize user.

Parameters:
  • username – user’s name

  • password – password

Status Codes:
  • codes

    • 200 Success - returns {“token”: token}.

    • 400 Unauthorized - NotFound or incorrect password

PUT /api/auth/signin

Link token to user.

Parameters:
  • username – user’s name

  • password – password

  • token – token

Status Codes:
  • codes

    • 200 Success

    • 400 NotFound

POST /api/auth/signup

User registration.

Parameters:
  • username – user’s name

  • password – password

Status Codes:
  • codes

    • 200 Success - registration successful.

    • 400 Bad request
      • Forbidden name

      • Empty body

      • User not found

      • User already exists
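
The three auth routes above can be combined into a signup, get_token, signin sequence. The sketch below only builds the requests and does not send them. The host, the credentials and the choice of JSON request bodies are assumptions, since the text does not say how parameters are transmitted.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical host and port


def auth_request(method, path, **fields):
    """Build (but do not send) a request whose body is assumed to be JSON."""
    body = json.dumps(fields).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical credentials, used for illustration only.
signup = auth_request("POST", "/api/auth/signup", username="alice", password="s3cret")
get_token = auth_request("POST", "/api/auth/get_token", username="alice", password="s3cret")
# The token would be read from the {"token": ...} body of a successful get_token call.
signin = auth_request(
    "PUT", "/api/auth/signin", username="alice", password="s3cret", token="<token>"
)
```

Each request could then be sent with urllib.request.urlopen, handling the 400 responses listed above.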

Models

Declares the tables related to the auth system.

Service

Functions for working with the auth system are defined here.

Experiment

One of the most important modules in the application. It is responsible for training a Bayesian network and sampling from it.

Controller

GET /api/experiment/get_models

Get available models for nodes.

Parameters:
  • model_type – str, “regressor” or “classifier”

Status Codes:
  • codes

    • 200 - list with models as strings

    • 500 - server error

GET /api/experiment/(string: owner)/(string: name)/(string: dataset)/(bn_params)

Train BN and sample from it, then save it to db.

Parameters:
  • owner – bn’s owner

  • name – name of the Bayesian network

  • dataset – dataset name

  • bn_params – additional parameters

bn_params json:
{
    "scoring_function": "K2",
    "use_mixture": "true" or True,
    "has_logit": "true" or True,
    "classifier": "LogisticRegression",
    "regressor": "LinearRegression",
    "compare_with_default": "true" or True,
    "params": {
                "remove_init_edges": "true" or True or None,
                "init_edges": [["node1", "node2"], ["node2", "node3"], ["node3", "node4"]] or None,
                "init_nodes": ["node3", "node4"] or None
               }
}
Status Codes:
  • codes

    • 200 Success - returns trained network data (see below)

    • 400 Bad request -
      • net name is too long

      • use_mixture, has_logit or scoring_function (or several of them) is not defined

    • 404 NotFound - User is not found

    • 406 - bn’s limit reached

    • 422 - check for uniqueness of name failed

    • 500 - server error
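
As an illustration of how a client might prepare a call to the training route, the sketch below checks for the keys whose absence is reported as 400 and then builds the path. That bn_params travels as URL-quoted JSON in the last path segment is an assumption, and the owner, net and dataset values are hypothetical.

```python
import json
from urllib.parse import quote

# Keys that the 400 status above says must be defined.
REQUIRED_KEYS = ("scoring_function", "use_mixture", "has_logit")


def missing_required(bn_params):
    """List the required keys that are absent from bn_params."""
    return [key for key in REQUIRED_KEYS if key not in bn_params]


def experiment_path(owner, name, dataset, bn_params):
    """Build the route path; bn_params is assumed to be sent as URL-quoted JSON."""
    segments = [owner, name, dataset, json.dumps(bn_params)]
    return "/api/experiment/" + "/".join(quote(s, safe="") for s in segments)


bn_params = {
    "scoring_function": "K2",
    "use_mixture": False,
    "has_logit": False,
    "classifier": "LogisticRegression",
    "regressor": "LinearRegression",
    "compare_with_default": False,
    "params": {
        "remove_init_edges": True,
        "init_edges": [["node1", "node2"]],
        "init_nodes": None,
    },
}

problems = missing_required(bn_params)
path = experiment_path("alice", "geo_net", "hack", bn_params)
```

Checking the required keys on the client side avoids a round trip that would end in 400 Bad request.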

network json:
{
    "network":
            {
             "name": name of net,
             "dataset_name": name of dataset bn trained on,
             "edges": edges,
             "nodes": nodes,
             "use_mixture": bool,
             "has_logit": bool,
             "classifier": str,
             "regressor": str,
             "params": {"init_edges": None or List[List[str]],
                        "init_nodes": None or List[str],
                        "white_list": FROZEN FEATURE,
                        "bl_add": FROZEN FEATURE,
                        "remove_init_edges": bool or none},
             "scoring_function": str,
             "descriptor": str, string with dictionary with pairs
             }
}
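
A client will usually want the structure of the returned network. The following sketch groups parents by node from a response shaped like the network json above. The sample payload is made up, and it assumes that edges are [source, target] pairs and that nodes is a list of node names, which the text does not state explicitly.

```python
import json
from collections import defaultdict

# Hypothetical response body; only the fields used below are filled in.
response_text = json.dumps({
    "network": {
        "name": "geo_net",
        "dataset_name": "hack",
        "edges": [["node1", "node2"], ["node1", "node3"], ["node2", "node3"]],
        "nodes": ["node1", "node2", "node3"],
    }
})


def parents_by_node(network):
    """Map every node to the list of its parents, in edge order."""
    parents = defaultdict(list)
    for source, target in network["edges"]:
        parents[target].append(source)
    return {node: parents.get(node, []) for node in network["nodes"]}


network = json.loads(response_text)["network"]
parents = parents_by_node(network)
```

Nodes that map to an empty list have no incoming edges in the returned structure.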

Models

Declare tables with networks and samples.

Service

Core functions to fit Bayesian networks and save them.

BN manager

This module provides operations on Bayesian networks in the database, such as finding existing BN(s), deleting, putting and training them.

Controller

GET /api/bn_manager/get_equal_edges

Get different edges between 2 nets.

Parameters:
  • names – names of the nets as List[str]

  • owner – owner of nets

Status Codes:
  • codes

    • 400 Bad request

    • 404 Nets weren’t found

Return:

{“equal_edges”: List of strings with nodes}
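
The text does not spell out how the comparison is computed. As a rough illustration only, and not the service's implementation, two edge lists can be compared with set operations. Edges are assumed to be [source, target] pairs and the two nets are hypothetical.

```python
def compare_edges(edges_a, edges_b):
    """Return (shared, only_in_a, only_in_b) for two lists of [source, target] edges."""
    set_a = {tuple(edge) for edge in edges_a}
    set_b = {tuple(edge) for edge in edges_b}
    return sorted(set_a & set_b), sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical edge lists of two nets owned by the same user.
net_a = [["node1", "node2"], ["node2", "node3"]]
net_b = [["node1", "node2"], ["node3", "node4"]]

shared, only_a, only_b = compare_edges(net_a, net_b)
```

Here shared holds the edges present in both nets, while only_a and only_b hold the edges that differ.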

GET /api/bn_manager/download_BN

Download BN.

Parameters:
  • user – username

  • bn_name – name of bn

Status Codes:
  • codes

    • 200 Success - send file

    • 400 Bad request - user or bn_name wasn’t found in request

    • 404 NotFound -
      • user or bn_name wasn’t found

      • network was not found

GET /api/bn_manager/get_display_data/(string: owner)/(string: net_name)/(string: dataset_name)/(string: node)

Get real and sampled data.

Parameters:
  • owner – username

  • net_name – name of network

  • dataset_name – name of dataset

  • node – name of node

Status Codes:
  • codes

    • 200 Success - return json with data to display

    • 400 Bad Request

    • 404 NotFound - Sample wasn’t found.

display json:
{
    'data': List with data for y-axis,
    'xvals': List with data for x-axis,
    'metrics': {metric: val},
    'type': Str, type of node
}

GET /api/bn_manager/get_BN_names/(string: owner)

Get BN names to validate uniqueness.

Parameters:
  • owner – net holder

Status Codes:
  • codes

    • 200 Success - return {“networks”: list with names of nets for owner}

    • 404 NotFound - user not found.

DELETE /api/bn_manager/remove/(string: owner)/(string: name)

Delete bn and its samples.

Parameters:
  • owner – username

  • name – name of the Bayesian network

Status Codes:
  • codes

    • 200 Success

    • 404 NotFound - net was not found

GET /api/bn_manager/get_BN/(string: owner)

Get BN Data.

Parameters:
  • owner – bn’s owner

Status Codes:
  • codes

    • 200 Success - return user’s bns and info about them

    • 404 NotFound

network json:
{
    "networks":
            {"number": {
                         "name": name of net,
                         "dataset_name": name of dataset bn trained on,
                         "edges": edges,
                         "nodes": nodes,
                         "use_mixture": bool,
                         "has_logit": bool,
                         "classifier": str,
                         "regressor": str,
                         "params": {"init_edges": None or List[List[str]],
                                    "init_nodes": None or List[str],
                                    "white_list": FROZEN FEATURE,
                                    "bl_add": FROZEN FEATURE,
                                    "remove_init_edges": bool or none},
                         "scoring_function": str,
                         "descriptor": str, string with dictionary with pairs
                        }
            }
}
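
To show how the networks mapping might be consumed, here is a small sketch that turns it into one summary row per network. The sample payload, including the keys "0" and "1" standing in for "number", is hypothetical.

```python
import json

# Hypothetical response body; only the fields used below are filled in.
sample_response = json.dumps({
    "networks": {
        "0": {"name": "geo_net", "dataset_name": "hack",
              "edges": [["node1", "node2"]]},
        "1": {"name": "social_net", "dataset_name": "vk",
              "edges": [["node1", "node2"], ["node2", "node3"]]},
    }
})


def summarize_networks(payload):
    """Return one (name, dataset_name, edge_count) tuple per stored network."""
    return [
        (net["name"], net["dataset_name"], len(net["edges"]))
        for net in payload["networks"].values()
    ]


summary = summarize_networks(json.loads(sample_response))
```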

Service

Core functions to work with samples. It contains the SampleWorker class, which provides sample analysis and processing.

Data manager

This module provides operations on data, such as uploading and downloading datasets, removing them and preprocessing them.

Controller

DELETE /api/data_manager/remove_dataset

Remove dataset.

Parameters:
  • name – dataset name

  • user – username

Status Codes:
  • codes

    • 200 Success

    • 400 BadRequest - Empty location provided

    • 403 - Attempt to delete our data

    • 404 NotFound - Dataset was not found in database

GET /api/data_manager/get_root_nodes

Return all possible root nodes.

Note that we store our own datasets under the names vk and hack. To get them, you don’t need to pass an owner.

Parameters:
  • name – dataset name

  • owner – OPTIONAL; not accepted if the dataset is ours.

Status Codes:
  • codes

    • 200 Success - return json {“root_nodes”: List[str]}

    • 400 BadRequest - Empty parameters or empty location

    • 404 NotFound
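
The owner rule can be sketched on the client side as follows. That the parameters are passed in the query string, and that an owner is mandatory for user datasets, are assumptions; the dataset and user names are hypothetical.

```python
from urllib.parse import urlencode

OUR_DATASETS = {"vk", "hack"}  # names under which the built-in datasets are stored


def root_nodes_url(name, owner=None):
    """Build the request URL path, omitting owner for the built-in datasets."""
    query = {"name": name}
    if name not in OUR_DATASETS:
        if owner is None:
            raise ValueError("owner is needed for a user dataset")  # assumed rule
        query["owner"] = owner
    return "/api/data_manager/get_root_nodes?" + urlencode(query)


builtin_url = root_nodes_url("hack")
user_url = root_nodes_url("wells", owner="alice")
```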

GET /api/data_manager/check_fullness

Check whether the contents of the upload_folder match the list of locations stored in the database.

Return:

If the database is corrupted, returns the mismatched paths; otherwise returns the message “Database is full.”
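
A check of this kind can be sketched as a set comparison between files on disk and locations recorded in the database. The function below is a hypothetical illustration rather than the service's code, and the temporary folder and file names exist only for the demonstration.

```python
import tempfile
from pathlib import Path


def compare_storage(upload_folder, db_locations):
    """Return (missing_on_disk, not_in_db) as sorted lists of resolved paths."""
    on_disk = {p.resolve() for p in Path(upload_folder).rglob("*") if p.is_file()}
    in_db = {Path(location).resolve() for location in db_locations}
    return sorted(in_db - on_disk), sorted(on_disk - in_db)


# Demonstration: the "database" lists two files, but only one exists on disk.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "a.csv").write_text("depth\n1.5\n")
    locations = [str(Path(folder) / "a.csv"), str(Path(folder) / "b.csv")]
    missing, not_in_db = compare_storage(folder, locations)
```

When both lists are empty, the folder and the database agree, which corresponds to the “Database is full.” case.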

GET /api/data_manager/get_datasets

Get a list with user’s datasets.

Parameters:
  • user – username

Status Codes:
  • codes

    • 200 Success - return dict with {dataset.name:dataset.description}

    • 404 NotFound

    • 422 - user wasn’t found in request body

DELETE /api/data_manager/wipe_cache

Clean cached samples

POST /api/data_manager/upload

Put the dataset’s link into the db.

The dataset itself is stored in the user’s folder; the db stores only links.

Parameters:
  • name – name of dataset

  • owner – user’s name

  • description – Description of dataset

  • content – Raw file

Status Codes:
  • codes

    • 200 Success

    • 400 BadRequest -
      • name of dataset must be unique

      • dataset contains “Unnamed:0” column

      • dataset contains too many rows and/or columns

    • 404 NotFound
      • no file or user not found

      • empty file or conversion error

    • 405 - dataset’s limit reached

    • 422 BadRequest - cannot read the file
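
Some of the documented upload errors can be caught before sending the file. The sketch below checks a CSV text for them. The row and column limits are hypothetical placeholders because the text does not give the real ones, and the column check ignores spaces so that it also matches the "Unnamed: 0" spelling that pandas writes.

```python
import csv
import io

MAX_ROWS = 10_000  # hypothetical limit, not taken from the API
MAX_COLS = 50      # hypothetical limit, not taken from the API


def precheck_csv(text):
    """Return a list of problems that the upload route documents as errors."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty file"]
    header, body = rows[0], rows[1:]
    problems = []
    if any(column.replace(" ", "") == "Unnamed:0" for column in header):
        problems.append('contains an "Unnamed:0" column')
    if len(body) > MAX_ROWS or len(header) > MAX_COLS:
        problems.append("too many rows and/or columns")
    return problems


clean = precheck_csv("depth,porosity\n1.5,0.2\n")
indexed = precheck_csv("Unnamed: 0,depth\n0,1.5\n")
```

An "Unnamed: 0" column typically appears when a dataframe is saved together with its index, so writing the file without the index avoids that error.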

Models

Declare tables with datasets.

Service

Core functions to upload datasets and save them.