How to serve Pororo Models
Recently, Kakao Brain Corp. released a very interesting open source library called Pororo, named after a famous Korean animation character. The library provides a variety of natural language models and APIs that make it easy to apply them to your services, especially Korean-language applications.
One complication is that this level of abstraction is a double-edged sword: it allows for easy usage, but it hinders manual tweaking. My purpose here is to serve their models, which means I have to track down their actual model checkpoint files, prediction logic, and so on. Let's get to it.
Specifications and Dependencies
The library is mainly built on two packages, fairseq and transformers, both of which are based on the PyTorch framework. Therefore, torchserve will be used to serve these models.
It also requires openjdk, which is needed by mecab, a popular morphological analysis tool originally built for Japanese but also ported to Korean.
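As a quick sanity check that the analyzer is wired up, you can instantiate the tagger directly. The snippet below assumes the python-mecab-ko binding, which is what the Pororo code shown later uses, and should print a list of morphemes:

import mecab  # python-mecab-ko

# Instantiate the same tagger object that Pororo's MRC postprocessing uses,
# and run it on an arbitrary Korean sentence to confirm the installation.
tagger = mecab.MeCab()
print(tagger.morphs("카카오브레인이 포로로 라이브러리를 공개하였습니다"))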
Implementation
The code is based on a tutorial given in the following blog. A few adjustments have been made to the model definition part in handler.py, specific to the Pororo package.
According to the official Pororo documentation, the simplest way to run an MRC (machine reading comprehension) task is:
>>> from pororo import Pororo
>>> mrc = Pororo(task="mrc", lang="ko")
>>> mrc(
...     "카카오브레인이 공개한 것은?",
...     "카카오 인공지능(AI) 연구개발 자회사 카카오브레인이 AI 솔루션을 첫 상품화했다. 카카오는 카카오브레인 '포즈(pose·자세분석) API'를 유료 공개한다고 24일 밝혔다. 카카오브레인이 AI 기술을 유료 API를 공개하는 것은 처음이다. 공개하자마자 외부 문의가 쇄도한다. 포즈는 AI 비전(VISION, 영상·화면분석) 분야 중 하나다. 카카오브레인 포즈 API는 이미지나 영상을 분석해 사람 자세를 추출하는 기능을 제공한다."
... )
('포즈(pose·자세분석) API', (33, 44))
>>> # when mecab doesn't work well for postprocessing, you can set the `postprocess` option to `False`
>>> mrc("카카오브레인이 공개한 라이브러리 이름은?", "카카오브레인은 자연어 처리와 음성 관련 태스크를 쉽게 수행할 수 있도록 도와 주는 라이브러리 pororo를 공개하였습니다.", postprocess=False)
('pororo', (30, 34))
This is similar to the pipeline abstraction classes provided by huggingface’s transformers library. However, our goal is to serve the actual model, and one of the requirements is to load the model from a local checkpoint file. Therefore, we need to get to the class that can actually be instantiated from a model weight file.
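For comparison, this is roughly what the same level of abstraction looks like in transformers. The snippet is only an illustration of the analogy; it uses the default English question-answering pipeline, not a Pororo model:

from transformers import pipeline

# The pipeline object hides model download, tokenization, inference, and
# postprocessing behind a single callable, much like Pororo's task factory.
qa = pipeline("question-answering")
result = qa(question="Who released Pororo?",
            context="Kakao Brain released the Pororo library.")
print(result)  # a dict with 'answer', 'score', 'start', and 'end' keys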
First, calling Pororo(task="mrc", lang="ko") creates a PororoMrcFactory object. It then loads the corresponding model and returns it through:
def load(self, device: str):
    """
    Load user-selected task-specific model

    Args:
        device (str): device information

    Returns:
        object: User-selected task-specific model

    """
    if "brainbert" in self.config.n_model:
        try:
            import mecab
        except ModuleNotFoundError as error:
            raise error.__class__(
                "Please install python-mecab-ko with: `pip install python-mecab-ko`"
            )

        from pororo.models.brainbert import BrainRobertaModel
        from pororo.utils import postprocess_span

        model = (BrainRobertaModel.load_model(
            f"bert/{self.config.n_model}",
            self.config.lang,
        ).eval().to(device))
        tagger = mecab.MeCab()

        return PororoBertMrc(model, tagger, postprocess_span, self.config)
Okay… So we can observe two things here.
- BrainRobertaModel should contain the actual implementation of the model.
- PororoBertMrc would contain implementations that are specific to the MRC task. Therefore, this class should be used for inference during serving!
Now, let’s go deeper into the BrainRobertaModel class.
class BrainRobertaModel(RobertaModel):
    """
    Helper class to load pre-trained models easily. And when you call load_hub_model,
    you can use brainbert models as same as RobertaHubInterface of fairseq.

    Methods
    -------
    load_model(log_name: str): Load RobertaModel

    """

    @classmethod
    def load_model(cls, model_name: str, lang: str, **kwargs):
        """
        Load pre-trained model as RobertaHubInterface.

        :param model_name: model name from available_models
        :return: pre-trained model
        """
        from fairseq import hub_utils

        ckpt_dir = download_or_load(model_name, lang)
        tok_path = download_or_load(f"tokenizers/bpe32k.{lang}.zip", lang)

        x = hub_utils.from_pretrained(
            ckpt_dir,
            "model.pt",
            ckpt_dir,
            load_checkpoint_heads=True,
            **kwargs,
        )
        return BrainRobertaHubInterface(
            x["args"],
            x["task"],
            x["models"][0],
            tok_path,
        )
Again, we observe that BrainRobertaModel.load_model returns another object, a BrainRobertaHubInterface. This class inherits from fairseq.models.roberta.RobertaHubInterface:
class RobertaHubInterface(nn.Module):
    """A simple PyTorch Hub interface to RoBERTa.

    Usage: https://github.com/pytorch/fairseq/tree/master/examples/roberta
    """

    def __init__(self, cfg, task, model):
        super().__init__()
        self.cfg = cfg
        self.task = task
        self.model = model

        self.bpe = encoders.build_bpe(cfg.bpe)

        # this is useful for determining the device
        self.register_buffer("_float_tensor", torch.tensor([0], dtype=torch.float))
Finally, a class that is a plain PyTorch nn.Module! Also, this takes the model parameter, which is, in fact, the pretrained base language model.
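The comment about _float_tensor refers to a common PyTorch pattern: a registered buffer moves together with the module when .to(device) is called, so it can be used to look up which device the module currently lives on. A toy illustration:

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # buffers are moved by .to()/.cuda() along with the parameters
        self.register_buffer("_float_tensor", torch.tensor([0], dtype=torch.float))

    @property
    def device(self):
        return self._float_tensor.device

m = Toy()
print(m.device)  # cpu
# after m.to("cuda"), m.device would report cuda:0 (given a GPU is available)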
Phew, it was a long way through the different layers of abstraction, but we have reached the final level. So what do we need now?
- Get the pretrained weights of the base model, roberta in this case. We have to save the weight file (*.pt) and the vocabulary files for the tokenizer in advance for torchserve to work.
- Load the pretrained weights and instantiate a BrainRobertaHubInterface object.
- Using the BrainRobertaHubInterface object, create a PororoBertMrc object, which can be used as the final model class to perform inference during serving (see the sketch below).
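Putting the three steps together, a rough sketch of the loading logic looks like the following. This is an illustration of the steps above rather than the exact code in the repository: the import paths for BrainRobertaHubInterface and PororoBertMrc, the local directory layout, and the stand-in config object are my own assumptions.

import mecab
import torch
from types import SimpleNamespace

from fairseq import hub_utils
from pororo.models.brainbert import BrainRobertaHubInterface  # assumed import path
from pororo.tasks.machine_reading_comprehension import PororoBertMrc  # assumed import path
from pororo.utils import postprocess_span

# Step 1: artifacts saved in advance (paths are placeholders for your own layout).
ckpt_dir = "/path/to/model-store/brainbert.base.ko.korquad"  # contains model.pt and dict files
tok_path = "/path/to/model-store/bpe32k.ko"                  # extracted tokenizer files

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 2: load the checkpoint and wrap it in the hub interface, mirroring load_model above.
x = hub_utils.from_pretrained(
    ckpt_dir,
    "model.pt",
    ckpt_dir,
    load_checkpoint_heads=True,
)
model = BrainRobertaHubInterface(
    x["args"],
    x["task"],
    x["models"][0],
    tok_path,
).eval().to(device)

# Step 3: build the task-specific wrapper used for inference during serving.
tagger = mecab.MeCab()
config = SimpleNamespace(lang="ko", n_model="brainbert.base.ko.korquad")  # stand-in for Pororo's task config
mrc = PororoBertMrc(model, tagger, postprocess_span, config)

question = "카카오브레인이 공개한 라이브러리 이름은?"
context = "카카오브레인은 자연어 처리 라이브러리 pororo를 공개하였습니다."
print(mrc(question, context))  # e.g. ('pororo', (start, end))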
The resulting handler for the PororoBertMrc model can be found here.
One little implementation detail: while loading the pretrained weights, fairseq.hub_utils.from_pretrained is used. This method hard-codes some of the path variables, which makes them hard to manipulate. One workaround is to simply provide an absolute path to the file. Therefore, a global variable DATA_PATH is defined in handler.py.
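The exact value in the repository may differ, but the idea is simply to resolve everything relative to the handler file, so that the paths passed to from_pretrained are absolute:

import os

# Resolve the artifact directory relative to handler.py so that torchserve's
# working directory does not matter (illustrative; the repository's layout may differ).
DATA_PATH = os.path.dirname(os.path.abspath(__file__))
ckpt_dir = os.path.join(DATA_PATH, "brainbert.base.ko.korquad")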
Usage
The setup details are provided in the README of the following repository.
Once set up properly, one can send requests to the server and receive a JSON object. A simple example in Python:
import requests
question = "조사된 사람은 몇명이야?"
context = """청와대는 대통령비서실·국가안보실 행정관급 이하 직원 및 직계 가족 3,714명과 경호처 직원
및 직계가족 3,458명에 대한 토지 거래 내용을 자체 조사한 결과, 경호처 소속 A씨가 2017년
9월 광명 지역 토지 413m²를 형수와 매입한 사실을 확인했다고 정만호 청와대 국민소통수석이
전했다. A씨는 경호처 과장(4급)이다. 청와대는 사실을 확인한 지난 16일 곧바로 대기발령 조치했다.
A씨는 청와대에 '부모님 부양 목적으로 가족과 공동 명의로 매입한 땅'이라는 취지로 해명했다.
그러나 청와대 고위 관계자는 "(투기) 의심 사례"라며 "(소명 내용은) 제외한 채 거래 사실
및 구입과 관련한 자료만 특수본에 넘기기로 했다"고 밝혔다. 이번 사례는 조사단 조사의 '맹점'을
고스란히 보여준다. 조사단은 이날 발표를 포함해 LH, 국토교통부 직원 등 2만3,000여 명의
토지 거래 현황을 전수조사했지만, '본인'으로만 대상을 한정했다. 이 때문에 이번처럼 배우자
등 가족 명의 거래 내용은 파악할 수 없다. A씨 형은 3급 상당 직원으로 LH 측에 '가족이
3기 신도시 내에서 토지 거래를 했다'고 자진 신고한 것으로 전해졌다. 비서실·안보실 직원 대상
조사에서는 3기 신도시와 인근 지역에서의 부동산 거래 3건을 확인했으나, 투기로는 의심되지
않는다는 게 청와대 측 설명이다."""
data = question + "|" + context
r = requests.post("http://localhost:8888/predictions/mrc", data=data.encode("utf-8"))
print("answer:", r.json())
>>> answer: ['3,458명', [37, 41]]
2021 Update!
Recently, another cool project named gradio has been published. This is a Python-based library that builds a simple web-based UI for any function, with very easy-to-use APIs. Since it works at a high level and does not require the actual model binary for serving, it is much easier to use.
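For instance, wrapping the high-level Pororo MRC pipeline from the beginning of this post in a small web demo might look like this (a sketch; the function name and input/output labels are my own):

import gradio as gr
from pororo import Pororo

mrc = Pororo(task="mrc", lang="ko")

def answer(question: str, context: str) -> str:
    # The Pororo MRC pipeline returns (answer_span, (start, end)); show only the span.
    span, _ = mrc(question, context)
    return span

# Two text boxes in, one text box out; launch() starts a local web server with the UI.
gr.Interface(fn=answer, inputs=["text", "text"], outputs="text").launch()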