Developing Maritime LLM Competency (DEMALCO)

Large language models (LLMs) may be the most important digital technology in shipping since the internet and the satellite phone.

But the industry has not yet developed much ability to use them in ways a shipowner or senior manager might care about, such as avoiding accidents and expensive mistakes, improving operational and vessel efficiency, passing vetting inspections.

What they do so far for us is write emails and reports, run chatbots and write code. It looks impressive, but is also probably just scratching the surface of what they are capable of in shipping.

Its becoming clear that shipping companies will need internal expertise on LLMs.

Getting value from them will involve connecting the LLM with multiple internal data sources, putting the right scaffolding around it, and supporting people to use properly.

You can't just purchase a LLM as part of a software package. Or if you did, it might be a very expensive one which doesn't do so much of what you need.

To support development of this competency, as People Tech Maritime we are proposing the Developing Maritime LLM Competency program which makes a nice word DEMALCO.

It will have four levels - awareness, tools selection, integration and leadership.

As you develop your LLM competency, you assess which level you have reached. If you apply for a job which wants someone at a certain level, you can be externally assessed about whether you reach it as part of the application process.

We'll share some ideas about what may be involved at each level, and perhaps stimulate you to think of share your own ideas.

1) AWARENESS LEVEL

Description of basic LLM capability:

generate text in response to a prompt
do web searches and bring the results together,
generate code
generate images and movies.

Some basic tasks LLMs can do building on this:

analyse an investment portfolio
suggesting how we can write better emails (embedded in Gmail / Office)

Examples of how maritime companies have used LLMs so far:

writing emails
getting answers to questions based on material on the internet, or specific documents included in the prompt, such as company manuals, HR procedures.
creating custom training materials
writing, research, documentation
chatbot for company procedures
talking to other software tools
creating custom training materials for crew, based on a text transcript of a video

Some basic technical features:

LLMs can create their own code to 'talk' to another software's API..

So may be able to interact with the same information that would be displayed in the user interface (but not engage with the data in the software at a deeper level).

LLMs work by converting text data or other data into what's known as a numerical vector, which is a simplified version of it, then they find patterns and create new material with the same patterns.

Choice of LLM model:

Nearly everybody is using public LLMs, known as foundational models, like ChatGPT
A new version of ChatGPT is released every 3 months roughly.

The model is trained on material available during the training process, not continually trained.

You can also make your own private models, by downloading the code and training it on your own data. This can be known as small language models. It doesn't seem to be happening so much.

2) TOOL SELECTION LEVEL

When to use LLM based tools and when not to:
For data analytics, with explicit answers, you probably don't want to use large language models.

If the challenge is involving relationships between specific data, you are probably better off with classical AI based tools, and you'll probably buy software to do this, unles you have in-house data scientists, because this needs specialist expertise. For example predictive maintenance, understanding vessel speed / power relationships, identifying performance degradation, optimising voyages, optimising chartering by matching vessel with cargo.

If you choose CoPilot write your emails may be very dangerous - just because it doesn't understand your working environment and what you need to do.

If you provide staff with access to a LLM to do an important operational task, such as finding the appropriate procedure, then there's costs if it makes a mistake. You need to test it, set guardrails in the prompt, warn people it may give the wrong answer, advise them to check against source material if they have any doubt.

Providing seafarers with LLM access may work, if you trust the satcom connectivity reliability, so it can work fast enough.

Choice of LLM:
If you are using a LLM, you'll probably be using a foundational model like ChatGPT.
Lots of people are talking about some being better than others. Other people say it really doesn't matter, and you don't necessarily need the best one.

Structure of your set-up:
The simplest set-up example is a chatbot - which might involve code which sends a request from your website to the LLM , together with your company manual, with instruction to find the answer to the question in the manual.

3) INTEGRATION LEVEL

Some definitions of systems involving LLMs:

Agent, where LLM is part of a tool which achieves a specific task

Scaffolding, the code a company puts around its use of a LLM to get desired outputs from it

RAG, where LLM writes code to speak to another software's API, and uses the output in its answer.

Cybersecurity concerns with integration:
While data added in prompts shouldn't end up getting released to anyone over the internet, it will be available to employees of the LLM operator at least, so shouldn't include big secrets.

Data a LLM can work with:

The hard part of integrating LLM with your data. it needs contextualising, or putting together. That sounds like a complex technical topic, but the rough summary could be, if it is put together so a person can understand it with no prior knowledge, so can a machine.

Text is already created by someone to support someone else's understanding, so written text by definition is contextualised. For example, you might explain that a certain sensor is attached to a certain pump.

If you expect LLMs to run across all your raw data in multiple systems and get the right answers you are likely to be disappointed.

People are talking about AI governance policies, setting rules in your company about how you use it.

4) LEADERSHIP LEVEL

Where you are driving the LLM roll-out in your company, or playing an active role.

At this level you'll want to understand what LLMs may be able to do (even if they can't now) and how to get there.

For example:

help avoiding operational mistakes
help supporting vessel performance decisions

And to undersatnd what they are far from being able to do, such as directly interrogate corporate databases / software data stores and get answers.

They can't work out cause and effect relationships the way a person can, because they can only work with data in front of them.

You should be able to identify LLM overpromising / hype and exaggeration. For example, a promise LLMs can help you pass vetting inspections. LLMs may be able to get insights from vetting data (comparing current vessel with past vessels and inspection results), but they cannot directly advise on whether a ship will pass an inspection.

Appropriateness of AI in specific maritime tasks:
In recruitment. A LLM can easily screen CVs for you, but perhaps it changes the game to reward candidates with "good for AI" applications. Which might have been written by AI.

Dangers of relying on LLM generated explanation - a LLM can't abstract information in a way good for people to read, like a person can. So LLM explanations can be exhausting and difficult to read.

As leadership you might want to develop your company's enterprise AI strategy, a "full AI environment"?

​Developing Maritime LLM Competency (DEMALCO)

Developing Maritime LLM Competency (DEMALCO)