ONNX - One Framework to Rule Them All (or why you very much should use ONNX for Machine Learning)

Ugh, must I read the entire article?

Well, yes, you must. But because I know you won’t, anyway - who ever reads articles these days? - here’s the takeaway. The rest of the article is my defiance of Grammarly suggesting conciseness at every other word:

  • ONNX fast
  • ONNX nice
  • use ONNX because it is fast and nice
  • don’t try and work with different services yourself, make ONNX do it

Further below there is also a brief example of training and saving a PyTorch model with ONNX, but you probably don’t want to read that. Grammarly insists I should be concise, after all.




What is ONNX?

A Something that is Useful. That’s it, that’s the post.

You really want to know more? Sigh.

ONNX - standing for Open Neural Network Exchange - is an open-source project, supported by a host of big-tech companies and a large, active community.

Its main goal is to break the shackles placed by tool and service groups upon ML and AI developers. There are so many different tools that simply cannot integrate with each other.

Okay, so why bother trying to integrate? Just… I dunno, use one good service?

Are you daft? As efficient as a particular tool or service may be, being able to pool resources from other frameworks is always going to be more efficient.

And that is the end-goal of ONNX - to allow this cross-service resource pooling, diversifying the tools available to developers.

This way, you cannot blame your failing model on being unable to access better models in a different framework, ha. Take that!

Anyway.


Okay, fine, it Integrates. Why must I use it though?

… isn’t that reason enough? Integration is Nice, okay? It is Useful, because I said so.

Fine, if you insist, I’ll tell you why.

Say you have a model trained using a certain service SentorLowf. You adore this model - it has an accuracy so high it could break the ceiling. A small problem though - SentorLowf’s inference sucks. It is so bad you feel like burning your computer down every time you must beg a prediction of your model.

There is also a service called PorchTy. This allows you to make inferences so fast that the prediction is delivered even before you submit the input. It essentially reads your mind, predicts what the input will be, predicts an output and returns it faster than the blink of an eye. The caveat, of course, is that its training is so bad it takes 10,000 epochs for the model to learn a knife from a fork. Simply unbearable, really.

Wouldn’t it be great if you could just combine the best parts of both these frameworks and get an epic solution that trains with superhuman intelligence and delivers results faster than lightning? Truly, One Solution to Rule Them All.

Of course, there is no possible way for this to happen. One model is saved in a different format, with different properties and tags and technical jargon, than another. There is no possible way for the blazing inference of PorchTy to understand the genius model of SentorLowf. No way at all.

Except, of course, if you use ONNX.

… what? Are you seriously suggesting that I am a paid shill? The audacity!

Ugh, how I detest interruptions. Where were we? Ah yes, use ONNX.

Imagine all that scenario again, except this time, after training your superb model in SentorLowf, you add a small line at the very end.


ặ̴̾̂z̸̛͎̪͇̟͓̱͓̤̙̎̂̓̀̄̄x̴̧̛͔̱͙̪͎̖̥͈̳̾͝d̴̛̠̗͑́̀̿̏͘͝f̴̰͛͂́͂̾̕͠g̸̨̠͇̯̺̳̻̈̈́̐̄͛̅̓̎b̷̧͓̭͓͕͂̌̃͠ĥ̴̡̢̛͔̮̝̋̂͒̓̒̚͘n̴̨͖̗͍͉̈́̔̊̃̆̈̾j̷̡͔͈̬̠̽̔̃̽̈́̐̇͝l̸͓̗̮̖̬͊́͛͗̒͝͝d̸͖̖̫͔̥̣͐̔̔̈̚j̸̛̛̱͆̉̃̇̚̕s̶̮̞͖̣͍̩̑̉̆̏̾̆̓̓͜c̸̛̯̞͔̗̟̺͈̅͆̿͑̈́̃̕͝͠ņ̵̨̰͍̭̖̻̖̐̑̇ͅ


A simple enough invocation, really - and oh, pray don’t be alarmed! Perfectly safe, perfectly safe - I assure you.

Having typed this proclamation, you save your model as normal, and resignedly stump over to the PorchTy inference. You, of course, have no hope, because you know this won’t work - it never does.

But what harm could there be in trying one last time? It isn’t that much effort, is it?

See, knew you’d see reason eventually.

Imagine. You start the inference script. You expect the errors and the could not load models and the tracebacks.

But they never come.

Instead, all you get is a prediction - delivered even before you could send in that picture of a thing you had opened earlier.

Prediction: Thing

Are you convinced yet?

… it’s not magic, you say. Alright, alright - I’ll give that one to you - it isn’t magic. But, and now be honest, would you ever have been convinced without the magical part?

Hehe. Thought so.


Okay, fine, it’s great. What are the drawbacks?


… Drawbacks? What? Why would you care about that? If it works it works, right?

Ugh, fine. Drawbacks.

The only major drawback to ONNX for now is that it’s relatively new. A young’un, you might say. As such, some frameworks still don’t have support for it, especially in non-Python languages. I have also read that its support for some hardware architectures is not as well developed as that of older libraries. However, I believe much of this has since been corrected, and I do not think there are any major drawbacks to using ONNX, especially if you’re sticking with Python and the ol' regulars.

No, I am not a paid shill! Whatever is the matter with you?!


Fine, show me how it’s used then…


So now you decide you wanna see how it’s used? After interrogating me for so long! Hmph.

Using ONNX is really as simple as I said, even though you don’t believe me, for some incomprehensible reason. You really do just add one line, although I will concede it isn’t magic.

I think a quick breakdown of how ONNX models are created is in order.

ONNX models depend upon converters to transform the relevant parts of the base model - generally, the parts that handle inference and such - from framework-specific components into streamlined, standard ONNX ones. The end result of this is all the benefits mentioned above: whatever the base model may be written in, the part that we have to deal with for inference is universal.

Okay, so what are these converters?

That depends upon what framework you are working with for a base model. ONNX integrates with a variety of frameworks, and each of those frameworks has its own method for converting models. For instance, PyTorch has a built-in package, torch.onnx, that allows ONNX integration, while Scikit-Learn uses an open-source package, sklearn-onnx.
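
To make the Scikit-Learn route a little more concrete, here is a minimal sketch using sklearn-onnx (the package imports as skl2onnx). The toy LogisticRegression and its four made-up features are purely for illustration - swap in whatever model you actually trained.

import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# A toy classifier purely for illustration - four features, binary target.
X = np.random.rand(20, 4).astype(np.float32)
y = np.random.randint(0, 2, 20)
clf = LogisticRegression().fit(X, y)

# Tell the converter what the input looks like, then write out the ONNX file.
onnx_model = convert_sklearn(
    clf,
    initial_types=[('float_input', FloatTensorType([None, 4]))]
)
with open("my_sklearn_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())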

For the sake of this example I use a PyTorch model. This only affects the model conversion process - after that, the streamlined inference kicks in and it doesn’t matter.

If, for some absurd reason, you are not using PyTorch, then… use PyTorch? Or don’t, I don’t care. The syntax will be slightly different than what’s below if you use another framework, though.

But just… read the docs. They have everything you need to know.

Anyway, moving on.

To start with, you need to have a model. No, I won’t show you how to get to that point, just Google it or something. Not my concern, really.


my_model = ARandomTorchModel() # Your trained torch.nn.Module - however you got there.
my_model.eval() # Switch off training-only behaviour (dropout, batch norm) before exporting.

Now that we have a model, let’s actually work the magic.

import torch

torch.onnx.export(
    my_model, # The Torch model to export.
    torch.randn(1, 3, 150, 150), # A dummy input of the expected shape. This will change depending on what sizes you trained on.
    "my_cool_onnx_model.onnx", # The file name of the exported ONNX model.
    do_constant_folding=True, # Optimisation things.
    export_params=True, # Makes sure the trained weights get exported too.
    input_names=['input'], # Saves input names.
    output_names=['output'] # Saves output names.
)

There. That’s the magic bit. That little function there saves out the Torch model as an ONNX one.

There are some further customisations that may be needed on an individual basis, but this is the baseplate - with this, you’re good to go and get a functional model out of it.
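
One such customisation, to give a hedged example rather than a rule, is marking the batch dimension as dynamic via dynamic_axes, so the exported model will happily accept more than one image at a time:

torch.onnx.export(
    my_model,
    torch.randn(1, 3, 150, 150),
    "my_cool_onnx_model.onnx",
    do_constant_folding=True,
    export_params=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}} # Let the batch size vary at inference time.
)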

After running this little piece of work, you will notice a nice .onnx model showing up in your folder.
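
If you are the suspicious type, you can also sanity-check the freshly minted file before moving on. This optional step uses the onnx package itself (installed separately from onnxruntime):

import onnx

# Load the exported file back in and verify that the graph is well-formed.
onnx_model = onnx.load("my_cool_onnx_model.onnx")
onnx.checker.check_model(onnx_model)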


Okay, that’s cool - ONNX model, yay fancy. What do we do with this now?

Ugh, you really have no patience! I was getting to that, you know.

To get predictions from an ONNX model, you have to use a runtime for ONNX. We will use one that is called - and it gets an award for creative naming - ONNXRuntime. See, I knew you’d be struck by the genius of that name!

This runtime is also an open-source project backed by Microsoft. It is stupidly easy to use and is fast, lightweight, etc. - all the standard benefits of ONNX come included.

And here’s where the best (yes, yes, I said it before - but there are multiple bests, okay) part about ONNX comes in - the runtime does not care about what your model initially was. All it needs is to receive an ONNX model, and it can run inferences from it. Quick and easy.
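
To drive the point home - assuming you also exported the Scikit-Learn toy model from earlier as my_sklearn_model.onnx - the very same loading line you are about to meet below works for it too, no questions asked:

import onnxruntime as rt

# The runtime neither knows nor cares that this model started life in Scikit-Learn.
session = rt.InferenceSession("my_sklearn_model.onnx")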

No, still not a paid shill, don’t even start.


Okay, show me the runtime already!

Your impatience is really getting on my nerves now… Sometimes I wonder why I even bother…

Anyway.

There are also some magic lines involved in using the runtime - it’ll be easier to go through them step-by-step instead of all at once. I doubt there would be much comprehending happening if I went through them all together.

import onnxruntime as rt

Magic line no. 1 - and I doubt this one needs much explaining - you import the ONNX Runtime. As rt. That’s just for convenience and isn’t even really convention, but it is ten times easier than typing a painstaking onnxruntime over and over again.

session = rt.InferenceSession("my_cool_onnx_model.onnx")

Magic line no. 2. This is the line that loads up the model and makes it available for us to, y’know, predict things with. Use the model. Do what the entire point of this article is. Make it Think on our behalf.

input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

Remember how, when saving the model with ONNX, we specified input and output names? This is why we needed them. No, not just to fetch them back out and stash them in variables - we need them to run predictions, as you will see in just a moment.

results = session.run([output_name], {input_name: image})[0]

This is the line that actually calls the model, and passes in a (preloaded) image to the model. The output from this is kind of messy - it returns a NumPy array of probabilities.

To turn that into an actual prediction, you need a list of labels prepared beforehand that can be matched against the prediction values.

import numpy as np

labels = ['Item1', 'Item2', 'Item3', 'Item4', 'Item5']

label = labels[np.argmax(results)]

For ease I am just defining it down here - this is not ideal file structure, but I’m sure you can figure out variable-order-conventions by yourself…

Anyhow. label will now contain the final prediction of whatever-it-is you tried to get a prediction of.

As you can doubtless see, ONNXRuntime and ONNX make the entire ordeal of obtaining predictions a breeze. It took us all of seven lines to get a prediction! You cannot possibly have any doubts as to why it’s useful now - I think it quite speaks for itself on that account.
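
And, for the sake of having everything in one place, here is the whole thing assembled into a single runnable sketch. The file name and the preprocessing (a plain resize to 150x150, scaling to [0, 1] and reordering to channels-first) are my assumptions - mirror whatever preprocessing you used during training:

import numpy as np
import onnxruntime as rt
from PIL import Image

# Load and preprocess the image - this must match the training pipeline.
img = Image.open("picture_of_a_thing.jpg").convert("RGB").resize((150, 150))
image = np.asarray(img, dtype=np.float32) / 255.0 # Scale pixel values to [0, 1].
image = np.transpose(image, (2, 0, 1))[np.newaxis, :] # (H, W, C) -> (1, C, H, W).

# Load the ONNX model and look up its input and output names.
session = rt.InferenceSession("my_cool_onnx_model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Run the prediction and map it back to a human-readable label.
results = session.run([output_name], {input_name: image})[0]
labels = ['Item1', 'Item2', 'Item3', 'Item4', 'Item5']
print(labels[np.argmax(results)])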


Okay, okay - got it. Are we done yet?

Not quite - there is one last point which I’d like to raise to further sway you.

ONNX Runtime is not only much faster but also lighter, requiring fewer dependencies to run than most inference frameworks. If you’d like to read exactly how much faster and lighter ONNX Runtime is, give this a read - the kindly author ran all the tests and brought out the figures to prove ONNX’s worth. Skip to the end to read just the tests without the FastAI-ONNX conversion stuff.

Yes, yes, I’m done. Unless you still believe ONNX is not worth using, in which case… You know what, in that case you’re just a lost cause I suppose, and I shan’t trouble myself further about you.

In conclusion:

  • ONNX makes life a whole lot easier
  • ONNX makes your model a whole lot faster
  • ONNX makes your deployment a whole lot lighter
  • Use ONNX.

Well, I hope this article helped you figure out what ONNX and ONNX Runtime are, and whether they’re a worthwhile addition to your project.

Assuming you made it this far, that is. Kudos to you if you did so.