Open Source Summit Europe 2018 - A review

2018/10/25

Tags: linux OSSummit

As this was my first Open Source Summit, I wasn’t really sure what to expect. I arrived in Edinburgh on the Sunday afternoon, as the conference keynotes started at 9AM sharp on the Monday morning. My accommodation was the Mercure Edinburgh Haymarket, around a 5 minute walk from the venue, the Edinburgh Conference Centre.

On Monday morning I made it to the conference centre for around 8.30 with what I thought was plenty of time to have a wander round the sponsors’ area and grab some freebies (I was especially after a SUSE chameleon in case they ran out - I needn’t have worried). The sponsor stands were quite impressive, with lots of big names - Red Hat, Oracle, Intel, BMW, SUSE - along with lots of smaller embedded players. The BMW stand in particular stood out for me, with their infotainment development system recreating the hardware present in the car: screens, speakers, modules etc.

I ended up talking to someone on the Synopsys stand about container scanning and only made a run for the main auditorium just before 9! The auditorium was full, so I ended up being sent to the overflow room, where unfortunately there was only an audio feed from the main auditorium - no video. This was a bit of a disappointment.

I won’t make any comment on the keynotes as these are already available on the Linux Foundation YouTube page, but they were of a good standard.

The keynotes lasted until around 11, at which point I went to the first talk of my own selection -

A Pragmatic Introduction to Machine Learning with Maartens Lourens (@thundercomb), Automation Logic

This talk was aimed at engineers, so no deep mathematics was required. It explained the background of machine learning (he didn’t delve into deep learning - this was purely classical machine learning).

He recommended Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks” (Karpathy is now at Tesla), and referenced AlphaGo, which beat Go masters using ML and was built by DeepMind, now owned by Google.

The key is to understand where ML can be applied. In his words, ML can be summed up as learning from data.

“Data science produces insights, machine learning produces predictions, AI produces actions” (David Robinson - varianceexplained.org)

As an example of the application of machine learning he showed how it could be used for log parsing. Log lines are not in a standardised format - different date formats, different process naming etc, and some could be JSON. There is also the case of “previous message repeated x times”.

The first step is classification - this is supervised machine learning.

Training Data -> data preparation -> training -> model

For the solution he used Python with scikit-learn, which can deal with classical and statistical ML, and Pandas, a library for working with tabular data.
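Before a scikit-learn model can see log lines, the raw text has to become numbers. The talk’s notebook will have its own encoding; as a hedged illustration only, a bag-of-words count is one common approach (the log lines below are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical log lines - one syslog-style, one JSON - standing in for real data.
logs = [
    "Oct 25 09:00:01 sshd[123]: Failed password for root",
    '{"level": "info", "msg": "request served"}',
]

# CountVectorizer tokenises the text and builds a sparse matrix of word counts,
# one row per log line, one column per vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(logs)
print(X.shape)  # (number of lines, vocabulary size)
```

Each log line is now a numeric row that any scikit-learn classifier can consume.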

First the data was divided into training data and test data, the latter used later to check how well the model was trained. The amount of data you have dictates the proportion of the split between training and test data.
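This split is a one-liner in scikit-learn. A minimal sketch, with made-up data and an assumed 80/20 split (the talk didn’t specify a ratio):

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for labelled log lines - purely illustrative.
log_lines = [f"log line {i}" for i in range(10)]
labels = ["error" if i % 2 else "info" for i in range(10)]

# Hold back 20% for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    log_lines, labels, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```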

The code he showed used the variable names X and y, which would normally be frowned upon in programming as undescriptive, but they are a convention in the ML world: X refers to the input data, y to the labels (the answers).

At this point we don’t know which algorithm is most suited - binary trees, support vectors and so on - but in scikit-learn they all expose the same fit function.

We can use these different algorithms to create models.
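Because every scikit-learn estimator shares the fit/predict interface, trying different algorithms is trivial. A hedged sketch on toy data (the classifiers and data here are my own choice, not necessarily those from the talk):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Tiny toy dataset: the label is simply the first feature.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

# Swap algorithms by swapping the object; fit/predict stay the same.
results = {}
for clf in (DecisionTreeClassifier(random_state=0), SVC()):
    clf.fit(X, y)
    results[type(clf).__name__] = int(clf.predict([[1, 1]])[0])
print(results)
```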

After creating the model we get out metrics of how successful the model is. Potential outputs are:

- True positives
- False positives
- True negatives
- False negatives

He explained this using the boy who cried wolf:

- The boy cried wolf when there was no wolf - false positive
- The wolf actually showed up and he cried wolf - true positive
- The default position of there being no wolf and him not crying wolf - true negative
- A wolf showing up and him not crying wolf - false negative

He showed the scikit-learn functions for getting out a confusion matrix and accuracy report.
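Encoding the wolf story as labels makes those functions concrete. A small sketch (my own encoding, not the talk’s code):

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# 1 = wolf present / boy cried wolf, 0 = neither.
y_true = [0, 1, 0, 1]   # was the wolf actually there?
y_pred = [1, 1, 0, 0]   # did the boy cry wolf?

# Rows are the actual class, columns the predicted class: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(cm)                              # [[1 1]
                                       #  [1 1]]
print(accuracy_score(y_true, y_pred))  # 0.5 - one TP and one TN out of four
```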

The main learning was to get to know your data.

The example code from the talk is available at https://github.com/automationlogic/log-analysis as a Jupyter notebook. He also recommended checking out Andrew Ng’s tutorial.

AIOps: Anomaly detection with Prometheus - Spice up your monitoring with AI by Marcel Hild, Red Hat

He explained Project Thoth, which is about evaluating AI workloads on OpenShift.

He started by explaining the architecture of Prometheus and the various back ends that can be used. Thanos provides long term storage of Prometheus data, unlimited retention and Prometheus at scale.

InfluxDB can also be integrated. Data scientists will love this as it can be hooked up to Pandas. Unfortunately there is no way to do clustering without paying.

Ceph object storage can be used. It is part of Open Data Hub and can be S3 compliant. Data can be scraped into Ceph. This has a future proof migration path to Thanos, as the data is in exactly the same structure as you would get back from Prometheus.

Search can be performed using Apache Spark - also good for data scientists.

The anomaly detection is done with Prophet. This runs as a container that scrapes Prometheus, processes the data and puts the results back into Prometheus.
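The core idea is to fit a model to the metric’s history and flag samples that fall outside its prediction band. Prophet fits a proper time-series model; as a much simpler stand-in (not Prophet, and not the project’s actual code), a rolling mean and standard deviation over made-up samples illustrates the shape of it:

```python
import pandas as pd

# Hypothetical metric samples; a real setup would scrape these from Prometheus.
values = pd.Series([10, 11, 10, 12, 11, 10, 50, 11, 10, 12], dtype=float)

# Baseline from the *previous* points only (shift(1)), so a spike doesn't
# inflate its own expected range; flag anything more than 3 sigma away.
mean = values.shift(1).rolling(window=5, min_periods=3).mean()
std = values.shift(1).rolling(window=5, min_periods=3).std()
anomalies = (values - mean).abs() > 3 * std

print(values[anomalies])  # flags the spike of 50
```

Prophet replaces the rolling window with a fitted trend plus uncertainty interval, which handles seasonality far better, but the anomaly test - “is this point outside the expected band?” - is the same.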

He showed a demo running locally on MiniShift.

There is a fair amount of configuration that needs to be done, but the commercial AIOps solutions, although they provide a more polished experience, still require a lot of configuration.

Getting Your Patches in Mainline Linux: What Not To Do (and a few things you could try instead) - Marc Zyngier, ARM

Open Source MQTT Brokers with Leon Anavi, Konsulko Group

Porting OpenMandriva to many architectures with Bernhard Rosenkranzer

How to handle Security flaws in an Open Source Project - Jeremy Allison

Accelerated Linux Build System with Jeff Shaw

Tuesday

The future of AI is Data…In more ways than you think - Eric Berlow, Vibrant Data Inc

Building an Open Source Culture at Microsoft - Stephen Walli, Microsoft

Digital Echoes: understanding patterns of mass violence - Patrick Ball, Human rights data analysis group

How to avoid writing kernel device drivers - Chris Simmonds

Performance tuning and Troubleshooting in container platforms - Manoj Pillai, Red Hat

Deploying Apache Kafka for Exabyte scale data coordination - James Mountfield, Sumo Logic

Natural Language processing with Python - Barbara Fusinska

oomd - Daniel Xu

Wednesday

Why lock down the kernel? - Matthew Garrett, Google

What are my microservices doing? - Juraci Paixao Krohling, Red Hat

Creating your own tiny linux distro using Yocto: Keeping it small with Poky-Tiny - Alejandro Hernandez, Xilinx

Spectre, Meltdown & Linux - Greg Kroah-Hartman, The Linux Foundation

Kata containers - Eric Ernst and K. Y. Srinivasan

Community vs Enterprise: How not to piss off your community (and still be profitable) - Colin Charles, GrokOpen