This was my first Open Source Summit, so I wasn’t really sure what to expect. I arrived in Edinburgh on the Sunday afternoon, as the conference keynotes started at 9AM sharp on the Monday morning. I stayed at the Mercure Edinburgh Haymarket, around a five-minute walk from the venue, the Edinburgh Conference Centre.
On Monday morning I made it to the conference centre for around 8.30, with what I thought was plenty of time to have a wander round the sponsor area and grab some freebies (I was especially after a SUSE chameleon in case they ran out - I needn’t have worried). The sponsor stands were quite impressive, with lots of big names - Red Hat, Oracle, Intel, BMW, SUSE - along with lots of smaller embedded players. The BMW stand in particular stood out for me for having their infotainment development system, recreating the hardware present in the car: screens, speakers, modules and so on.
I ended up talking to someone on the Synopsys stand about container scanning and only made a run for the main auditorium just before 9! Unfortunately the auditorium was full, so I was sent to the overflow room, where there was only an audio feed from the main auditorium - no video. This was a bit of a disappointment.
I won’t make any comment on the keynotes, as these are already available on the Linux Foundation YouTube page, but they were of a good standard.
The keynotes lasted until around 11, at which point I went to the first talk of my own selection -
A Pragmatic Introduction to Machine Learning with Maartens Lourens (@thundercomb), Automation Logic
This talk was aimed at engineers, so no deep mathematics was required. It explained the background of machine learning (he didn’t delve into deep learning - this was purely classical machine learning).
He recommended Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks” (Karpathy is now at Tesla), and referenced AlphaGo, the DeepMind system that beat Go masters using machine learning; DeepMind has since been bought by Google.
The key is to understand where ML can be applied. In his words, ML can be summed up as learning from data.
Data science produces insights, machine learning produces predictions, AI produces actions (David Robinson - varianceexplained.org)
As an example of the application of machine learning he showed how it could be used for log parsing. Log lines are not in a standardised format - different date formats, process naming conventions and so on - and some could be JSON. There is also the case of “previous message repeated x times”.
The first step is classification - this is supervised machine learning.
Training Data -> data preparation -> training -> model
For the solution he used Python with scikit-learn, which handles classical and statistical ML, and Pandas, a library for working with tabular data.
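The talk’s actual notebook is linked at the end of this summary; as a rough sketch of the data preparation step, here is how log lines might be loaded with Pandas and turned into numeric features with scikit-learn. The log lines and label names below are invented for illustration, and the use of `CountVectorizer` is my assumption, not necessarily what the speaker did.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# A tiny, hand-labelled set of log lines (hypothetical data)
df = pd.DataFrame({
    "line": [
        "2018-10-22T09:01:12 sshd[311]: Failed password for root",
        "Oct 22 09:01:13 kernel: eth0 link up",
        '{"ts": "2018-10-22", "msg": "request served", "status": 200}',
        "previous message repeated 4 times",
    ],
    "label": ["auth", "kernel", "json", "repeat"],
})

# Turn free-form text into numeric features a classifier can consume
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["line"])
y = df["label"]
print(X.shape)  # one row per log line, one column per token
```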
First the data was divided into training data and testing data, the latter used later to test how well the model has been trained. The amount of data you have dictates the proportion split between training and test data.
The code he showed used the variable names X and y, which would normally be frowned upon in programming as undescriptive, but these are a convention in the ML world: X refers to the data (features), y to the labels (answers).
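The split and the X/y naming convention can be sketched with scikit-learn’s `train_test_split`; the toy data and the 80/20 ratio here are illustrative, not from the talk.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix, 10 samples
y = np.array([0, 1] * 5)           # toy labels

# Hold back 20% of the data for testing the trained model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```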
At this point we don’t know which algorithm is most suited - decision trees, support vector machines and so on. In scikit-learn each exposes the same fit function.
We can use these different algorithms to create models.
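A minimal sketch of trying different algorithms, using a built-in scikit-learn dataset rather than the talk’s log data: the point is that a decision tree and a support vector classifier are trained through the identical fit/score interface.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (DecisionTreeClassifier(random_state=0), SVC()):
    model.fit(X_train, y_train)          # same call regardless of algorithm
    score = model.score(X_test, y_test)  # accuracy on the held-back data
    print(type(model).__name__, round(score, 2))
```

Swapping in another classifier is a one-line change, which is what makes comparing models cheap.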
After creating the model we get metrics of how successful it is. The potential outcomes are true and false positives, and true and false negatives.
He explained this using the boy who cried wolf:
- The boy cried wolf when there was no wolf - false positive
- The wolf actually showed up and he cried wolf - true positive
- The default position of no wolf and him not crying wolf - true negative
- A wolf showing up and him not crying wolf - false negative
He showed the scikit-learn functions for getting out a confusion matrix and accuracy report.
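The wolf scenarios map directly onto those scikit-learn metric functions. The encoding below (1 = wolf present, crying wolf = predicting 1) is my assumption for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 0, 1]  # no wolf, wolf, no wolf, wolf
y_pred = [1, 1, 0, 0]  # cried wolf, cried wolf, silent, silent
# -> false positive, true positive, true negative, false negative

cm = confusion_matrix(y_true, y_pred)
print(cm)   # rows: actual [no wolf, wolf]; cols: predicted [no wolf, wolf]
acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.5 - two of the four calls were right
```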
The main learning was: get to know your data.
The example code from the talk is available at https://github.com/automationlogic/log-analysis - this is a Jupyter notebook. He also recommended checking out Andrew Ng’s tutorial.
AIOps: Anomaly detection with Prometheus - Spice up your monitoring with AI by Marcel Hild, Red Hat
He explained Project Thoth, which is about evaluating AI workloads on OpenShift.
He started by explaining the architecture of Prometheus and the various back ends that can be used. Thanos provides long-term storage of Prometheus data, unlimited retention, and Prometheus at scale.
InfluxDB can also be integrated. Data scientists will love this, as it can be hooked up to Pandas. Unfortunately there is no way to do clustering without paying.
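The talk only noted that Prometheus data can be hooked up to Pandas; as one hypothetical way of doing that, the helper below flattens the documented JSON shape of a Prometheus `/api/v1/query_range` response into a DataFrame. The function name and the sample values are my own inventions.

```python
import pandas as pd

def prom_to_dataframe(response):
    """Flatten a Prometheus query_range response into (metric, ts, value) rows."""
    rows = []
    for series in response["data"]["result"]:
        name = series["metric"].get("__name__", "unknown")
        for ts, value in series["values"]:
            rows.append({
                "metric": name,
                "ts": pd.to_datetime(ts, unit="s"),
                "value": float(value),  # Prometheus returns values as strings
            })
    return pd.DataFrame(rows)

# Sample payload in the shape Prometheus returns (the values are invented)
sample = {"data": {"result": [{
    "metric": {"__name__": "node_load1"},
    "values": [[1540200000, "0.42"], [1540200015, "0.57"]],
}]}}

df = prom_to_dataframe(sample)
print(df)
```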
Ceph object storage can be used. It is part of Open Data Hub and can be S3 compliant. Data can be scraped into Ceph. This gives a future-proof migration path to Thanos, as the data is in exactly the same structure as you would get back from Prometheus.
Search can be performed using Apache Spark - also good for data scientists.
The anomaly detection is done with Prophet. This runs as a container that scrapes Prometheus, performs its analysis, and puts the results back into Prometheus.
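The talk’s detector uses Prophet; as a minimal stand-in for the same idea - flag points that fall outside a predicted band - here is a rolling mean/standard deviation version in pandas. The synthetic series, the window size and the 3-sigma threshold are all my assumptions, not the speaker’s settings.

```python
import numpy as np
import pandas as pd

# Synthetic metric: a steady series with one injected spike (invented data)
rng = np.random.default_rng(0)
values = rng.normal(loc=10.0, scale=0.5, size=200)
values[120] = 20.0  # the anomaly
series = pd.Series(values)

# Band from a trailing window: flag points more than 3 std from the mean
window = 30
mean = series.rolling(window).mean()
std = series.rolling(window).std()
anomalies = series[(series - mean).abs() > 3 * std]

print(anomalies.index.tolist())  # should contain 120
```

Prophet does the equivalent with a fitted time-series model and its uncertainty interval, which copes with trend and seasonality in a way a flat rolling band cannot.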
He showed a demo running locally on MiniShift.
There is a fair amount of configuration that needs to be done, but although the commercial AIOps solutions provide a more polished experience, they still require a lot of configuration too.