Using Airflow to get notified over Slack when things don’t go as planned while loading data

At Sirena, we use Apache Airflow for all of our data orchestration. Airflow is an open-source tool, originally created at Airbnb, for building and scheduling data workflows. In essence, Airflow is an orchestrator that runs tasks on a given schedule while also handling backfilling, task dependencies and much more. In Airflow, workflows are defined as Directed Acyclic Graphs (DAGs) that specify the dependencies among tasks.

At Sirena, we found another use for Airflow besides helping us move data from one place to another. We also use it to trigger alerts on certain data quality measures. The idea…


Hands-on Tutorials

Implementing a custom ensemble model with under-sampling for imbalanced data

This post will show you how to implement your own model and make it compliant with scikit-learn’s API. The final result will be a model that can not only be fitted and used for predictions but also be used in combination with other scikit-learn tools like grid search and pipelines.

Introduction

This post traces back to a few months ago. In one of my machine learning courses, we were discussing the topic of imbalanced data and how algorithms have a hard time learning when data is not balanced. For our learning algorithms, the event we are trying to predict is so…


In a previous post, I discussed linear regression in the realm of Bayesian statistics. I will do something similar in this post, but we’ll change the topic a little bit. We’ll go from finance to weight loss.

Introduction

I have recently decided to start living a healthier life (about time). However, making a change in my life meant I would have to measure that change to make sure I’m going in the right direction. …


How to detect changes in usage, react in time and make your customers happy

The topic of anomaly detection is fascinating. There is a vast number of methods that can be used, from simple statistics to more complex unsupervised learning methods. Additionally, the impact of anomaly detection is huge. The ability to detect when things are not going as planned is a fantastic tool and one that has the potential to save lives (for example in preventive maintenance).

In this article, we will explore an extremely easy approach to anomaly detection. Sometimes the simplest solutions can have the greatest impact. …


Continuing with my (mostly) healthy obsession with Bayesian statistics (see my previous article), in this article, I’ll use a linear regression model from a Bayesian perspective.

To demonstrate how linear regression works in the context of Bayesian statistics, I will use the CAPM to estimate stocks’ betas.

Introduction

You don’t need to be familiar with CAPM to understand the example of regression using Bayesian statistics. However, for those out there who might be interested, CAPM stands for Capital Asset Pricing Model. The model defines a relationship between the expected return of an asset (usually a stock) and the market risk premium. …
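
For reference, here is the standard textbook form of the CAPM relationship. It is shown only as a sketch of what the model states; the notation is an assumption and may differ from what the full article uses. Here E[R_i] is the expected return of asset i, R_f is the risk-free rate, E[R_m] is the expected market return, and β_i is the asset’s beta, the coefficient a regression would estimate:

```latex
E[R_i] = R_f + \beta_i \left( E[R_m] - R_f \right)
```

The term in parentheses is the market risk premium, so beta measures how strongly the asset’s expected excess return moves with the market’s.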


In this article, I will walk you through the process of performing an A/B testing analysis from the perspective of Bayesian statistics to determine the best WhatsApp message template to send to customers.

To be fair, there is no difference if this is about a website landing page, a product feature or WhatsApp messages. The logic of A/B testing can be applied to any experiment. However, because I work at Sirena, I find it appropriate to use the example of WhatsApp messages.

Context about the problem

Let’s set up the case for some context. You are an online retail business operating through WhatsApp in…

Juan Gesino

Head of Data @ Sirena
