Data Logistics with Apache NiFi

Manuel Lamelas BigData Architecture, Technology

As announced in a previous post, we're now going to introduce you to Apache NiFi, the latest trend in ingestion tools: a new project from the Apache Software Foundation that lets you manage data flows through a graphical interface. If that hasn't caught your attention yet, wait until you hear this: the NSA created it! NiFi – the UPS of data …

Kerberos & Hadoop: Securing Big Data (part I)

Celeste Duran BigData Architecture, Technology

When I began to use Hadoop with Kerberos, I felt as if I were in the middle of the ocean. I found a lot of information about Kerberos itself, but it was very difficult to find anything about how to use it with Hadoop, why to use it, and how to configure it to work with Hadoop. This trilogy of posts is going to …

Morphlines – Hadoop ETL by Cloudera

Manuel Lamelas BigData Architecture

Today we are going to talk about Morphlines, an open source framework developed by Cloudera that provides a new way to do ETL on Hadoop. What are these morphlines? Morphlines are simple configuration files that define how to transform data on the fly. A morphline is a file that describes the steps a data flow has to pass through in order to …

Ingest & Search JSON events in Real Time (III): Flume Architecture & BenchMarking

Manuel Lamelas BigData Architecture

To end this series of articles on ingestion and search, we are going to look at the Flume architecture for high availability and at some benchmark tests. Flume Architecture: To achieve high availability we have two Flume characteristics to play with: 1. File Channel vs Memory Channel: this is a trade-off between 100% delivery and fast ingestion. With file channel the …

Data Science Madrid Meetup, January 2017

Datatons BigData Architecture, Business, Data Science, Technology

On January 21, the monthly Data Science Madrid meetup took place. On this occasion we also teamed up with our colleagues from Madrid 4 OpenStack to hold the event in the auditorium of Medialab Prado. The agenda of talks was as follows: • “Open Stack and Big Data” by Daniel Mellado and Carlos Camacho. Introduction …