Federico Cargnelutti | https://blog.fedecarg.com | Posts from 2008 - 2018

Intercepting class method invocations using metaclass programming in Python
https://blog.fedecarg.com/2018/10/08/intercepting-class-method-invocations-using-metaclass-programming-in-python/
Federico | Mon, 08 Oct 2018 | Programming, Python

In Ruby, objects have a handy method called method_missing which allows you to handle calls to methods that have not been defined. Most examples out there explain how to implement this in Python using __getattr__; however, none of them (honestly, none) explain how to intercept class method (@classmethod) invocations using __metaclass__.

And this is the reason why I created this post.

The built-in function type is the default metaclass in Python: it not only tells you the type of an object, it can also create classes on the fly. When you write, for example, class Example(object), the class object Example is not created in memory straight away. Python looks for the __metaclass__ attribute in the class definition and, if it finds one, uses it to create the class object Example. If it doesn't, it uses type to create the class. The main purpose of a metaclass is to change the class automatically, at the moment it's created.
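
As a quick illustration (the class name and method below are made up), this is what type does behind the scenes when it builds a class on the fly:

Example = type('Example', (object,), {'greet': lambda self: 'hello'})

print Example().greet()  # hello
print type(Example)      # <type 'type'>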

Here’s an example of how to use metaclass programming to intercept class method calls similar to the method_missing technique in Ruby:

class ClassMethodInterceptor(type):

    def __getattr__(cls, name):
        return lambda *args, **kwargs: \
                   cls.static_method_missing(name, *args, **kwargs)

    def static_method_missing(cls, method_name, *args, **kwargs):
        e = "type object 'static.%s' has no attribute '%s'" \
            % (cls.__name__, method_name)
        raise AttributeError(e)

class Example(object):

    __metaclass__ = ClassMethodInterceptor

    def __getattr__(self, name):
        return lambda *args, **kwargs: \
                   self.method_missing(name, *args, **kwargs)

    def method_missing(self, method_name, *args, **kwargs):
        e = "type object '%s' has no attribute '%s'" \
            % (self.__class__.__name__, method_name)
        raise AttributeError(e)

    @classmethod
    def static(cls):
        print 'static.%s' % cls.__name__

    def instance(self):
        print self.__class__.__name__

Console:

>>> Example.static()
static.Example
>>> Example.foo()
Traceback (most recent call last):
...
  File "example.py", line 12, in static_method_missing
    raise AttributeError(e)
AttributeError: type object 'static.Example' has no attribute 'foo'
>>> e = Example()
>>> e.instance()
Example
>>> e.foo()
Traceback (most recent call last):
...
  File "example.py", line 26, in method_missing
    raise AttributeError(e)
AttributeError: type object 'Example' has no attribute 'foo'

If you ever implement something like this, remember that Python doesn’t distinguish between methods and attributes the way Ruby does. There is no real difference in Python between attributes and methods: a method is just an attribute whose value happens to be callable (its type is instancemethod), which is why __getattr__ can intercept both.
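
Note that the __metaclass__ attribute only works in Python 2. On Python 3 the same interception technique applies, but the metaclass is declared with the metaclass keyword argument instead. A trimmed-down sketch of the equivalent code:

class ClassMethodInterceptor(type):

    def __getattr__(cls, name):
        # Only called when normal class attribute lookup fails
        return lambda *args, **kwargs: \
                   cls.static_method_missing(name, *args, **kwargs)

    def static_method_missing(cls, method_name, *args, **kwargs):
        raise AttributeError("type object 'static.%s' has no attribute '%s'"
                             % (cls.__name__, method_name))

class Example(metaclass=ClassMethodInterceptor):

    @classmethod
    def static(cls):
        print('static.%s' % cls.__name__)

Example.static()  # prints static.Example
Example.foo()     # raises AttributeError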

API Development Tips
https://blog.fedecarg.com/2018/06/18/api-development-tips/
Federico | Mon, 18 Jun 2018 | Node.js, Software Architecture, Web Services

Organisations that are paying attention already know they need an open web API, and many already have one under development or in the wild. Make sure you don’t get caught by the pitfalls of many early API releases.

Multiple points of failure:

  • Back-end systems: db servers/caches, hardware failures, etc.
  • Interconnections: router failures, bad cables, etc.
  • External Dependencies: fail whales, random cloud latency, etc.

The 5 tips

Test it all

  1. Unit tests are not enough; they are just the beginning.
  2. Test what users experience. Perform end-to-end black box tests.
  3. Replay your access logs. Very accurate (see the example after this list).
  4. Validate return payloads. A stack trace is not valid XML.
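
For example, a crude way to replay GET requests from an access log against a test host (the log path and host name below are placeholders):

$ awk '{print $7}' /var/log/nginx/access.log | head -n 100 | \
      xargs -I {} curl -s -o /dev/null -w "%{http_code} {}\n" "https://staging.example.com{}"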

Plan for future versions

  1. Versions are not sexy/semantic (but do it anyway).
  2. Announce versions often.

Embrace standards

  1. APIs are better when predictable.
  2. Standard approaches mean tools.
  3. Avoid uncomfortable migrations. No one wants an OAuthpocalypse.

Monitor everything & be honest

  1. Trends are your friend.
  2. Users are not your early-warning ops team.
  3. Be open and honest, or your users will tweet that your API sucks!

Fail well

  1. Well-formed errors win friends and make users more tolerant of failure (see the example after this list).
  2. Make monitoring easy.
  3. Don’t punish everyone. Determine who gets hurt most by failures.
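
As an illustration of a well-formed error, a JSON body that both a client and a monitoring system can parse might look like the following (the field names are just one possible convention, not a standard):

{
  "error": {
    "status": 429,
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "API rate limit exceeded, retry after 60 seconds",
    "request_id": "a1b2c3d4"
  }
}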

Watch the video here: Understanding API Activity by Clay Loveless

Collective Wisdom from the Experts
https://blog.fedecarg.com/2018/06/06/collective-wisdom-from-the-experts/
Federico | Wed, 06 Jun 2018 | Software Architecture

I’ve finally had a chance to read a book I bought a while ago called “97 Things Every Software Architect Should Know – Collective Wisdom from the Experts“. Not the shortest title for a book, but very descriptive. I bought this book at the OSCON Conference in Portland last year. It’s an interesting book and I’m sure anyone involved in software development would benefit from reading it.

More than 40 architects, including Neal Ford and Michael Nygard, offer advice for communicating with stakeholders, eliminating complexity, empowering developers, and many more practical lessons they’ve learned from years of experience. The book offers valuable information on key development issues that go way beyond technology. Most of the advice given is from personal experience and is good for any project leader involved with software development, no matter their job title. However, keep in mind that this is a compilation book, so don’t expect in-depth information or theoretical knowledge about architecture design and software engineering.

Here are some extracts from the book:

Simplify essential complexity; diminish accidental complexity – By Neal Ford

Frameworks that solve specific problems are useful. Over-engineered frameworks add more complexity than they relieve. It’s the duty of the architect to solve the problems inherent in essential complexity without introducing accidental complexity.

Chances are your biggest problem isn’t technical – By Mark Ramm

Most projects are built by people, and those people are the foundation for success and failure. So, it pays to think about what it takes to help make those people successful.

Communication is King – By Mark Richards

Every software architect should know how to communicate the goals and objectives of a software project. The key to effective communication is clarity and leadership.

Keeping developers in the dark about the big picture or why decisions were made is a clear recipe for disaster. Having the developer on your side creates a collaborative environment whereby decisions you make as an architect are validated. In turn, you get buy-in from developers by keeping them involved in the architecture process.

Architecting is about balancing – By Randy Stafford

When we think of architecting software, we tend to think first of classical technical activities, like modularizing systems, defining interfaces, allocating responsibility, applying patterns, and optimizing performance. Architects also need to consider security, usability, supportability, release management, and deployment options, among other things. But these technical and procedural issues must be balanced with the needs of stakeholders and their interests.

Software architecting is about more than just the classical technical activities; it is about balancing technical requirements with the business requirements of stakeholders in the project.

Skyscrapers aren’t scalable – By Michael Nygard

We cannot easily add lanes to roads, but we’ve learned how to easily add features to software. This isn’t a defect of our software processes, but a virtue of the medium in which we work. It’s OK to release an application that only does a few things, as long as users value those things enough to pay for them.

Quantify – By Keith Braithwaite

The next time someone tells you that a system needs to be “scalable” ask them where new users are going to come from and why. Ask how many and by when? Reject “Lots” and “soon” as answers. Uncertain quantitative criteria must be given as a range: the least, the nominal, and the most. If this range cannot be given, then the required behavior is not understood.

Some simple questions to ask: How many? In what period? How often? How soon? Increasing or decreasing? At what rate? If these questions cannot be answered then the need is not understood. The answers should be in the business case for the system and if they are not, then some hard thinking needs to be done.

Architects must be hands on – By John Davies

A good architect should lead by example; he or she should be able to fulfill any of the positions within the team, from wiring the network and configuring the build process to writing the unit tests and running benchmarks. It is perfectly acceptable for team members to have more in-depth knowledge in their specific areas, but it’s difficult to imagine how team members can have confidence in their architect if the architect doesn’t understand the technology.

Use uncertainty as a driver – By Kevlin Henney

Confronted with two options, most people think that the most important thing to do is to make a choice between them. In design (software or otherwise), it is not. The presence of two options is an indicator that you need to consider uncertainty in the design. Use the uncertainty as a driver to determine where you can defer commitment to details and where you can partition and abstract to reduce the significance of design decisions.

You can purchase this book on Amazon: 97 Things Every Software Architect Should Know

How to create a Data Container Component in React
https://blog.fedecarg.com/2018/03/24/react-data-container-component-pattern/
Federico | Sat, 24 Mar 2018 | Design Patterns, Frameworks, Javascript, Node.js, React

One pattern I’ve used quite a lot while working with React at the BBC and Discovery Channel is the Data Container pattern. It became popular in the last couple of years thanks to libraries like Redux and Komposer.


The idea is simple. When you build UI components in React you feed data into them via containers. Inside those containers you may need to access different data sources, filter data, handle errors, etc. So data containers help you build data-driven components and separate them into two categories: Data components and Presentational components.

  • A Presentational component is mainly concerned with the view; it doesn’t specify how the data is loaded or mutated, and it receives data and callbacks exclusively via props.
  • A Data component talks to the data sources and provides the data and behaviour to the Presentational component. It’s usually generated using a higher-order function, such as connect() or createContainer().

There are actually 2 ways to implement this pattern, using inheritance or composition:

  1. Inheritance: a React component class extends a Data Container component class.
  2. Composition: a React component is injected into the Data Container (React Komposer uses this approach).

I recommend composition over inheritance as a design principle because it gives you more flexibility.

Example

Let’s say you want to display a list of notifications and you have 2 components: NotificationsContainer and NotificationsList

First, you need to fetch the data and add it to the NotificationsContainer:

import React, { createElement } from "react";
import PropTypes from "prop-types";
import https from "https";
import DataStore from "/path/to/DataStore";

export default function createContainer(SubComponent, subComponentProps) {
  class DataContainer extends React.Component {
    constructor(props) {
      super(props);

      this.name = props.name;
      this.dataSourceUrl = props.dataSourceUrl;

      this.state = {
        data: null,
        error: null,
      };
    }

    componentDidMount() {
      this.setInitialData();
    }

    setInitialData() {
      if (DataStore.hasData(this.name)) {
        this.setState({
          data: DataStore.getData(this.name),
        });
      } else {
        this.fetchData();
      }
    }

    fetchData() {
      https.get(this.dataSourceUrl, (res) => {
        let chunkedData = "";

        res.on("data", (data) => {
          chunkedData += data;
        });

        res.on("end", () => {
          this.setState({
            data: chunkedData,
          });
        });

        res.on("error", (error) => {
          this.setState({ error });
        });
      });
    }

    render() {
      return createElement(
        SubComponent,
        Object.assign({}, subComponentProps, this.state)
      );
    }
  }

  DataContainer.propTypes = {
    name: PropTypes.string,
    dataSourceUrl: PropTypes.string,
  };

  return DataContainer;
}

Then you need to create a NotificationsList component that receives the data as a prop:

import React from "react";
import PropTypes from "prop-types";

class NotificationsList extends React.Component {
  constructor(props) {
    super(props);
  }

  render() {
    const listItems = this.props.data.items || [];

    return (
      <ul>
        {listItems.map((item, index) => {
          return <NotificationListItem key={index} item={item} index={index} />;
        })}
      </ul>
    );
  }
}

NotificationsList.propTypes = {
  data: PropTypes.object,
  error: PropTypes.object,
};

export default NotificationsList;

And, finally, you need to create and render the data container:

import React from "react";
import NotificationsList from "./NotificationsList";
import createContainer from "./createContainer";

export default class HomePage extends React.Component {
  render() {
    const NotificationsContainer = createContainer(NotificationsList, {
      propName: "propValue",
    });

    return (
      <NotificationsContainer
        dataSourceUrl="/api/notifications/list"
        name="notifications"
      />
    );
  }
}

If you are looking for something a bit more advanced, similar to what I was using at the BBC, then check out this nice little project called Second. Or, if you are building a more complex app and need to manage state or map components to multiple containers, then you should consider using Redux. Here’s a great presentation about React/Redux.

For those using React 16.3, keep an eye on the following projects: react-waterfall and unistore. They are data stores built on top of the new Context API.

How to pass variables to a Docker container when building a Node app
https://blog.fedecarg.com/2016/11/05/how-to-pass-variables-to-a-docker-container-when-building-a-node-app/
Federico | Sat, 05 Nov 2016 | Javascript, Linux, Node.js, Open-source, Programming

Environment variables are declared with the ENV statement and are notated in the Dockerfile either with $VARIABLE_NAME or ${VARIABLE_NAME}.

Passing variables at build-time

The ENV instruction sets an environment variable to a given value. The environment variables set using ENV will persist when a container is run from the resulting image. For example:

FROM node:9

ENV PORT 3000
ENV NODE_ENV development

The Dockerfile also allows you to specify arguments at build time. The ARG instruction defines a variable that users can pass to the builder:

FROM node:9

ARG PORT
ARG NODE_ENV

When building a Docker image from the command line, you can set those values using --build-arg:

$ docker build --tag webapp --build-arg PORT=3000 --build-arg NODE_ENV=development .

Executing commands using the shell

And here is the secret ingredient: depending on the value of the $NODE_ENV variable, you can use the shell to decide which npm script to run:

FROM node:9 

ARG PORT 
ARG NODE_ENV 

ENV PORT $PORT 
ENV NODE_ENV $NODE_ENV

RUN mkdir -p /usr/app
# WORKDIR sets the working directory for the instructions that follow (no need to RUN cd)
WORKDIR /usr/app
ADD . .

RUN npm install
RUN /bin/bash -c '[[ "${NODE_ENV}" == "production" ]] && npm run build:prod || npm run build:dev'

EXPOSE $PORT

CMD ["npm", "run", "start"]

Finally, you expose the port number and start the HTTP server.
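
Putting it all together, you would build and run the image roughly like this (the image name and values match the earlier build example; mapping the port to 3000 on the host is an assumption):

$ docker build --tag webapp --build-arg PORT=3000 --build-arg NODE_ENV=production .
$ docker run --rm -p 3000:3000 webapp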

Thanks for reading and happy Dockering!

Website performance monitoring tool
https://blog.fedecarg.com/2016/08/01/website-performance-monitoring-open-source-tool/
Federico | Mon, 01 Aug 2016 | Javascript, Node.js, Tools, Web Services

Monitoring systems allow you to track how changes to your front-end code base affect performance over time, catching any regression issues and showing the ongoing effects of any performance optimisation work. Easy to use dashboards are a must when it comes to monitoring the state of your web apps. Companies like Calibre or SpeedCurve offer this as a professional service, but not everyone can afford them.

Meet SpeedTracker

SpeedTracker is an open source (MIT license), self-hosted website performance monitoring solution developed by Eduardo Bouças. It runs on top of WebPageTest, performs periodic performance tests on your website and shows a visualisation of how the various performance metrics evolve over time.

SpeedTracker provides clean charts and graphs that can help you identify possible problem areas.

[Screenshot: SpeedTracker dashboard charts]

Check out the demo here: https://bbc.github.io/iplayer-web-speedtracker/

WebPageTest is an incredibly useful resource for any web developer, but the information it provides becomes much more powerful when monitored regularly, rather than at isolated events. Web application monitoring is not just for detecting downtime, it also gives you additional insight into performance trends during peak load times, as well as by time of day, and day of the week.

[Screenshot: SpeedTracker performance metrics over time]

For me, the best thing about SpeedTracker is that it runs on your GitHub repository! Data from WebPageTest is pushed to a GitHub repository. It can be served from GitHub Pages, from a private or public repository, with HTTPS baked in for free.

SpeedTracker also allows you to define performance budgets for any metric you want to monitor and receive alerts when a budget is overrun. This can be an e-mail or a message on Slack.

For instructions on how to install this tool, visit the following GitHub repo: https://github.com/speedtracker/speedtracker

 

Node.js: How to mock the imports of an ES6 module
https://blog.fedecarg.com/2016/07/18/node-js-how-to-mock-the-imports-of-an-es6-module/
Federico | Mon, 18 Jul 2016 | Javascript, Node.js, Programming

The package mock-require is useful if you want to mock require statements in Node.js. It has a simple API that allows you to mock anything, from a single exported function to a standard library. Here’s an example:

app/config.js

function init() {
  // ...
}

// export an object so that individual functions can be stubbed
module.exports = { init };

app/services/content.js

import config from '../../config.js';

function load() {
  // ...
}

// export an object so the test can assert on module.load
module.exports = { load };

test/services/content_spec.js

import { assert } from "chai";
import sinon from "sinon";
import mockRequire from "mock-require";

describe("My module", () => {
  let module; // module under test
  let configMock;

  beforeEach(() => {
    configMock = {
      init: sinon.stub().returns("foo"),
    };

    // mock es6 import (tip: use the same import path)
    mockRequire("../../config.js", configMock);

    // require es6 module
    module = require("../../../app/services/content.js");
  });

  afterEach(() => {
    // remove all registered mocks
    mockRequire.stopAll();
  });

  describe("Initialisation", () => {
    it("should have an load function", () => {
      assert.isFunction(module.load);
    });
  });
});
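
To run a spec like this you need Mocha plus something to transpile the ES6 import syntax. Assuming Babel is already configured for the project (the test glob below is an assumption based on the file layout above), something along these lines should work:

$ npm install --save-dev mocha chai sinon mock-require babel-register
$ mocha --require babel-register "test/**/*_spec.js"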

Recommender Systems: Content-based, Social recommendations and Collaborative filtering
https://blog.fedecarg.com/2016/06/26/recommender-systems-content-based-social-recommendations-and-collaborative-filtering/
Federico | Sun, 26 Jun 2016 | Software Architecture, Web Development

With the proliferation of video on-demand streaming services, viewers face a big challenge: finding content across multiple screens and apps. There may be quality information available online but it may be difficult to find. Traditionally, viewers resort to “app switching” which can be frustrating when it comes to finding quality content.

With the emergence of new technologies like AI, metadata, and machine learning, traditional content discovery approaches can’t cut the mustard anymore for content publishers. The solution is to integrate their catalogues and programming guides to a Content Discovery Platform. But, what is a Discovery Platform, and how can it make it easier for users to find what they want? Discovery Platforms with metadata aggregation, AI/ML enrichments, search and recommendations are the new disruptors in Content Marketing. Today’s post will only focus on one of the pillars of Content Discovery: the recommendations engine.

The goal of a recommendations engine is to predict the degree to which a user will like or dislike a set of items such as movies or videos. With this technology, viewers are automatically advised of content that they might like without the need to search for specific items or browse through an online guide. Recommender systems allow viewers to watch shows at convenient times, give them convenient digital access to those shows, and help them find shows using numerous indices. Indices include genre, actor, director, keyword and the probability that the viewer will like the show as predicted by a collaborative filtering system. This results in greater satisfaction for the viewer, with increased loyalty and higher revenues for the business.

1. Methods

Most recommender systems use a combination of different approaches, but broadly speaking there are three different methods that can be used:

  • Content-based analysis and extraction of common patterns
  • Social recommendations based on personal choices from other people
  • Collaborative filtering based on users’ behaviour, preferences and ratings

Each of these approaches can provide a level of recommendations so that most recommendation platforms take a hybrid approach, using information from each of these different sources to define what shows are recommended to the users.

1.1. Content-based

Content-based recommenders use features such as the genre, cast and age of the show as attributes for a learning system. However, such features are only weakly predictive of whether viewers will like the show. There are only a few hundred genres and they lack the specificity required for accurate prediction.

In the TV world, the only content-analysis technologies available to date rely on the metadata associated with the programmes. The recommendations are only as good as the metadata, and are typically recommendations within a certain genre or with a certain star.

1.2. Social recommendations

Social-networking technologies allow for a new level of sophistication whereby users can easily receive recommendations based on the shows that other people within their social network have ranked highly, providing a more personal level of recommendations than are achieved using a newspaper or web site.

A number of social networks dedicated to providing recommendations have emerged over the last few years, one of the best known being IMDb, which encourages users to rate and review films and then applies a collaborative filtering algorithm to identify similar users and ask them for recommendations.

The advantage of social recommendations is that because they have a high degree of personal relevance they are typically well received, with the disadvantage being that the suggested shows tend to cluster around a few well known or cult-interest programmes.

1.3. Collaborative filtering

Collaborative filter methods are based on collecting and analysing a large amount of information on users’ behaviour, activity or preferences and predicting what users will like based on their similarity to other users.

There are two types of filtering:

  • Passive filtering: Provides recommendations based on activity without explicitly asking the users’ permission (e.g. Google, Facebook). Passive filtering is less problematic when collecting the data, but requires substantial processing in order to make the data attributable to a single user. Any keywords that users have searched for within a site provides an excellent basis for passive filtering. The major disadvantage of passive filtering is that users cannot easily specify which information they want to have used for recommendations and which they don’t, so any information used for passive filtering must be carefully governed by a set of business rules to reduce the potential for inappropriate recommendations.
  • Active filtering: Uses the information provided by the user as the basis for recommendations (e.g. Netflix). The main issue with active collaborative filtering for TV shows is that viewers will only rate a show after watching it. And there has been limited success in getting users to build a sufficiently large database of information to provide solid recommendations.

Collaborative filtering systems can be categorised along the following major dimensions:

  • User-user or item-item systems: In user-user systems, correlations (or similarities or distances) are computed between users. In item-item systems metrics are computed between items (e.g. shows or movies).
  • Form of the learned model: Most collaborative filtering systems to date have used k-nearest neighbour models in user-user space. However there has been work using other model forms such as Bayesian networks, decision trees, cluster models and factor analysis.
  • Similarity or distance function: Memory-based systems and some others need to define a distance metric between pairs of items or users. The most popular and one of the most effective measures used to date has been the simple and obvious Pearson product moment correlation coefficient (PMCC). Other distance metrics used have included the cosine measure and extensions to the PMCC which correct for the possibility that one user may rate programs more or less harshly than another user. Another extension gives higher weight to users that rate infrequently. 
  • Combination function: Having defined a similarity metric between pairs of users or items, the system needs to make recommendations for the active user for an unrated item. Memory-based systems typically use the k-nearest neighbour formula (a sketch of this approach follows the list).
  • Evaluation criteria: The accuracy of the collaborative filtering algorithm may be measured either by using mean absolute error (MAE) or a ranking metric. Mean absolute error is just an average, over the test set, of the absolute difference between the true rating of an item and its rating as predicted by the collaborative filtering system. Whereas MAE evaluates each prediction separately and then forms an average, the ranking metric approach directly evaluates the goodness of the entire ordered list of recommendations. This allows the ranking metric approach to, for instance, penalise a mistake at rank 1 more severely than a mistake further down the list.
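
To make the user-user case concrete, here is a small sketch in plain Python (with toy, made-up ratings) of the memory-based approach described above: Pearson correlation as the similarity function and a k-nearest-neighbour weighted average as the combination function:

from math import sqrt

# Toy ratings: user -> {item: rating}. A real system would load these from a store.
ratings = {
    'alice': {'show_a': 5, 'show_b': 3, 'show_c': 4},
    'bob':   {'show_a': 4, 'show_b': 2, 'show_c': 5, 'show_d': 4},
    'carol': {'show_a': 2, 'show_b': 5, 'show_d': 1},
}

def pearson(u, v):
    # Pearson correlation (PMCC) over the items both users have rated
    common = set(ratings[u]) & set(ratings[v])
    n = len(common)
    if n == 0:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / float(n)
    mean_v = sum(ratings[v][i] for i in common) / float(n)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common)) * \
          sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, k=2):
    # Similarity-weighted average of the ratings given to the item by the
    # k most similar (positively correlated) users
    neighbours = [(pearson(user, other), other)
                  for other in ratings
                  if other != user and item in ratings[other]]
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None
    num = sum(sim * ratings[other][item] for sim, other in neighbours)
    den = sum(sim for sim, other in neighbours)
    return num / den

print(predict('alice', 'show_d'))  # 4.0, driven by alice's similarity to bob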

The tasks for which collaborative filtering is useful are:

  • Help me find new items I might like. In a world of information overload, I cannot evaluate all things. Present a few for me to choose from. This has been applied most commonly to consumer items (music, books, movies).
  • Advise me on a particular item. I have a particular item in mind; does the community know whether it is good or bad?
  • Help me find a user I might like. Sometimes, knowing who to focus on is as important as knowing what to focus on. This might help with forming discussion groups, matchmaking, or connecting users so that they can exchange recommendations socially.
  • Help our group find something new that we might like. CF can help groups of people find items that maximise value to a group as a whole. For example, a couple that wishes to see a movie together or a research group that wishes to read an appropriate paper.
  • Help me find a mixture of “new” and “old” items. I might wish a “balanced diet” of restaurants, including ones I have eaten in previously; or, I might wish to go to a restaurant with a group of people, even if some have already been there; or, I might wish to purchase some groceries that are appropriate for my shopping cart, even if I have already bought them before.
  • Help me with tasks that are specific to this domain. For example, a recommender for a movie and a restaurant might be designed to distinguish between recommendations for a first date versus a guys’ night out. To date, much research has focused on more abstract tasks (like “find new items”) while not probing deeply into the underlying user goals (like “find a movie for a first date”).

1.3.1. Time-based Collaborative Filtering with Implicit Feedback 

Most collaborative filtering-based recommender systems use explicit feedback (ratings) that are collected directly from users.  When users rate truthfully, using rating information is one of the best ways to quantify user preferences. However, many users assign arbitrary ratings that do not reflect their honest opinions. In some e-commerce environments, it is difficult to ask users to give ratings. For instance, in a mobile e-commerce environment the service fee is dependent on the connection time. 

2. Accuracy

In the recommender systems community it is increasingly recognised that accuracy metrics such as mean absolute error (MAE), precision and recall, can only partially evaluate a recommender system. User satisfaction, and derivatives thereof such as serendipity, diversity and trust, are increasingly seen as important. A system can make better recommendations using the following approaches:

  • Transparency. Explain how the system works. An explanation may clarify how a recommendation was chosen and isolate and correct misguided assumptions.
  • Scrutability. Allow users to tell the system it is wrong. Following transparency, a second step is to allow a user to correct reasoning, or make the system scrutable. 
  • Trust. Increase users’ confidence in the system. Trust in the recommender system could also be dependent on the accuracy of the recommendation algorithm. A study of users’ trust suggests that users intend to return to recommender systems which they find trustworthy.
  • Persuasiveness. Convince users to try or buy. It has been shown that users can be manipulated to give a rating closer to the system’s prediction, whether this prediction is accurate or not.
  • Effectiveness. Help users make good decisions. Rather than simply persuading users to try or buy an item, an explanation may also assist users to make better decisions. Effectiveness is by definition highly dependent on the accuracy of the recommendation algorithm.
  • Satisfaction. Make the use of the system fun. Explanations may increase user satisfaction with the system, although poor explanations are likely to decrease a user’s interest, or acceptance of a system. The presence of longer descriptions of individual items has been found to be positively correlated with both the perceived usefulness and ease of use of the recommender system.

3. Relevance

Google’s PageRank mechanism is possible on the web because pages are linked to each other, but for video on-demand and streaming platforms we need to find another approach to relevance that will allow us to prioritise the most appropriate programming ahead of less relevant items. There are a number of potential elements that can be included, and the best algorithms take into account each of these factors (a simple sketch follows the list):

  • Platform: the platform that the content is on must be weighed against the scheduling. 
  • Programme Information: the metadata provided with the programme typically includes information on the programme, cast details, and categorisation. Prioritisation can be made on the quality of the metadata.
  • Scheduling: when the content is going to be made available on a given platform. The viewer is typically looking for content that is more readily available than not, and the initial results in the list should reflect this.
  • Popularity: when searching for sports, topics, or actors, the algorithm must prioritise more popular content ahead of others. For example a search for Tennis during Wimbledon should bring up the best coverage for this tournament rather than a documentary on the origins of the sport, even though the documentary might be broadcast on a more popular platform.
  • Viewer behaviour: by building a relevance map of user viewing, it is possible to augment the metadata of a show with other metadata that is common amongst its nearest neighbours on the relevance map. In this way, content that has strong proximity to other content with a similar topic can be weighted as more relevant to this topic than content that’s standalone in the relevance map.
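
As a purely illustrative sketch (the factor names, weights and values below are invented, not taken from any real system), a naive relevance score could combine these signals as a weighted sum over normalised inputs:

# Hypothetical weights; in practice these would be tuned against viewer behaviour
WEIGHTS = {'platform': 0.15, 'metadata_quality': 0.20, 'availability': 0.25,
           'popularity': 0.25, 'behaviour_proximity': 0.15}

def relevance_score(item):
    # item maps each factor name to a value normalised between 0 and 1
    return sum(WEIGHTS[factor] * item.get(factor, 0.0) for factor in WEIGHTS)

# During Wimbledon, live coverage should outrank a documentary about tennis
documentary = {'platform': 0.9, 'metadata_quality': 0.8, 'availability': 0.3,
               'popularity': 0.2, 'behaviour_proximity': 0.4}
live_coverage = {'platform': 0.6, 'metadata_quality': 0.7, 'availability': 1.0,
                 'popularity': 0.9, 'behaviour_proximity': 0.8}

print(relevance_score(live_coverage) > relevance_score(documentary))  # True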

4. Challenges

The difficulty in implementing recommendations is that different users have different tastes and opinions about which content they prefer.

  • Quality: a substantial portion of the videos that are recommended to the user should be videos that they would like to watch, or at least might find interesting.
  • Transparency: it should be clear to the user why they have been recommended certain videos so that if they have been recommended a video they don’t like they can at least understand why.
  • User feedback: people are fanatical about their watching experience and if they are being recommended a video that they don’t like they should have an immediate way to say that they don’t like it and subsequently never have it recommended again.
  • Accuracy: use metrics to evaluate recommender systems, identify the strengths and the weaknesses of the metrics.

5. Research papers

  • A Survey of Explanations in Recommender Systems – Nava Tintarev, Judith Masthoff
  • A time-based approach to effective recommender systems using implicit feedback – T. Q. Lee, Y. Park
  • Evaluating collaborative filtering recommender systems – Jonathan L. Herlocker
  • Toward the Next Generation of Recommender Systems – Gediminas Adomavicius and Alexander Tuzhilin

Geo Proximity Search: The Haversine Equation
https://blog.fedecarg.com/2014/12/08/geo-proximity-search-the-haversine-equation/
Federico | Mon, 08 Dec 2014 | Databases, Open-source, Programming, Python, Web Development

I’m working on a project that requires Geo proximity search. Basically, what I’m doing is plotting a radius around a point on a map, which is defined by the distance between two points on the map given their latitudes and longitudes. To achieve this I’m using the Haversine formula (spherical trigonometry). This equation is important in navigation, it gives great-circle distances between two points on a sphere from their longitudes and latitudes. You can see it in action here: Radius From UK Postcode.

This has already been covered in some blogs; however, I found most of the information to be inaccurate and, in some cases, incorrect. The Haversine equation is very straightforward, so there’s no need to complicate things.

The snippets below compute the bounding box around the point (using roughly 69 miles per degree of latitude) that you can use to narrow the search down before applying the full great-circle distance. I’ve implemented it in SQL, Python and PHP. Use the one that suits you best.

SQL implementation

set @latitude=53.754842;
set @longitude=-2.708077;
set @radius=20;

set @lng_min = @longitude - @radius/abs(cos(radians(@latitude))*69);
set @lng_max = @longitude + @radius/abs(cos(radians(@latitude))*69);
set @lat_min = @latitude - (@radius/69);
set @lat_max = @latitude + (@radius/69);

SELECT * FROM postcode
WHERE (longitude BETWEEN @lng_min AND @lng_max)
AND (latitude BETWEEN @lat_min and @lat_max);

Python implementation

from __future__ import division
import math

longitude = float(-2.708077)
latitude = float(53.754842)
radius = 20

lng_min = longitude - radius / abs(math.cos(math.radians(latitude)) * 69)
lng_max = longitude + radius / abs(math.cos(math.radians(latitude)) * 69)
lat_min = latitude - (radius / 69)
lat_max = latitude + (radius / 69)

print 'lng (min/max): %f %f' % (lng_min, lng_max)
print 'lat (min/max): %f %f' % (lat_min, lat_max)

PHP implementation

$longitude = (float) -2.708077;
$latitude = (float) 53.754842;
$radius = 20; // in miles

$lng_min = $longitude - $radius / abs(cos(deg2rad($latitude)) * 69);
$lng_max = $longitude + $radius / abs(cos(deg2rad($latitude)) * 69);
$lat_min = $latitude - ($radius / 69);
$lat_max = $latitude + ($radius / 69);

echo 'lng (min/max): ' . $lng_min . '/' . $lng_max . PHP_EOL;
echo 'lat (min/max): ' . $lat_min . '/' . $lat_max;

It outputs the same result:

lng (min/max): -3.1983251898421/-2.2178288101579
lat (min/max): 53.464986927536/54.044697072464
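
And, for completeness, here is a sketch of the Haversine great-circle distance itself in Python, which you can apply to the rows returned by the bounding-box query to get exact distances (the radius constant assumes miles; use 6371 for kilometres):

import math

def haversine(lat1, lng1, lat2, lng2, radius=3959):
    # Great-circle distance between two points given in decimal degrees
    lat1, lng1, lat2, lng2 = map(math.radians, [lat1, lng1, lat2, lng2])
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

print(haversine(53.754842, -2.708077, 53.800755, -1.549077))  # distance in miles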

Happy Geolocating!

Installing multiple versions of Ruby using RVM
https://blog.fedecarg.com/2014/08/26/installing-multiple-versions-of-ruby-using-rvm/
Federico | Tue, 26 Aug 2014 | Programming

Ruby Version Manager (RVM) is a tool that allows you to install multiple Ruby interpreters and keep multiple versions of the same interpreter side by side. Very handy for those who have to maintain different applications using different versions of Ruby.

To start, download RVM and install the latest stable version of Ruby:

$ echo insecure >> ~/.curlrc
$ curl -L https://get.rvm.io | bash -s stable --ruby
$ source ~/.bash_profile

Install an old version of Ruby:

$ rvm install 1.8.6
$ rvm use 1.8.6 --default
$ ruby -v
ruby 1.8.6

Create a Gem set and install an old version of Rails:

$ rvm gemset create rails123
$ rvm gemset use rails123
$ gem install rails -v 1.2.3
$ rails -v
Rails 1.2.3

Switch back to your system:

$ rvm system
$ rails -v
Rails 2.3.5

Switch back to your RVM environment:

$ rvm 1.8.6@rails123
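
If you’d rather not switch manually every time, RVM can also pick the interpreter and gemset up automatically from per-project files (a minimal sketch; the project path is a placeholder):

$ cd /path/to/project
$ echo "1.8.6" > .ruby-version
$ echo "rails123" > .ruby-gemset
$ cd .   # re-enter the directory so RVM picks the files up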

And, if you want to remove Rails 1.2.3, just delete the Gem set:

$ rvm gemset delete rails123

As an alternative to RVM, you might also look into rbenv.
