Efficient payment transaction matching

M.S. Thesis Project
in cooperation with Ernst & Young, Copenhagen

Problem domain:

Payment transaction matching (henceforth simply “transaction matching”)
is the problem of matching two sets of payment transactions, each
containing an amount, a date and other attributes; e.g. one
representing bank account transactions, the others postings to a
journal or day book. Matching (Danish “afstemning”) means, informally: (a) identfying
corresponding (groups of) transactions based on (payment) amounts,
dates, identifying strings and/or other criteria; and (b) computing the difference, if any, of the
aggregate amounts of the matched groups.

Background:

Ernst & Young have developed an Excel/VBA based tool called RECON for performing
rule-based transaction matching. The rules enable user-specified matching.
RECON works as follows: For each rule, for each transaction group in the one
set satisfying the rule and for each transaction group in the other set
satisfying the rule, check whether the two groups match according to
the rule and, if so, output aggregate amounts of the two matching
groups as well as their difference. Because RECON employs nested loops, it results in prohibitive
Theta(r m n) run-time complexity, where r is the number of rules, m
the number of transactions in one set and n the number of transactions
in the other set.

Objectives:

The overall goal of the project is to develop a transaction matching tool
with drastically improved time efficiency.

Specific objectives are:

  1. Analysis and formal definition of relevant variations of matching.
    What exactly does “matching” mean?

  2. An efficient and scalable algorithm for and implementation of
    automatic, scalable (quasi-linear time for fixed r) transaction matching based on
    specifications of matching rules; in particular it should have
    interactive performance on tens or hundreds of thousands of
    transactions in each set and respond in no more than a few seconds for
    millions of transactions in each of the two transaction sets.
    In particular, it should employ and evaluate applicability of discriminatory joins with
    symbolic products [1].

    [2a. Optional extension: Consider both Boolean (does or does not match)
    and probabilistic matching (provide "goodness of match" measure for ranking
    different possible matches).
    For this part, no particular GUI or implementation platform is stipulated.]

  3. A tool that: incorporates the algorithm in 2; supports manual matching
    and unmatching (revoking/removing matchings “by hand”); supports convenient user
    interaction for import and export of data, for reporting of matching results,
    and for specification of matching rules. Use of .NET (see 4 below) is welcome but
    not required.

  4. Analysis of programming platforms for developing the tool for use in
    and at Ernst & Young; that is, taking account of Ernst & Young’s
    technical setup and intentions for the tool. In particular, consider
    the .NET platform as development platform, with VisualStudio’s
    support for development in high-level languages such as C# and F#, as
    an alternative to developing with Excel and VBA.

Advisors:

Main advisor: Fritz Henglein, DIKU
External advisor: Nishandan Ganesalingam, Ernst & Young

The main advisor is responsible for the project overall and is primarily responsible
for advising on objectives 1 and 2 above. The external advisor is primarily
responsible for advising on objectives 3 and 4.

Period:

Start: Between January 1st and 30th, 2012
End: ca. June 30th, 2012

Size:

1-2 persons, 30 ECTS points each.

Prerequisites:

  • Eligibility for starting an M.S. thesis.

  • Programming Languages and Systems profile recommended, but not required.

  • Signing of nondisclosure agreement, which provides access to
    source code of and information on RECON.

Language:

Advising: English or Danish. M.S. thesis: English.

More information:

Contact Fritz Henglein, DIKU room 3-2-17, tel. 30589576, email henglein@diku.dk.
(Please note: unavailable Dec. 23, 2011 to January 8th, 2012.)

[1] Fritz Henglein, Ken Friis Larsen, “Generic Multiset Programming with Discrimination-based Joins and Symbolic Cartesian Products”,
Higher-Order Symbolic Computation, 2011, DOI 10.1007/s10990-011-9078-8; preprint available from http://www.diku.dk/~henglein > “Papers”.

Context-guided form filling

Context

POETS is a new architecture for ERP systems developed in the 3gERP project (3gerp.org) which is radically different from current state-of-the-art ERP systems. Two central parts of POETS are events and contracts. Contracts model “real” legally binding contracts. A contract stipulates what obligations and permissions the parties who have entered into the contract have. To state that a obligation has been fulfilled or a permission exercised, events are registered and checked against the relevant contract to see whether the event happened according to what is written in the contract. If the event was as expected by the contract, the system can continue while if the event was not as expected, the contract is said to have been breached.

Problem

In order to register an event with the POETS system, one basically need to fill out a form. A form contains a number of fields depending on the kind of event to be registered. For example a ‘Payment’ event has amongst others a field called ‘amount’ while a ‘Delivery’ event has a field called ‘goods’.

It is of crucial importance that the fields are filled out according to what is expected by the contract. In fact, even a small typo in a ‘name of customer’ field would cause the contract to become breached.

Cases

In many cases, however, the values to be filled out in an event-form are uniquely determined by the contract with which the event will be registered. I.e. the amount to be payed (as registered by a Payment event) has already been agreed upon, so the system could pre-fill the corresponding field in the payment-form.

In other cases, the values to be filled in are not uniquely determined but simply constrained by some ‘predicate’ in the contract. I.e. a sales contract may allow the buyer to pay the total amount over multiple events and thus, during any payment-event she should not fill in the ‘amount’ fill to more than what is remaining to be payed and never less than 0. In such cases, the system could restrain the possible inputs the user can make.

Finally, there are cases where the value of a field is neither uniquely determined nor constrained, but rather it is the case that a set of values are likely to be the ones that the user desires. Consider the example where the user is asked to fill out her contact information; the user might be asked to fill out fields such as ‘name’, ‘address’, ‘phone-number’. It is somewhat likely that the user has filled out those values before and therefore the system could do well to simply remember those values and allow the user to opt for allowing the system ‘auto-refill’ the fields.

Aim

In all of the three cases mentioned, the context of the form can be used to extract information about what information the system is expecting or allowing to receive via the form. Ultimately, the user should never be required to fill out fields, but merely act as a verification that the fields of the event-form correspond to what has actually occurred in the real world.

Student contribution

As a minimum, at the end of this project students should have done the following:

  • Performed a study of what methods other people have applied in the context of helping users fill out forms
  • Implemented a constraint solving program that can be used to pre-fill values for fields that are uniquely determined; the program should not be tied to a particular GUI framework.

Misc

The project can be scoped as a bachelor project and will be supervised by Jesper Andersen (jespera@diku.dk).

Automating ERP system upgrades

Brief description of project

Microsoft Dynamics NAV (NAV) is an ERP system used by many companies worldwide. As NAV is a general purpose ERP system, most companies have had their version of NAV customized to fit their business.

While the customization of an NAV system is beneficial for the company’s use of said system, it is less beneficial when it comes to upgrading. Once a new version of NAV is released by Microsoft, companies are left with the choice of either

  1. re-customizing the new version of NAV, or
  2. continue using the old version of NAV.

The former choice can be expensive because there is little tool-support for automatic re-customizing new versions of NAV and therefore can require much development effort. The latter choice can be troublesome because the upgraded NAV system may come with important corrections or features and the old version will either need to have those corrections applied or features implemented anyway.

The ultimate aim of this project is to completely automate step 1. above, so that when a new version of NAV is released, a tool can automatically upgrade a customized version of NAV.

Example

Suppose N1 is the core version of an ERP system. Now company C bought a customized version M1 of N1. After a while the N1 developers releases a new version N2 which fixes important issues (security, change of law, accounting, etc). Company C now really wants to upgrade to N2, but they have the following problems:

  • A simple ‘diff’ between N1 and M1 does provide a patch P1 which can be applied to N1 to construct M1;
    • However, in this situation P1 could not be applied to N2
    • Development effort is therefore required to either
      • manually, rewrite the patch P1 so that it applies to N1 and hope that this constitutes a correctly customized version of N2 or
      • re-customize N2 all over again.
  • Company C has no explicit description of their customization of N1 into M1, so re-customizing N2 into M2 basically requires redoing much of the work done to customize N1 into M1.

An approach

The Coccinelle project [http://coccinelle.lip6.fr/] introduced the concept of semantic patches which can be seen as a generalization of the standard patches as produced by the diff tool. In its most simple form, a semantic patch is a generalization of a standard patch in two ways:

  • Concrete syntax can be abstracted away in the semantic patch
  • Changes can be specified to only apply on certain control-flow paths

In the context of this project, we would like to explore the use of semantic patches to describe the changes between a base ERP system and customized versions of the base system. We think this is beneficial because describing changes with semantic patches allows one to provide an ‘essential’ view of the changes. I.e. the semantic patches need not be concerned with line numbers and concrete names of variables and other pieces of concrete syntax.

In terms of the previous example:

Suppose SP1 is a semantic patch that describes the changes between base ERP system N1 and the customized M1.

It is now (this is our thesis) more likely that SP1 can be directly applied to N2 to obtain a customized M2 directly because SP1 is less dependent on concrete elements of N1.

In any case SP1 should be a much more concise description of the customization of N1 so that adapting SP1 to apply correctly to N2 should be much more easy than re-customizing N2 all over again.

Student contribution

Students taking on this project can work on several sub-projects.

First of all semantic patches can currently only target C source code but NAV is written using (among others) C/AL [http://en.wikipedia.org/wiki/C/AL]. One project is therefore to port a version of the semantic patch tool to C/AL.

About the project

This project is offered in cooperation with Microsoft Development Center Denmark and will be supervised by Jesper Andersen (jespera@diku.dk) from DIKU and Jesper Kiehn and Lars Hammer both from MDCC (Microsoft Development Center Copenhagen, located in Vedbæk). Students will need to sign an NDA.

The precise details and scope of the project will be arranged on the first meeting, but can be anything from a standard 7.5 ECTS project to a master-thesis.

Prerequisites

Students must have keen interest in program analysis and transformation. Students should have followed the course “Logic in Computer Sciene/Logik i Datalogi” or something equivalent before taking on this project. Also, since Coccinelle is written in Ocaml, knowledge of functional programming is a requirement. Knowledge of Microsoft Dynamics NAV is not required, but would be beneficial.

Specialized Views for ontology entities

Introduction

The goal of the Android-based POETS client (or client henceforth) is to be agile with respect to changes to the parts constituting the POETS architecture. Changes or additions to contracts, report-functions, rules, or the underlying ontology should not imply major work on updating the client.

The most basic part of the POETS architecture is the ontology defining the entities which one can talk about in contracts, report-functions, rules and so on. Therefore, a tool has been developed to translate ontology entities into Java classes. Besides containing the fields defined in the ontology, the generated classes also contain methods for conversion to and from XML conforming to a certain xsd.

The client makes use of the Java representation of the ontology entities for visualization and user input. For input, there is a Java class that given a Java class (AndroidGUI) representing an ontology entity, produces an Android View that allows the user to enter the fields needed of the ontology entity. For visualization, there is a Java class (AndroidVisual) that given an object of type “Java class representing an ontology entity” produces an Android View that visualizes to the user the fields defined in the ontology based on the type of the object.

The classes AndroidGUI and AndroidVisual are generic because they treat all ontology entities equally. However, most ontology entities has some meaning to users that could be reflected in the visualization and input of such entities. I.e. the visualization of a “Customer” entity could additionally include a pictogram of a person to convey to the user that a Customer entity is actually a person. The visualization could even try to fetch additional data such as pictures of the customer in question not already present according to the entity definition. Likewise for input of a Customer. In its current form a Customer entity only contains a field “name” which is a string. The AndroidGUI could use the information that fields named “name” in a “Customer” (or any subtype hereof) should be turned into an input-control suitable for entering names.

What we seek is therefore visualization of and input controls for the Java representation of ontology entities specialized to the particular ontology in question.

Projects

The following is a list of projects that approach the task described above.

  • Select a number of (high-level) entities from the ontology and construct really clever ways of visualizing and inputting those concepts. However, the caveat is that that changes to the definition of the selected entities should not break your clever code. Of particular interest is the visualization of “Contract” entities because they have associated temporal behavior as defined in by their corresponding contracts in the the contract language. E.g. one could choose to visualize a “DeliveryContract” on a map with all the delivery destinations pin-pointed on the map.
  • Integration of POETS contacts. There is an ontology entity called “Agent” which has amongst others “Customer” as subtype. The Android platform also as a concept of contacts and it would be beneficial to connect the two concepts such that when (e.g.) a “Customer” entitie needs to be entered by the user, she can simply pull all relevant information from the contact available on in the contacts stored on the phone.

Integrating Emails into POETS

A valuable source of customer related information are email conversations. It is therefore natural to integrate email sources into POETS. More specifically, the task is to feed incoming and outgoing emails from a set of email accounts into POETS. In order to make email messages available in the reporting system, the email messages are supposed to be fed into the system by generating according events in the event log.

Due to the architecture of POETS this integration can be performed without extending or changing the system itself. The email system simply acts as a client that feeds events to the system. This project involves the following subtasks:

  • implementation of collecting incoming an outgoing email messages from a given set of email accounts
  • implementation of generating proper events to feed the email messages into POETS
  • write proper definitions in the ontology language that capture email messages
  • write reports that enable to connect customers to their email conversations

Simplification of Type Constraints and Syntactic Extensions for the Reporting Language

Due to the event-driven nature of POETS’ architecture, most strikingly manifested in the complete absence of a traditional relational database, reporting functions play a central role in POETS. A reporting function is a computable function that extracts relevant information (then called “report”) from the event transaction log. That is, a reporting function makes a certain piece of information about the state of the system which is implicitly encoded in the history of events explicitly available.

In order to facilitate describing these reporting functions and making this also available for domain experts, POETS provides a domain-specific language for describing reporting functions. This reporting language is a purely functional programming language which allows to describe functions extracting information from the event transaction log in a declarative manner.

In order to keep theoretical reasoning about the language concise and to simplify the implementation of optimisations, the language only provides basic operations, which are considered sufficient for the domain in question. This however, makes writing reporting functions sometimes cumbersome. The goal of this project is to overcome this deficit by implementing a set of syntactic extensions to the reporting language such as list comprehensions for convenient list processing and dynamic typing operations for dealing with POETS’ data ontology. This goal is supposed to be achieve by reducing the syntactic extensions to the core language, or alternatively by an optimised implementation which is then tested against the intended semantics. This involves the following subtasks:

  • parsing the extended syntax using a monadic parser combinator
    library
  • extending the definition of the reporting language’s abstract
    syntax using extensible expression data types (“data types a
    la carte”)
  • implementing the type inference for the added syntax using
    the provided monadic infrastructure
  • implementing the transformation of the added syntax to the
    core language by making use of the used extensible expression
    data types
  • providing an alternative direct implementation of the
    additional language features
  • providing tests for checking the direct implementation
    against the transformation-based one

Another practical issue of the reporting language is the readability of the types inferred by the type system. The type system employs a number of type constraints for capturing subtypes, overlapping record names and other aspects of the reporting language. The inference system usually infers types with a confusingly large number of type constraints. Typically, these constraints can be rewritten to a substantially smaller set of constraints, in many practical cases even the empty set. The second part of this project is concerned with the implementation of a simplification algorithm for type constraints that transforms a set of type constraints to a minimal equivalent set of type constraints.

During the project the following skills should be developed:

  • writing “real world” Haskell code in a larger software project
  • working with a monadic parser combinator library
  • implementation of a type inference algorithm
  • constraint analysis and transformation
  • writing monadic functions using monad transformers
  • using extensible expression data types (data types a la carte)

The resulting Haskell code has to be integrated with the current implementation of POETS.

KPI Reports and Widgets

(Key) Performance Indicators are measures used to give an insight into how well a business is performing. KPIs vary from business to business and often across roles within a business. While management is usually interested in aggregate information such as bottom and top lines (and their quotient) a sales person might be more interested in his own performance in relation the top contenders in his competition for getting a bonus.

In both cases ubiquitous access to real time KPI data is of immense value.

The purpose of this project is to device and implement a system that allows end users of POETS to visualize and maintain the results of certain reports (with predefined input) as widgets (enlarged interactive icons) directly on the Android Home Screen. The system should allow users to select the reports they want to monitor along with threshold values and methods of visualisation (e.g., speedometers, stoplights, graphs etc.).

Module and Documentation System for CSL

A novel idea in the POETS architecture is the use of a contract engine, in order to represent the set of expected events (i.e., business transactions) at any given point in time. Such patterns of expected events (i.e., event traces) are referred to as contracts, since they describe the obligations expected to be carried out by a set of agents. Furthermore, once such expected events take place, the contracts specify any remaining obligations.

A sales process is an example of such contract, where e.g. a Buyer has to pay for some goods after Seller has delivered them. Then in case Buyer has not yet delivered, the only expected event is that Buyer delivers the goods. On the other hand, after Buyer has delivered the goods — assuming he does so in time — the expected event is that Seller pays for the goods accordingly.

In order to specify contracts as the one above, we have introduced the domain-specific language CSL (Contract Specification Language). CSL allows the user to implement contract templates, in potentially different source files. Currently however, there is no module system for CSL, which means that all CSL templates are defined in the same namespace, making it impossible for instance to redefine an existing template in one context, or to explicitly exclude templates in another context. Furthermore, CSL has no built-in support for structured documentation of template definitions, which means that the user will have to look in the source files for any documentation on existing contract templates. The goal of this project is to overcome these limitations by implementing a module system, e.g. based on the module system of Haskell, and a documentation system, e.g. inspired by the documentation system of the Microsoft Visual studio environment (in-place documentation for templates including their parameters).

The project will involve the following tasks:

  • Familiarizing with the contract language, including writing a set of contracts.
  • Parsing the extended syntax using a monadic parser combinator library.
  • Extending the definition of CSL’s abstract syntax using extensible expression data types (“data types a
    la carte”)
  • Implementing a suitable type system for the added syntax using the existing infrastructure.
  • Implementing a transformation of the added syntax to the core language by making use of the used extensible expression data types, or direct implementation of the added syntax.

During the project the following skills should be developed:

  • Writing “real world” Haskell code in a larger software project.
  • Working with a monadic parser combinator library.
  • Implementation of a type checker.
  • Using extensible expression data types (data types a la carte).

The resulting Haskell code has to be integrated with the current implementation of POETS.

Supervision: The project will be supervised by Fritz Henglein and Tom Hvitved.

Interested? Contact Tom Hvitved.

Extensions to the Contract Language CSL

A novel idea in the POETS architecture is the use of a contract engine, in order to represent the set of expected events (i.e., business transactions) at any given point in time. Such patterns of expected events (i.e., event traces) are referred to as contracts, since they describe the obligations expected to be carried out by a set of agents. Furthermore, once such expected events take place, the contracts specify any remaining obligations.

A sales process is an example of such contract, where e.g. a Buyer has to pay for some goods after Seller has delivered them. Then in case Buyer has not yet delivered, the only expected event is that Buyer delivers the goods. On the other hand, after Buyer has delivered the goods — assuming he does so in time — the expected event is that Seller pays for the goods accordingly.

In order to specify contracts as the one above, we have introduced the domain-specific language CSL (Contract Specification Language). CSL is a very compact language, which enjoys the property of deterministic blame assignment and run-time monitoring: by construction, all contracts written in CSL have the property that any failure to comply with the contract can be uniquely attributed to a (set of) agent(s) involved in the contract, and furthermore there exists a sound and complete monitoring algorithm for doing so.

The compactness of CSL makes theoretical reasoning feasible, for instance in order to show the above property of deterministic blame assignment, and in order to show correctness of the run-time monitor for CSL. However, having only very few basic constructs at hand when writing contracts in CSL, means that it may be cumbersome to write more realistic contracts. Furthermore, the expression language of CSL is currently very restricted, allowing only ad hoc built-in functions, which is not sufficient for all contracts. The goal of this project is to overcome these limitations by implementing a set of (syntactic) extensions to the contract language, specifically extending CSL with a language for specifying user-definable predicates (i.e., a predicate language). The predicate language should be able to handle the base values of POETS — in particular lists and records — and it is required that the language is strongly normalizing, i.e., a predicate expression must always terminate with a result. The project will involve the following tasks:

  • Familiarizing with the contract language, including writing a set of contracts.
  • Parsing the extended syntax using a monadic parser combinator library.
  • Extending the definition of CSL’s abstract syntax using extensible expression data types (“data types a
    la carte”)
  • Implementing a suitable type system for the added syntax using the existing infrastructure.
  • Implementing a transformation of the added syntax to the core language by making use of the used extensible expression data types, or direct implementation of the added syntax.

During the project the following skills should be developed:

  • Writing “real world” Haskell code in a larger software project.
  • Working with a monadic parser combinator library.
  • Implementation of a type checker.
  • Using extensible expression data types (data types a la carte).

The resulting Haskell code has to be integrated with the current implementation of POETS.

Supervision: The project will be supervised by Fritz Henglein and Tom Hvitved.

Interested? Contact Tom Hvitved.

Integration of the Rules Engine with the POETS Server

These projects concern the integration of the Legal Rule Language (LRL) with the POETS server. The LRL inference engine is connected to the POETS system via the ontology and via queries.

Queries are simply POETS values extended with named variables and unknowns. Upon submitting a query to the inference engine it will try to infer values for variables and unknowns. The result is a set of substitutions for variables and possibly extra constraints that the unknowns must fulfil.

The third and final part needed by the inference engine is, of course, a set of rules. Rules are specified in LRL, which is a domain specific rule language designed to support legal reasoning. Legal reasoning is different from classical reasoning because contradictions (called collisions or conflicts) can be resolved by applying meta principles (known as collision rules) such as Lex Superior, Lex Posterior, and Lex Specialis.

The inference engine is implemented via compilation to Defeasible Prolog (DPL) with constraints, which is a nonmonotonic paraconsistent extension of Prolog.

Projects
The following projects are offered:

1. Extending LRL with syntactic sugar and domain support for the syntactical structure present in legal rules such as chapters, sections, subsections etc.

2. Development of an IDE for rule authors. This project is closely linked with the project concerning the development of an IDE for ontology authors.

3. Two-way integration of the inference engine with the POETS server and Thrift API and development of default interaction primitives on the Android client (i.e., how can and should rules be applied in form-filling on the Android?).

The only way of getting to know and being able to evaluate LRL is by using it. Therefore all of these projects contain an element of rule authoring.

Supervision
These projects are offered by and will be supervised by Morten Ib Nielsen.

Interested?
Contact Morten Ib Nielsen.

Page 1 of 2:1 2 »