2-Day Seminar

Building an Enterprise Data Lake & Data Refinery for Enterprise Data as a Service

Register On-line:
30-31 March 2017, London

Most organisations today are dealing with multiple silos of information. These include cloud-based and on-premises transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, enterprise content management (ECM) systems and, more recently, Big Data platforms such as Hadoop and NoSQL databases. In addition, the number of data sources is increasing dramatically, especially from outside the enterprise. Given this situation, it is not surprising that many companies have ended up managing information in silos, with different tools being used to prepare and manage data across these systems with varying degrees of governance. Nor is it only IT that is now managing data. Business users are also getting involved with new self-service data wrangling tools. The question is, is this the only way to manage data? Is there another level we can reach that allows us to manage and govern data more easily across an increasingly complex data landscape?

This 2-day course looks at the business problems caused by poorly managed information. It looks at reference data, master data, transaction data, metrics, big data and unstructured content (e.g. documents and email). It looks at the requirements to be able to define, govern, manage and share trusted, high quality information in a hybrid computing environment. It also explores a new approach to getting control of your data that includes participation from IT data architects, business users and IT developers. This includes creating and organising data in reservoirs, and it introduces data refineries as part of an enterprise approach to managing data. It emphasises the need for a common collaborative process and common data services to govern and manage data.

Learning Objectives

Delegates will learn:

  • How to define a strategy for enterprise information management and how to implement it within your organisation.
  • How to organise data in a distributed reservoir (a better name than a data lake), why an information catalog is important for delivering data-as-a-service, and how data standardisation and business glossaries can help define the data so that it is understood.
  • An operating model for effective information governance, what technologies you need and implementation methodologies to get your data under control. 
  • How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control irrespective of whether it be on-premises or in the cloud.

Course Outline

Strategy & Planning
This session introduces enterprise information management (EIM) and looks at the reasons why companies need it. It looks at what should be in your EIM strategy, the operating model needed to implement EIM, the types of data you have to manage and the scope of EIM implementation. It also looks at the policies and processes needed to bring your data under control.

  • The ever increasing distributed data landscape
  • The siloed approach to managing and governing data
  • IT data integration, self-service data wrangling or both? – data governance or data chaos?
  • Key requirements for EIM
    • Structured data – master, reference and transaction data
    • Semi-structured data – JSON, BSON, XML
    • Unstructured data - text, video
    • Re-usable services to manage data
  • Dealing with new data sources - cloud data, sensor data, social media data, smart products (the internet of things)
  • Understanding scope
    • OLTP systems
    • Data Warehouses
    • Big Data systems
    • MDM and RDM systems
    • Data virtualisation
    • Messaging and ESBs
    • Enterprise Content Management
  • Building a business case for EIM
  • Defining a strategy for EIM
  • A new inclusive approach to governing and managing data
  • Introducing the data reservoir and data refinery
  • The rising importance of an Information catalog
  • Key roles and responsibilities - getting the operating model right
  • Types of EIM policy
  • Formalising governance processes, e.g. the dispute resolution process
  • EIM in your enterprise architecture

Methodology & Technologies
With strategy understood, this session looks at the methodology and technologies needed to bring your data under control. It also looks at how platforms like Hadoop and common data services provide the foundation to manage information across the enterprise.

  • A best practice step-by-step methodology for structured data governance
  • Why the methodology has to change for semi-structured and unstructured data
  • Technology components in the new world of distributed data
  • Hadoop as a data staging area
  • Why Hadoop is not enough
  • EIM technology platforms e.g. Actian, Global IDs, IBM InfoSphere, Informatica, Oracle, SAP, SAS, Talend
  • Self-service data wrangling tools, e.g. Paxata, Trifacta, Tamr, ClearStory Data
  • Self-service data integration in BI tools
  • Implementation options
    • Centralised, distributed or federated
    • Self-service DI – the need for data governance at the edge
    • EIM on-premise and on the cloud
    • Common Data services for service-oriented data management

EIM Implementation – Data Standardisation & the Business Glossary
This session looks at the need for data standardisation of structured data and of new insights from processing unstructured data. The key to making this happen is to create common data names and definitions for your data to establish a shared business vocabulary (SBV). The SBV should be defined and stored in a business glossary.

  • Semantic data standardisation using a shared business vocabulary
  • SBV vs. taxonomy vs. ontology
  • The role of a SBV in MDM, RDM, SOA, DW and data virtualisation
  • How does an SBV apply to data in a Hadoop data reservoir?
  • Approaches to creating an SBV
  • Business glossary products
    • ASG, Cisco, Collibra, Global IDs, Informatica, IBM InfoSphere Information Governance Catalog, SAP Information Steward Metapedia, SAS Business Data Network
  • Planning for a business glossary
  • Organising data definitions in a business glossary
  • Business involvement in SBV creation
  • Using governance processes in data standardisation

Organising the Data Lake
This session looks at how to organise data so that it remains manageable in a complex data landscape. It looks at zoning, versioning, the need for collaboration between business and IT, and the use of an information catalog in managing the data.

  • Organising data in a distributed data reservoir
  • Data ingestion zones, data exploration zones, data archive zones, trusted refined data zones
  • New requirements for managing data in a distributed data environment
  • Collaboration,
  • Hadoop as a staging area for enterprise data cleansing and integration
  • Beyond structured data - from business glossary to information catalog
  • Information catalog technologies e.g. Waterline Data, Alation, Informatica ‘Project Sonoma’ Live Data Map, IBM Information Governance Catalog
  • The power of a graph database for storing metadata – dynamic tracking of data and data relationships in real-time
  • The semantic web inside the enterprise – dynamic taxonomies of data in a distributed data reservoir

The Data Refinery Process
This session looks at the process of discovering where your data is and how to refine it to get it under control.

  • Implementing systematic disparate data and data relationship discovery
  • Data discovery tools, e.g. Global IDs, IBM InfoSphere Discovery Server, Informatica, Silwood, SAS
  • Automated data mapping
  • Data quality profiling
  • Automated profiling using analytics in data wrangling tools
  • Best practice data quality metrics
  • Key approaches to data integration – data virtualisation, data consolidation and data synchronisation
  • Generating data cleansing and integration services using common metadata
  • Taming the distributed data landscape using enterprise data cleansing and integration
  • Executing data refinery jobs in a distributed data reservoir
  • Introducing publish and subscribe and enterprise data as a service
  • Publishing data and data integration jobs to the information catalog
  • Data provisioning – provisioning consistent information into data warehouses, MDM systems, NoSQL DBMSs and transaction systems
  • Achieving consistent data provisioning through re-usable data services
  • Provisioning consistent refined data using data virtualisation and on-demand information services
  • Smart provisioning and governance using rules-based data services
  • Consistent data management across cloud and on-premise systems
  • Data Entry – implementing an enterprise data quality firewall
    • Data quality at the keyboard
    • Data quality on inbound and outbound messaging
    • Integrating data quality with data warehousing & MDM
    • On-demand and event driven Data Quality Services
  • Monitoring data quality using dashboards
  • Managing data quality on the cloud

Refining Big Data & Data for Data Warehouses
This session looks at how the data refining processes can be applied to managing, governing and provisioning data in a Big Data analytical ecosystem and in traditional data warehouses. How do you deal with very large data volumes and different varieties of data? How does loading data into Hadoop differ from loading data into a data warehouse? What about NoSQL databases? How should low-latency data be handled? Topics that will be covered include:

  • Types of Big Data
  • Connecting to Big Data sources, e.g. web logs, clickstream, sensor data, unstructured and semi-structured content
  • The role of information management in an extended analytical environment
  • Supplying consistent data to multiple analytical platforms
  • Best practices for integrating and governing multi-structured and structured Big data
  • Dealing with data quality in a Big Data environment
  • Loading Big Data – what’s different about loading Hadoop files versus NoSQL and analytical relational databases
  • Data warehouse offload – using Hadoop as a staging area and data refinery
  • Governing data in a Data Science environment
  • Joined up analytical processing from ETL to analytical workflows
  • Data Wrangling tools for Hadoop
  • Mapping discovered data of value into your DW and business vocabulary

Information Audit & Protection – The Forgotten Side of Data Governance
Over recent years we have seen many major brands suffer embarrassing publicity due to data security breaches that have damaged their brand and reduced customer confidence. With data now highly distributed and so many technologies in place that offer audit and security, many organisations end up with a piecemeal approach to information audit and protection. Policies are everywhere, with no single view of the policies associated with securing data across the enterprise. The number of administrators involved is often difficult to determine, and regulatory compliance now demands that data is protected and that organisations can prove this to their auditors. So how are organisations dealing with this problem? Are data privacy policies enforced everywhere? How is data access security co-ordinated across portals, processes, applications and data? Is anyone auditing privileged user activity? This session defines the problem, looks at the requirements for Enterprise Data Audit and Protection, and then looks at what technologies are available to help you integrate this into your EIM strategy.

  • What is Data Audit and Security and what is involved in managing it?
  • Status check - Where are we in data audit, access security and protection today?
  • What are the requirements for enterprise data audit, access security and protection?
  • What needs to be considered when dealing with the data audit and security challenge?
  • What about privileged users?
  • Securing and protecting Big data
  • What technologies are available to tackle this problem? – IBM Optim and InfoSphere Guardium, Imperva, EMC RSA, Cloudera, Apache Knox, Hortonworks Ranger
  • How do they integrate with Data Governance programs?
  • How to get started in securing, auditing and protecting your data


This seminar is intended for:

  • Chief Data Officers
  • Data Architects
  • Master Data Management Professionals
  • Big Data Professionals
  • Data Integration Developers
  • Business Data Analysts doing self-service data integration
  • Content Management Professionals
  • Database Administrators
  • Compliance Managers who are responsible for data management (including metadata management, data integration, data quality, master data management and enterprise content management)

This course assumes that you have an understanding of basic data management principles as well as a high level of understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.

Speaker Biography

Mike Ferguson

Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specialises in business intelligence and enterprise business integration. With over 34 years of IT experience, Mike has consulted for dozens of companies on business intelligence strategy, technology selection, enterprise architecture, and data management. He has spoken at events all over the world and written numerous articles. Mike is a resident analyst at the Big Data London Meetup – the largest Big Data meet-up in Europe, where he provides presentations, articles, blogs and insights on the industry. Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of Database Associates. He teaches popular master classes in Operational Business Intelligence, New Technologies in DW and BI for the Agile Enterprise, Big Data Multi-Platform Analytics, Master Data Management and Enterprise Data Governance.

Seminar Fee
£1,245 + VAT (£249) = £1,494

Group Booking Discount

  • 2-3 Delegates - 10%
  • 4-5 Delegates - 20%
  • 6+ Delegates - 25%

Multiple Seminar Booking Discount

Attend more than one of our seminars and you will be entitled to the following discounts:

  • 2nd course 10%
  • 3rd course 15%
  • 4th course 20%
  • 5th+ course 25%

Please note, only one discount can be applied at any one time.

30-31 March 2017
VENUE: etc.venues Marble Arch  
Garfield House,
86 Edgware Rd,
London W2 2EA
Phone: +44 (0) 20 7793 4200

London Accommodation: IRM UK in association with JP Events Ltd has arranged special discounted rates at all venues and at other hotels near the venue. Please visit the JP Events website for further information.

Email: jane@jpetem.com Tel +44 (0)84 5680 1138 Fax +44 (0)84 5680 1139.

In-House Training
If you require a quote for running an in-house course, please contact us with the following details:

  • Subject matter and/or speaker required
  • Estimated number of delegates
  • Location (town, country)
  • Number of days required (if different from the public course)
  • Preferred date

Please contact:
Jeanette Hall
E-mail: jeanette.hall@irmuk.co.uk
Telephone: +44 (0)20 8866 8366
Fax: +44 (0) 2036 277202

Speaker: Mike Ferguson

Enterprise Data Management Series
Ten Steps to Data Quality
Incorporating Big Data, Hadoop and NoSQL in BI Systems and Data Warehouses
Managing Your Information Asset
Predictive & Advanced Analytics
Building an Enterprise Data Lake & Data Refinery for Enterprise Data as a Service
Business-Oriented Data Modelling
Advanced Data Modelling: Communication, Consistency, and Complexity
The Logical Data Warehouse - Design, Architecture, and Technology
Information Management Fundamentals
Data Modelling Fundamentals
Data Modelling Masterclass

IRM UK Conferences

Innovation, Business Change, and Technology Forum Europe 2017
21-22 March 2017, London

2 co-located conferences
Data Governance Conference Europe 2017
MDM Summit Europe 2017
15-18 May 2017, London

Business Analysis Conference Europe 2017
25-27 September 2017, London

2 co-located conferences
Enterprise Architecture Conference Europe 2017
BPM Conference Europe 2017
16-19 October 2017, London

2 co-located conferences
Business Intelligence & Analytics Conference Europe 2017
Enterprise Data Conference Europe 2017
20-23 November 2017, London
