Data Quality Consulting

Before a proposal for data cleansing and de-duplication is made, an in-depth study and/or pilot may be required to evaluate the inherent data-quality issues, the source and destination systems (deployed or to be deployed), and the prospective client’s objectives and requirements. The study gives both the prospect and Ixsight’s consultants a clear understanding of the current data quality. This enables a more accurate assessment of the effort and time required to cleanse (or convert) the data, of the resources and expertise needed and how to allocate them, and helps clear up process-related issues.
The parameters and rules to be used in the next stage must be mutually agreed upon and documented before the cleansing tools are deployed.

DataFix provides Data Quality Audit Tools and Consultancy services covering the following:

1. Consultancy Overview

The Data Quality Requirement Assessment (DQRA) gives an overview of how well a database complies with predefined requirements, data integrity requirements, and business logic. Based on the data quality audit, one can cleanse the data, eliminate the causes of typical database errors, and take data quality problems into account in decision-making. The process enables a quick investigation into the health of the client’s data and an evaluation of how well it complies with the client’s business rules.

This process enables:

(A) Identification of Data Quality Errors

  • Establishing base-line data quality metrics.
  • Performing in-depth data quality analysis of source systems – with particular emphasis on Name and Address fields.
  • Creating a comprehensive data quality assessment document.

(B) Creation of Valuable Metadata

  • Creating valuable metadata.
  • Providing complete data cleansing specifications.

(C) Drawing up a Roadmap for Cleansing

  • Creating a roadmap for processes to ensure that clean and reliable data populates data stores and data warehouses.

The submitted report would include the results of the audit tests, the identified causes of errors, suggestions for cleansing the data or minimizing the errors’ influence on decision-making, a description of the databases, the relationships between database tables, and a diagram of the database’s logical architecture.
The Data Quality Audit would provide a good overview of the present state of the client’s information systems, leading to suggestions for developing those systems so that data quality stays high in the future. Findings about contradictions in the client’s raw data will also identify the areas in which data cleansing is needed. These findings could provide estimates of current error levels, or of the error levels targeted for reduction through the cleansing processes adopted in the project.
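Error-level estimates like these usually come from auditing a sample of records rather than the whole database. The following is a minimal Python sketch, assuming a simple random sample. It estimates the error rate and attaches a 95% Wilson score interval as one way of stating how certain that estimate is. The function name and the choice of the Wilson interval are illustrative assumptions, not something the consultancy process prescribes.

```python
import math
from typing import Tuple


def error_rate_estimate(errors: int, sampled: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) for a sampled error rate.

    Uses the Wilson score interval; z = 1.96 gives roughly 95% confidence.
    Assumes the audited records are a simple random sample of the database.
    """
    if sampled <= 0 or not 0 <= errors <= sampled:
        raise ValueError("need sampled > 0 and 0 <= errors <= sampled")
    p = errors / sampled
    z2 = z * z
    denom = 1 + z2 / sampled
    center = (p + z2 / (2 * sampled)) / denom
    margin = z * math.sqrt(p * (1 - p) / sampled + z2 / (4 * sampled * sampled)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return p, max(0.0, center - margin), min(1.0, center + margin)
```

For example, 10 wrongly entered cities in an audited sample of 100 records gives a point estimate of 0.10 and an interval of roughly 0.055 to 0.174. The same function can be rerun after cleansing to check whether the targeted reduction was reached.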

The consultancy assignment would go through the following five stages:

  1. Accuracy, certainty (validation results)

    This should provide one or more measures of the accuracy of each data item, i.e., its correspondence to the real-world entity or phenomenon that it is meant to represent, as evaluated by some objective validation effort. This should also include some measure of the certainty of these accuracy measures. There may be multiple measurements of the accuracy of a given data item, corresponding to different validation techniques or distinct validation efforts; each such measurement should be explicitly linked to the particular validation process (whether automated or manual) that produced it.

    For example, the City/Area field may indeed contain a city, in which case the information is accurate. But how many cities have been entered incorrectly? Answering this measures the validity of the data against real-world information. Accuracy and validity measures therefore need to be taken for the various elements within the database.
    In a Name field, how many different types of information are present apart from the name itself? Are there specific patterns that would allow a new business rule to be created to extract such information into a separate field?

  2. Consistency (verification results)

    This measures the consistency of each data item with any constraints or other relationships specified for it at the data-element or database levels, as evaluated by some verification effort. There may be multiple measurements of the consistency of a given data item, corresponding to different verification techniques or distinct verification efforts; each such measurement should be explicitly linked to the particular verification process (whether automated or manual) that produced it.

    In the case of names and addresses, several consistency checks can be applied. For example, if the name is a male name, does the Title field show “Ms”? If the product is Gruhalakshmi, does the Title field indicate a male? Are there Mumbai records with pincodes outside the applicable range? 400305 is in the correct format but is not within the range of Mumbai pincodes.

  3. Currency (outdated information, expiration dates, “degradation modes”, etc.)

    This provides information about the currency of instance data. There are two aspects of currency.

    • Information about when a data value was created and how long it can be expected to remain valid. In general, it is not possible to tell how current a data value is simply by knowing how long ago it was created; some old values may remain valid indefinitely, whereas some recent values may become obsolete very quickly. Currency metadata should therefore include timestamps for when a data item was originally generated (if known), when it was first added to this database, and when it was last updated or validated. It must also include information about when a data item can be expected to become obsolete (i.e., an “expiration date” or “last valid date”) or when it is expected to be superseded by newer data.
    • Finally, currency metadata for a data item should include information as to the “mode” in which the data item is expected to degrade over time: some values become continuously less accurate or less meaningful as they age, whereas others remain entirely valid until they “expire” (i.e., when some event changes the reality which they represent).

  4. Completeness

    This measures the missing values in the data. Incomplete addresses, missing names, or missing dates of birth make the data unusable for any serious decision-making process. Such data cannot be corrected without physical verification, either against the source records or with the individuals concerned.

  5. Duplication

    This measures whether there are duplicate records in the system and whether unique identifiers are absent, and identifies replication within and across systems.
    For example, a policy may have several holders, and its records may be replicated, with the policy information repeated for each holder.
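For the Accuracy stage, the following is a minimal Python sketch of the two questions raised there. The first question is how many City values fail to match a reference list of real cities. The second is how often Name fields carry content other than the name. The reference list, the `name` and `city` keys, and the `NAME_PATTERNS` (relationship markers such as S/O, company suffixes, digits) are all hypothetical. A real audit would use the client's schema and an authoritative city master. Each accuracy measure records the validation method that produced it, as the Accuracy stage requires.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping


@dataclass
class AccuracyMeasure:
    field: str
    valid: int
    checked: int
    method: str  # the validation process that produced this measure

    @property
    def accuracy(self) -> float:
        return self.valid / self.checked if self.checked else 0.0


def city_accuracy(records: List[Mapping[str, str]], reference_cities: Iterable[str],
                  method: str = "reference-list lookup") -> AccuracyMeasure:
    """Share of non-empty City values that match a reference list of cities."""
    reference = {c.strip().upper() for c in reference_cities}
    values = [r.get("city", "").strip().upper() for r in records]
    values = [v for v in values if v]  # blanks belong to completeness, not accuracy
    valid = sum(1 for v in values if v in reference)
    return AccuracyMeasure("city", valid, len(values), method)


# Hypothetical patterns for non-name content found inside a Name field.
NAME_PATTERNS = {
    "relation": re.compile(r"\b(?:C/O|S/O|D/O|W/O)\b", re.IGNORECASE),
    "company": re.compile(r"\b(?:LTD|PVT|LIMITED)\b", re.IGNORECASE),
    "digits": re.compile(r"\d"),
}


def name_pattern_counts(records: List[Mapping[str, str]]) -> Dict[str, int]:
    """Count how many Name values match each pattern."""
    counts = {key: 0 for key in NAME_PATTERNS}
    for r in records:
        name = r.get("name", "")
        for key, pattern in NAME_PATTERNS.items():
            if pattern.search(name):
                counts[key] += 1
    return counts
```

A high count for one pattern suggests where a new business rule that splits that content into its own field might pay off.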
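For the Consistency stage, the checks listed there can be expressed as simple rules. This sketch makes several assumptions. Records are dictionaries with `name`, `title`, `product`, `city` and `pincode` keys. Gruhalakshmi is treated as a product meant for women, which the text implies but does not state. The tiny hard-coded lookup tables are placeholders only. The Mumbai range 400001 to 400104 is also an illustrative assumption; in practice, valid pincodes per city would come from authoritative postal data.

```python
import re
from typing import Dict, List, Tuple

# Indian pincodes are six digits and do not start with 0.
PINCODE_FORMAT = re.compile(r"[1-9]\d{5}")

# Placeholder reference data for illustration only (hypothetical values).
MALE_FIRST_NAMES = {"RAJESH", "SURESH", "ANIL"}
FEMALE_TITLES = {"MS", "MRS", "MISS"}
WOMEN_ONLY_PRODUCTS = {"GRUHALAKSHMI"}  # assumption: product is meant for women
CITY_PINCODE_RANGES: Dict[str, Tuple[int, int]] = {"MUMBAI": (400001, 400104)}  # illustrative


def consistency_violations(record: Dict[str, str]) -> List[str]:
    """Return the names of the consistency rules this record breaks."""
    violations = []
    name_parts = record.get("name", "").split()
    first = name_parts[0].upper() if name_parts else ""
    title = record.get("title", "").strip().rstrip(".").upper()

    # Rule 1: a male first name should not carry a female title.
    if first in MALE_FIRST_NAMES and title in FEMALE_TITLES:
        violations.append("male name with female title")

    # Rule 2: a women-only product should not be held under a male title.
    if record.get("product", "").strip().upper() in WOMEN_ONLY_PRODUCTS and title == "MR":
        violations.append("women-only product with male title")

    # Rule 3: pincode must be well formed, then fall inside the city's range.
    pincode = record.get("pincode", "").strip()
    city = record.get("city", "").strip().upper()
    if not PINCODE_FORMAT.fullmatch(pincode):
        violations.append("pincode format")
    elif city in CITY_PINCODE_RANGES:
        low, high = CITY_PINCODE_RANGES[city]
        if not low <= int(pincode) <= high:
            violations.append("pincode outside city range")
    return violations
```

Checking the format before the range mirrors the 400305 example: the value passes the format rule and fails only the range rule.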
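For the Currency stage, the metadata described there maps naturally onto a small record type: creation and validation dates, an optional expiration date, and a degradation mode. The sketch below is one possible shape. The field names are hypothetical. Modelling continuous degradation as exponential decay with a configurable half-life is an illustrative choice, not something the text specifies.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class DegradationMode(Enum):
    CONTINUOUS = "continuous"  # value becomes gradually less reliable as it ages
    EXPIRING = "expiring"      # value stays fully valid until an event makes it obsolete


@dataclass
class CurrencyMetadata:
    added_on: date                    # when first added to this database
    last_validated_on: date           # when last updated or validated
    mode: DegradationMode
    generated_on: Optional[date] = None  # original generation date, if known
    expires_on: Optional[date] = None    # "expiration date" / "last valid date"
    half_life_days: float = 365.0        # hypothetical decay rate for CONTINUOUS values

    def confidence(self, today: date) -> float:
        """Rough 0..1 estimate that the value is still current on `today`."""
        if self.expires_on is not None and today > self.expires_on:
            return 0.0
        if self.mode is DegradationMode.EXPIRING:
            return 1.0
        # Exponential decay measured from the last validation, not from creation.
        age_days = (today - self.last_validated_on).days
        return 0.5 ** (max(age_days, 0) / self.half_life_days)
```

Decay is measured from the last validation rather than from creation. This follows the point above that age since creation alone does not tell how current a value is.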
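For the Completeness stage, a baseline metric is the share of records with a usable value in each field. This sketch assumes records are dictionaries. It treats blanks, `None`, and a hypothetical set of placeholder tokens such as "NA" or "-" as missing. Like the other cleansing parameters, the placeholder list would need to be agreed with the client.

```python
from typing import Dict, Iterable, List, Mapping, Optional

# Hypothetical placeholder values counted as missing in addition to blanks.
PLACEHOLDERS = {"NA", "N/A", "NULL", "-", "UNKNOWN"}


def is_missing(value: Optional[object]) -> bool:
    """True for None, empty or whitespace-only text, and placeholder tokens."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or text.upper() in PLACEHOLDERS


def completeness(records: List[Mapping[str, object]], fields: Iterable[str]) -> Dict[str, float]:
    """Fraction of records holding a usable value for each field (absent keys count as missing)."""
    total = len(records)
    report = {}
    for field_name in fields:
        present = sum(1 for r in records if not is_missing(r.get(field_name)))
        report[field_name] = present / total if total else 0.0
    return report
```

These per-field ratios can serve as the base-line data quality metrics. Fields with low completeness point to records that need verification against source documents or with the individuals concerned.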
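For the Duplication stage, the following is a minimal sketch assuming dictionary records with hypothetical `name`, `address`, `pincode` and `policy_no` keys. It groups records whose normalized name, address and pincode are identical. It also counts policy numbers that appear in more than one record, as with the multi-holder policies described above. Exact matching on a normalized key is a deliberate simplification: it misses variants such as misspellings, which production de-duplication usually handles with fuzzy or probabilistic matching.

```python
import re
from collections import defaultdict
from typing import Dict, List, Mapping


def match_key(record: Mapping[str, str]) -> str:
    """Crude normalized key: uppercase, punctuation replaced by spaces, whitespace collapsed."""
    parts = [record.get(f, "") for f in ("name", "address", "pincode")]
    text = " ".join(parts).upper()
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    return " ".join(text.split())


def duplicate_groups(records: List[Mapping[str, str]]) -> List[List[int]]:
    """Indices of records that share the same normalized key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, r in enumerate(records):
        groups[match_key(r)].append(i)
    return [indices for indices in groups.values() if len(indices) > 1]


def replicated_policies(records: List[Mapping[str, str]]) -> Dict[str, int]:
    """Policy numbers that occur in more than one record, with their counts."""
    counts: Dict[str, int] = defaultdict(int)
    for r in records:
        if r.get("policy_no"):
            counts[r["policy_no"]] += 1
    return {p: n for p, n in counts.items() if n > 1}
```

A replicated policy is not necessarily an error, since each holder may legitimately need a record. The count shows where the policy details could instead be stored once and linked to each holder.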


 

 
 