Great article. I work in the higher education industry, where we buy applications for our needs, and most of the time these applications are delivered with a very complex data model. That being said, the applications currently being written are service-based, and we don’t see a data model for these services. Applications hosted in the cloud create additional challenges. This creates a lot of headaches for the data teams responsible for data warehouses and operational data stores. All these applications that are bought and developed are creating data silos that the leaders are not aware of. If we can fix the issues at the top of the application chain, downstream applications like data warehouses and operational data stores might not have to go through these headaches.
You aren't alone, Ravindra. The biggest problem with SaaS products and how companies use/buy them is that they don't integrate well with one another because they aren't attached to a common data model/structure. I've been through a lot of technology application implementations, and unless an implementation is accompanied by a consulting firm or a strong internal implementation team, the short-term gain you expect from the application falls through after 3-6 months and it becomes a burden (or isn't used at all). I'm specifically thinking about this topic for a Data Technology/SaaS Strategy article that I plan to put out in August 2024.
Safe to say, I think this issue will continue to be prevalent, and companies need to understand that when they buy SaaS applications, they need to push vendors to show their work when it comes to systems integration and architecture.
Great article! I am looking forward to keeping up with these posts. What I have seen recently from clients is that they expect to throw the new shiny tool at the problem and their data quality woes will go away. However, as you have aptly pointed out in this article, data quality issues, failed data initiatives, etc. are symptoms of _many_ problems. From my experience, organizational leaders, for whatever reason, don't seem to grasp the importance of stepping back and evaluating the problem.
Thanks, Matt! I completely agree with you, and have seen the same thing from clients.
My hypothesis is that given the speed at which data teams are expected to work and deliver, combined with the lack of knowledge that executives/org leaders have of the data capability, you get a culture of fail fast and throw something else at it. Long-term investments in hard, complex things are not made in data, hence everything you mention.
Hello Dylan.
As others have noted here (and also those 'not' actually stating so), good article.
While you are the expert on this topic and I am not - I am an expert in strategy/tactics and AWS Cloud Architect/Engineering (and only recently obtained my GCP Pro & AWS Data Engineer certs) - I just have one li'l item I disagree with, a tad bit.
Data Quality... I know that no one can obtain 90%, 95% or better Data Quality coming in the door, so to speak, but working on getting higher/better Data Quality coming in the door 'can' be an input.
Correct?
To say that good Data Quality is only an output is a bit much. Of course that is just me.
But after seeing good and bad data quality as input, from the mainframe days of yore (talking about Hollerith punch cards here) to data warehouses (Redshift and BigQuery) today, I would dispute the statement that Data Quality is only an output.
Having 'some' good data quality as input should be a goal that everyone (those who care, that is) wants and works on improving, polishing it up even better for output.
I learned this lesson from having learned to read, write and speak 5 foreign languages along with multiple computer languages:
mainframe assembler (which I still love but do not use today), to COBOL, to Fortran, Pascal, C++ and now SQL, Python and a small bit of R.
I am one of the ilk who tries to pay attention to good data input as well as much better data output...
All in all, I agree with you: Data Quality is a significant part of the whole pie that needs to be seriously looked at - with others and not in silos, as @ravindra refers to.
I can tell ya stories about data silos in the Intelligence Community as well - big time. Just cannot give any details... ;-)