Tuesday, September 09, 2014

5 Agile Leadership Practices Where CIOs Can Help Data Scientists

I've compiled a number of posts on data landfills and other bad data practices and have made a commitment, at least on Social, Agile, and Transformation to begin providing solutions.

I've always felt that other disciplines would benefit from well-established technology practices. Agile practices have enabled software development teams to find sponsors, prioritize work, change the culture, ensure work gets done, and market their accomplishments. I think data scientists face similar challenges and could benefit from many of the agile practices that have helped transform software development organizations.

These practices need adapting because, while software development is often a collaborative practice performed by teams, that isn't always the case in data analytics work. Data scientists may not be in the same organization (team) and are often working individually or in pairs on different analytics. So while many agile practices are relevant to data science work, they have to be adapted to the nature of how this work gets done.

Also, in this post I've started with the leadership practices and might cover management practices in a follow up post. Key agile leadership practices are below:

  • Sponsor work - Data scientists, data geeks, quants, data analysts, BI specialists - they go by different names in different organizations, but business leaders don't always know how best to engage their capabilities or services. The CIO can lead the way by sponsoring analytics projects or drawing attention to a team or individual's capabilities. The CIO also has access to the organization and can help network departments that have high value data analytics work and are ready to partner with or hire data scientists. Sponsoring the work begins to establish an "Owner" role, similar to an agile product owner role, that can define a vision, articulate business value, and prioritize work.

  • Address the culture - Becoming a data driven organization is not just about having data scientists; it requires a top-down and bottom-up commitment to leverage data in decision making. This is often a culture change that requires leaders to educate the organization and find ways to align on simple practices. One of them is educating the organization to ask questions. Agile is not just a process - it's a culture change that requires teams and organizations to think agile.

  • Establish practices for prioritization - Prioritization is a key practice for agile technology teams, which have to align their efforts on a product release or development sprint to the features and fixes that provide the highest business value. Data scientists face the same challenge in determining which questions to answer or which analytics to prioritize. Leveraging agile practices and tools to make the data scientists' workload transparent, and establishing practices to prioritize that work, is a good place for CIOs to add value.

  • Review results and ask questions - Agile development teams demo their work after the sprint and answer questions from sponsors. Data scientists would benefit from a similar practice: scheduling analytics reviews where they can showcase a visualization, tell a story, and suggest follow-up work. CIOs can help by promoting these sessions, attending, participating, and asking good questions.

  • Get out of the way - Agile has its self-organizing principles, giving teams some authority over how they organize work to get things done. Data scientists also need a little freedom to be who they are - scientists. Sometimes that means blazing a trail in new areas - new technologies or new data sources, for example - to determine whether they are useful for getting a job done. Sometimes it means creating workarounds, or taking on "data processing debt" (more on this in another post, but it is the data analogy to technical debt) in order to get a job done on time.
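The prioritization practice above can be made concrete with a simple, transparent backlog model. This is only a sketch: the story fields and the value-over-effort scoring formula are illustrative assumptions, not a prescribed agile standard.

```python
# A minimal sketch of a transparent analytics backlog. Each story has a
# sponsoring "Owner" who sets business value, and the data scientist sizes
# the effort; a simple value/effort ratio surfaces what to work on next.
from dataclasses import dataclass

@dataclass
class AnalyticsStory:
    title: str
    sponsor: str         # the "Owner" role sponsoring the work
    business_value: int  # 1 (low) to 5 (high), set by the owner
    effort: int          # 1 (small) to 5 (large), sized by the data scientist

    @property
    def priority(self) -> float:
        # Higher value and lower effort float to the top of the backlog.
        return self.business_value / self.effort

backlog = [
    AnalyticsStory("Churn drivers by segment", "CMO", business_value=5, effort=3),
    AnalyticsStory("Ad-hoc sales region pull", "Sales Ops", business_value=2, effort=1),
    AnalyticsStory("Pricing elasticity model", "CFO", business_value=4, effort=5),
]

for story in sorted(backlog, key=lambda s: s.priority, reverse=True):
    print(f"{story.priority:.2f}  {story.title} (sponsor: {story.sponsor})")
```

The point is less the formula than the transparency: every story has a named sponsor and an agreed value, so the prioritization conversation has something to anchor on.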
While I don't think these are new concepts, I haven't heard too many data scientists and their managers describe their culture or work with these practices. Similarly, while CIOs are more often consumed by Big Data platforms, I rarely hear them talk about aiding data scientists with basic practices. Common ground?

Friday, September 05, 2014

Killing Bad Data Practices - Acknowledging The Problem is Half The Battle

I posted The Big Data Challenges All Organizations Face on LinkedIn earlier this week, summing up some symptoms of and solutions to siloed databases and ungoverned data practices. If you work with data, manage the databases, or rely on data to make decisions, then you probably understand bad data issues and can relate to some of the symptoms:

  • Do you email spreadsheets between coworkers to edit and review?
  • Are there only a select few people in the organization capable of pulling or interpreting data out of core systems because of data quality issues?
  • Do presentations provide insights backed by data sources and assumptions?
  • Does it seem like you have hundreds of reports and dozens of dashboards, but none of them suit your day-to-day needs?
  • Do you start a new analysis by cutting a new data set, or are you able to leverage defined tools to connect to predefined data repositories?
My solutions involve better governed self service BI programs, partnering with departments that have critical data needs like the CMO and marketing, and getting alignment on the data platforms needed for growth.
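The "predefined data repositories" idea can be sketched in a few lines: analysts query one governed store through a shared, reviewable helper instead of emailing spreadsheet extracts around. This is only an illustration; sqlite3 stands in for whatever governed database an organization actually runs, and the table and column names are hypothetical.

```python
# A minimal sketch of governed self-service data access. One shared query
# helper against a central store replaces N private spreadsheet cuts.
import sqlite3

def open_governed_store() -> sqlite3.Connection:
    # In practice this would point at a managed warehouse, not an
    # in-memory database; the sample data here is purely illustrative.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("East", 120.0), ("East", 80.0), ("West", 50.0)])
    return conn

def sales_by_region(conn: sqlite3.Connection) -> dict:
    # One shared, reviewable query: everyone gets the same numbers,
    # and the logic lives in code that can be audited and reused.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
    return dict(rows)

conn = open_governed_store()
print(sales_by_region(conn))
```

The design point: when the query lives in one governed place, the analysis is reproducible and auditable, which is exactly what an emailed spreadsheet is not.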

Need some examples? I'm sure you can cite a few yourself, but these Excel horror stories collected by the European Spreadsheet Risks Interest Group should frighten any data scientist, data architect, or data driven business executive.

My next posts on this subject will be solutions focused. Remember, acknowledging there is a problem and recognizing its impact is half the battle.





Tuesday, August 19, 2014

Why is Data Sooo Messy and How to Avoid Data Landfills

I was surprised this morning to see an article about the "janitorial" work data scientists have to perform to find "nuggets" in big data. Actually, my only surprise is that the story is in the NY Times and that they are covering the least glamorous side of the "sexiest" job.


Why is data so messy?


Let's start with the past. The history of data science starts with complicated data warehouses, expensive BI tools, hundreds if not thousands of ETLs moving data all over the place, and bloated expectations. At the other extreme, many organizations have siloed databases, DBAs largely skilled at keeping the lights on (future post?), and spreadsheet jockeys performing analytics. The janitorial work data scientists are performing exists partially because of the mess of databases and derivative data sources previous generations left behind.

And I'm not sure this generation will do any better. As I reported just a couple of months ago, with great power comes even greater responsibility. All the technologies and tools data scientists have at their fingertips also have the power to create a new set of data stashes - informal places where data is aggregated - or buried data mines - places where analytics are performed, but not automated or made transparent to future scientists.

If data scientists, DBAs, and CIOs are not careful, the data stashes and buried data mines can slowly transform into full-blown data landfills.

DBAs know what I'm talking about. It's a combination of data warehouses, reports, dashboards, and ETLs that no one wants to touch. No one understands who is using which reports or dashboards, in what business process, for what purpose or benefit. ETLs look like a maze of buried, unlabeled pipes developed using a myriad of materials (programming approaches) and with no standards to help future workers separate the plumbing from the filters and valves.

Build Foundations, Not Landfills!


Data scientists and their partners (data stewards, DBAs, business analysts, developers, and testers) need to instill some discipline - dare I say data governance - and balance their time mining for nuggets with practices that establish data and analytics foundations. More on that in an upcoming post. Remember, big data is a journey.
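One small foundation-building discipline can be sketched in code: every ETL step records lineage metadata (source, target, owner, purpose) so future workers can tell the plumbing from the filters and valves. This is only an illustration under assumptions; the metadata fields, names, and sample data are hypothetical, not a formal standard.

```python
# A minimal sketch of an ETL step that emits lineage metadata alongside
# its output, so the pipeline is labeled rather than buried.
import json
import datetime

def run_etl_step(name, source, target, owner, purpose, transform, rows):
    result = [transform(r) for r in rows]
    lineage = {
        "step": name,
        "source": source,
        "target": target,
        "owner": owner,
        "purpose": purpose,
        "row_count": len(result),
        "ran_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # In practice this record would go to a metadata catalog;
    # printing it stands in for that here.
    print(json.dumps(lineage, indent=2))
    return result

cleaned = run_etl_step(
    name="normalize_regions",
    source="crm.accounts",
    target="warehouse.dim_region",
    owner="data-eng",
    purpose="Consistent region codes for BI dashboards",
    transform=lambda r: {**r, "region": r["region"].strip().upper()},
    rows=[{"id": 1, "region": " east "}, {"id": 2, "region": "West"}],
)
```

A few lines of metadata per step is cheap insurance: it answers "who owns this, where does it come from, and why does it exist" before the pipeline becomes part of the landfill.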

Until then, here are a few things one can learn about data science from a fourth grade class. Think twice before creating another data source!

Wednesday, August 06, 2014

Three Advanced Practices for Agile Development Organizations

Some time ago I saw a question posted on a social media site: "Why is Agile hard to adopt?" My initial gut response was surprise. The basic practices of agile are relatively straightforward, so what's hard? Driving an agile culture, or getting an organization to "operate with agility" - yes, that's not trivial, and it takes time for teams and individuals to "think agile". My conclusion is that agile is not hard to adopt, but maturing a disciplined agile practice that leads to an agile culture, operating with agility, and a truly agile organization takes significant discipline and practice maturity.

Let me illustrate this with a few examples. Below are three key practice areas that agile organizations need to mature in order to operate with agility.
  1. Getting stories and requirements written and reviewed on time - Teams new to agile will often cite changing stories mid-sprint, or committing to ill-defined stories, as a key barrier to getting stories done and accepted at the end of a sprint. This is particularly detrimental if an organization has multiple agile teams working collaboratively, or has distributed agile teams with members in different locations and time zones. The simple and obvious answer is to get stories "locked" at the beginning of the sprint, but this is easier said than done. It implies the team has an agile planning practice with a cadence aimed at finishing the writing of stories, defining acceptance criteria, ensuring the product owner accepts each story, and providing sufficient time for the team to review, ask questions, and size the story. If you're writing stories up to the last possible day before commitment, or worse, committing to placeholder stories that get better defined during the sprint, then shifting to a more disciplined agile planning practice takes work.

  2. Ensuring QA is part of the team - Agile teams commit together and get things done together, with quality best defined by acceptance criteria and organizational standards. Teams saying things like, "The story is done, we just need QA to review it after the sprint," are missing a key discipline in agile software development - quality is the criterion for defining done. It implies that QA members are part of the team reviewing stories, asking questions, and sizing them. Why? Because some stories have more significant testing implications than development tasks. It also means that teams have disciplined practices to ensure QA members can do their jobs during the sprint, such as developing unit tests, finishing stories early in the sprint, ensuring frequent code check-ins and pushes to QA environments, and investing in QA automation.

  3. Partnering with the Product Owner - Partnering may be an elusive, difficult-to-achieve relationship depending on the organizational structure, the pressures the product owner feels on delivery timelines and scope, individual perceptions of business value, and whether there is a shared understanding of what is most important for the business. Partnering implies balance: for example, that technology teams can respond to questions on why a story is expensive, or why addressing an identified area of technical debt is important, and important now. It also implies that the product owner invests time to express her vision, that she can respond to questions on why a particular feature is important and prioritized, that there is an open and healthy dialogue to explore multiple solutions to a prioritized feature so that tradeoffs are considered, and that reasonable priority is applied to technical and support needs. Does the product owner think in terms of markets, or need help targeting a minimal viable product? If not, agile technology leaders have some teaching and mentoring to consider.
Hope this helps!


Tuesday, July 22, 2014

The Best Line of Code is the One You Didn't Have to Write!

This post isn't about code reuse or developing web services. Surely, you and your development organization understand the benefits of developing modular code, packaging it in libraries, developing APIs and web services, ensuring that test cases are automated, and hopefully starting to enable continuous delivery. Hopefully you have the architectural practices that recognize build once, leverage many times as critical to a software development organization's success in scaling to support multiple applications at higher levels of quality.

This is also not a post about code readability. Again, hopefully your organization has some basic methodologies - naming conventions, coding standards, code reviews, tools, and metrics - to ensure that code developed by one developer can be understood and maintained by other developers and teams.

How PaaS can Accelerate IT at Lower Investments


This post is about PaaS platforms. Platform as a Service is a cloud computing service layer: whereas Infrastructure as a Service gives IT low-level infrastructure such as a Windows or Linux environment, PaaS platforms represent a higher-level computing container. Virtually all the major software vendors and cloud providers have PaaS offerings, and Gartner has an extensive taxonomy of the different service types.

The PaaS services that excite me as a CIO enable lightweight, ideally code-less applications. These are higher-level environments, above programmable database, application, or integration PaaS. The higher-level PaaS platforms are often suited to specific types of applications such as workflow, analytics, data integration, or document management, and provide tools to configure, develop business rules, or customize user interfaces with minimal or no coding. The most advanced PaaS platforms enable a wide array of applications and require minimal technical skills. In fact, some of them fit the "self service" category and can be used by business users with proper training and IT-provided governance practices. Mature platforms also include capabilities such as user directory integration, APIs, mobile and tablet views, and standard methods to automate data integration. The most promising PaaS platforms demonstrate significant customer adoption, have proven scalability and performance records, and have costs low enough to let IT teams develop products on top of them.

No Code = Speed to Market


In my experience, the best of these platforms accelerate time to market significantly because they enable teams to develop applications without a lot of software development (code) and testing. The very best ones are so lightweight, they enable teams to be experimental and change implementations with minimal cost to unwind and rebuild.

Too good to be true? It's not, and the benefits are real, but it isn't trivial to achieve. The real issue is selecting the right platforms: those that offer the most flexibility for the expected needs, with minimal functionality constraints and technology implementation complexities. You can't easily evaluate this by listening to salespeople, reviewing analyst reports, or even doing a few proofs of concept. I might have to develop a sequel to my top ten attributes of agile platforms to help identify strong contenders.

But for now, software developers should think beyond "good code" or even "great architecture" and think more about "smart solutions" that enable more capabilities with little or minimal code.



Tuesday, July 15, 2014

Breakthrough Innovation: From Senseless Sensors to the Internet of Things and Everything

It's hard to escape the media hype on the Internet of Things, the Internet of Everything, or the Industrial Internet. Whether it's valuations of the size of the industry ($7.1 trillion by 2020 according to IDC), estimates of the number of sensors (50 billion by 2020 according to Cisco), or futurists predicting the disruption and opportunities coming from IoT technologies that connect the physical and digital worlds, the media will leave you little doubt that this will be the next big, very big technology paradigm.

IoT Today


But today, what you mostly see is more devices, more innovation in application-specific sensors, and very small networks of connected devices. Examples include home automation networks, wearable fitness devices, transit optimization, smart cities addressing growing energy needs, and beacons in retail stores. The costs of sensors have dropped, and the skills to develop the devices and applications are more accessible, so entrepreneurs and corporations already in the device or sensor market can experiment and potentially break through with a market-leading product.

Some of these devices will succeed, and the scale will create new technology challenges. Will network infrastructure keep up with the added bandwidth requirements for these devices? Will new security and privacy challenges created by these devices get addressed sufficiently before vulnerabilities are exposed?

Internet Enabled "Senseless Sensors" or Greater Intelligence?


Still, I think this is the world of senseless sensors. The software in most of these devices has only local context. Wearable gadgets largely benefit a single user and the parent corporation that gets access to the aggregation of all the data collected. Home automation connects devices in a home with no broader context around the neighborhood. Beacons will enhance the experience in the store you are visiting and benefit its parent company, but provide no broader context yet.

Now imagine a shopping mall that finds an intelligent way to pool data between retail outlets, aiming to keep shoppers spending time and money in the mall for longer time periods. What happens when cars communicate to neighboring ones to help avoid collisions? What happens when health monitoring sensors can be programmed to share selected data depending on context to family members or physicians?

Now, think what happens when these same sensors also have logic to respond to their environment. Your car alerts you to slow down, or your physician adjusts your medication levels. The devices not only measure and respond to local conditions; the rules they implement also include variables of greater context.

This, to me, is the beginning of the promise of the internet of things.
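The contextual-rule idea above can be sketched in a few lines: a device rule that combines its own local reading with variables of greater context, here speed data shared by neighboring cars. The thresholds, function names, and message strings are illustrative assumptions, not any IoT standard.

```python
# A minimal sketch of local vs. contextual sensor rules. The local rule
# uses only this car's reading; the contextual rule also uses speeds
# reported by neighboring cars to warn about conditions ahead.

def speed_alert(own_speed, neighbor_speeds, limit=65.0):
    # Local-only rule: react to this car's speed alone.
    if own_speed > limit:
        return "Slow down: over the posted limit."
    # Contextual rule: neighbors moving much slower suggests trouble
    # ahead, even when the local reading looks fine.
    if neighbor_speeds:
        avg_neighbor = sum(neighbor_speeds) / len(neighbor_speeds)
        if avg_neighbor < own_speed - 20:
            return "Slow down: traffic ahead is much slower than you."
    return None

print(speed_alert(70.0, [68.0, 72.0]))  # local rule fires
print(speed_alert(60.0, [30.0, 25.0]))  # contextual rule fires
print(speed_alert(55.0, [50.0, 52.0]))  # no alert
```

The second rule is the interesting one: it cannot be written for a senseless sensor, because it depends on data the device can only get from its neighbors.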

Breakthrough Success Requires Partnership and Standards


Technology companies are now forming coalitions to pave the way to this future. There is the Open Interconnect Consortium, led by Intel, Dell, and Samsung, which is focused on sensor interoperability. Then there is the Industrial Internet Consortium, led by IBM, AT&T, Cisco, and GE, aimed at accelerating the growth of the industrial internet. In digital health there is an emerging battle between Google Fit, Apple HealthKit, and the Samsung Digital Health Initiative, all aiming to help control or share fitness data or "create a healthier world".

These coalitions and future partnerships will either accelerate IoT breakthrough capabilities or create new barriers, or both. Only time will tell whether device manufacturers and software developers will have multiple competing standards to contend with, or whether these partnerships will establish an IoT data and integration backbone.


Saturday, June 28, 2014

Friend or Foe? How Microsoft Excel 2013 Creates New Data Governance Challenges

How important is data quality to your Big Data strategy? I suspect that once you have business leaders asking good data driven questions and have the beginnings of big data tools implemented, data quality and governance will become a bigger business priority.

Microsoft has the most successful Big Data tool - it's called Excel. Most business users will jump into Excel first to do any kind of data discovery, analysis, or visualization. Along the way, they may blend in multiple data sources, create formulas, cut/paste data into multiple worksheets, format the worksheet for presentation purposes, create pivots, or save the file to a private repository. Many of these steps make auditing the analysis, or reusing the results in larger contexts, a challenge for business managers, data scientists, and technologists.

Recently, I had the opportunity to ask a Microsoft representative some questions about the Microsoft Excel 2013 and Office 365 products to see whether their new functionality helps or hinders data quality practices.

Excel 2013 - With great power comes great responsibility


Excel 2013 comes with some new functions that make data scientists more self-sufficient. Tools that were formerly part of Microsoft Access are now finding their way into Excel, including the ability to create Excel data models. Pro versions of Excel also include Power Query to easily discover and connect to data from public and corporate data sources, along with a set of tools to perform basic data merging and quality steps. Excel 2013 also folds in Power Pivot to perform analysis and visualizations, as well as some new data quality functions.

So with all these new functions, what's not to like? So much power in the hands of sophisticated data scientists who are able to do advanced analytics without IT's help.

The issue is that Excel has always made it too easy for business users to create splintered and derivative data sets. Pull data from enterprise databases, integrate other data sources, complete the analysis, and present findings - but nowhere in this workflow is there an easy way to incorporate the data, formulas, and other rules back into a central repository to be shared with others. This is a big issue for companies that are trying to leverage Big Data to transform their business or to become more data driven organizations. It is why I asked developers to Please Stop Creating Microsoft Access Databases, because the work to integrate or clean up a siloed database can be considerable.

The Unofficial Microsoft Response


So I asked the Microsoft rep for some answers: "How can businesses implement basic governance when the most popular business analytics tool makes it so easy to create derivative data sources?" Here were his answers:

  • IT can reverse engineer the Excel worksheets - I stopped him quickly, almost laughing at this answer, because it's completely unrealistic. Maybe your business users create simple spreadsheets, but if there are complex formulas, pivot tables, or copy/pasted data, then good luck. Also, most organizations have many more spreadsheet jockeys than DBAs, so it's highly unlikely that IT can keep up.
  • Spreadsheets saved to Office 365 are discoverable - That's a better answer. At least IT knows where files exist and how frequently they are used. In theory, better defined spreadsheets can get reused by others in the organization.
  • It's coming in Reporting Services - Well, I'm not sure what is coming... 

The new capabilities in Excel, if used without some discipline, pose new data quality challenges. So why isn't Microsoft doing even more in its tools to ensure analytics can be reproducible, auditable, and centralized? My sense is that this may not be in Microsoft's best interests. Microsoft has many competitors in the database, BI, Big Data, reporting, and visualization technology spaces, and less (no?) competition for analytics tools that compete directly with Excel. So why make it easier to move the analytics outside of this tool?



