- AI Blueprints
- Dr. Joshua Eckroth
- 2718 words
- 2021-06-11 13:32:50
The AI workflow
Building and deploying AI should follow a workflow that respects the fact that the AI component fits in the larger context of pre-existing processes and use cases. The AI workflow may be characterized as a four-step process:
- Characterize the problem, goal, and business case
- Develop a method for solving the problem
- Design a deployment strategy that integrates the AI component into existing workflows
- Design and implement a continuous evaluation methodology
To help you ensure the AI workflow is followed, we offer a checklist of considerations and questions to ask during each step of the workflow.
Characterize the problem
Given the excitement around AI, there is a risk of adding AI technology to a platform just for the sake of not missing out on the next big thing. However, AI technology is usually one of the more complex components of a system, hence the hype surrounding AI and the promise of advanced new capabilities it supposedly brings. Due to its complexity, AI introduces potentially significant technical debt, that is, code complexity that is hard to manage and becomes even harder to eliminate. Often, code must be written to massage inputs to the AI into a form that meets its assumptions and constraints, and to correct the AI's outputs when it makes mistakes.
Engineers from Google published an article in 2014 titled Machine Learning: The High-Interest Credit Card of Technical Debt (https://ai.google/research/pubs/pub43146), in which they write:
In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of "glue code" or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design.
Machine Learning: The High-Interest Credit Card of Technical Debt, D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young, presented at the SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop), 2014
They proceed to document several varieties of technical debt that often come with AI and machine learning (ML) technology and suggest mitigations that complement those covered in our AI workflow.
AI should address a business problem that is not solvable by conventional means. The risk of technical debt is too high (higher than many other kinds of software practices) to consider adding AI technology without a clear purpose.
The problem being addressed with AI should be known to be solvable. For example, until recent advances found in Amazon Echo and Google Home, speech recognition in a large and noisy room was not possible. A few years ago, it would have been foolish to attempt to build a product that required this capability.
The AI component should be well-defined and bounded. It should do one or a few tasks, and it should make use of established algorithms, such as those detailed in the following chapters. The AI should not be treated as an amorphous intelligent concierge that solves any problem, specified or unspecified. For example, our chatbot case study in Chapter 7, A Blueprint for Understanding Queries and Generating Responses, is intentionally designed to handle a small subset of possible questions from users. A chatbot that attempts to answer all questions, perhaps with some kind of continuous learning based on the conversations users have with it, is a chatbot that has a high chance of embarrassing its creators, as was the case with Microsoft's Tay chatbot (https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/).
In summary, the AI should solve a business problem, it should use established techniques that are known to be able to solve the problem, and it should have a well-defined and bounded role within the larger system.
Checklist
- The AI solves a clearly stated business problem
- The problem is known to be solvable by AI
- The AI uses established techniques
- The role of the AI within the larger system is clearly defined and bounded
Develop a method
After characterizing the problem to be solved, a method for solving the problem must be found or developed. In most cases, a business should not attempt to engage in a greenfield research project developing a novel way to solve the problem. Such research projects carry significant risk since an effective solution is not guaranteed within a reasonable time. Instead, one should prefer existing techniques.
This book covers several existing and proven techniques for a variety of tasks. Many of these techniques, such as planning engines, natural language part-of-speech tagging, and anomaly detection, are much less interesting to the AI research community than some newer methods, such as convolutional neural networks (CNN). But these older techniques are still quite useful. These techniques have "disappeared in the fabric," to use a phrase Dr. Reid Smith, Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), and I wrote in an article for AI Magazine titled, Building AI Applications: Yesterday, Today, and Tomorrow in 2017 (https://www.aaai.org/ojs/index.php/aimagazine/article/view/2709) (Building AI Applications: Yesterday, Today, and Tomorrow, R. G. Smith and J. Eckroth, AI Magazine, vol. 38, no. 1, pp. 6–22, 2017). What is sometimes called the "AI Effect" is the notion that whatever has become commonplace is no longer AI but rather everyday software engineering (https://en.wikipedia.org/wiki/AI_effect). We should measure an AI technique's maturity by how "boring" it is perceived to be, such as boring, commonplace heuristic search and planning. Chapter 2, A Blueprint for Planning Cloud Infrastructure, solves a real-world problem with this kind of boring but mature AI.
Finally, when developing a method, one should also take care to identify computation and data requirements. Some methods, such as deep learning, require a significant amount of both. In fact, deep learning is virtually impossible without some high-end graphics processing units (GPU) and thousands to millions of examples for training. Often, open source libraries such as CoreNLP will include highly accurate pre-trained models so the challenge of acquiring sufficient data for training purposes can be avoided. In Chapter 5, A Blueprint for Detecting Your Logo in Social Media, we demonstrate a means of customizing a pre-trained model for a custom use case with what is known as "transfer learning."
Checklist
- The method does not require significant new research
- The method is relatively mature and commonplace
- The necessary hardware resources and training data are available
Design a deployment strategy
Even the most intelligent AI may never be used. It is rare for people to change their habits even if there is an advantage in doing so. Finding a way to integrate a new AI tool into an existing workflow is just as important to the overall AI workflow as making a business case for the AI and developing it. Dr. Smith and I wrote:
Perhaps the most important lesson learned by AI system builders is that success depends on integrating into existing workflows — the human context of actual use. It is rare to replace an existing workflow completely. Thus, the application must play nicely with the other tools that people use. Put another way, ease of use delivered by the human interface is the "license to operate." Unless designers get that part right, people may not ever see the AI power under the hood; they will have already walked away.
Building AI Applications: Yesterday, Today, and Tomorrow, R. G. Smith and J. Eckroth, AI Magazine, vol. 38, no. 1, Page 16, 2017
Numerous examples of bad integrations exist. Consider Microsoft's "Clippy," a cartoon character that attempted to help users write letters and spell check their document. It was eventually removed from Microsoft Office (https://www.theatlantic.com/technology/archive/2015/06/clippy-the-microsoft-office-assistant-is-the-patriarchys-fault/396653/). While its assistance may have been useful, the problem seemed to be that Clippy was socially awkward, in a sense. Clippy asked if the user would like help at nearly all the wrong times:
Clippy suffered the dreaded "optimization for first-time use" problem. That is, the very first time you were composing a letter with Word, you might possibly be grateful for advice about how to use various letter-formatting features. The next billion times you typed "Dear..." and saw Clippy pop up, you wanted to scream.
In a more recent example, most smartphone users do not use Apple Siri or Google Home, especially not in public (What can I help you with?: Infrequent users' experiences of intelligent personal assistants, B. R. Cowan, N. Pantidi, D. Coyle, K. Morrissey, P. Clarke, S. Al-Shehri, D. Earley, and N. Bandeira, presented at the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, New York, New York, USA, 2017, pp. 43–12). Changing social norms in order to increase adoption of a product is a significant marketing challenge. On the other hand, to "google" something, which clearly involves AI, is a sufficiently entrenched activity that it is defined as a verb in the Oxford English Dictionary ("Google, v.2'" OED Online, January 2018, Oxford University Press, http://www.oed.com/view/Entry/261961?rskey=yiwSeP&result=2&isAdvanced=false). Face recognition and automatic tagging on Facebook have been used by millions of people. And we click product recommendations on Amazon and other storefronts without a second thought. We have many everyday workflows that have evolved to include AI.
As a general rule, it is easier to ask users to make a small change to their habits if the payoff is large; and it is hard or impossible to ask users to make a large change to their habits or workflow if the payoff is small.
In addition to considering the user experience, effective deployment of AI also requires that one considers its placement within a larger system. What kinds of inputs are provided to the AI? Are they always in the right format? Does the AI have assumptions about these inputs that might not be met in extreme circumstances? Likewise, what kinds of outputs does the AI produce? Are these outputs always within established bounds? Is anything automated based on these outputs? Will an email be sent to customers based on the AI's decisions? Will a missile be fired?
As discussed in the preceding section, AI isn't everything; often, a significant amount of code must be written around the AI component. The AI probably has strong assumptions about the kind of data it is receiving. For example, CNNs can only work on images of a specific, fixed size – larger or smaller images must be squished or stretched first. Most NLP techniques assume the text is written in a particular language; running part-of-speech tagging with an English model on a French text will produce bogus results.
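The kind of code written around the AI component to meet these input assumptions might look like the following minimal sketch. It is only an illustration: the target size, the stop-word list, and the function names `resize_nearest` and `looks_english` are hypothetical, and a real system would use an image library and a proper language detector rather than these simple stand-ins.

```python
# Assumed input size; a real model documents its own requirement.
TARGET_WIDTH, TARGET_HEIGHT = 4, 4

# Tiny, hypothetical stop-word list used only as a rough language check.
ENGLISH_STOP_WORDS = {"the", "and", "is", "of", "to", "a", "in", "it", "that"}


def resize_nearest(pixels, width, height):
    """Stretch or squish a 2D grid of pixels to width x height (nearest neighbor)."""
    if not pixels or not pixels[0]:
        raise ValueError("image is empty")
    src_h, src_w = len(pixels), len(pixels[0])
    if any(len(row) != src_w for row in pixels):
        raise ValueError("image rows have different lengths")
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def looks_english(text, min_ratio=0.15):
    """Return True if enough tokens are common English stop words."""
    tokens = text.lower().split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t.strip(".,;:!?\"'") in ENGLISH_STOP_WORDS)
    return hits / len(tokens) >= min_ratio
```

An image would pass through `resize_nearest(image, TARGET_WIDTH, TARGET_HEIGHT)` before reaching the model, and a text that fails `looks_english` could be rejected or routed elsewhere instead of being tagged with an English model.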
If the AI gets bad input, or even if the AI gets good input, the results might be bad. What kind of checks are performed on the output to ensure the AI does not make your company look foolish? This question is particularly relevant if the AI's output feeds into an automated process such as sending alerts and emails, adding metadata to photos or posts, or even suggesting products. Most AI will connect to some automated procedure since the value added by AI is usually focused on its ability to automate some task. Ultimately, developers will need to ensure that the AI's outputs are accurate; this is addressed in the final step in the workflow, Design and implement a continuous evaluation. First, however, we provide a checklist for designing a deployment strategy.
Checklist
- Plan a user experience, if the AI is user-facing, that fits into an existing habit or workflow, requiring very little change by the user
- Ensure the AI adds significant value with minimal barriers to adoption
- List the AI's assumptions or requirements about the nature (format, size, characteristics) of its inputs and outputs
- Articulate boundary conditions on the AI's inputs and outputs, and develop a plan to either ignore or correct out-of-bounds and bogus inputs and outputs
- List all the ways the AI's outputs are used to automate some task, and the potential impact bad output may have on that task, on a user's experience, and on the company's reputation
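The boundary conditions on outputs in the checklist above can be sketched as a small guard that sits between the AI and whatever task it automates. This is a sketch under stated assumptions: the `Prediction` type, the label set, and the confidence threshold are all hypothetical and would be replaced by whatever a particular system actually produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical shape of one AI output."""
    label: str
    confidence: float


ALLOWED_LABELS = {"positive", "negative", "neutral"}  # assumed label set
MIN_CONFIDENCE = 0.8  # assumed threshold for acting without review


def check_output(prediction):
    """Return a list of problems; an empty list means the output is in bounds."""
    problems = []
    if prediction.label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {prediction.label!r}")
    if not 0.0 <= prediction.confidence <= 1.0:
        problems.append(f"confidence out of range: {prediction.confidence}")
    elif prediction.confidence < MIN_CONFIDENCE:
        problems.append(f"confidence below threshold: {prediction.confidence}")
    return problems


def route(prediction):
    """Act automatically only on in-bounds output; otherwise hold for review."""
    return "automate" if not check_output(prediction) else "review"
```

Holding questionable output for human review is one design choice; ignoring it or substituting a safe default are alternatives, depending on the impact bad output would have.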
Design and implement a continuous evaluation
The fourth and final stage of the workflow concerns the AI after it is deployed. Presumably, during development, the AI has been trained and tested on a broad range of realistic inputs and shown to perform admirably. And then it is deployed. Why should anything change?
No large software, and certainly no AI system, has ever been tested on all possible inputs. Developing "adversarial" inputs, that is, inputs designed to break an AI system, is an entire subfield of AI with its own researchers and publications (https://en.wikipedia.org/wiki/Adversarial_machine_learning). Adversarial inputs showcase the limits of some of our AI systems and help us build more robust software.
However, even in non-adversarial cases, AI systems can degrade or break in various ways. According to The Guardian, YouTube's recommendation engine, which suggests the next video to watch, has begun showing extremist content next to kid-friendly videos. Advertisers for the benign videos are reasonably upset about unexpected brand associations with such content. An AI of this kind has nothing besides its data to fall back on. Rather, the AI trusts the data with absolute assuredness. The data is all the AI knows unless additional checks and balances are added to the code.
The environments in which AI is deployed almost always change. Any AI deployed to humans will be subjected to an environment in constant evolution. The kinds of words used by people leaving product reviews will change over time ("far out," "awesome," "lit," and so on (https://blog.oxforddictionaries.com/2014/05/07/18-awesome-ways-say-awesome/)), as will their syntax (for example, Unicode smileys, emojis, and "meme" GIFs). The kinds of photos people take of themselves and each other have changed from portraits, often taken by a bystander, to selfies, thus dramatically altering the perspective and orientation of faces in photos.
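One rough way to notice this kind of vocabulary change is to track how many words in recent inputs were never seen during development. The sketch below is illustrative only: the tokenizer is deliberately crude (it keeps lowercase ASCII letters and apostrophes, so emojis are ignored), and the function names and the 0.3 threshold are assumptions rather than recommended values.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def unseen_token_rate(texts, known_vocabulary):
    """Fraction of tokens in texts that are absent from the known vocabulary."""
    tokens = [t for text in texts for t in tokenize(text)]
    if not tokens:
        return 0.0
    unseen = sum(1 for t in tokens if t not in known_vocabulary)
    return unseen / len(tokens)


def drift_alert(texts, known_vocabulary, threshold=0.3):
    """Return True when recent inputs contain too many unfamiliar words."""
    return unseen_token_rate(texts, known_vocabulary) > threshold
```

Here `known_vocabulary` would be the set of words observed in the training data, and `texts` a recent batch of product reviews or similar inputs.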
Any AI that becomes part of a person's workflow will be manipulated by that person. The user will attempt to understand how the AI behaves and then will gradually adjust the way they use the AI in order to get maximum benefit from it.
Fred Brooks, manager of IBM's System/360 effort and winner of the Turing Award, observed in his book The Mythical Man-Month that systems, just before deployment, exist in a metastable modality – any change to their operating environment or inputs could cause the system to collapse to a less functional state:
"Systems program building is an entropy-decreasing process, hence inherently metastable. Program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence."
The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, F.P. Brooks, Jr., Addison Wesley, 2/E. 1995, Page 123
Perhaps there is no escape from the inevitable obsolescence of every system. However, one could presumably delay this fate by continuously monitoring the system after it is deployed and revising or retraining it in light of new data. This new data can be acquired from the system's actual operating environment rather than its expected operating environment, which is all one knows before it is deployed.
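As one small illustration of continuous monitoring, a deployed system can be watched for a simple alert condition, such as producing the same output unusually often. The following is a minimal sketch, not a prescribed design: the class name `RepeatedOutputMonitor`, the window size, and the share threshold are all hypothetical, and outputs are assumed to be hashable values such as labels or strings.

```python
from collections import Counter, deque


class RepeatedOutputMonitor:
    """Flags when one output dominates the most recent window of outputs."""

    def __init__(self, window_size=100, max_share=0.9):
        self.window = deque(maxlen=window_size)  # oldest outputs drop off automatically
        self.max_share = max_share

    def record(self, output):
        """Record one output; return True if the alert condition is met."""
        self.window.append(output)
        if len(self.window) < self.window.maxlen:
            return False  # not enough data to judge yet
        _, count = Counter(self.window).most_common(1)[0]
        return count / len(self.window) > self.max_share
```

Each call to `record` would be made wherever the AI's output is logged, and a `True` result could trigger an email or dashboard warning for the team maintaining the system.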
The following checklist may help system builders to design and implement a continuous evaluation methodology.
Checklist
- Define performance metrics. These are often defined during system building and may be reused for continuous evaluation.
- Write scripts that automate system testing according to these metrics. Create "regression" tests to ensure the cases the system solved adequately before are still solved adequately in the future.
- Keep logs of all AI inputs and outputs if the data volume is manageable; otherwise, keep aggregate statistics. Define alert conditions to detect degrading performance; for example, to detect whether the AI system is producing the same output unusually often.
- Consider asking for feedback from users and aggregate this feedback in a place that is often reviewed. Read Chapter 3, A Blueprint for Making Sense of Feedback, for a smart way to handle feedback.
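The first two items of this checklist, a performance metric and automated "regression" tests, can be sketched as follows. The example is hedged accordingly: `toy_sentiment` is a stand-in for the deployed AI, `SOLVED_CASES` is a made-up set of previously solved cases, and accuracy is just one possible metric.

```python
def accuracy(predict, cases):
    """Fraction of (input, expected) cases the system answers correctly."""
    if not cases:
        raise ValueError("no test cases provided")
    correct = sum(1 for text, expected in cases if predict(text) == expected)
    return correct / len(cases)


def regression_failures(predict, solved_cases):
    """Return cases that were solved before but are answered differently now."""
    failures = []
    for text, expected in solved_cases:
        actual = predict(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Stand-in for the deployed AI; a real system would call its model here.
def toy_sentiment(text):
    return "positive" if "good" in text.lower() else "negative"


# Hypothetical cases the system is known to have handled correctly.
SOLVED_CASES = [("Good value", "positive"), ("Broke in a week", "negative")]
```

Running `regression_failures(toy_sentiment, SOLVED_CASES)` after every retraining or code change gives a list of cases that have regressed; an empty list means nothing previously solved has broken.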