Unlocking Foreign Policy Reform Recommendations
Thomas Scherer, Toby Weed, and Marisa Jurczyk | June 21, 2023
There is an abundance of recommendations for how to improve U.S. foreign policy, but new reports rarely engage with the recommendations of the past. Which recommendations are new, and which have been around for 10, 50, or even 100 years? Which recommendations have disappeared from the reports? Were they implemented, determined to be unhelpful, or simply forgotten?
As an organization dedicated to improving the processes and institutions of U.S. foreign policy, we are committed to answering these questions. To do so, we are organizing this wealth of information using computational social science techniques. This will allow policymakers to quickly find recommendations for solving their problems, identify the evidence (or lack thereof) underlying each recommendation, select recommendations that are immediately actionable, and understand how recommendations have evolved over time.
We tested this idea with a pilot dataset of 169 hand-coded recommendations and found that recommendations can be sorted and classified in a way that allows us to study their origins, evolution, and effects.
It is tempting to ignore past efforts and start from scratch, or simply to follow the conventional wisdom. But doing so costs us hard-earned lessons both about what reforms to implement and about the challenges that implementation will raise. We continually neglect insights that could make the United States a far more effective international actor. Instead, we should approach the existing recommendations as a scientist would. We need to develop and test methods to distill the most valuable insights from a messy ecosystem of ideas. Such an approach is not common in the foreign policy community. To face the challenges of the 21st century, we must look to more effective methods.
To Understand Reform, We Need Data on Reform
The first step to bringing foreign policy reform discourse under data-scientific scrutiny is to acquire data. To advance the state of the art (or perhaps to establish one), we are creating a structured dataset of foreign policy recommendations, organized by and rated on key factors such as their specificity and evidence base.
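As a rough sketch, one row of such a dataset might look like the record below. The field names and scales here are illustrative, not our actual codebook:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    """One row of the dataset. Field names and scales are illustrative."""
    report: str                     # e.g., "Council on Foreign Relations"
    year: int                       # publication year of the report
    text: str                       # verbatim recommendation text
    general_category: str           # one of the seven general categories
    primary_topic: str              # best-fitting specific topic
    secondary_topic: Optional[str]  # second-best topic, if one fits
    evidence: str                   # "some", "none found", or "not required"
    specificity: int                # 0 (vague) to 2 (significant specificity)
```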
We identified 21 reports on foreign policy reform authored since 2015. We selected six to code, representing the range of author types: government, think tank, academic, and other research organizations. We read through each of the six reports and identified suggestions for action that the report clearly demarcated as recommendations. We identified 169 recommendations, as shown below:
| Report Author | Report Name | Year | Recommendations |
| --- | --- | --- | --- |
| American Diplomacy Project | Blueprints for a More Modern U.S. Diplomatic Service | 2022 | 38 |
| Quincy Institute | Responsible Statecraft Requires Remaking America’s Foreign Relations Tool Kit | 2021 | 12 |
| Council on Foreign Relations | Revitalizing the State Department and American Diplomacy | 2020 | 39 |
| American Academy of Diplomacy | Strengthening the Department of State | 2019 | 67 |
| Congressional Research Service | U.S. Department of State Personnel: Background and Selected Issues for Congress | 2018 | 9 |
| RAND Corporation | Enhancing Next-Generation Diplomacy Through Best Practices in Lessons Learned | 2017 | 4 |
Categories of Reform
To organize the recommendations, we sorted them into categories. We created a two-tiered taxonomy of “general” and “specific” categories. To create the general categories, we took fp21’s five pillars of the policy process and added “Organizational Structure” and “Interagency Process,” two other common areas of reform. Within these seven general categories, we created 49 specific topics based on fp21’s previous work and our own understanding of foreign policy recommendations.
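To illustrate, the taxonomy can be thought of as a mapping from general categories to specific topics. The sketch below is partial and hypothetical: it uses only the names mentioned in this post, with placeholders standing in for the rest of the 49-topic codebook.

```python
# A partial, hypothetical sketch of the two-tiered taxonomy. Only the
# names mentioned in this post are real; the actual codebook contains
# 49 specific topics across seven general categories.
TAXONOMY = {
    "Workforce": [
        "Training & Guidance",
        "Hiring & Recruiting & Development",
        "Retention/Promotion/Pay",
        # ...further workforce topics...
    ],
    "Organizational Structure": [
        # ...specific topics...
    ],
    "Interagency Process": [
        # ...specific topics...
    ],
    # ...the general categories drawn from fp21's five pillars...
}
```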
For each recommendation, coders assigned the specific topic that best captured it as the primary topic. If no existing topic fit, coders created a new one to help us identify gaps in the taxonomy. If multiple topics fit, coders also selected the second-best fit as a secondary topic. All 169 recommendations received a primary topic, and 50% received a secondary topic.
The results are useful for thinking about common topics. Figure 1 shows how many recommendations were tagged with a topic in each general category; for example, 102 recommendations addressed topics in the ‘Workforce’ category.
A recommendation can be coded as matching multiple specific topics. Figure 2 shows the number of recommendations for the ten most popular specific categories. The ‘Training & Guidance’ category was the most common, describing approximately one-third of the recommendations.
In this effort, we found several recommendations that clearly dealt with topics not well represented by any of those initially available in the codebook. Coders added ‘Accountability’, ‘Culture’, ‘Learning’, and ‘Prioritization’ as secondary topics for these recommendations.
To give a sense of what these topics look like in practice, Table 1 gives examples of the three most common specific topics. These examples highlight how different reports had different styles and structures for their recommendations. The full list of recommendations is available for review here.
| Specific Category | Report | Recommendation Text |
| --- | --- | --- |
| Training & Guidance | American Academy of Diplomacy | Provide additional leadership and professional training for both Foreign Service and Civil Service DS employees, including targeted training for domestic field offices and managerial/supervisory training for Regional Security Officers (RSOs) and Assistant Regional Security Officers (A/RSOs) given their responsibilities to oversee vast local workforces. |
| Training & Guidance | American Academy of Diplomacy | We support the Department’s efforts to build a Civil Service training complement for the first time to increase training and professional development opportunities. This should be accompanied by efforts to build a true Civil Service mobility program to enhance career development opportunities for the Department’s vital Civil Service workforce. |
| Training & Guidance | Council on Foreign Relations | [To enhance a State Department that today is fundamentally ill-equipped for this modern digital landscape, DOS should] increase opportunities for FSOs and specialists to do time-limited secondments and apprenticeships with American information and communications technology companies to develop relevant skills and reimagine their work; |
| Hiring & Recruiting & Development | American Academy of Diplomacy | Significantly expand recruitment for employees who handle cyber, cloud, and mobility technology responsibilities. |
| Hiring & Recruiting & Development | American Academy of Diplomacy | As Phase One argued, State Department pre-seniors (FS-01/ GS-15) could also be assigned to a geographically diverse set of universities or colleges that have historically not had much engagement with the State Department or the Foreign Service for a one-year master’s degree or certificate program. In coordination with the appropriate regional Diplomat in Residence and recruiters for the Diplomatic Reserve Corps (Blueprint #4), the Department employee would be available to meet with students to talk about career opportunities in the State Department and other foreign affairs agencies and conduct other outreach activities. A program like this would contribute to professional development of the State Department employee as well as to the Department’s domestic outreach, presence, and recruitment efforts. |
| Hiring & Recruiting & Development | Council on Foreign Relations | [This will mean] increasing recruitment of both native Chinese speakers and students enrolled in top East Asian studies programs throughout the United States; |
| Retention/Promotion/Pay | American Academy of Diplomacy | Redesign the performance appraisal system to reward high performance and select out chronic under-performers. |
| Retention/Promotion/Pay | American Academy of Diplomacy | As the Department increases substantially its investment in education and training, it is vital also to ensure that the State Department culture more explicitly values such investments both on an individual and institutional basis. In particular, the assignment and promotion process must give more emphasis to training reports from both mandatory courses and voluntary educational assignments, including those that confer an advanced degree in an area relevant to the individual’s area of specialization or career path. (This topic is also addressed in Blueprint #3) |
| Retention/Promotion/Pay | Council on Foreign Relations | [The secretary of state should] reach beyond senior DOS leadership to elevate career employees who have modeled leadership in challenging the status quo and accomplished U.S. national security goals in difficult and dangerous environments |
We reviewed our categorizations in several ways to assess their quality. One simple check is whether the most frequent words for each category are appropriate. Table 2 shows the ten most common words for the three most frequent general categories; a sketch of this check follows the table. There is overlap, such as “Department” appearing on every list, but the words unique to each category are indeed appropriate.
| Workforce | Organizational Structure | Interagency Process + Outside Engagement |
| --- | --- | --- |
| Train | Foreign | Senate |
| Service | Department | Advice |
| Foreign | Service | Congress |
| Department | U.S. | Process |
| State | Diplomat | Security |
| Program | Chief | U.S. |
| Employee | Fund | Consent |
| Position | Office | Department |
| Senior | Language | Nominee |
| Provide | Develop | Position |
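As a minimal Python sketch of this kind of word-frequency check (the words in Table 2 appear stemmed, e.g. “Train”; this toy version skips stemming and uses made-up inputs):

```python
from collections import Counter, defaultdict

# Made-up inputs: (general category, recommendation text) pairs.
recs = [
    ("Workforce", "Provide additional leadership and professional training."),
    ("Workforce", "Redesign the appraisal system to reward high performance."),
    ("Organizational Structure", "Fund a new office for language training."),
]

STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "at", "in", "new"}

def top_words(recs, k=10):
    """Return the k most common non-stopword tokens for each category."""
    counts = defaultdict(Counter)
    for category, text in recs:
        tokens = (w.strip(".,;()").lower() for w in text.split())
        counts[category].update(w for w in tokens if w and w not in STOPWORDS)
    return {cat: [w for w, _ in c.most_common(k)] for cat, c in counts.items()}

print(top_words(recs))
```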
We also reviewed the data using text mining and found that recommendations in the same category are semantically similar (see the Appendix). However, we also found that recommendations from the same source are similar to one another, suggesting that future analysis should account for each report’s unique style.
Show Us the Evidence!
Many well-intentioned ideas for reform fail upon first contact with reality. Which recommendations are supported by evidence that suggests they will work?
fp21 believes that all think tank reports must carefully account for the evidence underlying their recommendations, as in our DEI report. As a resource for future reform efforts, we rated the evidence base of each recommendation.
As shown in Figure 3, a third of the recommendations were accompanied by some amount of supporting evidence. For over half of the recommendations, we were not able to identify any evidence provided by the authors to corroborate the purported benefits. The remaining recommendations did not require evidence, such as the Congressional Research Service’s recommendations about congressional oversight.
Specific Recommendations Are More Actionable
Many reforms flounder because the people making the recommendations haven’t spent enough time thinking about implementation. There are always institutional, bureaucratic, and legal barriers, many of which may remain unseen upon cursory examination.
We rated the recommendations on whether they specified what exactly should be done, who should do it, and how they should do it. In many cases, the details were elsewhere in the report. The Congressional Research Service, however, included precise details and models directly in their recommendations. For example, they recommended:
“Congress may choose to pass additional provisions of law to direct the means through which the Department of State implements Impact Initiative modernization projects affecting Department of State personnel. It could model any such provision in part from Section 7081(b)(3) of P.L. 115-141. This measure provides for new conditions the department must consider as it weighs new major information technology investments, which could affect implementation of Impact Initiative’s Modernizing Information Technology and Human Resources Operations focus area.”
Figure 4 shows the frequency of specificity scores for the recommendations, with over half having significant specificity (2) and another third having some specificity (1).
Improving the Pilot
The goal of this project is to provide historical evidence to support the ongoing modernization of the State Department.
This pilot dataset demonstrates the viability of using computational social science techniques to identify, organize, and evaluate foreign policy recommendations. Foreign policy recommendations are easy to identify in most reports. They can then be organized into meaningful categories and evaluated for their specificity and evidence base.
A key task was to identify ways to scale up the coding process while ensuring that the codings are accurate. In this pilot, we explored the possibility of automating the identification and extraction of recommendations from reports, and of classifying those recommendations. In both cases, we determined that such automation is possible but will require significant effort, including a larger set of hand-coded recommendations to serve as training data. Part of this automation will have to include isolating the substance of a recommendation from its style, as we found that recommendations from the same report may appear semantically similar even when they address different topics. A sketch of what such a classifier could look like follows.
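As a minimal sketch, assuming a larger hand-coded training set and using scikit-learn in place of whatever tooling a full effort would settle on, with made-up inputs:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data: hand-coded recommendation texts and their
# general categories; a real run would load the expanded dataset.
texts = [
    "Provide additional leadership and professional training for employees.",
    "Redesign the performance appraisal system to reward high performance.",
    "Congress may choose to pass additional provisions of law on oversight.",
    "Elevate career employees and restructure senior department offices.",
]
labels = [
    "Workforce",
    "Workforce",
    "Interagency Process + Outside Engagement",
    "Organizational Structure",
]

# TF-IDF features feeding a linear classifier: a simple, strong baseline
# for short texts like these recommendations.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Predict the category of a previously unseen recommendation.
print(model.predict(["Expand recruitment and training for cyber specialists."]))
```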
A key component of accurate codings is a codebook that accurately captures the different categories. Our initial codebook was simultaneously overly broad and under-specified: many categories were never used, while one category (workforce) captured 60% of recommendations. To improve the codebook for a larger effort, we would begin with unsupervised classification, meaning we would analyze an initial sample of recommendations to see how they naturally cluster, and then use those clusters, along with our existing knowledge and theories, to inform the codebook. A minimal sketch of that clustering step follows.
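Again a hedged sketch with scikit-learn and made-up inputs, not a description of our actual pipeline:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus: uncoded recommendation texts.
texts = [
    "Provide additional professional training for Foreign Service employees.",
    "Increase training and development opportunities for the Civil Service.",
    "Congress should exercise oversight of department modernization projects.",
    "The Senate should streamline advice and consent for department nominees.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Cluster the recommendations; the number of clusters is a judgment call
# to revisit once the full corpus is in hand.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inspect the heaviest terms near each cluster center to see what theme a
# cluster represents, then name codebook categories accordingly.
terms = vectorizer.get_feature_names_out()
for i, center in enumerate(km.cluster_centers_):
    top = center.argsort()[-5:][::-1]
    print(f"Cluster {i}:", [terms[j] for j in top])
```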
Despite these areas for improvement, this pilot showed the promise of collecting and collating recommendations from a giant corpus of foreign policy reports into a well-organized dataset. By taming that corpus, future research and recommendations can stand on its proverbial shoulders and see further into new territory.
Appendix: Graphical Representations of Text
As another check on whether our categorization actually reflects the contents of the recommendations, we compare our categorizations to a graphical representation of the recommendation text. We use a text mining toolset, the R package tm, to parse each recommendation into its component words and weight the words by how much information they convey (‘the’ provides little, ‘training’ provides more). The result is a long list of values that can be thought of as a point in a space where each word is its own dimension. To visualize these points, we squish the system into two dimensions that have little direct meaning but maintain the relative distances between the points. Thus, points that are close together have more words in common with each other than with points further away.
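For concreteness, here is a minimal Python sketch of the same idea using scikit-learn, with TF-IDF weighting and an SVD projection standing in for the particular weighting and projection our R pipeline used, on made-up inputs:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up inputs: recommendation texts and their general categories.
texts = [
    "Provide additional professional training for Foreign Service employees.",
    "Increase training and development opportunities for the Civil Service.",
    "Congress should exercise oversight of department modernization projects.",
    "The Senate should streamline advice and consent for department nominees.",
]
categories = ["Workforce", "Workforce",
              "Interagency Process + Outside Engagement",
              "Interagency Process + Outside Engagement"]

# Each recommendation becomes a point in a space with one dimension per
# word, weighted so that informative words count for more.
X = TfidfVectorizer(stop_words="english").fit_transform(texts)

# Squash the space down to two dimensions for plotting.
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Color-code by category: if the coding tracks content, same-colored
# points should cluster together.
for cat in sorted(set(categories)):
    idx = [i for i, c in enumerate(categories) if c == cat]
    plt.scatter(coords[idx, 0], coords[idx, 1], label=cat)
plt.legend()
plt.show()
```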
The points in Figure 5 are color-coded by general category. As we would hope, points in the same category appear to cluster. That clustering is less obvious in Figure 6, which plots the ten most common specific categories, but the points are still more grouped than not.
One concern is that recommendations from the same report may share not only substance but also style, which may be enhancing our clustering. Figure 7 confirms this: recommendations from the same report are clearly grouped.
However, as long as there is still some separation based on substance, there are advanced techniques we can use to isolate and discard the similarities in style. If we plot only the recommendations from the American Diplomacy Project, as in Figure 8, we see that within a single report the different recommendation categories do separate.