It turned out to be a wonderful Fall Semester at Georgia Tech through their Online Master of Science in Computer Science (OMSCS) program. As of 2015, OMSCS is a unique MOOC (Massive Open Online Course) partnership with Udacity. The most promising aspects of OMSCS are its accessible, affordable, and challenging curriculum. I personally applied to the program to dive deeper into machine learning, which was difficult to do on my own with no additional stakes. Coursera is a great place to start building a solid data science and analytics foundation; the courses range from free to super affordable.
Originally posted on LinkedIn Pulse on January 4, 2016.
Social Media Week Chicago 2015 in November featured a master class on business intelligence using social analytics. The three speakers on the panel were from a start-up fashion publisher (Clique Media Group, Inc. or CMG), a digital marketing agency (Tenthwave Digital), and a retail pharmacy chain (CVS Health). The panel discussed ways to leverage social media to make business decisions, gain customer insights, and stay competitive. The social media insights in this class were drawn from Digimind’s social listening and analytics platforms, which are software as a service (SaaS) products. Read the compilation of tweets and panel notes in this Storify link. I was invited as a panelist to share successes at CMG; my slides can be viewed here.
It’s not news that there has been a nationwide hike in crimes across the United States, including Los Angeles (see the NPR episode on LAPD). Inequality and crimes against fellow humans are disheartening and can often seem impossible to resolve. Open data provides one resource for viewing seemingly impossible problems and collaborating on solutions. In this analysis, City of Los Angeles domestic violence counts are viewed as time series and compared across areas of the city.
Data engineering is a crucial subset of the data science toolbox because access to data is required for high quality analysis and storytelling. A data scientist must understand the tasks and time required for data engineering and be prepared to roll up her sleeves, which may include hammering out low-level scripts or developing company-wide software to create a fully functional data science environment. The data can set up successes or failures at an organization. A data scientist’s work is only as good as her access to data. This post provides questions for evaluating the time required to engineer the ideal environment, data access, and resources.
Lawyers and Data Scientists share a similar passion for discovery and uncovering the truth
This analysis is a reaction to the Kansas City Star article, “Asian-Americans narrow wealth gap, new studies show,” which oversimplifies income and race trends. It aggregates “Asian-Americans” into a single group and tells the story of averages. This is not uncommon in major coverage of demographics and Asian Americans. To demonstrate the issues with aggregation, data from the U.S. Census dataset in the UCI Machine Learning Repository, here and here, are compared with the findings from a St. Louis (STL) Federal Reserve paper on The Demographics of Wealth. Demographic data aggregation tells the wrong story of income and race in the United States. There are cases where metrics should be aggregated, but in those cases the advantages must be laid out.
Gone and vanishing are the days when data enthusiasts occupied dark quarters, never seeing the light of day and speaking a cryptic language. Data science, the current fancy term for data analytics, mining, and predictive algorithms, is giving data nerds more options. But even if data scientists are highly desired, where should they actually work and where can they thrive in the workplace?
Money can’t buy love, but it improves your bargaining position – Christopher Marlowe
The quote from Elizabethan playwright and tragedian Christopher Marlowe was probably commenting on the state of love affairs in the 1500s. Money is a consistently alluring factor in politics and will forever complicate accountability. Money, power, and love (aka the popularity contest portion) push contenders to the top. Without the right balance of resources and publicly perceived integrity, no one is winning any races, and if you don’t win races, then there are no more “friends.” Ethics guidelines and reporting for elections and public officials attempt to light up the “influence exchange” (an analogy to a stock exchange).
As Open Data continues to flow from governmental offices, raising the promise of transparency and engagement, the public increasingly has to roll up its sleeves to review and evaluate the information. Here, every road to accountability is paved with good intentions and a significant number of (wo)man-hours. What? Did you think insights would be handed to you?
This post adds to the evolution of the City of Los Angeles’ Open Data sets and breathes human readability and actionability into the information. My first post on Los Angeles’ Influence Exchange explored the contribution of clients to lobbying firms by industry, where I manually created a mapping from client to industry; read more at The City of Los Angeles Influence Exchange (12/2014). It was not especially surprising to discover in the 12/2014 post that Real Estate clients conducted the most influencing activities with local government leaders. And that is only on paper. In a land use and transportation policy context, this makes sense, since land use decision-making (such as zoning, variances, and permits) is concentrated in local governments.
In this post, I added the locations, or project geocodes, of where influencing is occurring in the City of Los Angeles. The “CEC City Projects Agencies Lobbied by Registered” dataset provides a well-populated “Location” field, with the local street address of the project or area that the client paid to influence. The Location field was used to pull the latitude and longitude from the Google Geocoding API. Then, I leveraged the folium package (Python and Leaflet) to map the projects by year in the City of Los Angeles. Below is an interactive map of projects influenced in 2014. The labels of the points are project names or, when the project name was blank, the “client last name.” Coming soon: 2013 influenced project data points and a link to the code.
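The geocode-then-map workflow described above can be sketched roughly as follows. This is a minimal illustration and not the code behind the map: the function names, the `lat`/`lng`/`label` keys, and the approximate downtown Los Angeles map center are all assumptions made for the example, and the network call needs your own Google API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build a Geocoding API request URL for one street address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response):
    """Return (lat, lng) from a parsed Geocoding API response, or None."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    """Look up one address over the network (needs a valid API key)."""
    with urlopen(geocode_url(address, api_key)) as reply:
        return extract_lat_lng(json.load(reply))


def marker_label(project_name, client_last_name):
    """Use the project name, falling back to the client last name if blank."""
    name = (project_name or "").strip()
    return name if name else client_last_name


def build_map(projects, center=(34.05, -118.24)):
    """Plot geocoded projects; each item is a dict with lat, lng, and label.

    The default center is an approximate downtown Los Angeles coordinate,
    chosen for this sketch only.
    """
    import folium  # third-party (pip install folium), imported only when mapping

    project_map = folium.Map(location=list(center), zoom_start=10)
    for project in projects:
        folium.Marker(
            location=[project["lat"], project["lng"]],
            popup=project["label"],
        ).add_to(project_map)
    return project_map
```

With geocoded rows in hand, something like `build_map(projects).save("projects_2014.html")` would write an interactive Leaflet page. The file name here is only a placeholder.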
Notice that project “location” values default to Los Angeles City Hall in downtown Los Angeles (zoom into downtown Los Angeles to see for yourself) and, upon zooming out, there is one project location in Florida. It is not uncommon for outside individuals, companies, and organizations to spend money on city lobbying for a specific cause.
From the 2014 projects map, does there appear to be an imbalance of projects by geography? What is happening in your neighborhood? Location is a very important factor in influence since the Real Estate industry is pouring significantly more money into interactions with public leaders. With just a little more dedicated digging, Open Data and visualization can be converted to a cause and statistic/visual for advocacy and help the public and decision-makers pinpoint outliers and patterns more quickly.
Here are next steps for this data analysis, as a note to self and to other Open Data wranglers:
- Describe projects by department (this field in the dataset is extra messy, since all relevant city departments lobbied are separated by |); for example, count the times each city department is lobbied and then compare the counts to department budgets and decisions
- Show how the projects being lobbied change over time (LA Open Data provides only 2 years of data)
- Map projects by category and influence contribution (this breakdown can lead to interesting metrics, such as influence by square mile or entitlements/permits by influence exchanged)
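The department count in the first step could start from something like the sketch below. It assumes only that each row's department field is a string of names separated by `|`. The function name and the sample values are made up for illustration, and each department is counted once per project row, which is a design choice rather than anything the dataset dictates.

```python
from collections import Counter


def count_departments(department_fields):
    """Count how many project rows mention each department.

    Each field is a string of department names separated by "|".
    """
    counts = Counter()
    for field in department_fields:
        # Split on "|", trim whitespace, and drop empty pieces; the set
        # keeps a department from being counted twice within one row.
        names = {part.strip() for part in (field or "").split("|") if part.strip()}
        counts.update(names)
    return counts


# Made-up sample rows, not values taken from the LA Open Data portal.
rows = [
    "City Planning | Building and Safety",
    "City Planning|",
    None,
    "Building and Safety|City Planning|City Planning",
]
department_counts = count_departments(rows)
```

From there, `department_counts.most_common()` gives a ranked list that can be lined up against department budgets.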
Additional considerations for the LA: A Well Run City datasets as they relate to readability and accessibility for the public:
- Create common fields between related data sets so that the public may merge information within the Open Data portal
- Either clean each dataset or release a statement about it with data cleansing suggestions, since cleaning is the first step anyone needs to take when evaluating and reviewing any dataset
- Add “cleaned” dimensions and connections in the data that have value for the public
Read the first post in the LA Influence Exchange Series, here.
*Social media* has many layers and, in the business and data senses, it is growing up nicely. Social sharing platforms provide developer access to their data, such as membership interactions and status updates, which can come as emotional outpourings, diatribes, celebrations, and affirmations. Armed with time (the most difficult thing to come by on this list), Stack Overflow, reference materials, and an open source coding tool, anyone can quickly #oneup their *social media listening* skills. Not a bad skill to flaunt, since positions managing and creating content on social media are increasing and relevant in every sector and job function. Now, adding the third word, listening, gives social media scouring, participating, and downloading another lift in professionalism.