Showing posts from May 2019

[PyData] Recreating, Understanding, and Visualizing FiveThirtyEight's... - Matthew Brems and Joseph Nelson

PyData DC 2018 Recreating, Understanding, and Visualizing FiveThirtyEight's Elections Forecast Think about forecasting results based on polls, how to visualize your results, and explain them. Now apply that to a flashy, real world example: the 2018 midterms. The core tutorial covers sampling, simulations, statistics, interactive visualization, and communicating results. Not to fear: this tutorial is ultra "hands-on" with every concept being immediately applied with NumPy, SciPy, and Pandas. Slides - https://github.com/josephofiowa/pydat... Code - https://github.com/josephofiowa/pydat... === www.pydata.org PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

[PyData] Cleaning and Tidying Data in Pandas - Daniel Chen

PyData DC 2018 Most of your time is going to involve processing/cleaning/munging data. How do you know your data is clean? Sometimes you know what you need beforehand, but other times you don't. We'll cover the basics of looking at your data and getting started with the Pandas Python library, and then focus on how to "tidy" and reshape data. We'll finish with applying customized processing functions on our data.

[PyData] Data visualization with plotly.py: Version 3 and beyond - Jon Mease

PyData DC 2018 We will discuss and demonstrate some of the exciting features that have been added to version 3 of the plotly Python visualization library. These include ipywidgets integration for building ad-hoc dashboards in the Jupyter Notebook, support for exporting publication-quality static images, LaTeX integration, and more. We will conclude with a glimpse into the future direction of the project. Slides - https://drive.google.com/open?id=1V8P...

[todaycode오늘코드] [16/29] Pandas Basics - Understanding pandas Expanding and Rolling for Time Series Data Analysis

Pandas Basics - Understanding pandas Expanding and Rolling for time series data analysis * Reference links: https://pandas.pydata.org/pandas-docs... https://pandas.pydata.org/pandas-docs... https://pandas.pydata.org/pandas-docs... * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
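
As a rough illustration of the difference between the two window types named in this post, here is a minimal sketch. The series values and dates are made up for the example and are not taken from the lecture's practice code.

```python
import pandas as pd

# Hypothetical daily series; the values are invented for illustration.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2019-05-01", periods=5, freq="D"),
)

# Rolling: mean over a fixed window of the 3 most recent observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# Expanding: mean over every observation seen so far (a growing window).
expanding_mean = s.expanding().mean()

print(rolling_mean)
print(expanding_mean)
```

A rolling window forgets old observations, while an expanding window keeps accumulating them, so the two only agree once the rolling window covers the whole history.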

[todaycode오늘코드] [13/29] Pandas Basics - Combining Series and DataFrames with pd.concat([df1,df2]) in Python pandas

* Practice code: https://github.com/corazzon/cracking-... Pandas Basics - Combining Series and DataFrames with pd.concat([df1,df2]) in Python pandas * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
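
The `pd.concat([df1,df2])` call from the title can be sketched as follows. The two frames and their column names are hypothetical and only serve to show the row-wise and column-wise variants.

```python
import pandas as pd

# Two small hypothetical frames with the same columns.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Default axis=0 stacks rows; ignore_index=True renumbers the result 0..3.
stacked = pd.concat([df1, df2], ignore_index=True)

# axis=1 places the frames side by side, aligning on the row index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```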

[todaycode오늘코드] [12/29] Pandas Basics - Making Tidy Data with melt and pivot in Python pandas

* Practice code: https://github.com/corazzon/cracking-... pandas is an Excel-like tool that you can use from Python. We practice making tidy data with melt and pivot, and reshaping DataFrames with sort_values, rename, sort_index, and reset_index through method chaining. * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
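
A minimal sketch of the melt/pivot round trip with method chaining might look like this. The table, its column names, and its values are assumptions made for the example, not the lecture's data.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "name": ["a", "b"],
    "2018": [1, 2],
    "2019": [3, 4],
})

# melt: wide -> tidy (one row per name/year observation).
tidy = wide.melt(id_vars="name", var_name="year", value_name="value")

# pivot: tidy -> wide again, written as a method chain.
back = (
    tidy.pivot(index="name", columns="year", values="value")
    .reset_index()               # turn the "name" index back into a column
    .rename_axis(None, axis=1)   # drop the leftover "year" label on the columns
)

print(tidy)
print(back)
```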

[todaycode오늘코드] [11/29] Pandas Basics - Lecture on Reshaping DataFrames with df.sort_values, rename, sort_index, and reset_index in Python pandas

Pandas Basics - Tutorial lecture on reshaping DataFrames with df.sort_values, rename, sort_index, and reset_index in Python pandas * Practice code: https://github.com/corazzon/cracking-... Reshaping DataFrames with df.sort_values, rename, sort_index, and reset_index * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...

[todaycode오늘코드] [10] Pandas Basics - Tutorial Lecture on Creating New Columns with assign and Binning/Bucketing with qcut in Python pandas

Pandas Basics - Tutorial lecture on creating new columns with assign and binning/bucketing with qcut in Python pandas * Practice code: https://github.com/corazzon/cracking-... df.assign(Area=lambda df: df.Length*df.Height) df['Volume'] = df.Length*df.Height*df.Depth pd.qcut(df.col, n, labels=False) * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
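
The three snippets quoted above can be put together into a runnable sketch. The column names Length, Height, and Depth come from the snippets; the values and the `LengthBin` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical measurements; only the column names come from the post.
df = pd.DataFrame({
    "Length": [1.0, 2.0, 3.0, 4.0],
    "Height": [2.0, 2.0, 2.0, 2.0],
    "Depth": [1.0, 1.0, 2.0, 2.0],
})

# assign returns a new frame with the extra column (chain-friendly).
df = df.assign(Area=lambda d: d.Length * d.Height)

# Plain column assignment modifies the frame in place.
df["Volume"] = df.Length * df.Height * df.Depth

# qcut splits values into n equal-sized quantile buckets;
# labels=False returns integer bucket codes instead of intervals.
df["LengthBin"] = pd.qcut(df.Length, 2, labels=False)

print(df)
```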

[todaycode오늘코드] [9] Pandas Basics - Tutorial Lecture on Handling Missing Data in Python with fillna and dropna

Pandas Basics - Tutorial lecture on Handling Missing Data in pandas * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
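
For the fillna and dropna methods named in the title, a small sketch under assumed data could look like this. The frame is hypothetical, and filling with the column mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", None]})

# Count missing values per column.
n_missing = df.isna().sum()

# fillna: replace missing values in "x" with that column's mean.
filled = df.fillna({"x": df["x"].mean()})

# dropna: keep only rows without any missing value.
dropped = df.dropna()

print(n_missing)
print(filled)
print(dropped)
```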

[todaycode오늘코드] [8] Pandas Basics - Tutorial Lecture on Using apply and lambda Anonymous Functions in Python pandas

Pandas Basics - Tutorial lecture on using apply and lambda anonymous functions in Python pandas * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
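
A minimal sketch of apply with lambda functions, on a made-up frame (the names, scores, and the pass threshold of 80 are all assumptions):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["kim", "lee"], "score": [70, 90]})

# Series.apply calls the lambda once per element.
df["name_upper"] = df["name"].apply(lambda s: s.upper())
df["passed"] = df["score"].apply(lambda x: x >= 80)

# DataFrame.apply with axis=1 calls the lambda once per row.
df["label"] = df.apply(lambda row: f"{row['name']}:{row['score']}", axis=1)

print(df)
```

For simple arithmetic or comparisons, vectorized expressions such as `df["score"] >= 80` are usually faster than apply; apply is most useful when the logic does not vectorize easily.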

[todaycode오늘코드] [7] Pandas Basics - Tutorial Lecture on Basic Statistics in Python pandas: value_counts, nunique, sum, count, mean, median

Pandas Basics - Tutorial lecture on basic statistics in Python pandas: value_counts, nunique, sum, count, mean, median * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
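
The summary methods listed in the title can be tried on two small hypothetical series, one categorical and one numeric:

```python
import pandas as pd

# Hypothetical categorical and numeric data.
labels = pd.Series(["x", "y", "x", "z", "x"])
nums = pd.Series([1, 2, 3, 4])

counts = labels.value_counts()   # frequency of each distinct value
n_unique = labels.nunique()      # number of distinct values

total = nums.sum()
n = nums.count()                 # number of non-missing values
mean = nums.mean()
median = nums.median()

print(counts)
print(n_unique, total, n, mean, median)
```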

[todaycode오늘코드] [6] Pandas Basics - Tutorial Lecture on Selecting Data by a Subset of Columns: Subset Variables (Columns)

Pandas Basics - Tutorial lecture on selecting data by a subset of columns: Subset Variables (Columns) * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...

[todaycode오늘코드] [5] Pandas Basics - Previewing Data with head and tail, df.sample(frac=0.5), df.sample(n=10), df.nlargest, df.nsmallest

Pandas Basics - Tutorial lecture on previewing data with head and tail, df.sample(frac=0.5), df.sample(n=10), df.nlargest, df.nsmallest * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu... * Sampling data by a given fraction or count: df.sample(frac=0.5) Randomly select fraction of rows. df.sample(n=10) Randomly select n rows. * Indexing data by position: df.iloc[ 10:20 ] Select rows by position. * Getting the largest and smallest values in a given column: df.nlargest(n, 'value') Select and order top n entries. df.nsmallest(n, 'value') Select and order bottom n entries.
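
The calls listed in this description can be run together on a throwaway frame. The 100-row frame and the `random_state` seed are assumptions added so the sketch is reproducible; the column name 'value' follows the snippets above.

```python
import pandas as pd

# Hypothetical frame with a single 'value' column holding 0..99.
df = pd.DataFrame({"value": range(100)})

# Random sampling by fraction and by count (seeded for reproducibility).
half = df.sample(frac=0.5, random_state=0)
ten = df.sample(n=10, random_state=0)

# Positional slicing: rows 10 through 19.
middle = df.iloc[10:20]

# Top and bottom n rows ordered by the 'value' column.
top3 = df.nlargest(3, "value")
bottom3 = df.nsmallest(3, "value")

print(len(half), len(ten))
print(middle)
print(top3)
print(bottom3)
```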

[todaycode오늘코드] [4] Pandas Basics - Tutorial Lecture on Understanding the and, or, not, xor, any, and all Operations

[4] Pandas Basics - Tutorial lecture on understanding the and, or, not, xor, any, and all operations * Practice code: https://github.com/corazzon/cracking-... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu... * https://pandas.pydata.org/Pandas_Chea...
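
In pandas the logical operations named here are written with the element-wise operators `&`, `|`, `~`, and `^` rather than the Python keywords. A minimal sketch on a hypothetical boolean frame:

```python
import pandas as pd

# Hypothetical boolean columns for illustration.
df = pd.DataFrame({"a": [True, True, False], "b": [True, False, False]})

both = df["a"] & df["b"]         # element-wise and
either = df["a"] | df["b"]       # element-wise or
not_a = ~df["a"]                 # element-wise not
exactly_one = df["a"] ^ df["b"]  # element-wise xor

any_per_row = df.any(axis=1)     # True if any column in the row is True
all_per_row = df.all(axis=1)     # True only if every column in the row is True

print(both.tolist(), either.tolist(), not_a.tolist(), exactly_one.tolist())
print(any_per_row.tolist(), all_per_row.tolist())
```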

[todaycode오늘코드] [3] Pandas Basics - Tutorial Lecture on Indexing pandas DataFrames with Comparison Operators, drop_duplicates()

Indexing pandas DataFrames with comparison operators, drop_duplicates() * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...
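
Indexing with a comparison operator and removing duplicate rows can be sketched as follows. The frame and the threshold of 15 are made up for the example.

```python
import pandas as pd

# Hypothetical frame containing one fully duplicated row.
df = pd.DataFrame({
    "name": ["a", "b", "b", "c"],
    "score": [10, 20, 20, 30],
})

# A comparison yields a boolean Series, which then selects the matching rows.
high = df[df["score"] > 15]

# drop_duplicates keeps the first occurrence of each repeated row.
unique_rows = df.drop_duplicates()

print(high)
print(unique_rows)
```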

[todaycode오늘코드] [2] Pandas Basics - Tutorial Lecture on Creating a pandas DataFrame and Retrieving Data

- Introduction to 10 Minutes to pandas; building a DataFrame by following the cheat sheet * Practice code: https://github.com/corazzon/cracking-... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu... * https://pandas.pydata.org/Pandas_Chea...

[todaycode오늘코드] [1] Pandas Basics - Tutorial: Introduction to 10 Minutes to pandas, Building a DataFrame by Following the Cheat Sheet

- Introduction to 10 Minutes to pandas; building a DataFrame by following the cheat sheet * Practice code: https://github.com/corazzon/cracking-... * https://pandas.pydata.org/Pandas_Chea... * 10 Minutes to pandas: https://dataitgirls2.github.io/10minu...

[PyData] Using Sockeye Neural Machine Translation in a Streaming Pipeline - Jeff Zemerick

PyData DC 2018 The world wide web contains text in many languages and modern systems often cannot be restricted to a single locale. Being able to make use of the text in other languages requires a pipeline that can scale. We'll describe and demonstrate how we can create a streaming pipeline to consume, preprocess, and translate the streaming text.

[PyData] Web Scraping with Beautiful Soup - Monica Puerto

PyData DC 2018 We will use Beautiful Soup to scrape the IMDB website and write a function that builds a dictionary of specific metadata from the IMDB profile for any IMDB ID you pass as an argument. Slides: https://www.slideshare.net/PyData/pyd...

[PyData] Building Data Science - Caitlin Hudon

PyData Ann Arbor Meetup - May 13, 2019 Sponsored by NumFOCUS, TD Ameritrade, and MIDAS https://www.meetup.com/PyData-Ann-Arbor/ Before AI, before machine learning and pipelines, and before dashboards and BI, an organization starts with a pile of data, some business questions, and a few ideas on how to connect the two -- a greenfield, and an entry point for data science. Answering business questions and turning raw data into insights, models, and products means more than just writing code and doing analysis. A successful data science team needs tools, a communication strategy, thoughtful infrastructure, and a plan to deliver on their goals. This talk will cover how to tackle greenfield data science challenges from the perspective of the first data science hire in an organization, and how to build data science infrastructure from the ground up. —————————————————— Caitlin Hudon is Lead Data Scientist at OnlineMedEd, a startup in the Edtech space in Austin. Her 8+ years of appli

[PyData] Beautiful, Interactive, and Portable Maps using Folium and Live API Data - Ariel M'ndange-Pfupfu

PyData DC 2018 Leaflet.js, a JavaScript visualization library, makes maps that are as lightweight and flexible as they are interactive and web-friendly. Folium brings that capability to Python, so we can use our favorite tools to ingest, join and process location data from the Census, WMATA, and Walkscore APIs and end up with HTML/JS maps that can be rendered in a notebook or served programmatically via web app. Slides - https://GitHub.com/gotoariel/folium-demo

[PyData] Conda Forge - Community Driven Packaging That Works for You - Marius van Niekerk

PyData NYC 2018 Conda-forge is a community-driven, cross-platform effort to package the libraries we all use. Learn how conda-forge works and how you can use it as a resource to package and distribute your code and dependencies!

[PyData] A data scientist's guide to production web apps with Flask - Ali Vanderveld

PyData NYC 2018 This is an overview of what I learned about web apps while building the Trending service for ShopRunner. It will focus on details that are critical to performant, production systems but that are often left out of the online literature aimed at data scientists. This will include dealing with updating data, enabling scrolling feeds, errors and the canonical use of HTTP status codes, and performance.

[PyData] Two Years of Bayesian Bandits for E Commerce - Austin Rochford

PyData NYC 2018 At Monetate, we’ve deployed Bayesian bandits (both noncontextual and contextual) to help our clients optimize their e-commerce sites since early 2016. This talk is an overview of the lessons we’ve learned from both the processes of deploying real-time Bayesian machine learning systems at scale and building a data product on top of these systems that is accessible to non-technical users (marketers).

[PyData] Run Numba functions in SQLite: WTF? - Phillip Cloud

PyData NYC 2018 This talk focuses on an extremely experimental library called slumba that attempts to bring the convenience and speed of numba to user-defined scalar, aggregate and window functions in a SQLite database.

[PyData] Using Python in Weather Forecasting - Stephan Siemen

PyData London Meetup #52 Tuesday, January 8, 2018 Sponsored & Hosted by Man AHL

[PyData] The Lifecycle of Artificial Intelligence with IBM's Deep Learning as a Service - Justin McCoy

PyData NYC 2018 Train, evaluate, and deploy deep learning models with cloud compute through API calls.

[PyData] The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski

PyData NYC 2018 TileDB is an open-source storage manager for multi-dimensional sparse and dense array data. It has a novel architecture that addresses some of the pain points in storing array data on “big-data” and “cloud” storage architectures. This talk will highlight TileDB’s design and its ability to integrate with analysis environments relevant to the PyData community such as Python, R, Julia, etc. Slides - https://www.slideshare.net/PyData/the...

[PyData] Crunching Your Data with CatBoost - the New Gradient Boosting Library - Vasily Ershov

PyData NYC 2018 A comprehensive tutorial on CatBoost ( http://catboost.ai ), a new open-source gradient boosting library, that outperforms existing publicly available implementations of gradient boosting in terms of quality, has the fastest applier algorithm and the fastest GPU training of all publicly available GBDT implementations.

Measuring Model Fairness - J. Henry Hinnefeld

PyData NYC 2018 Machine learning models are increasingly used to make decisions that affect people’s lives. With this power comes a responsibility to ensure that model predictions are fair. In this talk I’ll introduce several common model fairness metrics, discuss their tradeoffs, and finally demonstrate their use with a case study analyzing anonymized data from one of Civis Analytics’s client engagements. Slides - http://hinnefe2.github.io/talks/PyDat...

[PyData] Taking the Pain Out of Data Access - Martin Durant

PyData NYC 2018 Intake is a simple library providing a single interface for cataloging, describing and reading any kind of data. Catalogs give end-users an easy way to find data, locally, in a cloud service, or on an Intake server. Thus, Intake separates the definition of data sources from their use and analysis, so that Data Engineers and Data Scientists can get on with their respective jobs. Slides - https://docs.google.com/presentation/...