I am happy to invite you to the second event in the MDLI ops series, which will focus on the more technical side of day-to-day work. These events give a stage to Israeli companies operating in the field, allowing them to reach a wider audience of users. At the same time, they allow community members to share sound work methodologies and other recommended tools that are part of their work routine. This event includes 3 varied and interesting talks, each dealing with a tool or a common problem encountered when training models. The event will take place on 09.9.2020 at 18:00 and will be streamed live to the group's members. The streaming tool I use will depend on the number of registrants, so it is important to register so that I can prepare accordingly. You can register at the following link.
1. Title: The ABC’s of Research: Cluster Resource MLOps
The term “MLOps”, coined less than 5 years ago, is still very much a working definition: As machine learning evolves, it moves out of the lab and into real world production environments. And so, just as traditional software-engineering practices needed adaptation to the world of machine and deep learning, the DevOps sphere extends and reforms to include its new clients and handle the appropriate workloads.
Because the ‘production’ aspect of DevOps requires most of the attention, relatively few of the new MLOps tools are targeted at research as well. To name a few: continuous integration (CI) of data, model version control, artifact handling, and performance evaluation.
A wise person once said that the ABC of any computational research is: Always Be Computing.
For research, thus, perhaps the single most relevant MLOp is the provisioning of available resources. This operation is required from the instant research “graduates” from the laptop of the researcher and develops the constant hunger for more GPU PODs.
Regretfully, because of the technical aspects involved, the management and allocation of GPUs for ongoing tasks is fertile ground for dispute, waste, mishandling and even – nefarious, almost unspeakable, practices.
In this talk, I aim to provide the listener with basic tools to engage with resource MLOps and avoid the common pitfalls in various stages of the research lifecycle.
We will cover these topics:
- The researcher as DevOps – how to manage my own resources
- The researcher vs DevOps – how to convey your exact needs to DevOps
- The researcher vs other researchers – how to make sure everyone gets their fair share
- Workload examples will mostly be from hyperparameter tuning
- Cameos from my favorite open-source MLOps tool – Allegro Trains Agent
About the speaker:
Dr. Ariel Biller recently assumed the role of evangelist at Allegro AI. He has more than a decade of experience in resolving resource management disputes in academia. He is passionate about open source tools, research and MLOps. firstname.lastname@example.org
2. Title: Effective Data Science Teamwork using Git
Promoting ML projects from “proof of concept” to “ongoing project” status reveals how difficult it is to work on them as a team. Researchers and engineers try various model types, feature engineering, and data sources in parallel. Recording, reviewing, and integrating what everyone did is difficult and error prone. The problems become worse as team size increases and teams become distributed.
The software development community built mountains of tooling to overcome these issues, with Git as the centerpiece. The same principles apply to ML, but the tools and workflows require adjustment.
In this talk, Guy will review the process of working together on data science projects and the existing challenges, and introduce a workflow, based on Git and other open source tools and formats, to address them. He will show how we can leverage versioning to commit reproducible work, create data-science-oriented pull requests to learn what really matters in each contributor’s work, and easily integrate these contributions into the product or other researchers’ ongoing work.
Presenter: Guy Smoilovsky, Co-Founder and CTO at DAGsHub
3. Title: Machine Learning + Big Data + Real Time = the new frontier
Traditional machine learning mostly focuses on prediction and regression. Large scale machine learning applications, however, usually perform ranking, information retrieval, recommendation, or deduplication. These allow companies to improve personalization, semantic search, image retrieval, and much more. Such applications, however, pose new technological, algorithmic, and scientific challenges. Specifically, they require real-time operations on large collections of semantically rich vectors (aka embeddings). This talk will describe some successful applications, the challenges they pose, and how HyperCube's database unblocks them.
About the speaker:
Edo Liberty is the founder of HyperCube, a machine learning database. He was a Director of Research at AWS and the head of Amazon AI Labs, where he led the creation of Amazon SageMaker. Prior to that, he managed Yahoo’s research lab in New York and its scalable Machine Learning Platforms group, and co-founded an automatic video content recognition startup acquired by Vizio. Edo received his Computer Science PhD from Yale University, and later taught data mining and streaming algorithms as an Adjunct Professor at Tel Aviv University. He is the author of more than 75 academic papers and patents on topics such as ML, data mining, streaming algorithms, and optimization.