Microsoft’s 1.3-billion-parameter model surpasses Llama 2


Microsoft Research has done it once again. After beating Meta’s LLaMA with phi-1 in July, the researchers have now introduced phi-1.5, a language model with 1.3 billion parameters that outperforms Llama 2’s 7-billion-parameter model on several benchmarks. Microsoft has decided to open-source the model.

The 1.3-billion-parameter phi-1.5 has been designed to perform well across many domains, making it suitable for a wide range of applications. It particularly shines on question-and-answer (QA) prompts, chat interactions, and code-related tasks.

The open-source model is available on Hugging Face.
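
To give a sense of what the open-source release means in practice, here is a minimal sketch of loading the model with the Hugging Face transformers library. The model ID "microsoft/phi-1_5" and the Question/Answer prompt style follow the Hugging Face listing; depending on your transformers version, trust_remote_code=True may also be needed.

```python
# Minimal sketch of loading phi-1.5 via Hugging Face transformers.
# Assumes the "microsoft/phi-1_5" model ID from the Hugging Face listing;
# older transformers releases may also require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the fp16 precision noted below
    device_map="auto",          # requires the accelerate package
)

# The QA prompt format described for the model: "Question: ... Answer:"
prompt = "Question: Why is the sky blue?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```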

phi-1 was trained on high-quality, textbook-style data; phi-1.5 builds on that foundation with a large body of newly generated synthetic data. What sets phi-1.5 apart is its training mix, which draws on several data pools: Python code snippets extracted from StackOverflow, code from competitive programming contests, synthetic Python textbooks, and exercises generated by GPT-3.5-Turbo-0301.

The full paper, Textbooks Are All You Need II: phi-1.5 Technical Report, describes the approach in detail.

Key Specifications of the phi-1.5 model:

  • Architecture: Transformer-based model trained with a next-word prediction objective
  • Dataset Size: A training pool of 30 billion tokens
  • Training Tokens: 150 billion tokens seen during training (see the arithmetic sketch below)
  • Precision: fp16
  • GPUs: 32 A100-40G GPUs
  • Training Time: 8 days
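
Taken together, these figures imply roughly five passes over the training pool. A quick back-of-envelope sketch, using only the numbers listed above:

```python
# Back-of-envelope arithmetic from the published specs (illustrative only).
dataset_tokens = 30e9   # tokens in the training pool
trained_tokens = 150e9  # total tokens seen during training
gpus = 32               # A100-40G GPUs
days = 8                # wall-clock training time

epochs = trained_tokens / dataset_tokens              # ~5 passes over the data
gpu_seconds = gpus * days * 86_400
tokens_per_gpu_second = trained_tokens / gpu_seconds  # ~6.8k tokens/s per GPU

print(f"epochs: {epochs:.0f}")
print(f"throughput: {tokens_per_gpu_second:,.0f} tokens/s per GPU")
```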

The Microsoft Research team behind phi-1.5 asserts that the model reaches near-state-of-the-art performance among models with fewer than 10 billion parameters. Rigorous benchmarks assessing general knowledge, language comprehension, and logical reasoning position phi-1.5 as a strong contender.

Notably, phi-1.5 outperforms Meta’s Llama-2 7B on AGIEval and reaches parity with Llama-2 7B on the GPT4All benchmark suite as measured by the LM Evaluation Harness.
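
For readers who want to try such comparisons themselves, here is a hedged sketch using EleutherAI’s LM Evaluation Harness. The simple_evaluate entry point and the chosen tasks (a few from the GPT4All-style suite) are assumptions based on recent harness versions, not the exact setup Microsoft used.

```python
# Hedged sketch of running benchmarks with EleutherAI's lm-evaluation-harness.
# The simple_evaluate entry point and these task names reflect recent harness
# versions and are assumptions, not Microsoft's exact evaluation setup.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                     # Hugging Face causal-LM backend
    model_args="pretrained=microsoft/phi-1_5",
    tasks=["hellaswag", "winogrande", "arc_easy"],  # a few GPT4All-style tasks
    batch_size=8,
)
print(results["results"])
```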
