
How to Read in Data From .dat Pandas

    In [1]: import pandas as pd
  • Titanic data

    This tutorial uses the Titanic data set, stored as CSV. The data consists of the following data columns:

    • PassengerId: Id of every passenger.

    • Survived: This feature has the value 0 or 1: 0 for not survived and 1 for survived.

    • Pclass: There are 3 classes: Class 1, Class 2 and Class 3.

    • Name: Name of passenger.

    • Sex: Gender of passenger.

    • Age: Age of passenger.

    • SibSp: Indication that the passenger has siblings and a spouse.

    • Parch: Whether a passenger is alone or has family.

    • Ticket: Ticket number of passenger.

    • Fare: Indicating the fare.

    • Cabin: The cabin of passenger.

    • Embarked: The embarked category.


How do I read and write tabular data?

  • I want to analyze the Titanic passenger data, available as a CSV file.

        In [2]: titanic = pd.read_csv("data/titanic.csv")

    pandas provides the read_csv() function to read data stored as a csv file into a pandas DataFrame. pandas supports many different file formats or data sources out of the box (csv, excel, sql, json, parquet, …), each of them with the prefix read_*.
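
    To try read_csv() without the data file on disk, the sketch below feeds it a small in-memory sample instead of a path. The three-row csv_text excerpt and the name sample are made up for illustration and are not the tutorial's data/titanic.csv.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for data/titanic.csv.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38
3,1,3,"Heikkinen, Miss. Laina",female,26
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.columns.tolist())
```

    Quoting the Name field keeps the comma inside each name from being read as a column separator.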

Make sure to always check the data after reading it in. When displaying a DataFrame, the first and last 5 rows will be shown by default:

    In [3]: titanic
    Out[3]:
         PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
    0              1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
    1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
    2              3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
    3              4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
    4              5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S
    ..           ...       ...     ...                                                ...  ...               ...      ...    ...       ...
    886          887         0       2                              Montvila, Rev. Juozas  ...            211536  13.0000    NaN         S
    887          888         1       1                       Graham, Miss. Margaret Edith  ...            112053  30.0000    B42         S
    888          889         0       3           Johnston, Miss. Catherine Helen "Carrie"  ...        W./C. 6607  23.4500    NaN         S
    889          890         1       1                              Behr, Mr. Karl Howell  ...            111369  30.0000   C148         C
    890          891         0       3                                Dooley, Mr. Patrick  ...            370376   7.7500    NaN         Q

    [891 rows x 12 columns]

  • I want to see the first 8 rows of a pandas DataFrame.

        In [4]: titanic.head(8)
        Out[4]:
           PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
        0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
        1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
        2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
        3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
        4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S
        5            6         0       3                                   Moran, Mr. James  ...            330877   8.4583    NaN         Q
        6            7         0       1                            McCarthy, Mr. Timothy J  ...             17463  51.8625    E46         S
        7            8         0       3                     Palsson, Master. Gosta Leonard  ...            349909  21.0750    NaN         S

        [8 rows x 12 columns]

    To see the first N rows of a DataFrame, use the head() method with the required number of rows (in this case 8) as argument.

Note

Interested in the last N rows instead? pandas also provides a tail() method. For example, titanic.tail(10) will return the last 10 rows of the DataFrame.
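
As a quick check of how head() and tail() pick rows, here is a minimal sketch on a made-up 20-row frame; toy and its single PassengerId column are placeholders, not the Titanic data.

```python
import pandas as pd

# Hypothetical toy frame with 20 rows, used only to illustrate head/tail.
toy = pd.DataFrame({"PassengerId": range(1, 21)})

first_rows = toy.head(8)     # first 8 rows
last_rows = toy.tail(10)     # last 10 rows
default_rows = toy.head()    # no argument: 5 rows

print(len(first_rows), len(last_rows), len(default_rows))
```

Both methods return a new DataFrame and keep the original row labels, so last_rows still starts at index label 10.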

A check on how pandas interpreted each of the column data types can be done by requesting the pandas dtypes attribute:

    In [5]: titanic.dtypes
    Out[5]:
    PassengerId      int64
    Survived         int64
    Pclass           int64
    Name            object
    Sex             object
    Age            float64
    SibSp            int64
    Parch            int64
    Ticket          object
    Fare           float64
    Cabin           object
    Embarked        object
    dtype: object

For each of the columns, the data type used is listed. The data types in this DataFrame are integers ( int64 ), floats ( float64 ) and strings ( object ).

Note

When asking for the dtypes, no brackets are used! dtypes is an attribute of a DataFrame and Series. Attributes of a DataFrame or Series do not need brackets. Attributes represent a characteristic of a DataFrame / Series, whereas a method (which requires brackets) does something with the DataFrame / Series, as introduced in the first tutorial.
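
The difference between an attribute and a method can be seen side by side in the short sketch below. The frame toy and its values are illustrative assumptions; only dtypes and head() come from the text above.

```python
import pandas as pd

# Hypothetical two-row frame; the values are placeholders.
toy = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Age": [22.0, 26.0],
    "Pclass": [3, 3],
})

col_types = toy.dtypes    # attribute: no brackets, just looks something up
preview = toy.head(1)     # method: brackets, does something with the frame

print(col_types)
```

Here col_types is itself a Series, indexed by column name, so a single column's type can be looked up with col_types["Age"].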

  • My colleague requested the Titanic data as a spreadsheet.

        In [6]: titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

    Whereas read_* functions are used to read data into pandas, the to_* methods are used to store data. The to_excel() method stores the data as an excel file. In the example here, the sheet_name is named passengers instead of the default Sheet1. By setting index=False the row index labels are not saved in the spreadsheet.

The equivalent read function read_excel() will reload the data to a DataFrame :

    In [7]: titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")

    In [8]: titanic.head()
    Out[8]:
       PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
    0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
    1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
    2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
    3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
    4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S

    [5 rows x 12 columns]

  • I'm interested in a technical summary of a DataFrame
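
The effect of index=False can also be observed without an Excel engine installed. The sketch below swaps in to_csv() and read_csv() on in-memory buffers in place of to_excel() and read_excel(); this is a substitution made for illustration, and the frame toy is made up.

```python
import io

import pandas as pd

# Hypothetical two-row frame, used only to show what index=False changes.
toy = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

# index=False: the row index labels are not written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)

# Default (index=True): the labels become an extra, unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)

# Reading the index-free text back gives an equal DataFrame.
reloaded = pd.read_csv(io.StringIO(without_index.getvalue()))

print(without_index.getvalue())
print(with_index.getvalue())
```

Without index=False, the reloaded data would carry the old row labels as an additional unnamed column.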

        In [9]: titanic.info()
        <class 'pandas.core.frame.DataFrame'>
        RangeIndex: 891 entries, 0 to 890
        Data columns (total 12 columns):
         #   Column       Non-Null Count  Dtype
        ---  ------       --------------  -----
         0   PassengerId  891 non-null    int64
         1   Survived     891 non-null    int64
         2   Pclass       891 non-null    int64
         3   Name         891 non-null    object
         4   Sex          891 non-null    object
         5   Age          714 non-null    float64
         6   SibSp        891 non-null    int64
         7   Parch        891 non-null    int64
         8   Ticket       891 non-null    object
         9   Fare         891 non-null    float64
         10  Cabin        204 non-null    object
         11  Embarked     889 non-null    object
        dtypes: float64(2), int64(5), object(5)
        memory usage: 83.7+ KB

    The method info() provides technical information about a DataFrame, so let's explain the output in more detail:

    • It is indeed a DataFrame.

    • There are 891 entries, i.e. 891 rows.

    • Each row has a row label (aka the index ) with values ranging from 0 to 890.

    • The table has 12 columns. Most columns have a value for each of the rows (all 891 values are non-null ). Some columns do have missing values and fewer than 891 non-null values.

    • The columns Name, Sex, Ticket, Cabin and Embarked consist of textual data (strings, aka object ). The other columns are numerical data, some of them whole numbers (aka integer ) and others real numbers (aka float ).

    • The kind of data (characters, integers,…) in the different columns is summarized by listing the dtypes.

    • The approximate amount of RAM used to hold the DataFrame is provided as well.
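
    The non-null counts reported by info() can be reproduced by hand, which may help when reading the summary. This is a minimal sketch on a made-up frame with deliberately missing values; toy and its contents are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical three-row frame with some missing values.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],        # placeholder names
    "Age": [22.0, None, 26.0],
    "Cabin": [None, "C85", None],
})

non_null = toy.notna().sum()   # the numbers shown under "Non-Null Count"
missing = toy.isna().sum()     # how many values each column lacks

# info() prints its summary; buf= captures the text instead.
buffer = io.StringIO()
toy.info(buf=buffer)

print(non_null)
print(buffer.getvalue())
```

    Subtracting each non-null count from the number of rows gives the number of missing values per column, which is what isna().sum() returns directly.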

REMEMBER

  • Getting data into pandas from many different file formats or data sources is supported by read_* functions.

  • Exporting data out of pandas is provided by different to_* methods.

  • The head / tail / info methods and the dtypes attribute are convenient for a first check.

To user guide

For a complete overview of the input and output possibilities from and to pandas, see the user guide section about reader and writer functions.


Source: https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
